|Author's Email Address
||Computer Science and Engineering|
|Type of Document
||A Study of the Limits of Parallelism Available in SIMD Processors Through Register Packing|
|Date of Defense
SIMD and vector processing
||This thesis designs an instruction-level-parallel processor for embedded systems that perform general-purpose computation. The hardware of such an embedded system is smaller in scale than that of currently popular CPUs or GPUs. We apply several techniques to reduce the instruction scheduling time for our SIMD processor. |
We modify the branch-and-bound algorithm with techniques that maintain optimality, including PRSR (pseudo random shift register), memoization, and register grouping. We also support heuristics, which are shortcuts that let the exhaustive search finish quickly and efficiently, such as unrolling optimization, instruction distribution, and sign constraint.
Using register packing and loop unrolling, we evaluated our SIMD processor on MiBench and obtained performance comparable to that of a VLIW processor. Moreover, our register packing allows a vector-wide load from SRAM. Such a load is a natural fit for a SIMD processor and achieves significant speedups when our allocator is used.
||Chungnan Lee - chair|
Chun-Hung Lin - co-chair
Tong-Yu Hsieh - co-chair
Steve W. Haga - advisor
Indicated access: in-campus at 1 year and off-campus at 2 years.|
|Date of Submission