|Author's Email Address
|Type of Document
||Design of the Optimized Group Management Unit by Detecting Thread Parallelism on the Hyperscalar Architecture|
|Date of Defense
||Current trends in processor design have migrated toward chip multiprocessors (CMPs). CMPs are designed to exploit both instruction-level parallelism (ILP) within processors and thread-level parallelism (TLP) within and across processors. However, the conventional design of current CMPs is forced to make a choice between high single-thread performance and high peak throughput. This inability to adjust to varying levels of ILP and TLP results in processor inefficiency.|
Therefore, this thesis builds on the hyperscalar architecture, a chip multiprocessor design. The hyperscalar concept enables a multi-core architecture to dynamically group multiple scalar in-order cores into a superscalar processor to accelerate a sequential thread. This reconfigurability gives the hyperscalar architecture high flexibility in adapting to different types of applications, providing high single-thread performance when thread-level parallelism (TLP) is low and high throughput when TLP is high.
To increase processor efficiency, the system dynamically detects the ILP of the thread and groups or releases processors according to the ILP it observes. Building on the hyperscalar architecture, this thesis adds a mechanism that detects the ILP of a thread, along with two new instructions, CRM (Core Register Move) and RelC (Release Core), that release processors from a group. To keep the data within the group correct after a core is released, the CRM instruction moves the information held by the core being released to another core in the group, and the RelC instruction marks the core for release. When RelC executes in the write-back (WB) stage, it sends a release signal to the Group Management Unit (GMU) to indicate that the data has been completely transferred and the core is empty. After the GMU dispatches these two instructions, the system releases or groups cores according to the ILP. Simulation results show that the proposed architecture increases processor utilization and improves efficiency.
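The release sequence described in the abstract can be illustrated with a small behavioral sketch. This is a toy model written for illustration only, not the thesis's hardware design: the class and method names (`Core`, `GroupManagementUnit`, `adjust`, `grow`), the choice of the first core in the group as the destination of the register move, and the "one core per unit of measured ILP" policy are all assumptions that do not come from the source.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare cores by identity, not by field values
class Core:
    core_id: int
    # Live register state held by this core (hypothetical representation).
    registers: dict = field(default_factory=dict)


class GroupManagementUnit:
    """Toy GMU model; names and the sizing policy are illustrative assumptions."""

    def __init__(self, num_cores):
        self.free = [Core(i) for i in range(num_cores)]
        self.group = []

    def grow(self):
        # Add one free core to the group that runs the sequential thread.
        if self.free:
            self.group.append(self.free.pop(0))

    def crm(self, src, dst):
        # CRM (Core Register Move): move the data held by the core being
        # released to another core that stays in the group.
        dst.registers.update(src.registers)
        src.registers.clear()

    def relc(self, core):
        # RelC (Release Core): the core reports that it is empty, so the GMU
        # can remove it from the group.
        assert not core.registers, "RelC must follow CRM so no data is lost"
        self.group.remove(core)
        self.free.append(core)

    def adjust(self, measured_ilp):
        # Assumed policy: aim for one core per unit of measured ILP,
        # with at least one core and at most all cores.
        total = len(self.group) + len(self.free)
        target = max(1, min(int(measured_ilp), total))
        while len(self.group) < target:
            self.grow()
        while len(self.group) > target:
            victim = self.group[-1]
            self.crm(victim, self.group[0])
            self.relc(victim)
```

For example, calling `adjust(4)` on a four-core unit groups all four cores, and a later `adjust(2)` releases two of them, moving any register data they hold to a remaining core before each release.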
||Da-Wei Chang - chair|
Tong-Yu Hsieh - co-chair
Shiarn-Rong Kuang - co-chair
Jih-ching Chiu - advisor
indicate access worldwide|
|Date of Submission