In the age of multi-core architectures, exploiting the instruction-level parallelism (ILP) of loops can improve computing efficiency, because loops are the dominant structure in programs with high-performance computing demands. Loop execution has three characteristics: (1) instructions are repeatedly fetched from the cache and decoded; (2) the number of instructions that can be issued from the loop body is limited; (3) there are dependences between iterations. These factors result in poor ILP when loops are executed. To make the most of the cores' computing resources, the use of the Hyperscalar architecture deserves emphasis.
Because the machine code produced by compiling a loop follows a specific ordering pattern, we can formulate the semantics of the loop from observations of this pattern. In this thesis, we propose a semantic-based loop unrolling mechanism for the Hyperscalar architecture. The mechanism unrolls loops in the instruction analyzer (IA): it parses the semantics of the instructions to find the closed interval of loop-body instructions that matches our formulation, and then analyzes the information gathered from that interval.
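As a rough illustration of what finding the closed interval of a loop body can look like, the following minimal sketch scans a decoded instruction stream for a conditional branch that jumps backward. The backward-branch heuristic, the instruction tuple format, and the name `find_loop_interval` are assumptions made for this example; the semantic formulation used in the thesis is not reproduced here.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction format: (address, mnemonic, branch_target or None).
Instr = Tuple[int, str, Optional[int]]

# A few ARM conditional branch mnemonics; the list is illustrative, not complete.
COND_BRANCHES = {"BEQ", "BNE", "BLT", "BLE", "BGT", "BGE", "BCC", "BCS", "BHI", "BLS"}


def find_loop_interval(code: List[Instr]) -> Optional[Tuple[int, int]]:
    """Return (start, end) addresses of the first loop found, or None.

    Assumption: a loop is closed by a conditional branch whose target
    lies at or before the branch itself (a backward branch).
    """
    for addr, mnemonic, target in code:
        if mnemonic in COND_BRANCHES and target is not None and target <= addr:
            return (target, addr)
    return None


# Example stream: a load/accumulate loop between addresses 0x04 and 0x10.
example = [
    (0x00, "MOV", None),
    (0x04, "LDR", None),
    (0x08, "ADD", None),
    (0x0C, "SUBS", None),
    (0x10, "BNE", 0x04),
    (0x14, "STR", None),
]
interval = find_loop_interval(example)
```

Here `interval` covers the instructions from the branch target through the closing branch, which is the kind of information a detection unit would pass on to the unrolling stage.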
The proposed architecture consists of three units: the loop detect unit (LDU), the loop unrolling unit (LUU), and the loop controller. The LDU finds the closed interval of loop-body instructions by parsing the semantics of the instructions and matching them against our formulation, and collects information about this interval. The LUU unrolls the loop based on the information collected by the LDU, in four steps: (1) decide the number of times to unroll the loop according to the number of available cores, and add the SEQ tag to the instructions; (2) rename registers and eliminate the iteration dependences of the unrolled loop; (3) generate tags for the instructions and add compensation tags to ensure data correctness; (4) rearrange the issue order so that instructions whose iteration dependences have been eliminated are issued earlier, and generate the instruction tag dispatch table, the loop VSRF mapping table, the loop memory tag mapping table, and the loop specific instruction flush table. The loop controller decides which unit holds the dispatch right, based on mispredicted branch instructions and on loops that have finished the unrolling procedure. If a mispredicted branch instruction is identical to the conditional check branch instruction of an unrolled loop, the dispatch right is handed over to the LUU. When execution of the unrolled loop finishes, the loop controller hands the dispatch right back to the IA.
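Steps (1) and (2) of the unrolling procedure can be pictured with a small software model. The sketch below is only an illustration under stated assumptions: the unroll factor is taken as one copy per core (capped by the trip count), the copy index stands in for the SEQ tag, and every destination register is renamed to a fresh hypothetical name (`v0`, `v1`, ...). It removes false (write-after-read and write-after-write) dependences between copies; compensation tags, the mapping tables, and issue reordering are not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction format: (dest_register, opcode, source_operands).
Instr = Tuple[str, str, Tuple[str, ...]]


def unroll_and_rename(
    body: List[Instr], num_cores: int, trip_count: int
) -> List[Tuple[int, Instr]]:
    """Unroll `body` and rename registers in each copy.

    Returns (seq, instr) pairs, where seq is the copy index
    (a stand-in for the SEQ tag in this sketch).
    """
    # Assumption: one unrolled copy per core, never more than the trip count.
    factor = max(1, min(num_cores, trip_count))
    latest: Dict[str, str] = {}  # architectural register -> newest renamed register
    next_id = 0
    out: List[Tuple[int, Instr]] = []
    for seq in range(factor):
        for dest, op, srcs in body:
            # Sources read the newest renamed version if one exists;
            # untouched registers and immediates pass through unchanged.
            new_srcs = tuple(latest.get(s, s) for s in srcs)
            new_dest = f"v{next_id}"
            next_id += 1
            latest[dest] = new_dest
            out.append((seq, (new_dest, op, new_srcs)))
    return out


# Example loop body: load a value, accumulate it, advance the pointer.
body = [
    ("r2", "LDR", ("r0",)),
    ("r3", "ADD", ("r3", "r2")),
    ("r0", "ADD", ("r0", "#4")),
]
unrolled = unroll_and_rename(body, num_cores=2, trip_count=8)
```

After renaming, the second copy's load depends only on the first copy's pointer update, not on its accumulate, so the two can be issued independently of the accumulation chain.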
In this thesis, the ARM instructions used for verification are generated by the Keil μVision5 compiler. The results show that eliminating iteration dependences improves ILP by 20% to 100%, and that flushing specific instructions reduces the total execution time of loops whose bodies contain internal branch instructions.