Multi-core systems on a single chip exploit ILP (Instruction-Level Parallelism) and TLP (Thread-Level Parallelism) to improve system performance. Consequently, the efficiency of transferring data among cores dominates multi-core system performance. This work proposes a fair arbitration strategy to alleviate starvation and hotspot problems for multi-core systems in on-chip networks. In addition, to reduce the gap between the traditional memory hierarchy and processors, a novel buffering mechanism is proposed to improve data fetching for network-on-chip nodes.
For multi-core systems in on-chip networks, the global fairness, scalability, and simplicity of the strategy used to arbitrate colliding communications among cores have substantial effects. An unfair strategy causes starvation and hotspot problems, especially under heavy loads. In addition, the hardware complexity of an arbitration strategy deployed in the on-chip environment must also be considered. To address these issues, this paper presents a simple and fair strategy that properly adjusts the priorities of nodes. In the initial state, each node holds a unique priority. When nodes compete for a particular network resource, the loser exchanges its priority with that of the winner. This principle guarantees that the winner's opportunity decreases for subsequent connections, whereas the loser's priority increases. Using only simple compare-and-exchange operations, the proposed arbitration strategy achieves efficient global fairness. Moreover, considering speed and clock skew, asynchronous circuits are used for the implementation. Simulation results demonstrate that by applying a fair strategy, the proposed scheme alleviates starvation, guarantees deadlock freedom, and mitigates hotspot problems. In a large system, this approach efficiently provides fair service.
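The priority-swapping rule above can be sketched in a few lines. The following is a minimal behavioral model, not the paper's asynchronous hardware design; the class and method names are illustrative, and the tie-breaking choice of which loser to swap with is an assumption.

```python
class SwapArbiter:
    """Behavioral sketch of the priority-swapping arbitration.

    Each node starts with a unique priority. When several nodes compete,
    the highest-priority node wins and then exchanges its priority with
    the lowest-priority loser, so repeat winners gradually lose their
    advantage and no requester starves.
    """

    def __init__(self, num_nodes):
        # Unique initial priorities: node i starts with priority i.
        self.priority = list(range(num_nodes))

    def arbitrate(self, requesters):
        """Grant one requester; swap priorities of winner and a loser."""
        winner = max(requesters, key=lambda n: self.priority[n])
        losers = [n for n in requesters if n != winner]
        if losers:
            # Swap with the lowest-priority loser (an illustrative choice).
            loser = min(losers, key=lambda n: self.priority[n])
            self.priority[winner], self.priority[loser] = (
                self.priority[loser], self.priority[winner])
        return winner
```

With two nodes repeatedly contending, grants alternate between them: the winner hands its high priority to the loser, which is the global-fairness property the strategy relies on.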
The traditional memory hierarchy design can smooth the data stream and instruction stream. However, the bandwidth of the instruction and data streams remains the main challenge for high-performance microprocessor systems. To improve instruction and data fetching, the proposed buffering architecture exploits both temporal and spatial locality with a relation-exchanging buffering mechanism. On a buffer hit, the instruction or data can be reused; at the same time, a prefetching mechanism is enabled to prefetch the instructions or data that will be used in the near future. According to the simulation results, the proposed buffering mechanism with a depth of 3 and a 64-byte line size, which requires only 4% extra hardware cost, is a cost-effective choice. The hit rate of the proposed buffering mechanism outperforms that of the loop buffer architecture by 22% when fetching the instruction stream, and that of the First-In-First-Out (FIFO) strategy by 7% when fetching the data stream.
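A minimal model of such a small fetch buffer with prefetch-on-hit is sketched below. The paper's relation-exchanging replacement policy is not detailed in this abstract, so the sketch substitutes plain LRU replacement as a stand-in; the class name and structure are assumptions for illustration. The depth of 3 and 64-byte line size match the evaluated configuration.

```python
from collections import OrderedDict

LINE_SIZE = 64   # bytes per line, as in the evaluated configuration
DEPTH = 3        # number of line entries, as in the evaluated configuration

class LineBuffer:
    """Illustrative fetch buffer: reuse on hit, prefetch next line on hit.

    Captures the two localities the mechanism exploits: temporal
    (a hit line is kept for reuse) and spatial (the next sequential
    line is prefetched, anticipating near-future fetches).
    """

    def __init__(self, depth=DEPTH):
        self.depth = depth
        self.lines = OrderedDict()   # line number -> present (LRU order)
        self.hits = self.accesses = 0

    def _fill(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)   # refresh LRU position
            return
        if len(self.lines) >= self.depth:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[line] = True

    def access(self, addr):
        line = addr // LINE_SIZE
        self.accesses += 1
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)
            self._fill(line + 1)            # prefetch next line on hit
        else:
            self._fill(line)                # demand fill on miss

    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0
```

On a purely sequential word-by-word fetch stream, the next-line prefetch hides every line-boundary miss after the first, so only the initial access misses.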