As transistor feature sizes shrink, chips become more sensitive to disturbances such as manufacturing defects and process variation. These disturbances can lower chip yield.
Processor designs are one of the main driving forces behind today's electronic products. Among the components of a processor design, caches are critical for enhancing computation performance. In a cache, any fault in the data storage cells is likely to corrupt the stored data, which may lead the processor to produce wrong computation results. How to effectively improve the yield and reliability of caches has therefore been an important research topic in both academia and industry. Many fault-tolerance methods have been developed in the literature to prevent caches from operating erroneously. However, when the chip yield is low, the effectiveness of these methods may be limited.
In recent years, a new notion for improving yield has been proposed, called performance degradation tolerance. This notion focuses on a particular type of fault, called a performance degrading fault. Faults of this type result only in some performance degradation, without any computation errors. Therefore, as long as a defective chip contains only faults of this type and the resulting degraded performance is acceptable, the chip is still marketable for certain lower-end applications. This notion is well suited to components dedicated to enhancing the computation performance of processors, such as branch predictors. Our prior research has shown that all faults in a branch predictor are performance degrading faults, and that for over 99% of these faults the induced performance degradation is less than 1%.
Under the notion of performance degradation tolerance, the fraction of faults that are performance degrading and the resulting degree of performance degradation in the target circuit are critical issues that are worth investigating. It is important to point out that most hardware faults in a cache are not performance degrading faults. Although some fault tolerance methods have been developed in the literature, the number of faults that these methods turn into performance degrading faults may be limited. These methods may also require large hardware overhead and induce significant performance loss.
This thesis proposes a new cache design that supports performance degradation tolerance. In this design, all faults in data storage cells are performance degrading faults. To evaluate the performance degradation caused by faults, we implement the proposed cache design in the SimpleScalar processor simulation tool. We then randomly inject multiple faults at various fault densities into the data cells of the cache and run a large number of simulations using several CPU2000 benchmark programs. The experimental results show that when the fault density is less than 1%, the performance degradation is also less than 1%. When the fault density is 20%, the performance degradation is less than 16%.
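The random fault injection step can be illustrated with a minimal sketch. This is not the SimpleScalar implementation used in the thesis. It assumes that fault density means the fraction of data cells that are faulty and that faults are placed uniformly at random, neither of which the abstract states explicitly. The function names `inject_faults` and `faulty_blocks` and the parameter `cells_per_block` are hypothetical.

```python
import random


def inject_faults(num_cells, fault_density, seed=None):
    """Return a set of distinct faulty data-cell indices.

    Assumptions (not from the thesis): fault density is the number of
    faulty cells divided by the total number of data cells, and fault
    locations are drawn uniformly at random without repetition.
    """
    if not 0.0 <= fault_density <= 1.0:
        raise ValueError("fault_density must be in [0, 1]")
    rng = random.Random(seed)  # seeded so that a run can be repeated
    num_faults = round(num_cells * fault_density)
    return set(rng.sample(range(num_cells), num_faults))


def faulty_blocks(faulty_cells, cells_per_block):
    """Return the indices of the cache blocks that hold at least one faulty cell."""
    return {cell // cells_per_block for cell in faulty_cells}
```

For example, `inject_faults(1000, 0.2, seed=1)` marks 200 of 1000 cells as faulty, and `faulty_blocks` then reports which blocks a fault-aware cache model would have to treat specially.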