MIRA: Micro-Architectural Reliability Analysis for Deep Submicron Technology
The last few decades have witnessed continuous scaling of CMOS technology, guided by Moore's Law, to support devices with higher speed, smaller area, and lower power. Though there are varying arguments on how long the scaling can continue, it is undisputed that classical physics can support deterministic circuit behavior only up to a limit, which is ultimately set by the thickness of an atom. The current deep submicron CMOS technology generation already faces several challenges, which give rise to a broad class of problems grouped under the term reliability. According to the International Technology Roadmap for Semiconductors (ITRS), reliability and resilience across all design layers constitute a long-term grand challenge. The unreliability of a device can be masked by conservative design decisions, but these reduce the achievable performance. An alternative to such performance degradation is to accept the unreliability and expose it to all the layers of computing. For example, aggressive voltage scaling of the device may lead to higher runtime performance at the cost of timing errors, which can be corrected by circuit or micro-architectural techniques.
A key ingredient of successful exploration of reliability against other performance constraints (e.g., power, temperature, speed) is to accurately model the faults prevalent in deep submicron technologies and to develop a smooth tool-flow on a high-level design platform for analyzing the effect of such faults. A conceptual tool-flow, under development, is shown in the figure.
We are tackling multiple challenges in developing the reliability estimation and exploration framework. These are as follows.
- Generic, technology-independent, parameterizable fault library construction: State-of-the-art fault characterization is not sufficient in view of emerging devices, and a better understanding of their behavior is required. Furthermore, architectural reliability estimation needs a corresponding logical representation of the physical defects, which is a challenging problem.
- Fast Reliability Estimation Flow: Reliability estimation at a high level of design abstraction is approximate but fast, whereas detailed circuit-level simulation is accurate but painfully slow. This accuracy-performance trade-off is well known and acceptable. However, with increasing design complexity, even the fast simulation setup of a high-level architecture description takes a significantly long time (hours) to provide a reliability estimate. This challenge can be approached by, first, analytical modeling of architectural reliability and, second, determining representative simulation vectors instead of using full applications.
- High-level Estimation of Physical Parameters: The dependence of several reliability measures (e.g., Mean-Time-To-Failure) on physical system parameters such as die area and temperature is well known. An accurate reliability estimate therefore requires accurate estimates of these physical parameters.
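The contrast between simulation-based and analytical reliability estimation can be illustrated with a minimal sketch. It is not part of the MIRA tool-flow: the 8-bit register width, the two toy ALU operations, and all function names are assumptions chosen for illustration. The sketch injects a single bit flip into one operand and counts how often the fault reaches the output rather than being logically masked.

```python
import random

WIDTH = 8  # hypothetical register width in bits (assumption)
MASK = (1 << WIDTH) - 1


def alu_and(a, b):
    """Toy ALU operation: bitwise AND."""
    return a & b


def alu_add(a, b):
    """Toy ALU operation: addition modulo 2**WIDTH."""
    return (a + b) & MASK


def flip_bit(value, bit):
    """Model a single-bit fault by inverting one bit of a register value."""
    return value ^ (1 << bit)


def estimate_error_rate(op, trials, seed=0):
    """Monte Carlo fault injection.

    Returns the fraction of trials in which a single bit flip in
    operand `a` changes the result of `op`, i.e. is not masked.
    """
    rng = random.Random(seed)  # seeded for reproducible estimates
    errors = 0
    for _ in range(trials):
        a = rng.randrange(1 << WIDTH)
        b = rng.randrange(1 << WIDTH)
        bit = rng.randrange(WIDTH)
        if op(flip_bit(a, bit), b) != op(a, b):
            errors += 1
    return errors / trials


if __name__ == "__main__":
    print("AND:", estimate_error_rate(alu_and, 20000))
    print("ADD:", estimate_error_rate(alu_add, 20000))
```

For these toy operations the result can also be derived analytically. A flipped bit of `a` changes `a & b` only when the matching bit of `b` is 1, which happens with probability 0.5 for uniformly random operands. The same flip always changes the modular sum. The simulated estimates should approach these values, which hints at why an analytical model can stand in for long simulation runs.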
Zheng Wang, Anupam Chattopadhyay
Wang, Z., Xie, H., Chafekar, S., Rama Usha A, S. and Chattopadhyay, A.: Architectural Error Prediction using Probabilistic Error Masking Matrices, in Asia Symposium on Quality Electronic Design (ASQED), pp. 31-36, (Malaysia) 2015, 10.1109/ACQED.2015.7274003 ©2015 IEEE
Cosmin, C.-G., Marcu, M., Wang, Z., Chattopadhyay, A., Amaricai, A., Fedeac, S., Ghenea, M., Weinstock, J. H. and Leupers, R.: Direct FPGA-based Power Profiling for a RISC Processor, in IEEE International Instrumentation and Measurement Technology Conference (I2MTC), pp. 1578-1583, (Pisa, Italy) May/2015, 10.1109/I2MTC.2015.7151514 ©2015 IEEE
Song, B., Michihiro, S., Wang, Z., Masayuki, H., Chattopadhyay, A. and Takashi, S.: A Processor-Level NBTI Mitigation Technique of Applying Anti-Aging Gate Control through Instruction Set Architecture, in IEICE Conference of VLSI Design Technologies (VLD), pp. 49-54, (Okinawa Prefecture, Japan), IEICE, Mar/2015
Wang, Z., Chafekar, S., Xie, H., Rama Usha A, S. and Chattopadhyay, A.: Fast, Approximate Error Prediction for Unreliable Embedded Processors, in HiPEAC Workshop on Approximate Computing (WAPCO), pp. 1-6, (Amsterdam, The Netherlands) Jan/2015
Wang, Z., Yang, L. and Chattopadhyay, A.: Architectural Reliability Estimation using Design Diversity, in International Symposium on Quality Electronic Design (ISQED), pp. 112-117, (San Jose, USA) 2015, ISBN: 978-1-47997-580-8, 10.1109/ISQED.2015.7085409
Wang, Z., Paul, G. and Chattopadhyay, A.: Processor Design with Asymmetric Reliability, in IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 565-570, (Tampa, Florida, USA) Jul/2014, 10.1109/ISVLSI.2014.63 ©2014 IEEE
Wang, Z., Chen, C., Sharma, P. and Chattopadhyay, A.: System-level Reliability Exploration Framework for Heterogeneous MPSoC, in ACM Great Lakes Symposium on VLSI (GLSVLSI), pp. 9-14, (Houston, USA), ACM, May/2014, ISBN: 978-1-45032-816-6, 10.1145/2591513.2591519 ©2014 IEEE
Wang, Z., Wang, L., Xie, H. and Chattopadhyay, A.: Power Modeling and Estimation during ADL-driven Embedded Processor Design, in 4th International Conference on Energy Aware Computing Systems & Applications (ICEAC), (Istanbul, Turkey) 2013, ISBN: 978-1-47992-543-8
Wang, Z., Singh, K., Chen, C. and Chattopadhyay, A.: Accurate and Efficient Reliability Estimation Techniques during ADL-Driven Embedded Processor Design, in Design Automation and Test in Europe (DATE), pp. 547-552, (Grenoble, France), EDA Consortium, 2013, ISBN: 978-1-45032-153-2 ©2013 IEEE
Wang, Z., Chen, C. and Chattopadhyay, A.: Fast Reliability Exploration for Embedded Processors via High-level Fault Injection, in International Symposium on Quality Electronic Design (ISQED), pp. 265-272, (San Jose, CA, USA) 2013, ISBN: 978-1-46734-951-2, 10.1109/ISQED.2013.6523621
Kammler, D., Guan, J., Ascheid, G., Leupers, R. and Meyr, H.: A fast and flexible Platform for Fault Injection and Evaluation in Verilog-based Simulations, in Proceedings of the IEEE International Conference on Secure Software Integration and Reliability Improvement (SSIRI), pp. 309-314, (Shanghai, China) Jul/2009