# Reconfigurable Duplex System Increasing Fault Tolerance for Circuits Based on FPGAs

Pavel Kubalík, Hana Kubátová, Department of Computer Science and Engineering, Czech Technical University, Karlovo nam. 13, 121 35 Prague 2, e-mail: xkubalik@fel.cvut.cz, kubatova@fel.cvut.cz

# 1. Introduction

As circuit integration density increases, the impact of radiation on integrated circuits becomes more important. FPGA circuits are more sensitive to radiation than ASICs. Concurrent error detection (CED) techniques allow faster detection of soft errors (errors that can be corrected by reconfiguration) caused by Single Event Upsets (SEUs) [1, 2]. SEUs can also change values in the embedded memory used in the design. These changes are not detectable by off-line tests, so some CED technique has to be used. The probability of an SEU occurring in random access memory (RAM) is described in [3].

Our paper describes a new design structure for FPGAs that improves the reliability parameters while keeping the area overhead lower than that of classical methods such as duplication or triplication.

Our solution assumes that the faulty part can be dynamically reconfigured. The most important criteria are the speed of fault detection and the safety of the whole circuit with respect to the surrounding environment. Our methodology enables cooperation between on-line methods and off-line BIST methods for fault detection and localization.

Our previous research shows the relation between the area overhead and the fault coverage [4]. Because of the small area overhead requirements, the fault coverage for most circuits is less than 100%; it typically varies from 75% to 95%. To ensure 100% fault coverage and to increase the reliability parameters, additional methods must be used.

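There are three basic terms in the field of CED and on-line testing: fault security (FS), self-testing (ST) and totally self-checking (TSC). A circuit is fault-secure if, for every modeled fault, it produces either the correct output or a non-code word; it is self-testing if, for every modeled fault, at least one input vector produces a non-code word; and it is totally self-checking if it is both.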

Detectable faults have to be sorted into four classes A, B, C and D [5] to determine whether the circuit satisfies the TSC property. For example, hidden faults belong to class A. This fault classification can be used to calculate to what extent the circuit is FS or ST, and then to evaluate its TSC property. Typical results for the ST and FS properties are shown in Table 1.

Table 1. Single even parity - PLA

| Circuit | Parity nets | Original [LUT] | Overhead [LUT] | Overhead [%] | ST [%] | FS [%] |
|---------|-------------|----------------|----------------|--------------|--------|--------|
| apla    | 1           | 46             | 23             | 50           | 99.5   | 82.6   |
| b11     | 1           | 37             | 3              | 8            | 89.9   | 77.3   |
| br1     | 1           | 54             | 10             | 19           | 86.9   | 62.1   |
| al2     | 1           | 52             | 4              | 8            | 97.3   | 91.7   |
| alu3    | 1           | 26             | 32             | 123          | 100    | 92     |
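
The percentage overhead in Table 1 can be cross-checked with a short calculation. The sketch below assumes that the column between the original size and the percentage overhead gives the added area in LUTs; the function name `overhead_percent` is illustrative and does not come from the paper.

```python
# Values copied from Table 1: (original LUTs, added LUTs, reported overhead in %).
# Assumption: the column next to "Original [LUT]" is the added area in LUTs.
rows = {
    "apla": (46, 23, 50),
    "b11": (37, 3, 8),
    "br1": (54, 10, 19),
    "al2": (52, 4, 8),
    "alu3": (26, 32, 123),
}


def overhead_percent(original_luts, added_luts):
    """Area overhead relative to the original circuit size, in percent."""
    return 100.0 * added_luts / original_luts


for name, (original, added, reported) in rows.items():
    computed = overhead_percent(original, added)
    print(f"{name}: computed {computed:.1f} %, table {reported} %")
```

Under this reading, every computed value rounds to the percentage reported in the table, which supports interpreting the two columns as the same overhead expressed in LUTs and in percent.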

In our research, parity predictors are used to generate the proper output code of the circuits. These techniques achieve a small area overhead with relatively high fault coverage, but the fault coverage is not 100% [6, 7, 8].
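
The idea of a single even parity check can be illustrated with a toy model. This is a minimal sketch, not the authors' implementation: the 3-input, 2-output function `circuit`, the predictor equation and the fault model (fixed bit flips on the outputs) are all assumptions chosen for brevity. It shows why a single parity bit cannot reach 100% fault coverage: an error that flips an even number of output bits leaves the parity unchanged.

```python
from itertools import product


def circuit(a, b, c):
    """Hypothetical combinational block with two outputs."""
    y0 = a ^ b
    y1 = (a & b) | c
    return (y0, y1)


def parity_predictor(a, b, c):
    """Predicts the even parity of the outputs directly from the inputs.

    In hardware this would be separately synthesized logic; here it is
    simply the XOR of the two output equations written out again.
    """
    return (a ^ b) ^ ((a & b) | c)


def check(outputs, predicted_parity):
    """Parity checker: returns True (OK) when the observed parity matches."""
    observed = 0
    for bit in outputs:
        observed ^= bit
    return observed == predicted_parity


def inject(outputs, flip_mask):
    """Simplified fault model (assumption): flip the output bits marked by 1."""
    return tuple(bit ^ flip for bit, flip in zip(outputs, flip_mask))


def detected_fraction(flip_mask):
    """Fraction of input vectors for which the checker reports FAIL."""
    vectors = list(product((0, 1), repeat=3))
    detected = 0
    for vector in vectors:
        faulty_outputs = inject(circuit(*vector), flip_mask)
        if not check(faulty_outputs, parity_predictor(*vector)):
            detected += 1
    return detected / len(vectors)


print(detected_fraction((1, 0)))  # single-bit error on y0
print(detected_fraction((1, 1)))  # double-bit error on y0 and y1
```

In this model a single-bit output error is flagged for every input vector, while a double-bit error is never flagged. Real faults in FPGA logic affect the outputs only for some input vectors, so measured coverage depends on the circuit structure.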

# 2. Proposed structure

Because our previous results show that it is difficult to fully satisfy the TSC property (100%), we propose a new structure based on two FPGAs (Fig. 1). This structure can increase the reliability parameters even though the circuit is not fully TSC. Each FPGA contains a TSC circuit and a comparator. The TSC circuit is composed of small blocks, each of which also satisfies the TSC property. The methodology for satisfying the TSC property in the compound design is described in [9].

Every FPGA has one primary input, one primary output and two pairs of checking signals (OK/FAIL). The checking signal generated by the TSC circuit provides additional information about whether the circuit is functioning properly.

The probability that this information is correct depends on the TSC properties. When the TSC property is satisfied only to 75%, the correctness of the checking information is also 75%. This means that the OK signal is correct for 75% of the errors that occur (the same probability holds for both the OK and FAIL signals). To increase the reliability parameters, two comparators must be added, one for every FPGA. A comparator compares the outputs of both FPGAs. When the outputs differ, the fail signal is generated. However, this information is not sufficient to determine which TSC circuit is faulty. The additional information needed to select the faulty circuit is generated by the original TSC circuit. The probability that this information is correct depends on the TSC properties and in many cases is higher than 75%.

When the outputs differ and one of the circuits generates a fail signal, the faulty circuit is correctly identified, and the correct outputs can be processed by the next circuit. The reconfiguration process is initiated after a fault is detected. Reconfiguration solves two problems: localization and correction of the faulty part. The time needed to localize the faulty part is not negligible and must be included in the calculation of the reliability parameters.



Figure 1. Reconfigurable duplex system

When the outputs differ and both circuits signal correct function, operation must be stopped and fault detection must be run on both circuits.
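
The decision rules described in this section can be summarized as a small arbitration function. This is a sketch under stated assumptions rather than the implemented logic: the names `Action` and `arbitrate` are hypothetical, and the text does not specify what happens when the outputs agree but one checker reports FAIL, or when both checkers report FAIL, so those branches are assumptions.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "outputs agree, both circuits report OK"
    RECONFIGURE_A = "use output of FPGA B, reconfigure FPGA A"
    RECONFIGURE_B = "use output of FPGA A, reconfigure FPGA B"
    STOP_AND_TEST_BOTH = "stop and run fault detection on both FPGAs"


def arbitrate(out_a, out_b, ok_a, ok_b):
    """Choose an action from both outputs and both OK/FAIL checking signals.

    out_a, out_b: primary outputs of the two FPGAs.
    ok_a, ok_b:   True when the TSC circuit in that FPGA signals OK.
    """
    if ok_a and ok_b:
        if out_a == out_b:
            return Action.CONTINUE
        # From the text: outputs differ but both circuits signal correct
        # function, so the faulty one cannot be selected.
        return Action.STOP_AND_TEST_BOTH
    if ok_a and not ok_b:
        # From the text when outputs differ; assumed to apply as well
        # when the outputs still agree.
        return Action.RECONFIGURE_B
    if ok_b and not ok_a:
        return Action.RECONFIGURE_A
    # Assumption: both circuits signal FAIL, so neither output is trusted.
    return Action.STOP_AND_TEST_BOTH


print(arbitrate(0b1010, 0b1000, ok_a=True, ok_b=False).name)
print(arbitrate(0b1010, 0b1000, ok_a=True, ok_b=True).name)
```

The second call corresponds to the case in which the checking information is wrong for the fault that occurred, which is why the fraction of faults covered by the TSC property directly affects how often the system can keep running on one FPGA.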

# 4. Conclusion and future work

The proposed structure can increase the reliability parameters. Thanks to the output comparators, we can use circuits in which the TSC property is satisfied to less than one hundred percent. The structure increases the reliability parameters through duplication and detection of the faulty circuit. Our solution can achieve a smaller area overhead than a triplex system (TMR), which is commonly applied to improve reliability properties. The reconfiguration process allows the faulty part to be corrected, which also increases the reliability parameters. We have implemented the proposed structure in one FPGA, using an ATMEL FPSLIC. Our future work is focused on the physical implementation of the structure in two FPGAs and on the calculation of its reliability parameters.

# 5. Acknowledgement

This research has been supported in part by the GA102/03/0672 grant and the MSM6840770014 research program.

# 6. References

[1] QuickLogic Corporation: Single Event Upsets in FPGAs, 2003, www.quicklogic.com

[2] Bellato, M., Bernardi, P., Bortalato, D., Candelaro, A., Ceschia, M., Paccagnella, A., Rebaudego, M., Sonza Reorda, M., Violante, M., Zambolin, P.: Evaluating the effects of SEUs affecting the configuration memory of an SRAM-based FPGA. Design Automation Event for Electronic System in Europe 2004, pp. 584-589.

[3] E. Normand, "Single Event Upset at Ground Level," IEEE Transactions on Nuclear Science, vol. 43, pp. 2742-2750, 1996.

[4] Kubalík, P. and H. Kubátová. "Minimization of the Hamming Code Generator in Self Checking Circuits", In Proceedings of the International Workshop on Discrete-Event System Design - DESDes'04. Zielona Gora: University of Zielona Gora, 2004, pp. 161-166.

[5] Kafka L., Kubalík P., Kubátová H., Novák O., "Fault Classification for Self-checking Circuits Implemented in FPGA", In Proceedings of IEEE Design and Diagnostics of Electronic Circuits and Systems Workshop. Sopron: University of Western Hungary, 2005, pp. 228-231.

[6] Drineas, P., Y. Makris (2003). "Concurrent Fault Detection in Random Combinational Logic," In: Proceedings of the IEEE International Symposium on Quality Electronic Design (ISQED), pp. 425-430.

[7] Mitra, S., and E. J. McCluskey (2000a). "Which Concurrent Error Detection Scheme To Choose?" Proc. International Test Conf., pp. 985-994.

[8] Mohanram, K., E. S. Sogomonyan, M. Gössel, N. A. Touba (2003). "Synthesis of Low-Cost Parity-Based Partially Self-Checking Circuits," Proceedings of the 9th IEEE International On-Line Testing Symposium, pp. 35.

[9] Kubalik, P., Kubatova, H.: High Reliable FPGA Based System Design Methodology, In Work in Progress Session of 30th EUROMICRO and DSD 2004. Universitat Linz, 2004, pp. 30-31.