# DESIGN AND DEVELOPMENT OF ENHANCED MEMORY RELIABILITY AGAINST MULTIPLE CELL UPSETS USING DMC

S Gurudas Singh<sup>1</sup>, R Arun Kumar<sup>2</sup>

<sup>1,2</sup>Assistant Professor Department Of ECE Sree Chaitanya College of Engineering, Karimnagar

#### **ABSTRACT:**

Transient multiple cell upsets (MCUs) are becoming a serious threat to the reliability of memories exposed to radiation environments. More complex error correction codes (ECCs) are often used to protect memory and prevent data corruption caused by MCUs; their primary drawback is increased delay overhead. Recently, matrix codes (MCs) based on Hamming codes have been proposed for memory protection. The main issues are that they are double error correction codes and that their error correction capability is not improved in all cases. This work proposes a novel decimal matrix code (DMC) based on divide-symbol to provide better memory reliability with lower delay overhead. The proposed DMC uses the decimal algorithm to obtain the maximum error detection capability. Furthermore, the encoder-reuse technique (ERT) is proposed to minimize the area overhead of extra circuits without disturbing the encoding and decoding processes.

### INTRODUCTION

### 1.1 Motivation:

The general idea for achieving error detection and correction is to add some redundancy (i.e., some extra data) to a message, which the receiver can use to check the consistency of the delivered message and to recover data determined to be corrupt. Error detection and correction schemes can be either systematic or non-systematic. In a systematic scheme, the transmitter sends the original data and attaches a fixed number of check bits (or parity data), which are derived from the data bits by some deterministic algorithm. If only error detection is required, a receiver can simply apply the same algorithm to the received data bits and compare its output with the received check bits; if the values do not match, an error has occurred at some point during transmission. Error-correcting codes are regularly used in lower-layer communication, as well as for reliable storage in media such as CDs, DVDs, hard disks and RAM. In a system that uses a non-systematic code, the original message is transformed into an encoded message that has at least as many bits as the original message.

The major challenges posed for future memory design are soft errors [1]–[4] and high power consumption [5]–[7]. As process technology scales down to the nanometer regime, high-density, low-cost, high-performance integrated circuits, characterized by high operating frequencies, low voltage levels and small noise margins, will be increasingly susceptible to temporary faults [8]. In very deep sub-micron technologies, single-event upsets (SEUs) caused by atmospheric neutrons and alpha particles severely impact field-level product reliability, not only for memory but for logic as well. When these particles hit the silicon bulk, they create minority carriers which, if collected by the source/drain diffusions, can change the voltage level of the node. Transient faults are also a major concern in space applications, with potentially serious consequences for the spacecraft, including loss of information, functional failure or loss of control [9].

Although SEU is the major concern in space and terrestrial applications, multiple bit upsets (MBUs) have also become an important problem in designing memories for the following reasons: 1) The error rate of memories has increased due to continuing technology shrinkage [10,11], so the probability of having multiple errors increases. 2) MBUs can be induced by direct ionization or nuclear recoil after the passage of a high-energy ion [12]. 3) Experiments on memories under proton and heavy-ion fluxes in [13,14] show that the probability of having multiple errors increases with the size of the memory. Unfortunately, packaging and shielding cannot effectively protect against SEUs and MBUs, since they may be caused by neutrons, which can easily penetrate packages [10, 15].

To maintain a good level of reliability, it is necessary to protect memory cells with protection codes. Hamming codes and odd-weight codes are widely used to protect memories against SEUs because of their efficient ability to correct single upsets with reduced area and performance overhead [16]. However, multiple upsets caused by a single charged particle can provoke errors in a system protected by these single-error-correcting codes. On the other hand, Reed-Muller is another error correcting code able to cope with multiple upsets. It has a wide range of digital applications, including storage systems, wireless or mobile communications, and high-speed modems.

### 1.2 Error Detection and Existing Codes:

The aim of error detection and correction codes is to protect against soft errors that manifest themselves as bit-flips in memory. Several techniques are currently used to mitigate upsets in memories. For example, Bose–Chaudhuri–Hocquenghem codes, Reed–Solomon codes, punctured difference set (PDS) codes, and matrix codes have been used to deal with MCUs in memories. However, these codes require more area, power, and delay overhead, since their encoding and decoding circuits are more complex.

Reed-Muller code is another protection code that is able to detect and correct more errors than a Hamming code. The main drawback of this protection code is its higher area and power penalties.

Hamming codes are widely used to correct single event upsets (SEUs) in memory due to their ability to correct single errors with reduced area and performance overhead. Though excellent for correcting single errors in a data word, they cannot correct double-bit errors caused by a single event upset. An extension of the basic SEC-DED Hamming code has been proposed to form a special class of codes known as Hsiao codes, which improve the speed, cost and reliability of the decoding logic.
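The single-error limit described above can be illustrated with a minimal Python sketch of a Hamming(7,4) code. The function names and the bit ordering (parity bits at positions 1, 2 and 4) are illustrative assumptions and are not taken from the cited works.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Positions are 1-indexed; parity bits sit at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Return (data_bits, syndrome) after flipping the bit the syndrome names."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 0 means no error detected
    if syndrome:
        c[syndrome - 1] ^= 1          # assume a single error at this position
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)

single = list(codeword)
single[4] ^= 1                        # one upset: repaired
print(hamming74_decode(single)[0] == data)

double = list(codeword)
double[4] ^= 1
double[5] ^= 1                        # two upsets: miscorrected
print(hamming74_decode(double)[0] == data)
```

With one flipped bit the decoder recovers the data. With two flipped bits the syndrome is still nonzero but points at a third, healthy bit, so the decoder returns wrong data, which is the behaviour that motivates multi-bit codes.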

Another class of SEC-DED codes, known as single-error-correcting, double-error-detecting, single-byte-error-detecting (SEC-DED-SBD) codes, has been proposed to detect any number of errors affecting a single byte. These codes are more suitable than conventional SEC-DED codes for protecting byte-organized memories. Though they operate with lower overhead and are good for multiple error detection, they cannot correct multiple errors. There are additional codes, such as single-byte-error-correcting, double-byte-error-detecting (SBC-DBD) codes and double-error-correcting, triple-error-detecting (DEC-TED) codes, that can correct multiple errors.

The single-error-correcting, double-error-detecting and double-adjacent-error-correcting (SEC-DED-DAEC) code provides a low-cost ECC methodology to correct adjacent errors, as proposed in [12]. The only drawback of this code is the possibility of miscorrection for a small subset of multiple errors.

### 1.3 Challenges Of Memory Cells:

As CMOS technology scales down to the nanoscale and memories are combined with an increasing number of electronic systems, the soft error rate in memory cells is rapidly increasing, especially when memories operate in space environments, due to the ionizing effects of atmospheric neutrons, alpha particles, and cosmic rays.

Although single bit upset is a major concern for memory reliability, multiple cell upsets (MCUs) have become a serious reliability concern in some memory applications. To make memory cells as fault-tolerant as possible, error correction codes (ECCs) have been widely used for years to protect memories against soft errors. For example, Bose-Chaudhuri-Hocquenghem codes, Reed-Solomon codes, and punctured difference set (PDS) codes have been used to deal with MCUs in memories. However, these codes require more area, power, and delay overhead, since their encoding and decoding circuits are more complex.

Research Article

The interleaving technique has been used to restrain MCUs; it rearranges cells in the physical layout so that the bits of one logical word are spread across different physical words. However, interleaving may not be practical in content-addressable memory (CAM) because of the tight coupling between the cells and the comparison circuit structures.
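The effect of interleaving on a burst of adjacent upsets can be sketched in a few lines of Python. The mapping below (consecutive physical cells belong to consecutive logical words) and the function names are assumptions chosen for illustration; real memory layouts vary.

```python
def physical_to_logical(cell, num_words):
    """Map a physical cell index to (logical word, bit position).

    Assumes a simple column-interleaved row: consecutive physical cells
    belong to consecutive logical words.
    """
    return cell % num_words, cell // num_words


def errors_per_word(upset_cells, num_words):
    """Count how many upset cells land in each logical word."""
    counts = [0] * num_words
    for cell in upset_cells:
        word, _ = physical_to_logical(cell, num_words)
        counts[word] += 1
    return counts


burst = range(9, 13)                  # four physically adjacent upset cells
print(errors_per_word(burst, 4))      # four-way interleaving
print(errors_per_word(burst, 1))      # no interleaving
```

With four-way interleaving the burst leaves one error in each logical word, which a single-error-correcting code per word can repair. Without interleaving all four errors fall in the same word.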

Built-in current sensors (BICS) have been proposed to assist single-error correction and double-error detection codes in providing protection against MCUs. However, this technique can only correct two errors in a word.

More recently, 2-D matrix codes (MCs) have been proposed to efficiently correct MCUs per word with a low decoding delay. In these codes, one word is logically divided into multiple rows and multiple columns. The bits in each row are protected by a Hamming code, while a parity code is added to each column. For the MC based on Hamming, when two errors are detected by Hamming, the vertical syndrome bits are activated so that these two errors can be corrected. As a result, MC is capable of correcting only two errors in all cases. An approach that combines the decimal algorithm with Hamming code has also been conceived for application at the software level. It uses addition of integer values to detect and correct soft errors. The reported results show that this approach has a lower delay overhead than other codes.

### 1.4 Increment in Reliability Using DMC

In this project, a novel decimal matrix code (DMC) based on divide-symbol is proposed to provide enhanced memory reliability. The proposed DMC uses the decimal algorithm (decimal integer addition and decimal integer subtraction) to detect errors. The advantage of the decimal algorithm is that the error detection capability is maximized, so that the reliability of the memory is enhanced. In addition, the encoder-reuse technique (ERT) is proposed to minimize the area overhead of the extra circuits (encoder and decoder) without disturbing the encoding and decoding processes, because ERT uses the DMC encoder itself as part of the decoder.



Fig: 1.1 proposed schematic of fault-tolerant memory protected with DMC.

This project is organized as follows. The proposed DMC is introduced and its encoder and decoder circuits are presented, together with examples that illustrate the limits of simple binary error detection and the advantage of decimal error detection. The reliability and overheads of the proposed code are then analyzed. The implementation of decimal error detection together with BICS for error correction in CAM is also provided. Finally, some conclusions are discussed.

### 1.5 Language and Tools Used:

The project work is carried out using the ISE Simulator and the XST synthesis tool. The Xilinx software is a logic editor, synthesizer and simulator. It is used to validate the architecture of the logic circuit before the microelectronics design is started. Xilinx provides a user-friendly environment for hierarchical logic design and simulation with delay analysis, which allows the design and validation of complex logic structures. A key feature is the possibility of estimating the power consumption of the circuit. Some techniques for low-power design are described in the manual.

The ISE Simulator tool allows the user to design and simulate an integrated circuit at the physical description level. The package contains a library of test bench requirements to view and simulate. Circuit simulation can be accessed with a single key press. The electrical extraction of the circuit is performed automatically, and the digital simulator produces logic-level curves immediately.

#### 2. LITERATURE REVIEW

## Improved decoding algorithm for high reliable Reed-Muller coding

### C. Argyrides & D. K. Pradhan, Sep 2007

Scaling CMOS technology to the nanometer range, with low-cost, high-density, high-speed integrated circuits and low supply voltages, has increased the probability of faults occurring in memories. This has led to a major reliability concern, in particular an increased SRAM failure rate. Some commonly used mitigation techniques are triple modular redundancy and error correction codes (ECCs). Soft errors are the major issue in the reliability of memories. A soft error does not damage the hardware; it only corrupts the data being processed. If detected, soft errors are corrected by rewriting corrected data in place of the erroneous data. Highly reliable systems use an error correction approach; however, in many systems it is difficult to correct the data, or even impossible to detect the error. To prevent soft errors from corrupting stored data, error correction codes such as matrix codes and Hamming codes are used. When an ECC is used, data are encoded when written to the memory and decoded when read from the memory. Thus the encoding and decoding processes have a significant impact on memory access time and complexity. Multiple cell upsets have become a reliability concern in some applications, in addition to single cell upsets. BCH codes, Reed-Solomon codes, etc. are used to deal with MCUs, but the area, power and delay overheads of these codes are high due to their complex encoding and decoding architectures. The decimal matrix code uses the encoder-reuse technique, which uses the encoder as a part of the decoder and thus reduces the area overhead and complexity. DMC enhances the reliability of the memory by improving the error correction capability.

### Parallel double error correcting code design to mitigate multi-bit upsets in SRAMs

### R. Naseer and J. Draper, Sep 2008

During transmission of information over communication networks, data may be corrupted by physical or logical faults, which can bring the whole system down. Every communication system therefore has to be equipped with testing and fault-tolerance mechanisms to provide safe and sound communication. So far, many error detection and error correction codes have been developed for different purposes. To name some: parity codes, Berger codes and checksums for error detection; cyclic redundancy codes, Hamming codes, residue codes, Nordstrom-Robinson codes and turbo codes for error correction; and BCH codes and modified residue codes for multiple-error correction. These codes may perform well in some cases, but not in all conditions and environments. Moreover, due to the steady increase in the size, speed and complexity of data transmission, their overall efficiency has been reduced. Therefore, there is a widely felt need to create new methods and revise old techniques.

Generally, some faults are due to magnetic fields, electrical influences and weather effects such as thunderstorms, hurricanes and solar radiation. They can appear in both internal communications (e.g., inter-node computer communications) and external communications (e.g., satellite communications, digital telecommunications or wireless networks). Traditionally, memories employ single-error-correcting and double-error-detecting (SEC-DED) methods. But in telecommunications with large data packets, systems need error correcting methods along with multi-error detecting techniques. Hence the task of every receiver is to check for errors and then fix the problem by requesting retransmission, applying correction, or using other means.

In this regard, many methods have been developed by different designers, each of which works well only in specific conditions and environments.

### Content addressable memory (CAM) circuits and architectures:

### K. Pagiamtzis and A. Sheikholeslami, Mar 2006

This paper surveys recent developments in the design of large-capacity content-addressable memory (CAM). A CAM is a memory that implements the lookup-table function in a single clock cycle using dedicated comparison circuitry. CAMs are especially popular in network routers for packet forwarding and packet classification, but they are also beneficial in a variety of other applications that require high-speed table lookup. The main CAM-design challenge is to reduce the power consumption associated with the large amount of parallel active circuitry, without sacrificing speed or memory density. The paper reviews CAM-design techniques at the circuit level and at the architectural level. At the circuit level, it reviews low-power match line sensing techniques and search line driving approaches. At the architectural level, it reviews three methods for reducing power consumption.

### 3. PROPOSED METHOD

This chapter explains how the decimal matrix code is generated and the overall idea behind this project, using ERT for a 32-bit word. ERT uses the DMC encoder itself as part of the decoder, which reduces the area of the circuit.

### 3.1 Introduction:

Transient multiple cell upsets (MCUs) are becoming major issues in the reliability of memories exposed to radiation environments. To prevent MCUs from causing data corruption, more complex error correction codes (ECCs) are widely used to protect memory, but the main problem is that they require higher delay overhead. Recently, matrix codes (MCs) based on Hamming codes have been proposed for memory protection. The main issue is that they are double error correction codes and the error correction capabilities are not improved in all cases. In this project, a novel decimal matrix code (DMC) based on divide-symbol is proposed to enhance memory reliability with lower delay overhead. The proposed DMC utilizes the decimal algorithm to obtain the maximum error detection capability. Moreover, the encoder-reuse technique (ERT) is proposed to minimize the area overhead of extra circuits without disturbing the whole encoding and decoding processes. ERT uses the DMC encoder itself as part of the decoder.

### 3.2 Schematic of Fault-Tolerant Memory

The proposed schematic of the fault-tolerant memory is depicted in Fig. 3.1. First, during the encoding (write) process, information bits D are fed to the DMC encoder, and then the horizontal redundant bits H and vertical redundant bits V are obtained from the DMC encoder. When the encoding process is completed, the obtained DMC codeword is stored in the memory. If MCUs occur in the memory, these errors can be corrected in the decoding (read) process. Owing to the advantage of the decimal algorithm, the proposed DMC has higher fault-tolerant capability with lower performance overheads. In the fault-tolerant memory, the ERT technique is proposed to reduce the area overhead of extra circuits and will be introduced in the following sections.



Fig: 3.1 schematic of 32 bit with DMC.

### 3.3 32-Bit DMC Encoder:

In the proposed DMC, first, the divide-symbol and arrange-matrix ideas are performed, i.e., the N-bit word is divided into k symbols of m bits ($N = k \times m$), and these symbols are arranged in a $k_1 \times k_2$ 2-D matrix ($k = k_1 \times k_2$, where the values of $k_1$ and $k_2$ represent the numbers of rows and columns in the logical matrix, respectively). Second, the horizontal redundant bits H are produced by performing decimal integer addition of selected symbols per row. Here, each symbol is regarded as a decimal integer. Third, the vertical redundant bits V are obtained by binary operation among the bits per column. It should be noted that both divide-symbol and arrange-matrix are implemented logically rather than physically. Therefore, the proposed DMC does not require changing the physical structure of the memory.
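The divide-symbol and arrange-matrix steps can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the choices that symbol 0 holds the least significant bits and that symbols fill the matrix row by row are assumptions.

```python
def divide_symbols(word, n_bits, m):
    """Split an n_bits-wide integer into k = n_bits // m symbols of m bits.

    Symbol 0 holds the least significant bits (assumed ordering).
    """
    if n_bits % m:
        raise ValueError("n_bits must be a multiple of m")
    mask = (1 << m) - 1
    return [(word >> (m * i)) & mask for i in range(n_bits // m)]


def arrange_matrix(symbols, k1, k2):
    """Arrange k = k1 * k2 symbols into k1 logical rows of k2 symbols."""
    if len(symbols) != k1 * k2:
        raise ValueError("expected k1 * k2 symbols")
    return [symbols[r * k2:(r + 1) * k2] for r in range(k1)]


symbols = divide_symbols(0x76543210, n_bits=32, m=4)
matrix = arrange_matrix(symbols, k1=2, k2=4)
print(matrix)
```

Because the arrangement is purely an indexing convention, nothing about the physical memory array has to change, which matches the remark that both steps are logical.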

To explain the proposed DMC scheme, we take a 32-bit word as an example, as shown in Fig. 3.2. The cells from $D_0$ to $D_{31}$ are information bits. This 32-bit word is divided into eight symbols of 4 bits, with $k_1 = 2$ and $k_2 = 4$. $H_0$–$H_{19}$ are horizontal check bits; $V_0$ through $V_{15}$ are vertical check bits. However, it should be mentioned that the maximum correction capability (i.e., the maximum size of MCUs that can be corrected) and the number of redundant bits differ when different values of k and m are chosen. Therefore, k and m should be carefully adjusted to maximize the correction capability and minimize the number of redundant bits.

For example, in this case, when $k = 2 \times 2$ and m = 8, only 1-bit error can be corrected and the number of redundant bits is 80. When $k = 4 \times 4$ and m = 2, 3-bit errors can be corrected and the number of redundant bits is reduced to 32. However, when $k = 2 \times 4$ and m = 4, the maximum correction capability is up to 5 bits and the number of redundant bits is 36 (the 20 horizontal bits $H_0$–$H_{19}$ plus the 16 vertical bits $V_0$–$V_{15}$). In this paper, in order to enhance the reliability of memory, the error correction capability is considered first, so $k = 2 \times 4$ and m = 4 are used to construct DMC.



Fig: 3.2 32 bit word can be divided as 8 symbols with k=2x4 and m=4.

Horizontal redundant bits are calculated as follows

$$H_4 H_3 H_2 H_1 H_0 = D_3 D_2 D_1 D_0 + D_{11} D_{10} D_9 D_8$$
(1)
$$H_9 H_8 H_7 H_6 H_5 = D_7 D_6 D_5 D_4 + D_{15} D_{14} D_{13} D_{12}$$
(2)

Similarly for the horizontal redundant bits $H_{14}H_{13}H_{12}H_{11}H_{10}$ and $H_{19}H_{18}H_{17}H_{16}H_{15}$, where "+" represents decimal integer addition.

Vertical redundant bits are calculated as follows

$$V_0 = D_0 \oplus D_{16}$$
(3)
$$V_1 = D_1 \oplus D_{17}$$
(4)

And similarly for the remaining vertical redundant bits. The encoding can be performed by decimal and binary addition operations from (1) to (4). The encoder that computes the redundant bits using multi-bit adders and XOR gates is shown in Fig. 3.3. In this figure, $H_{19}$–$H_0$ are horizontal redundant bits, $V_{15}$–$V_0$ are vertical redundant bits, and the remaining bits $U_{31}$–$U_0$ are the information bits, which are directly copied from $D_{31}$–$D_0$. The enable signal En will be explained in the next section.

The 32-bit DMC encoder consists of adder circuits and XOR gates that calculate the horizontal and vertical redundant bits.



Fig: 3.3 32 -bit DMC encoder structure using multi bit adders and XOR gates
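The encoding step for the 32-bit case ($k = 2 \times 4$, m = 4) can be sketched in software as below. This is a behavioural illustration, not the hardware encoder: the function names are hypothetical, the symbol pairing for the second row is assumed by analogy with the first row, and each vertical bit is assumed to be the XOR of the two information bits in the same column.

```python
def symbols_of(word):
    """Return the eight 4-bit symbols of a 32-bit word (symbol 0 = D3..D0)."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def dmc_encode(word):
    """Compute (H, V) for a 32-bit word with k = 2 x 4 and m = 4.

    H is a list of four 5-bit sums [H4..H0, H9..H5, H14..H10, H19..H15].
    V is a 16-bit integer whose bit i is D_i XOR D_(i+16) (assumed form).
    """
    word &= 0xFFFFFFFF
    s = symbols_of(word)
    h = [
        s[0] + s[2],   # H4..H0   = D3..D0 + D11..D8
        s[1] + s[3],   # H9..H5   = D7..D4 + D15..D12
        s[4] + s[6],   # H14..H10 (assumed pairing, second row)
        s[5] + s[7],   # H19..H15 (assumed pairing, second row)
    ]
    v = (word & 0xFFFF) ^ (word >> 16)
    return h, v


h, v = dmc_encode(0x76543210)
print(h, hex(v))
```

Each sum of two 4-bit symbols is at most 30, so it fits in the five bits allotted to each group of horizontal redundant bits.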


#### 3.4 32-Bit DMC Decoder:

To obtain the corrected word, the decoding process is required. For example, first, the received redundant bits $H_4H_3H_2H_1H_0'$ and $V_3'$–$V_0'$ are generated from the received information bits D'. Second, the horizontal syndrome bits $\Delta H_4H_3H_2H_1H_0$ and the vertical syndrome bits $S_3$–$S_0$ can be calculated as follows:

$$\Delta H_4 H_3 H_2 H_1 H_0 = H_4 H_3 H_2 H_1 H_0' - H_4 H_3 H_2 H_1 H_0$$
(5)
$$S_0 = V_0' \oplus V_0$$
(6)

and similarly for the remaining vertical syndrome bits, where "–" represents decimal integer subtraction and "$\oplus$" represents the bitwise XOR operation. When $\Delta H_4 H_3 H_2 H_1 H_0$ and $S_3$–$S_0$ are equal to zero, the stored codeword has the original information bits in symbol 0, i.e., no errors have occurred. When $\Delta H_4 H_3 H_2 H_1 H_0$ and $S_3$–$S_0$ are nonzero, the induced errors (the number of errors is 4 in this case) are detected and located in symbol 0, and these errors can then be corrected by

$$D_{0correct} = D_0 \oplus S_0$$
(7)
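As a worked illustration of (5) to (7), consider hypothetical values that are not taken from the original figure: symbol 0 is $D_3D_2D_1D_0 = 1100$, its horizontal partner is $D_{11}D_{10}D_9D_8 = 0110$, and the symbol in the same columns of the second row is $D_{19}D_{18}D_{17}D_{16} = 1010$. Suppose all four bits of symbol 0 are upset, so that 0011 is read back, and that the stored redundant bits are intact. Here $\oplus$ denotes bitwise XOR and the vertical bits are assumed to be the XOR of the two rows.

$$
\begin{aligned}
H_4H_3H_2H_1H_0 &= 1100 + 0110 = 10010 \quad (12 + 6 = 18) \\
H_4H_3H_2H_1H_0' &= 0011 + 0110 = 01001 \quad (3 + 6 = 9) \\
\Delta H_4H_3H_2H_1H_0 &= 9 - 18 = -9 \neq 0 \\
V_3V_2V_1V_0 &= 1100 \oplus 1010 = 0110 \\
V_3'V_2'V_1'V_0' &= 0011 \oplus 1010 = 1001 \\
S_3S_2S_1S_0 &= 1001 \oplus 0110 = 1111 \\
\text{corrected symbol 0} &= 0011 \oplus 1111 = 1100
\end{aligned}
$$

The nonzero $\Delta H$ flags the row group containing symbol 0, the nonzero syndrome bits $S_3$–$S_0$ mark which bits to invert, and the XOR restores the original value 1100.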

The DMC decoder, depicted in Fig. 3.4, is made up of the following submodules, each of which executes a specific task in the decoding process: syndrome calculator, error locator, and error corrector. It can be observed from this figure that the redundant bits must be recomputed from the received information bits D' and compared to the original set of redundant bits in order to obtain the syndrome bits $\Delta H$ and S. Then the error locator uses $\Delta H$ and S to detect and locate the bits in which errors occur. Finally, in the error corrector, these errors are corrected by inverting the values of the erroneous bits.
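The three decoder stages can be sketched behaviourally as follows. This is a simplified illustration under stated assumptions, not the circuit of Fig. 3.4: the function names are hypothetical, the stored redundant bits H and V are assumed to be error-free, the second-row symbol pairing and the XOR form of V are assumed as in the first row, and a symbol is treated as erroneous only when both its horizontal and its vertical syndromes are nonzero. The syndrome calculator simply calls the encoder again, in the spirit of the encoder-reuse technique.

```python
def dmc_encode(word):
    """Recompute (H, V) from 32 information bits (k = 2 x 4, m = 4)."""
    word &= 0xFFFFFFFF
    s = [(word >> (4 * i)) & 0xF for i in range(8)]
    h = [s[0] + s[2], s[1] + s[3], s[4] + s[6], s[5] + s[7]]
    v = (word & 0xFFFF) ^ (word >> 16)
    return h, v


def dmc_decode(received, h_stored, v_stored):
    """Return the corrected word; H and V are assumed to be error-free."""
    # Syndrome calculator: reuse the encoder on the received bits.
    h_new, v_new = dmc_encode(received)
    delta_h = [a - b for a, b in zip(h_new, h_stored)]  # decimal subtraction
    s = v_new ^ v_stored                                # 16 vertical syndrome bits

    corrected = received
    for sym in range(8):
        row, col = divmod(sym, 4)
        group = 2 * row + (col % 2)        # which 5-bit sum covers this symbol
        s_bits = (s >> (4 * col)) & 0xF    # syndrome bits under this column
        # Error locator: both syndromes must flag the symbol.
        if delta_h[group] != 0 and s_bits != 0:
            # Error corrector: invert the flagged bits.
            corrected ^= s_bits << (4 * sym)
    return corrected


word = 0x76543210
h, v = dmc_encode(word)
upset = word ^ 0x0000001F                  # 5 adjacent bits flipped
print(hex(dmc_decode(upset, h, v)))
```

In this sketch a 5-bit burst spanning symbols 0 and 1 is repaired because the two symbols fall in different horizontal groups. Patterns in which errors in the same group cancel in the decimal sum, or in which both rows of a column are flagged, are not handled and would need the fuller treatment of the hardware decoder.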



Fig: 3.4 32 Bit DMC decoder

| Extra circuit | En signal: Read signal | En signal: Write signal | Function              |
|---------------|------------------------|-------------------------|-----------------------|
| Encoder       | 0                      | 1                       | Encoding              |
| Encoder       | 1                      | 0                       | Compute syndrome bits |

Fig: 3.5 Encoder Reuse Technique

### RESULTS

### 5.1 RTL Schematic diagram



Fig: 5.1 RTL Schematic diagram of 32 bit DMC

### 5.2 Technology schematic



Fig: 5.3 Technology schematic of 32 bit DMC





### 5.6 Applications

**Computer memories**: this is the application targeted by the decimal matrix code; conventional memories use extended Hamming codes, which are built on the perfect single-error-correcting Hamming codes.

**Photographs from spacecraft**: the codes initially used were Reed-Muller codes, which can be constructed as the orthogonal of extended Hamming codes.

**Compact discs**: the codes used are Reed-Solomon codes, constructed using certain finite fields of large prime-power order.

### **CONCLUSION**

Transient multiple cell upsets (MCUs) are becoming a serious threat to the dependability of memories exposed to radiation. More complex error correction codes (ECCs) are frequently used to protect memory and prevent data corruption caused by MCUs; their main drawback is increased delay overhead. Matrix codes (MCs) based on Hamming codes have recently been proposed for memory protection. Their main problem is that they are double error correction codes, and their error correction capability is not improved in all cases. In this work, a novel decimal matrix code (DMC) based on divide-symbol is designed to provide improved memory reliability with lower delay overhead. The proposed DMC obtains the maximum error detection capability by using the decimal algorithm. Additionally, the encoder-reuse technique (ERT) is applied to reduce the area overhead of the extra circuits without disturbing the encoding and decoding processes.

### **REFERENCES**

- [1] D. Radaelli, H. Puchner, S. Wong, and S. Daniel, "Investigation of multi-bit upsets in a 150 nm technology SRAM device," IEEE Trans. Nucl. Sci., vol. 52, no. 6, pp. 2433–2437, Dec. 2005.
- [2] E. Ibe, H. Taniguchi, Y. Yahagi, K. Shimbo, and T. Toba, "Impact of scaling on neutron induced soft error in SRAMs from a 250 nm to a 22 nm design rule," IEEE Trans. Electron Devices, vol. 57, no. 7, pp. 1527–1538, Jul. 2010.
- [3] C. Argyrides and D. K. Pradhan, "Improved decoding algorithm for high reliable reed muller coding," in Proc. IEEE Int. Syst. On Chip Conf., Sep. 2007, pp. 95–98.

- [4] A. Sanchez-Macian, P. Reviriego, and J. A. Maestro, "Hamming SEC-DAED and extended hamming SEC-DED-TAED codes through selective shortening and bit placement," IEEE Trans. Device Mater. Rel., to be published.
- [5] S. Liu, P. Reviriego, and J. A. Maestro, "Efficient majority logic fault detection with difference-set codes for memory applications," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 1, pp. 148–156, Jan. 2012.
- [6] M. Zhu, L. Y. Xiao, L. L. Song, Y. J. Zhang, and H. W. Luo, "New mix codes for multiple bit upsets mitigation in fault-secure memories," Microelectron. J., vol. 42, no. 3, pp. 553–561, Mar. 2011.
- [7] R. Naseer and J. Draper, "Parallel double error correcting code design to mitigate multi-bit upsets in SRAMs," in Proc. 34th Eur. Solid-State Circuits, Sep. 2008, pp. 222–225.
- [8] G. Neuberger, D. L. Kastensmidt, and R. Reis, "An automatic technique for optimizing Reed-Solomon codes to improve fault tolerance in memories," IEEE Design Test Comput., vol. 22, no. 1, pp. 50–58, Jan.–Feb. 2005.
- [9] P. Reviriego, M. Flanagan, and J. A. Maestro, "A (64,45) triple error correction code for memory applications," IEEE Trans. Device Mater. Rel., vol. 12, no. 1, pp. 101–106, Mar. 2012.
- [10] S. Baeg, S. Wen, and R. Wong, "Interleaving distance selection with a soft error failure model," IEEE Trans. Nucl. Sci., vol. 56, no. 4, pp. 2111–2118, Aug. 2009.
- [11] K. Pagiamtzis and A. Sheikholeslami, "Content addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
- [12] S. Baeg, S. Wen, and R. Wong, "Minimizing soft errors in TCAM devices: A probabilistic approach to determining scrubbing intervals," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 814–822, Apr. 2010.
- [13] P. Reviriego and J. A. Maestro, "Efficient error detection codes for multiple-bit upset correction in SRAMs with BICS," ACM Trans. Design Autom. Electron. Syst., vol. 14, no. 1, pp. 18:1–18:10, Jan. 2009.