### Harsh Environments: Space Radiation Environment, Effects, and Mitigation

Richard H. Maurer, Martin E. Fraeman, Mark N. Martin, and David R. Roth

Radiation effects in solid-state microelectronics can be split into two general categories: cumulative effects and single-event effects (SEEs). Cumulative effects produce gradual changes in the operational parameters of the devices, whereas SEEs cause abrupt changes or transient behavior in circuits. The space radiation environment provides a multitude of trapped, solar, and cosmic ray charged particles that cause such effects, interfere with space-system operation, and, in some cases, threaten the survival of such space systems. This article will describe these effects and how their impact may be mitigated in silicon-based microcircuits.

### SPACE RADIATION ENVIRONMENT

The one outstanding element that distinguishes the space environment is the presence of radiation. For the purposes of this discussion, we will primarily confine ourselves to natural space radiation.

The natural environment consists of electrons and protons trapped by planetary magnetic fields (Earth, Jupiter, etc.), protons and a very small fraction of heavier nuclei produced in energetic solar events, and cosmic rays (very energetic atomic nuclei) produced in supernova explosions within and outside of our galaxy. Inside large spacecraft structures such as the International Space Station, the primary cosmic beam of approximately 85% protons and 15% heavy nuclei is partially converted into secondary neutrons by collisions with the tens of grams per square centimeter of material areal density. These secondary neutrons can present an additional threat via single-event effects (SEEs) in electronics.

The space environment has a low dose rate of  $\sim 10^{-4}$  to  $10^{-2}$  rad/s, but mission durations may be measured in years, so large doses can accumulate. Over the life of a spacecraft mission, total ionizing dose (TID) levels on the order of  $10^5$  rad are easily accumulated (for example,  $10^{-3}$  rad/s sustained for 3 years is about  $10^5$  rad). Candidate devices need to be characterized and qualified against the requirements of a spacecraft mission.

For charged particles, the amount of energy that goes into ionization is given by the stopping power or linear energy transfer (LET) function, commonly expressed in units of MeV•cm²/g or more transparently as energy per unit length (dE/dx) in kiloelectronvolts per micrometer. The absorbed ionizing dose is the integral of the product of the particle energy spectrum and the stopping power of each particle type as a function of incident energy.

Absorbed ionizing dose is commonly measured in rad, an absorbed energy of 100 ergs/g of material. Because the energy loss per unit mass differs from one material to another, the material in which the dose is deposited is always specified [e.g., rad (Si) or rad (GaAs)]. The Système International (SI) unit for dose is the gray, which is equivalent to 100 rad.

The LET or the rate of energy loss, dE/dx, for a charged particle passing through matter can be expressed approximately by  $dE/dx = f(E) MZ^2/E$ , where x is the distance traveled in units of mass/area or density times distance, f(E) is a very slowly varying function of the ion energy E, M is the mass of the ionizing particle, and Z is the charge of the ionizing particle. Thus, for a given energy, the greater the mass and charge of the incident particle, the greater the amount of deposited charge or energy produced over a path length inside the solid-state material. For relativistic ions, the mass factor in the above equation becomes almost constant and the ion charge dominates. The intensity of heavy cosmic rays as a function of Z peaks at iron (Z = 26), abruptly decreasing thereafter. A very energetic 1 GeV per atomic mass unit iron nucleus will deposit ~0.14 pC in each 10 μm of silicon traversed (in silicon, 22.5 MeV deposits 1 pC of charge).
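The link between LET and deposited charge can be checked numerically. The following is a minimal sketch, assuming a silicon density of 2.33 g/cm³ and a constant LET along the path. The function names and the example LET of 1.35 MeV•cm²/mg are illustrative assumptions, chosen only so that the result is consistent with the ~0.14 pC per 10 μm quoted above; they are not taken from a stopping-power table.

```python
SI_DENSITY_MG_PER_CM3 = 2330.0  # silicon density, 2.33 g/cm^3
MEV_PER_PC_SI = 22.5            # energy that yields 1 pC in silicon (from the text)


def let_to_mev_per_um(let_mev_cm2_per_mg):
    """Convert a mass stopping power (MeV*cm^2/mg) to MeV lost per micrometer of Si."""
    mev_per_cm = let_mev_cm2_per_mg * SI_DENSITY_MG_PER_CM3
    return mev_per_cm * 1e-4    # 1 um = 1e-4 cm


def deposited_charge_pc(let_mev_cm2_per_mg, path_um):
    """Charge (pC) generated along a straight path, assuming constant LET."""
    energy_mev = let_to_mev_per_um(let_mev_cm2_per_mg) * path_um
    return energy_mev / MEV_PER_PC_SI


# Assumed LET for a ~1 GeV/u iron ion (illustrative value only).
charge = deposited_charge_pc(1.35, 10.0)
print(f"charge over 10 um: {charge:.2f} pC")
```

The same two functions can be reused for any ion once its LET in silicon is known from test data or a stopping-power code.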

### **CUMULATIVE EFFECTS**

### **Ionization**

When incident radiation enters a semiconductor solid material such as silicon, an electron–hole pair may be created if an electron in the valence band is excited across the band gap into the material's conduction band. The excited electron thus also leaves a hole behind in the valence band. If an electric field is present, the electrons are readily swept away because their mobility in silicon is much greater than that of the holes. Except for some small fraction of pairs that undergoes recombination immediately, the created electrons and holes are free to drift and diffuse in the material until they undergo recombination or are trapped.

Electron–hole pairs generated in the gate oxide of a metal-oxide semiconductor (MOS) device such as a transistor are quickly separated by the electric field within the space charge region (Fig. 1). The electrons quickly drift away while the lower-mobility holes drift slowly in the opposite direction. Oxides contain a distribution of sites such as crystalline flaws that readily trap the slow holes. Portions of the positively charged holes are trapped at the sites as they slowly flow by. Dangling bonds at the oxide–bulk material interface also trap charge. The response of MOS devices to TID is complex because of the competing effects of the oxide trap- and interface trap-induced threshold voltage shifts, which can change over time. The net result is that the integrated circuit-level behavior is changed because of the induced charge buildup.

**Figure 1.** Schematic of an *n*-channel MOSFET illustrating the basic effect of total ionization-induced charging of the gate oxide. Normal operation (a) and postirradiation (b) show the residual trapped positive charge (holes) that produces a negative threshold voltage shift.

Digital microcircuits are affected because trapped charge may shift MOS transistor threshold voltage, a key device parameter that is directly related to digital circuit power consumption and speed. As a result, supply current may increase (Fig. 2), and timing margins may be degraded. In the worst case, functionality may cease because of high leakage current and inability to shut off current between transistor source and drain. Changes in logic signal timing also may cause circuit failure as driving gate strength is reduced with total dose.

Linear microcircuits also may experience performance changes. Input bias current, offset, and drift will change, and voltage offset and drift also will be affected as transistor parameters such as threshold voltage are changed by radiation. Bias and quiescent currents also commonly increase over the time of a spacecraft mission because of TID. In some cases, increased leakage currents require designers to add significant margin to their power requirements. It is not uncommon for devices to show an order of magnitude increase in the leakage current as a result of TID while otherwise still functioning properly.



**Figure 2.** Increase in supply current versus TID for an Actel RTSX72SU FPGA. (Adapted from Ref. 2.) The  $I_{\rm CCI}$  curve (in red) is the current for the input/output (I/O) power supply; the  $I_{\rm CCA}$  curve (in blue) is the current for the logic gate power supply. These supplies are usually at different voltages.

### **Enhanced Low-Dose-Rate Sensitivity**

Satellite mission duration may extend over years, so a large TID may be accumulated. Integrated circuit fabrication changes over the last decade have led to some components with an enhanced sensitivity to radiation when exposed at low dose rate. This effect is called enhanced low-dose-rate sensitivity (ELDRS). The standard TID dose rate for ground testing is generally ~50 rad/s, which allows a qualification test to be run in an 8-hour shift. However, typical ELDRS testing is done with a dose rate of only 10–100 mrad/s. This rate is much closer to the rate at which TID will be accumulated during the mission, but it requires test times on the order of weeks to months. This extended but more realistic testing is expensive and can affect a spacecraft program schedule. Fortunately, some vendors producing radiation-hardened devices have determined the underlying cause of ELDRS for their parts and modified their manufacturing process to eliminate the problem.
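The schedule impact of low-dose-rate testing follows directly from dividing the target dose by the dose rate. The sketch below assumes a 100 krad qualification level purely for illustration; that level and the function name are not taken from any specific mission.

```python
def irradiation_time_s(target_dose_rad, dose_rate_rad_per_s):
    """Beam-on time needed to reach a target dose at a constant dose rate."""
    return target_dose_rad / dose_rate_rad_per_s


TARGET_DOSE_RAD = 100e3  # assumed qualification level, illustration only

standard_hours = irradiation_time_s(TARGET_DOSE_RAD, 50.0) / 3600.0
eldrs_days_fast = irradiation_time_s(TARGET_DOSE_RAD, 0.100) / 86400.0
eldrs_days_slow = irradiation_time_s(TARGET_DOSE_RAD, 0.010) / 86400.0

print(f"50 rad/s:   {standard_hours:.2f} h")
print(f"100 mrad/s: {eldrs_days_fast:.1f} days")
print(f"10 mrad/s:  {eldrs_days_slow:.1f} days")
```

These figures cover irradiation time only; intermediate electrical measurements and annealing steps add to the total test duration.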

### **Displacement Damage**

Devices that depend on bulk physics for operational characteristics, such as solar cells, particle detectors, photonic/electro-optic components, and even some linear regulators, have shown displacement damage sensitivity. Radiation particles such as neutrons, protons, and electrons scatter off lattice ions, locally deforming the material structure (Fig. 3). The band-gap structure may change, affecting fundamental semiconductor properties. For example, the output power of a spacecraft solar array degrades during the mission life of a spacecraft because of displacement damage. Another example of displacement damage is an increase in recombination centers in a particle detector, ultimately leading to increased noise and consequent decreased energy resolution.

**Figure 3.** Schematic of atomic displacement damage in crystalline solid. (a) Atomic displacement event. (b) Simple radiation-induced defects (vacancy and interstitial). Atomic displacements produce lattice defects that result in localized trap states (energy levels within the semiconductor band gap). Electrical parameters such as minority carrier lifetime and transistor gain are affected.

Displacement damage also is important for photonic and electro-optic integrated circuits such as charge-coupled devices (CCDs) and opto-isolators. Coulomb scattering with atomic electrons and elastic and inelastic nuclear scattering interactions produce vacancy/interstitial pair defects as the regular structure is damaged. The defects produce corrupting states in band gaps, leading to increased dark current and reducing gain and charge transfer efficiency (CTE). Traps and defects also serve as sinks and scattering centers, removing majority carriers, decreasing carrier mobility, and increasing junction leakage currents.

The amount of displacement damage is dependent on the incident particle type, incident particle energy, and target material. Displacement damage is similar to TID in that the effect is cumulative. Characterizing displacement damage is more complex than characterizing TID. The most commonly used method to quantify displacement damage is non-ionizing energy loss (NIEL). NIEL coefficients vary depending on radiation type, energy, and the target material. With a matrix of NIEL coefficients, the displacement damage can be estimated for an energy spectrum with mixed particles.
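The NIEL bookkeeping described above amounts to a weighted sum over particle types and energy bins. The following is a minimal sketch of that sum; every NIEL coefficient and fluence in it is a placeholder chosen for illustration, and the table layout and function name are assumptions rather than the format of any particular tool.

```python
# NIEL coefficients in MeV*cm^2/g, keyed by (particle, energy bin in MeV).
# PLACEHOLDER values for illustration only; use tabulated data for real work.
NIEL_TABLE = {
    ("proton", 10.0): 7.0e-3,
    ("proton", 50.0): 3.5e-3,
    ("electron", 1.0): 3.0e-5,
}

# Mission fluence per bin in particles/cm^2 (hypothetical spectrum).
FLUENCE = {
    ("proton", 10.0): 1.0e10,
    ("proton", 50.0): 5.0e9,
    ("electron", 1.0): 1.0e12,
}


def displacement_damage_dose(niel_table, fluence):
    """Sum NIEL x fluence over every (particle, energy) bin; result in MeV/g."""
    return sum(niel_table[key] * phi for key, phi in fluence.items())


ddd = displacement_damage_dose(NIEL_TABLE, FLUENCE)
print(f"displacement damage dose: {ddd:.3e} MeV/g")
```

Because the sum is linear, the contribution of each particle type can be inspected separately to see which part of the spectrum drives the damage.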

### **SINGLE-EVENT EFFECTS**

If the amount of charge collected at a junction exceeds a threshold, then an SEE can be initiated. An SEE can be destructive or nondestructive. Destructive effects result in catastrophic device failure. Nondestructive effects result in loss of data and/or control.

SEEs are generated through several mechanisms. The basic SEE mechanism occurs when a charged particle travels through the device and loses energy by ionizing the device material. Other physical charge generation mechanisms include elastic and inelastic nuclear reactions. The charge collection mechanisms are an interesting and complex set of subjects that are continuously refined in the literature.

The charge generated by this single strike is collected, producing spurious voltage on a "sensitive" node that causes a circuit-level effect (Fig. 4). The number of electron–hole pairs generated is proportional to the stopping power of the incident particle in the target material. In silicon, it takes 22.5 MeV of energy to generate 1 pC of charge. The generated charge recombines or is collected at the various nodes within the region of the ion strike. The charge collection threshold for the single event is called the critical charge or  $Q_{\rm crit}$ . If  $Q_{\rm crit}$  for a device is reduced, then its SEE rate is increased.

Although TID testing can be accomplished by using APL in-house facilities, access to off-site particle accelerators is required for SEE testing. SEE sensitivity is characterized as an equivalent cross-sectional area as a function of LET. The LET can be varied at a particle accelerator by changing the incident particle mass, incident energy, and angle of strike. A particle entering a sensitive volume at 60° from normal incidence travels twice the path length and therefore deposits twice the energy of a particle entering at normal incidence; the LET is effectively doubled. The key measurement for these experiments is the number of single events that occur as a function of the number of incident particles at a given LET. These data are combined with spacecraft trajectory information and used to predict a specific mission SEE rate.
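The two quantities extracted from an accelerator run can be written down in a few lines. This is a sketch assuming a thin sensitive volume, for which the path length (and hence the effective LET) scales as 1/cos θ, and a cross-section defined simply as events divided by fluence; the function names and the example numbers are illustrative.

```python
import math


def effective_let(let_normal, tilt_deg):
    """Effective LET for a beam tilted tilt_deg away from normal incidence.

    Assumes a thin sensitive volume so that path length scales as 1/cos(tilt).
    """
    return let_normal / math.cos(math.radians(tilt_deg))


def cross_section_cm2(num_events, fluence_per_cm2):
    """Measured SEE cross-section: observed events per unit particle fluence."""
    return num_events / fluence_per_cm2


# Hypothetical run: LET of 10 MeV*cm^2/mg at 60 degrees, 50 upsets in 1e7 ions/cm^2.
print(effective_let(10.0, 60.0))
print(cross_section_cm2(50, 1.0e7))
```

Repeating the measurement over several ions and tilt angles gives the cross-section versus LET curve that feeds the mission rate prediction.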

### Latch-Up

Integrated circuits fabricated with complementary MOS (CMOS) fabrication processes are very widely used in space electronics. These chips inherently include parasitic bipolar junction transistors (BJTs) formed by closely located CMOS structures that under normal conditions form the integrated circuit's n-channel and p-channel transistors (Fig. 5). The collector of each parasitic bipolar transistor forms the base of another parasitic device connected in a positive feedback loop. This circuit is equivalent to a four-layer diode device commonly known as a silicon-controlled rectifier (SCR). Under normal operation, no current flows through the parasitic base regions. However, if a small current is injected into a base region, perhaps because of the charge collected from a single-particle energy deposition, the positive feedback will cause the current to quickly become very large. The high current will continue to flow between the integrated circuit power supply pins until the voltage drops below a threshold called the holding voltage. This sustained high-current state induced by a single-particle interaction is referred to as single-event latch-up (SEL). A latched part can be permanently damaged as a result of thermal runaway or failure of on-chip metallization or packaging bond wires. However, if power is quickly removed or current is limited, damage to the integrated circuit can be avoided.


**Figure 4.** Schematic of a heavy ion strike on the cross-section of a bulk CMOS memory cell.

### **Other Destructive Effects**

Power devices may be sensitive to single-event burnout (SEB) and single-event gate rupture (SEGR). Like SEL, SEB is a high-current condition that ultimately leads to catastrophic device failure; it occurs in a parasitic *npn* bipolar structure similar to the one involved in latch-up. It is observed in vertical power MOS field-effect transistors (FETs) and some bipolar transistors. The charged particle strike induces current in the *p*-structure, forward-biasing the parasitic transistor. If the drain-source voltage is higher than the breakdown voltage of the parasitic *npn*, an avalanche occurs and high current flows. This effect can be permanently damaging to one or more of the parallel islands in the architecture of the power MOSFET by producing an uncontrolled short.

**Figure 5.** Bulk CMOS inverter architecture cross-section showing the parasitic bipolar SCR structure that forms, making it susceptible to SEL.

SEGR is initiated when the incident particle forms a conduction path in a gate oxide, resulting in device damage (Fig. 6). SEGR can occur when charge builds up in the dielectric around the gate of a power MOSFET. The localized field builds up until the field across the dielectric exceeds the dielectric breakdown voltage, resulting in a low-resistance path across the dielectric. The conduction path in the oxide is an example of classic dielectric breakdown similar to lightning during a thunderstorm. Operating a power FET well below its specified limits greatly reduces the likelihood of a destructive event.



**Figure 6.** Photograph of a catastrophic SEGR in a power MOSFET causing functional failure.

### **Single-Event Upset**

A single-event upset (SEU) is the change of state of a bistable element, typically a flip-flop or other memory cell, caused by the impact of an energetic heavy ion or proton. The effect is nondestructive and may be corrected by rewriting the affected element. As with other SEEs, a single-particle strike may introduce enough charge to exceed a sensitive circuit node's  $Q_{\rm crit}$  and change the logic state of the element. The resulting change of state is often known as a bit-flip and can occur in many different semiconductor technologies.

The vulnerability of a device to SEU is determined by two parameters: (i) the threshold LET, which is the minimum LET necessary to produce upset; and (ii) the saturation cross-section in square centimeters, which is a function of the surface area of all of the SEU-sensitive nodes.

Static random access memory (SRAM) and dynamic random access memory (DRAM) are two common integrated circuit memories that experience SEU. SRAMs have a structure consisting of an array of nearly identical memory cells. Each cell is a cross-coupled inverter pair using four transistors in the inverters. An ion strike on one of the four transistor drains starts a mechanism potentially leading to upset (i.e., if the voltage pulse attributable to the ion strike is faster than the feedback loop between the two inverters, a change of logic state will occur and persist until the next write to the cell).

DRAM structures have cells using charge storage in a capacitor to represent data. Typically, only one state is susceptible to SEU (i.e., 1s can be upset but not 0s). The storage mechanism is passive with no feedback loops, and cells must be refreshed regularly to continue to hold information. Ion strikes readily upset DRAMs, causing both cell storage errors and bit line errors (disturbance of pre-charged bit lines used in the read cycle).

Both types of memory circuits also include supporting circuitry such as sense amplifiers and control logic that also may be sensitive to SEEs or single-event transients (SETs) (see below). Very dense memory circuits also may have multiple bit upsets when one ion strike causes upsets in multiple bits. That may occur if the ion track is close to both bits or if the angle of incidence is close to parallel to the die. As fabrication feature sizes are decreased, multiple upsets are more common because sensitive circuit nodes are closer together and  $Q_{\rm crit}$  tends to be smaller.

### **Single-Event Transients**

SETs are momentary voltage excursions at a node in an integrated circuit caused by a transient current generated by the nearby passage of a charged particle. Most SETs are harmless and do not affect device operation. However, there are several types of SETs that can cause harm or corrupt data. Transients in logic gates may be captured into storage elements if clock edges line up with the transients; therefore, operating at higher clock speeds increases the chance of a logic-gate SET propagating through a storage element and affecting subsequent component behavior. This effect is observed during heavy ion testing when the SEU cross-section appears to increase (and hence the predicted SEU rate is increased) as the device being tested is operated at increasing clock speed.

Linear regulators and DC/DC converters are prone to SETs on their regulated output. Current radiation-tolerant field-programmable gate arrays (FPGAs) require a core logic supply voltage with tight tolerance because of the small feature size of the transistor in the logic array. Keeping SETs on FPGA core power within these limits is difficult, and testing has revealed that many DC/DC converters and linear regulators are not suitable for this application.

SETs also can appear on the input of an analog-to-digital converter (ADC), resulting in corrupted data at the output of an ADC. Often SETs can just be considered another noise source and handled as such during data processing. However, if the digitized data are used as an input to fault detection and correction processing, the algorithms should not take corrective action based on only a single sample that may have been corrupted by an SET.
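One simple way to keep fault protection from reacting to a single corrupted sample is a persistence check. The sketch below is an assumed illustration of that idea, not a description of any flight algorithm; the function name, the limit, and the choice of three consecutive samples are all hypothetical.

```python
def persistent_fault(samples, limit, required_consecutive=3):
    """Return True only if `required_consecutive` samples in a row exceed `limit`.

    An isolated out-of-limit sample, such as one corrupted by an SET,
    resets nothing permanently and does not by itself declare a fault.
    """
    run = 0
    for value in samples:
        run = run + 1 if value > limit else 0
        if run >= required_consecutive:
            return True
    return False


# A lone spike is ignored; a sustained excursion is flagged.
print(persistent_fault([1.0, 1.1, 9.7, 1.0, 1.2], limit=5.0))
print(persistent_fault([1.0, 9.5, 9.6, 9.7, 1.2], limit=5.0))
```

The persistence count trades response time against immunity to transients, so it has to be chosen against how quickly a real fault must be handled.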

### **Single-Event Functional Interrupt**

An SEU or SET may not be directly observable at the pins of a device. However, at some time after an SEU or SET occurs, the device may operate in an unpredictable manner. In complicated devices such as microprocessors or flash memories, classes of SEEs that have been named single-event functional interrupts (SEFIs) have been observed. An SEFI is an SEE that places a device in an unrecoverable mode, often stopping the normal operation of the device. It is usually caused by a particle strike but can be produced by other causes. SEFIs are not usually damaging but can produce data, control, or functional-interrupt errors that require a complex recovery action that may include reset of an entire spacecraft subsystem.

For example, an SEU in the program counter register of a microprocessor may cause the sequence of instruction execution to unexpectedly jump to a different portion of code leading to incorrect program behavior. Flash memories are nonvolatile memories that include complex internal sequencing logic with an internal state to operate. The device can be commanded to erase a block, program a page, and read a page at the external pins. The execution of these commands is controlled and sequenced with an internal state machine. While qualifying a flash memory at a particle accelerator, we noticed that the flash memory was executing erasures, programs, and reads without any external stimulus. This is another example of an SEFI that can have drastic results by erasing random blocks and writing over random pages.

### **Stuck Bits**

A stuck bit is a permanent failure in which the bistable element not only has been changed but is stuck in one of its two possible states. This effect can be serious if it occurs in operational instruction memory where the given instruction will always be incorrect. Stuck bits also defeat the error detection and correction (EDAC) mitigation technique for the same word because such routines normally correct single-bit errors but only detect and do not correct double-bit errors: the stuck bit uses up the single-bit correction capability, so one additional upset in the same word cannot be corrected.

### MITIGATION OF RADIATION EFFECTS

### **Mitigation of Cumulative Effects**

Total dose effects are minimized by shielding, derating, and conservative circuit design. Radiation-hardened devices also may be used if available with suitable technical specifications. Dose-depth curves showing the ionizing dose at the range of shield depths for the spacecraft and radiation total dose testing are always necessary if parts without known total dose properties are used (Fig. 7). Flight part qualification testing is usually done to two to three times the expected mission dose to provide margin given the uncertainty in the prediction of expected dose. This conservatism is necessary because of the dynamic variability of the natural environment for which static models are used and because of the variation of the hardness levels of the individual parts in the flight lot from which only a small sample size is used in the qualification test.

### Shielding

Tantalum is commonly used for machined spot shields. Tungsten also can be used, especially when it doubles as a heat sink for a printed circuit board. Both of these high-atomic-number (commonly called "high-Z") shielding materials have approximately six times the density of aluminum; this allows thinner shields to be built, which is important for tightly packed printed circuit boards. If thick shields of these dense materials must be used in a high-radiation environment, a thin inner layer of aluminum often is applied at the integrated circuit die to reduce dose enhancement attributable to secondary electrons and photons produced in the high-Z shield.
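The thickness saving from a dense spot shield can be estimated by matching areal density (mass per unit area). This is a first-order sketch only: it assumes that equal areal density gives comparable shielding, which ignores the dependence on particle type and energy, and it uses approximate handbook densities. The function name is illustrative.

```python
# Approximate room-temperature densities in g/cm^3.
DENSITY_G_PER_CM3 = {"aluminum": 2.70, "tantalum": 16.65, "tungsten": 19.3}


def equivalent_thickness_mm(aluminum_thickness_mm, material):
    """Thickness of `material` with the same areal density as an aluminum slab."""
    areal_density = aluminum_thickness_mm * DENSITY_G_PER_CM3["aluminum"]
    return areal_density / DENSITY_G_PER_CM3[material]


for name in ("tantalum", "tungsten"):
    t = equivalent_thickness_mm(5.0, name)
    print(f"5 mm Al is matched by about {t:.2f} mm of {name}")
```

A transport calculation against the mission dose-depth curve is still needed to confirm the dose actually achieved behind the spot shield.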

Shielding incurs a small weight penalty when restricted to a few specific parts; it is very effective in reducing the impact of electron and low-energy proton dose but generally does not reduce the rate of SEEs caused by high-energy cosmic rays. In fact, thick shielding can increase the SEE rate because of the creation of multiple secondary particles attributable to interactions between the cosmic rays and the shield material.



**Figure 7.** Topography Experiment (TOPEX) mission dose–depth curve. The curve shows much information, including the effectiveness of shielding the electrons and the penetrating capability of the protons. The quasi-asymptotic total dose curve at the larger depths sets a floor of ~10,000 rad (Si) below which it is impractical to shield.

Shielding, conservative design, limited view angles, and thorough device characterization are necessary to cope with displacement damage; there also may be alternate MOS technologies that are less susceptible to displacement damage.

If mass for the mission is at a premium, then a more sophisticated ray trace analysis can be performed (for example, with the NOVICE code, which requires a detailed geometric representation of the spacecraft). Such an analysis takes into account incidental shielding from neighboring boxes and the spacecraft as well as inherent shielding inside electronics boxes, including other electronics boards and mechanical supports within the box. The box mass is usually smeared across its volume to give an average density. Most materials in these applications have a mass density similar to silicon or aluminum (2.3–2.7 g/cm<sup>3</sup>). Specific doses can be estimated at specific locations. The ray trace analysis usually produces lower dose estimates than the simplified generic geometries (e.g., sphere or slab) used in basic shielding routines such as SHIELDOSE.

### **Derating and Conservative Circuit Design**

The system/subsystem design with its operational parameter space determines the possible derating to be applied to sensitive devices and circuits. In some cases, a device that is functional but has some parameters exceeding specifications after the total dose test can be derated if the out-of-spec parameters do not affect circuit function and are not radically increasing as the dose is increased.

For example, many MOS-based operational amplifiers have extremely low input bias currents that measurement shows are sensitive to total dose. If that current is increased orders of magnitude by radiation, acceptable operation may still be possible if modest value gain configuration resistors are used to minimize the bias current drop across the feedback network. If the other advantages of using the part, such as low power consumption, high bandwidth, and fast slew rate, are less sensitive to total dose, then using the part may still be beneficial.

The power supply current of many parts also commonly increases with total dose. For example, the Actel 54SX72-SU FPGA shows <1-mA leakage current after exposure to <30 krad, but leakage will increase to 100 mA at 75 krad (Fig. 2). A design may be able to accommodate the dramatic increase in power if appropriate provisions are made.

### **Operating Conditions**

A good example of the use and control of operating conditions to mitigate the effects of radiation is a CCD. The basic equation governing the dark-current increase in one manufacturer's silicon CCDs subjected to high-energy proton damage is  $\Delta I = (9.3 \times 10^{-6}) \times V_D \times F \times \text{NIEL} \times T^2 \exp(-6628/T)$ , where  $\Delta I$  is the mean dark-signal increase in electrons per pixel per second,  $V_D$  is the depletion volume of the pixel in  $\mu\text{m}^3$ ,  $F$  is the proton fluence in cm<sup>-2</sup>, NIEL is the non-ionizing energy loss in keV•cm<sup>2</sup>/g, and  $T$  is the temperature in kelvin.

The temperature factor dominates the equation because it is the only nonlinear factor. For example, at a room temperature of 300 K, the  $T^2 \exp(-6628/T)$  factor equals  $2.29 \times 10^{-5}$ , and at 203 K ( $-70^{\circ}$ C), it equals  $2.72 \times 10^{-10}$ , a reduction of nearly five orders of magnitude. This dominance of the temperature factor is the reason that operating at cold temperature greatly reduces any radiation-induced dark-current increases.
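The temperature dependence is easy to reproduce numerically. The sketch below evaluates the  $T^2 \exp(-6628/T)$  factor and the full dark-current expression as written above; the function names are illustrative, and the arguments passed to the full expression in any real use would have to come from the device and mission in question.

```python
import math


def temperature_factor(t_kelvin):
    """Evaluate T^2 * exp(-6628 / T) with T in kelvin."""
    return t_kelvin ** 2 * math.exp(-6628.0 / t_kelvin)


def dark_signal_increase(v_dep_um3, fluence_per_cm2, niel_kev_cm2_per_g, t_kelvin):
    """Mean dark-signal increase (electrons/pixel/s) from the expression in the text."""
    return (9.3e-6 * v_dep_um3 * fluence_per_cm2
            * niel_kev_cm2_per_g * temperature_factor(t_kelvin))


warm = temperature_factor(300.0)
cold = temperature_factor(203.0)
print(f"300 K: {warm:.3e}")
print(f"203 K: {cold:.3e}")
print(f"warm/cold ratio: {warm / cold:.2e}")
```

Because every other term is linear, the warm-to-cold ratio of the dark-signal increase equals the ratio of the two temperature factors for a fixed device and fluence.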

The effective activation energy for dark-current generation centers is ~0.6 eV and is independent of temperature. The mean damage energy deposited by high-energy protons is in the range of hundreds of kiloelectronvolts for these silicon devices, much larger than any energy attributable to lattice vibration or valence/conduction states, which are governed by temperature. The damage energy is instead attributable to the kinematics (elastic) and dynamics (inelastic) of the proton–silicon nucleus collision. Annealing of bulk damage in these silicon devices only occurs at temperatures well above 100°C and thus is usually not a factor in the testing or operating of the spacecraft CCDs.

Robbins<sup>1</sup> concludes that "A general expression for the mean bulk dark signal has been obtained as a function of NIEL, fluence, depletion volume and temperature and appears to be appropriate for neutron as well as proton irradiated devices, independent of silicon resistivity."

Sensitive measurements at APL have shown the advantage of low-temperature operation for bulk dark current and CTE.

### **Mitigation of SEEs**

Although using parts that are insensitive to SEE is obviously preferable, in many cases, mitigation measures can be taken to overcome the impact of a part's SEE sensitivity. Mitigation is particularly attractive for devices with increased capability (e.g., storage density, performance, or speed-power product) that are sensitive to SEE. Often these sensitive parts are several orders of magnitude more capable than radiation-tolerant equivalent devices, so successful mitigation of SEE can result in substantial system performance improvement and may even be mission enabling.

### **Latch-Up Protection Circuits**

The goal of latch-up mitigation is to allow proper system operation after a latch event. There are several questions that the circuit designer should be able to answer when developing an approach to latch-up mitigation of a specific device: (i) How often during the mission will the event occur? (ii) What is the impact on the system if a device latches? (iii) Can features of the system be used in conjunction with some additional circuitry to work around a latch? (iv) What are the detailed characteristics of the device?

If a latch-up is possible but unlikely during a mission, then mitigation using redundancy already designed into the system may be appropriate. If the latch-up is likely to occur frequently in a mission-critical circuit, then mitigation should include full protection against device damage, automatic recovery from latch-up, and resumption of normal system operation (Fig. 8). Solutions are frequently developed that fall in between these extremes.

Consider a mitigation scheme for a CMOS low-speed, low-power ADC used to acquire engineering data once per second. This type of data is important to monitor system operation, yet any single data point is not vital, and occasional missing samples can be tolerated. A typical ADC for this application uses <5 mA during normal operation and can draw >100 mA when latched (before its destruction as a result of excessive current).



**Figure 8.** Latch-up protection circuit. The telemetry device susceptible to latch-up is called the protected device in the diagram. The current sense, comparator, and control logic detect any overcurrent and remove the applied voltage. The crowbar is enabled after overcurrent detection to shunt any charge to ground that remains on the protected supply line.

Powering the ADC once per second to acquire data and then removing power can mitigate latch-up. If the ADC should latch while powered, then simple current-limiting resistors on its input and power-supply pins will prevent damage until the sampling interval is completed. Finally, power is removed from the device, which clears the latch.

Latch-up in a higher-power-consumption device requires more circuitry for mitigation. Simple resistor protection is not practical because a resistance value that does not cause excessive supply drop during normal operation also will permit damaging currents to flow during latch-up. Typically, a more complex mitigation circuit is developed that includes a switch in series with the device power supply along with circuitry to sense the excessive current caused by a latch-up. Frequently, an additional parallel crowbar switch also is needed to shunt current from the bypass capacitance to ground, rather than through the protected device, while the series switch is being turned off. Finally, the components in the mitigation circuit also must be tolerant of the mission radiation environment, and testing must be performed to verify that the susceptible device is protected when a radiation-induced latch-up occurs.
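The sequencing of the series switch and crowbar can be sketched as a small state machine. This is a behavioral illustration in Python only; the class name, trip threshold, and tick counts are hypothetical and do not describe any particular flight circuit.

```python
from enum import Enum


class State(Enum):
    POWERED = "powered"    # series switch closed, crowbar open
    CROWBAR = "crowbar"    # series switch open, crowbar shunting charge
    OFF = "off"            # both open, waiting for the latch to clear


class LatchupProtector:
    def __init__(self, trip_amps, crowbar_ticks=2, off_ticks=5):
        self.trip_amps = trip_amps
        self.crowbar_ticks = crowbar_ticks
        self.off_ticks = off_ticks
        self.state = State.POWERED
        self.timer = 0
        self.trips = 0

    @property
    def series_switch_closed(self):
        return self.state is State.POWERED

    @property
    def crowbar_closed(self):
        return self.state is State.CROWBAR

    def step(self, sensed_amps):
        """Advance one control tick and return the new state."""
        if self.state is State.POWERED:
            if sensed_amps > self.trip_amps:      # comparator trips
                self.trips += 1
                self.state = State.CROWBAR
                self.timer = self.crowbar_ticks
        elif self.state is State.CROWBAR:
            self.timer -= 1
            if self.timer == 0:
                self.state = State.OFF
                self.timer = self.off_ticks
        else:  # State.OFF
            self.timer -= 1
            if self.timer == 0:
                self.state = State.POWERED        # automatic recovery
        return self.state


protector = LatchupProtector(trip_amps=1.0)
for amps in [0.3, 0.3, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3]:
    print(f"{amps:4.1f} A -> {protector.step(amps).value}")
```

A real design would also have to choose the off time long enough for the latched structure to drop below its holding current, which this sketch treats as a fixed tick count.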

As an example, latch-up protection is being flown for the AD5326 12-bit Quad-DAC on the Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) spectral imager now in operation at Mars. Five latch-ups have occurred in ~18 months of the mission to date. Testing at Brookhaven National Laboratory in May 2004 led to a prediction of one latch-up every 70 days or seven to eight latch-ups after 18 months. The conservatism in the prediction is normal for our SEE investigations.
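The comparison between predicted and observed counts can be checked with a short calculation. The sketch below assumes that latch-ups arrive as a Poisson process and that a month averages 30.44 days; both are assumptions made for illustration, not statements about how the prediction was produced.

```python
import math


def expected_events(days, mean_days_between_events):
    """Expected number of events over an interval at a constant rate."""
    return days / mean_days_between_events


def poisson_cdf(k, mean):
    """P(N <= k) for a Poisson-distributed count with the given mean."""
    return sum(math.exp(-mean) * mean**n / math.factorial(n)
               for n in range(k + 1))


mission_days = 18 * 30.44          # about 18 months (assumed month length)
mean = expected_events(mission_days, 70.0)
print(f"predicted latch-ups: {mean:.1f}")
print(f"P(5 or fewer | prediction): {poisson_cdf(5, mean):.2f}")
```

Under these assumptions, observing five or fewer events when nearly eight are predicted has roughly a one-in-five chance, so the flight data are compatible with a modestly conservative prediction.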

### **SEU/SET Mitigation**

As with SEL, parts that are immune to SEU/SET are easiest to use. However, vastly improved performance of SEU/SET-sensitive parts may mean that, in many cases, far more capable systems can be designed by using sensitive parts with appropriate mitigation circuits than if a design is restricted to using only SEU/SET-immune components.

SEUs in arrays of memory elements can often be mitigated by using some redundant cells and one of a large variety of error-detecting and -correcting codes. For example, the well-known Hamming code can use a modest number of extra bits (for example, 21 bits to store 16 bits of data) to detect all possible single- and double-bit errors. In addition, the code adds enough extra information so that any possible single-bit error can be corrected.
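A minimal Python sketch of a 21-bit Hamming code for 16 data bits follows, as an illustration only. The bit ordering, function names, and list layout are assumptions rather than any flight implementation. With 21 bits, the syndrome can either locate and correct a single flipped bit or be used purely to detect up to two flipped bits. Doing both at once is commonly handled by adding one overall parity bit, which is not shown here.

```python
PARITY_POS = (1, 2, 4, 8, 16)
DATA_POS = [p for p in range(1, 22) if p not in PARITY_POS]  # 16 positions


def syndrome(bits):
    """XOR of the 1-based positions of all set bits (0 means consistent)."""
    s = 0
    for pos in range(1, 22):
        if bits[pos]:
            s ^= pos
    return s


def encode(word):
    """Encode a 16-bit integer into a list of 22 entries (index 0 unused)."""
    bits = [0] * 22
    for i, pos in enumerate(DATA_POS):
        bits[pos] = (word >> i) & 1
    s = syndrome(bits)
    for p in PARITY_POS:
        bits[p] = 1 if s & p else 0   # parity bits cancel the data syndrome
    return bits


def decode(bits):
    """Return (word, syndrome); a nonzero syndrome is the corrected position."""
    bits = list(bits)
    s = syndrome(bits)
    if 1 <= s <= 21:
        bits[s] ^= 1                  # single-error correction
    word = sum(bits[pos] << i for i, pos in enumerate(DATA_POS))
    return word, s


stored = encode(0xA5C3)
stored[13] ^= 1                       # simulate an SEU in one memory cell
word, position = decode(stored)
print(hex(word), position)
```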

Sequential logic, such as finite-state machines and counters, also contains memory elements that may be susceptible to SEUs. These memory elements continuously drive logic, and an upset can easily propagate widely through a circuit. Mitigation through coding is conceptually still feasible but is seldom used because encoding/decoding and signal routing overhead is substantial. More commonly, logic memory elements such as flip-flops are triplicated, and a voting circuit is used to continuously detect and correct any SEU. Recent families of radiation-tolerant Actel FPGAs, widely used in space electronics, implement this mitigation within the core logic array so that the user does not need to explicitly include redundant logic. Design techniques such as using only fully decoded finite-state machines to ensure that an SEU does not cause a transition to an illegal state and extra logic to detect SEU-caused illegal values also may be appropriate. One FPGA vendor, Xilinx, has developed a software tool to automatically triplicate a design's logic. That mitigation can detect many SETs as well as SEUs, although with an obvious logic density penalty.
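The triplicate-and-vote idea can be sketched in a few lines of Python. This is a behavioral illustration only: the function names and the 4-bit counter are hypothetical, and real implementations vote in hardware on every clock.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over three integer register copies."""
    return (a & b) | (a & c) | (b & c)


def tmr_step(copies, next_state):
    """Vote the three copies, then load next_state(voted) into every copy.

    Returns (new_copies, mismatch), where mismatch reports whether any
    copy disagreed with the voted value before the update.
    """
    voted = majority(*copies)
    mismatch = any(c != voted for c in copies)
    nxt = next_state(voted)
    return [nxt, nxt, nxt], mismatch


def counter(state):
    """Hypothetical 4-bit counter used as the protected logic."""
    return (state + 1) & 0xF


copies = [5, 5, 5]
copies[1] ^= 0b0100                 # simulate an SEU in one flip-flop copy
copies, mismatch = tmr_step(copies, counter)
print(copies, mismatch)
```

Because every copy is reloaded from the voted value, an upset in one copy is flushed on the next clock instead of accumulating.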

Redundancy and voting techniques also can be used to mitigate SEUs in noncustomizable integrated circuits such as microprocessors. For example, multiple microprocessors can be run in lockstep, with all outputs compared and voted to ensure that only proper values are used. Resynchronization of a processor affected by an upset is complex and is an area of active research in industry. Processor boards with very high throughput and modest power consumption compared to designs using fully SEU-immune components have been developed based on these principles.

Simpler techniques also can improve a design's tolerance to SEUs. Some examples include watchdog timer, state verification, and redundant calculations. A watchdog timer is a continuously operating counter that never overflows in normal operation because it is periodically restarted. If an SEU delays the restart, then an overflow of the counter can be used to reset the system. Control registers can be scanned both before and after an event to verify that they contain the expected values. If an SEU affected the value in a register, then the operation controlled by the register can be flagged as questionable. Redundancy in time rather than circuitry also can be used to process data multiple times to detect a processing error.
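The watchdog behavior described above can be sketched as a small Python class. The class name, tick limit, and restart interval are hypothetical and chosen only to show the mechanism.

```python
class WatchdogTimer:
    """Counter that forces a reset if it is not restarted in time."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self.resets = 0

    def kick(self):
        """Periodic restart issued by healthy software or logic."""
        self.count = 0

    def tick(self):
        """Advance one clock tick; return True if overflow forced a reset."""
        self.count += 1
        if self.count >= self.limit:
            self.resets += 1
            self.count = 0
            return True
        return False


wd = WatchdogTimer(limit=4)

# Normal operation: the counter is restarted every third tick.
for t in range(12):
    if t % 3 == 0:
        wd.kick()
    wd.tick()
print("resets during normal operation:", wd.resets)

# An upset stalls the restart; the counter overflows and resets the system.
wd.kick()
fired = [wd.tick() for _ in range(4)]
print("reset fired on tick:", fired.index(True) + 1)
```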

The impact of SETs is more difficult to assess. Most parts tested to date are less sensitive to SETs than to other SEE phenomena. In addition, digital logic circuits tend to be insensitive to an SET because the transient is short compared to the system clock period. Usually, an SET will propagate in an observable way through a circuit only if a transient on a gate output happens to be sampled while it is active. As a result, SET sensitivity in logic circuits is related to operating frequency as well as to the incident radiation. Most digital circuits for space systems designed to date operate at relatively modest speed, and hence SET tolerance has not yet been a major design driver.
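The frequency dependence can be illustrated with a first-order capture model in Python. The model is an assumption made for this sketch: a transient arriving at a random time is taken to be captured if it overlaps the sampling window of a register, so the probability scales as the pulse width (plus any setup-and-hold window) divided by the clock period. The 500-ps pulse width is a hypothetical value.

```python
def set_capture_probability(pulse_width_s, clock_hz, window_s=0.0):
    """First-order chance that a randomly timed transient is latched.

    Assumes the transient reaches a register input and arrives at a
    uniformly random phase of the clock.
    """
    return min(1.0, (pulse_width_s + window_s) * clock_hz)


for clock_hz in (10e6, 100e6, 1e9):
    p = set_capture_probability(500e-12, clock_hz)
    print(f"{clock_hz / 1e6:6.0f} MHz -> capture probability {p:.3f}")
```

The estimate ignores logical and electrical masking along the path, so it is best read as an upper bound per transient that reaches a register.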

SETs also can affect analog circuits. A recent area of concern has been SETs on the output of power conditioning integrated circuits for the core voltage supply of recent-generation FPGAs. An absolute maximum supply of 1.5 $\pm$ 0.150 V is required for several vendors' highest-density FPGAs. Data have been reported for several potential regulators for this application that show SETs that exceed this requirement. Testing of other power conditioning integrated circuits has induced some organizations to use larger and less efficient discrete regulator designs.

### **Power Devices**

SEB and SEGR cause high currents in a variety of semiconductor power devices. For power-MOSFET applications, current can be limited by putting a resistor in series with the drain to reduce the current to a value low enough to prevent damage. Susceptibility to SEB depends on the magnitude of the drain-source voltage. In many cases, operating below 50% of rated breakdown voltage is sufficient to prevent burnout.
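A derating rule of this kind is easy to apply as a design check. The Python sketch below uses the 50% figure from the text as a default; the part names and voltages are hypothetical, and the appropriate fraction for a given device should come from test data.

```python
def seb_derated(v_ds_applied, v_ds_rated, max_fraction=0.5):
    """True if the applied drain-source voltage is below the derated limit."""
    return abs(v_ds_applied) < max_fraction * v_ds_rated


# Hypothetical parts list: (applied V_DS, rated breakdown V_DS) in volts.
parts = {"Q1": (28.0, 100.0), "Q2": (120.0, 200.0)}
for name, (applied, rated) in parts.items():
    status = "ok" if seb_derated(applied, rated) else "exceeds derating"
    print(f"{name}: {applied:.0f} V of {rated:.0f} V rated -> {status}")
```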

For SEGR, the mitigation technique is to define a safe operating range by defining a relationship between the gate-source voltage and the drain-source voltage that includes critical device parameters such as the oxide thickness. This safe operating range should be experimentally verified.

Most study of gate rupture has been done with power MOSFETs that have thick oxides and large dimensions. In contrast, submicrometer structures such as high-density memory cells use very thin oxides and potentially also may be susceptible to gate rupture. Early predictions based on extending power MOSFET measurements to high-density logic and memory devices were that the higher fields associated with smaller dimensions would make gate rupture sensitivity worse as oxide thickness scaled down. Fortunately, measurements on modern parts show that SEGR sensitivity has decreased instead. Improved oxide purity required to build thinner layers with high yield has essentially increased gate breakdown voltage so that SEGR is not yet a major issue for small-feature-size MOS technology.

### RADIATION-HARDENED BY DESIGN DEVICES

### **Radiation Response of Modern Commercial CMOS Transistors**

The commercial electronic market is largely CMOS in nature. As market forces drive manufacturers toward CMOS processes with ever-decreasing size and area, the density of the complex electronic components is increased. However, the push for greater density and speed also has resulted in thinner gate oxides of greater purity than previous generations. This change results in smaller volumes with lower defect densities that collect less charge. A direct consequence is a negligible threshold voltage shift in the CMOS transistor in processes with gate lengths $\leq 0.5~\mu m$. Although the issue of threshold voltage shift is not significant, there are still significant leakage currents that adversely affect the operation of both digital and analog circuits.

Two parasitic transistors are formed in parallel with the channel of a modern MOS transistor. These parasitic devices are formed at the edge of the transistor channel where the gate region extends beyond the channel boundary. Doping implants during device manufacturing and oxide layer thickness cause the parasitic device to have threshold voltages that exceed the maximum voltage rating of the process. For example, in a 0.5-$\mu$m CMOS process, the threshold of the parasitic device is >15 V, and the maximum operating voltage of the process will be 5 V. Prior to any total dose, the parasitic devices are strongly cut off and do not affect normal transistor operation.

However, the parasitic device thick oxide provides a large volume for charge collection. The thick oxide also has more defects and greater potential for charge-trap sites. These two conditions result in an oxide that is an excellent collector for stray holes and, thus, not very radiation tolerant. Parasitic devices frequently experience significant threshold voltage shift after moderate radiation exposure. For *n*-type MOS devices, the threshold may easily be reduced to a level that falls within the normal operating voltage of the device. This threshold shift can be large enough to result in an offset in the current under all bias conditions. *p*-type MOS devices also experience a large threshold voltage shift. However, the accumulation of holes causes the threshold of the *p*-type MOS devices to become more negative. Therefore, the magnitude of the threshold actually increases, and the parasitic devices along a *p*-type MOS channel do not contribute excess leakage.

### **Enclosed Drain Devices**

Because the dominant cause of radiation-induced leakage is at the parasitic transistors at the edge of the device, minimizing these devices would result in a more total-dose-tolerant transistor. Minimization can be accomplished by turning the transistor channel back around itself, resulting in an annular device structure where the gate and the edge of the channel enclose the drain, thus eliminating the edge leakage path. This method, referred to as a reentrant drain topology, has very good radiation performance. Figure 9 plots data from a normally structured and an annular device manufactured on the same chip. It can be easily seen that the annular device shows little change at 100 krad of total dose exposure.

Although the radiation response can be improved by use of the annular device layout style, the method does come with some penalties. The most obvious penalty for using the reentrant design is area. The minimum size of the reentrant device is two to three times the area of the minimum-size transistor. The annular device also has much larger gate capacitance and greater drive strength, resulting in increased power consumption for some designs.

**Figure 9.** Drain current as a function of gate-source voltage for straight and annular *n*-channel MOSFETs. By enclosing the drain, we can eliminate the increase in leakage current attributable to the parasitic edge transistor.

The decision to use an annular geometry device presents a number of modeling issues for the transistor-level designer. Every MOS transistor model in standard circuit simulators assumes that the transistor channel will be rectangular with some defined width and length. With the reentrant design, the width along the inside of the gate is smaller than the width along the outside of the gate. Furthermore, the width differential changes with the choice of channel length. For a given drawn channel length, an effective gate width must be determined. There have been a number of solutions proposed to this problem. A common estimate is to take the average of the internal and external perimeters. Other solutions use conformal mapping or break the gate into regions of constant electric field to arrive at more complex solutions.
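To show how the estimates can differ, the Python sketch below compares the average-of-perimeters rule with the result for purely radial current flow in an idealized circular annulus, for which the effective W/L is 2π/ln(r_outer/r_inner). The circular shape and the dimensions are assumptions made for illustration; practical layouts are often square or octagonal and need the more detailed treatments mentioned above.

```python
import math


def w_over_l_mid_perimeter(r_inner, r_outer):
    """W/L using the average of the inner and outer gate perimeters."""
    width = 0.5 * (2 * math.pi * r_inner + 2 * math.pi * r_outer)
    length = r_outer - r_inner
    return width / length


def w_over_l_radial(r_inner, r_outer):
    """W/L for uniform radial current flow in a circular annular gate."""
    return 2 * math.pi / math.log(r_outer / r_inner)


# Hypothetical gate radii in micrometers.
for r_in, r_out in ((1.0, 1.5), (0.5, 2.0)):
    mid = w_over_l_mid_perimeter(r_in, r_out)
    rad = w_over_l_radial(r_in, r_out)
    print(f"r_in={r_in}, r_out={r_out}: mid-perimeter {mid:.2f}, radial {rad:.2f}")
```

For this geometry the perimeter average always gives the larger W/L, and the gap widens as the channel length becomes comparable to the inner radius.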

The asymmetric design of the annular device gives the designer a choice of designating the inner node as the drain or the source of the transistor. Choosing the center node as the drain is an excellent choice for digital circuits because the smaller area means less load capacitance needs to be switched. If the designer chooses the outer region as the drain, the transistor will have a smaller output conductance because current density is lower, making it a good choice for analog applications.

We have developed techniques at APL to model both DC and transient effects of reentrant devices. Transistors with a few common geometries were characterized, and then model parameters were determined to fit the measured data. However, the gate area computed in this model is less than the actual area, so the model underestimates the actual gate capacitance. The reduced capacitance can adversely affect simulation of the AC and transient performance of the circuit. To compensate, we use a macro model that adds explicit extra gate capacitance for circuits in which the capacitance estimates are critical.

### **Latch-Up in CMOS**

There are several approaches to mask level design that can prevent latch-up. For latch-up to occur, the power supply voltage must exceed the holding voltage of the parasitic SCR maintaining a latch, and the loop gain within the parasitic SCR must exceed unity. Although lowering the power supply below the holding voltage would remove the possibility of latch-up occurring, the holding voltage for most CMOS technologies is well below the power supply. Therefore, the solution requires limiting the gain of the stray bipolar devices forming the SCR to prevent latch-up from starting.
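The two conditions can be written as a simple predicate. In the Python sketch below, the loop gain of the parasitic SCR is approximated by the product of the NPN and PNP current gains, a common first-order simplification that ignores the well and substrate shunt resistances. All numerical values are hypothetical.

```python
def latchup_can_sustain(v_supply, v_hold, beta_npn, beta_pnp):
    """First-order test of the two conditions needed to hold a latch."""
    supply_above_holding = v_supply > v_hold
    loop_gain_above_unity = beta_npn * beta_pnp > 1.0
    return supply_above_holding and loop_gain_above_unity


# Hypothetical 5 V process with a 1.2 V holding voltage.
print(latchup_can_sustain(5.0, 1.2, beta_npn=50.0, beta_pnp=0.5))   # susceptible
print(latchup_can_sustain(5.0, 1.2, beta_npn=1.5, beta_pnp=0.3))    # gain reduced
print(latchup_can_sustain(1.0, 1.2, beta_npn=50.0, beta_pnp=0.5))   # supply below holding
```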

Two design methods are possible: (i) reducing the current gain of the individual parasitic BJTs or (ii) decoupling the parasitic BJTs from each other. The current gain of a BJT is related to the amount of base current needed to maintain a given collector current. Increasing the path length for carriers across the base increases the base resistance and also the recombination of minority carriers, thus reducing the gain of a bipolar device. Increasing path length is easily accomplished by increasing the spacing between the drain/source regions of the nFETs and the n-well regions that hold the pFETs. By increasing the spacing, the distance between the base-collector junction of the PNP and the base-emitter junction of the NPN also is lengthened. Although this design approach does reduce the potential for latch-up, circuit area is increased.

Second, decoupling the parasitic NPN and PNP transistors will reduce latch-up sensitivity. Current flow in the parasitic devices follows the path of lowest resistance. To decouple the collector of the PNP from the base of the NPN, carriers moving toward the base region can be collected or deflected before reaching that area. Interposing well and substrate contacts along the path of current flow can accomplish this goal. The placement of the contacts provides a low-resistance path and effectively shunts any stray current away from the parasitic SCR structure, preventing its activation. In our work at APL, we have adopted a very conservative design style whereby areas containing any type of transistor are completely enclosed with a continuous band of well/substrate contacts. This guard-banding methodology has proven highly effective in eliminating latch-up in our custom application-specific integrated circuits (ASICs). Furthermore, this conservative layout style also helps to minimize noise coupling between digital and analog portions of mixed-signal designs by minimizing variations in the local well/substrate voltage. Using these radiation-hardening design methods, we have designed ASICs that have been fabricated on commercial fabrication lines with >300-krad TID tolerance and no latch-up sensitivity.

### **SEU/SET-Immune Circuits**

As with FPGA or other digital system designs, coding and voting methodologies are a viable technique for SEU/SET mitigation in a custom ASIC. However, because the designer is no longer limited to logical function blocks (i.e., gates and flip-flops), it is possible to formulate new techniques by developing functional blocks that are inherently SEU-tolerant.

The latch is a common candidate for an SEU-hardened-by-design block. The conventional latch stores data as complementary signals on two internal nodes. The data are maintained by using positive feedback via an inverter to provide a stable configuration. If a particle strike occurs, one of these signals may be altered, forcing the cell into an unstable arrangement. The internal gain and feedback of the cell force the two nodes back into a stable configuration, but the resultant value may not be the original data because the initial value was lost. If the data also were stored on additional nodes, then the cell would have enough information to restore the previous value. A variety of latch designs have been developed that make use of the principle of multiple internal storage nodes. Because of the nature of the latch circuit, at least two storage nodes are added (four total), so the cell can be configured into 16 possible states, with only two of these states being stable. If the cell ever enters one of the invalid states, the SEU-immune circuit will force it back to the predisturbed state.

A popular technique is the dual-interlocked cell (DICE) topology. The DICE has four tri-state inverting stages connected in a loop, resulting in four internal nodes that store the data. Each stage of the latch has two inputs that must be in agreement for the valid output to be generated. If the input signals differ, the cell enters a high impedance state. The control signals come from the previous and following stages and are identical in normal operating mode. The output of each stage drives the input of the subsequent and previous stages, which form the interlocked feedback paths within the DICE. If one node is altered by an ion strike, the output of that stage enters a high impedance state, preventing a false signal from propagating as the second feedback path forces restoration of the disturbed node.
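The restoring behavior can be illustrated with a behavioral Python model that follows the description above rather than the transistor-level circuit. Each stage is treated as an inverter that drives its node only when its two neighboring inputs agree and otherwise holds the existing value. The function names and the synchronous update scheme are assumptions of this sketch.

```python
def dice_step(nodes):
    """One update of a four-node behavioral DICE model."""
    n = len(nodes)
    new = []
    for i in range(n):
        prev_in = nodes[(i - 1) % n]
        next_in = nodes[(i + 1) % n]
        if prev_in == next_in:
            new.append(1 - prev_in)   # inputs agree: drive the inverted value
        else:
            new.append(nodes[i])      # inputs differ: high impedance, hold
    return new


def settle(nodes, max_steps=8):
    """Iterate the model until the node values stop changing."""
    nodes = list(nodes)
    for _ in range(max_steps):
        nxt = dice_step(nodes)
        if nxt == nodes:
            break
        nodes = nxt
    return nodes


def recovers_from_single_upset(stored):
    """Flip each node in turn and check that the stored pattern returns."""
    for i in range(len(stored)):
        hit = list(stored)
        hit[i] ^= 1
        if settle(hit) != list(stored):
            return False
    return True


print(recovers_from_single_upset([0, 1, 0, 1]))
print(recovers_from_single_upset([1, 0, 1, 0]))
print(settle([1, 0, 0, 1]))   # two adjacent nodes of [0, 1, 0, 1] flipped
```

In this simplified model every single-node flip is restored, whereas flipping two adjacent nodes leaves all four stages holding, so the original value does not return. The model does not capture analog effects such as contention or charge decay on floating nodes.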

Dual-rail signal encoding can be extended to nonsequential logic to provide SET immunity. SETs occur on a short time scale and have not been a major problem in older designs. However, as clock periods and signal edge times approach the time scale of an SET, an SET becomes more likely to propagate and affect proper operation of subsequent logic.

The above techniques assume that any disturbances induced by a particle strike will affect only a single node in the circuit. This assumption is based on the low probability of a particle striking multiple nodes within the small area that a typical logic gate or latch occupies. However, as transistor sizes shrink and circuitry density increases, the area influenced by an ion strike is becoming comparable to the area of the gate itself. As a result, a single-particle strike can disturb multiple nodes within a single gate simultaneously, making the redundancy described above ineffective for SEU/SET mitigation. Researchers are starting to see such effects in fine-feature-size designs. In the near future, radiation-hardened by design (RHBD) techniques will have to include new methods to circumvent this problem with new circuit topologies or will have to use more creative physical layouts of transistors to spatially separate critical nodes within a logic cell.

### CONCLUSIONS

For electronics designed and fabricated for harsh radiation environments, the cumulative ionization and displacement damage effects that cause gradual degradation in electronic devices are mitigated by a combination of shielding, derating, and controlling operating conditions. Radiation-hardened devices can be used if available.

For SEEs, shielding is only minimally effective. Mitigation is achieved by a combination of EDAC, anomaly detection and reboot, and redundancy. The latter often implements voting techniques.

APL RHBD hardware efforts are critically important because there are only a limited number of devices with the requisite radiation hardness in today's commercial market. Functions specific for space application can be integrated onto a single chip or ASIC for performance, mass, and power optimization. This integration is important, in general, because radiation mitigation reduces performance parameters such as speed when implemented for commercial devices.

**ACKNOWLEDGMENTS:** We acknowledge the support of many APL Space Department spacecraft and space instrument programs and independent research and development projects. We also acknowledge many discussions with colleagues about these sometimes vexing issues in the qualification of flight hardware.

### REFERENCES

<sup>1</sup>Robbins, M. S., "High-Energy Proton-Induced Dark Signal in Silicon Charge Coupled Devices," *IEEE Trans. Nucl. Sci.* **47**, 2473–2479 (2000).

<sup>2</sup>Wang, J. J., *RTSX72SU-D1N8A1 TID Test Report*, Actel Corporation, Mountain View, CA, No. 05T-RTSX72SU-D1N8A1, http://www.actel.com/documents/RTSX72SU-D1N8A1-r1.pdf (21 Sept 2005).

## The Authors

**Richard H. Maurer** is a Principal Professional Staff Physicist in APL's Space Department. He has been the radiation environment and effects engineer on many APL spacecraft missions since 1981, including Active Magnetospheric Particle Tracer Explorer (AMPTE); Geodetic Earth Orbiting Satellite (GEOSAT); Midcourse Space Experiment (MSX); Near Earth Asteroid Rendezvous (NEAR); Mercury Surface, Space Environment, Geochemistry and Ranging (MESSENGER); and presently Radiation Belt Storm Probes (RBSP). Dr. Maurer's expertise is in the radiation environment, detection, and total dose effects.

**Martin E. Fraeman** is a member of the Principal Professional Staff in the Space Electronics Group at APL. He has worked on a wide range of assignments since joining APL in 1981, including the Space Shuttle-based Hopkins Ultraviolet Telescope, the Ion and Neutral Camera (INC) sensor on the Magnetospheric Imaging Instrument (MIMI) on Cassini, MESSENGER's x-ray solar monitor, and the Mini-RF Synthetic Aperture Radar (SAR). Mr. Fraeman also has led numerous independent research and development projects at APL that have resulted in RHBD techniques, including the development of a language-directed 32-bit microprocessor.

**Mark N. Martin** is a member of the Principal Professional Staff in APL's Space Department. He received a Ph.D. in electrical engineering from The Johns Hopkins University in 2000. Since 1999, he has worked on various RHBD ASICs, such as the Quad-DAC and Power Remote I/O (PRIO). Additionally, Dr. Martin holds an appointment as an assistant research professor in the Electrical and Computer Engineering Department of The Johns Hopkins University and teaches digital and analog integrated circuit design in the university's Engineering Programs for Professionals.

**David R. Roth** is a Principal Professional Staff Physicist in APL's Space Department. He has been responsible for the SEE testing and predictions on several recent APL spacecraft and instrument missions, including MESSENGER, New Horizons, and CRISM. Presently, he is the radiation effects engineer on the Europa mission study. Dr. Roth's expertise is in developing the hardware and software for sophisticated SEE testing executed at off-site particle-accelerator facilities.

For further information on the work reported here, contact Richard Maurer. His e-mail address is richard.maurer@jhuapl.edu.