

### Memories and NASA Spacecraft: A Description of Memories, Radiation Failure Modes, and System Design Considerations

Kenneth A. LaBel

Co-Manager,

NASA Electronic Parts and Packaging (NEPP) Program

NASA/GSFC

ken.label@nasa.gov

301-286-9936

http://nepp.nasa.gov

Ray Ladbury - NASA/GSFC

Timothy Oldham, Dell/PSGS – NASA/GSFC

## Abstract



As NASA has evolved its usage of spaceflight computing, memory applications have followed as well. In this talk, we will discuss the history of NASA's memories from magnetic core and tape recorders to current semiconductor approaches. We will briefly describe current functional memory usage in NASA space systems followed by a description of potential radiation-induced failure modes along with considerations for reliable system design.



## **Outline of Presentation**

- Introduction: The Space Memory Story
  - A look at how we got here
- General Applications of Memories in Space Systems
- Requirements and Desirements
- Example: SDRAMs
  - Radiation Failure Modes: Single Events
  - Design Approaches
- Reliability Considerations
- Summary





- There once was a fledgling memory used for space
  - It started out as core memory (60's-70's)
  - Grew into magnetic tape (70's-80's)
  - And has settled into "silicon" solid state recorders or SSRs (90's and beyond)
    - While this is true for mass data storage, silicon has been used since the 70's for some memory applications such as computer programs and data buffers
    - Both volatile and non-volatile memories (NVMs) are used



Apollo Guidance Computer - 4 kB of Magnetic core r/w memory



P87-2 circa 1990 - 1<sup>st</sup> known spaceflight SSR



# Sample Single Event Upset (SEU) hiccups along the way

- An original space SEU detector: 93L422 bit errors in space
  - TDRS-1 anomalies, for example
    - "Solved" by use of error detection and correction (EDAC) codes
    - Used as "gold standard" on multiple flight experiments (CRRES and MPTB)
- Single event functional interrupts (SEFIs)
  - Device has a functional anomaly
- Stuck bits
- Multi-bit/multi-cell upsets
- Block errors
- Small probability events
  - Proton ground test of 3 samples
  - Flight SSR had > 1000
  - Anomaly in-flight traced to low-probability event





## Categories of Memory Usage for Space

- Computer program storage
  - Boot, application, safehold
  - Often a mix of volatile and non-volatile memories
    - Store in NVM, download on boot to RAM, run out of RAM
      - Size, Weight, and Power (SWaP): RAM is faster than NVM
- Temporary data buffers
  - Accommodates burst operations
- Data Storage such as SSR
  - E.g. mass storage area for science or spacecraft telemetry
  - Usually write once an orbit, read once an orbit
  - Trend to want to use NVM for SSR
- Configuration storage for volatile Field Programmable Gate Arrays (FPGAs)
  - Becoming a *bigger* problem as FPGAs increase their needs



## **The Volatile Memory for Space**

- Rad Hard Offerings are limited to SRAM
  - 16 Mb maximum single die
  - These tend to be medium speed and relatively high power when compared to commercial equivalents
    - For comparison, first SSR in 1990 used 256 Mb commercial die
  - Still used extensively in rad hard computer offerings, but many designs have transitioned to DRAM options
- Mid-1990's = transition in SSRs from commercial SRAM to DRAM
  - SDRAMs are in flight, and many current designs have begun to use double data rate (DDR) and DDR2



## The SDRAM Quandary

- Many space designs are baselining/using DDR and DDR2 interfaces for hardware builds
  - Problem: <u>DDR3 expected to dominate commercial product starting in 2010!</u>
- Do we support current system designs or product development timelines?



Will DDR2 be obsolete by system readiness dates?



## **DDR Performance Metrics**

|                    | DDR                                            | DDR2                                         | DDR3                                           |
|--------------------|------------------------------------------------|----------------------------------------------|------------------------------------------------|
| Data Rate          | 200-400Mbps                                    | 400-800Mbps                                  | 800-1600Mbps                                   |
| Interface          | SSTL_2                                         | SSTL_18                                      | SSTL_15                                        |
| Source Sync        | Bidirectional<br>DQS<br>(Single ended default) | Bidirectional<br>DQS<br>(Single/Diff Option) | Bidirectional<br>DQS<br>(Differential default) |
| Burst Length       | BL= 2, 4, 8<br>(2bit prefetch)                 | BL= 4, 8<br>(4bit prefetch)                  | BL= 4, 8<br>(8bit prefetch)                    |
| CL/tRCD/tRP        | 15ns each                                      | 15ns each                                    | 12ns each                                      |
| Reset              | No                                             | No                                           | Yes                                            |
| ODT                | No                                             | Yes                                          | Yes                                            |
| Driver Calibration | No                                             | Off-Chip                                     | On-Chip with ZQ pin                            |
| Leveling           | No                                             | No                                           | Yes                                            |
|                    | 1.5V                                           | 1.25V                                        | 1.0V                                           |
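The data rate and prefetch rows can be read together. As a rough sketch, assuming that the per-pin data rate is approximately the memory-array core clock multiplied by the prefetch depth (a common simplification that is not stated on the slide), all three generations imply a similar core clock range. The function name and dictionary layout below are illustrative only; the numbers are copied from the table.

```python
# (prefetch depth in bits, (min, max) per-pin data rate in Mbps), from the table above.
GENERATIONS = {
    "DDR": (2, (200, 400)),
    "DDR2": (4, (400, 800)),
    "DDR3": (8, (800, 1600)),
}


def implied_core_clock_mhz(prefetch, data_rate_mbps):
    # ASSUMPTION: per-pin data rate = core clock * prefetch depth.
    return data_rate_mbps / prefetch


core_ranges = {
    name: tuple(implied_core_clock_mhz(prefetch, rate) for rate in rates)
    for name, (prefetch, rates) in GENERATIONS.items()
}

for name, (low, high) in core_ranges.items():
    print(f"{name}: implied core clock {low:.0f}-{high:.0f} MHz")
```

Under that assumption, the speed gains come from wider prefetch and faster I/O rather than a faster memory array, which is one reason interface complexity grows from generation to generation.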

## **The Non-Volatile Memory for Space**



- Rad Hard offerings are limited to small SONOS or CRAM devices
  - Used in many RH processor systems that do not have large program memory space requirement – 4-16 Mb per die maximum
- Evolution of commercial NVM in space
  - PROMs
    - Older commercial PROMs were reasonably good, but one-time programmable (OTP)
  - EPROMs
    - Used in a few systems in the 90's, but had TID issues
  - EEPROMs
    - In use from the 90's to today, despite both TID and SEE (write mode) concerns
      - SEEQ 256 Mb (now obsolete)
      - Hitachi 1 Mb (now sold through re-packaging/screening houses)
  - Flash
    - The latest "in vogue" commercial NVM due to density (32Gb die coming soon)
      - Much better TID tolerance than older EEPROMs
      - SEFI and SEL still issues
    - Some space system primes are planning on using these in SSR applications

"In summary, the Signetics PROMs are recommended (given previous total dose studies) for usage as are the SEEQ EEPROMs during read operations. It is not recommended, pending further investigation, to use the SEEQ EEPROMs for in-flight programming. " - LaBel, Nov 1990 Test Report



## **Alternate Material NVMs**

- Alternate material NVMs evaluated as devices become available
  - Expect cell integrity to perform fairly well under irradiation on most NVMs
  - LaBel's Truism:
    - There are ALWAYS more challenges in "qualifying" a new technology device than expected
- Phase change memories (PCM)
  - Density, speed, and power look promising
    - Temperature is the challenge
  - Ex., Samsung, Numonyx initial data taken
- CNT
- MRAM
  - Spin Torque appears to improve SWaP metrics
  - Ex., Avalanche Technologies
- Resistive Memories
  - Ex., Unity Semiconductor, HP Labs
    - Unity's talking about a 64Gb device by next summer!
- NVSRAMs
  - Ex. Cypress



Numonyx PCM – Tech transfer opportunity?

## The Changing World of Radiation Testing of Memories

Comparing SEE Testing of Commercial Memories – 1996 to 2006

- 1996
  - Device under test (DUT): commercial SRAM memory for use in solid state recorder (SSR) applications
    - 1 um feature size
    - 4 Mbits per device
    - <50 MHz bus speed
    - Ceramic packaged DIP or LCC or QFP
- 2006
  - DUT: DDR2 SDRAM
    - 90 nm feature size
    - 1 Gbit per device
    - >500 MHz bus speed
    - Plastic FcBGA or TSOP
    - Hidden registers and modes
    - Built-in microcontroller

#### Sample Issues for SEE Testing

- Size of memory
  - Drives complexity on tester side for amount of storage, real time processing, and length of test runs
- Speed
  - Difficult to test at high-speeds reliably
    - Need low-noise and high-speed test fixture
  - Classic bit flips (memory cell) extended to include transient propagation (used to be too slow a device to respond)
  - Thermal and mechanical issues (testing in air/vacuum)
- Packaging
  - Modern devices present problems for reliable test board fixture, die access (heavy ion tests) requiring expensive facility usage or device repackaging/thinning
  - Difficulty in high-temp testing (worst case)
- Hidden registers and modes
  - Functional interrupts driving "anomalous data"
    - Not just errors to memory cells!
- Microcontroller
  - Not just a memory

#### Commercial memory testing is a lot more complex than in the old days!



## Can we test anything completely?



#### Sample Single Event Effect Test Matrix: full generic testing

| Amount | Item                                |
|--------|-------------------------------------|
| 3      | Number of Samples                   |
| 68     | Modes of Operation                  |
| 4      | Test Patterns                       |
| 3      | Frequencies of Operation            |
| 3      | Power Supply Voltages               |
| 3      | Ions                                |
| 3      | Hours per Ion per Test Matrix Point |
| 66096  | Hours                               |
| 2754   | Days                                |

| Commercial 1 Gb SDRAM             |
|-----------------------------------|
| 68 operating modes                |
| operates to >500 MHz              |
| Vdd 1.8V external, 1.25V internal |

**7.54 Years** and this didn't include temperature variations!!!

Test planning requires much more thought in the modern age as does understanding of data collected (be wary of databases). Only so much can be done in a 12 hour beam run – application-oriented

Memory for Space – Presented by Kenneth A. LaBel, Fault Tolerant Computing, Albuquerque, NM May 25, 2010
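The 66096 hour, 2754 day, and 7.54 year figures are simply the product of the test matrix factors. A short Python check, with the factors copied from the matrix (the dictionary keys and the 365.25-day year are assumptions made for this sketch):

```python
from math import prod

# Factors copied from the sample test matrix.
test_matrix = {
    "samples": 3,
    "modes_of_operation": 68,
    "test_patterns": 4,
    "frequencies_of_operation": 3,
    "power_supply_voltages": 3,
    "ions": 3,
    "hours_per_ion_per_matrix_point": 3,
}

total_hours = prod(test_matrix.values())
total_days = total_hours / 24
total_years = total_days / 365.25  # ASSUMPTION: average year length

print(f"{total_hours} hours = {total_days:.0f} days = {total_years:.2f} years")
```

Adding any further factor, such as a handful of test temperatures, multiplies the total again, which is why the deck argues for application-oriented test planning.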



## **The "Perfect" Space Memory**

- SWaP rules!
  - No power, Infinite density, Fast (sub 2 ns R/W access)
    - Oh, and Rad Hard (RH)
- Okay, so this isn't happening!
  - Speed:
    - Needs to be fast enough for burst data capture and not a bottleneck for processor interfaces
  - Power:
    - This is a trade space that includes thermal (stacking, for example)
    - NVM is good since no power consumed when not being accessed
  - Density:
    - Gb regime per die; anything beyond 100 Mb is acceptable!
    - Biggest RH devices currently ~16 Mb regime
      - Note: 1<sup>st</sup> SSR used 256 Mb commercial SRAMs 20 years ago!!!
- And a personal diatribe: how many operating modes do we really need?
  - Byte/nibble and page modes
    - Erase for NVM



## **Radiation Requirements (and trends)**

- How radiation hard do we really need to be?
  - TID
    - >90% of NASA applications are < 100 krads-Si in piecepart requirements
      - Many commercial devices (NVM and SDRAMs) meet or come close to this.
      - Charge pump TID tolerance has improved ~ an order of magnitude or more over the last 10 years
    - There are always a few programs with higher level needs and, of course, defense needs
  - SEL
    - Prefer none or rates that are considered low risk
      - Latent damage ("non-destructive" event) is a bear to deal with
    - As we're packing cells tighter and even with lower Vdd, we're seeing SEL on commercial devices regularly (<90nm)
      - Often in power conversion, I/O, or control areas
  - SEU
    - It's not the bit errors, it's the SEFIs and uncorrectable errors that are the biggest issues
      - Scrubbing concerns for risk, power, speed...



## **Reliability Considerations**

- Besides the usual CMOS concerns, memories have a few other considerations
  - Data retention
    - Long-term holding of values and/or requirement to refresh values
  - Endurance
    - Ability to read and write values N times (10<sup>5</sup> cycles is typical commercial NVM spec, for example)
  - Bit disturb (usually with Flash)
    - I.e., read/write/erase of bit A disturbs values on adjacent bit-line
  - Note: Many memories have "bad bits" to begin with that are mapped
- Now add in unique space requirements
  - >10 year mission life
  - Colder and hotter temperatures (-55 to +125C)
  - Radiation
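To put the endurance figure in context, here is a back-of-the-envelope Python sketch comparing a write-once-per-orbit usage pattern against a 10<sup>5</sup> cycle spec over a 10 year mission. The 90 minute orbit period is an assumption chosen for illustration (it is not from the presentation), and the estimate ignores wear leveling, write amplification, and any radiation or temperature degradation of endurance.

```python
MISSION_YEARS = 10          # from the ">10 year mission life" requirement
ENDURANCE_CYCLES = 10**5    # typical commercial NVM spec quoted above
ORBIT_MINUTES = 90          # ASSUMPTION: illustrative low Earth orbit period


def writes_per_mission(years, orbit_minutes, writes_per_orbit=1):
    """Write cycles accumulated if each location is written once per orbit."""
    mission_minutes = years * 365.25 * 24 * 60
    return mission_minutes / orbit_minutes * writes_per_orbit


cycles = writes_per_mission(MISSION_YEARS, ORBIT_MINUTES)
margin = ENDURANCE_CYCLES / cycles

print(f"{cycles:.0f} write cycles over the mission, margin {margin:.2f}x")
```

Under these assumptions the margin is less than 2x, so anything that writes more often than once per orbit, or any loss of endurance in the space environment, would quickly consume it.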

## Welcome to the Devil's Radiation Playground: SDRAM and SEE





- Errors range from
  - Soft (SEUs)
    - Single bit
    - Multi-bit (MBU), multi-cell (MCU)
  - Destructive (SEL)
  - Disruptive (SEFI)
    - May require power cycle to restore functionality (SEFI-PC)
    - SEUs are a mere nuisance by comparison
- SEE rates depend on pattern, operating frequency, operating mode...
  - Beam Daddy estimates 7.5+ years for exhaustive test!
    - Typical allotted time: 12-24 hours
- Reality: All SDRAM tests are application specific


## **SDRAM Applications**



Remember, it's only uncorrectable errors that are the problem

## Mitigation Can Occur at Many Levels: Examples



- Single device
  - Interleaved bits
    - Physical separation of bits so that a single energetic particle doesn't upset multiple bits in the same word
  - Error correcting codes (ECC)
  - Row or column redundancy
- Multiple device
  - Triple modular redundancy (TMR) voting
  - External Error Detection And Correction (EDAC)
    - Hardware, FPGA, or Software
- Block
  - Page mapping (map around "fails")
  - Ping pong buffers
  - Spare(s)
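As an illustration of the multiple-device level, TMR voting can be sketched as a bitwise majority vote across three copies of a word, followed by a scrub that rewrites all three copies with the voted value. This is a minimal software sketch, not a description of any flight implementation; `tmr_vote` and `scrub` are hypothetical names.

```python
def tmr_vote(a, b, c):
    """Bitwise majority: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


def scrub(copies):
    """Vote across three copies, then rewrite all of them with the voted value."""
    voted = tmr_vote(*copies)
    copies[:] = [voted] * 3
    return voted


stored = 0b10110010
copies = [stored, stored ^ 0b00010000, stored]  # one copy takes a single-bit upset

recovered = scrub(copies)
print(f"recovered {recovered:#010b}, copies consistent: {len(set(copies)) == 1}")
```

The vote fails only if the same bit position is upset in two copies before the next scrub, which is why scrub rate matters as much as the voter itself.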

## Mitigation: What We Can and Can't Do



#### Not So Great



#### Pretty Darned Good

#### **Data Corruption**

**Consequences:** Loss of up to all data on a single memory die

**Mitigation:**

1. Triplicate voting + Error Scrubbing
   - Overhead: 200%
2. EDAC + bit interleaving + Error Scrubbing
   - Hamming Code: Ex., ≤1 bit; 20% overhead
   - Reed-Solomon: Ex., ≤2 nibbles; 50% overhead

**Bonus:** Data loss also corrected for SEU, MBU, and even stuck bits
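The combination of EDAC and bit interleaving can be sketched in a few lines of Python. The example uses a textbook Hamming(7,4) single-error-correcting code purely for brevity; its overhead is much higher than the figures quoted above, and it is not the code used in any particular flight system. All function names are hypothetical. The point is that interleaving spreads physically adjacent bits across different code words, so a two-cell upset becomes two correctable single-bit errors.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit code word (positions 1..7, parity at 1, 2, 4)."""
    code = [0] * 8  # index 0 unused so indices match bit positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_decode(bits):
    """Correct up to one flipped bit, then return the 4 data bits as an integer."""
    code = [0] + list(bits)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # a nonzero syndrome is the position of the bad bit
    if syndrome:
        code[syndrome] ^= 1
    data = [code[3], code[5], code[6], code[7]]
    return sum(bit << i for i, bit in enumerate(data))


def interleave(words):
    """Physical layout: bit 0 of every word, then bit 1 of every word, and so on."""
    return [word[i] for i in range(len(words[0])) for word in words]


def deinterleave(bits, n_words):
    return [bits[w::n_words] for w in range(n_words)]


data = [0x3, 0xA, 0x5, 0xC]
physical = interleave([hamming74_encode(n) for n in data])

# One particle upsets two physically adjacent cells (a multi-cell upset).
physical[8] ^= 1
physical[9] ^= 1

recovered = [hamming74_decode(w) for w in deinterleave(physical, len(data))]
print(recovered == data)
```

Without interleaving, the same two adjacent flips would land in one code word and exceed what a single-error-correcting code can repair.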

## Considerations



- Technology changes in memories engender challenges
  - Impact of new materials and manufacturing methods on radiation response and modeling
  - Increasing difficulty in die accessibility
  - Increasing operating speeds and operating modes
  - More hidden "features" and limited testability
  - Multi-level storage cells (Flash, for example)
  - Unique reliability concerns
- We need to invest to keep ahead of the curve
  - DDR3 tests now?
  - PCM
  - ST MRAM
  - Reliability on RRAM, etc...
- It's the challenges that keep us employed!

We are always open to working with others