

## What's All this Field Programmable Gate Array (FPGA) Stuff Have to Do With Space, Anyhow?

Kenneth A. LaBel

#### ken.label@nasa.gov

#### Co-Manager, NASA Electronic Parts and Packaging (NEPP) Program

Melanie Berg, MEI Technology Corp.

This work is supported by the NEPP Program

## Outline



#### FPGA Background

- What they are and how they are used
- Technologies
- Space devices
- Qualification?

#### Trading the selection and use of FPGAs

- Priorities
- Radiation
  - SEUs and SETs
- Example trade
- Summary

#### Faulty Chips Delay Launch of Japanese Imaging Satellite

#### PAUL KALLENDER-UMEZU, TOKYO, and MISSY FREDERICK, WASHINGTON

The Japanese government has decided to postpone the launch of the nation's next reconnaissance satellite by six months or more following the discovery of potentially defective integrated circuits in the satellite, a government official said August 26.

The Prime Minister's Cabinet Office, which is in charge of the nation's Information Gathering Satellite (IGS) program, decided Aug. 25 to postpone the launch of what would be the nation's third reconnaissance satellite in orbit after deciding it was necessary to replace a number of field programmable gate array (FPGA) chips made by Actel Corp. of Mountain View, Calif., according to Yasuhiro Itakura, research officer at Japan's Cabinet Satellite Intelligence Center, which is part of the Cabinet Office.

The satellite, which carries an optical sensor, was to have been launched by a Japanese H-2A rocket from the Tanegashima Launch Center before the end of March 2006, but it will take about six months to replace the potentially faulty chips and test the satellite to prepare it for flight, Itakura said in an August 26 telephone interview. Some 10 chips need to be replaced, he said. Details about when the problem was discovered were not available at the time of the interview.

Problems with Actel's earlier version of its FPGA were discovered in autumn 2003, after more than 1 million of the devices were shipped to various vendors. Ken O'Neill, director of military and aerospace product marketing for Actel, said after news of the defect became known, Actel supplied the Japanese government with the latest version of the company's FPGA, which the company has the option to install in place of the old version. Since then, the government has been doing reliability testing of both the old and new product, though Actel had not received official word that the company would be replacing the chips as of press time, O'Neill said.

Actel believes the new version of the FPGA should not cause any further problems, O'Neill said.

"They have been tested pretty extensively, and clearly show a very high level of reliability," O'Neill said. "We have confidence that the reliability of the earlier version is high, but the latest version of the software does offer a higher level of reliability," he said.

FPGAs contain hundreds of thousands of programmable elements, according to O'Neill, and the defect found in the old version of the chips affected one antifuse within the design, causing it to fail. O'Neill said the chips that do fail usually do so early in the lifetime of the part.

As a supplier, Actel is not directly involved with the rebuilding process, O'Neill said. The other scheduled flight of a radar-type satellite, which is due for launch sometime in the Japanese government's 2006 fiscal year (April 2006-March 2007), is not affected by the problem with the Actel chips, and its launch schedule has not been altered, Itakura said.

Each of the next information-gathering satellites to be launched will have the same capabilities as the original satellites launched by an H-2A rocket in March 2003. One type of satellite has an optical sensor capable of 1-meter resolution, while the radar-type satellite has a resolution of 1-3 meters.

The IGS program was developed in response to an August 1998 incident when North Korea launched a missile that overflew Japanese territory and landed in the Pacific. Two more satellites were slated to join the first pair in orbit in November 2003, but those satellites were destroyed when the H-2A rocket carrying them failed.

Comments: mfrederick@space.com

Space News article on FPGA Issue on a satellite

### What is an FPGA?



- A Field Programmable Gate Array (FPGA) is a building block electronic device that consists of:
  - An array of logic modules or blocks,
  - An input/output ring, and
  - Programmable interconnects.
  - All on a CMOS silicon base.
- An FPGA may replace everything from simple logic to complex processors to application specific integrated circuit (ASIC) devices in a space system.
- The pattern for interconnecting logic modules to form circuits is called the "configuration"
  - Stored or burned into the device, often with a copy held in external memory
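
To make the idea of a "configuration" concrete, here is a minimal Python sketch. It assumes a logic module is a 2-input look-up table (LUT) whose four truth-table bits are configuration bits. That is typical of SRAM-based devices but is an assumption here, not something stated on this slide. The names `lut2`, `flip_bit`, and `AND_CONFIG` are hypothetical.

```python
def lut2(config_bits, a, b):
    """Evaluate a 2-input LUT: the inputs select one of four configuration bits."""
    index = (a << 1) | b
    return (config_bits >> index) & 1


def flip_bit(config_bits, position):
    """Model a single-bit change in the stored configuration."""
    return config_bits ^ (1 << position)


def truth_table(config_bits):
    """List the LUT output for inputs (0,0), (0,1), (1,0), (1,1)."""
    return [lut2(config_bits, a, b) for a in (0, 1) for b in (0, 1)]


# Configuration for a 2-input AND: only index 3 (a=1, b=1) outputs 1.
AND_CONFIG = 0b1000

if __name__ == "__main__":
    print(truth_table(AND_CONFIG))               # behaves as AND
    print(truth_table(flip_bit(AND_CONFIG, 1)))  # one flipped bit: no longer AND
```

The same hardware computes a different function once a single stored bit changes, which is why the storage technology behind the configuration matters so much in the radiation discussion later in the deck.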



#### **FPGA-architecture**

Near-ASIC performance plus off-the-shelf availability = FPGAs

## **FPGAs in a System**



- Before FPGAs, electronic systems were built from standard standalone off-the-shelf devices and/or custom-designed application specific integrated circuits (ASICs). In essence,
  - Standard devices are convenient for availability, but do not provide an optimal solution (power, size) for a specific problem, while,
  - ASICs provide a high-performance solution, but at a cost and schedule risk.
- FPGAs combine many of the features of both types of devices, providing reasonably high-performance while being an off-the-shelf device.
  - With the use of a suite of software design tools (discussed later), you can interconnect pre-existing or generated blocks of logic to form an operational circuit.
  - These Electronic Design Automation (EDA) tools include features for
    - Design languages (i.e., "code" development that is converted into a logic design. Examples: Verilog, VHDL)
    - Routing interconnects (within the device)
    - Timing (static or dynamic)
    - Signal Integrity analysis
    - Power estimation, and so on...
  - Some tools will even generate "extra" code for single event tolerance (e.g., triple modular redundancy)

#### Where FPGAs Fit in an Electrical System/Integrated Circuit (IC) Hierarchy



#### Increasing speed and density

### **IP is not only Internet Protocol**



- Besides the generic logic blocks and programmable interconnect, FPGAs may also include dedicated silicon structures called hard *intellectual property*, or hard IP.
  - This increases device performance because the overhead associated with the routing/interconnect technologies is minimized.
  - Hard IP blocks can include items like embedded digital signal processors (DSPs) or general processors.
- Soft IP is simply having "pre-compiled" drop-in functions that utilize Logic Blocks in the device via design software tools and routing.
- Other dedicated structures on a device may include
  - Clock distribution circuits
  - Memory blocks
  - Power-on resets
  - High-speed I/O (e.g., multi-gigabit serial links)
  - Memory interfaces, etc.

## **FPGA Technologies**



- Different manufacturers have used different approaches to the interconnect fabric.
  - A quick method of discriminating FPGA types
    - One-time programmable (OTP)
    - Reprogrammable devices (subdivided by their configuration storage technology).
- OTP devices
  - Much like a traditional Programmable Read Only Memory (PROM)
  - Traditionally have their interconnect structure "burned" in by an external piece of equipment
  - This configuration is non-volatile and not subject to being changed.
- Reprogrammable devices
  - Do not require external "burn" equipment (except EPROM technology devices) just external control/interface circuitry
  - Configuration (on chip) may or may not be non-volatile depending on their configuration storage technology
    - Non-volatile for these devices implies that configuration storage takes place on the FPGA of interest and does not need to be stored externally in case of power loss or reset.
  - Conversely, volatile devices require an external storage element with a configuration file for downloading into the device on power-up or reset (i.e., RAM-like storage of configuration internal to device)
  - Note that some reprogrammable devices can be reprogrammed "on the fly" while others may require stoppage of operations

### Example FPGA Configuration Technologies



- The method of configuration and configuration storage of a device is critical in understanding the differences in FPGA technologies
  - Each FPGA implementation technique has its pros and cons and should be chosen based on specific system needs for performance, reliability, radiation tolerance, etc.



#### programmable memory cell



## **FPGAs for Space Systems**



- While there are a multitude of commercial vendors, there are currently five known vendors that market devices specifically to the space market (not just the military market).
  - Microsemi (Actel) (OTP; flash-based)
  - Aeroflex (OTP)
  - Xilinx (reprogrammable latch-based)
  - ATMEL (reprogrammable SRAM-based), and,
  - Honeywell (reprogrammable SRAM-based); this part is now extinct.
- It should be noted that the Honeywell device is the only traditional radiation-hardened product of the group, but suffers from two significant flaws:
  - Small number of gates (a metric used for electrical designs), and,
  - Is available ONLY as a board-level product making it impractical to be integrated into many systems.
- The prime U.S. aerospace market share for FPGAs is dominated by Microsemi (Actel) and Xilinx
  - Atmel is a larger presence in Europe (ESA/CNES) and elsewhere
  - Aeroflex makes a "smallish" device (though pretty good from radiation concerns) and has no roadmap beyond current offering.

## **Microsemi Space FPGAs**



- Microsemi has a long history of offering radiation tolerant products specifically for the Mil/Aero market in addition to their commercial product offering
  - From the early days (OTP only)
    - RH1020: combinatorial logic only for routing
    - RH1280: combinatorial and sequential logic for routing
  - To the current
    - RTSX and RTAX families (OTP)
    - RT-ProASIC3 (flash-based reprogrammable)
  - To the next generation product
    - 65nm RT4P (flash-based reprogrammable)



## **Xilinx Space FPGAs**

- Xilinx, while providing some products to the Mil/Aero sector, is a VERY large commercial house
  - Only a small portion of their sales are Mil/Aero
- All their products for "space" are latch-based reprogrammable
  - Two products are currently available as "Space-grade"
    - Virtex-IV QV (radiation tolerant)
      - Commercial design with substrate modified to eliminate single event latchup (SEL) and slightly reduce single event upset (SEU) sensitivity
    - Virtex-5 QV (formerly known as "SIRF")
      - Additional radiation hardening by design (RHBD) of portions of the internal cells/circuits to reduce SEU sensitivity
        - Not 100% radiation hardened



#### Considerations for Device Selection (Sample List)



- Cost
  - Procurement
  - non-recurring engineering (NRE)
  - Maintenance
  - Qualification and test
- Schedule
- System performance factors
  - Speed
  - Power
  - Volume
  - Weight
  - System function and criticality
  - Other mission constraints (example, reconfigurability)
- System Complexity
  - Secondary ICs (and all their associated challenges)
  - Software, etc...

- Design Environment and Tools
  - Existing infrastructure and heritage
  - Simulation tools
- System operating factors
  - Operate-through for single events
  - Survival-through for portions of the natural environment
  - Data operation (example, 95% data coverage)
- Radiation and Reliability
  - Single Event Effects (SEE) rates
  - Lifetime (total ionizing dose, thermal, reliability, ...)
  - "Upscreening"
- System Validation and Verification

#### Note:

The last two are often the most ignored!

## **Assurance and FPGAs:** A few open items for "space qualified"

- Do we treat them as a standard off-the-shelf device or custom? Remember, the space system designer provides the internal routing and circuit, not the manufacturer.
  - Opens questions on what qualification tests are appropriate
  - Important for all FPGAs
- Reliability test designs need to take into consideration the FPGA's design capabilities (i.e., speed, I/O, logic) and technology changes. This is a "360 degree view" of the problem.
  - Assuming the more complex NEW devices have the same failure modes as previous generation may not be adequate.
  - Important for all FPGAs, but of current import to the newer sub-90nm product developments.
- Device/packaging/workmanship for >1000 pin area array packages
  - New Xilinx and Microsemi devices have this concern.
- Radiation tests
  - Manufacturer's tests are as limited as a user's test: they cannot conceivably cover all applications, design challenges, or even physics issues (angle, energy)
    - Manufacturer's data needs to be carefully evaluated (known missing data points)
  - Packaging and metallization issues complicate tests for heavy ions
  - Device complexity can mask failure modes

- MIL 38535 Class Y is being developed for reliability qualification of these types of devices, but application-specific usage remains caveat emptor
- "Space Qualified" may have limited meaning for radiation and reliability

## Mission Priorities Drive Device Choices



- Given the same function, not every space mission will consider the SAME constraints as their priority.
- In other words,
  - Mission A may need real-time data processing and have speed of performance as their first priority,
  - Mission B may need to gather science during solar events and have radiation as their first priority,
  - Mission C may have a long lifetime and be focused on reliability and radiation lifetime, while
  - Mission D may be weight constrained and have to trade performance/reliability versus mass/power.
- Typically, the program provides the given specific priorities, some of which may be in conflict with each other.
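
One common way to make such priorities explicit is a weighted decision matrix. The sketch below is illustrative only: the criteria names, the 1-5 scores, the weights, and the device labels `device_x` and `device_y` are all hypothetical placeholders, not measured data for any real part or mission.

```python
def rank_devices(weights, scores):
    """Return (device names sorted best first, weighted total per device)."""
    totals = {
        device: sum(weights[criterion] * device_scores[criterion] for criterion in weights)
        for device, device_scores in scores.items()
    }
    ranking = sorted(totals, key=totals.get, reverse=True)
    return ranking, totals


# Hypothetical 1-5 scores (higher is better) for two made-up candidate devices.
SCORES = {
    "device_x": {"speed": 5, "radiation": 2, "lifetime": 3, "mass_power": 3},
    "device_y": {"speed": 2, "radiation": 5, "lifetime": 4, "mass_power": 3},
}

# Hypothetical weights: a speed-driven mission and a radiation-driven mission.
SPEED_FIRST = {"speed": 0.6, "radiation": 0.2, "lifetime": 0.1, "mass_power": 0.1}
RADIATION_FIRST = {"speed": 0.1, "radiation": 0.6, "lifetime": 0.2, "mass_power": 0.1}

if __name__ == "__main__":
    for name, weights in (("speed first", SPEED_FIRST), ("radiation first", RADIATION_FIRST)):
        ranking, totals = rank_devices(weights, SCORES)
        print(name, ranking, totals)
```

With identical device scores, changing only the weights reverses the ranking, which is the point of the slide: the same function can lead to different device choices on different missions.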



#### Simplifying the View – A Radiation Person's Perspective



## What radiation mitigation should I use?



- Whatever will meet your requirements/constraints
  - Note that some of the space products WILL already have embedded means of mitigating radiation effects (though some less effectively than others)
- Options include (but are not limited to)
  - Scrubbing
    - Refreshing of memory structures/configuration
  - Triple Modular Redundancy (TMR)
    - Voting between three copies of circuit, or,
    - Voting between three operations of a circuit
  - Device triplication
    - Voting between three copies of a device
  - Drive strength selection
    - For single event transient (SET) suppression
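
The two TMR variants listed above can be sketched in a few lines of Python. This is a behavioral model under stated assumptions, not vendor tool output: circuits are modeled as plain functions on integers, and `make_flaky_increment` is a hypothetical stand-in for a circuit that suffers one transient.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def temporal_tmr(operation, value):
    """Run one circuit (modeled as a function) three times and vote on the results."""
    first, second, third = (operation(value) for _ in range(3))
    return majority_vote(first, second, third)


def make_flaky_increment(upset_on_call):
    """Hypothetical increment circuit whose output has one bit flipped on a chosen call."""
    calls = 0

    def increment(x):
        nonlocal calls
        calls += 1
        result = x + 1
        if calls == upset_on_call:
            result ^= 0b100  # transient flips bit 2 of this one result
        return result

    return increment


if __name__ == "__main__":
    good = 0b1011
    # Upsets in different bits of two different copies are still outvoted.
    print(bin(majority_vote(good, good ^ 0b0001, good ^ 0b1000)))
    # An upset during the second of three operations is outvoted as well.
    print(temporal_tmr(make_flaky_increment(upset_on_call=2), 5))
```

The voter only helps while at most one copy is wrong in any given bit position; if the same bit is upset in two copies, the wrong value wins the vote, which is one reason TMR is usually paired with scrubbing.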

#### Comparison of Aeroflex and Xilinx Devices – Sample Candidates for a Trade Space



| Feature              | Aeroflex                | Xilinx                                                              |
|----------------------|-------------------------|---------------------------------------------------------------------|
| Family               | Eclipse                 | Virtex-IV                                                           |
| Process              | 0.25um CMOS/epi         | 90nm CMOS (copper)                                                  |
| Technology           | OTP                     | Reprogrammable (latch)                                              |
| Sample Hard IP cores | RAM                     | RAM, dual PowerPC 405, DSP slices, Ethernet, Rocket I/O (to 10 GHz) |
| Datapath speed       | 150 MHz                 | >500 MHz                                                            |
| Logic                | >300K usable gates\*    | >200K logic cells\*                                                 |
| TID                  | 300 krads-Si guaranteed | Commercial, expect >100 krads-Si                                    |
| SEU                  | Moderate                | Upsets with protons                                                 |
| SEL                  | Immune                  | ???                                                                 |

\* "Marketing" gates and cells – realistically, the Virtex-IV is much larger than the Eclipse



- Given that mission priorities vary, the approach to the SEU/SET question and to system implementation varies as well.
- Some system solutions may best be met with a simpler implementation that may be less "powerful" but can more easily meet schedule constraints, while
- Some systems prefer higher performance, which requires a much more complex system design AND validation (and drives a longer development/validation cycle)
  - Using the Xilinx Virtex family as a sample, we will look at the types of SEUs/SETs that can occur in a representative complex-architecture FPGA

## Potential Types of Commercial Xilinx Device SEE Sensitivity

| Chip Area          | SEE Issue                                                                                                  | Possible SEU Mitigation                                                                                           |
|--------------------|------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| Config. Memory     | Single and multiple bit errors corrupting circuit operation, causing bus conflicts (current creep), etc.   | <ul> <li>Scrubbing</li> <li>Partial reconfiguration</li> <li>Partitioned design</li> </ul>                        |
| Config. Controller | Improper device configuration can occur if hit during configuration/reconfiguration                        | <ul> <li>Partitioned design</li> <li>Multiple chip voting (Redundancy by using multiple devices)</li> </ul>       |
| CLB                | Logic hits and propagated upsets caused by transients                                                      | <ul> <li>Triple modular redundancy (TMR) (or Xilinx TMR – XTMR)</li> <li>Acceptable error rates</li> </ul>        |
| BRAM               | Memory upsets in user area                                                                                 | <ul> <li>TMR</li> <li>Error Detection and Correction (EDAC) scrubbing</li> </ul>                                  |
| Half-latches       | Sensitive structure used in configuration/routing                                                          | Removal of half-latches from design                                                                               |
| POR                | SEUs on POR can cause inadvertent reboot of device                                                         | Multiple chip voting (Redundancy by using multiple devices)                                                       |
| IOB                | SEUs can cause false outputs to other devices or inputs to leave                                           | <ul> <li>Leverage Immune Config. Memory cell</li> <li>Evaluate input SET propagation</li> </ul>                   |
| DCM                | Can cause clock errors that spread across clock cycles                                                     | <ul> <li>TMR</li> <li>Temporal TMR</li> </ul>                                                                     |
| DSP                | Hard IP that is unhardened and can cause single event functional interrupts (SEFIs) or data errors         | <ul> <li>TMR</li> <li>Temporal TMR</li> </ul>                                                                     |
| MGT                | Gigabit transceivers. Hits in logic can cause bursts or SEFIs. Otherwise, bit errors in the data stream    | <ul> <li>TMR</li> <li>Protocol re-writes</li> </ul>                                                               |
| PPC                | Hard IP that is unhardened. SEFIs are prime concern                                                        | TMR or software task redundancy                                                                                   |
| SEL                | Higher current condition that is potentially damaging                                                      | <ul> <li>No mitigation other than substrate addition (epi).</li> <li>Circumvention techniques possible</li> </ul> |
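
As an illustration of the EDAC entry in the BRAM row, the following Python sketch implements a textbook Hamming(7,4) single-error-correcting code. The table does not say which code a given design would use, so the choice of Hamming(7,4) and the function names are assumptions made for the example.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7, parity at 1, 2, 4)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    bits = [0] * 8  # index 0 unused so list indices match bit positions
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    return bits[1:]


def hamming74_correct(codeword):
    """Return (corrected data nibble, position of the corrected bit or 0 if none)."""
    bits = [0] + list(codeword)
    syndrome = 0
    for position in range(1, 8):
        if bits[position]:
            syndrome ^= position  # XOR of set-bit positions is 0 for a valid codeword
    if syndrome:
        bits[syndrome] ^= 1  # a single-bit error sits at the syndrome position
    data = (bits[3], bits[5], bits[6], bits[7])
    nibble = sum(bit << i for i, bit in enumerate(data))
    return nibble, syndrome


if __name__ == "__main__":
    word = hamming74_encode(0b1010)
    word[4] ^= 1  # upset the bit at position 5 (list index 4)
    print(hamming74_correct(word))
```

This code corrects any single upset bit per codeword but cannot detect a double-bit error, so stored words still need periodic scrubbing before a second upset accumulates in the same word.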



## **Example Scenario for a Mission**

#### Embedded image controller

- Packet processing application
- Real-time jitter control
- Long-duration object staring
- Image recognition and target tracking
- The big question in this type of application comes down to:
  - Do you need to ensure that you track every single target, or do you have time for a "hiccup" (aka an SEU) now and then?
    - Science may be able to take a hiccup
    - Weapons arena may not
  - Drives systems operability requirements

## Sample Implementing Architecture Using Xilinx Virtex-IV FX Device



Taming Embedded Multi-Core on FPGAs for Packet Processing by Bryon Moyer, Teja Technologies, Inc

http://www.fpgajournal.com/articles\_2006/20060131\_teja.htm

#### Higher reliability may drive triplicate device option w/voting

## Sample Implementing Architecture Using Aeroflex Eclipse Device





http://ams.aeroflex.com/ProductFiles/DataSheets/FPGA/RadHardEclipseFPGA.pdf

## Architectural Impact within The Xilinx **Design Flow**



- Scrubbing Mitigation:
  - An additional radiation hardened FPGA may be used to implement the scrubbing control.
  - External non-volatile memory (with voting and correction ability) is required to store the configuration
- TMR Mitigation
  - Triple the I/O and the design (impacts power, area, and board complexity)
  - Inserted after synthesis (irregular design flow can complicate system validation)
- Advantage:
  - Large device can implement System On a Chip and reduce complexity of general design
  - Speed
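
The scrubbing control described above can be sketched as a readback-and-compare loop. This is a simplified model with assumptions called out: configuration memory is represented as a list of byte frames, the golden copy from external non-volatile memory is assumed to be error-free already (voting and correction on that memory are not modeled), and `scrub_frames` is a hypothetical name rather than a vendor API.

```python
def scrub_frames(live_frames, golden_frames):
    """Rewrite every live frame that differs from its golden copy.

    Returns the indices of the frames that were repaired.
    """
    repaired = []
    for index, golden in enumerate(golden_frames):
        if bytes(live_frames[index]) != bytes(golden):
            live_frames[index] = bytearray(golden)  # restore the frame from the golden copy
            repaired.append(index)
    return repaired


if __name__ == "__main__":
    golden = [b"\x0f\xf0", b"\xaa\x55", b"\x00\xff"]
    live = [bytearray(frame) for frame in golden]
    live[1][0] ^= 0x01  # model a single configuration bit upset in frame 1
    print(scrub_frames(live, golden))  # indices repaired on this pass
    print(scrub_frames(live, golden))  # a second pass finds nothing to fix
```

How often the loop runs sets how long an upset can persist, so the scrub period is itself a system-level trade against power and controller complexity.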

## Architectural Impact within The Aeroflex Design Flow



- The necessity of more FPGAs is the largest impact:
  - Extra logic for FPGA to FPGA interface communication/Synchronization is necessary
    - Interface control document!
  - Can complicate Board Design
  - Requires careful architectural decision making concerning the partitioning scheme
  - Speed can be affected
- Advantage
  - SEU/SET tolerance is built into the silicon and will not require extra mitigation at this level of the system implementation
  - System Level Validation and implementation is generally less complicated

#### **Design Methodology Flowcharts:** Aeroflex vs. Xilinx





**Aeroflex: Partitioning Concerns**

#### **Verification Flow: Aeroflex vs. Xilinx**





# System Validation and Fault Tolerance

#### General Considerations

- Failure Rate Prediction and Quantification (if possible)
- Recovery Time upon Failure/Data Loss
- Difficulty of Recovery (e.g., reboot, power down)
- Difficulty of System Validation after mitigation insertion
- Is it easier to have four designers working with one chip or each with their own?
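
The first two considerations combine into a simple back-of-the-envelope check against a data coverage requirement such as the 95% example mentioned earlier. The sketch below assumes every upset costs one full recovery period and that outages never overlap; the upset rate and recovery time used in the demo are hypothetical, not measured values.

```python
SECONDS_PER_DAY = 86_400.0


def data_coverage(upsets_per_day, recovery_seconds):
    """Fraction of each day the system delivers data, given an upset rate and recovery time."""
    downtime = upsets_per_day * recovery_seconds
    return max(0.0, 1.0 - downtime / SECONDS_PER_DAY)


def max_upset_rate(required_coverage, recovery_seconds):
    """Highest upset rate (per day) that still meets the required coverage."""
    return (1.0 - required_coverage) * SECONDS_PER_DAY / recovery_seconds


if __name__ == "__main__":
    # Hypothetical system: 2 disruptive upsets per day, 60 s to recover from each.
    print(f"coverage: {data_coverage(2, 60):.4%}")
    print(f"rate budget for 95% coverage: {max_upset_rate(0.95, 60):.1f} per day")
```

With a 60-second recovery, a 95% coverage requirement tolerates roughly 72 disruptive upsets per day under these assumptions; a recovery that needs a power cycle measured in minutes shrinks that budget proportionally.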



#### Comments



- This presentation has shown a simplistic view of some of the trade spaces involved with FPGA selection and use for space applications
- Frankly, good designers can almost always come up with an approach that can work
  - However, optimizing the solution space for specific parameters such as weight or power or system operability must be thoroughly considered
  - And validation is a whole other matter...