Reliability of State-of-the-Art Digital Electronics

4th Annual NEPP Electronics Technology Workshop (ETW)

- DfR Solutions is developing algorithms for predicting the service life of electronics

- Particular focus on part technologies present in next generation digital electronics

- Drivers
  - Part and package modifications are reducing useful lifetimes; not a concern for initial adopters
  - This technology will eventually migrate to longer life platforms in more severe operating environments
IC Reliability (Example) - Microprocessors

- **90nm Microprocessors**
  - 150-200 FIT over 5 years (0.11% AFR)

<table>
<thead>
<tr>
<th>Part Number</th>
<th>Description</th>
<th>Node</th>
<th>Field Return Failure Rate (FIT)</th>
</tr>
</thead>
<tbody>
<tr>
<td>MT16LSDF3264HG-10EE4</td>
<td>Micron 2GB SDRAM</td>
<td>130nm</td>
<td>689</td>
</tr>
<tr>
<td>M470L6524DU0-CB3</td>
<td>Samsung 512MB SDRAM</td>
<td>130nm</td>
<td>415</td>
</tr>
<tr>
<td>HYMD512M646BF8-J</td>
<td>Hynix 1GB DDRAM</td>
<td>130nm</td>
<td>821</td>
</tr>
<tr>
<td>MC68HC908SR12CFA</td>
<td>Freescale Microcontroller</td>
<td>90nm</td>
<td>221</td>
</tr>
<tr>
<td>RH80536GC0332MSL7EN</td>
<td>Intel 1.8GHz Pentium</td>
<td>90nm</td>
<td>144</td>
</tr>
</tbody>
</table>

- **65nm Microprocessor**
  - 422 FIT over 5 years (0.37% AFR)

<table>
<thead>
<tr>
<th>Time in Use</th>
<th>Operating Reliability Goals</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>0 – 1 Year (8760 hrs)</td>
</tr>
<tr>
<td>Cumulative Percent Fail</td>
<td>0.24%</td>
</tr>
<tr>
<td>Average Failure Rate</td>
<td>274 FIT</td>
</tr>
</tbody>
</table>

3X increase in AFR with decrease in node size
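
The FIT and AFR figures on this slide are related by simple arithmetic, sketched below. The sketch assumes a constant failure rate, 8760 operating hours per year (the figure used in the table above), and the usual definition of 1 FIT as one failure per 10^9 device-hours. The function names are illustrative only.

```python
HOURS_PER_YEAR = 8760  # one year of continuous operation, as in the table above


def fit_to_afr(fit: float) -> float:
    """Annualized failure rate (fraction per year) for a constant rate in FIT."""
    return fit * 1e-9 * HOURS_PER_YEAR


def cumulative_fail_to_fit(fraction_failed: float, hours: float) -> float:
    """Average failure rate in FIT implied by a cumulative fraction failed."""
    return fraction_failed / hours * 1e9


print(f"{fit_to_afr(422):.2%}")                                     # 0.37%
print(f"{cumulative_fail_to_fit(0.0024, HOURS_PER_YEAR):.0f} FIT")  # 274 FIT
```

Both results match the slide: 422 FIT corresponds to 0.37% AFR, and 0.24% cumulative failures in the first year corresponds to 274 FIT.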
Growing concern in high reliability industries that systems utilizing high performance microelectronics (≤90nm feature size) will not survive their anticipated lifetimes of 10 to 30 years

- Failure will occur because of the short lifespans of individual transistors caused by intrinsic degradation (aging)
- “It is a fallacy to say that integrated circuits will not fail because they have no moving parts. The sole reason they work is by the movement of charge carriers (electrons and holes) within them.”

“The notion that a transistor ages is a new concept for circuit designers,” … “aging has traditionally been the bailiwick of engineers who guarantee the transistor will operate for 10 years or so … But as transistors are scaled down further and operated with thinner voltage margins, it’s becoming harder to make those guarantees … transistor aging is emerging as a circuit designer’s problem.”

IEEE Spectrum, June 2009
Some field data indicates that each new generation of integrated circuits is beginning to wear out sooner than the last.
Challenges for IC Reliability Prediction

- Limited degree of mechanism-appropriate testing
  - Only at transition to new technology nodes
  - Mechanism-specific coupons (not real devices)
  - Test data is hidden from end-users (proprietary)

- Visible JEDEC tests are of limited value
  - Limited duration (1000 hrs) does not necessarily demonstrate wearout behavior
  - Use of simple activation energy (0.7eV) with incorrect assumption that all mechanisms are thermally activated
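
As a rough illustration of the single-activation-energy calculation criticized above, the sketch below computes an Arrhenius acceleration factor with Ea = 0.7 eV. The stress and use temperatures (125°C and 55°C) are assumptions chosen for illustration and do not come from the slides.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Thermal acceleration factor between a stress and a use temperature."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_stress_k))


# Assumed conditions: 1000 h test at 125 C, field use at 55 C.
af = arrhenius_af(0.7, t_use_c=55.0, t_stress_c=125.0)
print(f"AF = {af:.1f}, equivalent field hours = {1000 * af:.0f}")
```

The factor depends only on temperature, so a mechanism driven mainly by voltage or switching frequency receives no acceleration credit at all under this model.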
Multi-Mechanism Approach

- Multi-Mechanism Approach developed in response to industry limitations

- Derives an acceleration factor from JEDEC test results
  - Different acceleration algorithms for different wearout mechanisms (EM, TDDB, HCI, NBTI)
  - Different Weibull slopes (beta) for different wearout mechanisms
  - Circuit dependent, but only at functional block level (attempt to avoid intellectual property issues)
Multi-Mechanism Approach (cont.)

- Models simultaneous degradation behaviors of multiple failure mechanisms on integrated circuit devices
- Devised from published research literature, technological publications, and accepted degradation models from:
  - NASA/JPL
  - University of Maryland
  - Semiconductor Reliability Community
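
One common way to model several mechanisms degrading a device at once is a competing-risk (series) model: the device fails when the first mechanism does, and each mechanism has its own Weibull distribution. The sketch below takes that route. The slides do not state how the mechanisms are actually combined, and every eta/beta value here is a hypothetical placeholder.

```python
import math

# Hypothetical (eta in hours, Weibull beta) per mechanism, for illustration only.
MECHANISMS = {
    "EM": (4.0e5, 2.0),
    "TDDB": (6.0e5, 1.5),
    "HCI": (8.0e5, 1.2),
    "NBTI": (5.0e5, 1.8),
}


def cumulative_hazard(t_hours: float, mechanisms=MECHANISMS) -> float:
    """Sum of the Weibull cumulative hazards (t/eta)^beta over all mechanisms."""
    return sum((t_hours / eta) ** beta for eta, beta in mechanisms.values())


def reliability(t_hours: float, mechanisms=MECHANISMS) -> float:
    """Probability that no mechanism has caused failure by t_hours."""
    return math.exp(-cumulative_hazard(t_hours, mechanisms))


def average_fit(t_hours: float, mechanisms=MECHANISMS) -> float:
    """Average failure rate over [0, t_hours], expressed in FIT."""
    return cumulative_hazard(t_hours, mechanisms) / t_hours * 1e9


for years in (1, 5, 10):
    t = years * 8760
    print(years, f"R = {reliability(t):.4f}", f"avg FIT = {average_fit(t):.0f}")
```

Because every beta here is greater than 1, the average failure rate grows with time in use, which is the wearout signature a constant-rate model cannot reproduce.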
Key Assumptions

- Use of JEDEC accelerated test data in lieu of device time to failure data
- Default sub-circuit designs to analyze transistor stress states within functional blocks
- Single process node selection for each device
- Basic incorporation of redundancy and/or error correction techniques used in components
Resources that Define an IC

- Limited info available on integrated circuit design and reliability
  - Can we perform a prediction with just system-level design criteria (electrical and thermal data) and component documentation such as its datasheet?

- Knowledge of degradation mechanisms
  - Transistor stress states
  - Functional group susceptibility
  - Electrical and thermal conditions

- Integrated circuit materials and complexity
  - Technology node or feature size (e.g. 90nm)
    - Corresponding material set (e.g. Si, GaAs, SiGe, GaN and SOI)
  - Functional complexity
    - Identified as functional groups within a circuit
  - Operating conditions
    - Voltages, frequencies, currents, and temperature profile
IC Wearout Approach

- Define integrated circuit organization by functional groups
- Apply predefined transistor stress factors from SPICE analysis of “assumed” functional groups
- Characterize electrical and thermal conditions of customer application
- Define test conditions for IC reliability test
- Perform transistor level extrapolation using applicable failure mechanism models
The complexity of an integrated circuit can be described as a set or multiple sets of smaller sub-circuits called functional groups.

- Each functional group can comprise multiple cells, which are the basic building blocks of the group, e.g. an SRAM bit or a processor core.

- Each functional group experiences different electrical stresses which can be quantified by analyzing transistor stress states.

- Degradation mechanisms come into play under specific transistor stress states such as drain bias for HCI or gate bias for BTI.

Graphic and overlay of Intel Core i5 Processor
Analyzing Transistor Stress States

- Establish relevancy of failure mechanisms, weighting factors, and inputs into Physics-of-Failure algorithms based on
  - Quantity and location of transistors within circuit
  - Probabilistic likelihood of applied operation conditions through background simulation of each functional group
  - Future versions may allow real-time input of component design file to output weighting factors without transfer of proprietary design
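
The weighting idea above can be sketched as a small data structure. In this sketch each functional group carries a transistor count and, per mechanism, the probability that a transistor is in the relevant stress state. The class, field names, group names, and all numbers are hypothetical stand-ins for the output of the background SPICE simulation, not the tool's actual data model.

```python
from dataclasses import dataclass


@dataclass
class FunctionalGroup:
    name: str
    transistor_count: int
    # Probability that a transistor is in the stress state that drives each
    # mechanism (hypothetical values standing in for simulation results).
    stress_probability: dict[str, float]


def mechanism_weights(groups: list[FunctionalGroup]) -> dict[str, dict[str, float]]:
    """For each mechanism, return every group's share of stressed transistors."""
    totals: dict[str, float] = {}
    for group in groups:
        for mech, prob in group.stress_probability.items():
            totals[mech] = totals.get(mech, 0.0) + group.transistor_count * prob
    weights: dict[str, dict[str, float]] = {}
    for group in groups:
        for mech, prob in group.stress_probability.items():
            if totals[mech] > 0:
                share = group.transistor_count * prob / totals[mech]
                weights.setdefault(mech, {})[group.name] = share
    return weights


GROUPS = [
    FunctionalGroup("PLL", 5_000, {"HCI": 0.30, "NBTI": 0.10}),
    FunctionalGroup("SRAM", 2_000_000, {"NBTI": 0.50, "TDDB": 0.50}),
    FunctionalGroup("Core", 1_000_000, {"HCI": 0.20, "NBTI": 0.25, "EM": 0.05}),
]

weights = mechanism_weights(GROUPS)
for mech, shares in weights.items():
    print(mech, {name: round(share, 3) for name, share in shares.items()})
```

Only normalized shares leave the function, which reflects the stated goal of producing weighting factors without transferring the proprietary design itself.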

Phase-locked loop (PLL)

Circuit designed using CMOS library and components built from transistor models

All transistors are analyzed against electrical criteria
Validating Prediction Methodology

- The process flow uses technology node based degradation models which provide the ability to extrapolate from an appropriate accelerated test to anticipated field conditions.
  - An appropriate functional test stresses the integrated circuit as its application would under field conditions, but in an accelerated way.
  - Utilizing Physics-of-Failure degradation equations, acceleration factors can be applied at the transistor level for clock frequency, voltages, and temperatures.

$$AF_{EM} = \left( \frac{f_1}{f_2} \right)^n \left( \frac{V_{dd1}}{V_{dd2}} \right)^\gamma e^{\left( \frac{E_a}{k} \frac{T_1-T_2}{T_1T_2} \right)}$$

$$AF_{NBTI} = \left( \frac{V_{gs1}}{V_{gs2}} \right)^\gamma e^{\left( \frac{E_a}{k} \frac{T_1-T_2}{T_1T_2} \right)}$$

$$AF_{HCI} = e^{\gamma \left( V_{ds1}-V_{ds2} \right)} e^{\left( \frac{E_a}{k} \frac{T_1-T_2}{T_1T_2} \right)}$$

$$AF_{TDDB} = \left( \frac{V_{gs1}}{V_{gs2}} \right)^{(a+bT_1)} e^{\left( c^* \frac{T_1-T_2}{T_1T_2} + d^* \frac{T_2^2-T_1^2}{T_1T_2} \right)}$$

Image of a FLASH memory bit-level write/read test to determine bit life without wear leveling.
Validation Study (130nm to 90nm)

- Field return data was gathered from a family of telecommunication products
- 56 different ICs comprised 41.5% of the failed part population
- The validation activity utilized failure data from 5 integrated circuits

<table>
<thead>
<tr>
<th>Year</th>
<th>Description</th>
<th>Quantity Replaced</th>
</tr>
</thead>
<tbody>
<tr>
<td>2004</td>
<td>1 GB DRAM</td>
<td>190</td>
</tr>
<tr>
<td>2001</td>
<td>256MB DRAM</td>
<td>152</td>
</tr>
<tr>
<td>2005</td>
<td>512MB DRAM</td>
<td>161</td>
</tr>
<tr>
<td>2002</td>
<td>Microcontroller</td>
<td>114</td>
</tr>
<tr>
<td>2005</td>
<td>Microprocessor</td>
<td>18</td>
</tr>
</tbody>
</table>
Statistical Analysis on Field Returns

- Failure rate was calculated from raw data
  - Environmental conditions to determine in-field operating temperature
  - Thermal measurements to determine power dissipation
  - Cumulative failure distributions
    - Weibull
    - Exponential

Extracted graphs from statistical analysis of field returns
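
A minimal sketch of fitting a Weibull distribution to failure times by median-rank regression, one common way to produce cumulative failure plots like those referenced above. It assumes complete (uncensored) data, which real field returns rarely are because most units survive, and the failure times in the example are synthetic. The slides do not say which fitting method was actually used.

```python
import math


def weibull_mrr(failure_times: list[float]) -> tuple[float, float]:
    """Fit Weibull (beta, eta) by median-rank regression on uncensored data."""
    times = sorted(failure_times)
    n = len(times)
    xs, ys = [], []
    for rank, t in enumerate(times, start=1):
        median_rank = (rank - 0.3) / (n + 0.4)  # Bernard's approximation
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - median_rank)))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                    # slope of the Weibull plot
    intercept = y_mean - beta * x_mean  # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


# Synthetic failure times in hours, for illustration only.
sample_hours = [3100.0, 5200.0, 7400.0, 9800.0, 12500.0, 16000.0, 21000.0]
beta, eta = weibull_mrr(sample_hours)
print(f"beta = {beta:.2f}, eta = {eta:.0f} h")
```

A fitted beta near 1 is consistent with the exponential (constant-rate) case, while beta above 1 points to wearout.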
Validation Study

- Multi-mechanism approach is far more accurate than other existing methodologies and does not require any field data

<table>
<thead>
<tr>
<th>Description</th>
<th>Field Data</th>
<th>Sherlock</th>
<th>JEDEC (Ea = 0.7eV)</th>
<th>SR-332</th>
<th>MIL-HDBK-217F</th>
</tr>
</thead>
<tbody>
<tr>
<td>DRAM (256MB)</td>
<td>689</td>
<td>730</td>
<td>51</td>
<td>15</td>
<td>18</td>
</tr>
<tr>
<td>DRAM (512MB)</td>
<td>415</td>
<td>418</td>
<td>51</td>
<td>15</td>
<td>18</td>
</tr>
<tr>
<td>DRAM (1GB)</td>
<td>821</td>
<td>1012</td>
<td>51</td>
<td>15</td>
<td>18</td>
</tr>
<tr>
<td>Microcontroller</td>
<td>220</td>
<td>249</td>
<td>51</td>
<td>27</td>
<td>18</td>
</tr>
<tr>
<td>Microprocessor</td>
<td>144</td>
<td>291</td>
<td>51</td>
<td>67</td>
<td>2691</td>
</tr>
</tbody>
</table>
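
The comparison in the table can be summarized numerically. The sketch below computes the mean absolute percentage error of each method against the field data, using the values exactly as they appear in the table. The choice of error metric is an assumption made here for illustration and is not taken from the slides.

```python
# Field return failure rates (FIT), in table row order.
FIELD_FIT = [689, 415, 821, 220, 144]

# Predicted failure rates (FIT) per method, in the same row order.
PREDICTED_FIT = {
    "Sherlock": [730, 418, 1012, 249, 291],
    "JEDEC (Ea = 0.7eV)": [51, 51, 51, 51, 51],
    "SR-332": [15, 15, 15, 27, 67],
    "MIL-HDBK-217F": [18, 18, 18, 18, 2691],
}


def mean_abs_pct_error(predicted: list[float], actual: list[float]) -> float:
    """Mean of |predicted - actual| / actual across all parts."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


mape = {name: mean_abs_pct_error(values, FIELD_FIT) for name, values in PREDICTED_FIT.items()}
for name, error in sorted(mape.items(), key=lambda item: item[1]):
    print(f"{name:20s} {error:.0%}")
```

On this metric the multi-mechanism (Sherlock) column has the lowest error. Its largest miss is the microprocessor, where the prediction is roughly twice the field value.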
Next Steps: Testing at Functional Block Level

- Functional testing was performed on 45nm Silicon-on-Insulator ASIC PMOS transistor arrays to drive wearout by NBTI
  - The design of the arrays was known; therefore, it was not necessary to create an “assumed” circuit for SPICE analysis
  - Transistor stress state analysis → normalized “critical constants”
  - Extrapolation with the NBTI power-law model, using our methodology, predicted a 10% degradation of threshold voltage after 6 years of use under field conditions

$$AF_{NBTI} = \left( \frac{V_{gs1}}{V_{gs2}} \right)^\gamma e^{\left( \frac{E_a}{k} \frac{T_1-T_2}{T_1T_2} \right)}$$

Acceleration Transform for NBTI
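
The NBTI acceleration transform can be sketched in code as follows. The sketch assumes subscript 1 denotes the accelerated test condition and subscript 2 the field condition, with temperatures in kelvin. It also assumes a power-law drift of the form ΔVth = A·t^n, since the slides name the model without giving its form. All parameter values (gamma, Ea, A, n, voltages, temperatures) are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def af_nbti(vgs_test: float, vgs_field: float, t_test_c: float,
            t_field_c: float, gamma: float, ea_ev: float) -> float:
    """NBTI acceleration factor: voltage power law times a thermal exponential."""
    t1 = t_test_c + 273.15   # test temperature, K
    t2 = t_field_c + 273.15  # field temperature, K
    voltage_term = (vgs_test / vgs_field) ** gamma
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (t1 - t2) / (t1 * t2))
    return voltage_term * thermal_term


def time_to_shift(critical_shift: float, a: float, n: float) -> float:
    """Time at which the assumed power law a * t**n reaches critical_shift."""
    return (critical_shift / a) ** (1.0 / n)


# Hypothetical parameters, for illustration only.
af = af_nbti(vgs_test=1.2, vgs_field=1.0, t_test_c=125.0, t_field_c=55.0,
             gamma=3.0, ea_ev=0.1)
test_hours_to_10pct = time_to_shift(critical_shift=0.10, a=0.01, n=0.25)
field_hours_to_10pct = test_hours_to_10pct * af
print(f"AF = {af:.2f}, field time to 10% shift = {field_hours_to_10pct / 8760:.1f} years")
```

The time to a given threshold-voltage shift measured under test conditions is multiplied by the acceleration factor to estimate the corresponding time under field conditions.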
Next Steps: Testing at Component Level

- Testing of Intel Ivy Bridge i7, fabricated at the 22nm technology node
  - Overclocked to ensure operation at maximum achievable clock and bus speeds (4.4 GHz vs. 3.8 GHz)
  - Ten (10) systems

- Several iterations of stress loading to achieve maximum multi-core CPU and RAM stress without bandwidth throttling
  - Floating point and integer mathematics
  - Polynomial arithmetic
  - Markov state models
  - Game, physics and texture (e.g. fur) rendering
Testing at Component Level (cont.)

- Hot (65°C) and Cold (-25°C) Testing at System-Level
  - Temperature based on thermal diode

- CPU core voltages recorded daily

<table>
<thead>
<tr>
<th>CPU Name</th>
<th>Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPU Infos</td>
<td>Ivy Bridge, 4 Cores, MMX, x86-64, SSE4.2</td>
</tr>
<tr>
<td>Motherboard</td>
<td>Gigabyte Technology Co., Ltd.: G1.Sniper 3</td>
</tr>
<tr>
<td>CPU Clock (Current)</td>
<td>4211.3 MHz</td>
</tr>
<tr>
<td>CPU Clock (Original)</td>
<td>3100.0 MHz</td>
</tr>
<tr>
<td>CPU Overclock</td>
<td>35.8%</td>
</tr>
<tr>
<td>Bus Clock (Current)</td>
<td>108.0 MHz</td>
</tr>
<tr>
<td>Bus Clock (Original)</td>
<td>100.0 MHz</td>
</tr>
<tr>
<td>Bus Overclock</td>
<td>8.0%</td>
</tr>
</table>

- 80% of cold side systems are unstable after 5 months
  - Operational only at half factory clock speed (1.5 GHz) without stress loading
Next Steps: Developing Reliability ‘Packets’

- Suppliers perform in-house EDA of their designs. Each design results in a table row of “critical constants.”
  - Analysis would quantify each of the transistor stress states necessary to degrade the transistors by each applicable mechanism during typical use duty cycles.
- The “counts” per failure mechanism type would be normalized for the IC or functional group within the IC (standard libraries).
  - This process would replace the current process of SPICE analysis on “assumed circuits.”
- Characterize electrical and thermal conditions of customer application.
- Define test conditions for IC reliability test.
- Perform transistor level extrapolation using applicable failure mechanism models.
Conclusion

- Next-generation complex integrated circuits show reliability performance issues
  - Packaging-related issues must also be addressed

- Current solution set is independent from component manufacturers
  - Comprehensive approach requires eliminating reliability performance as intellectual property
Who is DfR Solutions?

The Industry Leader in Quality-Reliability-Durability of Electronics

- Best Design Verification Tool
- 2012 Global Technology Award Winner
- 50 Fastest Growing Companies in the Electronics Industry - Inc Magazine

Key Facts
- Founded in 2005
- 30+ Employees, Multiple worldwide locations
- Software, Consulting, Research, Lab Services

Over 600 Customers
- Most Major Avionic OEMs and Suppliers