Reliability Implications Of Derating Leading Edge High Complexity Microcircuits

S. Richard Biddle
Quality and Reliability Engineering Manager
Texas Instruments Incorporated
Military Semiconductor Products Division
Sherman, Texas
(s-biddle@ti.com)

Introduction

As a result of acquisition reform initiatives, original equipment manufacturers (OEMs) of military systems were authorized to use performance-based specifications for component procurement. The industry was suddenly faced with a dilemma: continue designing with tried and proven military grade components, or accept the risk of using commercial-off-the-shelf (COTS) components in a manner historically considered unacceptable. Since then, volumes of papers have been authored and presented on the topic of using COTS microcircuits (integrated circuits or ICs) in traditional military environments. In a number of applications, such as military ground based communication systems, the environment might be the same as that seen in commercial applications. In these cases, COTS plastic encapsulated microcircuits could be an acceptable choice. They could be used as-is with good results, provided the obsolescence issues are addressed. In other applications, such as missile systems, the use of COTS components often requires the OEM to operate the components outside the component manufacturer’s datasheet specifications.

Efforts are underway by multiple OEMs and third party research organizations to develop “cookbook” methodologies to uprate or upscreen COTS components for use beyond the manufacturer’s recommended operating conditions. Historically, military OEMs have had established methods of estimating component reliability under various operating conditions. Military OEMs reasonably understood the state of the art, and the devices in question were very robust with exceptionally forgiving design margins. Hermetic packaging eliminated most environmental concerns. With today’s state-of-the-art submicron technologies and package configurations, however, the derating of high complexity leading edge components such as microprocessors and digital signal processors has become much more involved.

OEM and IC Manufacturer’s Product Liability Concerns

Manufacturers of military systems and components might, in certain cases, be shielded from product liability issues through the system of government approved military specifications. If design, procurement, manufacturing, and qualification are performed in accordance with these specifications, the government, not the manufacturer, accepts the risk. If the OEM elects to use non-military components, then that OEM could assume the product liability risks. This liability could also extend to the component manufacturer supporting the use of non-military grade devices in these applications.

A large number of major semiconductor manufacturers have elected to shield themselves from this potential liability. These companies typically have disclaimers similar to that published by Texas Instruments:

*Plastic encapsulated TI semiconductor devices are not designed and are not warranted to be suitable for use in some military applications and/or military environments. Use of plastic encapsulated TI semiconductor devices in military applications and/or military environments in lieu of hermetically sealed ceramic devices is understood to be fully at the risk of the Buyer.*

In some cases, semiconductor suppliers have elected to further shield themselves by not selling commercial components directly to military original equipment manufacturers. In addition, these suppliers are specifically not providing any information to military original equipment manufacturers that might be construed as facilitating the use of components outside published specifications. In cases where this information is provided, it is usually accompanied by a disclaimer similar to that used by TI for information released to commercial OEMs:

*Quality and reliability data provided by Texas Instruments is intended to be an estimate of product performance based upon history only. It does not imply that any performance levels reflected in such data can be met if the product is operated outside the conditions expressly stated in the latest published data sheet for a device.*

**Integrated Circuit Reliability In End Applications - External Factors**

The operating life of an integrated circuit, also referred to as mean-time-between-failures (MTBF), is affected by multiple factors. The operating environment affects package-related reliability due to factors such as moisture induced corrosion or thermomechanical stress. Materials used in the encapsulation of microcircuits, such as mold compounds, might not be suitable for use at extended temperatures. End users often make the mistake of assuming that life test results alone are sufficient to estimate field reliability. Overall, most high complexity microcircuits exhibit aggregate life test failure rates of significantly less than 50 FIT (Failures In Time, or failures per billion device hours) when calculated to a derated operating temperature of 55°C using a 60% confidence level. This would lead the user to assume a very long device life. However, the conditions found during life test (constant high temperature, low or no ambient humidity, and continuous device operation) do not address the humidity induced or temperature cycling induced failure mechanisms found in non-benign environments. These are addressed by Highly Accelerated Stress Testing (HAST) or Biased Humidity Testing (85°C/85%RH), Autoclave, and Temperature Cycling/Thermal Shock. Therefore, life test data must be supplemented with other environmental test data in determining how a plastic encapsulated device will function in a given environment.
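To make the FIT figure concrete, the following is a minimal sketch of how a life test result is commonly converted to a failure rate at a derated temperature. It assumes an Arrhenius acceleration factor and a Poisson (chi-square equivalent) upper bound at 60% confidence, which is a common industry convention and not necessarily the method used for the figures quoted above. The sample size, the zero-failure result, the 125°C stress temperature, and the 0.7 eV activation energy are illustrative assumptions, not data from this paper.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann's constant in eV/K


def arrhenius_acceleration(ea_ev, use_temp_c, stress_temp_c):
    """Acceleration factor of the stress temperature relative to the use temperature."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / use_k - 1.0 / stress_k))


def poisson_cdf(failures, mean):
    """Probability of observing `failures` or fewer events for a Poisson mean."""
    term = math.exp(-mean)
    total = term
    for i in range(1, failures + 1):
        term *= mean / i
        total += term
    return total


def upper_bound_failures(failures, confidence):
    """Poisson mean whose chance of producing <= `failures` events is 1 - confidence."""
    low, high = 0.0, 1.0
    while poisson_cdf(failures, high) > 1.0 - confidence:
        high *= 2.0
    for _ in range(200):  # bisection down to floating-point resolution
        mid = (low + high) / 2.0
        if poisson_cdf(failures, mid) > 1.0 - confidence:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def fit_rate(failures, devices, hours, ea_ev, use_temp_c, stress_temp_c, confidence=0.60):
    """Upper-bound failure rate in FIT (failures per 1e9 device hours) at the use temperature."""
    equivalent_hours = devices * hours * arrhenius_acceleration(ea_ev, use_temp_c, stress_temp_c)
    return upper_bound_failures(failures, confidence) / equivalent_hours * 1e9


# Hypothetical life test: 300 devices, 1,000 hours at 125 C, zero failures, 0.7 eV assumed.
example_fit = fit_rate(failures=0, devices=300, hours=1000.0,
                       ea_ev=0.7, use_temp_c=55.0, stress_temp_c=125.0)
print(f"Estimated upper-bound failure rate: {example_fit:.1f} FIT")
```

The result is only as good as the assumed activation energy and covers only the mechanisms that the life test actually accelerates, which is the limitation the paragraph above describes.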

To further complicate matters, life test itself is typically performed at 125°C/1,000-hour equivalent under nominal operating conditions rather than worst case recommended operating conditions. Results of life test are therefore more representative of typical use of a device with environmental conditions excluded and not continuous operation at maximum conditions. For purposes of this discussion, "maximum recommended operating conditions" is the term used to define the parameters for a given device under which that device is warranted to operate. "Absolute maximum ratings" are the conditions to which a device can be exposed for a short period without damage. Continued use of a device at absolute maximum ratings typically will damage the device.

**Package Materials Considerations**

One item often overlooked when attempting to uprate or upscreen plastic encapsulated microcircuits is the physical properties of the encapsulation (or mold compound) material itself. The type of encapsulant used is dictated by multiple factors. These include the flow properties during the mold process, cure characteristics, thermomechanical stress of the die, high temperature performance and moisture performance. Tradeoffs must be considered since a single encapsulant cannot meet all requirements. In some cases a mold compound is selected for optimal performance and only the commercial or industrial temperature ranges are considered.

Encapsulants consist of multiple components including resins, fillers, flame-retardants, and mold release agents. Of particular interest with respect to device reliability are the flame-retardants. Brominated epoxies will release bromine when heated. Bromine has the effect of accelerating intermetallic formation between the gold bond wires and the aluminum die bond pads (Kirkendall voiding or “purple plague”). Texas Instruments performed a study of Au/Al intermetallic life versus various package encapsulants used for high pin count packages. The results are shown in the following table.

<table>
<thead>
<tr>
<th>Package</th>
<th>Mold Compound</th>
<th>Chemistry</th>
<th>150°C</th>
<th>140°C</th>
<th>135°C</th>
<th>115°C</th>
<th>105°C</th>
<th>Activation Energy</th>
</tr>
</thead>
<tbody>
<tr>
<td>PQFP</td>
<td>A</td>
<td>Multi-Func</td>
<td>&gt; 7.1</td>
<td>&gt; 25</td>
<td>&gt; 190</td>
<td>&gt; 783</td>
<td>&gt; 3525</td>
<td>&gt; 1.90 eV</td>
</tr>
<tr>
<td>TQFP</td>
<td>B</td>
<td>Biphenyl</td>
<td>0.3</td>
<td>0.5</td>
<td>1.7</td>
<td>4.1</td>
<td>9.5</td>
<td>1.12 eV</td>
</tr>
<tr>
<td>TQFP</td>
<td>C</td>
<td>Biphenyl</td>
<td>0.4</td>
<td>0.8</td>
<td>2.5</td>
<td>5.6</td>
<td>12.7</td>
<td>1.04 eV</td>
</tr>
<tr>
<td>TQFP</td>
<td>D</td>
<td>Biphenyl</td>
<td>0.6</td>
<td>1.2</td>
<td>3.7</td>
<td>8</td>
<td>18.4</td>
<td>1.04 eV</td>
</tr>
<tr>
<td>PBGA</td>
<td>E</td>
<td>Multi-Func</td>
<td>5.2</td>
<td>16.5</td>
<td>104</td>
<td>407</td>
<td>1426</td>
<td>1.74 eV</td>
</tr>
</tbody>
</table>

As can be seen from this study, encapsulant chemistry has a very direct effect on component life. While all of these encapsulants yield acceptable performance over the commercial temperature range, some exhibit a very high failure rate when exposed to higher temperatures. Some encapsulants have a glass transition temperature \( T_G \) well under 125 °C. Exposure to temperatures in excess of the specified glass transition temperature accelerates the release of bromine, which in turn accelerates intermetallic formation. Note that in actual operation, the junction temperature of the die must be considered in addition to the ambient or case temperature, as the encapsulant is in contact with the die surface.
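As a rough cross-check of the table, the sketch below backs out an apparent activation energy from the 105°C and 150°C life values of two mold compounds, assuming a simple Arrhenius relationship between life and temperature. Because only a ratio of lives is used, the result does not depend on the time unit of the table. The two-point estimate is an illustration only; the activation energies listed in the table were presumably derived from the full data set, so small differences are expected.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann's constant in eV/K


def apparent_activation_energy_ev(life_cool, temp_cool_c, life_hot, temp_hot_c):
    """Two-point Arrhenius estimate of activation energy from lives at two temperatures."""
    inv_cool = 1.0 / (temp_cool_c + 273.15)
    inv_hot = 1.0 / (temp_hot_c + 273.15)
    return BOLTZMANN_EV_PER_K * math.log(life_cool / life_hot) / (inv_cool - inv_hot)


# (life at 105 C, life at 150 C, activation energy listed in the table)
TABLE_ROWS = {
    "B (biphenyl)": (9.5, 0.3, 1.12),
    "E (multi-functional)": (1426.0, 5.2, 1.74),
}

for compound, (life_105, life_150, listed_ea) in TABLE_ROWS.items():
    estimate = apparent_activation_energy_ev(life_105, 105.0, life_150, 150.0)
    print(f"Mold compound {compound}: two-point estimate {estimate:.2f} eV, "
          f"table lists {listed_ea:.2f} eV")
```

The higher the activation energy, the more steeply life falls as temperature rises, which is why two compounds that both perform acceptably over the commercial range can diverge sharply at extended temperatures.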

### Integrated Circuit Die Level Failure Mechanisms

If the environmental failure mechanisms are ignored, then the discussion can focus on microcircuit die wear-out mechanisms. Assuming no extrinsic reliability defects, the failure mechanisms are typically gate-oxide wear-out and electromigration. The actual MTBF varies among device families and sometimes from one device type to another in the same family. This is due to the technology/process node (defined in part by feature size, oxide thickness, and core operating voltage), the actual design of the device (including current density), and the wafer fabrication process. Failures due to wear-out might or might not be catastrophic with respect to the application. In cases involving oxide wear-out, the failure is relatively sudden as the oxide breaks down. For electromigration, the device might fail suddenly if a metal line opens. More typically the failure is due to resistance increases in contacts or vias, and the device might exhibit a more gradual failure. In this case the internal circuitry of the device subjected to worst-case stress would fail to source or sink enough current to maintain internal device AC timing. This mechanism could exhibit itself as an intermittent problem in the system due to timing glitches as the device timing slowly degrades until the device no longer meets the AC timing required to operate in the system. It is important to note that current generations of leading edge components such as digital signal processors have much tighter timing requirements than past generations. Therefore these devices are much more susceptible to internal timing degradation or arbitrary derating of AC parameters.

### Integrated Circuit Die Level Design For Reliability

IC manufacturers design devices in accordance with a formal set of design rules. These design rules are computer verified at numerous points during die design, layout, and photomask generation. While the specifics of these design rules are trade secret, some generalities are applicable to any manufacturer.

A design model is developed for a specific technology node. A device reliability model is developed that is driven by predetermined reliability goals. These goals are typically set to meet the expectations of the commercial end customers. This model defines a specific set of maximum design conditions required to meet the reliability goals. The various components of the model, such as gate-oxide wear-out or electromigration factors, are determined empirically.

A reliability model for electromigration, for example, would be developed for a specific metal system. The model would specify a maximum current density at a given junction temperature in the various metal lines, vias, and contacts. Adherence to the model results in an acceptable wear-out failure rate over the life of the product. The model would be used in conjunction with other models to establish limits for the layout of the device with respect to current density. As previously stated, the actual factors used in these models are determined based upon the technology node and are proprietary.

If the device is designed well inside the model limits, then the overall product life might reasonably be expected to meet or exceed the model when derated for use beyond the IC manufacturer’s recommended operating conditions. If the design is near the limits of the model, then the device would be expected to only meet worst case model reliability when used within the manufacturer’s recommended operating conditions. An example of such a design would be where the device is designed with the smallest geometry possible to maximize speed. Arbitrary derating of such a device will result in extreme degradation of operating life.

**Example Calculations for Electromigration Current Density Reliability**

For the purposes of creating an example, assume an electromigration reliability goal of less than 0.5% or 1.0% cumulative failures during 10 power-on years at a maximum junction temperature of 105 °C. A maximum current density specification, \( J_D \), is established for the metal system in question. For this example, a maximum current density of \( 5.0 \times 10^5 \) A/cm\(^2\) is used. To simplify calculations, only DC current density in metal lines will be considered.
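For a sense of scale, the reliability goal can be restated as an average failure rate. The sketch below simply divides the allowed cumulative failure fraction by the power-on hours, assuming 8,760 hours per year. This is a simplification: wear-out failures are not spread evenly over the product life, so the figure is a lifetime average and not an instantaneous rate.

```python
HOURS_PER_YEAR = 8760  # assumption: continuous power-on, non-leap year


def average_fit(cumulative_fraction_failed, power_on_years):
    """Lifetime-average failure rate in FIT (failures per 1e9 device hours)."""
    device_hours = power_on_years * HOURS_PER_YEAR
    return cumulative_fraction_failed / device_hours * 1e9


for fraction in (0.005, 0.01):
    print(f"{fraction:.1%} cumulative failures in 10 years "
          f"is about {average_fit(fraction, 10):.0f} FIT on average")
```

Under these assumptions, the 0.5% goal works out to an average on the order of 50 FIT.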

Current density is calculated from the following equation:

\[
J_D = J_S \times \left[ \frac{t_{50} \times e^{\left(\frac{E_A}{k T_D}\right)}}{t_{pl} \times e^{\left(-Z_p S\right)} \times e^{\left(\frac{E_A}{k T_S}\right)}} \right]^{\frac{1}{N}}
\]

Where:
- \( J_D \) = Design Current Density (A/cm\(^2\))
- \( J_S \) = Stress Current Density (A/cm\(^2\))
- \( T_D \) = Junction Temperature (K)
- \( T_S \) = Stress Temperature (K)
- \( E_A \) = Activation Energy (eV)
- \( k \) = Boltzmann's Constant (eV/K)
- \( N \) = Exponent in MTF Equation
- \( t_{50} \) = Median Time To Fail (Hours)
- \( t_{pl} \) = Product Lifetime (Hours)
- \( S \) = Log-normal Shape Parameter
- \( Z_p \) = Standard normal deviate for the allowed cumulative failure fraction: -2.33 for 1.0%, -2.58 for 0.5%
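One way to read the symbols defined above is as a Black's-equation extrapolation from stress conditions to design conditions, with a log-normal failure distribution used to convert the median life into the time at which the allowed fraction of devices has failed. The sketch below follows that reading. It is an interpretation of the definitions, not a confirmed implementation of any manufacturer's design rule, and every numeric input (stress current density, stress temperature, median life, shape parameter, activation energy, exponent) is a hypothetical value chosen only for illustration.

```python
import math
from statistics import NormalDist

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann's constant in eV/K


def design_current_density(j_stress, t50_stress_h, temp_stress_k, temp_design_k,
                           lifetime_h, ea_ev, n, sigma, fail_fraction):
    """Current density at which `fail_fraction` of devices fail at `lifetime_h`."""
    # Log-normal distribution: time to the p-th percentile = t50 * exp(Z_p * sigma),
    # where Z_p is negative for small failure fractions.
    z_p = NormalDist().inv_cdf(fail_fraction)
    percentile_factor = math.exp(z_p * sigma)
    # Arrhenius term: life is longer at the cooler design temperature.
    thermal_factor = math.exp(
        ea_ev / BOLTZMANN_EV_PER_K * (1.0 / temp_design_k - 1.0 / temp_stress_k))
    ratio = t50_stress_h * percentile_factor * thermal_factor / lifetime_h
    return j_stress * ratio ** (1.0 / n)


# Hypothetical stress test: median life of 500 h at 2.0e6 A/cm^2 and 200 C (473.15 K).
# Goal: 0.5% cumulative failures after 10 power-on years (87,600 h) at 105 C (378.15 K).
j_design = design_current_density(
    j_stress=2.0e6, t50_stress_h=500.0, temp_stress_k=473.15, temp_design_k=378.15,
    lifetime_h=87600.0, ea_ev=0.9, n=2.0, sigma=0.5, fail_fraction=0.005)
print(f"Design current density: {j_design:.2e} A/cm^2")
```

Tightening the allowed failure fraction or widening the shape parameter lowers the permitted current density, which is why designs laid out close to the model limit leave little room for later derating.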

The Median Time To Failure, \( t_{50} \), has been observed to fit the following empirical calculation:

\[
t_{50} = K \times J^{-N} \times e^{\left(\frac{E_A}{kT}\right)}
\]

Where:
- \( K \) = Proportional Constant
- \( E_A \) = Activation Energy
- \( J \) = Current Density
- \( k \) = Boltzmann's Constant
- \( T \) = Stress Temperature

Substituting empirically derived values for the variables of interest generates the following failure rate table:

## Product Lifetime Vs Temperature and DC Current Density

<table>
<thead>
<tr>
<th>Junction Temperature °C</th>
<th>Current Density A/cm²</th>
<th>Product Lifetime In Years</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td>50 FIT 0.5 % fail</td>
</tr>
<tr>
<td>85</td>
<td>5.0 x 10⁵</td>
<td>47.0</td>
</tr>
<tr>
<td>95</td>
<td>5.0 x 10⁵</td>
<td>21.3</td>
</tr>
<tr>
<td>105</td>
<td>5.0 x 10⁵</td>
<td>* 10.0 *</td>
</tr>
<tr>
<td>115</td>
<td>5.0 x 10⁵</td>
<td>4.9</td>
</tr>
<tr>
<td>125</td>
<td>5.0 x 10⁵</td>
<td>2.5</td>
</tr>
<tr>
<td>135</td>
<td>5.0 x 10⁵</td>
<td>1.3</td>
</tr>
<tr>
<td>145</td>
<td>5.0 x 10⁵</td>
<td>0.7</td>
</tr>
<tr>
<td>85</td>
<td>2.0 x 10⁵</td>
<td>293.5</td>
</tr>
<tr>
<td>95</td>
<td>2.0 x 10⁵</td>
<td>132.8</td>
</tr>
<tr>
<td>105</td>
<td>2.0 x 10⁵</td>
<td>62.7</td>
</tr>
<tr>
<td>115</td>
<td>2.0 x 10⁵</td>
<td>30.8</td>
</tr>
<tr>
<td>125</td>
<td>2.0 x 10⁵</td>
<td>15.6</td>
</tr>
<tr>
<td>135</td>
<td>2.0 x 10⁵</td>
<td>8.2</td>
</tr>
<tr>
<td>145</td>
<td>2.0 x 10⁵</td>
<td>4.5</td>
</tr>
</tbody>
</table>

Note that for the example shown, a 30°C increase of junction temperature from 105°C to 135°C at a current density of 5.0 x 10⁵ A/cm² reduces product life from 10.0 years to 1.3 years. For a design typical of low-complexity mature products with much more margin, the same increase in temperature might not be an issue.
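The trend in the table can be approximated by scaling the 10-year design point with a Black's-equation relationship, in which life varies with current density to the power of minus N and with an Arrhenius temperature term. The sketch below does this. The activation energy of 0.9 eV and the exponent of 2 are not stated in this paper; they are assumed values chosen because they roughly reproduce the table, and they should not be taken as the actual model parameters.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann's constant in eV/K

# Design point taken from the table: 10 years at 105 C and 5.0e5 A/cm^2.
REF_TEMP_C = 105.0
REF_CURRENT_DENSITY = 5.0e5
REF_LIFETIME_YEARS = 10.0

# Assumed values, back-fitted to the table for illustration only.
ASSUMED_EA_EV = 0.9
ASSUMED_N = 2.0


def lifetime_years(junction_temp_c, current_density,
                   ea_ev=ASSUMED_EA_EV, n=ASSUMED_N):
    """Scale the reference lifetime to another temperature and current density."""
    temp_k = junction_temp_c + 273.15
    ref_temp_k = REF_TEMP_C + 273.15
    thermal = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / temp_k - 1.0 / ref_temp_k))
    current = (REF_CURRENT_DENSITY / current_density) ** n
    return REF_LIFETIME_YEARS * thermal * current


for density in (5.0e5, 2.0e5):
    for temp_c in (85, 105, 125, 145):
        print(f"{temp_c:>4} C, {density:.1e} A/cm^2: "
              f"{lifetime_years(temp_c, density):6.1f} years")
```

The same scaling shows why margin matters: under these assumptions, reducing current density by a factor of 2.5 buys roughly a sixfold increase in life, which is more than enough to absorb a 20°C rise in junction temperature.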

### Application Integrated Circuit Die Level Reliability

In actual applications, the silicon life expectancy of a device is affected by a number of conditions. DC power, AC power, duty cycle, and junction temperature all come into effect. Use of the device above or below data sheet recommended operating conditions will affect MTBF. The Arrhenius model is often used for thermal derating calculations and the McPherson and/or Berkeley models for gate-oxide wear-out calculations. However, none of these models is accurate for derating complex devices without a thorough understanding of the device design and process node. The fact that a device will function at temperatures above those specified in the manufacturer's data sheet does not mean that the device will function reliably.

For example, operating a typical high complexity integrated circuit such as a microprocessor or digital signal processor at lower speeds at a given temperature will decrease power consumption thereby reducing electromigration. Due to the complexity of these types of devices, very detailed information regarding the device design is needed to establish the derating factors. For example, reduction of operating frequency or power consumption does not significantly affect gate-oxide wear-out since this mechanism is related to gate-oxide thickness and the field potential across the oxide. As previously noted, however, most IC manufacturers do not release design details or derating factors so it is not possible for the end user to accurately determine reliable operating conditions outside of datasheet limits or calculate expected MTBF under reduced operating conditions. If the application has frequent power-up and power-down cycles, tertiary effects such as cumulative damage by supply line transients must also be considered.

### Conclusion and Summary

In summary, ensuring microcircuit component reliability is a multifaceted discipline that requires involvement of diverse technologists. Factors affecting leading-edge component reliability extend well beyond the tried and proven approaches used by military OEMs in the past. Proper derating of a highly complex device is beyond the scope of most end users as the semiconductor manufacturers deem the information necessary to do so a trade secret. If a user elects to operate a device beyond the device manufacturer’s datasheet conditions, product reliability will be degraded, possibly to the point where end system reliability no longer meets acceptable limits. Furthermore, depending on the failure mechanisms involved, system operation might become erratic due to gradual deterioration of a component prior to total loss of functionality.

Due to the multiple variations in field operating conditions, a component manufacturer can only base estimates of product life on models and the results of package and die level qualification. The end users must evaluate their application of the device and determine if a candidate device is suitable for use in that application. End users are advised to operate devices within the manufacturer's datasheet recommended operating conditions and select a device designed for the end-system environment.