# Testing the Tester: Lessons Learned During the Testing of a State-of-the-Art Commercial 14nm Processor Under Proton Irradiation

C. M. Szabo, A. R. Duncan, K. A. LaBel

Abstract— Testing of an Intel 14nm desktop processor was conducted under proton irradiation. We share lessons learned, demonstrating that complex devices beget further complex challenges requiring practical and theoretical investigative expertise to solve.

Index Terms—lessons learned, test methods, proton facilities, single event effects (SEE), state-of-the-art (SOTA), commercial, microprocessor

## I. INTRODUCTION

The purpose of this paper is to present lessons learned from troubleshooting during and after proton irradiation testing of an Intel "Skylake" family i5-6600K desktop processor. We consider a) test data from two proton facility visits, b) an interesting hard failure condition that was observed, and c) the events that influenced the overall testing approach. Analyzing these factors lets us convey how single event effect (SEE) testing of complex commercial devices, like the i5-6600K, may beget further complex challenges. Finally, we ask: with the modern product cycle moving so fast, how feasible is it for a flight project to select one of these parts?

## II. BACKGROUND

The NASA Electronic Parts and Packaging (NEPP) Program provides guidance to NASA regarding the selection and application of electronics technologies [1]. To derive and compile such guidance, NEPP remains involved in numerous research activities to understand the risks of using these electronic parts in the space environment, including the potential impact of radiation. Often, the program seeks to collaborate on these pursuits with mutually interested parties spanning government agencies, industry, and academia.

In late 2014, the space community at-large was impacted by the closure of the Indiana University Cyclotron Facility. The closure forced a sudden demand for beam access, while, unfortunately, little information existed about the supply and availability of substitute facilities. To the betterment of the community, a team of government and industry subject matter experts assembled to determine what options were available, targeting existing facilities and locating opportunities at medical proton facilities [2].

With a need established to visit candidate proton facility sites, the NEPP program invited us to participate in their shakedown test visits. Coincidentally, with a need for both beam time and general expertise with processor devices, the investigators of this study were eager to join them. The benefits were obvious: our general experiences using these facilities would positively impact us and the proton facility subject matter team with enhanced knowledge. Meanwhile, our involvement with SEE testing would help satisfy the space community's expanding interest in complex processor devices.

Motivation for this work is also drawn from the study of historic SEE testing performed on previous commercial microprocessor devices [3-7]. Going back to the 80386 (1992), 80486 (1996), and Pentium II (2002), we have taken note of that experience and have adapted our testing approaches as appropriate. Recent experimental data [8-10] (post-2012) has yielded promising results in total ionizing dose (TID) response and improved single event upset (SEU) device cross section results. We continue to investigate these parts to ascertain whether this trend continues.

## III. DEVICE UNDER TEST



Fig. 1. Skylake i5-6600K Desktop Processor, socketed, revealing the topside markings (left) and un-socketed, underside view, (right).

The DUT (Device Under Test) is a commercial desktop microprocessor manufactured by the Intel Corporation that was released to consumers in the third quarter of 2015. The device part number is BX80662I56600K, otherwise known as the Core<sup>TM</sup> i5-6600K [11], a member of the "Skylake" family of processors. This is a 3.5 GHz quad-core, single threaded (one thread per core) processor with 6 megabytes of shared cache and integrated GPU (Graphical Processing Unit). This device has a TDP (Thermal Design Power) of 91 Watt and maximum core operating temperature of 100 degrees Celsius.

Worth noting is its Turbo Boost [12] function, which enables computation in excess of the processor's specified base frequency of 3.5 GHz. Table 1 summarizes these ratings [13]:

TABLE I
BASE AND MAXIMUM FREQUENCY SPECIFICATIONS FOR I5-6600K

| # Cores | Base    | Turbo Boost |
|---------|---------|-------------|
| 4       | 3.5 GHz | 3.6 GHz     |
| 2       | 3.5 GHz | 3.8 GHz     |
| 1       | 3.5 GHz | 3.9 GHz     |

## IV. THE TEST SETUP

In order to perform irradiation testing of the DUT, we utilized a test platform based on a previous investigation [9] into TID (Total Ionizing Dose) response of the 5<sup>th</sup> Generation Intel "Broadwell" i3-5005u SoC (System-on-a-Chip), a processor aimed at laptop and portable platforms. The hardware and software, listed below, have thus been adapted to support a desktop platform chosen for this effort.

## Hardware:

- i5-6600K "Skylake" Processor
- ASUS Z170M-PLUS Motherboard
- Socket 1151 Processor Heat Sink with Fan
- Corsair Vengeance VPX 2133 DDR4 (Double Data Rate version 4) RAM (Random Access Memory)
- Samsung 850 Pro 512 Gigabyte SSD (Solid State Disk)
- Corsair RM750/RM750i 750 Watt Power Supply
- Portable 1920x1080 Digital Display
- Generic Universal Serial Bus (USB) Keyboard and Mouse
- KVM (Keyboard, Video and Mouse) over Ethernet Extender
  - Later replaced by separate USB over Ethernet extender, and
  - HDMI (High-Definition Multimedia Interface) over Ethernet extender
- Category 5/6/7 Ethernet Cabling (up to 150 feet)
- 120 Volt Electrical Extension Cords (50 and 100 feet)

## Software:

- Microsoft Windows Server 2012R2 [14] 64-bit OS (Operating System)
- HWiNFO64 [15] hardware reporting and monitoring tool
- Intel Optimized Linpack Library [16] executable binaries
- FurMark [17] graphical benchmark and stress tool
- Splinterware System Scheduler [18] automation tool
- Sysinternals PsTools [19]

The software installation, while unchanged from the previous study, was enhanced in the following manners: a) batch file controls were rewritten to improve test mode execution and clean-up, b) data logging sample rate via HWiNFO64 was increased, c) the Prime95 [20] benchmark tool was added, and d) all requisite software drivers for the motherboard features were installed. The power management setting within the OS remained set to "Balanced" and the sleep and screen-blanking timeouts remained set to "never".



Fig. 2. Test setup hardware

The minimum componentry required for a functioning computer (the microprocessor DUT, heat sink, fan, motherboard, RAM, power supply, and SSD) was assembled on a fixture to be operated in-situ. User inputs and control, and video output, were provided outside of the test chamber by way of Ethernet-based extenders. When the DUT was powered on, the system was booted into the Microsoft Windows operating system. Once the desktop interface was accessible, the operators could set into motion any of the test modes indicated in Table 2:

TABLE II
TEST MODES AND SUPPORTING SOFTWARE

| Mode      | HWiNFO64 | LINPACK | FurMark | Prime95 |
|-----------|----------|---------|---------|---------|
| Idle      | X        |         |         |         |
| Math      | X        | X       |         |         |
| Full      | X        | X       | X       |         |
| FurPrime* | X        |         | X       | X       |
| FurMark*  | X        |         | X       |         |

(*) indicates modes that were implemented later into the investigation
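The mode-to-software mapping in Table II can be captured as a small lookup structure, which is one way a launcher script might decide what to start for each mode. The sketch below is illustrative only: the names `TEST_MODES`, `active_software`, and `stresses_gpu` are hypothetical, and the authors' actual batch-file controls are not shown in the paper.

```python
# Hypothetical encoding of Table II: which tools run in each test mode.
# HWiNFO64 logs in every mode; names mirror the table, not the authors' scripts.
TEST_MODES = {
    "Idle":     {"HWiNFO64"},
    "Math":     {"HWiNFO64", "LINPACK"},
    "Full":     {"HWiNFO64", "LINPACK", "FurMark"},
    "FurPrime": {"HWiNFO64", "FurMark", "Prime95"},  # added later in the study
    "FurMark":  {"HWiNFO64", "FurMark"},             # added later in the study
}


def active_software(mode: str) -> list[str]:
    """Return the tools active in a mode, sorted for stable output."""
    return sorted(TEST_MODES[mode])


def stresses_gpu(mode: str) -> bool:
    """FurMark is the only GPU stress tool in the suite."""
    return "FurMark" in TEST_MODES[mode]


if __name__ == "__main__":
    for mode in TEST_MODES:
        print(f"{mode:9s} GPU stress: {stresses_gpu(mode)!s:5s} {active_software(mode)}")
```

Keeping the matrix in one place makes it easy to confirm, for example, that "Math" exercises the processor cores without loading the integrated GPU.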

The test fixture was aligned to allow the beam line to deliver protons at normal incidence to the underside (bottom) of the motherboard at the location of the DUT socket shown in Fig. 4. Once a test mode was entered, the operators exposed the DUT and recorded whether the system remained operational throughout the exposure, or encountered an upset condition (sudden reset, OS Crash, or unknown biased state). Operating parameters were logged from within the OS, and later, from the RM750i power supply, itself.

## V. FACILITIES UTILIZED

Initial test results were gathered at the TRIUMF [21] facility, located in Vancouver, BC, Canada, during November 2-3, 2015. The 105 MeV beam line was used for all test runs. Follow-up results were gathered at the Francis H. Burr Proton Therapy Center, Massachusetts General Hospital (MGH) [22], located in Boston, MA, USA, during the weekend of October 15-16, 2016. We used a 200 MeV beam line configuration for all test runs. Please note the nomenclature that will be used hereafter: "TRIUMF Board" refers to the motherboard that was first operated by our team at the TRIUMF facility, and "MGH Board" refers to the motherboard first operated at the MGH facility.





Fig. 3. Test setup, as installed at the TRIUMF test facility 105 MeV beam line (top), and at the MGH facility 200 MeV beam line (bottom).



Fig. 4. Intended beam exposure target at backside of motherboard, at the underside of the DUT. Approximate beam area of 1 square inch, indicated by orange box, was requested at each facility.

## VI. ANALYSIS OF TEST RESULTS AT TRIUMF

During our visit to TRIUMF, our team was able to conduct a limited number of test runs on a candidate DUT. Of the 32 total runs, the first 26 were conducted with the aforementioned idle, math, and full test modes shown in Table 2. The final 6 test runs deviated from these modes: an ad-hoc execution of Windows Memory Diagnostic Tool [23] (2 runs), and a modified "full" testing mode that executed solely on a single processing core (4 runs). Testing concluded when a hard failure was observed on our device.

TABLE III
SUMMARY OF TEST MODES AND RESULTS AT TRIUMF

| Mode     | Runs (1 core enabled) | Runs (4 cores enabled) | Upsets |
|----------|-----------------------|------------------------|--------|
| Idle     |                       | 5                      | 3      |
| Math     |                       | 9                      | 5      |
| Full     |                       | 12                     | 12     |
| Memtest* |                       | 2                      | 2      |
| Full     | 4                     |                        | 4      |

Total: 32 test runs, 26 upsets.
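The tallies in Table III can be cross-checked with a few lines of arithmetic. The following sketch transcribes the run and upset counts from the table and computes the fraction of runs that ended in an upset for each mode. The variable and function names are ours. Note that this fraction is not a cross section, because individual runs received different fluences, and the counts are small.

```python
# Run and upset counts transcribed from Table III (TRIUMF, 105 MeV protons).
# Each entry: (mode, cores enabled, runs, upsets).
TRIUMF_RUNS = [
    ("Idle",    4,  5,  3),
    ("Math",    4,  9,  5),
    ("Full",    4, 12, 12),
    ("Memtest", 4,  2,  2),
    ("Full",    1,  4,  4),
]


def upset_fraction(runs: int, upsets: int) -> float:
    """Fraction of runs that ended in an upset."""
    return upsets / runs


total_runs = sum(runs for _, _, runs, _ in TRIUMF_RUNS)
total_upsets = sum(upsets for _, _, _, upsets in TRIUMF_RUNS)

if __name__ == "__main__":
    for mode, cores, runs, upsets in TRIUMF_RUNS:
        frac = upset_fraction(runs, upsets)
        print(f"{mode:8s} {cores} core(s): {upsets}/{runs} runs upset ({frac:.0%})")
    print(f"Total: {total_upsets}/{total_runs} runs upset")
```

Every "Full" run ended in an upset regardless of core count, whereas the idle and math modes survived a portion of their exposures.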



Fig. 5. Event cross sections for tests performed on the DUT at TRIUMF, as represented by Duncan, et al. [10].

Unable to revive the DUT at the facility, we reasoned that the DUT might anneal after a few days and become operational after return shipment to our test lab. However, upon receipt, the DUT remained inoperable in the original test platform. Moreover, the DUT would not operate whether mounted to the originally irradiated motherboard or to a fresh, unused motherboard.

After some troubleshooting, we were able to make the test setup become operational once again. Table 4 depicts our revival measures:

TABLE IV
TROUBLESHOOTING CONDITIONS FOR DUT HARD FAILURE

| DUT | Motherboard | PCIe VGA | Result                |
|-----|-------------|----------|-----------------------|
| I   | I           | No       | Power, no BIOS screen |
| I   | I           | Yes      | Power + BIOS screen   |
| N   | I           | No       | Power + BIOS screen   |
| N   | I           | Yes      | Power + BIOS screen   |
| I   | N           | No       | Power, no BIOS screen |
| I   | N           | Yes      | Power + BIOS screen   |

For DUT and Motherboard, I = Irradiated Device; N = Non-irradiated Device. All conditions responded to power switch.

With the addition of a discrete PCIe (Peripheral Component Interconnect Express) graphics adapter, we were able to interact with the test setup. This indicated, at least, that the DUT's integrated GPU may have been irreparably harmed during proton irradiation.

## VII. ANALYSIS OF TEST RESULTS AT MGH

After a period of almost one year, follow-up testing took place at MGH. Having obtained more testing time, we sought to evaluate our fresh DUT and motherboard with an enhanced set of testing conditions. The previously used test suite would remain unchanged: "idle", "math", and "full" with all four processing cores active. For this visit, however, we included new test modes: "FurPrime", a combination of "Prime95" + FurMark stress, and, "Fur", with the intent to observe GPU-only stress behavior. These modes were repeated with the DUT performing computations on one, two, and all four processing cores, as shown in Table 5:

TABLE V
SUMMARY OF FIRST DAY TESTING AT MGH

| Mode     | 1 core | 2 cores | 4 cores |
|----------|--------|---------|---------|
| Idle     | X      | X       | X       |
| Math     | X      | X       | X       |
| Full     | X      | X       | X       |
| FurPrime | X      |         | X       |
| Fur      | X      |         | X       |

Columns give the number of processor cores enabled. X = 5 test runs (65 total).



Fig. 6. Upset cross sections for tests performed on DUT at MGH.
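Cross sections such as those plotted in Fig. 6 are conventionally computed as the number of observed events divided by the delivered fluence. The sketch below applies that standard definition to the "1 Core Idle" row reported in Table VIII (23 recorded error events, 1.41E+10 protons/cm^2, 6.05E+07 protons/cm^2/s). It is an illustration under our own assumptions: the function names are hypothetical, and the error events logged in Table VIII are not necessarily the same events counted as upsets in Fig. 6.

```python
def cross_section_cm2(events: int, fluence_per_cm2: float) -> float:
    """Event cross section: observed events divided by delivered fluence."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence_per_cm2


def beam_time_s(fluence_per_cm2: float, avg_flux_per_cm2_s: float) -> float:
    """Approximate exposure time implied by total fluence and average flux."""
    if avg_flux_per_cm2_s <= 0:
        raise ValueError("flux must be positive")
    return fluence_per_cm2 / avg_flux_per_cm2_s


if __name__ == "__main__":
    # "1 Core Idle" row of Table VIII.
    events, fluence, flux = 23, 1.41e10, 6.05e7
    print(f"cross section: {cross_section_cm2(events, fluence):.2e} cm^2")
    print(f"beam time:     {beam_time_s(fluence, flux):.0f} s")
```

With so few events per mode, any such estimate carries a large statistical uncertainty and should be read as an order-of-magnitude figure.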

Though intending to recreate the conditions that led to failure at TRIUMF, we were unable to reproduce a single hard failure incident after 65 test runs. Stunned by this outcome, we returned the following day to exercise the DUT on the motherboard that had been involved in the original hard failure incident. As Table 6 reveals, we proceeded to cause not one, not two, but three DUTs to fail, and quickly.

TABLE VI
SUMMARY OF SECOND DAY TESTING AT MGH

| Run | DUT | Test Mode | # Cores | Beam Time (s) | Hard Failure |
|-----|-----|--------------|---------|------------------|-----------------|
| 1   | 1   | full         | 1       | 0.51             |                 |
| 2   | 2   | full         | 1       | 9.1              |                 |
| 3   | 2   | full         | 1       | 5.47             | X               |
| 4   | 1   | full         | 1       | 6.07             | X               |
| 5   | 3   | full         | 1       | 1.95             |                 |
| 6   | 3   | full         | 1       | 1.09             |                 |
| 7   | 3   | full         | 1       | 3.4              | X               |
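To put "quickly" in numbers, the beam times in Table VI can be summed per device up to the run in which it failed. The sketch below does this with values transcribed from the table; the names are ours. It counts second-day beam time only, so it understates the total exposure of any device that was also irradiated on the first day.

```python
from collections import defaultdict

# (run, dut, beam_time_s, hard_failure) transcribed from Table VI.
SECOND_DAY_RUNS = [
    (1, 1, 0.51, False),
    (2, 2, 9.10, False),
    (3, 2, 5.47, True),
    (4, 1, 6.07, True),
    (5, 3, 1.95, False),
    (6, 3, 1.09, False),
    (7, 3, 3.40, True),
]


def beam_time_to_failure(runs):
    """Sum beam time per DUT up to and including its failing run."""
    totals = defaultdict(float)
    failed = set()
    for _, dut, seconds, hard_failure in runs:
        if dut in failed:
            continue  # ignore any runs logged after a device has failed
        totals[dut] += seconds
        if hard_failure:
            failed.add(dut)
    return {dut: round(total, 2) for dut, total in totals.items() if dut in failed}


if __name__ == "__main__":
    for dut, seconds in beam_time_to_failure(SECOND_DAY_RUNS).items():
        print(f"DUT{dut}: hard failure after {seconds} s of second-day beam")
```

Each device failed within roughly 15 seconds of cumulative second-day beam time on the TRIUMF Board.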

## VIII. HARD FAILURE EVENT REVISITED

Upon completion of our testing at MGH, data analysis revealed peculiar differences in the behavior of our devices among different motherboards. The following set of figures (Figs. 7-11) will serve to illustrate these findings. For clarity, each figure is truncated to an elapsed time of 37 seconds, matching the amount of parametric logging that was collected prior to the failure of DUT1, which was Run #4 from Table 6.



Fig. 7. Comparison of power supply current readings recorded during execution of the "Full" test condition at MGH. Peak current on the TRIUMF Board corresponds to 102 Watt.



Fig. 8. Voltage Identifier (VID) levels requested by processor and GPU cores during test runs compared in Fig. 7.



Fig. 9. The actual voltage levels being provided by the motherboard to the DUT, showing widely divergent behavior between data sets.



Fig. 10. Processor core digital thermal sensor (DTS) temperature readings during the test runs shown in Fig. 6.



Fig. 11. Operating frequencies observed during "Full" test conditions.

One difference was glaring: the motherboards, having been acquired months apart from one another, featured different BIOS revisions. (Recall, our TRIUMF visit occurred in November of 2015, and the MGH visit did not occur until October of 2016). With the settings normalized between motherboards, the older BIOS on the "TRIUMF Board" appears to have driven the DUT with aggressive parameters, enabling high power consumption, high operating frequency, and elevated thermal emission. The MGH board, by contrast, treated the DUT rather conservatively.

While Fig. 7 certainly appears to depict a single event latch-up (SEL) situation, we were unable to reproduce the failure condition in the absence of proton irradiation. At best, repeat measurements, sans radiation, continue to exhibit the same experimentally observed effects (Fig. 12). During this test, the "Control board" featured the latest BIOS revision available as of this writing, 3402, while the "TRIUMF board" remained at version 0219 and the "MGH board" remained at revision level 2002.



Fig. 12. Revisited in September, 2017: Comparison of power supply current readings recorded during execution of the "Full" test condition. Peak current on the TRIUMF Board corresponds to 120 Watt, further in excess of the device's 91 Watt TDP. Peak power on the Control board, by contrast, is only 57 Watt.

Thus far, we can surmise that our findings indicate a role in which proton irradiation can contribute to device failure, but not without the DUT operating in excess of its rated TDP.
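The relationship between the quoted peak power figures and the 91 Watt TDP can be made concrete with simple arithmetic. The sketch below converts each peak power from the captions of Figs. 7, 12, and 13 into an equivalent current and a margin relative to TDP. It assumes a nominal 12 V rail, which is our assumption; the paper names the +12V rail only in the Fig. 13 caption. Power drawn at the supply also includes voltage regulator losses, so it is not strictly comparable to TDP.

```python
TDP_W = 91.0    # rated Thermal Design Power of the i5-6600K
RAIL_V = 12.0   # assumed nominal rail voltage (not stated for every figure)

# Peak power figures quoted in the captions of Figs. 7, 12 and 13.
PEAK_POWER_W = {
    "TRIUMF Board, MGH run (Fig. 7)": 102.0,
    "TRIUMF Board, Sept. 2017 (Fig. 12)": 120.0,
    "Control board, Sept. 2017 (Fig. 12)": 57.0,
    "TRIUMF Board, extreme test (Fig. 13)": 135.0,
}


def rail_current_a(power_w: float, rail_v: float = RAIL_V) -> float:
    """Current on the rail implied by a given power draw (I = P / V)."""
    return power_w / rail_v


def tdp_margin_pct(power_w: float, tdp_w: float = TDP_W) -> float:
    """Percentage above (positive) or below (negative) the rated TDP."""
    return 100.0 * (power_w - tdp_w) / tdp_w


if __name__ == "__main__":
    for label, watts in PEAK_POWER_W.items():
        print(f"{label:38s} {watts:5.0f} W  "
              f"{rail_current_a(watts):5.2f} A  {tdp_margin_pct(watts):+6.1f}% vs TDP")
```

Under these assumptions, the TRIUMF Board peaks sit above the rated TDP in every case, while the Control board stays well below it.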

Further stressing at an extreme level still could not induce another hard failure while using the TRIUMF motherboard. To emphasize "extreme": an insufficient (35 Watt) heat sink was affixed to the DUT and the "Full" test condition was modified to loop continuously for a period of ~18 hours. Figure 13 reveals no surprises in the distribution of current levels. Despite this stress, no system instability was apparent, given the Linpack result: all runs passed, and the integrated graphics continued to produce output as expected.



Fig. 13. Extreme duration "Full" test condition of TRIUMF Board and fresh DUT showing current levels drawn from +12V rail. Peak power reached 135 Watt.



Fig. 14. Voltage Identifier (VID) levels requested by processor and GPU cores during extreme test.



Fig. 15. The actual voltage levels being provided by the motherboard to the DUT.



Fig. 16. Extreme duration "Full" test condition of TRIUMF Board and fresh DUT showing current levels drawn from +12V rail.



Fig. 17. Operating frequencies during the extreme "Full" test.

Upon conclusion of the "extreme" test, instability did not appear until the motherboard BIOS was set to enable all processing cores and the system was forced to solve the Linpack problem with FurMark running full-time. At that point, Linpack results began to go awry, as shown in Table 7.

TABLE VII
IMPACT OF BIOS SETTINGS ON LINPACK SOLUTION DURING FULL TEST
SIMULATION UNDER NON-IRRADIATED CONDITIONS

| Time(s) | GFlops              | Residual | Residual(norm) |
|---------|---------------------|----------|----------------|
| 115.362 | 156.0461            | 6.53E-10 | 2.58E-02       |
| 115.662 | 155.6413            | 6.30E-10 | 2.48E-02       |
| 115.551 | 155.7915            | 5.05E-10 | 1.99E-02       |
| 115.509 | 155.8474            | 5.68E-10 | 2.24E-02       |
| 115.521 | 155.8315            | 7.03E-10 | 2.77E-02       |
| 115.682 | 155.6147            | 5.05E-10 | 1.99E-02       |
| 115.432 | 155.9512            | 5.05E-10 | 1.99E-02       |
| 115.549 | 155.793             | 7.03E-10 | 2.77E-02       |
| 115.777 | 155.4865            | 6.53E-10 | 2.58E-02       |
| 115.503 | 155.8555            | 5.05E-10 | 1.99E-02       |
| 115.764 | 155.504             | 5.89E-10 | 2.32E-02       |
| 115.662 | 155.641             | 7.03E-10 | 2.77E-02       |
| 115.755 | 155.5158            | 5.05E-10 | 1.99E-02       |
| 115.628 | 155.6867            | 6.53E-10 | 2.58E-02       |
| 115.727 | 155.5545            | 5.05E-10 | 1.99E-02       |
| 115.689 | 155.6054            | 5.05E-10 | 1.99E-02       |
| 115.555 | 155.785             | 5.05E-10 | 1.99E-02       |
| 115.668 | 155.6335            | 5.05E-10 | 1.99E-02       |
| 115.563 | 155.7752            | 5.05E-10 | 1.99E-02       |
| 115.749 | 155.525             | 6.30E-10 | 2.48E-02       |
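One way to read Table VII is to look at how much the residual varies between trials. Repeated Linpack trials of the same problem on a stable system are commonly expected to report the same residual; that expectation is our assumption here, not a statement from the paper. The sketch below counts the distinct residual values transcribed from the table, using names of our own choosing.

```python
from collections import Counter

# Residual column of Table VII, in row order (20 Linpack trials).
RESIDUALS = [
    6.53e-10, 6.30e-10, 5.05e-10, 5.68e-10, 7.03e-10,
    5.05e-10, 5.05e-10, 7.03e-10, 6.53e-10, 5.05e-10,
    5.89e-10, 7.03e-10, 5.05e-10, 6.53e-10, 5.05e-10,
    5.05e-10, 5.05e-10, 5.05e-10, 5.05e-10, 6.30e-10,
]


def residual_counts(residuals):
    """Return (value, count) pairs, most frequent first."""
    return Counter(residuals).most_common()


def is_consistent(residuals) -> bool:
    """True when every trial reported the same residual."""
    return len(set(residuals)) == 1


if __name__ == "__main__":
    print("consistent:", is_consistent(RESIDUALS))
    for value, count in residual_counts(RESIDUALS):
        print(f"  residual {value:.2e}: {count} trial(s)")
```

Half of the trials agree on a single residual and the rest scatter across five other values, which is the kind of run-to-run inconsistency that the text describes as results going awry.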

Analysis of the older data acquired at TRIUMF revealed what may have been well out-of-specification voltage levels for Input/Output (VCCIO) and System Agent (VCCSA). Nominally, they should be left at 1.05V and 0.95V, respectively. These readings were later confirmed to be an artifact of a flaw in the version of HWiNFO software that was used in 2015. As of September, 2017, the values were consistent and correct during all revisited testing.



Fig. 18. VCCIO and VCCSA readings as set on TRIUMF Board and fresh DUT.

Finally, despite the Vcore values being nearly 40% above the VID level being requested, these values remained within specification.

## IX. LESSONS LEARNED

### A. Newly Released Commercial Hardware Is Imperfect

#### 1) The Processor

Modern microprocessors have a feature, known as microcode, enabling the manufacturer to enact corrections or refine product behavior after release to market. Use of microcode became prevalent after Intel had its own "lesson learned" from a past product recall [24]. Updates to microcode are typically performed by way of the motherboard BIOS, or via the OS.

Throughout our testing and analysis, Intel has released many microcode updates for their "Skylake" product line. Of the many issues addressed, one problem that made headlines around the computer enthusiast community [25,26] definitely had the potential to negatively impact our test results during our visit to TRIUMF. Just prior to this writing, another issue was reported to have impacted multiple product families [27]. Luckily for us, we procured the i5-6600K, which lacks the afflicted capability.

Currently, the manufacturer [28] maintains a long listing of errata, some of which suggest workarounds, while others remain unsolved. Behavior of the DUT as a result of these errata certainly can masquerade as a single event functional interrupt (SEFI) or an SEU.

#### 2) The Motherboard

Motherboards require revisions to their BIOS code to refine their ability to accommodate the processors they support. The BIOS is important, as it is the interface that allows the system owner to control various operational parameters of the processor. Improper and/or unintended BIOS settings may irreparably harm a processor. Moreover, the BIOS also happens to be a delivery mechanism for refinements to onboard faculties such as: peripheral firmware, support for newly released processor devices within the same product family, and the processor microcode.

While a motherboard manufacturer likely has very intimate knowledge of the processors it intends to support, their BIOS implementations can suffer the same perils as the processor. Indeed, the motherboard manufacturer has the tough task of maintaining a satisfactory level of compatibility and capability with processors *and* a diverse array of SOTA peripherals, like RAM, PCIe graphics adapters, and so on.

The BIOS revision history of our test motherboard included a total of 12 such changes in the span of time between our TRIUMF (BIOS 0219) and MGH (BIOS 2002) tests. One phrase that appeared quite frequently alongside the manufacturer's online BIOS updates, was "Improve System Stability" [29]. As the investigators have learned, truer words could not have been said. Our figures revealed that imperfections in motherboard "stability" enabled operating conditions favorable to a current usage pattern that might appear quite similar to SEL. We just happened to discover what happens when introducing protons at the same time!

### B. Software Is Also Imperfect

Software based monitoring tools will have difficulties adapting to the features of new hardware. Sometimes, hardware vendors are simply unwilling to share details about their product's reporting capabilities; their disposition is that users should solely depend on their proprietary software. In other instances, a vendor may use a well-understood embedded controller on a motherboard, but implement it in a manner that is unusual or ineffective, leading to false readings.

Our early tests lacked power supply data logging capability because we could not adequately implement the vendor's proprietary tool with our experiment. Later, we noticed that we had been logging incorrect data about the VCCIO and VCCSA status of the motherboard.

Fortunately, these issues were overcome somewhat quickly. We believe this turnaround is a direct side effect of our product selection. Therefore, one may wish to scour enthusiast forums to learn what hardware products are popular, and choose components that are not obscure, for easier software support. On the software side, favor vendors or authors that exhibit a positive track record of upkeep and maintenance.

### C. Errata and Erroneous Errors

With a complex, commercial microprocessor device, testing every vector is practically impossible. Considering our limited time and resources, direct measurement at the transistor level remains unachievable at this time. Instead, we analyze whatever experimentally obtained error responses we can record in an attempt to understand general device behavior under irradiation.

We were fortunate at our second test visit that our device operated long enough to provide nearly 400 error responses. Making use of the DUT's Machine-Check Architecture [30] we decoded these responses to determine the error disposition (corrected, uncorrected) and the functional location where the error condition was raised. Table 8 provides a high level summary of these details on DUT1 (before we induced its failure) at MGH.
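As a rough illustration of what such decoding involves, the sketch below interprets a machine-check status word using the architectural layout of the IA32_MCi_STATUS register: bit 63 marks the entry as valid, bit 61 flags an uncorrected error, and bits 15:0 carry the MCA error code, where the pattern `000F 0001 RRRR TTLL` denotes a cache hierarchy error. The status values used here are constructed for illustration and are not taken from the authors' logs. The description strings are formatted to resemble the labels in Table VIII, but the correspondence between those labels and these codes is our assumption, as is the `decode_status` helper itself.

```python
VAL_BIT = 63  # entry contains valid information
UC_BIT = 61   # error was not corrected by hardware

# Sub-fields of the cache hierarchy error code (000F 0001 RRRR TTLL).
TRANSACTION = {0: "Instruction", 1: "Data", 2: "Generic"}
LEVEL = {0: "Level 0", 1: "Level 1", 2: "Level 2", 3: "Generic Level"}
REQUEST = {
    0: "Generic", 1: "Generic Read", 2: "Generic Write", 3: "Data Read",
    4: "Data Write", 5: "Instruction Fetch", 6: "Prefetch", 7: "Eviction",
    8: "Snoop",
}


def decode_status(status: int) -> dict:
    """Decode disposition and, for cache hierarchy errors, a description."""
    code = status & 0xFFFF
    result = {
        "valid": bool((status >> VAL_BIT) & 1),
        "disposition": "uncorrected" if (status >> UC_BIT) & 1 else "corrected",
        "mca_code": code,
        "description": None,  # left unset for non-cache error classes
    }
    if (code & 0xEF00) == 0x0100:  # cache hierarchy class; filter bit ignored
        level = LEVEL[code & 0x3]
        transaction = TRANSACTION.get((code >> 2) & 0x3, "Reserved")
        request = REQUEST.get((code >> 4) & 0xF, "Reserved")
        result["description"] = f"{transaction} Cache {level} {request} Error"
    return result


if __name__ == "__main__":
    # Hypothetical status words, built by hand for this example.
    for status in ((1 << 63) | (1 << 61) | 0x0174, (1 << 63) | 0x0136):
        print(hex(status), decode_status(status))
```

A production decoder would also need to handle the other error classes (bus, TLB, internal) and the model-specific fields, which this sketch deliberately leaves out.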

 $Table\ VIII$  High level summary of errors recorded during first day testing of DUT1 at MGH

| Tests         |             |                 | Error Ca    | togoni           |                    |                | Radiation        |             |
|---------------|-------------|-----------------|-------------|------------------|--------------------|----------------|------------------|-------------|
| rests         |             | ı               |             | - ,              |                    |                |                  |             |
|               | Corrected / | Uncorrectable / | Total Error | Most Frequent    | Most Frequent      | Total Fluence  | Avg. Flux        | TID         |
| Test Mode     | Recoverable | Unrecoverable   | Events      | Corrected /      | Uncorrectable /    | (protons/cm^2) | (protons/cm^2/s) | rad(Si) per |
|               | recoverable | Officcoverable  | Recorded    | Recoverable      | Unrecoverable      | per test mode  | per test mode    | test mode   |
|               |             |                 |             | Data Cache Level |                    |                |                  |             |
| A II 1 II     | 42          | 0               | 50          |                  | Data Cache Level 0 |                |                  |             |
| All Idle      | 43          | 9               | 52          | 2 Data Read      | Eviction Error     |                |                  |             |
|               |             |                 |             | Error            | Eviction En of     |                |                  |             |
|               |             |                 |             | Data Cache Level |                    |                |                  |             |
|               |             | _               |             |                  | Data Cache Level 0 |                |                  |             |
| 1 Core Idle   | 20          | 3               | 23          | 2 Data Read      | Eviction Error     | 1.41E+10       | 6.05E+07         | 8.14E+02    |
|               |             |                 |             | Error            | EVICTION ENO       |                |                  |             |
|               |             |                 |             | Data Cache Level |                    |                |                  |             |
| 20 111        | 2.2         |                 | 2.5         |                  | Data Cache Level 0 | 4 005 40       | 0.005.07         | 5 24 5 22   |
| 2 Core Idle   | 23          | 3               | 26          | 2 Data Read      | Eviction Error     | 1.09E+10       | 8.08E+07         | 6.31E+02    |
|               |             |                 |             | Error            | EVICTION ENO       |                |                  |             |
|               |             |                 |             |                  | Data Cache Level 0 |                |                  |             |
| 4 Core Idle   | 0           | 3               | 3           | N/A              |                    | 6.86E+09       | 5.57E+07         | 3.98E+02    |
|               |             |                 |             | ,                | Eviction Error     |                |                  |             |
|               |             |                 |             | L3 Explicit      | Data Cache Level 0 |                |                  |             |
| All Math      | 69          | 8               | 77          | Writeback Error  | Eviction Error     |                |                  |             |
| 1 Core Math | 69 | 2 | 71 | L3 Explicit Writeback Error | Data Cache Level 0 Eviction Error | 3.62E+10 | 6.32E+07 | 2.09E+03 |
| 2 Core Math | 0 | 3 | 3 | N/A | Data Cache Level 0 Eviction Error | 1.87E+10 | 8.95E+07 | 1.08E+03 |
| 4 Core Math | 0 | 3 | 3 | N/A | Data Cache Level 0 Eviction Error | 1.03E+10 | 5.35E+07 | 5.97E+02 |
| All Full | 137 | 8 | 145 | L3 Explicit Writeback Error | Data Cache Level 0 Eviction Error | | | |
| 1 Core Full | 75 | 2 | 77 | Data Cache Level 2 Data Read Error | Data Cache Level 0 Eviction Error; Video Scheduler Internal Error | 3.27E+10 | 6.26E+07 | 1.89E+03 |
| 2 Core Full | 39 | 3 | 42 | L3 Explicit Writeback Error | Video Scheduler Internal Error | 2.15E+10 | 7.64E+07 | 1.25E+03 |
| 4 Core Full | 23 | 3 | 26 | L3 Explicit Writeback Error | Data Cache Level 0 Eviction Error | 1.34E+10 | 5.89E+07 | 7.76E+02 |
| All FURPRIME | 45 | 7 | 52 | Data Cache Level 2 Data Read Error | Data Cache Level 0 Generic Read Error | | | |
| 1 Core FURPRIME | 45 | 3 | 48 | Data Cache Level 2 Data Read Error | Data Cache Level 0 Eviction Error | 2.20E+10 | 5.95E+07 | 1.27E+03 |
| 4 Core FURPRIME | 0 | 4 | 4 | N/A | Data Cache Level 0 Generic Read Error | 2.60E+09 | 6.30E+07 | 1.50E+02 |
| All FUR | 58 | 6 | 64 | Data Cache Level 2 Data Read Error | Data Cache Level 0 Eviction Error | | | |
| 1 Core FUR | 44 | 5 | 49 | Data Cache Level 2 Data Read Error | Data Cache Level 0 Eviction Error | 2.29E+10 | 5.93E+07 | 1.33E+03 |
| 4 Core FUR | 14 | 1 | 15 | Data Cache Level 2 Data Read Error | Video Scheduler Internal Error | 1.61E+10 | 6.33E+07 | 9.35E+02 |
| All Modes | 352 | 39 | 391 | | | 2.28E+11 | | 1.32E+04 |

Some interesting trends were identified:

- Every test condition generated unrecoverable errors, each resulting in an SEU; these errors were spread almost evenly among the tests
- Tests executed with one active core exhibited the highest corrected error counts and accumulated the highest total fluence
- The most prevalent unrecoverable errors were related to the Level 0 (micro-op) cache
- More active processing cores equate to a larger physical target, causing higher susceptibility to SEU
- Some correctable errors are threshold event indicators, even though they are counted like single-instance events; this kind of error masking makes cross section calculations impractical
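The trend relating active core count to susceptibility can be illustrated with a small calculation on the Math workload rows of the table above. The sketch below divides the uncorrected error count by the total fluence for each run; it assumes the fluence column is expressed in protons/cm², and the names `math_runs` and `events_per_fluence` are ours, introduced only for illustration. With only two or three events per run, the statistical uncertainty is large, and the result is a rough indicator rather than a cross section.

```python
# Uncorrected error counts and total fluence for the Math workload,
# copied from the results table. Fluence units are assumed to be protons/cm^2.
math_runs = {
    "1 Core Math": (2, 3.62e10),
    "2 Core Math": (3, 1.87e10),
    "4 Core Math": (3, 1.03e10),
}


def events_per_fluence(events, fluence):
    """Observed events per unit fluence (not a corrected cross section)."""
    return events / fluence


rates = {
    mode: events_per_fluence(events, fluence)
    for mode, (events, fluence) in math_runs.items()
}

for mode, rate in rates.items():
    print(f"{mode}: {rate:.2e} uncorrected errors per unit fluence")
```

Under these assumptions, the ratio increases with the number of active cores, which is consistent with the trend noted in the list. Corrected errors are left out on purpose, since threshold-type indicators can mask the true event count.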

Above all, we highly recommend saving all decoded status register details to a spreadsheet; doing so saves considerable time when analyzing data and preparing for future tests.
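As one possible way to follow this recommendation, the sketch below splits a raw 64-bit machine-check status value into fields and writes one spreadsheet-ready CSV row per logged error. The bit positions reflect our reading of the IA32_MCi_STATUS layout in [30] and should be verified against the manual for the part under test; the function names, column names, and the sample value are hypothetical and are not taken from our test software.

```python
import csv
import io

# Column order for the spreadsheet (names are illustrative assumptions).
FIELDNAMES = [
    "test_mode", "bank", "raw", "valid", "overflow", "uncorrected",
    "enabled", "context_corrupt", "threshold_status", "corrected_count",
    "model_specific_code", "mca_error_code",
]


def decode_mci_status(status):
    """Split a 64-bit status value into fields (layout assumed from [30])."""
    return {
        "raw": f"0x{status:016X}",
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "enabled": bool((status >> 60) & 1),
        "context_corrupt": bool((status >> 57) & 1),
        "threshold_status": (status >> 53) & 0x3,    # bits 54:53
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "model_specific_code": f"0x{(status >> 16) & 0xFFFF:04X}",
        "mca_error_code": f"0x{status & 0xFFFF:04X}",
    }


def log_records(records, fileobj):
    """Write (test_mode, bank, status) tuples as CSV rows; return the rows."""
    rows = [
        {"test_mode": mode, "bank": bank, **decode_mci_status(status)}
        for mode, bank, status in records
    ]
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return rows


# Hypothetical sample value, constructed here only to exercise the decoder.
sample_status = (1 << 63) | (1 << 60) | (5 << 38) | (0x0001 << 16) | 0x017A
buffer = io.StringIO()
rows = log_records([("1 Core Math", 2, sample_status)], buffer)
print(buffer.getvalue())
```

Keeping the raw value next to the decoded fields allows the rows to be re-decoded later if the field interpretation changes, and a separate corrected-count column helps flag entries that may represent more than one event.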

## D. Cable Selection Matters

Test facilities, by virtue of their physical layout, can exert their own impact on the testing experience. The most notable difference between the two sites was the distance from the test chamber to the user control area. At TRIUMF, we required only ~70 feet of cabling, whereas MGH required more than 100 feet.

Cable runs longer than 100 feet hampered our ability to relay video output from the motherboard. Prior to visiting MGH, a working combination of Category 5, 6, and 7 Ethernet cabling and HDMI extenders was identified through trial and error. The lesson here is to carefully evaluate any Ethernet-based extension devices and not to assume that higher-category Ethernet cables are the sole option for ensuring interoperability of the test setup.

## E. The Product Cycle May Be Moving Too Fast

As of this writing, Intel's latest microprocessor offerings [31] span four different families, all 14-nm, including 5<sup>th</sup> Generation "Broadwell" (late 2014), 6<sup>th</sup> Generation "Skylake" (late 2015), and 7<sup>th</sup> Generation "Kaby Lake" (late 2016), with parts based on the 10-nm process [32] being showcased for release in 2018! At this rate, investigators will have very little time (relative to the many years required to conceive a NASA flight project and bring it to launch) to establish truly significant reliability data.

By the time 10-nm processors arrive, how soon will the industry move on to 7-nm? What if the future microprocessor market yields parts whose radiation characteristics turn out to be worse than those of present parts?

NASA has flown other Intel microprocessors [33], but nothing more recent than the 80486. A rapidly changing source of supply and reliability data will not help to change this trend.

## X. CONCLUSION

We showed how different motherboards and BIOS implementations impacted the performance of a processor under proton irradiation (Figs. 7-17) and revealed that these differences did not appear to be sufficient to induce a hard failure in the absence of radiation. Yet, due to the new-to-market nature of our chosen DUT and the limited capability of early-revision hardware and software, retroactive testing (with updated hardware and software) was required to understand the mechanism of failure.

From this experience, we learned the importance of applying updates to newly released products to address unforeseen or unintentional errors and imperfections within their design. Therefore, we have to stress that the observed hard failure should not be used as an indictment of the performance of this part.

The findings presented here should serve as a lesson to: a) perform any and all BIOS updates, and b) consider allowing a period of time for market saturation of these commercial devices. Or, simply allow some extra time for end users and, perhaps, the enthusiast community, to get hold of these parts and torture *their* systems beyond manufacturer specification, before we conduct our testing.

**However** – with the product cycle changing as rapidly as it is, there may not be enough time to characterize the reliability of any modern commercial microprocessor before the part, its experimental data, or both become obsolete.

## XI. ACKNOWLEDGEMENTS

The authors would like to thank the NASA Electronic Parts and Packaging Program for its support and guidance, as well as the Naval Surface Warfare Center Crane. We also thank Martin Malik for implementing our requests into his feature-rich HWiNFO system monitoring tool.

#### REFERENCES

- [1] LaBel, K.A., "NASA Electronic Parts and Packaging (NEPP) A NASA Office of Safety and Mission Assurance (OSMA) Program", GSFC-E-DAA-TN44274, July, 2017, https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20170006861.pdf
- [2] LaBel, K.A., Turflinger, T., Haas, T., George, J., Moss, S., Davis, S., Kostic, A., Wie, B., Reed, R., Guertin, S., "Team Update on North American Proton Facilities for Radiation Testing", GSFC-E-DAA-TN33299, July, 2016, https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20160007685.pdf
- [3] K.A. LaBel, E.G. Stassinopoulos, G.J. Brucker, C.A. Stauffer, "SEU Tests of a 80386 Based Flight-Computer/Data-Handling System and Discrete PROM and EEPROM Devices, and SEL Tests of Discrete 80386, 80387, PROM, EEPROM and ASICS", Workshop Record for the 1992 IEEE Radiation Effects Data Workshop, pp 1-11, 1992
- [4] A. Moran, et al., "Single Event testing of the INTEL 80386 and the 80486 Microprocessor," IEEE Trans. Nucl. Sci., 43(3), pp. 879-885 (1996).
- [5] Kouba, C., Choi, G., "Test Report The Single Event Effect Characteristics of the 486-DX4 Microprocessor", NASA/CR-97-206025, December 30, 1996, https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19970040588.pdf
- [6] D. M. Hiemstra and A. Baril, "Single Event Upset Characterization of the Pentium MMX and Pentium II Microprocessors Using Proton Irradiation," IEEE Trans. Nucl. Sci., 46(6), pp. 1453-1460 (1999).

- [7] J. W. Howard Jr. et al., "Total Dose and Single Event Effects Testing of the Intel Pentium III (P3) and AMD K7 Microprocessors," 2001 IEEE Radiation Effects Data Workshop, pp.38-47.
- [8] Kenneth A. LaBel, Robert A. Gigliuto, Carl M. Szabo, Jr., Martin A. Carts, Matthew Kay, Timothy Sinclair, Matthew Gadlage, Adam Duncan, and Dave Ingalls, "Hardness Assurance for Total Dose and Dose Rate Testing of a State-Of-The-Art Off-Shore 32 nm CMOS Processor," https://nepp.nasa.gov/files/24951/NSREC2013\_LaBel\_W40L.pdf, Jul. 2013.
- [9] C. Szabo, A. R. Duncan, K. A. LaBel, M. J. Kay, P. Bruner, M. Krzesniak, L. Dong, "Preliminary Radiation Testing of a State-of-the-Art Commercial 14nm CMOS Processor / System-on-a-Chip," in Proc. 2015 IEEE Radiation Effects Data Workshop, Jul. 2015, pp. 1-8.
- [10] Duncan, A. R., Gadlage, M. J., Roach, A. H., Williams, A. M., Kay, M. J., Ingalls, J. D., Hedge, C. H., Bossev, D. P., Szabo, C. M., LaBel, K. A., "Single Event Effects in 14-nm Intel Microprocessors", in Proc. 2016 IEEE Radiation Effects Data Workshop, Jul. 2016, pp. 132-140.
- [11] Intel® ARK (Product Specs). (2017). Intel® Core™ i5-6600K Processor (6M Cache, up to 3.90 GHz) Product Specifications. [online] Available at: http://ark.intel.com/products/88191/Intel-Core-i5-6600K-Processor-6M-Cache-up-to-3\_90-GHz [Accessed 15 Sep. 2017].
- [12] Intel® Turbo Boost Technology 2.0. [online] Available at: https://www.intel.com/content/www/us/en/architecture-and-technology/turbo-boost/turbo-boost-technology.html [Accessed 15 Sep. 2017].
- [13] Intel® Turbo Boost Technology Frequency Tables for Intel® Core™ i5. [online] Available at: https://www.intel.com/content/www/us/en/support/processors/000005647.html [Accessed 15 Sep. 2017].
- [14] Windows Server 2012 R2 and Windows Server 2012. [online] Available at: https://technet.microsoft.com/en-us/library/hh801901(v=ws.11).aspx [Accessed 15 Sep. 2017].
- [15] Professional System Information and Diagnostics, HWiNFO32™ / HWiNFO64™ - Powerful system information tools for Windows. [online] Available at: http://www.hwinfo.com [Accessed 15 Sep. 2017].
- [16] Intel® Math Kernel Library LINPACK Download. [online] Available at: https://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download [Accessed 15 Sep. 2017].
- [17] FurMark: VGA Stress Test, Graphics Card and GPU Stability Test, Burn-in Test, OpenGL Benchmark and GPU Temperature [online] Available at: http://www.ozone3d.net/benchmarks/fur/ [Accessed 15 Sep. 2017].
- [18] Splinterware Scheduler [online] Available at: http://www.splinterware.com/products/wincron.htm [Accessed 15 Sep. 2017].
- [19] PsTools Windows Sysinternals | Microsoft Docs [online] Available at: https://technet.microsoft.com/en-us/sysinternals/bb896649.aspx [Accessed 15 Sep. 2017].
- [20] GIMPS Free Prime95 software downloads PrimeNet. [online] Available at: https://www.mersenne.org/download/ [Accessed 20 Sep. 2017].
- [21] Triumf.ca. (2017). Main Cyclotron & Beam Lines | TRIUMF : Canada's National Laboratory for Particle and Nuclear Physics. [online] Available at: http://www.triumf.ca/research-program/research-facilities/main-cyclotron-beam-lines [Accessed 15 Sep. 2017].
- [22] The MGH Francis H. Burr Proton Beam Therapy Center Massachusetts General Hospital, Boston, MA. [online] Available at: http://www.massgeneral.org/radiationoncology/BurrProtonCenter.aspx [Accessed 15 Sep. 2017].
- [23] Windows Memory Diagnostic Tool. [online] Available at: https://www.petri.com/windows-memory-diagnostic-tool [Accessed 15 Sep. 2017].
- [24] Yeraswork, Z., Lessons Learned: Pentium Flaws Aid Intel In Sandy Bridge Chipset Recall. [online] Available at: http://www.crn.com/news/components-peripherals/229400535/lessons-learned-pentium-flaws-aid-intel-in-sandy-bridge-chipset-recall.htm [Accessed 15 Sep. 2017].
- [25] Walton, M., Intel Skylake bug causes PCs to freeze during complex workloads. [online] Available at: https://arstechnica.co.uk/gadgets/2016/01/intel-skylake-bug-causes-pcs-to-freeze-during-complex-workloads/ [Accessed 15 Sep. 2017].
- [26] Ung, G., How to test your PC for the Skylake bug. [online] Available at: https://www.pcworld.com/article/3021023/hardware/how-to-test-your-pc-for-the-skylake-bug.html [Accessed 15 Sep. 2017].

- [27] Hothardware.com. (July, 2017). Intel Releases Critical Skylake And Kaby Lake HyperThreading Bug Fix. [online] Available at: https://hothardware.com/news/intel-hyperthreading-bug-fixed [Accessed 15 Sep. 2017].
- [28] 6<sup>th</sup> Generation Intel® Processor Family Specification Update (June, 2017) [online] Available at: https://www.intel.com/content/www/us/en/processors/core/desktop-6th-gen-core-family-spec-update.html [Accessed 15 Sep. 2017].
- [29] ASUS USA. Z170M-PLUS | Motherboards | ASUS USA. [online] Available at: https://www.asus.com/us/Motherboards/Z170M-PLUS/HelpDesk\_BIOS/ [Accessed 15 Sep. 2017].
- [30] "Intel64 and IA-32 Architectures Software Developer's Manual," Vol 3B, July. 2017.
- [31] Processor Specifications [online] Available at: https://ark.intel.com/#@PanelLabel122139 [Accessed 15 Sep. 2017].
- [32] Leading at the Edge: Intel Technology and Manufacturing. (Sept 2017) [online] Available at: https://newsroom.intel.com/press-kits/leading-edge-intel-technology-manufacturing/ [Accessed 18 Sep. 2017].
- [33] CPU History Computers and CPUs in Space [online] Available at: http://www.cpushack.com/space-craft-cpu.html [Accessed 15 Sep. 2017].