Thermal System Identification (TSI): A Methodology for Post-silicon Characterization and Prediction of the Transient Thermal Field in Multicore Chips

Minki Cho, William Song, Sudhakar Yalamanchili, and Saibal Mukhopadhyay
School of ECE, Georgia Institute of Technology
266 Ferst Drive, Klaus Advanced Computing Building
Atlanta, GA, USA
mcho8@gatech.edu

Abstract
This paper presents a methodology for post-silicon thermal prediction to predict the transient thermal field of a multicore package for various workload considering chip-to-chip variations in electrical and thermal properties. We use time-frequency duality to represent thermal system in frequency domain as a low-pass filter augmented with a positive feedback path for leakage-temperature interaction. This thermal system is identified through power/thermal measurements on a packaged IC and is used for post-silicon thermal prediction. The effectiveness of the proposed effort is presented considering a 64 core processor in predictive 22nm node and SPEC2006 benchmark applications.

Keywords
Thermal prediction, multi-core, system identification, leakage, and process variations

1. Introduction

Characterization of the spatiotemporal variation of the on-chip junction temperature (the transient thermal field) is crucial for thermal-aware design, assembly, and management for reliable in-field operation of a chip (die and package) [1, 2]. The thermal field is generated by the interaction of time-varying power pattern and the thermal properties (resistivity and heat capacity) of die and package materials. Further, the thermal properties of the die/package assembly [e.g. conductivity of thermal interface materials (TIM)] can vary between different instances of same IC (chip-to-chip variation) or over time (e.g. delamination in TIM [3]). Moreover, imperfections in the manufacturing process leads to die-to-die and within-die process variations in transistor leakage [1]. The leakage and temperature are positively correlated – a higher temperature results in higher leakage which further increase the temperature. Hence, for same dynamic power, chip-to-chip leakage variation leads to variation in on-chip temperature [4]. Fig. 1 illustrates the impact of process variation and leakage-temperature interaction on thermal behavior of a chip using example simulations in predictive 22nm node. As the die-to-die process variation increases with technology scaling, the post-silicon chip-to-chip variation in transient thermal field is also expected to increase. This challenge is further enhanced by many-core processor architectures running increasingly data intensive and unstructured workloads. As the power, performance, and lifetime reliability of processors depends on the transient temperature, in-field reliable operation of many-core processors needs the accurate characterization of the interaction of workload variation and chip-to-chip/package-to-package variations in thermal/electrical properties. This leads to a new challenge - post-silicon prediction of the transient thermal field. The objective of post-silicon thermal prediction is to predict the transient temperature of a particular instance of a packaged IC for various workload and considering chip-to-chip and package-to-package variations in electrical (leakage) and thermal properties.

The existing transient thermal simulation methods (finite element/volume or distributed RC), suitable for fine-grain design time transient thermal analysis, require accurate estimation of thermal resistivity and heat capacity of all materials [5-7]. Many works have studied on how to measure the thermal resistance and capacitance of thermal interface material (TIM), heat sink, convective, and heat spreader [8-11]. Many steady-state method works are modeled after ASTM D5470 [8]. A. Poppe et. al presented dynamic electrical temperature measurement [9] and R. Campbell et. al presented the flash diffusivity method for accurate measurement of thermophysical property data [10]. The

![Figure 1: Illustration of the need for post-silicon transient thermal analysis considering process variation: (a) the interaction of leakage (average for all input condition) and temperature in a NAND2 gate considering different process corners (HVT – High threshold voltage t i.e. low leakage corner, NVT – nominal threshold voltage, and LVT – low threshold voltage corner). We observe that different process corner results in different leakage-temperature interaction. (b) the effect of such interaction for an example self-consistent thermal simulation (using distributed RC network) considering a square wave dynamic power profile (e.g. turning on and off a the chip after a time-interval) and leakage of 10million NAND2 gate. We see that even for same dynamic power, LVT chips can have higher temperature than HVT chips.](image-url)
measurements of thermal resistance and capacitance suffer from repeatability, contamination, pressure, and inaccuracy problems. Even if we measure accurately the thermal resistances of TIM, heat sinks, and interface, in stacking condition those values are changed due to imperfect attachment and manufacturing. K. Kurabayashi et al. presents that the die attach resistance differs substantially from the value predicted using the bulk thermal conductivity of the attachment material because of partial voiding and delamination [12]. Consequently, the fine-grained distributed RC based thermal simulators used during design time are difficult to adopt for post-silicon thermal analysis.

2. Contributions and Novelty

This paper presents a unique approach for transient thermal analysis that addresses the specific requirements of post-silicon thermal prediction. The proposed approach, referred to as Thermal System Identification or TSI, is based on principles of system identification, frequency domain signal analysis, and positive feedback system. We develop the mathematical principles of the proposed approach and demonstrate its effectiveness in post-silicon thermal analysis of a 64 core processor at predictive 22nm node [13]. The each core is modeled as close to Intel Nehalem [14] architecture running at 3.0GHz. This post silicon characterization of a multicore chips can be used by operating systems to schedule workloads since the identification of the chip thermal system enables schedulers to reason about the thermal consequences of scheduling a specific workload on a target chip. This understanding can also be exploited in configuring large system (e.g. data centers) via thermally compatible aggregations of multicore packages. Fig. 2 shows the overall flow of the proposed post-silicon thermal prediction approach. This paper makes the following contributions:

- **High-level Transfer Function of the Thermal System including Leakage-Temperature Interaction:** We provide a high-level abstraction of the thermal behavior of a chip as a multi-input multi-output (MIMO) system where power sources are system inputs and observed temperature values at different locations are the system outputs. The interaction of leakage and temperature is used as an integral part of this high-level MIMO system. We show that this thermal system can be represented in frequency domain as a filter matrix. In time domain heat diffusion equation represents a distributed RC network which behaves as a low-pass filter in frequency domain. This is augmented with a positive feedback path representing leakage-temperature interaction.

- **Thermal System Identification - Post-silicon Extraction of Transfer Function of the Thermal System and Fast Prediction of Transient Thermal Field:** We present methodologies that can identify this thermal system (i.e. the thermal filters) after fabrication and packaging using sequences of on-chip power and temperature measurements. These methods allow one to construct a unique thermal system for each chip (thermal system identification or TSI).

We present methods to accurately predict the chip-specific transient thermal fields for varying workloads using the corresponding thermal filter matrix \( [H(ω)] \). The frequency response of the temperature variation over a time interval is computed from the Fourier transform of power pattern in that interval and the filter matrix \( [T(ω)=H(ω)×P(ω)] \). The time-domain temperature is obtained from the temperature spectra.

Several methods have been proposed in recent years for fast steady-state spatial thermal map (e.g. power blurring method in [15] and discrete cosine transform (DCT) based method in [16], fast transient temperature simulations (e.g. [17-18]), and fast spatiotemporal analysis considering multilayers of power and materials (e.g. ThermalScope [19]). The TSI based approach provides important advantages in post-silicon thermal analysis over the above mentioned approaches used in fine-grain design-time thermal analysis. First, the proposed approach performs temperature prediction using the thermal transfer function extracted from the full thermal system (i.e. stacks of heat sink, spreader, TIM, and chip), instead of computing thermal resistance and capacitances of individual materials in isolation. Therefore, the effects of any non-uniformity and/or uncertainty in the thermal properties of the materials are captured in the extracted transfer function. Moreover, as the leakage temperature interaction is considered as a part of the MIMO system, the effect of process variation of individual chips is also automatically considered. Second, the fast simulators mentioned earlier do not consider leakage-temperature interaction. Currently, the transient temperature estimation considering leakage-temperature interaction is performed using distributed RC based simulators (e.g. Hotspot [21]) where leakage power is updated in each time-step based on the current thermal map [19-22]. Therefore, higher accuracy of the temperature estimation requires fine-grain time-step which in turn increases simulation time. In the proposed approach the leakage temperature interaction is...
incorporated in the system transfer function and temperature estimation is performed in the frequency domain. Consequently, the accuracy of the proposed method is less sensitive to time-step allowing fast estimation of transient temperature.

3. Mathematical Approach

3.1. Modeling the MIMO Thermal System with Leakage-Temperature Interaction

In a MIMO system, the temperature of an observation point is affected by the multiple input power sources. Since a distributed RC network is a linear system, superposition principle can be applied here i.e. the temperature at one location is the additive response of all power sources in the system. Assume that there are M power sources organized into m×m 2D grids. We further assume that there are L numbers of observation points organized in l×l grids. The temperature at the observation point (i, j) in frequency domain can be estimated as:

\[ T_k(i, j) = P_k(i, j)H_{11}(\omega) + \sum_{l=1}^{N} P_{l}(i, j)H_{l1}(\omega) + \cdots + \sum_{m=1}^{N} P_{m}(i, j)H_{m1}(\omega) = \sum_{l=1}^{N} P_{l}(i, j)H_{l1}(\omega) \]  

Note \( H_{ij}(\omega) \) is defined as the self-transfer function of a location (i.e. the transfer function connecting power and temperature of a location \( H_{self}(\omega) \)). Likewise \( H_{pq}(\omega) \) \((\forall p, q \neq i, j)\) is defined as the cross transfer function \( H_{cross}(\omega) \) that connects power of one location and temperature of another. The above formulation leads to the 2D filter matrix for the MIMO system (Fig. 3):

\[
\begin{pmatrix}
T_{11}(\omega)
\vdots
T_{00}(\omega)
\end{pmatrix} =
\begin{pmatrix}
H_{11}(\omega) & \cdots & H_{m1}(\omega) \\
\vdots & \ddots & \vdots \\
H_{10}(\omega) & \cdots & H_{m0}(\omega)
\end{pmatrix}
\begin{pmatrix}
P_{11}(\omega) \\
\vdots \\
P_{00}(\omega)
\end{pmatrix}
\]  

(2)

We now estimate the self and cross transfer functions considering the leakage feedback. Without loss of generality, we explain this considering two sources and two locations. Consider leakage current \( P_L \) depends on temperature as:

\[ P_L(T) = P_L(T_0) + f(T) \]  

(3)

where \( P_L(T_0) \) is the leakage power at room temperature, and the function \( f(T) \) represents sensitivity of leakage power to temperature. First, we consider \( H_{self} \) i.e. the temperature of location \( i \) due to the power source of at location \( i \). We obtain:

\[ T_{i0}(\omega) = \left[ P_{L}(\omega) + P_{L}(\omega) + f(T_{i0}(\omega)) \right]H_{i1}(\omega) = \left[ P_{L}(\omega) + P_{L}(\omega) + f(T_{i0}(\omega)) \right]H_{i1}(\omega) \]  

(4)

The last approximation assumes a linear interaction between leakage and temperature to improve analytical tractability. Both the room temperature leakage \( P_L(T_0) \) and the coefficient \( \alpha \) depends on leakage-temperature interaction. Note \( P_L(\omega) = P_L(\omega) + P_L(\omega) \) is the spectral response of power without leakage-temperature feedback (can be estimated from the workload). Now the thermal system model can be represented as (Fig. 3):

\[ T_i(\omega) = \frac{H_{i1}(\omega)}{1 - \alpha H_{i1}(\omega)} P_i(\omega) \]  

(5)

We now evaluate the temperature of location \( i \) due to power source at location \( k \). We apply superposition principle during this evaluation estimate \( T_i(\omega) \) assume \( P_i(\omega) = 0 \). But the heat generated in location \( k \) propagates to location \( i \) which increases the temperature of location \( i \). Increase in temperature at location \( i \) triggers the leakage feedback loop at location \( i \). This results in leakage power at location \( i \) and hence, increase temperature of location \( i \). The temperature increase in core \( i \) due to power of core \( k \) is therefore estimated as (Fig. 2):

\[ T_i(\omega) = P_i(\omega)H_{i1}(\omega) + \alpha T_i(\omega)H_{i0}(\omega) = \frac{H_{i1}(\omega)}{1 - \alpha H_{i1}(\omega)} P_i(\omega) \]  

(6)

3.2. Methods for Thermal System Identification

The principle discussed above requires frequency response of the self and cross transfer functions for each chip (i.e. TSI). To perform TSI on the MIMO system, one input power source is excited at a time and temperature is measured at all observation points considered. Hence, the equation (2) transforms to:

\[ \forall i & j: T_{ij}(\omega) = P_{ij}(\omega)H_{ij}(\omega) + T_{i0}(\omega)H_{i0}(\omega) = T_{i0}(\omega)H_{i0}(\omega) \]  

(7)

The above equation can be used to estimate the thermal filter from all input power sources to all temperature observation points. As equation (7) is division of two complex numbers, both magnitude and phase of the filter response are extracted. For better accuracy it will be efficient to minimize the leakage of unselected locations.

4. Applications of TSI to Thermal Modeling of Many-Core Processors

![Image](https://via.placeholder.com/150)

Figure 3: Mathematical principle of the proposed approach. The thermal system is considered as a MIMO system. The leakage temperature interaction is considered as a positive feedback for both self and cross transfer functions between power and
In this section we apply the TSI based approach to the post-silicon thermal prediction of many-core processor. We consider one temperature sensor is present in each core. Therefore, the MIMO thermal system for many-core has power of each core as an input and temperature of each core as an output.

4.1. Baseline Thermal Simulator used for Verification of the Proposed Approach

We first describe the baseline thermal simulation platform used to verify accuracy of the TSI based approach. We consider 3D model of the thermal system including chip, TIM, heat spreader, and heat sink. 3D distributed RC grid is generated for the different regions of the system. We use circuit simulator, HSPICE, for solving the distributed RC grid in time-domain. The power profiles are applied as current sources. The chip is modeled as a homogenous 64 core processor with private cache designed in predictive 22nm technology (total chip area 400mm$^2$, each core and private cache ~6.25mm$^2$). Each core was modeled as close to Intel Nehalem architecture [14] running at 3.0GHz. We generate power traces of SPEC 2006 benchmark suites using cycle-accurate architecture simulation for timing (Zesto [23]) and power (McPAT [24]) considering x86 architecture. Each benchmark was run or repeated for 0.5 seconds in real time. The above environment considers architectural inputs (e.g., cache sizes, instruction decode width, number of execution units, etc.) and device parameters at various technology nodes to estimate the physical features of the processor. The example power traces obtained from the simulation are shown in Fig. 4.

Figure 4: Transient power traces of exemplary benchmarks for SPEC2006 applications (part a). The frequency response of the power traces are also shown in part (b) of the figure. The frequency response are used with the thermal transfer function to compute the temperature variations.

4.2. Thermal System Identification for Many-Core

The practical challenge in TSI of many-core processors is the generation of power spectra in equation (7). The accurate approach is to apply sinusoidal power waveforms of different frequency (small signal analysis). However, generating sinusoidal power waveform in hardware (in a chip) is challenging. We propose two alternative approaches. **Power Spectra Generation with Core-Gating Control:** First, we propose to control the core level power and clock gating (i.e. core-gating available in current processors [25-26]) to generate power pattern of desired frequency spectra. To illustrate this approach we perform SPICE simulation considering core gating (Fig. 5). We consider the core as hundreds of 15-stage ring oscillators to emulate dynamic power. Each core is controlled with a sleep transistor and sleep control signal. The frequency of the sleep control signal is modulated to generate power variation of different frequency. The pseudo-periodic temperature response [similar to figure 1(b)] is measured to obtain the thermal filter response.

In this section we apply the TSI based approach to the post-silicon thermal prediction of many-core processor. We consider one temperature sensor is present in each core. Therefore, the MIMO thermal system for many-core has power of each core as an input and temperature of each core as an output.

**Figure 5:** Core gating based approach to power spectra generation. (a) sleep transistor signal (5Mhz) and (b) power pattern (5Mhz). We can observe from SPICE based circuit simulation considering a core of hundreds of 15-stage ring oscillators to emulate dynamic power. The cores are controlled with a sleep transistor and sleep control signal. The frequency of the sleep control signal is modulated to generate power variation of different frequency. The pseudo-periodic temperature response [similar to figure 1(b)] is measured to obtain the thermal filter response.

The practical challenge in TSI of many-core processors is the generation of power spectra in equation (7). The accurate approach is to apply sinusoidal power waveforms of different frequency (small signal analysis). However, generating sinusoidal power waveform in hardware (in a chip) is challenging. We propose two alternative approaches. **Power Spectra Generation with Core-Gating Control:** First, we propose to control the core level power and clock gating (i.e. core-gating available in current processors [25-26]) to generate power pattern of desired frequency spectra. To illustrate this approach we perform SPICE simulation considering core gating (Fig. 5). We consider the core as hundreds of 15-stage ring oscillators to emulate dynamic power. Each core is controlled with a periodic sleep control signal of a given frequency which generates periodic power
pattern of same frequency. Hence, by controlling the period of the sleep control signal we can modulate the spectral behavior of the generated power patterns. The on-chip power monitors can be used to sense the core level power [27]. Application Driven Power Spectra: The second approach is to run multiple test applications in individual cores and measure power and temperature to compute the filter response. As power profile generated by each application may not contain significant spectral power at all frequency, we consider average of the filter responses computed using different applications as the extracted filter.

Figure 6 shows the thermal filter extracted using the practical core-level control closely follows the one from the theoretically ideal small-signal analysis. We observe that the thermal systems behave as the 1st order low-pass filter. The cutoff frequency is located in the low frequency range. Hence, fast time-varying power input has less impact on the temperature while low frequency power variations are more critical. We next study the behavior of the extracted core-to-core cross thermal filters. Fig. 7 shows frequency responses for different location of interest when a power source is applied at core D0. We observe that gain at the observation point continues to decrease in all frequency range as it moves away from the source. The decrease in gain due to spatial effect is larger at higher frequencies i.e. fast varying power

Figure 7: Filter behavior of thermal system: distance between source core and observation node. We observe that both self (D1) and cross (D1, D2) transfer functions behave as low-pass filter. The strength of the cross transfer function reduces significantly with distance i.e. power spread in the distant cores will have minimal impact on the temperature of a core. We also see that the effect of cross transfer function is even less pronounced at higher frequency.

Figure 8: Filter behavior of the thermal system: effect of the location of the source core. We observe that behavior of the self-transfer function depends on the location of the core. We consider average of the filter responses computed using different applications as the extracted filter.

Figure 9: Estimation error in transient variation of temperature for a typical core in the 64 core system. The simulations were performed considering random workloads created for all 64 cores using random assignments of benchmark applications for SPEC2006 suites.

Figure 10: Estimation error in instances of the spatial thermal field at different time points. The entire simulations consists of random workloads for running in all 64 cores for 500ms. The proposed method represents the estimation using system function extracted from power/thermal measurement. We observe that the proposed TSI based approach can successfully predict the spatial thermal field.

Figure 11: The Estimation Error on: (a) Maximum temperature, (b) Spatial difference. The temperature estimated from the TSI based approach closely follows the distributed RC based thermal simulation.
We verify the accuracy of the post-silicon TSI based thermal models against the distributed RC based thermal simulator described in section 4.1. We first create several (60) workloads by randomly assigning the power trace of different application (0.5s of real time data) to different cores and use them for thermal analysis. The same patterns were also run through the baseline distributed RC based thermal simulator. Figure 9 compares the transient temperature variation for a typical core and Figure 10 compares example spatial thermal maps at different time instants generated from distributed RC based simulation and the proposed approach (with power/thermal measurement driven filter). It can be observed that both transient and spatial temperature variation are well captured. In Figure 11, we estimate errors on two critical thermal parameters of the chip: maximum temperature and spatial difference. Spatial difference is the difference between maximum and minimum chip temperature at a given time and represents the uniformity of temperature in space. We observe that the average estimation error on maximum temperature is less than 2°C and on spatial difference is less than 1°C. Figure 12 shows the distribution of estimation errors considering temperature of all cores at all-time points, and all random patterns. We observe that average error is less than 1°C between detail RC based thermal simulation (SPICE) and proposed TSI based model (the „system function from power/thermal measurement”). We observe that proposed method reduces the prediction time by ~4X compared to Hotspot and ~5X compared to HSPICE based RC solver under same spatial granularity and number of power samples.

The TSI based approach is limited by the accuracy of the temperature sensors used in transfer function extraction. To understand the impact of the measurement error, we study the effect of inaccuracies in the temperature sensor on the estimated temperature. A uniform distribution of sensor error in the range of -3°C to +3°C is considered. We first added sensor errors into measured temperature data and re-extract transfer function. Figure 12 shows that, even with relatively large sensor error that results in errors in transfer function extraction, the average estimation error increases marginally. Repeated power/thermal measurements can be used to further reduce the effect of sensor errors. The accuracy of the analysis also depends on the granularity of the transfer function in the frequency domain. A challenge comes from the practical limitations on controlling the frequency components in the power spectra. Limited number of frequency components degrades the accuracy of proposed approach. This is also shown in Fig. 12 where we observe that if the system transfer function is extracted using small-signal analysis (i.e. applying multiple sinusoidal power waveform with a single frequency), the estimation error is reduced (the red curve). The fine-grain spectral analysis however requires larger number of power/temperature measurement and increases the characterization time.

5. Application to Post-Silicon Thermal Prediction

4.3. Accuracy of TSI based Thermal Prediction

Figure 12: Accuracy of the proposed approach considering random workload: core level error statistics considering 64 cores and 60 random workloads. The statistics were computed considering errors in all cores in all simulation time points. We observe that a transfer function from small signal analysis provides marginally better accuracy compared to that from power/thermal measurements. We also see that even high errors in the sensors (assumed to be uniformly distributed between -30°C to +30°C) have relatively less impact in the accuracy of transient temperature estimation.

Figure 13: The application of TSI based approach on the prediction of impact of process variation on transient temperature. The effect of leakage-temperature interaction is captured in the extracted filter (part-a) showing a higher gain for low-Vt process corners compared to high-Vt corners. Hence, for same workload and dynamic power pattern we observe higher temperature for low-Vt chips than high-Vt chips (part-b). The power pattern represents workloads using SPEC2006 benchmarks. The thermal conductivity was kept constant in the three simulations source has less impact on neighboring regions. Figure 8 shows filter response for different location of source core. The results show that gain at the observation point changes as the location of source core varies. A side/edge core has less gain in low frequency compared to center core. Since the area of heat spreader/sink is larger than the area of chip, the cooling from heat sink is more effective on edge/side cores. We further note that the filter response between a source and an observation node depends on the physical property of the material system that determines the heat flow. It is independent of the magnitude of the generated power, floorplan of the chip, and architecture. The latter factors modulate the power profile and hence, temperature profile but not the filter response.
5.1. Capturing the effect of Process Variations and TIM Conductivity on Thermal Prediction

After verifying the accuracy of TSI based thermal prediction, we next study its effectiveness in post-silicon thermal prediction. We study the ability of TSI in predicting the effect of variations in process corners and thermal conductivity. In this analysis, low-Vt implies a negative 100mV Vth shifts for all devices in a chip while high-Vt implies positive 100mV Vth shifts. The low-Vth dies have much higher leakage and stronger leakage temperature interaction. Fig. 13(a) and 14(a) shows that proposed method captures the effect of chip-to-chip variations in leakage and thermal conductivity of the thermal stack consisting of TIM, heat spreader, and heat sink on the extracted thermal filters. We observe that low-Vt die and lower conductivity thermal stack increase the gain in the low-frequency range of the filter transfer function. To illustrate the impact of these variations in filter response, we consider Normal random die-to-die variation of Vth. Each Vth point generated from this Normal distribution represents a unique die for the same many-core processor. For each of such die we consider three different thermalconductivities. TSI is next used to extract the thermal system for all of these die/package condition. The extracted filters for each such instance of the packaged dies are unique. The same workload pattern is applied to all such unique thermal systems to study the effect of process and thermal conductivity variation on chip temperature. Fig. 13(b) and 14(b) show time-domain temperature variation for a typical core for a chip running the same workload but moved to different Vth and thermal conductivity corners.

6. Conclusion

We have presented a methodology or post-silicon thermal prediction. The proposed method first identifies the frequency domain response of the thermal system of a packaged die. The extracted filter is used that for fast chip-specific analysis of transient thermal field considering leakage-temperature feedback. The capabilities of post-silicon characterization of the thermal system can benefit thermal design and management at chip as well as large system level.

Acknowledgement

This work is supported in part by Semiconductor Research Corp (under grant 2084.001), IBM Faculty Award, and Intel Corp. Authors would like to thank Dr. R. Rao, Dr. E. Kursun, Dr. W. Huang, and Dr. P. Bose from IBM Corp. for many helpful discussions.

References

[22] P. Zhou et. al., “Thermal effects with leakage power considered in 2D/3D floorplanning,” IEEE international