# A CAD TOOL FOR THE PREDICTION OF VLSI INTERCONNECT RELIABILITY

by

David Frank Frost

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Electronic Engineering, University of Natal 1988

#### Abstract

This thesis proposes a new approach to the design of reliable VLSI interconnects, based on predictive failure models embedded in a software tool for reliability analysis.

A method for predicting the failure rate of complex integrated circuit interconnects subject to electromigration is presented. This method is based on the principle of fracturing an interconnect pattern into a number of statistically independent conductor segments. Five commonly-occurring segment types are identified: straight runs, steps resulting from a discontinuity in the wafer surface, contact windows, vias and bonding pads. The relationship between the median time-to-failure (Mtf) of each segment and its physical dimensions, temperature and current density is determined. This model includes the effect of time-varying current density. The standard deviation of lifetime is also determined as a function of dimensions. A minimum order statistical method is used to compute the failure rate of the interconnect system. This method, which is applicable to current densities below $10^6 A/cm^2$, combines mask layout and simulation data from the design data base with process data to calculate failure rates.

A suite of software tools called Reliant (RELIability Analyzer for iNTerconnects), which implements the algorithms described above, is presented. Reliant fractures a conductor pattern into segments and extracts electrical equivalent circuits for each segment. The equivalent circuits are used in conjunction with a modified version of the SPICE circuit simulator to determine the currents in all segments and to compute reliability. An interface to a data base query system provides the capability to access reliability data interactively. The performance of Reliant is evaluated, based on two CMOS standard cell layouts. Test structures for the calibration of the reliability models are provided.

Reliant is suitable for the analysis of leaf cells containing a few hundred transistors. For MOS VLSI circuits, an alternative approach based on the use of an event-driven switch-level simulator is presented.

### Autobiography

David Frank Frost was born in Cape Town, South Africa on 5 November, 1951. He matriculated from Hottentots-Holland High School, Somerset West in 1969. He received the B.Sc. (Eng.) degree from the University of Stellenbosch in 1974 and the M.Eng. (Electronic) degree from the University of Pretoria in 1979. From 1976 to 1978, he was employed in the Microelectronics Division of the National Electrical Engineering Research Institute, designing analog and digital bipolar integrated circuits. Since 1979, he has held the position of Senior Lecturer in the Department of Electrical & Electronic Engineering at the University of Stellenbosch. He has presented a variety of courses in electronics and IC processing and design, and was responsible for the establishment of processing and CAD laboratories at Stellenbosch. In 1986 he spent a 12 month sabbatical period in the Department of Electrical & Computer Engineering at Clemson University in the USA.

The work described in this thesis was carried out at Stellenbosch and Clemson during the period June 1985 to June 1988.

#### Preface

The development of CAD tools for the prediction of VLSI reliability is an emerging field of research. It has developed amid growing concern in the semiconductor industry about the reliability of the increasingly complex integrated circuits (ICs) being produced today. Historically, ICs have always been considered to be components having a high inherent reliability; in fact, the move toward VLSI has been made possible by rapid increases in the reliability of the devices produced. However, reductions in minimum feature size and the use of thinner oxides and shallower diffusions produce increased stresses within ICs, accelerating the process of wearout. The probability of IC failure due to wearout is further enhanced by an exponential growth in the complexity of a single die. The growing importance of Application Specific Integrated Circuits (ASICs), with their limited production volumes, is forcing the industry to re-evaluate costly traditional methods of reliability qualification through testing and burn-in procedures. These factors increase the need for CAD-based methods to predict the reliability of an integrated circuit before manufacture.

This is the first work to propose and implement a Reliability Analysis Software Tool for predicting the failure rate of an IC during the design phase. Several original contributions to knowledge are contained in this work and these are summarized below.

- The application of a system reliability model to the problem of IC interconnect reliability is discussed in chapter 3 and the following points should be noted.
  1. The identification of 5 commonly-occurring features of interconnect patterns, called segment types. These are straight runs, steps (caused when a conductor crosses a discontinuity in the wafer surface), contact windows, vias and bonding pads. The interconnect pattern is fractured into a collection of statistically independent conductor segments, each of which may be classified according to the 5 types mentioned above.
  2. The development of suitable reliability models for these segments.
  3. The use of a minimum order statistical approach to calculate the reliability of the interconnect pattern, when subject to electromigration.
  4. The evaluation of interconnect reliability in terms of an actual reliability figure-of-merit (instantaneous failure rate), as opposed to the common practice of considering only the current density in each conductor. The method developed in this thesis is superior to the current-density approach in three respects. Firstly, considering only the current density ignores the complex dependence of median time to failure (Mtf) and standard deviation (σ) on conductor dimensions. Secondly, the effect of circuit complexity is not taken into account. This is of primary importance in VLSI. Finally, the interconnect failure rates obtained here may be easily combined with similar figures for other failure modes. The approach used is therefore consistent with the long-term goal of estimating the reliability of the whole IC during the design phase.
- A suite of software tools called Reliant (RELIability Analyzer for iNTerconnects) has been developed. It fractures the interconnect pattern into segments, extracts the equivalent circuit of each branch and uses a circuit simulator (SPICE) to determine the reliability of all segments. Reliant includes an interface to a data base query system which may be used to access reliability data interactively.
- A method for determining approximate current waveforms in a MOS VLSI circuit using a switch-level simulator is presented. This approach has a considerable speed advantage when compared to the use of a circuit simulator and enables reliability data to be collected concurrently with the process of design verification.

This thesis represents the author's original work and has not been submitted in any form to another University for the purposes of obtaining a degree. Where use has been made of work carried out by others, this has been duly acknowledged in the text.

### Acknowledgements

The author would like to thank the following for their contributions to this work.

- My supervisor, Prof. K.F. Poole, for his constant enthusiasm, support and encouragement.
- Professors J.W. Lathrop and J.W. Harrison of Clemson University for useful discussions on electromigration and reliability.
- Kevin Kemp, for assistance with debugging Reliant and preparing the circuit examples in Chapter 4.
- David Haeussler, for coding part of the LoadQuadTree procedure used in Extrem.
- The University of Stellenbosch, for granting me sabbatical leave for 10½ months during 1986 to work on this project.
- The financial support of the Foundation for Research Development, the Harry Crossley Bursary Fund and Clemson University is also gratefully acknowledged.

## Contents

- 1 INTEGRATED CIRCUIT RELIABILITY
  - 1.1 Introduction
  - 1.2 Failure Mechanisms in Integrated Circuits
    - 1.2.1 Electromigration (EM) in Thin Metal Films
    - 1.2.2 Time-dependent Dielectric Breakdown of Gate Oxides
    - 1.2.3 Threshold-voltage Shifting Effects in MOS Devices
    - 1.2.4 Alpha-particle Induced Soft Errors
    - 1.2.5 Electrostatic Discharge (ESD)
  - 1.3 A Statistical Approach to Design for Reliability
  - 1.4 Integrating Design-for-Reliability with the CAD Environment
  - 1.5 Previous Work
  - 1.6 Summary of this Thesis
- 2 A RELIABILITY MODEL OF AN INTERCONNECT SYSTEM
  - 2.1 Introduction
  - 2.2 A System Model of the Reliability of VLSI Interconnect Patterns
  - 2.3 Characterizing the Reliability of Conductor Segments
  - 2.4 Minimum Order Statistics
  - 2.5 Statistical Models of Electromigration Failure
    - 2.5.1 Distribution of Conductor Lifetimes
    - 2.5.2 Mtf Dependency on DC Current Density
    - 2.5.3 Mtf and Current Pulses
    - 2.5.4 A General Model for Electromigration Caused by a Time-varying Current Density
    - 2.5.5 The Dependency of $t_{50}$ and $\sigma$ on Physical Dimensions
  - 2.6 Summary
  - 2.7 An Example
- 3 RELIANT: A RELIABILITY ANALYZER FOR INTEGRATED CIRCUIT INTERCONNECTS
  - 3.1 Overview
- 4 EVALUATION OF RELIANT
  - 4.1 Calibrating the Reliability Models
- 5 REVIEW
- 6 REFERENCES
- A DERIVATION OF THE FAILURE RATE OF A SERIES-CONNECTED SYSTEM
- B FITTING A LOGNORMAL DISTRIBUTION TO THE MINIMUM ORDER STATISTIC
- C VARIATION OF $t_{50}$ AND $\sigma$ WITH $n$
- D RELIABILITY ANALYSIS OF A 3100 GATE CMOS STANDARD CELL DEVICE
  - D.1 Approximate Models of Contact, Via and Step Segments
  - D.2 Derivation of The Failure Rate of a Power Bus Section
- E A TEST CHIP FOR CALIBRATION OF RELIABILITY MODELS
- F PUBLICATIONS BY THE AUTHOR WHICH RELATE TO THIS THESIS

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | Bathtub curve of instantaneous failure rate vs. time |
| 2.1 | A n-component series-connected system |
| 2.2 | A n-component parallel-connected system |
| 2.3 | Set of conductor segment types used to model reliability |
| 2.4 | $p(t)$ and $P(t)$ for the lognormal distribution $(t_{50} = 1, \sigma = 1)$ |
| 2.5 | $\log(p(t))$ vs. $\log(t)$ for minimum order statistic and lognormal distribution |
| 2.6 | $t_{50}'/t_{50}$ vs. $n$ |
| 2.7 | $\sigma'/\sigma$ vs. $n$ |
| 2.8 | $\delta T$ vs. $J$ (conductor thickness = $0.5\mu m$, dielectric thickness = $1\mu m$) |
| 2.9 | A series of unidirectional, rectangular current pulses |
| 2.10 | $t_{50}$ vs. conductor width $W$ |
| 2.11 | $\sigma$ vs. conductor width $W$ |
| 2.12 | Typical conductor grain structure for different conductor widths |
| 2.13 | Failure rate of interconnects (3100 gate CMOS circuit) |
| 3.1 | Block diagram of Reliant |
| 3.2 | Block diagram of Extrem |
| 3.3 | An example showing various intersections |
| 3.4 | Determination of dominant current flow direction |
| 3.5 | Segmentation and extraction of example in Fig. 3.3 |
| 3.6 | Equivalent circuit for the example in Fig. 3.5 |
| 3.7 | Multiple storage, adaptive quad tree |
| 4.1 | User interaction with Reliant |
| 4.2 | Spice input file geinput.spc |
| 4.3 | A part of Sirprice input file geinput.spr |
| 4.4 | Layout of GEINPUT |
| 4.5 | Failure rate vs. time for GEINPUT |
| 4.6 | Layout of GEBINC |
| 4.7 | Failure rate vs. time for GEBINC |
| 4.8 | Identification of highly-stressed segments using Datatrieve |
| 4.9 | Charge movement in a MOS circuit |
| 4.10 | Current source model of power bus |
| C.1 | $t_{50s}/t_{50}$ vs. $n$ |
| C.2 | $\sigma_s/\sigma$ vs. $n$ |
| D.1 | Layout of $V_{DD}$ and $Ground$ buses |
| E.1 | Layout of test chip |

## List of Tables

| Table | Caption |
|-------|---------|
| 2.1 | 3100 Gate CMOS standard cell circuit |
| 4.1 | Dependencies of $t_{50}$ and $\sigma$ for each Segment Type |
| 4.2 | Summary: Reliant analysis results for GEINV and GEBINC |


# List of Symbols

| Symbol | Description |
|--------|-------------|
| $E_a$ | Activation energy $(eV)$ |
| $k$ | Boltzmann's constant: $8.62 \times 10^{-5} eV/K$ |
| $T$ | Absolute temperature $(K)$ |
| $q$ | Electron charge $(-1.602 \times 10^{-19}C)$ |
| $\kappa$ | Thermal conductivity of SiO2: $1.4W/m.K$ |
| $\rho$ | Resistivity of Aluminum $(3 \times 10^{-10} \Omega.m)$ |
| $J$ | Current density $(A/cm^2)$ |
| $t$ | Time |
| $p(t)$ | Probability density function (Pdf) |
| $P(t)$ | Failure probability or cumulative distribution function (Cdf) |
| $R(t)$ | Reliability |
| $h(t)$ | Instantaneous failure rate or hazard function $(FIT)$ |
| $t_{50}$ | Median time to failure (Mtf) |
| $\sigma$ | Standard deviation (Sigma) |
| $N(x,\mu,\sigma)$ | Normal distribution in $x$ having mean $\mu$ and standard deviation $\sigma$ |
| Lambda | Minimum feature size of fabrication process |

### Chapter 1

### INTEGRATED CIRCUIT RELIABILITY

#### 1.1 Introduction

The reliability of any piece of equipment may be defined as the probability that it will perform its required function under stated conditions for a stated period of time [O'Connor 1981]. Reliability has always been an important consideration for the designer of electronic systems, particularly those systems having military or aerospace applications. A major factor which limits the complexity of all systems is the maximum reliability achievable within the constraints of the available technology. The great technological advances in electronic engineering such as the development of the transistor (1947) and the integrated circuit (1958) resulted in dramatic improvements in the reliability of electronic equipment, along with improved performance and reduced cost. Each of these developments therefore made possible an increase in system complexity, while maintaining acceptable levels of reliability. However, since 1958 the development of IC technology has resembled an exponential function of time, rather than a series of step functions. As evidence of this, we may consider the annual doubling in the number of active devices realizable on a single chip. This growth has largely been due to reductions in minimum feature size, although improved process control has also allowed the maximum chip size to grow to approximately 10mm by 15mm. ICs containing in excess of 100 000 transistors are now common.

Figure 1.1: Bathtub curve of instantaneous failure rate vs. time

This trend has important implications for reliability. Firstly, it will be shown that device scaling results in an increase in electrical stress, with a resultant loss of reliability due to stress-induced wearout mechanisms. Secondly, if the individual devices or interconnects are subject to failure according to some random distribution, an increase in complexity (e.g. number of active devices on the chip) must necessarily increase the probability of failure at any given time. This is a fundamental issue which will be considered in detail in chapter 2 of this thesis, where a model for the relationship between reliability and complexity will be presented. It may be mentioned here in passing that when ICs were first developed, it was widely held that the reliability of an IC was independent of the number of active devices on the die, because they were all manufactured simultaneously in a single manufacturing process. This argument is based on a deterministic view of the IC in which, if one active device is "good", all will be "good". In fact, failure remains a random process and no matter how well controlled the manufacturing process may be, a distribution of times-to-failure for the devices on a single die will always be observed.
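The dependence of failure probability on complexity can be illustrated with a minimal sketch. It assumes, purely for illustration, that the chip behaves as a series system of `n_devices` statistically independent devices, each with the same failure probability `p_device` at the time of interest, so that the chip fails as soon as any one device fails. The function name and the numerical values are hypothetical and are not taken from the models developed later in this thesis.

```python
import math


def chip_failure_probability(p_device: float, n_devices: int) -> float:
    """Probability that at least one of n independent devices has failed.

    Assumes a series system: the chip fails when the first device fails,
    and every device has the same failure probability p_device.
    """
    if not 0.0 <= p_device < 1.0:
        raise ValueError("p_device must lie in [0, 1)")
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_devices * math.log1p(-p_device))


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-device failure probability at some time t
    for n in (1_000, 10_000, 100_000):
        prob = chip_failure_probability(p, n)
        print(f"n = {n:>7}: P(chip failed) = {prob:.4f}")
```

Under these assumptions the chip failure probability grows monotonically with the number of devices, even though the per-device probability is held fixed.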

The reliability of an IC is traditionally pictured in terms of the bathtub curve of instantaneous failure rate versus time (Fig. 1.1). The unit of failure rate is the FIT (1 FIT = 1 failure per $10^9$ hours, or 0.1% per million hours). Three regions may be discerned on this graph. During the infant mortality phase, ICs exhibit a decreasing failure rate. Failures which occur here are the result of gross built-in flaws or defects in devices or interconnects, which fail rapidly when stressed. Only a small percentage of ICs contain such defects and as they are removed by failure, the failure rate of the remaining components decreases toward zero. During the wearout phase, failure of "good" (i.e. defect-free) components through wearout becomes significant and an increase in failure rate is observed. It must be emphasized that the same wearout mechanisms may be responsible for many of the infant mortality failures, with the presence of defects acting as an acceleration factor. There is also a close relationship between these early reliability failures and yield failures caused by defects. For example, a severe photolithographic defect may cause an open circuit in a conductor, which is classified as a yield failure [Stapper 1983], [Maly 1985]. A less severe defect of the same kind may only cause a local narrowing of the conductor, which would eventually lead to a reliability failure when the conductor was stressed.
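The FIT unit can be made concrete with a short conversion sketch. The function names are illustrative only. The expected-failure estimate assumes a constant failure rate over the interval considered, which is a simplification that applies only to the flat portion of the bathtub curve; the population size and operating period used in the example are hypothetical.

```python
HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def fit_to_failures_per_hour(fit: float) -> float:
    """Convert a failure rate in FIT to failures per device-hour."""
    return fit / HOURS_PER_FIT_UNIT


def fit_to_percent_per_million_hours(fit: float) -> float:
    """Convert a failure rate in FIT to percent per 10^6 hours."""
    return fit_to_failures_per_hour(fit) * 1e6 * 100.0


def expected_failures(fit: float, n_devices: int, hours: float) -> float:
    """Expected failure count, assuming a constant failure rate."""
    return fit_to_failures_per_hour(fit) * n_devices * hours


if __name__ == "__main__":
    print(fit_to_percent_per_million_hours(1.0))
    # Hypothetical population: 10 000 devices operated for one year.
    print(expected_failures(100.0, 10_000, 8760.0))
```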

Between the aforementioned two regions on the bathtub curve the failure rate approaches its minimum value, which is the optimum region for operation of the IC. Temperature is a strong acceleration factor in most wearout mechanisms and operation at an elevated temperature may be used to accelerate the passage of the infant mortality phase. Burn-in procedures are commonly used in this way to qualify the reliability of ICs for military and aerospace applications. A typical burn-in procedure consists of operating the component at $85^{\circ}C$ for 168 hours. Reliability qualification by burn-in can greatly increase the cost of a component, particularly in the case of small production volumes. A failure rate limit of 100 FIT is widely used, and 10 FIT has been proposed as a target for the near future.

#### 1.2 Failure Mechanisms in Integrated Circuits

The most common IC failure mechanisms reported in the literature will now be reviewed. The list is not exhaustive and no attempt is made to classify the failure mechanisms in order of importance. Only qualitative descriptions are given here; mathematical models for electromigration are presented in Chapter 2.

#### 1.2.1 Electromigration (EM) in Thin Metal Films

When an electron current flows through a metal conductor, it exerts two forces on the metal ions: an electrostatic force resulting from the interaction between the negatively charged electrons and the positively charged ions and a friction force commonly known as the "electron wind" force. In metals, which are good conductors, the electron wind force dominates [d'Heurle 1978-1]. The result of this force is a movement of material in the direction of electron current flow. Electromigration has been observed to occur by lattice diffusion in several bulk metals. However, in the case of thin polycrystalline films of Al and Au, the dominant mechanism for mass transport is migration along grain boundaries. The ratio between grain boundary flux and lattice flux has been estimated at  $10^6$  for Al [d'Heurle 1970].

Grain boundary electromigration may lead to the failure of a conductor when a flux divergence occurs at a point in the material. A local flux divergence may occur at the interface between three adjacent metal grains in a polycrystalline film (the so-called "triple point"). It may also result from variations in grain size or diffusion coefficient. Temperature gradients caused by localized heating also influence the rate and direction of mass transport. The most common failure mode is the formation of a void, which grows in size until its diameter equals the width of the conductor, causing an open circuit. Prior to this catastrophic failure, the increase in resistance caused by the growth of the void may gradually degrade the performance of the circuit. A different failure mode occurs when the flux divergence causes mass accumulation, leading to the formation of a hillock or whisker. This may result in a short circuit to an adjacent conductor, or the rupture of an overlying dielectric.

The process of electromigration is essentially current driven and the operating current density is the primary stress factor. The median time to failure (Mtf) of a conductor is inversely proportional to current density J when J is less than  $10^5 A/cm^2$  and decreases more rapidly for higher current densities. The current densities occurring in modern high-speed VLSI circuits (particularly bipolar) may equal or exceed this value. For example, in a conductor  $2\mu m$  wide and  $0.5\mu m$  thick, a current of only 1mA is sufficient to produce a current density of  $10^5 A/cm^2$ . The downward pressure on design rules can be expected to aggravate the situation [Woods 1984].
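The arithmetic behind the example above can be checked with a short sketch. It assumes a conductor of uniform rectangular cross-section with the current distributed evenly over it; the function names are illustrative.

```python
UM_TO_CM = 1e-4  # 1 micrometre expressed in centimetres


def current_density_a_per_cm2(current_a: float,
                              width_um: float,
                              thickness_um: float) -> float:
    """Average current density J = I / (W * t) in A/cm^2."""
    area_cm2 = (width_um * UM_TO_CM) * (thickness_um * UM_TO_CM)
    return current_a / area_cm2


def max_current_a(j_limit_a_per_cm2: float,
                  width_um: float,
                  thickness_um: float) -> float:
    """Largest current that keeps J at or below the given limit."""
    area_cm2 = (width_um * UM_TO_CM) * (thickness_um * UM_TO_CM)
    return j_limit_a_per_cm2 * area_cm2


if __name__ == "__main__":
    # Conductor from the text: 2 um wide, 0.5 um thick, carrying 1 mA.
    j = current_density_a_per_cm2(1e-3, 2.0, 0.5)
    print(f"J = {j:.3g} A/cm^2")
```

For the conductor quoted in the text the cross-sectional area is $10^{-8} cm^2$, so a current of 1mA corresponds to $10^5 A/cm^2$.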

Temperature is an acceleration factor in electromigration. Activation energies of the order of 0.54eV for grain boundary electromigration have been reported by several authors (see [Black 1974], for example). The statistical distribution of time to failure has generally been found to be lognormal. The median time to failure and standard deviation are strongly influenced by the dimensions of the conductor and the median grain size [Agarwala 1970], [Scoggan 1975], [Kinsbron 1980], [Vaidya 1980], [Iyer 1984].
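The roles of activation energy and the lognormal distribution can be sketched numerically. The sketch below assumes the standard Arrhenius form for thermal acceleration and takes $\sigma$ to be the standard deviation of the natural logarithm of lifetime; both are stated here as assumptions, since the quantitative models are only developed in Chapter 2. The temperatures and lifetime parameters in the example are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.62e-5  # value given in the List of Symbols


def arrhenius_acceleration(ea_ev: float,
                           t_use_c: float,
                           t_stress_c: float) -> float:
    """Ratio of lifetime at t_use_c to lifetime at t_stress_c.

    Assumes lifetime is proportional to exp(Ea / kT) (Arrhenius form).
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


def lognormal_failure_probability(t: float, t50: float, sigma: float) -> float:
    """Cumulative failure probability P(t) for a lognormal lifetime.

    t50 is the median time to failure; sigma is assumed to be the
    standard deviation of ln(t).
    """
    z = math.log(t / t50) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical comparison: 55 C operation versus 85 C stress, Ea = 0.54 eV.
    print(arrhenius_acceleration(0.54, 55.0, 85.0))
    # Hypothetical conductor: median life 10 years, sigma = 0.5, evaluated at 5 years.
    print(lognormal_failure_probability(5.0, 10.0, 0.5))
```

By construction, half of the population has failed at $t = t_{50}$, whatever the value of $\sigma$.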

#### 1.2.2 Time-dependent Dielectric Breakdown of Gate Oxides

Since the development of the MOS transistor, there has been a steady decline in the thickness of the gate oxide used. In the case of transistors used as logic switches, thinner oxides were sought in order to reduce threshold voltages [Sze 1981]. In dynamic RAMs, thinner oxides compensate for the loss of storage capacitance due to shrinking layout dimensions. Oxide thicknesses of 200Å - 300Å are typical of modern MOS processes, with 100Å - 200Å layers being used in 1 MBit DRAMs [Baglee 1986-1]. The absolute breakdown field strength of silicon dioxide is approximately 10MV/cm. The field strength in a dielectric 100Å thick, with 5V applied, is 5MV/cm, or 50% of the absolute breakdown value. As the field strength approaches the breakdown value, there is an increase in both the conduction through the oxide due to Fowler-Nordheim tunneling and the incidence of oxide failure. Both these phenomena are associated with the presence of defects in the oxide, such as trapped particles or pinholes. The distribution of time to failure is lognormal, with applied voltage the primary stress factor. Activation energies of 0.6eV to 0.8eV have been reported for an Al-SiO<sub>2</sub> system [Anolick 1980].
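The field-strength figure quoted above follows from dividing the applied voltage by the oxide thickness. The sketch below assumes a uniform field across an ideal parallel-plate dielectric and uses the approximate breakdown value of 10MV/cm given in the text; the function names are illustrative.

```python
ANGSTROM_TO_CM = 1e-8
BREAKDOWN_FIELD_MV_PER_CM = 10.0  # approximate value quoted in the text


def oxide_field_mv_per_cm(voltage_v: float, thickness_angstrom: float) -> float:
    """Uniform field E = V / t_ox, returned in MV/cm."""
    thickness_cm = thickness_angstrom * ANGSTROM_TO_CM
    return voltage_v / thickness_cm / 1e6


def fraction_of_breakdown(voltage_v: float, thickness_angstrom: float) -> float:
    """Oxide field as a fraction of the assumed breakdown field."""
    field = oxide_field_mv_per_cm(voltage_v, thickness_angstrom)
    return field / BREAKDOWN_FIELD_MV_PER_CM


if __name__ == "__main__":
    # Oxide thicknesses mentioned in the text, with 5 V applied.
    for t_ox in (300.0, 200.0, 100.0):
        field = oxide_field_mv_per_cm(5.0, t_ox)
        frac = fraction_of_breakdown(5.0, t_ox)
        print(f"{t_ox:5.0f} A: {field:.2f} MV/cm ({frac:.0%} of breakdown)")
```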

#### 1.2.3 Threshold-voltage Shifting Effects in MOS Devices

Some physical phenomena may give rise to a shift in the threshold voltage $V_T$ of a MOS transistor after manufacture. This is not a catastrophic failure, but a sufficiently large shift in $V_T$ will cause the circuit to malfunction. The time to failure depends on the criterion of failure, i.e. the maximum $V_T$ shift which will still allow the particular circuit to function correctly. Threshold shifts are caused by mobile ions trapped in the gate oxide [Lycoudes 1980] and hot-carrier injection from the channel into the gate oxide [Eitan 1981], [Takeda 1982], [Sabnis 1986]. The incidence of the former is related to Sodium ion contamination during the fabrication process and may be largely eliminated through careful processing. Hot-carrier injection is prevalent in short-channel transistors, where large electric fields occur in the vicinity of the drain. When the electric field exceeds approximately 100kV/cm, the carriers absorb more energy from the field than they are able to lose through scattering and their net energy level increases, relative to the energy levels of conduction and valence bands. Because these carriers are no longer in thermal equilibrium with the lattice, they are known as "hot-carriers". The hot-carriers produce more electron-hole pairs through impact ionization. Once a certain energy threshold is reached, this process becomes self-sustaining and avalanche multiplication begins. In an n-channel device, most of the electrons produced by impact ionization are collected by the drain, while the holes flow to the substrate to form the substrate current. The magnitude of the substrate current may be used as a measure of the incidence of hot-carrier formation in transistors. A small percentage of the hot carriers will have sufficient energy to surmount the $Si\text{-}SiO_2$ energy barrier and may be injected into the oxide. Both holes and electrons may enter the oxide in this manner, although the effect of hot-electrons has been more widely reported. The energy barrier for electrons is 3eV, as opposed to 4eV for holes, which suggests that most hot-carrier injection is the result of electrons. An injected charge $\delta Q$ gives rise to a shift in threshold voltage of $\delta V = \delta Q/C_{ox}$. The density of fast states at the $Si\text{-}SiO_2$ interface near the drain is also increased.
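The relation $\delta V = \delta Q/C_{ox}$ can be given a sense of scale with a minimal sketch. It assumes that $\delta Q$ and $C_{ox}$ are both expressed per unit area, that the oxide behaves as a parallel-plate capacitor with relative permittivity 3.9, and that the trapped charge has its full effect on the threshold voltage. Only the magnitude of the shift is computed. The trapped-carrier density and oxide thickness in the example are hypothetical.

```python
EPS0_F_PER_CM = 8.854e-14      # permittivity of free space
EPS_R_SIO2 = 3.9               # relative permittivity of SiO2
ELECTRON_CHARGE_C = 1.602e-19  # magnitude of the electron charge
ANGSTROM_TO_CM = 1e-8


def oxide_capacitance_f_per_cm2(t_ox_angstrom: float) -> float:
    """Parallel-plate gate capacitance per unit area, C_ox = eps / t_ox."""
    return EPS_R_SIO2 * EPS0_F_PER_CM / (t_ox_angstrom * ANGSTROM_TO_CM)


def threshold_shift_v(trapped_carriers_per_cm2: float,
                      t_ox_angstrom: float) -> float:
    """Magnitude of the threshold shift, delta_V = delta_Q / C_ox."""
    delta_q = ELECTRON_CHARGE_C * trapped_carriers_per_cm2  # C/cm^2
    return delta_q / oxide_capacitance_f_per_cm2(t_ox_angstrom)


if __name__ == "__main__":
    # Hypothetical case: 1e11 trapped electrons per cm^2 in a 250 A oxide.
    print(f"{threshold_shift_v(1e11, 250.0):.3f} V")
```

For a fixed trapped charge the shift is proportional to oxide thickness under these assumptions, since $C_{ox}$ varies inversely with thickness.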

When constant-voltage scaling is applied to the device, the magnitude of the electric field increases and with it the likelihood of hot-carrier injection. The time to failure has been found to be a negative exponential function of applied voltage [Sabnis 1986]. Solutions to this problem which have been proposed are the graded drain transistor [Takeda 1982] and the lightly doped drain (LDD) transistor [Ogura 1981]. Both these techniques attempt to reduce the electric field in the vicinity of the drain. Unfortunately, it has been found that structures fabricated in this manner are more susceptible to damage by electrostatic discharge (see paragraph 1.2.5) [Duvvury 1986].

#### 1.2.4 Alpha-particle Induced Soft Errors

The phenomenon of soft errors in DRAMs caused by incident $\alpha$-particles was first reported in 1979 [May 1979]. These particles, which originate in the metals used in the packaging and interconnect layers, may have high energy levels. When an $\alpha$-particle passes through silicon, it produces electron-hole pairs by impact ionization along its path. This series of electron-hole pairs is known as an $\alpha$-particle track. The charge generated in this manner may cause a soft error if it is collected by a memory storage cell, or if it reaches one of the bit lines during the time interval when the bit lines are floating. The latter effect may be reduced by minimizing the ratio of bit-line floating time to cycle time. As the size of DRAM storage cells decreases, the critical charge $Q_{crit}$ needed to change the state of the cell is reduced accordingly. Therefore scaling the memory cell increases its susceptibility to soft errors.

Some techniques for reducing the Soft Error Rate (SER) of DRAM cells by structural modification have been proposed. The so-called HI-C technique [Tasch 1978] uses a shallow implant below the storage node to increase the capacitance and to reduce the storage node depletion volume. The result is a memory cell with an increased critical charge and a reduced ability to collect carriers generated by $\alpha$-particles. Sai-Halasz has proposed a blanket buried n-type grid in the p-type substrate of the DRAM, which acts as a collector for the excess electrons [Sai-Halasz 1982]. The use of trench capacitor structures has also been found to reduce the SER [Ishiuchi 1986], [Baglee 1986-2].

#### 1.2.5 Electrostatic Discharge (ESD)

Damage to an IC through ESD may occur when a static charge buildup on an external object is allowed to discharge through one of the pins on the integrated circuit. Such static charges are caused by triboelectric generation or electrostatic induction [Moss 1986]. The results of the discharge may include dielectric breakdown, junction breakdown, metallization damage, latchup, soft errors and the creation of latent defects, which fail at some future time. Methods of avoiding ESD damage fall into two categories:

- Avoidance of static charge buildup by improved handling and assembly techniques;
- Design of protection networks for the input and output pins. Typical VLSI input protection networks include a thick-oxide nMOS transistor providing primary protection and a diffused resistor and a field plate diode for secondary protection. For output protection, it is possible to design the output driver transistor to safely withstand 5kV discharges [Duvvury 1986].

The trend toward VLSI has increased ESD susceptibility, due to the use of thinner dielectrics and reduced conductor spacings.

#### 1.3 A Statistical Approach to Design for Reliability

It has been shown for several important failure mechanisms that the shrinking of layout design rules has a negative impact on reliability. Razdan and Strojwas define layout design rules as follows [Razdan 1986]:

Layout design rules are constraints placed on the designer by the process capabilities. The rules are set such that chips following the rules will have an acceptable yield and the number of circuits which may be placed on the chip is maximized.

This definition should be extended, by including the requirement that the design rules should also result in circuits having an acceptable reliability.

The requirements for optimization of yield and reliability may be quite different. In developing design rules based on yield considerations, the aim is to determine a set of dimensional constraints (such as minimum size, minimum overlap, minimum spacing) for each mask layer which will guarantee some minimum yield figure, given that the circuit is subject to catastrophic failure caused by shorts, breakdown or parasitic effects. Design rules are determined by considering the effect on circuit operation of disturbances in the fabrication process. These disturbances include local variations such as spot defects and global variations caused by underetching or misalignment. The design rules determined in this way generally have the property that they are absolute, in that they do not depend on the electrical variables (voltage, current) in particular parts of the circuit. They are also global, being valid for all parts of a layout or several layouts using the same technology.

Design rules for reliability, on the other hand, must limit the electrical stress applied to each part of the circuit. For example, a design rule for electromigration which has been widely used in the past states that the current density in a conductor may not exceed $10^5 A/cm^2$. This rule is a relative one, because it requires a priori information about the current in a conductor before the conductor width can be established. It is also a local rule, in that its application will result in different conductor widths being used in different parts of the circuit.
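The local nature of the maximum current density rule can be illustrated with a short calculation. The following is a minimal sketch, assuming a rectangular conductor cross-section so that $J = I/(WT)$; the function name, the units and the example currents are illustrative choices and do not come from any existing tool.

```python
def min_conductor_width_um(current_ma, thickness_um, j_max_a_per_cm2=1e5):
    """Smallest width (micrometres) keeping J = I / (W * T) at or below j_max.

    Assumes a rectangular cross-section; the default j_max is the
    10^5 A/cm^2 design rule quoted in the text.
    """
    current_a = current_ma * 1e-3          # mA -> A
    thickness_cm = thickness_um * 1e-4     # um -> cm
    width_cm = current_a / (j_max_a_per_cm2 * thickness_cm)
    return width_cm * 1e4                  # cm -> um


# Different branch currents lead to different minimum widths (hypothetical values).
for current_ma in (0.1, 1.0, 5.0):
    width = min_conductor_width_um(current_ma, thickness_um=0.5)
    print(f"I = {current_ma} mA -> W >= {width:.1f} um")
```

The rule yields a width only once the branch current is known, which is why it cannot be checked from the mask geometry alone.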

While design-for-reliability rules such as this have proven valuable to designers in the past, they have limitations. Firstly, they do not give any quantitative indication of the reliability of a structure (e.g. a conductor or transistor) designed according to the stated rules. Consequently, the designer is not able to predict the reliability of the structure during the design phase, nor is it possible to combine the reliability figures for all structures in order to predict the overall reliability of the IC. Secondly, these rules do not explicitly indicate the relationship between reliability and chip surface area and so it is not possible to globally optimize the trade-off between chip size and reliability. The downward pressure on design rules may demand that future designs be optimized in this manner. Thirdly, it is often difficult to interpret the rules correctly. For example, it is not clear how the maximum current density rule should be applied to circuits where currents are complex functions of time.

This thesis proposes that design-for-reliability should not be based solely on design rules, but that predictive models for failure mechanisms be used to predict the actual reliability of each device or conductor. The choice of a critical dimension should be based on the actual reliability implications of that choice. This may be weighed against other considerations, such as area or circuit performance. If this approach is applied to all the significant failure modes, it will become possible to predict the reliability of the IC as a whole, during the design phase.

This approach offers two significant benefits for the IC manufacturer:

- the ability to produce more optimized designs, and
- the ability to predict product reliability without resorting to costly qualification procedures.

#### 1.4 Integrating Design-for-Reliability with the CAD Environment

To be effective, this design-for-reliability strategy must be supported by the CAD environment. Software tools must be developed to provide reliability data to the design data base, which may be used to create new designs, or to modify existing ones. Much of the data required for reliability prediction is already available in the design data base and new tools should make use of this wherever possible.

Two paradigms for CAD tools supporting design-for-reliability are proposed.

- A Reliability Analysis Tool: This tool makes use of the layout description file, simulation input waveforms and process information to compute the reliability of a chip in the presence of various failure mechanisms. Reliability is evaluated in a piecewise manner, with reliability data on specific failure mechanisms or structures stored individually in the data base. It is possible to interrogate this data base interactively.
- Synthesis Tools: Tools which perform the operations of device design, placement, routing and compaction, taking reliability into consideration. These tools are designed to optimize the trade-off between reliability and other design parameters.

The Reliant program described in Chapter 3 is an example of a reliability analysis tool.

#### 1.5 Previous Work

When work on this thesis began in June 1985, no reports of software tools for IC reliability prediction existed in the literature. In June 1986, the author and Prof. K.F. Poole submitted a paper for publication in the IEEE Transactions on Reliability, describing methods for the implementation of such a program [Frost 87]. In December 1986, two papers were presented at the International Conference on Computer-aided Design (ICCAD-86) which partially addressed the problem. [Hall 1986] described a program called SPIDER, which determines the current density in interconnects using a modified finite difference approach. [Hohol 1986] described RELIC, a program which computes cumulative wear functions for a number of failure modes. These papers address aspects of the problem, but do not provide a predictive method of assessing the reliability of a VLSI design, in terms of hazard rate, probability density function (Pdf) or cumulative distribution function (Cdf). Hence this work is the only one which offers a solution to this problem.

#### 1.6 Summary of this Thesis

Mathematical and statistical models for IC interconnect reliability are developed in Chapter 2. System models for reliability are introduced and a method for predicting the reliability of an IC, based on a systems approach is presented. This model is used to predict the reliability of a VLSI interconnect system which is subject to electromigration. The limitations on the applicability of this model are discussed in detail. Models for the dependence of interconnect reliability parameters on current density, physical dimensions and temperature are developed, based on a critical analysis of the available literature. Methods for determining these parameters are presented.

Chapter 3 is devoted to the design of the Reliant software analysis tool for VLSI interconnects, which is based on the mathematical and statistical models developed in Chapter 2. The structure of this program is described and the choice of algorithms and data structures is explained.

In Chapter 4, the performance of Reliant is evaluated using two VLSI leaf cells. The limitations of reliability prediction tools based on circuit simulators are discussed and an alternative approach to MOS VLSI prediction using a switch-level simulator, is proposed.

Chapter 5 summarizes the results achieved and suggests some directions for future research.

### Chapter 2

# A RELIABILITY MODEL OF AN INTERCONNECT SYSTEM

#### 2.1 Introduction

In this chapter, a reliability model of a VLSI interconnect system is developed. The interconnect system is subject to wearout due to electromigration induced void formation.

There were several reasons for the choice of this particular reliability problem. Interconnects are becoming an increasingly important factor in VLSI, occupying a large percentage of the die area and limiting the speed performance. It is expected, therefore, that interconnect reliability will have a major influence on overall chip reliability. The large number of papers on electromigration which have appeared since 1970 support this view and also provide a useful source of experimental data. A significant feature of electromigration is that it is not confined to individual active devices. As the interconnect pattern is more closely associated with circuit design than with device design, a method for designing reliable interconnects is of particular value to the designers of circuits and software tools for automated layout.

The method was developed with the following goals in mind.

- It should be "design-sensitive", reflecting the factors which are influenced by the circuit designer, rather than the process engineer.
- The algorithms used should be simple enough to enable complex circuits to be analyzed in a reasonable time.
- Because the electromigration process is not yet fully understood, the best available physical models should be used. Software should be constructed in such a way that better models may be incorporated as they become available.

#### 2.2 A System Model of the Reliability of VLSI Interconnect Patterns

The interconnect patterns found in VLSI circuits are highly complex. However, these patterns generally consist of collections of a small number of features, such as straight conductors, contact cuts and vias. These features will be referred to as *segments*. The reliability of each segment depends on the current density in the segment and its dimensions. Assume that

- the reliability of each segment is known;
- the reliability of each segment is independent of all other segments.

The interconnect pattern as a whole may then be viewed as a system of segments, and the reliability thereof may be determined using the theory of system reliability.

Fig. 2.1 shows a block diagram of an *n*-component series-connected system. A system of this type has no redundancy and will fail when any one of its components fails. The reliability of a series-connected system is equal to that of the minimum order statistic of the reliability of its components. The minimum order statistic represents the "weakest" component in the system. In Fig. 2.2, a parallel-connected system having *n* parallel paths is shown. This circuit possesses redundancy, as all *n* components must fail to cause system failure. The reliability of this system is described by the reliability of the maximum order statistic, or "strongest" component. The reliability of complex systems may be determined by the decomposition of parallel and serial paths.
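The series and parallel cases, and their decomposition, can be sketched in a few lines. This is an illustrative sketch only, assuming statistically independent components whose reliabilities at some fixed time are already known; the function names and the numerical values are hypothetical.

```python
from math import prod


def series_reliability(reliabilities):
    """A series system survives only if every component survives."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """A parallel system fails only if every component has failed."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Decomposition of a mixed system: two redundant paths in parallel,
# followed in series by one non-redundant component (hypothetical values).
redundant_pair = parallel_reliability([0.9, 0.9])
system = series_reliability([redundant_pair, 0.95])
print(f"pair = {redundant_pair:.4f}, system = {system:.4f}")
```

The series result can never exceed the reliability of its weakest component, and the parallel result can never fall below that of its strongest component.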

The pattern of a complete interconnect mask layer may be treated as a series-connected system if and only if the failure of any segment therein will lead to the failure of the IC. This restriction therefore excludes chips having built-in redundancy at the functional level. Furthermore, the interconnect pattern itself may not contain redundant paths. In the remainder of this thesis, non-redundant circuits will be assumed.

In defining a set of segment types, the following factors were considered.

- The set must be comprehensive, i.e. it must include all features commonly occurring in VLSI interconnect patterns.
- It must be sufficiently "fine-grained" to model the simplest features which may occur individually in an interconnect pattern.
- The condition of mutual statistical independence of the segments (i.e. the reliability of any segment is independent of the states of all other segments) must always be satisfied.
- It must be possible to characterize the reliability of each segment type independently by means of test structure measurements.

Figure 2.1: An n-component series-connected system

The following five segment types were chosen, based on these considerations (see Fig. 2.3).

- A straight conductor of length L, width W. This will be referred to as a Run segment.
- A Contact segment containing a contact cut of width W, height H, and a surrounding area of metal. A contact only occurs on the lower metal layer (Metal1) in the case of a multilayer metallization process.
- A Via segment, containing a via cut of width W, height H, and a surrounding area of metal. A via only occurs in a multilayer metallization process, between Metal1 and Metal2 layers.
- A Step segment, which occurs when a metal track runs over a discontinuity in the underlying surface. This segment type was included to model the conductor thinning and stress concentration which may occur in such cases. In the following discussion, a step will only be considered to have occurred when a Metal1 conductor crosses a Polysilicon line at a 90° angle. It is assumed that the intermetal dielectric has sufficiently planarized the surface that steps will not occur on Metal2. Note that the length L of this segment is defined as being in the direction of current flow and the width W in the orthogonal direction.
- A Pad segment is used to model the reliability of a bonding pad.

Figure 2.2: An n-component parallel-connected system

Each segment has some dimensional parameters associated with it. The process of fracturing the interconnect pattern into a set of segments consists of identifying instances of the segment primitives and determining the dimensional parameters of each instance. The requirement of statistical independence places two restrictions on the points at which a conductor is fractured.



Figure 2.3: Set of conductor segment types used to model reliability

- The length of a segment must be greater than a minimum value. A void which forms in a grain boundary will generally have a diameter smaller than the median grain size, which is typically 3μm or less. Also, there is usually a hillock of material associated with the formation of a void, which may drift some distance along the conductor. For statistical independence, the minimum segment size must contain the region of void and hillock growth. For the first 10% of failures, it has been found that voids are generally smaller than the grain size and that when a hillock forms, it always does so within 10μm of a void [LaCombe 1986]. It appears reasonable, therefore, to take 3μm as the absolute minimum segment length, with 10μm a more conservative value.
- The flux divergence must be approximately zero at the interface between two segments and the current flow direction must be perpendicular to the interface. This ensures that the current flow pattern in a segment is not influenced by the patterns in adjacent segments.

A restriction must also be placed on the maximum current density, to avoid thermal interaction. This is discussed in paragraph 2.5.2. The segment dimensions shown in Fig. 2.3 are derived as follows.

- Runs: The segment has a length $L \geq 10 \mu m$ and width W.
- Contacts and Vias: These segments consist of a cut of width $W_c$ and length $L_c$. The surrounding metal width is taken as $2 \cdot \mathit{Lambda}$, where Lambda is the minimum feature size of the fabrication process. The overall metal width is therefore $W = W_c + 4 \cdot \mathit{Lambda}$. The distance from the interface to the edge of the cut is taken as W, at which point the flux divergence is approximately zero [Horowitz 1983]. The overall metal length is therefore $L = L_c + W_c + 4 \cdot \mathit{Lambda}$.
- Steps: A step has a width W and a length $L \geq 3 \mu m$.
- Pads: A pad has a length $L_p$ which is the length of the bond wire contact area and a width $W_p$.
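The dimension rules above can be expressed as a small helper. This is a hedged sketch that simply transcribes the two expressions for W and L of a Contact or Via segment and the minimum lengths of Run and Step segments; the function names, the use of micrometres and the example values are assumptions made for illustration, not part of Reliant.

```python
MIN_RUN_LENGTH_UM = 10.0   # conservative minimum segment length from the text
MIN_STEP_LENGTH_UM = 3.0   # absolute minimum segment length from the text


def cut_segment_dimensions(cut_width, cut_length, lam):
    """Overall metal (W, L) of a Contact or Via segment.

    Implements W = W_c + 4*Lambda and L = L_c + W_c + 4*Lambda,
    with all arguments in the same length unit.
    """
    width = cut_width + 4.0 * lam
    length = cut_length + cut_width + 4.0 * lam
    return width, length


def is_valid_run(length_um):
    """A Run segment must be at least 10 um long."""
    return length_um >= MIN_RUN_LENGTH_UM


def is_valid_step(length_um):
    """A Step segment must be at least 3 um long."""
    return length_um >= MIN_STEP_LENGTH_UM


# Hypothetical 2 um x 2 um contact cut in a process with Lambda = 1 um.
w, l = cut_segment_dimensions(cut_width=2.0, cut_length=2.0, lam=1.0)
print(f"W = {w} um, L = {l} um")
```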

#### 2.3 Characterizing the Reliability of Conductor Segments

The approach developed in the preceding paragraph assumes that the reliability of each segment is known. This data must be acquired by means of tests on a range of test structures. For accurate reliability prediction, the reliability parameters of the manufacturing process must be fully characterized in this manner, on a regular basis.

#### 2.4 Minimum Order Statistics

Models for the reliability of the minimum order statistic of a collection of segments will now be presented.

Each segment has a probability density function (Pdf) of lifetime, p(t). The associated probability that the segment has failed, or cumulative distribution function (Cdf), is P(t). Taking the time at which stress is applied to the segment as 0, the Cdf is the integral of the Pdf from time 0 to time t:

$$P(t) = \int_0^t p(\tau)d\tau \tag{2.1}$$

The reliability of the segment R(t), which is the probability that it has not failed by time t, is

$$R(t) = 1 - P(t) \tag{2.2}$$

The hazard function, or instantaneous failure rate h(t) is the rate at which segments would fail, if a large number of segments were evaluated.

$$h(t) = \frac{p(t)}{1 - P(t)} \tag{2.3}$$

Consider an interconnect pattern consisting of n segments, each having a different Pdf  $p_i(t)$  and Cdf  $P_i(t)$ . The Cdf of the interconnect pattern  $P_s(t)$  is given by

$$P_s(t) = 1 - \prod_{i=1}^{n} (1 - P_i(t)) \tag{2.4}$$

For the special case where all segments are identical, this reduces to

$$P_s(t) = 1 - (1 - P(t))^n \tag{2.5}$$

It can also be shown that the instantaneous failure rate  $h_s(t)$  of the interconnect pattern is simply the sum of the failure rates of the individual segments:

$$h_s(t) = \sum_{i=1}^n h_i(t) \tag{2.6}$$

The proof of this theorem which is found in most statistics textbooks uses a Markov chain model, where each element has an exponential lifetime distribution. In fact, this is a general result which is independent of the nature of the distribution used. A proof of this theorem appears in Appendix A.
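Equations (2.4) and (2.6) can be checked numerically against one another. The sketch below is illustrative only: it uses exponential segment lifetimes purely because their hazard is a known constant, and it compares the sum of the segment hazards with the hazard obtained by applying equation (2.3) directly to the system Cdf. All function names and rates are hypothetical.

```python
import math


def exponential_segment(rate):
    """Return (pdf, cdf) callables for an exponential lifetime; illustrative only."""
    return (lambda t: rate * math.exp(-rate * t),
            lambda t: 1.0 - math.exp(-rate * t))


def system_cdf(cdfs, t):
    """Equation (2.4): the pattern has failed once any one segment has failed."""
    survival = 1.0
    for cdf in cdfs:
        survival *= 1.0 - cdf(t)
    return 1.0 - survival


def system_hazard(pdfs, cdfs, t):
    """Equation (2.6): sum of the segment hazards h_i = p_i / (1 - P_i)."""
    return sum(pdf(t) / (1.0 - cdf(t)) for pdf, cdf in zip(pdfs, cdfs))


def hazard_from_cdf(cdfs, t, dt=1e-3):
    """Equation (2.3) applied to the system Cdf, with a central difference for p_s."""
    p_s = (system_cdf(cdfs, t + dt) - system_cdf(cdfs, t - dt)) / (2.0 * dt)
    return p_s / (1.0 - system_cdf(cdfs, t))


segments = [exponential_segment(rate) for rate in (1e-3, 2e-3, 5e-4)]
pdfs = [pdf for pdf, _ in segments]
cdfs = [cdf for _, cdf in segments]
print(system_hazard(pdfs, cdfs, 100.0), hazard_from_cdf(cdfs, 100.0))
```

Both routes give the same system hazard to within the accuracy of the numerical derivative; the same comparison can be repeated with any other pair of Pdf and Cdf callables.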

As mentioned in Chapter 1, a maximum failure rate of 10 FIT is considered acceptable for the chip as a whole. This extremely low failure rate is attained when $P_i(t) \ll 1$. When considering only the "early failures", the following approximations may be made:

$$P_s(t) \approx \sum_{i=1}^n P_i(t) \tag{2.7}$$

$$h_i(t) \approx p_i(t) \tag{2.8}$$

$$p_s(t) \approx \sum_{i=1}^n p_i(t) \tag{2.9}$$

The method described here is not limited to conductor failure. It may be applied across several failure mechanisms, provided that the aforementioned assumptions remain valid. Equations 2.4 and 2.6 may be used to predict the Cdf and failure rate of the IC, taking all failure mechanisms into account.
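The quality of the early-failure approximation (2.7) is easily examined numerically. The following sketch, with hypothetical segment counts and failure probabilities, compares the exact system Cdf of equation (2.4) with the simple sum.

```python
from math import prod


def exact_system_cdf(segment_cdfs):
    """Equation (2.4) evaluated at one instant for given values of P_i."""
    return 1.0 - prod(1.0 - p for p in segment_cdfs)


def approx_system_cdf(segment_cdfs):
    """Equation (2.7): valid when every P_i is much smaller than 1."""
    return sum(segment_cdfs)


# 10^4 segments, each with a (hypothetical) failure probability of 10^-8.
p_values = [1e-8] * 10_000
exact = exact_system_cdf(p_values)
approx = approx_system_cdf(p_values)
print(f"exact = {exact:.6e}, approx = {approx:.6e}, "
      f"relative error = {(approx - exact) / exact:.1e}")
```

The sum always errs on the pessimistic side, since it is an upper bound on the exact value, and the relative error grows as the system Cdf itself becomes appreciable.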

#### 2.5 Statistical Models of Electromigration Failure

The choice of distributions for the conductor segments, and the dependency of statistical parameters on various factors will now be considered.

#### 2.5.1 Distribution of Conductor Lifetimes

In the following discussion, the lifetime of a conductor segment refers to the time at which an open-circuit occurs. However, the analysis method presented here may also be applied to other lifetime definitions, e.g. the time at which a 10% resistance increase has taken place.

In the literature, the lognormal distribution is used by most authors to model electromigration failure (see [Black 1974], for example). This distribution is generally found to fit experimental lifetest data well. The form of the lognormal Pdf is

$$p(t) = \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(-\frac{1}{2} \left[\frac{\ln t - \ln t_{50}}{\sigma}\right]^2\right) \tag{2.10}$$

The lognormal distribution is characterized by two parameters, the median time to failure (Mtf)  $t_{50}$  and the standard deviation  $\sigma$ . The lognormal distribution is formed by substituting  $x = \ln(t)$  and  $\mu = \ln(t_{50})$  in a normal distribution  $N(x, \mu, \sigma)$ . Note that unlike the normal distribution, the median is not equal to the mean in a lognormal distribution. Also, the parameter  $\sigma$  is the square root of the variance of the prototype normal distribution and not of the lognormal distribution itself. Fig. 2.4 shows the Pdf and Cdf of a lognormal distribution having  $t_{50} = 1$  and  $\sigma = 1$ .
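The lognormal Pdf of equation (2.10) and its Cdf can be evaluated directly with the standard error function. This is a minimal sketch; the function names are illustrative, and the closed form used for the mean, $t_{50}\exp(\sigma^2/2)$, is the standard result for the lognormal distribution rather than an expression taken from this thesis.

```python
import math


def lognormal_pdf(t, t50, sigma):
    """Equation (2.10): lognormal Pdf with median t50 and shape parameter sigma."""
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)


def lognormal_cdf(t, t50, sigma):
    """Lognormal Cdf: the standard normal Cdf evaluated at z = ln(t / t50) / sigma."""
    z = (math.log(t) - math.log(t50)) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def lognormal_mean(t50, sigma):
    """Mean of the lognormal distribution, which exceeds the median for sigma > 0."""
    return t50 * math.exp(0.5 * sigma * sigma)


print(lognormal_pdf(1.0, 1.0, 1.0), lognormal_cdf(1.0, 1.0, 1.0), lognormal_mean(1.0, 1.0))
```

With $t_{50} = 1$ and $\sigma = 1$, the Cdf evaluated at the median is exactly one half, while the mean lies above the median.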

Figure 2.4: p(t) and P(t) for the lognormal distribution $(t_{50} = 1, \sigma = 1)$

The theoretical basis for the use of the lognormal distribution to model thermally-activated failure mechanisms such as electromigration is now considered. In a polycrystalline metal thin film with an ideally textured grain, the atomic flux along a grain boundary is [d'Heurle 1978-2]:

$$J_b = \frac{N_b \delta D_b Z_b^* q \varepsilon}{dkT} \tag{2.11}$$

where  $N_b$  is the atomic density,  $\delta$  is the effective width of the boundary, d is the average grain size,  $D_b$  is the diffusion coefficient,  $Z_b^*q$  is the effective charge, and  $\varepsilon$  is the applied electric field. The diffusion coefficient is a negative exponential function of temperature:

$$D_b = D_o \exp\left(-\frac{E_a}{kT}\right) \tag{2.12}$$

Substituting 2.12 in 2.11 and writing  $\varepsilon = \rho J$  where  $\rho$  is the resistivity of the metal and J the electron current density, we obtain

$$J_b = \frac{N_b \delta Z_b^* q \rho J D_o}{dkT} \exp\left(-\frac{E_a}{kT}\right) \tag{2.13}$$

The time to failure  $t_L$  is inversely proportional to the rate of mass transport. Neglecting the absolute temperature term below the line in equation (2.13), the following expression for  $t_L$  is obtained:

$$t_L = \frac{C}{J} \exp\left(\frac{E_a}{kT}\right) \tag{2.14}$$

where C is a constant. The inverse exponential dependence on absolute temperature (Arrhenius relationship) shows the influence of thermal energy level on atomic mobility.

Equation (2.14) shows that $t_L$ would be lognormally distributed if the thermal energy levels are normally distributed. Therefore, apart from its usefulness in describing measured data, the use of the lognormal distribution would appear to have a good theoretical basis. However, an inconsistency arises when considering conductors of different lengths. As the lognormal distribution does not have the property of closure, a series-connected system of segments, each having a lognormal lifetime distribution, will have a lifetime distribution which is not lognormal. This poses a theoretical problem when comparing conductors having different lengths. Consider a conductor of length L, which has a lognormal lifetime distribution. If two of these conductors are connected in series, the resulting conductor of length 2L will not have a lognormal lifetime distribution. In practice, the lognormal distribution is used to fit conductor lifetime data regardless of the conductor length.

It has been shown empirically that the minimum order statistic produces an approximately lognormal distribution for early failures (see Appendix B). A lognormal distribution is fitted to the minimum order statistical distribution in the region $t \ll t_{50}$ by choosing appropriate values of $t_{50}$ and $\sigma$. These fitted parameter values are called $t_{50}'$ and $\sigma'$ respectively. Fig. 2.5 shows the result for $n=10^2$, $10^4$ and $10^6$ elements. A good correspondence is obtained over 7 decades of p(t), for all values of n. This is consistent with the experimental observations of LaCombe et al. that the early failures are lognormally distributed for various conductor lengths [LaCombe 1986]. Figs. 2.6 and 2.7 show the normalized variation of $t_{50}'$ and $\sigma'$ with n, respectively. These graphs may be used to determine the parameters of a single via from the parameters of a via chain test structure.
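The idea of fitting a lognormal distribution to the early-failure tail of the minimum order statistic can be sketched as follows. This is not the fitting procedure of Appendix B: it is a simplified two-point fit, assumed here for illustration, which forces the fitted lognormal Cdf through the minimum order Cdf of equation (2.5) at two early times. The function names and the choice of the two fitting points (5 and 6 standard deviations below the median of a single segment) are assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used only for its inverse Cdf


def lognormal_cdf(t, t50, sigma):
    """Lognormal Cdf, written with erfc so that small tail values stay accurate."""
    z = (math.log(t) - math.log(t50)) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def min_order_cdf(t, t50, sigma, n):
    """Cdf of the weakest of n identical, independent segments (equation 2.5)."""
    p = lognormal_cdf(t, t50, sigma)
    return -math.expm1(n * math.log1p(-p))  # stable form of 1 - (1 - p)**n


def fit_early_lognormal(t50, sigma, n, z_lo=-6.0, z_hi=-5.0):
    """Two-point lognormal fit to the early-failure tail; returns (t50', sigma')."""
    ln_lo = math.log(t50) + z_lo * sigma
    ln_hi = math.log(t50) + z_hi * sigma
    q_lo = STD_NORMAL.inv_cdf(min_order_cdf(math.exp(ln_lo), t50, sigma, n))
    q_hi = STD_NORMAL.inv_cdf(min_order_cdf(math.exp(ln_hi), t50, sigma, n))
    sigma_fit = (ln_hi - ln_lo) / (q_hi - q_lo)
    t50_fit = math.exp(ln_lo - sigma_fit * q_lo)
    return t50_fit, sigma_fit


for n in (1, 100, 10_000):
    t50_fit, sigma_fit = fit_early_lognormal(t50=1.0, sigma=1.0, n=n)
    print(f"n = {n:>6}: t50'/t50 = {t50_fit:.3f}, sigma'/sigma = {sigma_fit:.3f}")
```

For n = 1 the fit returns the original parameters, and for larger n both fitted parameters decrease, in qualitative agreement with the trend described above. The numerical values depend on the chosen fitting points and should not be read as a substitute for Figs. 2.6 and 2.7.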

#### 2.5.2 Mtf Dependency on DC Current Density

The Mtf of the lifetime distribution is described by equation 2.14, with constant C taking a specific value. This constant will be denoted by the symbol G, to indicate its dependence on geometric factors:

$$t_{50} = \frac{G}{J} \exp\left(\frac{E_a}{kT}\right) \tag{2.15}$$



Figure 2.5:  $\log(p(t))$  vs.  $\log(t)$  for minimum order statistic and lognormal distribution

This first-order theoretical expression does not adequately model the dependence of Mtf on current density at all current levels. The following expression, originally due to [Black 1968], has been widely used:

$$t_{50} = \frac{G}{J^n} \exp\left(\frac{E_a}{kT}\right) \tag{2.16}$$

where n is an exponent whose reported values in the literature vary from 1 to 5 [Huntingdon 1961], [Black 1968], [Blair 1970]. The measured values of n show an increase from 1 for  $J \leq 10^5 A/cm^2$  to 5 for  $J \geq 2 \times 10^6 A/cm^2$ . The theoretical expression 2.15 therefore only agrees with measured results for  $J \leq 10^5 A/cm^2$ . In addition to the problem of choosing the correct value of n, variations in the value of  $E_a$  with stress have also been reported [Partridge 1985].
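The sensitivity of an extrapolation based on equation (2.16) to the choice of n can be illustrated numerically. In the sketch below, the geometric constant G cancels when the ratio of two Mtf values is taken. The activation energy, the test conditions and the operating conditions are hypothetical values chosen only for illustration, and the function names are not taken from any existing tool.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def black_mtf(j, temp_k, g, n, ea_ev):
    """Equation (2.16): t50 = G / J**n * exp(Ea / kT)."""
    return g / j ** n * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


def acceleration_factor(j_test, t_test_k, j_use, t_use_k, n, ea_ev):
    """Ratio t50(use) / t50(test) implied by equation (2.16); G cancels."""
    thermal = ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_test_k)
    return (j_test / j_use) ** n * math.exp(thermal)


# Hypothetical stress test at 2e6 A/cm^2 and 473 K, extrapolated to
# 1e5 A/cm^2 and 353 K with an assumed activation energy of 0.5 eV.
for n in (1.0, 2.0, 3.0):
    af = acceleration_factor(2e6, 473.0, 1e5, 353.0, n, ea_ev=0.5)
    print(f"n = {n}: t50(use) / t50(test) = {af:.3e}")
```

Each unit increase in n multiplies the extrapolated lifetime by the ratio of the two current densities, here a factor of 20, which shows how strongly the prediction depends on this exponent.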

Figure 2.6: $t'_{50}/t_{50}$ vs. n

Modelling the effects of stress and temperature on Mtf is extremely important for accurate reliability prediction, because measurements are always performed on test structures at elevated temperatures and current densities. The results of these tests are extrapolated to normal operating conditions using expressions such as 2.16. The use of incorrect values of n and $E_a$ can lead to highly inaccurate predictions, when extrapolating over several hundred degrees of temperature and two or three decades of current density. Recently, McPherson proposed that the variations in n and $E_a$ may be accounted for by using a generalized Eyring model for electromigration, in which the reaction rate is determined by the stress-dependent free energy of activation [McPherson 1986]. Writing McPherson's result in a slightly different form, we obtain the following expression for Mtf:

$$t_{50} = \frac{G}{J_{eff}} \exp\left(\frac{E_a}{kT}\right) \tag{2.17}$$

where

$$J_{eff} = J_{eff}(J) = \frac{\sinh(\Psi J)}{\Psi} \tag{2.18}$$

and

$$\Psi = \Psi_0 + \frac{\Psi_1}{kT} \tag{2.19}$$

Constants $\Psi_0$, $\Psi_1$ and $E_a$ are determined by measurement. The effective current density $J_{eff}$ is asymptotically equal to the nominal current density J, for $J \ll 1/\Psi \ (\approx 5 \times 10^5 A/cm^2)$. For J above this value, $J_{eff}$ increases exponentially. Equation 2.17 provides the best model for Mtf currently available.

Figure 2.7: $\sigma'/\sigma$ vs. n
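The behaviour of the effective current density in equation (2.18) may be illustrated numerically. This sketch assumes $\Psi = 2 \times 10^{-6} cm^2/A$, chosen only so that $1/\Psi$ matches the value of approximately $5 \times 10^5 A/cm^2$ quoted above; in practice $\Psi$ follows from equation (2.19) with measured constants. The function names are illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def effective_current_density(j, psi):
    """Equation (2.18): J_eff = sinh(psi * J) / psi."""
    return math.sinh(psi * j) / psi


def eyring_mtf(j, temp_k, g, ea_ev, psi):
    """Equation (2.17), with psi supplied directly rather than from (2.19)."""
    j_eff = effective_current_density(j, psi)
    return g / j_eff * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


PSI = 2e-6  # cm^2/A, assumed so that 1/psi = 5e5 A/cm^2 as quoted in the text

for j in (1e4, 1e5, 5e5, 2e6):
    ratio = effective_current_density(j, PSI) / j
    print(f"J = {j:.0e} A/cm^2 -> J_eff / J = {ratio:.3f}")
```

At low current densities the ratio $J_{eff}/J$ is close to unity, so that equation (2.17) reduces to equation (2.15); at the highest value shown the ratio is several times larger, which mimics an exponent n greater than 1 in equation (2.16).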

The existence of a current density threshold in Aluminum, below which no electromigration takes place, has been proposed by Blech [Blech 1976]. This hypothesis is based on measurements of atom drift velocity using the Blech-Kinsbron edge displacement method [Blech 1975]. The test structure consisted of a stripe of Aluminum deposited on a layer of Titanium Nitride. An electric current will tend to concentrate in the stripe, because of its lower resistivity, and the stripe is displaced in the direction of current flow. The rate of displacement is used as an estimate of average ion drift velocity. In testing conductors in this way, Blech found that no displacement occurred at current densities less than $1.1 \times 10^5 A/cm^2$ for a $115 \mu m$ long conductor and that this threshold increased with decreasing conductor length. English et al. have repeated this experiment with fine-grained Aluminum and report a threshold which is a factor of 4 lower [English 1983].

Blech explained this effect by suggesting that a compressive stress buildup occurs as the current-carrying portion of the conductor is forced against the non-current carrying portion beyond the anode. The total force applied to the conductor by the electron current is proportional to the product of the current and the conductor length, and this force must exceed the elastic deformation limit of the metal before hillocks form at the anode.

The current density threshold concept has also been used in models of interconnect failure due to void formation [Gardner 1987], [Hohol 1986]. It has not been used here, for the following reasons:

- The conclusion that the overall length of the conductor influences the rate of mass flow, does not correlate with the observation by LaCombe, et al that electromigration failure is a local phenomenon. The Blech-Kinsbron structure will always produce a global movement of material, which will be related to the total force applied to the conductor. Void formation, on the other hand, is determined by the conditions pertaining at a grain boundary triple-point. Therefore the extrapolation of results from the former case to the latter case does not have a sound basis.
- In the case of a conductor whose width is less than the median grain size, the conductor is effectively divided into "bamboo" sections, and the overall conductor length can have no influence on void formation at triple points.

When considering the effect of current density on Mtf of conductor segments, the possibility of thermal interaction between neighboring segments must be considered. As statistical independence is a requirement, the maximum current density must be limited to prevent significant self-heating. Consider a conductor of length L, width W and thickness T on the surface of a Silicon wafer. The insulating  $SiO_2$  layer has a thickness H. The thermal resistance of the dielectric layer is given by the following equation, if fringing effects are ignored:

$$R_{th} = \frac{H}{\kappa LW} \tag{2.20}$$

where  $\kappa$  is the thermal conductivity of  $SiO_2$ . If a current I flows through the conductor, the dissipation is

$$P_d = I^2 R = J^2 \rho L W T \tag{2.21}$$

where  $\rho$  is the resistivity of Aluminum. Assuming an isothermal substrate, the rise in conductor temperature due to self-heating is

$$\delta T = P_d R_{th} = \frac{\rho T H J^2}{\kappa} \tag{2.22}$$



Figure 2.8:  $\delta T$  vs. J (conductor thickness =  $0.5\mu m$ , dielectric thickness =  $1\mu m$ )

The rise in temperature as a function of J is shown in Fig. 2.8 for a conductor  $0.5\mu m$  thick, on a  $1\mu m$  dielectric. The self-heating effect (and hence the possibility of thermal interaction) increases rapidly above  $J=10^6 A/cm^2$ . This places an upper bound on the current density for which accurate results may be expected from the method used.
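Equation (2.22) is simple enough to evaluate directly. The following is a minimal sketch, not the calculation used to produce Fig. 2.8: the function name and the material constants (Aluminum resistivity of about $2.7 \times 10^{-6}\ \Omega cm$ and $SiO_2$ thermal conductivity of about $0.014\ W/(cm \cdot K)$) are assumed typical values rather than figures taken from this thesis.

```python
def self_heating_rise(j, thickness_cm, dielectric_cm, rho=2.7e-6, kappa=0.014):
    """Temperature rise in kelvin from eq. (2.22): rho * T * H * J^2 / kappa.

    j             current density in A/cm^2
    thickness_cm  conductor thickness T in cm
    dielectric_cm dielectric thickness H in cm
    rho, kappa    ASSUMED typical values (ohm*cm and W/(cm*K)), not thesis data
    """
    return rho * thickness_cm * dielectric_cm * j ** 2 / kappa


T_METAL = 0.5e-4   # 0.5 um conductor thickness, in cm
H_OXIDE = 1.0e-4   # 1 um dielectric thickness, in cm

for j in (1e5, 1e6, 1e7):
    rise = self_heating_rise(j, T_METAL, H_OXIDE)
    print(f"J = {j:.0e} A/cm^2 -> delta T = {rise:.3g} K")
```

With these assumed constants the rise is of the order of 1 K at $J = 10^6 A/cm^2$ and grows a hundredfold for each further decade of current density, which is the quadratic behavior that motivates the upper bound on J.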

### 2.5.3 Mtf and Current Pulses

Thus far, the models for Mtf presented have assumed a constant (DC) current density J flowing through a conductor segment. In order to predict the reliability of real-world ICs accurately, the models must include the effect of a time-varying current density j(t). The problem of developing a model for a general time-varying current density has not yet been solved, but several workers have investigated unidirectional and bidirectional current pulses [English 1972], [Bobbio 1974], [Kinsbron 1978], [Schoen 1980], [English 1983], [Towner 1983], [Harrison 1988]. These sources were investigated, with the aim of generalizing the results obtained to time-varying waveforms other than perfectly rectangular pulses.

Figure 2.9: A series of unidirectional, rectangular current pulses

Fig. 2.9 shows a series of unidirectional, rectangular current pulses of amplitude  $J_p$  having a repetition frequency f = 1/T and pulse width  $t_1$ . The duty cycle d is

$$d = \frac{t_1}{T} \tag{2.23}$$

where  $0 \le d \le 1$ . Let  $t_{50dc}$  be the lifetime of the conductor when a continuous (DC) current of  $J_p$  is applied and  $t_{50p}$  the lifetime when the current pulse series is applied. A second lifetime  $t_{50p}^o$  may be defined as the total "on" time during the lifetime  $t_{50p}$ :

$$t_{50p}^o = d\,t_{50p} \tag{2.24}$$

If the DC model for Mtf is applied to the pulse series (which is equivalent to assuming that electromigration is a quasi-static process), then only the total "on" time will influence the lifetime of the conductor. The lifetime in terms of "on" time,  $t_{50p}^o$ , should therefore be independent of duty cycle and could be determined using equation (2.15). The actual lifetime of the conductor would be inversely proportional to d.

$$t_{50p} = \frac{G}{dJ_{eff}(J_p)} \exp\left(\frac{E_a}{kT}\right) = \frac{t_{50dc}}{d} \tag{2.25}$$

Attempts to verify equation (2.25) experimentally have been reported in the literature. Generally, equations of the following form have been used to fit the results obtained:

$$t_{50p} = \frac{t_{50dc}}{d^n} \tag{2.26}$$

Towner, et al tested Aluminum conductors using a 1 kHz square wave with duty cycles of 25%, 50% and 75% and having a peak value of  $2 \times 10^6 A/cm^2$  [Towner 1983]. Their results showed a lifetime enhancement for small values of d, which was modeled empirically by equation (2.26) with n equal to 2.

Miller tested Aluminum conductors at a repetition frequency of 250 kHz, a peak current density of  $4 \times 10^6 A/cm^2$  and duty cycles from 25% to 100% [Miller 1978]. His results are best approximated by n=3.25.

Schoen has attempted to explain Miller's results by proposing the existence of a damage relaxation mechanism, which allows the damage accumulating during the pulse "on" time to be removed during the "off" period [Schoen 1980]. While the possibility of a relaxation mechanism cannot be ruled out, Schoen's analysis assumes total reversibility of the accumulated damage, which is unlikely. Furthermore, in order to match the strong dependence of  $t_{50p}$  on d, he was forced to use an unrealistically small value for the time constant of the relaxation process (3.6ms).

English and Kinsbron have argued that temperature gradients due to self-heating are responsible for the divergent results reported by other workers [English 1983]. They have attempted to avoid this problem by measuring electromigration mass transport at current density levels less than  $10^6 A/cm^2$ , using the Blech-Kinsbron edge displacement technique. Their results are attractive, in that they indicate n=1 as predicted by the DC model. However, this result must be treated with caution because of the indirect method of measurement used.
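To give a feel for how far apart the reported exponents are, equation (2.26) may be evaluated at a single duty cycle. The choice of d = 0.25 is only an illustration; the three values of n are those quoted above.

$$\frac{t_{50p}}{t_{50dc}} = \frac{1}{d^n} = 4^n = \begin{cases} 4 & n = 1 \\ 16 & n = 2 \\ 2^{6.5} \approx 90.5 & n = 3.25 \end{cases} \qquad (d = 0.25)$$

At this duty cycle the predicted lifetime enhancement therefore differs by more than a factor of 20 between the DC-model exponent and the exponent fitted to Miller's data.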

Little work on the effect of bidirectional current pulses on conductor lifetime has been reported. The lifetime of conductors carrying an alternating current is greatly increased, compared to the DC case [d'Heurle 1971]. This is indicative of a partial reversibility of electromigration damage, when the direction of current flow is reversed. Bobbio, et al have used resistance measurements to demonstrate this effect [Bobbio 1974]. The time constant associated with the reversal was of the order of several hours. However, they did not relate this effect to conductor lifetimes.

From the results summarized above, it is clear that there are large variations in the results obtained by different workers and much of the data is inconclusive. A coherent model for pulse-induced electromigration which is supported by experimental data does not exist at present, although one is currently being developed [Harrison 1988]. As the goal of this thesis is to present a method for prediction of interconnect reliability, it was decided to use the best model currently available, but to structure the algorithms and software tools in such a way that improved models could be included, as they are developed.

### 2.5.4 A General Model for Electromigration Caused by a Time-varying Current Density

The model used in this thesis is based on the DC model for Mtf. The damage function f(t) is defined as that fraction of the median lifetime of the conductor which has been consumed by damage at time t. From equation (2.17) it follows that

$$f(t) = \frac{t}{t_{50}} = \frac{tJ_{eff}(j(t))}{G} \exp\left(-\frac{E_a}{kT}\right) \tag{2.27}$$

During a time interval  $\delta t$  at time t, the damage function is increased by an amount  $\delta f$ , where

$$\delta f = \frac{J_{eff}(j(t))}{G} \exp\left(-\frac{E_a}{kT}\right) \delta t \tag{2.28}$$

Therefore the total damage to time t is

$$f(t) = \frac{1}{G} \left( \int_0^t J_{eff}(j(\tau)) d\tau \right) \exp\left( -\frac{E_a}{kT} \right) \tag{2.29}$$

From equation (2.27), the Mtf may be written in terms of f(t) as follows:

$$t_{50} = \frac{t}{f(t)} = \frac{G}{\frac{1}{t} \int_0^t J_{eff}(j(\tau)) d\tau} \exp\left(\frac{E_a}{kT}\right) \tag{2.30}$$

Equation (2.30) reduces to equation (2.25), if j(t) is a square wave having peak value  $J_p$  and duty cycle d and the integral is evaluated across one or more cycles of j(t). This provides the most pessimistic estimate of lifetime for this case, as there is no lifetime enhancement for small values of d included in the model. When considering bidirectional current flow, the time-averaged integral in the denominator of equation (2.30) is replaced by its absolute value:

$$t_{50} = \frac{t}{f(t)} = \frac{G}{\left|\frac{1}{t} \int_0^t J_{eff}(j(\tau)) d\tau\right|} \exp\left(\frac{E_a}{kT}\right) \tag{2.31}$$



Figure 2.10:  $t_{50}$  vs. conductor width W

This model is probably optimistic for the case of a conductor carrying a symmetrical current waveform, as it predicts an infinite lifetime for such a conductor, independent of the current amplitude.
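The behavior of equation (2.31) can be checked numerically for a sampled waveform. The sketch below is illustrative only and is not the Reliant implementation: it assumes that j(t) is sampled uniformly over a whole number of periods, and it uses a hypothetical odd power law in place of $J_{eff}$, whose actual form is defined by equation (2.15) and is not repeated here. The function names and the values of G, $E_a$ and T in the usage lines are placeholders.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def j_eff(j, exponent=2.0):
    """HYPOTHETICAL stand-in for J_eff of eq. (2.15): an odd power law in j."""
    return math.copysign(abs(j) ** exponent, j)


def t50_time_varying(samples, g, e_a, temp_k, j_eff_fn=j_eff):
    """Median time-to-failure from eq. (2.31).

    samples  uniformly spaced samples of j(t) covering whole periods (assumed)
    g        geometry factor G (placeholder units)
    e_a      activation energy in eV
    temp_k   conductor temperature in kelvin
    """
    # The time-averaged integral of eq. (2.31) becomes a mean over the samples.
    mean_j_eff = sum(j_eff_fn(j) for j in samples) / len(samples)
    if mean_j_eff == 0:
        return math.inf  # symmetrical waveform: the model predicts no net damage
    return g / abs(mean_j_eff) * math.exp(e_a / (K_BOLTZMANN_EV * temp_k))


J_P = 1e6
dc = t50_time_varying([J_P] * 100, g=1.0, e_a=0.5, temp_k=400.0)
pulsed = t50_time_varying([J_P] * 25 + [0.0] * 75, g=1.0, e_a=0.5, temp_k=400.0)
print(pulsed / dc)
```

For the 25% duty-cycle square wave the ratio of pulsed to DC lifetime equals 1/d, as equation (2.25) requires, and a waveform with equal positive and negative halves returns an infinite lifetime, which is the optimistic limit noted above.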

### 2.5.5 The Dependency of $t_{50}$ and $\sigma$ on Physical Dimensions

Finally, the effect of the physical dimensions on  $t_{50}$  and  $\sigma$  of conductor segments will be considered.

#### Run Segments

The relationship between conductor width,  $t_{50}$  and  $\sigma$  has been widely studied for the case of straight conductors. Figs. 2.10 and 2.11 show  $t_{50}$  and  $\sigma$  as functions of conductor width W, determined experimentally by [Kinsbron 1980]. Similar results have been reported by other workers [Scoggan 1975], [Vaidya 1980], [Iyer 1984].  $t_{50}$  shows a linear decrease with decreasing W, until W is approximately equal to the median grain size  $(2 - 5\mu m)$ . For narrower conductors, the lifetime is improved.  $\sigma$  increases monotonically with decreasing W, but the rate of increase is much greater when W is less than the median grain size.

The form of these functions may be explained in terms of the number of grain boundary triple points which contribute to void formation across the width of a conductor (see Fig. 2.12). When the conductor is several grain sizes wide, the formation of an open circuit requires that a sufficient number of triple points be aligned across the width of the conductor. As the width increases, the probability of this occurring is reduced and the lifetime is enhanced. Also, the wider the conductor, the smaller the variation in lifetime will be between different segments, implying a small value of  $\sigma$ . If W is less than the median grain size, the conductor acquires a "bamboo"-like grain structure, in which the migration of metal is blocked at intervals by grain boundaries perpendicular to the direction of current flow. The narrower the conductor, the smaller the probability of including a triple point within a segment and so  $t_{50}$  increases rapidly with decreasing width.  $\sigma$  increases at small widths because of the greater variation in lifetime between segments which contain triple points and those which do not. A Monte Carlo simulation of void growth at triple points has provided good qualitative agreement with this model [Nikawa 1981]. A contributing factor to the increase in  $\sigma$  at submicron linewidths is the presence of local width reductions due to photolithographic defects.

Figure 2.11:  $\sigma$  vs. conductor width W

Figure 2.12: Typical conductor grain structure for different conductor widths

Empirical equations may be used to model the dependencies of the geometry factor G and  $\sigma$  on W. Expressions of the following form provide a good fit with measured data in the literature [Frost 1987]:

$$G = G(W) = C_1 W + C_2 + \frac{C_3}{W^n} \tag{2.32}$$

$$\sigma = \sigma(W) = \frac{C_4}{W^m} + C_5 \tag{2.33}$$

where  $C_1$ ,  $C_2$ ,  $C_3$ ,  $C_4$ ,  $C_5$ , n and m are constants.
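The shape of these empirical functions is easiest to see by evaluating them. The sketch below is a direct transcription of equations (2.32) and (2.33). The constants in the usage lines are arbitrary placeholders chosen only to show the qualitative behavior (a minimum in G at a width of a few microns and a $\sigma$ that falls with increasing width); they are not fitted values from [Frost 1987] or from any measured data.

```python
def geometry_factor(w, c1, c2, c3, n):
    """Geometry factor G(W) from eq. (2.32); w is the conductor width."""
    return c1 * w + c2 + c3 / w ** n


def sigma_of_width(w, c4, c5, m):
    """Lognormal shape parameter sigma(W) from eq. (2.33)."""
    return c4 / w ** m + c5


# PLACEHOLDER constants for illustration only, not fitted values.
G_CONSTANTS = dict(c1=1.0, c2=0.0, c3=8.0, n=2)
S_CONSTANTS = dict(c4=0.5, c5=0.3, m=1)

for width_um in (1.0, 2.0, 4.0, 8.0):
    g = geometry_factor(width_um, **G_CONSTANTS)
    s = sigma_of_width(width_um, **S_CONSTANTS)
    print(f"W = {width_um} um: G = {g:.3f}, sigma = {s:.3f}")
```

The linear term dominates G for wide conductors and the $C_3/W^n$ term dominates for narrow ones, which reproduces the lifetime minimum near the median grain size described above.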

The effect of length on  $t_{50}$  and  $\sigma$  has also been widely studied. Conductor length is modeled implicitly in the minimum order statistical method used here. A conductor of length L is modeled as n unit-length conductors in series, where the unit length is  $L_E = 10 \mu m$  and

$$n = L/L_E \tag{2.34}$$

The variations in  $t_{50}$  and  $\sigma$  with n are derived in Appendix C. The variations in effective Mtf  $(t'_{50})$  and Sigma  $(\sigma')$  with n were discussed in paragraph 2.5.1.

#### Contact, Via, Step and Pad Segments

Little information on the reliability of these structures is available in the literature. Prokop and Joseph studied the geometry-dependence of Al-Si contact reliability [Prokop 1972], but their results have not been verified by other workers.

### 2.6 Summary

A method of analyzing the reliability of an interconnect pattern has been presented. This may be summarized as follows.

- The interconnect pattern is fractured into a collection of segments: Runs, Contacts, Vias, Steps and Pads.
- The Mtf of each segment is determined using equation (2.31). The geometry factor G is determined by the segment type and its dimensions.
- Sigma is determined by the segment type and its dimensions.
- A lognormal distribution is assumed for each segment.
- The reliability of the interconnect system is determined as the minimum order statistic of the reliability of all the segments.
- The method is limited to current densities less than  $10^6 A/cm^2$ .
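The last steps of this summary can be illustrated with the standard result for a series system of independent components: the system survives only while every segment survives, so its reliability is the product of the segment reliabilities and its failure rate is the sum of the segment hazard rates. The sketch below applies this to lognormally distributed segments. It is a textbook formulation under the independence assumption, not a transcription of the derivation in Appendix C, and the function names are illustrative.

```python
import math


def lognormal_cdf(t, t50, sigma):
    """Probability that a segment with median life t50 has failed by time t."""
    z = math.log(t / t50) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_pdf(t, t50, sigma):
    """Lognormal failure density at time t."""
    z = math.log(t / t50) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def system_reliability(t, segments):
    """Survival probability of the earliest-failing (minimum order) segment.

    segments is a list of (t50, sigma) pairs, assumed statistically independent.
    """
    reliability = 1.0
    for t50, sigma in segments:
        reliability *= 1.0 - lognormal_cdf(t, t50, sigma)
    return reliability


def system_failure_rate(t, segments):
    """Hazard rate of the series system: the sum of the segment hazard rates."""
    return sum(
        lognormal_pdf(t, t50, sigma) / (1.0 - lognormal_cdf(t, t50, sigma))
        for t50, sigma in segments
    )


# Two identical segments, evaluated at their common median life.
print(system_reliability(10.0, [(10.0, 0.5), (10.0, 0.5)]))
```

A failure rate computed this way is in failures per unit of t; converting to FIT requires t in hours and a scale factor of $10^9$.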

### 2.7 An Example

An approximate analysis was performed on the interconnect mask of an IC, of which some conductor characteristics were measured. The circuit was a 3100-gate CMOS standard cell design, with double layer metallization and a minimum feature size of  $2\mu m$ . Some parameters of the interconnect layers are summarized in Table 2.1. A constant current of  $3\mu A$  was assumed for all logic interconnects, and the current in the Ground and  $V_{DD}$  buses was calculated accordingly. The geometry factor for Runs was based on the data of [Kinsbron 1980]. Simple geometrical models were used to determine geometry factors for other segment types (see Appendix D).

| Layer   | Min. Width $(\mu m)$ | Steps | Vias/Contacts  | Thickness (Å) | Total Length $(\mu m)$ |
|---------|----------------------|-------|----------------|---------------|------------------------|
| Metal 1 | 4                    | 55296 | 36018 contacts | 6000          | 367800                 |
| Metal 1 | 3                    | 0     | 7884 vias      | 6000          | 656640                 |
| Metal 2 | 5                    | 62137 | 16935 vias     | 8000          | 1107500                |

Table 2.1: 3100 Gate CMOS standard cell circuit

The interconnect failure rate is shown in Fig. 2.13, as a function of time. The interconnect pattern is seen to be highly reliable: a failure rate of 10 FIT is not exceeded for more than 30 years. A breakdown of the failure rate data shows that the power and ground buses contributed roughly 60% of the total, followed by the contact windows with 30%. Runs in logic interconnects contribute less than 2% of the total, despite the fact that they comprise 84% of the total conductor length.



Figure 2.13: Failure rate of interconnects (3100 gate CMOS circuit)

### Chapter 3

## RELIANT: A RELIABILITY ANALYZER FOR INTEGRATED CIRCUIT INTERCONNECTS

### 3.1 Overview

In this chapter, the design of the Reliant suite of software tools is described. Predicting the reliability of an interconnect pattern requires a data base consisting of the following information:

- the mask layout,
- current waveforms for each conductor,
- process reliability data,
- an environmental specification.

A high priority when designing Reliant was to develop a tool which could easily be integrated into a typical CAD environment. This is important because reliability prediction tools are not currently used by circuit designers, and they will not be easily accepted if they are cumbersome and difficult to use. Accordingly, the following principles were laid down.

- When Reliant makes use of data already available in the design environment, it should be able to access this data in its existing form, without additional user input.
- The time required to compute reliability must not significantly increase the overall design time. Where possible, reliability calculation must take place concurrently with other forms of design verification.

Figure 3.1: Block diagram of Reliant

Fig. 3.1 shows a block diagram of Reliant. It is assumed that the circuit designer has created a layout of the IC in Caltech Intermediate Form (CIF) [Mead 1981]. This file is called designname.cif, where designname represents the name of the design to be analyzed. It is also assumed that the designer has written a SPICE input file with which the design may be simulated. This file (designname.spc) must include device models, analysis specifications and definitions of all external components which will be connected to the circuit. A description of the file format may be found in [Vladimirescu 1981].

Reliant consists of three modules: Extrem, Combine and Sirprice. Extrem fractures the interconnect pattern into segments and stores the segment properties in a database file (designname.db1). It simultaneously extracts an equivalent circuit from the layout, modelling the parasitic series resistances and capacitances of all conductive layers. A partial transistor extraction is also performed. Currently, this module has been implemented for nMOS and CMOS technologies. A SPICE-compatible netlist description of the extracted circuit is saved in designname.ext. This file contains a resistor for every conductor segment identified by Extrem.

In order to simulate the extracted circuit, the external components, models and analysis specifications must be added. The Combine module uses the definitions in *designname*.spc, together with the extracted netlist, to create an augmented netlist called *designname*.spr.

Sirprice simulates circuit behavior and stores the current waveform for each segment. This information is used together with the segment database to calculate the failure rate of each segment. Sirprice produces several output data files. The file designname.dat is the normal SPICE output file, and designname.lis is a print file listing the failure rate of each segment type, for the first 20 years. File designname.db2 is a reliability database which provides an interface to future postprocessors, and segments.dat is an interface to a commercial database package (VAX Datatrieve) which is used to examine the database interactively.

Reliant was developed in a VAX/VMS environment. It consists of approximately 8000 lines of VAX Pascal, plus some modifications to the Fortran 77 source code of Spice 2G.5.

### 3.2 Extrem: A Circuit Extractor for Electromigration Modelling

### 3.2.1 Overview

Extrem has two functions. Firstly, it fractures the interconnect pattern of each metal layer into a set of segments and determines the type and dimensions of each segment occurrence. Secondly, it creates a netlist from which the circuit may be simulated and the current in each segment determined. Extrem processes the layout in three phases. During the first phase, the CIF layout file is parsed. Wires and polygons are reduced to collections of boxes and the hierarchy is flattened. In the second phase, the intersections of objects on the various mask layers with one another are determined. In the third phase, the segments and equivalent circuit are determined from the list of intersections for each box. Segmentation of a layout  $\Omega$  may be represented by the following four mappings:

$$\Omega(H) \to \Omega(C) \tag{3.1}$$

$$\Omega(C) \to \Omega(I) \tag{3.2}$$

$$\Omega(I) \to \Omega(S) \tag{3.3}$$

$$\Omega(I) \to \Omega(T, R, C) \tag{3.4}$$

A general set of hierarchically organized CIF objects  $\Omega(H)$  is mapped onto a set of fully instantiated Boxes and Labels,  $\Omega(C)$ .  $\Omega(C)$  is mapped onto a set of intersections  $\Omega(I)$ , which in turn is mapped onto a set of segment instances  $\Omega(S)$ , in a data base. The intersections are also mapped onto a set of transistors, parasitic resistances and capacitances  $\Omega(T,R,C)$ , which forms the extracted circuit. Each segment in the data base is linked to a resistor in the extracted circuit, by means of a unique ID number.

### 3.2.2 Algorithms for Segmentation and Circuit Extraction

Fig. 3.2 shows a block diagram of Extrem. The CIF Parser performs the mapping in equation (3.1). All CIF commands and some common user extensions are parsed. Cell calls and transformations are fully instantiated. Polygons, boxes and wires are checked for non-Manhattan line segments, and error messages indicating the coordinates of such segments are generated. Only Manhattan objects are processed. Polygons and wires are fractured into sets of boxes before processing. Round flashes are ignored.

The segmentation algorithm will now be described in more detail. In the following analysis, it is assumed that the parser has

- fully instantiated all symbol calls;
- replaced Polygons and Wires with collections of Boxes;
- removed any non-Manhattan geometry.

These assumptions limit the set of objects which must be processed to rectilinear Boxes and Labels. An IC layout may now be described as a set of objects

$$C = \{c_j(x_{min}, y_{min}, x_{max}, y_{max}, \Theta, \lambda) \mid j = 1, 2, 3, \ldots, n_s\} \tag{3.5}$$

where  $(x_{min}, y_{min})$  and  $(x_{max}, y_{max})$  are the corner coordinates of the object,  $\Theta$  is a text string and  $\lambda$  is the mask layer. If an object  $c_j$  is a Box,  $\Theta$  will be a null string. If  $c_j$  is a Label,  $(x_{min}, y_{min}) = (x_{max}, y_{max})$ .

Figure 3.2: Block diagram of Extrem

Each mask layer  $\lambda$  has the property of Type, which defines its function in the fabrication process. The set of possible values of Type depends on the technology used. For example, a p-Well CMOS process with two metal layers would be defined as follows: If  $T_l$  represents the Type of the l-th mask layer, then

$$T_l \in \{Metal1, Metal2, Poly, Diffusion, Cuts, Glass, Vias, Pwell, Pplus, BuriedContact, Labels\} \tag{3.6}$$

For an nMOS depletion load process with double layer metal,

$$T_l \in \{Metal1, Metal2, Poly, Diffusion, Cuts, Glass, Vias, Implant, BuriedContact, Labels\} \tag{3.7}$$

The set of layer types having the property of electrical conduction is a subset of the full set of layer types. For the MOS processes described above, the conductive layers are *Metal1*, *Metal2*, *Diffusion* and *Poly*.

The Labels layer is reserved for Label objects, which are used to define the bonding pads of an IC. The other layers contain only Boxes.

The electrical characteristics of an IC are determined by the relative positions of the mask objects. Two objects intersect if they have at least one point in common in two-dimensional Cartesian space. The identification and classification of intersections forms the basis of the segmentation algorithm. If two Manhattan rectangles intersect, the area of intersection is also a Manhattan rectangle. The following algorithm creates a record of every intersection of an object on layer i with an object on layer j.

### Algorithm 3.1: Find All Intersections

For each object on layer i Do

For each object on layer j Do

If  $(x_{min}^i \leq x_{max}^j)$  And  $(x_{max}^i \geq x_{min}^j)$  And  $(y_{min}^i \leq y_{max}^j)$  And  $(y_{max}^i \geq y_{min}^j)$  Then

Begin

Determine intersection type;

Create intersection record with  $x_{min} = max(x_{min}^i, x_{min}^j)$ ,  $y_{min} = max(y_{min}^i, y_{min}^j)$ ,  $x_{max} = min(x_{max}^i, x_{max}^j)$ ,  $y_{max} = min(y_{max}^i, y_{max}^j)$ ;

End;
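Algorithm 3.1 translates almost line for line into executable form. The following is a minimal sketch of the exhaustive pairwise search only; the `Box` type and function name are illustrative and do not reflect the VAX Pascal data structures used in Extrem, and the classification step is omitted.

```python
from collections import namedtuple

# Illustrative rectangle type: lower-left and upper-right corners.
Box = namedtuple("Box", "xmin ymin xmax ymax")


def find_intersections(layer_i, layer_j):
    """Return the overlap rectangle of every intersecting pair of boxes.

    The comparisons are inclusive, so boxes that merely abut along an edge
    produce a zero-width or zero-height intersection record.
    """
    records = []
    for a in layer_i:
        for b in layer_j:
            if (a.xmin <= b.xmax and a.xmax >= b.xmin and
                    a.ymin <= b.ymax and a.ymax >= b.ymin):
                records.append(Box(max(a.xmin, b.xmin), max(a.ymin, b.ymin),
                                   min(a.xmax, b.xmax), min(a.ymax, b.ymax)))
    return records


metal = [Box(0, 0, 10, 2)]
poly = [Box(4, -3, 6, 5), Box(20, 0, 22, 2)]
print(find_intersections(metal, poly))
```

The nested loops cost time proportional to the product of the two layer sizes, which is why the choice of data structure for region queries matters for full-chip layouts.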

The intersections are classified according to the types of the layers containing the objects. For nMOS and CMOS technologies, the following intersection types are defined.

- Def. 3.1: If  $T_i = T_j$  and  $T_i$ ,  $T_j \in \{conductive layers\}$  then the intersection is an Abutment. Note that overlapping objects on the same layer have been removed by preprocessing. Therefore intersecting objects on the same layer can only abut along a single edge.
- Def. 3.2: If  $(T_i = Cuts \text{ and } T_j = Diffusion)$  or  $(T_i = Diffusion \text{ and } T_j = Cuts)$  then the intersection type is ContactDiff.
- Def. 3.3: If  $(T_i = Cuts \text{ and } T_j = Poly)$  or  $(T_i = Poly \text{ and } T_j = Cuts)$  then the intersection type is ContactPoly.

- Def. 3.4: If  $(T_i = Cuts \text{ and } T_j = Metal1)$  or  $(T_i = Metal1 \text{ and } T_j = Cuts)$  then the intersection type is ContactM1.
- Def. 3.5: If  $(T_i = Vias \text{ and } T_j = Metal1)$  or  $(T_i = Metal1 \text{ and } T_j = Vias)$  then the intersection type is ViaM1.
- Def. 3.6: If  $(T_i = Vias \text{ and } T_j = Metal2)$  or  $(T_i = Metal2 \text{ and } T_j = Vias)$  then the intersection type is ViaM2.
- Def. 3.7: If  $(T_i = Poly \text{ and } T_j = Diffusion)$  or  $(T_i = Diffusion \text{ and } T_j = Poly)$  and the intersection is enclosed by objects on the Pplus layer, then the intersection type is Pchannel.
- Def. 3.8: If  $(T_i = Poly \text{ and } T_j = Diffusion)$  or  $(T_i = Diffusion \text{ and } T_j = Poly)$  and the intersection is not enclosed by objects on the Pplus layer, then the intersection type is Nchannel.
- Def. 3.9: If  $(T_i = Poly \text{ and } T_j = Diffusion)$  or  $(T_i = Diffusion \text{ and } T_j = Poly)$  and the intersection is enclosed by objects on the Implant layer, then the intersection type is Ndepletion.
- Def. 3.10: If  $(T_i = Metal1 \text{ and } T_j = Labels)$  or  $(T_i = Metal2 \text{ and } T_j = Labels)$  or  $(T_i = Labels \text{ and } T_j = Metal1)$  or  $(T_i = Labels \text{ and } T_j = Metal2)$ , then the intersection type is Pin.
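Definitions 3.1 to 3.10 amount to a lookup on the unordered pair of layer types, with the transistor cases refined by an enclosure test. The sketch below shows one way to encode them. The function name and the boolean arguments are illustrative, the enclosure tests against the Pplus and Implant layers are assumed to have been computed elsewhere, and the order in which the Implant and Pplus conditions are checked is an assumption, since the two layers belong to different processes.

```python
CONDUCTIVE = {"Metal1", "Metal2", "Poly", "Diffusion"}

# Unordered layer-type pairs from Defs. 3.2 to 3.6 and 3.10.
_PAIR_TYPES = {
    frozenset(("Cuts", "Diffusion")): "ContactDiff",
    frozenset(("Cuts", "Poly")): "ContactPoly",
    frozenset(("Cuts", "Metal1")): "ContactM1",
    frozenset(("Vias", "Metal1")): "ViaM1",
    frozenset(("Vias", "Metal2")): "ViaM2",
    frozenset(("Labels", "Metal1")): "Pin",
    frozenset(("Labels", "Metal2")): "Pin",
}


def classify_intersection(t_i, t_j, in_pplus=False, in_implant=False):
    """Return the intersection type for layer types t_i and t_j, or None.

    in_pplus / in_implant: whether the overlap is enclosed by an object on
    the Pplus or Implant layer (assumed to be determined by the caller).
    """
    if t_i == t_j:
        # Def. 3.1: same conductive layer means the boxes abut.
        return "Abutment" if t_i in CONDUCTIVE else None
    pair = frozenset((t_i, t_j))
    if pair == frozenset(("Poly", "Diffusion")):
        if in_implant:          # Def. 3.9 (nMOS depletion load process)
            return "Ndepletion"
        return "Pchannel" if in_pplus else "Nchannel"  # Defs. 3.7 and 3.8
    return _PAIR_TYPES.get(pair)


print(classify_intersection("Metal1", "Cuts"))
```

Using a `frozenset` as the key makes the lookup symmetric in i and j, which mirrors the "or" clause repeated in each definition.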

The identification and classification of intersections provides the basis for determining the current flow patterns in the conductive layers. This is illustrated by the example in Fig. 3.3. If we consider a single box on a conductive layer, the intersections represent areas where the electrical variables are modified by interaction with the variables of other objects. For example, at an Abutment, current may enter or leave the box horizontally. At a ViaM1, ViaM2, ContactM1, ContactPoly or ContactDiff there is a vertical current flow between the box and a different conductive layer. At a Pchannel, Nchannel or Ndepletion intersection, the horizontal current flow in a Diffusion box is modulated by the voltage on the intersecting Poly. The Pin intersection represents a bonding wire connecting to the external environment. When an intersection is identified, a node number is assigned to it. A copy of the intersection record, including the node number, is attached to both objects.

Figure 3.3: An example showing various intersections

Before the segments of a conducting layer can be determined, the dominant current flow direction in each object must be established. As can be seen from the example, current flow is not always parallel to the x and y axes, even when the layout is limited to Manhattan geometry. Furthermore, when there are more than two intersections the current flow pattern depends on the relative magnitudes of the current components at the intersections. Assuming that these boundary values are known, an accurate numerical solution is possible by solving Laplace's equation in two dimensions [Horowitz 1983], [Barke 1985]. This approach was rejected, for the following reasons:

- it is computationally intensive;
- a priori information about the boundary values is not available;
- it is only necessary to determine the dominant current flow direction, in order to replace the object with a collection of segments. The current flow direction at every point in the object is not required.



Figure 3.4: Determination of dominant current flow direction

The segments defined in Chapter 2 are Manhattan structures, symmetrical about the direction of current flow. Therefore the axis-direction (x or y) which most closely approximates the real current flow direction in each part of the object, must be found. A probabilistic method of determining current flow direction may be derived by considering the structures in Fig. 3.4. If there are n intersections, the total number of potential current paths between two intersections is

$$C_2^n = \frac{n!}{2!(n-2)!} \tag{3.8}$$

When the intersections of an object are widely scattered in the x-direction but not in the y-direction, these potential current vectors lie close to a line parallel to the x-axis, described by the equation

$$y = y_{avg} = \frac{1}{n} \sum_{i=1}^{n} y_i \tag{3.9}$$

Similarly, if the scatter is greatest in the y-direction, the potential current flow vectors are approximated by the line

$$x = x_{avg} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{3.10}$$

The variances of the x and y coordinates of the centers of the intersections provide a measure of the scatter in each direction.

### Algorithm 3.2: Finding the Dominant Direction of Current Flow Within an Object

- 1. Determine  $x_{avg}$  and  $y_{avg}$  using equations (3.10) and (3.9).
- 2. Determine Var(x) and Var(y) as follows:

$$Var(x) = \sum_{i=1}^{n} (x_i - x_{avg})^2 \tag{3.11}$$

$$Var(y) = \sum_{i=1}^{n} (y_i - y_{avg})^2 \tag{3.12}$$

- 3. If Var(x) > Var(y), the scatter is greatest in the x-direction and the dominant direction = x; otherwise the dominant direction = y.
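The variance test can be sketched in a few lines. This follows the reasoning that leads to equations (3.9) and (3.10): when the intersection centers are scattered mainly along x, the potential current paths lie close to a line parallel to the x-axis, so x is taken as the dominant direction. The function name and the representation of centers as (x, y) tuples are illustrative, and resolving a tie in favor of y is an assumption.

```python
def dominant_direction(centers):
    """Return "x" or "y" for a list of (x, y) intersection centers.

    The sums of squared deviations are compared without dividing by n,
    as in equations (3.11) and (3.12); the common factor does not affect
    the comparison.
    """
    n = len(centers)
    x_avg = sum(x for x, _ in centers) / n
    y_avg = sum(y for _, y in centers) / n
    var_x = sum((x - x_avg) ** 2 for x, _ in centers)
    var_y = sum((y - y_avg) ** 2 for _, y in centers)
    return "x" if var_x > var_y else "y"


# A horizontal wire with three intersections spread along its length.
print(dominant_direction([(0, 1), (10, 1), (20, 2)]))
```

Because only the comparison matters, the method needs no knowledge of the current magnitudes at the intersections, which is the property that made it preferable to a field solution.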

Before proceeding with the segmentation of the current object, the possibility of steps in the *Metal1* layer must be investigated (it is necessary to determine current flow directions in the *Metal1* layer, before steps can be identified).

Def. 3.11: A Step intersection occurs when the edge of a Poly object crosses a Metal1 object at right angles to the direction of current flow in the Metal1 object.

Steps are detected as follows. The *Metal1* and *Poly* layers are checked for intersections, using Algorithm 3.1. The edges of the overlap region are compared with the width of the *Metal1* object. If an edge crosses more than 75% of the conductor width, an intersection record is generated. A single *Metal1* conductor crossing a *Poly* stripe will contain two *Step* intersections.

The segmentation of all Box objects on conductive layers is now performed. First, the intersection records for a given Box are sorted in ascending center coordinate order in the dominant direction of current flow. As there are typically fewer than 10 intersections per box in a VLSI layout, a simple bubble sort is used. In order to extract the features occurring on all conductive layers, a larger set of segments is defined:

$$SegmentTypes = \{MetalRun, NonMetalRun, ContactDiff, ContactPoly, ContactM1, Step, ViaM1, ViaM2, Pad, Pchannel, Nchannel, Ndepletion\} \tag{3.13}$$

The following partial mapping between intersections and segments exists:

$$ContactDiff \rightarrow ContactDiff \tag{3.14}$$

$$ContactPoly \rightarrow ContactPoly \tag{3.15}$$

$$ContactM1 \rightarrow ContactM1 \tag{3.16}$$

$$Step \rightarrow Step \tag{3.17}$$

$$ViaM1 \rightarrow ViaM1 \tag{3.18}$$

$$ViaM2 \rightarrow ViaM2 \tag{3.19}$$

$$Pin \rightarrow Pad \tag{3.20}$$

$$Pchannel \rightarrow Pchannel \tag{3.21}$$

$$Nchannel \rightarrow Nchannel \tag{3.22}$$

$$Ndepletion \rightarrow Ndepletion \tag{3.23}$$

Each intersection produces a segment of a specific type. The unmapped segment types are the *MetalRuns* and *NonMetalRuns*. These occur between the intersections and are used to link up the segments of other types. The five metal interconnect segment types defined in Chapter 2 form the following subset of *SegmentTypes*.

$$MetalSegmentTypes = \{MetalRun, ContactM1, Step, ViaM2, Pad\} \tag{3.24}$$

A record for each element of MetalSegmentTypes is stored in the data base, for use during the reliability prediction phase. This record has the format:

Extrem assigns values to all the parameters in Segment except for Mtf, Sigma and Failure Rate, which are determined by Sirprice.

The segmentation algorithm may be summarized as follows.

### Algorithm 3.3: Segmentation of an Object

```
For i := 1 To n Do
     Begin
     Create a CurrentSegment record using the
     intersection -> segment mapping;
     With CurrentSegment Do
           Begin
           Determine the dimensions;
           Determine the series resistance and parallel
           capacitance;
           If Type in MetalSegmentTypes Then
                 save in designname.db1;
           Write pi-section RC equivalent network to
           designname.ext;
           End;
     If i > 1 Then
           Begin
           Create a Run segment linking the right (top) edge
           of the PreviousSegment to the left (bottom) edge
           of the CurrentSegment;
           With RunSegment Do
                 Begin
                 Determine the dimensions;
                 Calculate the series resistance and parallel
                 capacitance;
                 If Type in MetalSegmentTypes Then
                       save in designname.db1;
                 Write pi-section RC equivalent network to
                 designname.ext;
                 End;
           End;
     PreviousSegment := CurrentSegment;
     End;
```

Figure 3.5: Segmentation and extraction of example in Fig. 3.3

Fig. 3.5 illustrates the segmentation of the layout example in Fig. 3.3. The segment ID number is shown in the middle of each segment. Fig. 3.6 shows the extracted equivalent circuit for this layout.
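The control flow of Algorithm 3.3 can be illustrated with a minimal Python sketch. It is not the Extrem implementation: the names `Segment`, `segment_object` and the one-dimensional `start`/`end` coordinates are hypothetical, and the dimension, resistance and capacitance calculations are omitted. It only shows how each intersection yields a segment and how a run segment links consecutive segments.

```python
from dataclasses import dataclass

# Hypothetical names; only the loop structure follows Algorithm 3.3.
METAL_SEGMENT_TYPES = {"MetalRun", "ContactM1", "Step", "ViaM2", "Pad"}

# Intersection -> segment mapping of Eqs. (3.14)-(3.23).
INTERSECTION_TO_SEGMENT = {
    "ContactDiff": "ContactDiff", "ContactPoly": "ContactPoly",
    "ContactM1": "ContactM1", "Step": "Step", "ViaM1": "ViaM1",
    "ViaM2": "ViaM2", "Pin": "Pad", "Pchannel": "Pchannel",
    "Nchannel": "Nchannel", "Ndepletion": "Ndepletion",
}

@dataclass
class Segment:
    seg_type: str
    start: float  # left (bottom) edge along the object
    end: float    # right (top) edge along the object

def segment_object(intersections, run_type="MetalRun"):
    """intersections: list of (type, start, end) sorted along the object.

    Returns (all segments, segments that would be saved in designname.db1).
    """
    segments = []
    previous = None
    for kind, start, end in intersections:
        current = Segment(INTERSECTION_TO_SEGMENT[kind], start, end)
        segments.append(current)
        if previous is not None:
            # Run segment linking the previous segment to the current one.
            segments.append(Segment(run_type, previous.end, current.start))
        previous = current
    metal = [s for s in segments if s.seg_type in METAL_SEGMENT_TYPES]
    return segments, metal

segs, metal = segment_object(
    [("Pin", 0, 10), ("Step", 30, 32), ("ContactM1", 50, 54)])
```

For the three intersections above, the sketch produces five segments: a pad, a step, a contact and the two runs that link them.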



Figure 3.6: Equivalent circuit for the example in Fig. 3.5

The netlist is defined as a SPICE subcircuit called designname, plus an instance X1 of this subcircuit. The user-defined Labels on the bonding pads are the external net names and these are mapped onto the internal node names which have been assigned by Extrem. Channel segments are represented by instances of the default pMOS and nMOS devices. Any other segment i is represented by a  $\pi$ -section network consisting of  $R_i$ ,  $C_{iL}$  and  $C_{iU}$ , or by an L-section consisting of  $R_i$  and  $C_i$ . Resistors representing a metal layer are designated R\$i. The Sirprice simulator automatically stores the currents in all resistors having the R\$ prefix, for use in the reliability calculation (see Section 3.4).
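As an illustration of the naming convention described above, the following Python sketch emits the three SPICE lines of a π-section for one segment. The function name and arguments are hypothetical, and splitting the total capacitance equally between the two ends is an assumption rather than a documented property of Extrem.

```python
def pi_section_lines(seg_id, node_a, node_b, resistance, capacitance, is_metal):
    """Return SPICE lines for one segment modeled as a pi-section.

    A capacitor to ground is placed at each end and a resistor joins the
    two nodes. Metal resistors get the R$ prefix so that a simulator such
    as Sirprice can recognize them.
    """
    r_name = f"R${seg_id}" if is_metal else f"R{seg_id}"
    half_c = capacitance / 2.0  # assumption: capacitance split equally
    return [
        f"C{seg_id}L {node_a} 0 {half_c:.1E}",
        f"C{seg_id}U {node_b} 0 {half_c:.1E}",
        f"{r_name} {node_a} {node_b} {resistance:.1E}",
    ]

lines = pi_section_lines(93, 41, 88, 2.4, 8.0e-15, is_metal=False)
```

With these arguments the sketch produces lines of the same form as the C93L, C93U and R93 entries in the extracted netlist of Fig. 4.3.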

### 3.2.3 Data Structures for Region Queries

Because of the large number of intersections which must be found in a VLSI design, appropriate data structures for region queries must be used. If n objects are stored in a linear list, the problem of finding all objects intersecting a given window requires O(n) computation time. A number of other data structures have been developed for region queries which yield performance better than O(n). These techniques fall into three groups.

- 1. Scan-line Techniques [McCreight 1980], [Gupta 1982], [Ullman 1984]. In scan-line techniques, the static two dimensional intersection problem is transformed into a dynamic one-dimensional intersection problem. A scan line parallel to the x-axis is drawn across the layout. The x<sub>min</sub> and x<sub>max</sub> coordinates of any objects which intersect the scan line form a set of intervals which may intersect the window. To find the intersecting intervals rapidly, a balanced search tree is used. The scan line then advances one coordinate step in the y-direction. If an object no longer intersects the scan line, its interval is deleted from the current interval set, while new intervals are added. The process is repeated until the scan line has traversed all the objects. The scan line algorithm will report the objects intersecting a window in O(log(n+i)) time, where n is the number of objects and i the number of intersections found.
- 2. Quad Trees [Brown 1986], [Finkel 1974]. Quad trees are widely used for the manipulation of 2-dimensional geographical data. A quad tree is built by recursively dividing a coordinate space into 4 equally-sized quadrants, and then classifying the objects according to the quadrant in which they fall. The root node of the tree represents the coordinate space of the whole layout. Each node has four pointers, each pointing to a quadrant. If an object falls entirely within a single quadrant, it is loaded onto the tree at the corresponding node. In a perfect quad tree, the quadrants are subdivided and the objects redistributed among the child quadrants until each quadrant contains only one object. This may result in a very large data structure containing little data. More efficient memory usage is obtained by making the tree building algorithm adapt according to the number of objects in a particular part of the tree. A threshold is placed on the number of objects allowed on a node and when this is exceeded, the node is subdivided and the objects are redistributed. This results in an unbalanced tree which is deepest in the areas having the greatest density of objects. Intersection searching time will depend on the number of levels traversed, but is  $O(\log(n))$  on average. Objects which are bisected by the boundaries of a quad are dealt with by maintaining bisector lists at each node, or by multiple storage in adjacent quads.
- 3. Multi-dimensional Binary Search Trees [Rosenberg 1985], [Bentley 1975]. These data structures, which are also known as k-d trees, are used in data bases for associative queries based on multiple search keys. K-d trees may be used in the present application by defining the four corner coordinates of a rectangle as keys. Intersection searching time is O(log(n)). A full description of k-d trees is beyond the scope of this thesis. Rosenberg has reported that this structure is faster than an adaptive quad tree at finding intersections, but has a higher memory usage.

The multiple storage, adaptive quad tree was chosen because it is easy to implement, provides good performance and uses memory efficiently. Fig. 3.7 depicts the structure of a tree before and after the intersection search. The tree nodes are defined as follows:

Node = Record

MinquadHor, MinquadVer, MaxquadHor, MaxquadVer: Integer;

Internal\_Node: Boolean;

PtrToLLBoxes: ListBox\_pntr;

Count: Integer;

PtrTosubNode1, PtrTosubNode2, PtrTosubNode3, PtrTosubNode4: NodePntr;

End;

The first four variables define the extents of the quad. PtrToLLBoxes is a pointer to the list of objects attached to the node. Instead of storing multiple representations of an object crossing a quad boundary, multiple pointers to the same object record are used. This reduces the memory requirement considerably, but precautions must be taken to prevent multiple reporting of the same object from different quads. In a quad tree, an internal node contains pointers to subnodes, while a leaf node contains a pointer to a list of objects. Internal\_Node is a flag which identifies the node as internal or leaf. The tree is initialized with the root node as a leaf. Count is a tally of the number of objects listed on the node, which is incremented when an object is stored at the node. When Count reaches a threshold (in this case 30), the node is redefined as internal and four subnodes are created. The objects are redistributed among the subnodes. Because a node can change state from leaf to internal, the same data structure is used for both node types. If Internal\_Node is True, PtrTosubNode1 through PtrTosubNode4 point to the 4 subnodes. If Internal\_Node is False, PtrToLLBoxes points to a linked list of pointers to object records. The linked list elements are defined as follows:



Figure 3.7: Multiple storage, adaptive quad tree

ListBox = Record

Ptr\_To\_BoxInfo: Box;

Ptr\_To\_Next : ListBox\_pntr;

End;

The pointer Ptr\_To\_BoxInfo points to BoxInfo, a complete record of an object.

BoxInfo = Record

MinBoxHor: Integer; MinBoxVer: Integer; MaxBoxHor: Integer; MaxBoxVer: Integer;

BoxLabel: Packed Array [1..8] Of Char;

Mark: Boolean;

AllIntersectionsFound: Boolean; IntersectionList: IntersectionPointer;

End;

BoxInfo contains the object's coordinates and label. Mark is a flag, which is set when the object is found for the first time during a search operation. The search algorithm only reports intersections with unmarked objects, preventing multiple reporting of the same intersection. The price paid for using multiple representation is that all Mark flags must be reset before every search, which entails an extra traversal of the quad tree. The flag AllIntersectionsFound is set when an object has been checked for intersection against all other objects. IntersectionList points to a linked list of intersection records, each having the following form:

```
Intersection = Record
     Next: IntersectionPointer;
     Done: Boolean;
     Size: Array[Direction] Of Integer;
     Centre: Array[Direction] Of Integer;
     NodeNumber: Integer;
     Case Tipe: IntersectionType Of
          ContactDiff, ContactPoly,
          ContactM1, ViaM1, ViaM2:
                (CoincidentNode: Integer);
          Pin:  (PinNumber: Integer);
          Step: (UpperNode: Integer);
     End;
```

When an intersection is found, an intersection record is added to the intersection list of both objects.
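The behavior of the multiple storage, adaptive quad tree described in this section can be sketched in Python as follows. This is a simplified illustration, not the Extrem code: the class and function names are hypothetical, `MAX_DEPTH` is an added guard that the text does not mention, and the Mark flags are reset from a flat list of objects rather than by a tree traversal.

```python
from dataclasses import dataclass

THRESHOLD = 30  # leaf capacity quoted in the text
MAX_DEPTH = 8   # hypothetical guard against endless subdivision

@dataclass
class Box:
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    label: str = ""
    mark: bool = False  # set when the box is first reported in a search

def overlaps(a, b):
    """True if two axis-aligned rectangles touch or overlap."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)

class Node:
    def __init__(self, xmin, ymin, xmax, ymax, depth=0):
        self.xmin, self.ymin, self.xmax, self.ymax = xmin, ymin, xmax, ymax
        self.depth = depth
        self.internal = False  # the tree starts as a single leaf
        self.boxes = []        # leaf: references to shared Box records
        self.subnodes = []     # internal: the four child quads

    def insert(self, box):
        if not overlaps(self, box):
            return
        if self.internal:
            for child in self.subnodes:  # multiple storage across quads
                child.insert(box)
            return
        self.boxes.append(box)
        if len(self.boxes) >= THRESHOLD and self.depth < MAX_DEPTH:
            self._subdivide()

    def _subdivide(self):
        xm = (self.xmin + self.xmax) // 2
        ym = (self.ymin + self.ymax) // 2
        d = self.depth + 1
        self.subnodes = [Node(self.xmin, self.ymin, xm, ym, d),
                         Node(xm, self.ymin, self.xmax, ym, d),
                         Node(self.xmin, ym, xm, self.ymax, d),
                         Node(xm, ym, self.xmax, self.ymax, d)]
        self.internal = True
        pending, self.boxes = self.boxes, []
        for box in pending:  # redistribute among the subnodes
            for child in self.subnodes:
                child.insert(box)

    def search(self, window, found):
        if not overlaps(self, window):
            return
        if self.internal:
            for child in self.subnodes:
                child.search(window, found)
            return
        for box in self.boxes:
            if not box.mark and overlaps(box, window):
                box.mark = True  # report each object only once
                found.append(box)

def query(root, window, all_boxes):
    """Return every box intersecting window, each reported once."""
    for box in all_boxes:  # reset Mark flags (flat list, not a tree walk)
        box.mark = False
    found = []
    root.search(window, found)
    return found
```

Because a box that straddles a quad boundary is referenced from several leaves, the `mark` flag is what prevents it from being reported more than once by `query`.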

### 3.2.4 Technology Dependence

Up to ten different technologies may be defined in the module Technology.Pas, which is then linked to the rest of the modules making up Extrem. Currently, only a p-well CMOS process with double layer metal has been implemented. Other MOS processes could be included with minor modifications. Bipolar processes would require new transistor extraction algorithms.

### 3.3 Preparing the Netlist for Simulation

The user prepares a SPICE file called designname.spc. This file must contain the following:

- .MODEL definitions for the transistors;
- .TRAN analysis specifications and options;
- external components such as voltage sources and load resistors;
- a definition of subcircuit designname (this may be empty);
- an instance X1 of subcircuit designname.

The node numbers used in the subcircuit call must correspond to the labels on the bonding pads of the layout. The subcircuit is a convenient way of "encapsulating" the IC and preventing the user from duplicating node names assigned by the extractor. Typically, the user will first simulate the circuit as designed, to verify its functionality. Once functionality is established, the layout is generated and the equivalent circuit extracted using Extrem. Combine copies all the information in the .spc file to designname.spr, but replaces the contents of the subcircuit definition with the circuit extracted by Extrem. The extracted circuit is then simulated using Sirprice, simultaneously verifying both layout correctness and interconnect reliability.
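The substitution performed by Combine can be pictured with the following Python sketch. It is a hypothetical reconstruction working on lists of lines. It assumes that everything between the `.SUBCKT designname` line and the matching `.ENDS` line is replaced, and that the extracted lines begin with their own node-list continuation line. The actual file handling in Combine is not described in this thesis.

```python
def combine(spc_lines, extracted_lines, design):
    """Copy a SPICE deck, swapping the body of .SUBCKT <design>.

    spc_lines: lines of the user's designname.spc file.
    extracted_lines: subcircuit body produced by the extractor
        (assumed to start with the '+' node-list continuation line).
    """
    out = []
    inside = False
    for line in spc_lines:
        words = line.split()
        first = words[0].upper() if words else ""
        if (not inside and first == ".SUBCKT" and len(words) > 1
                and words[1].upper() == design.upper()):
            inside = True
            out.append(line)
            out.extend(extracted_lines)  # extracted circuit replaces the body
            continue
        if inside:
            if first == ".ENDS":
                inside = False
                out.append(line)
            continue  # drop the user's original subcircuit contents
        out.append(line)
    return out

spc = ["USERS SIMULATION", "X1 2 1 GEINPUT", ".SUBCKT GEINPUT", "+ 2 1",
       "M1 4 3 1 1 PMOS L=3U W=22U", ".ENDS GEINPUT", ".END"]
ext = ["+ 2 1", "R$1 1 57 1.6E-07"]
spr = combine(spc, ext, "geinput")
```

All lines outside the subcircuit, such as sources, loads, .MODEL cards and analysis statements, pass through unchanged.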

### 3.4 Sirprice: A Simulator for Electromigration Damage

Sirprice performs a simulation of the extracted circuit and determines the reliability of all conductor segments. The functioning of this module is illustrated by the following Pascal pseudo-code program.

```
Begin
Rspice(designname, Success);
If Success Then
     Begin
      Initialize Failure Rates of whole circuit, Runs, Vias,
      Contacts, Steps and Pads to 0;
      For I := 1 To NumberOfSegments Do
           Begin
           Compute Mtf(I);
           Compute Sigma(I);
           For Time := 1 To 20 Years Do
           Compute Failure Rate of Segment I;
           Store Mtf, Sigma and Failure Rate of Segment I;
           Update Failure Rate for Segments of this type;
            Update Failure Rate for all Segment types;
           End;
      Create Datatrieve File;
      End
Else
      Write Diagnostic Message;
End;
```

RSPICE is a slightly modified version of SPICE2G.5. The following alterations to the Fortran 77 source code have been made.

- RSPICE is defined as a subroutine, and can be called from within a Pascal procedure. A flag is returned to indicate the error status of the simulation.
- At every internal timestep, the time and the current in each resistor R\$i is saved in a linked list.
- The analysis temperature is saved.

If RSPICE terminates successfully, the reliability of each *Metal1* and *Metal2* segment is computed, using the methods set out in Chapter 2. This involves the integration of information from the segment data base (.db1) file, current-time data collected by RSPICE and the reliability characteristics of the technology used. The latter is provided by a technology definition module Stechnology.Pas, which contains all the constants used in the equations discussed below.

The Mtf is determined using Equation (2.31), with the geometry factor G given by Equation (2.32) and T equal to the analysis temperature specified by the user in the .spc file.  $\sigma$  is determined using Equation (2.33). The failure rate is then calculated for the first 20 years. Mtf,  $\sigma$  and the 20 failure rate values are added to the segment record and stored in a new data base file, designname.db2. Running totals of the 20 failure rate values for each segment type and the total values for the interconnect pattern are updated. Finally, a call interface to VAX Datatrieve is generated from the .db2 data base [DEC 1981]. The application of Datatrieve for interactive examination of reliability data is described in the next section.
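The failure rate step can be sketched in Python under stated assumptions. Equations (2.31) to (2.33) are not reproduced here, so the median lifetime `t50` (in hours) and `sigma` of each segment are taken as given inputs. The sketch uses the standard hazard function of a lognormal lifetime and assumes that the segments are statistically independent and in series, so that the system failure rate is the sum of the segment failure rates. The function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760.0

def lognormal_hazard(t, t50, sigma):
    """Instantaneous failure rate h(t) = f(t) / (1 - F(t)), per hour.

    t and t50 are in hours; sigma is the standard deviation of ln(lifetime).
    """
    z = math.log(t / t50) / sigma
    pdf = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))
    return pdf / survival

def system_failure_rate_fit(segments, years):
    """Total failure rate in FIT (failures per 1e9 device-hours).

    segments: iterable of (t50_hours, sigma) pairs, one per segment.
    Assumes independent segments in series, so hazards add.
    """
    t = years * HOURS_PER_YEAR
    return 1e9 * sum(lognormal_hazard(t, t50, s) for t50, s in segments)

# Failure rate for each of the first 20 years, for two made-up segments.
example_segments = [(5.0e6, 0.5), (8.0e6, 0.7)]
rates = [system_failure_rate_fit(example_segments, y) for y in range(1, 21)]
```

The two segments in the usage example are invented values chosen only to exercise the functions. For these values the time points lie well below the median lifetimes, which is the wearout region that Reliant is intended to model.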

### Chapter 4

### EVALUATION OF RELIANT

### 4.1 Calibrating the Reliability Models

Calibration of Reliant for a specific fabrication process requires the lifetime measurement of a number of test structures. Table 4.1 summarizes the dependencies of  $t_{50}$  and  $\sigma$  on the various constants which must be determined. If simple linear regression models for dependency on W, L,  $W_c$  and  $L_c$  are assumed, a total of 14 test structures and 26 independent tests are required. The sample size for each test must provide statistically meaningful results, with the emphasis on the early failures. An evaluation of a fabrication process is beyond the scope of this thesis, whose goal is to present methods and tools for use by designers in an industrial environment. Many semi-conductor manufacturers include electromigration monitors in their process evaluation programs and may have sufficient data available to perform a calibration. In the absence of such data, Appendix E details a comprehensive test chip which may be used for this purpose.

| Segment Type     | $t_{50}$              | $\sigma$   | No. of Structures | No. of Tests |
|------------------|-----------------------|------------|-------------------|--------------|
| Run (Metal1)     | $W, E_a, \Psi$        | $W$        | 2                 | 4            |
| Run (Metal2)     | $W, E_a, \Psi$        | $W$        | 2                 | 4            |
| Step             | $W, E_a, \Psi$        | $W$        | 2                 | 4            |
| Contact          | $W_c, L_c, E_a, \Psi$ | $W_c, L_c$ | 3                 | 5            |
| Via              | $W_c, L_c, E_a, \Psi$ | $W_c, L_c$ | 3                 | 5            |
| Pad              | $W, E_a, \Psi$        | $W$        | 2                 | 4            |
| Total            |                       |            | 14                | 26           |

Table 4.1: Dependencies of  $t_{50}$  and  $\sigma$  for each Segment Type

| Parameter                        | GEINV @ $T = 27^{\circ}C$ | GEBINC @ $T = 80^{\circ}C$ |
|----------------------------------|---------------------------|----------------------------|
| Run time (VAX11/785):            |                           |                            |
| 1) Spice                         | 33s                       | 1520s                      |
| 2) Extrem                        | 1.7s                      | 7.4s                       |
| 3) Combine                       | 0.5s                      | 1.7s                       |
| 4) Sirprice                      | 44s                       | 1610s                      |
| 5) Reliant (2+3+4)               | 44.2s                     | 1619s                      |
| 6) Reliant/Spice                 | 1.33:1                    | 1.06:1                     |
| No. of nodes                     | 88                        | 522                        |
| No. of transistors               | 4                         | 26                         |
| Area                             | $2727 \mu m^2$            | $12827 \mu m^2$            |
| No. of segments                  | 100                       | 601                        |
| No. of metal segments            | 58                        | 351                        |
| Failure rate (FIT) @ t = 20 yrs. | $4.2 \times 10^{-15}$     | $9.7 \times 10^{-2}$       |
Table 4.2: Summary: Reliant analysis results for GEINV and GEBINC

### 4.2 Two Examples

Reliant was used to evaluate the reliability of a CMOS standard cell 4-bit binary ripple counter with reset [GE 1986]. A CIF layout file was supplied by the manufacturers. This file was first modified with a mask editor to remove some non-Manhattan geometry. The BINR4 counter consists of a double inverter input stage (which buffers the clock and generates its complement) and four counter stages. First, the four-transistor input stage (GEINPUT) was simulated for two clock cycles at a temperature of 27°C. Fig. 4.1 shows the user interaction with Reliant.

The user's simulation file geinput.spc appears in Fig. 4.2 and Fig. 4.3 shows the Sirprice input file geinput.spr. Bonding pads were added to the layout before simulation, as shown in Fig. 4.4. Fig. 4.5 is a graph of the tabulated results in geinput.lis. The failure rate is  $4.2 \times 10^{-15} FIT$  after 20 years and the major contributors to the failure rate are the contacts.

A single counter stage plus the double inverter (GEBINC) was then analyzed for two clock cycles at a temperature of  $80^{\circ}C$ . The layout and analysis results appear in Figs. 4.6 and 4.7 respectively. The failure rate is  $9.7 \times 10^{-2} FIT$  after 20 years and the major contributors are the steps.

Table 4.2 summarizes the results of the two analyses. For both circuits, the major contribution to run time was the circuit simulation phase. Circuit extraction and netlist generation amounted to 5% of the total run time for GEINPUT and 0.5% for GEBINC. Extractor efficiency is usually evaluated in terms of the number of transistors extracted per second. Because Extrem extracts segments rather than devices, efficiency must be measured in segments per second. The extraction rate was 59 segments/s for GEINPUT and 81 segments/s for GEBINC. These results compare favorably with those reported elsewhere [Gupta 1982].

```
$ r extrem
EXTREM Version 1.0
The following technologies have been defined:
( 1) cmospw2m
Technology?
Design Name?
geinput
Include parasitic capacitances? (Y/N)
Parsing geinput.cif....
No errors detected in CIF file
Building quad search trees....
Finding all intersections between layers....
Extracting conductor segments and equivalent circuit....
Segment records stored in geinput.DB1
Spice netlist stored in geinput.EXT
CIF output stored in geinput.K
Total cpu time: 1.6E+00 seconds
$ r combine
COMBINE Version 1.0
Design Name?
geinput
Reading external circuit and analysis specs from geinput.SPC....
Spice netlist stored in geinput.SPR
Total cpu time: 4.5E-01 seconds
$ r sirprice
SIRPRICE Version 1.0
The following technologies have been defined:
(1) cmospw2m
Technology?
Design Name?
geinput
Simulating circuit in geinput.SPR....
Simulation completed.
Regular Spice output stored in geinput.DAT
Calculating reliability...
Creating Datatrieve records...
Total cpu time: 4.2E+01 seconds
```

Figure 4.1: User interaction with Reliant

```
USERS SIMULATION OF GEINPUT
VIN 3 0 PWL(0 0 1N 5 3N 5 4N 0 6N 0)
VDD 1 0 DC 5
VSS 2 0 DC 0
CL1 4 0 0.01P
CL2 5 0 0.01P
X1
+ 2 1 5 4 3
+GEINPUT
.SUBCKT GEINPUT
+ 2 1 5 4 3
M1 4 3 1 1 PMOS L=3U W=22U
M2 4 3 2 2 NMOS L=3U W=22U
M3 5 4 2 2 NMOS L=3U W=22U
M4 5 4 1 1 PMOS L=3U W=22U
.ENDS GEINPUT
.MODEL NMOS NMOS LEVEL=3 RSH=0 TOX=275E-10 LD=0.16E-6 XJ=0.14E-6
+ CJ=1.6E-4 CJSW=1.8E-10 UO=550 VTO=1.022 CGSO=1.3E-10
+ CGDO=1.3E-10 NSUB=4E15 NFS=1E10
+ VMAX=12E4 PB=0.7 MJ=0.5 MJSW=0.3 THETA=0.06 KAPPA=0.4 ETA=0.14
.MODEL PMOS PMOS LEVEL=3 RSH=0 TOX=275E-10 LD=0.3E-6 XJ=0.42E-6
+ CJ=7.7E-4 CJSW=5.4E-10 UO=180 VTO=-1.046 CGSO=4E-10
+ CGDO=1.3E-10 TPG=-1 NSUB=7E15 NFS=1E10
+ VMAX=12E4 PB=0.7 MJ=0.5 MJSW=0.3 ETA=0.06 THETA=0.03 KAPPA=0.4
.OPTIONS NODE
.TRAN .05N 6N
.PRINT TRAN V(3) V(4) V(5)
.END
```

Figure 4.2: Spice input file geinput.spc

```
USERS SIMULATION OF GEINPUT
VIN 3 0 PWL(0 0 1N 5 3N 5 4N 0 6N 0)
VDD 1 0 DC 5
VSS 2 0 DC 0
CL1 4 0 0.01P
CL2 5 0 0.01P
* Circuit extracted from geinput.cif by Extrem
X1
+ 2 1 5 4 3
+GEINPUT
.SUBCKT GEINPUT
+ 2 3 1 4 5
C1 57 0 6.2E-14
R$1 1 57 1.6E-07
C2L 6 0 6.2E-15
C2U 57 0 6.2E-15
R89 52 86 1.0E-06
M3 87 42 86 999 PMOS L=3U W=22U
R90 87 55 1.0E-06
R91 55 41 1.0E-06
C93L 41 0 4.0E-15
C93U 88 0 4.0E-15
R93 41 88 2.4E+00
M4 89 43 88 999 PMOS L=3U W=22U
R94 89 53 1.0E-06
R95 53 51 1.0E-06
C96L 41 0 1.0E-15
C96U 56 0 1.0E-15
R96 41 56 1.1E+01
VPBULK 999 0 5
.ENDS GEINPUT
.MODEL NMOS NMOS LEVEL=3 RSH=0 TOX=275E-10 LD=0.16E-6 XJ=0.14E-6
+ CJ=1.6E-4 CJSW=1.8E-10 UO=550 VTO=1.022 CGSO=1.3E-10
+ CGDO=1.3E-10 NSUB=4E15 NFS=1E10
+ VMAX=12E4 PB=0.7 MJ=0.5 MJSW=0.3 THETA=0.06 KAPPA=0.4 ETA=0.14
.MODEL PMOS PMOS LEVEL=3 RSH=0 TOX=275E-10 LD=0.3E-6 XJ=0.42E-6
+ CJ=7.7E-4 CJSW=5.4E-10 UO=180 VTO=-1.046 CGSO=4E-10
+ CGDO=1.3E-10 TPG=-1 NSUB=7E15 NFS=1E10
+ VMAX=12E4 PB=0.7 MJ=0.5 MJSW=0.3 ETA=0.06 THETA=0.03 KAPPA=0.4
.OPTIONS NODE
.TRAN .05N 6N
.PRINT TRAN V(3) V(4) V(5)
.END
```

Figure 4.3: A part of Sirprice input file geinput.spr

Figure 4.4: Layout of GEINPUT

The run time of Sirprice is greater than that of Spice, because of the additional reliability calculations which are performed. The larger data structures also lead to more page faults in a virtual memory system. The time difference is O(n), where n is the number of nodes. As the simulation time is  $O(n^2)$ , this difference becomes negligibly small for large circuits.

Fig. 4.8 shows how the Datatrieve interface was used to optimize the interconnect reliability of GEBINC. First, segment 79 was found to have the highest failure rate at a specific time (arbitrarily chosen as 20 years). The failure rate of this segment was  $7.775 \times 10^{-3} FIT$ . All segments having a failure rate greater than  $5.0 \times 10^{-3}$  FIT were then found. The 11 segments thus identified had a total failure rate of  $8.1 \times 10^{-2} FIT$ , which represents 81% of the total failure rate for all 351 segments. When their center coordinates were listed, these were found to be clustered along a line between points (66,31.5) and (85.5,31.5). On examination of the layout with a mask editor it was found that the 11 segments belonged to a single conductor, which had a width of  $3\mu m$ . This width was increased to  $4\mu m$  and the analysis repeated. The failure rate after 20 years was then  $4.5 \times 10^{-3} FIT$ , representing a twentyfold improvement for a mere 0.6% increase in area.

Figure 4.5: Failure rate vs. time for GEINPUT
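The same selection can be reproduced outside Datatrieve. The Python sketch below uses the 11 failure rates and center coordinates transcribed from Fig. 4.8. Pairing the two listings row by row is an assumption, and the tuple layout and function name are hypothetical.

```python
# (segment type, centre x, centre y, failure rate at 20 years in FIT)
# Values transcribed from Fig. 4.8; row-by-row pairing is an assumption.
records = [
    ("STEP", 6600, 3150, 7.5505e-3),
    ("STEP", 6900, 3150, 7.5667e-3),
    ("STEP", 7800, 3150, 7.6158e-3),
    ("STRAIGHT", 7350, 3150, 6.0731e-3),
    ("STEP", 8100, 3150, 7.6322e-3),
    ("STEP", 9000, 3150, 7.6815e-3),
    ("STRAIGHT", 8550, 3150, 6.1255e-3),
    ("STEP", 9300, 3150, 7.6980e-3),
    ("STEP", 9700, 3150, 7.7201e-3),
    ("STEP", 10000, 3150, 7.7366e-3),
    ("STEP", 10700, 3150, 7.7754e-3),
]

def stressed(rows, threshold):
    """Select rows whose failure rate exceeds threshold,
    like FIND ... WITH LAMBDA20 > threshold."""
    return [r for r in rows if r[3] > threshold]

hot = stressed(records, 5.0e-3)
total = sum(r[3] for r in hot)                # combined failure rate in FIT
worst = max(hot, key=lambda r: r[3])          # most highly stressed segment
same_row = len({r[2] for r in hot}) == 1      # all centres share one y value
```

The combined failure rate of the selected segments comes to about  $8.1 \times 10^{-2}$  FIT, and all centers share the same y coordinate, which is what points to a single horizontal conductor.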

### 4.3 Limitations

The following limitations of Reliant have been identified.

1. The probabilistic method of determining current flow direction sometimes produces transistors with the length and width exchanged. This occurs when the channel width is much larger than the source or drain width. A solution would be to use conventional circuit extraction techniques to identify the transistors.
2. Initially, Spice DC convergence problems (pivot element < PIVTOL) were experienced with GEBINC. This was cured by placing a lower limit of  $1\Omega$  on the value of any extracted resistor. Transient convergence problems were eliminated by ramping  $V_{DD}$ .
3. Because of the fine grain of the segmentation algorithm, even simple layouts result in extracted circuits containing several hundred nodes. For the two examples shown, there are 20 to 25 extracted nodes per transistor. The use of SPICE to simulate these circuits limits the application of Sirprice to VLSI cells containing no more than a few hundred transistors. As an experiment, nodal capacitances were omitted. This resulted in a 20% improvement in speed but produced failure rates which differed by orders of magnitude from the results described above. An alternative solution would be to compact the nodes within a conductor branch, by redistributing the nodal capacitances to the branch ends and combining the branch resistors in series. A third possibility is to make use of event-driven simulation techniques to speed up the reliability analysis. This option is discussed in the following paragraph.

Figure 4.6: Layout of GEBINC

Figure 4.7: Failure rate vs. time for GEBINC

### 4.4 Extending the VLSI Capability of Reliant

Because the time required to solve a set of n circuit equations is  $O(n^2)$ , the behavior of logic circuits containing more than a few hundred elements is usually analyzed using event-driven simulation techniques. In an event-driven simulator, the nodal voltages and impedance levels are discretized to represent a number of pre-defined logic states. An event occurs whenever the state of a node changes. The effect of the event on other nodes is given by Boolean expressions defining the logic functions of the circuit and time delays associated with each state change. When an event occurs, the next state of all nodes is evaluated and a list of pending events is compiled for nodes whose state will change. The time delays are computed and the pending events are scheduled by means of a queue.
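The event-driven mechanism described above can be sketched in a few lines of Python using a priority queue. This is a generic illustration rather than any particular simulator: the gate table format and the function name are hypothetical, each gate has a single fixed delay, and pending events are never cancelled.

```python
import heapq

def simulate(gates, initial, stimuli, t_end):
    """Minimal event-driven logic simulation.

    gates: {output_node: (function, [input_nodes], delay)}
    initial: {node: 0 or 1} starting state of every node
    stimuli: [(time, node, value)] externally applied events
    Returns the list of (time, node, value) state changes.
    """
    state = dict(initial)
    queue = list(stimuli)
    heapq.heapify(queue)  # pending events, ordered by time
    fanout = {}
    for out, (_, inputs, _) in gates.items():
        for node in inputs:
            fanout.setdefault(node, []).append(out)
    history = []
    while queue:
        time, node, value = heapq.heappop(queue)
        if time > t_end:
            break
        if state.get(node) == value:
            continue  # no state change, so no event
        state[node] = value
        history.append((time, node, value))
        for out in fanout.get(node, []):
            func, inputs, delay = gates[out]
            new_value = func(*(state[n] for n in inputs))
            heapq.heappush(queue, (time + delay, out, new_value))
    return history

def inv(x):
    return 1 - x

# Two inverters in series: a -> b (delay 2) -> c (delay 3).
gates = {"b": (inv, ["a"], 2), "c": (inv, ["b"], 3)}
history = simulate(gates, {"a": 0, "b": 1, "c": 0},
                   [(0, "a", 1), (10, "a", 0)], t_end=100)
```

Only nodes in the fan-out of a changed node are re-evaluated, which is why the cost grows with circuit activity rather than with the full set of circuit equations.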

Two kinds of event-driven simulator are in general use: the logic simulator and the switch-level simulator. In the former, the circuit is modeled as a collection of modules, each having a primitive Boolean function, e.g. AND, OR. As each Boolean function may correspond to a circuit containing several transistors, there is not a close correspondence between the logic simulator model of circuit function and the topology of the circuit.

```
DTR> READY SEGMENTS READ
DTR> FIND ALL SEGMENTS
[351 records found]
DTR> FIND SEGMENTS WITH LAMBDA20=MAX(LAMBDA20)
[1 record found]
DTR> SELECT
DTR> PRINT ID, SEGMENTTYPE, XMIN, YMIN, LAMBDA20
    ID   SEGMENTTYPE    XMIN     YMIN     LAMBDA20
    79   STEP           10650    3000     7.7754E-03
DTR> FIND ALL SEGMENTS WITH LAMBDA20>5E-3
[11 records found]
DTR> SELECT
DTR> PRINT ALL LAMBDA20
 LAMBDA20
 7.5505E-03
 7.5667E-03
 7.6158E-03
 6.0731E-03
 7.6322E-03
 7.6815E-03
 6.1255E-03
 7.6980E-03
 7.7201E-03
 7.7366E-03
 7.7754E-03
DTR> PRINT ALL SEGMENTTYPE, (XMIN+XMAX)/2, (YMIN+YMAX)/2
SEGMENTTYPE
 STEP          6600.000    3150.000
 STEP          6900.000    3150.000
 STEP          7800.000    3150.000
 STRAIGHT      7350.000    3150.000
 STEP          8100.000    3150.000
 STEP          9000.000    3150.000
 STRAIGHT      8550.000    3150.000
 STEP          9300.000    3150.000
 STEP          9700.000    3150.000
 STEP         10000.000    3150.000
 STEP         10700.000    3150.000
DTR>
```

Figure 4.8: Identification of highly-stressed segments using Datatrieve

#### 4.4.1 Switch-level Simulation Techniques

The switch-level simulator represents each MOS transistor as a switch whose state (open, closed) is determined by the state of a controlling node (0,1) [Bryant 1984], [Hayes 1984]. Therefore, no abstractions are made about the network topology, only about the transistor model. In most switch-level simulators, the conductance of the transistor in the on state is also modeled. This parameter is sometimes referred to as the strength of the transistor and is represented by a set of discrete values. The state of a node is determined by an ordering of the strengths of the transistors connected to the node. In Terman's RNL simulator, a semi-analog approach is followed. The total resistances between a given node,  $V_{DD}$  and Ground are determined and a voltage divider method is used to determine the voltage on the node. A threshold function is applied to this voltage to determine the logic state of the node. Nodal capacitances to ground are used together with the transistor conductance values to compute the delay at each node, by means of a single time constant RC model. The variation in channel resistance with  $V_{ds}$  is modeled by defining static and dynamic resistors for each transistor [Terman 1985].
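The semi-analog approach described for RNL can be illustrated with a short Python sketch. It is not RNL code: the function names and the threshold fractions of 0.3 and 0.7 are hypothetical values chosen only for illustration, and an infinite resistance is used to represent the absence of a conducting path.

```python
import math

def node_voltage(r_to_vdd, r_to_gnd, vdd=5.0):
    """Voltage divider between the total resistances to VDD and Ground.

    Use math.inf for a missing path. Returns None for a floating node.
    """
    if math.isinf(r_to_vdd) and math.isinf(r_to_gnd):
        return None
    if math.isinf(r_to_vdd):
        return 0.0
    if math.isinf(r_to_gnd):
        return vdd
    return vdd * r_to_gnd / (r_to_vdd + r_to_gnd)

def logic_state(voltage, vdd=5.0, low=0.3, high=0.7):
    """Threshold function mapping a voltage to 0, 1 or 'X' (undefined).

    The low/high fractions are illustrative assumptions.
    """
    if voltage is None:
        return "X"
    if voltage <= low * vdd:
        return 0
    if voltage >= high * vdd:
        return 1
    return "X"

def rc_delay(r_drive, c_node):
    """Single time constant delay estimate, in seconds."""
    return r_drive * c_node

# Pull-up path of 1 kOhm against a pull-down path of 9 kOhm.
v = node_voltage(1.0e3, 9.0e3)
state = logic_state(v)
delay = rc_delay(1.0e3, 0.01e-12)
```

In this picture, separate static and dynamic resistances would supply the `r_to_vdd` and `r_to_gnd` values for the state calculation and the `r_drive` value for the delay calculation, respectively.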

#### 4.4.2 A Method for Estimating Interconnect Reliability using a Switch-level Simulator

Switch-level simulators do not provide the user with explicit voltage or current waveform information, although an approximate nodal voltage value or waveform may be implicitly assumed in the methods of next state and delay determination. The possibility of extracting sufficient information about branch current waveforms to estimate interconnect reliability is now considered. It is assumed that circuit extraction has been performed on the layout, and the equivalent circuit contains the following elements:

- MOS transistors defined by linear static and dynamic resistances;
- parasitic interconnect resistances;
- a capacitance from each node to ground.

The steady-state behavior of the network is derived from a linear network consisting of transistor static resistances and parasitic interconnect resistances. The steady-state nodal voltages after each event may be determined by one of the following methods.

- If all resistive networks are proper trees to  $V_{DD}$  or Ground, equivalent resistances to each of the global nodes may be determined by a tree search algorithm. This is the method used in the RNL simulator.
- For general resistive networks, the circuit equations must be solved.
  However, this could be done much faster than in Spice because the
  networks are linear and only those parts of a network affected by an
  event need be analyzed.

With all node voltages known, the steady-state currents in the interconnect resistors may be determined.

The transient behavior of the network is derived from a linear network consisting of transistor dynamic resistances, parasitic interconnect resistances and a capacitor  $C_i$  to ground from each node i. Transient currents are the result of charging or discharging of the nodal capacitances. The damage function expression in equation (2.29) may be written as follows:

$$f(t) = B \int_0^t J_{eff}(j(\tau)) d\tau \tag{4.1}$$

where B is a constant. Therefore

$$f(t) = B \int_0^t \sinh(\Psi j(\tau)) d\tau \tag{4.2}$$

$$= B \int_0^t \frac{1}{\Psi} \left( \Psi j(\tau) + (\Psi j(\tau))^2 + \cdots \right) d\tau \tag{4.3}$$

$$= B\left[\frac{Q}{A} + \frac{\Psi}{A^2 2!} \int_0^t i^2(\tau) d\tau + \cdots\right] \tag{4.4}$$

where A is the cross-sectional area of the conductor and Q is the total charge through the conductor. Therefore as a first approximation

$$f(t) \approx \frac{BQ}{A} \tag{4.6}$$

This approximation is accurate if  $j(t) \ll 1/\Psi = 5.0 \times 10^5 A/cm^2$ . When current densities are below this threshold (a reasonable assumption for logic interconnects in MOS circuits) it is only necessary to predict the total charge movement accurately. The precise current waveform is not important.

Figure 4.9: Charge movement in a MOS circuit

The charge movement into node i is

$$Q_i = C_i \left( V_i^{\infty} - V_i^0 \right) \tag{4.7}$$

where  $V_i^0$  and  $V_i^\infty$  are the initial and final voltages on node i, respectively. If the resistive network is a proper tree, the charge movements may be determined by searching the tree and accumulating the charges from the leaf nodes to the root. This is illustrated by the example in Fig. 4.9.
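The accumulation of charge from the leaf nodes to the root can be sketched in Python as follows. The tree representation and the function name are hypothetical, and the component values in the usage example are invented for illustration. They are not taken from Fig. 4.9.

```python
def branch_charges(children, cap, v0, v_inf, root):
    """Charge flowing through the resistor that links each node to its parent.

    children: {node: [child nodes]} describing a proper tree
    cap: {node: capacitance to ground in farads}
    v0, v_inf: {node: initial / final voltage in volts}
    The charge through a branch is the charge delivered to the node at its
    far end, Eq. (4.7), plus the charges of all nodes further downstream.
    """
    charge = {}

    def visit(node):
        q = cap[node] * (v_inf[node] - v0[node])  # Eq. (4.7)
        for child in children.get(node, []):
            q += visit(child)
        charge[node] = q
        return q

    visit(root)
    return charge

# Node a feeds b and c; node b feeds d. All nodes charge from 0 V to 5 V.
children = {"a": ["b", "c"], "b": ["d"]}
cap = {"a": 10e-15, "b": 20e-15, "c": 5e-15, "d": 15e-15}
v0 = {n: 0.0 for n in cap}
v_inf = {n: 5.0 for n in cap}
q = branch_charges(children, cap, v0, v_inf, "a")
```

Each branch charge Q obtained this way could then be inserted into Eq. (4.6) together with the cross-sectional area A of the corresponding conductor segment.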

Nets such as  $V_{DD}$  and Ground are considered global if the voltage at every point in the net is independent of the current flowing into or out of the net. Once the static and dynamic currents in the logic networks have been determined, the equivalent circuits of the global nets may be analyzed with the logic currents modeled as current sources (see Fig. 4.10).



Figure 4.10: Current source model of power bus

### Chapter 5

### REVIEW

This thesis has demonstrated the feasibility of predicting the reliability of VLSI interconnects during the design phase. Models for failure rate of conductors based on a lognormal distribution of lifetime have been developed and a methodology for determining the failure rate of complex interconnect patterns presented. The intrinsic lifetime of IC interconnects may be determined in this manner. The implementation of this methodology in a software tool for reliability analysis has been described and the functioning of the tool has been demonstrated for two VLSI leaf cells. It has been shown that reliability analysis may be achieved concurrently with layout verification by simulating an extracted equivalent circuit, and that the overhead for reliability prediction is small. However, the use of a circuit simulator limits the size of circuits which may be analyzed to a few hundred transistors at most. A method of overcoming this limitation has been proposed. This method uses an event-driven switch-level simulator to model the logic interconnects, and global nets are accommodated using a simple current-source model.

The role of local and global variations in conductor width due to process disturbances (e.g. spot defects, over/underetching) has not been considered in this thesis. A worst-case analysis of global variations may be made by preprocessing the mask description file to include maximum overetching of conductors. Spot defects are related to the incidence of early failures and the inclusion of these effects necessitates the use of a different lifetime distribution to model infant mortality. It should be noted that the methodology proposed here places no restriction on the distribution used. A paper (co-authored by the present author) describing the distribution of early EM failures is currently in the review process and a draft copy is included in Appendix F. This aspect needs further study, for example, to develop efficient means of parameter estimation for the early-failure distribution.

In this regard, the relationship between yield failures due to spot defects and early reliability failures should be investigated. These phenomena are obviously closely related, and the high cost of lifetime testing makes the characterization of the early lifetime distribution from yield data an extremely attractive possibility.

In its present form, Reliant provides a good indication of the intrinsic lifetime and relative reliability of conductor segments, and should prove a useful design tool in an industrial environment, for identifying highly stressed areas of an interconnect layout. Historically, minimum feature sizes and failure rates measured in industry have shown a steady decrease, while circuit complexity has increased dramatically. These results would appear to contradict the conclusions reached in this thesis that decreasing linewidths and increasing complexity must lead to higher failure rates. It must be borne in mind that currently, Reliant only models the wearout portion of the bathtub curve. The measured decrease in failure rates is ascribable to a reduction in defect density and concomitant decrease in infant mortality failures. The inclusion of local and global process variations into Reliant would produce a bathtub-shaped failure rate prediction, with results more in line with those measured in industry.

The Reliant program has the potential to be extended in three directions. For MOS VLSI circuits, the switch-level simulation techniques presented in Chapter 4 may be implemented. This extension could probably be based on an existing simulator such as RNL, but would definitely require some internal modifications to this tool. To address the bipolar area, new device extraction algorithms must be written. Because of the high current densities existing in ECL circuits, this may prove to be the main area of application for Reliant. Finally, a long term goal should be the inclusion of all significant failure modes to provide an overall reliability prediction.

#### Chapter 6

#### REFERENCES

[Agarwala 1970] B. N. Agarwala, M.T. Attardo, A.P. Ingraham, "Dependence of Electromigration-induced Failure Time on Length and Width of Aluminum Thin Film Conductors", J. App. Phys., vol. 41, 1970, pp 3954 - 3960.

[Anolick 1980] E.S. Anolick, G.R. Nelson, "Low Field Time-dependent Dielectric Integrity", IEEE Trans. Reliability, vol. R-29, Aug. 1980, pp 217 - 221.

[Baglee 1986-2] D.A. Baglee, "Reliability of Trench Capacitors for VLSI Memories", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 215 - 219.

[Baglee 1986-1] D.A. Baglee, "Oxide Reliability in VLSI Technology", Tutorial notes: 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 6-1 to 6-17.

[Barke 1985] E. Barke, "Resistance Calculation from Mask Artwork Data by Finite Element Method", 22nd Design Automation Conf., New York: IEEE, 1985, pp 305-311.

[Black 1968] J.R. Black, "Electromigration Failure Modes in Aluminum Metallization for Semiconductor Devices", Proc. IEEE, vol. 57, 1969, p 1587.

[Black 1974] J.R. Black, "Physics of Electromigration", Proc. 12th Annual Reliab. Phys. Symp., IEEE, 1974, pp 142 - 149.

[Blair 1970] J.C. Blair, P.B. Ghate, C.T. Haywood, "Electromigration induced Failures in Aluminum Film Conductors", Appl. Phys. Lett., vol. 17, 1970, p 281.

[Blech 1976] I.A. Blech, "Electromigration in Thin Aluminum Films on Titanium Nitride", J. App. Phys., vol. 47, Apr. 1976, pp 1203 - 1208.

[Blech 1975] I.A. Blech, E.Kinsbron, "Electromigration in Thin Gold Films on Molybdenum Surfaces", Thin Solid Films, vol. 25, 1975, p 327.

[Bobbio 1974] A. Bobbio, A. Ferro, O. Saracco, "Electromigration Failure in Al Thin Films under Constant and Reversed DC Powering", IEEE Trans. Reliability, vol. R-23, no. 3, Aug. 1974, pp 194 - 201.

[Brown 1986] R.L. Brown, "Multiple Storage Quad Trees: A Simpler Faster Alternative To Bisector List Quad Trees", IEEE Trans. Computer-aided Design, vol. CAD-5, no. 3, July 1986, pp 413 - 419.

[Bryant 1984] R.E. Bryant, "A Switch-Level Model and Simulator for MOS Digital Systems", IEEE Trans. Computers, vol. C-33, no. 2, Feb. 1984, pp 160 - 177.

[d'Heurle 1970] F.M. d'Heurle, I. Ames, "Electromigration in Single-crystal Aluminum Films", Appl. Phys. Lett., vol. 16, p 80.

[d'Heurle 1978-1] F.M. d'Heurle, P.S. Ho, "Electromigration in Thin Films", Thin Films - Interdiffusion and Reactions (Ed: Poate, Tu & Mayer), Wiley- Interscience, 1978, p 244.

[d'Heurle 1978-2] ibid, p 248.

[David 1970] H.A. David, Order Statistics, 2nd Ed., John Wiley & Sons, 1970, p 22.

[DEC 1981] VAX-11 Datatrieve Call Interface Manual, Merrimack, NH: Digital Equipment Corp., 1981.

[Duvvury 1986] C. Duvvury, R.A. McPhee, D.A. Baglee, R.N. Rountree, "ESD Protection Reliability in  $1\mu m$  CMOS Technology", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 199 - 205.

[Eitan 1981] B. Eitan, D. Frohman-Bentchkowsky, "Hot-electron Injection in n- Channel MOS Devices", IEEE Trans. Electron Dev., vol. 28, Mar. 1981, pp 328 - 340.

[English 1972] A.T. English, K.L. Tai, P.A. Turner, "Electromigration in Conductor Stripes under Pulsed DC Powering", Appl. Phys. Lett., vol. 21, no. 8, Oct. 1972, pp 397 - 398.

[English 1983] A.T. English, E. Kinsbron, "Electromigration Transport Mobility Associated with Pulsed Direct Current in Fine-grained Evaporated Al - 0.5%Cu Thin Films", J. App. Phys., vol. 54, Jan. 1983, pp 275 - 280.

[Finkel 1974] R.A. Finkel, J.L. Bentley, "Quad Trees: A Data Structure for Retrieval on Composite Keys", Acta Informatica, vol. 4, 1974, pp 1 - 9.

[Frost 1987] D.F. Frost, K.F. Poole, "A Method for Predicting VLSI-Device Reliability using Series Models for Failure Mechanisms", IEEE Trans. Reliability, vol. R-36, June 1987, pp 234 - 242.

[Gardner 1987] D.S. Gardner, J.D. Meindl, K.C. Saraswat, "Interconnection and Electromigration Scaling Theory", IEEE Trans. Electron Devices, vol. ED-34, no. 3, March 1987, pp 633 - 643.

[GE 1986] Macrocell Library 3.1, Document #RTP600R01, Research Triangle Park, NC:1986, General Electric Company, pp 7.4.22 - 7.4.24.

[Gupta 1982] A. Gupta, R.W. Hon, Two papers on Circuit Extraction, Research Report CMU-CS-82-147, Pittsburgh: Dept. of Computer Science, Carnegie-Mellon University, 1982, pp 1 - 10.

[Hall 1986] J.E. Hall, D.E. Hocevar, P. Yang, M.J. McGraw, "SPIDER: A CAD System for Checking Current Density and Voltage Drop in VLSI Metallization Patterns", Proc. IEEE Int. Conf. on Computer-aided Design, Nov. 1986.

[Harrison 1988] J.W. Harrison, private communication.

[Hayes 1984] J.P. Hayes, "Fault Modeling for Digital MOS Integrated Circuits", IEEE Trans. Computer-aided Design, vol. CAD-3, no. 3, July 1984, pp 200 - 207.

[Hohol 1986] T.S. Hohol, L.A. Glasser, "RELIC: A Reliability Simulator for Integrated Circuits", Proc. IEEE Int. Conf. on Computer-aided Design, Nov. 1986.

[Horowitz 1983] M. Horowitz, R.W. Dutton, "Resistance Extraction from Mask Layout Data", IEEE Trans. Computer-aided Design, vol. CAD-2, no. 3, July 1983, pp 145-150.

[Huntingdon 1961] H.B. Huntington, A.R. Grone, "Current Induced Marker Motion in Gold Wires", J. Phys. Chem. Solids, vol. 20, 1961, p 76.

[Ishiuchi 1986] H. Ishiuchi, T. Watanabe, T. Tanaka, K. Kishi, M. Ishikawa, N. Goto, K. Kohyama, H. Noji, O.Ozawa, "Soft Error Rate Reduction in Dynamic Memory with Trench Capacitor Cell", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 235 - 238.

[Iyer 1984] S.S. Iyer, C-Y. Ting, "Electromigration Studies of Submicrometer Linewidth Al-Cu Conductors", IEEE Trans. Electron Dev., vol. 31, 1984, pp 1468 -1472.

[Johnson 1970] N.L. Johnson, S. Katz, Continuous Univariate Distributions: 1, Houghton-Mifflin, 1970, p 253.

[Kemp 1988] K.G. Kemp, K.F. Poole and D.F. Frost, "Failure Rate Prediction for Defect Enhanced Electromigration Wearout of Metal Interconnects", submitted to IEEE Trans. Reliab., Jan. 1988.

[Kinsbron 1978] E. Kinsbron, C.M. Melliar-Smith, A.T. English, T. Chynoweth, "Failure of Small Thin Film Conductors due to High Current-density Pulses", 16th Ann. Reliab. Phys. Symp., New York: IEEE, 1978, pp 248 - 254.

[Kinsbron 1980] E. Kinsbron, "A Model for the Width Dependence of Electromigration Lifetimes in Aluminum Thin Film Stripes", App. Phys. Lett., vol. 36, 1980, pp 968 - 970.

[LaCombe 1986] D.J. LaCombe, E.L. Parks, "The Distribution of Electromigration Failures", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 1 - 6.

[Lycoudes 1980] N.E. Lycoudes, C.C. Childers, "Semiconductor Instability Failure Mechanisms", IEEE Trans. Reliability, vol. R-29, Aug. 1980, pp 237 - 249.

[Maly 1985] W. Maly, "Modeling of Lithography Related Yield Losses for CAD of VLSI Circuits", IEEE Trans. Computer-aided Design, vol. CAD-4, no. 3, July 1985, pp 166 - 177.

[McCreight 1980] E.M. McCreight, "Efficient Algorithms for Enumerating Intersecting Intervals and Rectangles", Research Report CSL-80-9, Palo Alto, CA: Xerox PARC, 1980.

[McPherson 1986] J.W. McPherson, "Stress-dependent Activation Energy", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 12-18.

[Mead 1981] C. Mead and L. Conway (Eds.), Introduction to VLSI Systems, Reading, Mass.: Addison-Wesley, 1980.

[Miller 1978] R.J. Miller, "Electromigration Failure under Pulse Test Conditions", 16th Ann. Reliab. Phys. Symp., New York: IEEE, 1978, pp 241 - 247.

[Nikawa 1981] K. Nikawa, "Monte Carlo Calculations Based on the Generalized Electromigration Failure Model", 19th Int. Reliab. Phys. Symp., New York: IEEE, 1981, pp 175 - 181.

[O'Connor 1981] P.D.T. O'Connor, Practical Reliability Engineering, Heyden, 1981, p3.

[Ogura 1981] S. Ogura, P.J. Tsang, W.W. Walker, D.L. Critchlow, J.F. Shepard, "Elimination of Hot Electron Gate Current by the Lightly Doped Drain-source Structure", IEDM Tech. Digest, 1981, pp 651 - 654.

[Partridge 1985] J. Partridge, G. Littlefield, "Aluminum Electromigration Parameters", 23rd Int. Reliab. Phys. Symp., New York: IEEE, 1985, p 119.

[Prokop 1972] G.S. Prokop, R.R. Joseph, "Electromigration Failure at Aluminum-Silicon Contacts", J. Appl. Phys., vol. 43, no. 6, June 1972, pp 2595 - 2602.

[Razdan 1986] R. Razdan, A.J. Strojwas, "A Statistical Design Rule Developer", IEEE Trans. Computer-aided Design, vol. CAD-5, no. 4, Oct. 1986, pp 508 - 520.

[Rosenberg 1985] J.B. Rosenberg, "Geographical Data Structures Compared: A Study of Data Structures Supporting Region Queries", IEEE Trans. Computer- aided Design, vol. CAD-4, no. 1, Jan. 1985, pp 53 - 67.

[Sabnis 1986] A.G. Sabnis, "Hot Carrier Damage Mechanisms", Tutorial notes: 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 1.1 - 1.21.

[Sai-Halasz 1982] G. Sai-Halasz, M.R. Wordeman, R.H. Dennard, "Alpha-particle-induced Soft Error Rate in VLSI Circuits", IEEE Trans. Electron Dev., vol. 29, Apr 1982, pp 725 - 731.

[Schoen 1980] J.M Schoen, "A Model of Electromigration Failure under Pulsed Condition", J. Appl. Phys., vol. 51, no. 1 Jan. 1980, pp 508 - 512.

[Scoggan 1975] G.A. Scoggan, B.N. Agarwala, P.P. Peressini, A. Brouillard, "Width Dependence of Electromigration Life in Al-Cu, Al-Cu-Si and Ag Conductors", Proc. 13th Int. Reliab. Phys. Symp., IEEE, 1975, pp 151 - 158.

[Stapper 1983] C.H. Stapper, "Modeling of Integrated Circuit Defect Densities", IBM J. Res. Develop., vol. 27, Nov. 1983, pp 549 - 557.

[Sze 1981] S.M. Sze, Physics of Semiconductor Devices, 2nd Edition, Wiley-Interscience, 1981, p 378.

[Takeda 1982] E. Takeda, et al, "Submicrometer MOSFET Structure for Minimizing Hot-carrier Generation", IEEE J. Solid State Circuits, vol. 17, Apr. 1982, pp 241 - 247.

[Tasch 1978] A.F. Tasch, Jr., P.K. Chatterjee, H-S. Fu, T.C. Holloway, "The HI-C RAM Cell Concept", IEEE Trans. Electron. Dev., vol. ED-25, no. 1, Jan. 1978, pp 33 - 41.

[Terman 1985] C. Terman, RNL 4.2 User's Guide, Seattle, WA: UW/NW VLSI Consortium, Sieg Hall, FR-35, University of Washington, 1985, pp 2 - 9.

[Towner 1983] J.M. Towner, E.P. van de Ven, "Aluminum Electromigration under Pulsed DC Conditions", 21st Int. Reliab. Phys. Symp., New York: IEEE, 1983, pp 36 - 39.

[Ullman 1984] J.D. Ullman, Computational Aspects of VLSI, Rockville, MD: Computer Science Press, 1984, pp 382 - 393.

[Vaidya 1980] S. Vaidya, T.T. Sheng, A.K. Sinha, "Linewidth Dependence of Electromigration in Evaporated Al-0.5%Cu", App. Phys. Lett., vol. 36, 1980, pp 464 - 466.

[Vladimirescu 1981] A. Vladimirescu, K. Zhang, A.R. Newton, D.O. Pederson, A. Sangiovanni-Vincentelli, SPICE User's Guide, Berkeley, CA: Dept. of Electrical Engineering and Computer Sciences, University of California, 1981.

[Woods 1984] M.H. Woods, "The Implications of Scaling on VLSI Reliability", Tutorial notes: 22nd Int. Reliab. Phys. Symp., New York: IEEE, 1984, pp 6-1 to 6-30.

## Appendix A

## DERIVATION OF THE FAILURE RATE OF A SERIES-CONNECTED SYSTEM

$$h_s(t) = \frac{p_s(t)}{1 - P_s(t)} \tag{A.1}$$

$$= \frac{\frac{d}{dt}P_s}{1 - P_s(t)} \tag{A.2}$$

$$= \frac{\frac{d}{dt} \left(1 - \prod_{i=1}^{n} \left[1 - P_i(t)\right]\right)}{\prod_{j=1}^{n} \left[1 - P_j(t)\right]} \tag{A.3}$$

$$= \frac{\sum_{i=1}^{n} \left( \frac{d}{dt} P_i(t) \prod_{k=1}^{i-1} \left[ 1 - P_k(t) \right] \prod_{l=i+1}^{n} \left[ 1 - P_l(t) \right] \right)}{\prod_{i=1}^{n} \left[ 1 - P_i(t) \right]} \tag{A.4}$$

$$= \sum_{i=1}^{n} \frac{\frac{d}{dt} P_i(t)}{[1 - P_i(t)]} \tag{A.5}$$

$$= \sum_{i=1}^{n} h_i(t) \tag{A.6}$$
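The result (A.6) can be checked numerically. The sketch below is illustrative only: it assumes lognormally distributed element lifetimes, and all function names and parameter values are hypothetical. It differentiates the series Cdf numerically, as in (A.2) and (A.3), and compares the resulting hazard with the sum of the element hazards.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used to build the lognormal


def lognormal_cdf(t, t50, sigma):
    return _STD_NORMAL.cdf((math.log(t) - math.log(t50)) / sigma)


def lognormal_pdf(t, t50, sigma):
    return _STD_NORMAL.pdf((math.log(t) - math.log(t50)) / sigma) / (sigma * t)


def hazard(t, t50, sigma):
    """Failure rate of one element, h_i = p_i / (1 - P_i)."""
    return lognormal_pdf(t, t50, sigma) / (1.0 - lognormal_cdf(t, t50, sigma))


def series_cdf(t, elements):
    """P_s = 1 - prod(1 - P_i) for a list of (t50, sigma) pairs."""
    survival = 1.0
    for t50, sigma in elements:
        survival *= 1.0 - lognormal_cdf(t, t50, sigma)
    return 1.0 - survival


def series_hazard_numeric(t, elements, rel_step=1e-5):
    """Left-hand side of the derivation: (dP_s/dt) / (1 - P_s), by central difference."""
    dt = t * rel_step
    dP = (series_cdf(t + dt, elements) - series_cdf(t - dt, elements)) / (2.0 * dt)
    return dP / (1.0 - series_cdf(t, elements))
```

For any set of elements, `series_hazard_numeric(t, elements)` should agree with `sum(hazard(t, t50, s) for t50, s in elements)` to within the accuracy of the finite difference.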

#### Appendix B

# FITTING A LOGNORMAL DISTRIBUTION TO THE MINIMUM ORDER STATISTIC

It will be shown that the minimum order statistic predicts approximately lognormal behavior for early failures. Consider a conductor consisting of n identical segments, each having parameters  $t_{50}$  and  $\sigma$ . From equations (2.6), (2.8) and (2.10), the failure rate of the conductor based on a series model is

$$h_s(t) \approx \frac{n}{\sqrt{2\pi}\sigma t} \exp\left(-\frac{1}{2} \left[\frac{\ln(t) - \ln(t_{50})}{\sigma}\right]^2\right) \tag{B.1}$$

Writing

$$x = \ln(t) \tag{B.2}$$

$$\mu = \ln(t_{50}) \tag{B.3}$$

$$G_s(x) = \ln(h_s(\exp(x))) \tag{B.4}$$

yields a second order polynomial in x:

$$G_s(x) = \ln(n) - \ln(\sqrt{2\pi}\sigma) - x - 0.5 \left[\frac{\mu - x}{\sigma}\right]^2 \tag{B.5}$$

The derivative of  $G_s(x)$  is

$$\frac{d}{dx}G_s = \frac{\mu - x}{\sigma^2} - 1\tag{B.6}$$

Alternatively, the conductor may be modeled as a single element described by a lognormal distribution with parameters  $t'_{50}$  and  $\sigma'$ :

$$h'(t) \approx \frac{1}{\sqrt{2\pi}\sigma't} \exp\left(-\frac{1}{2} \left[\frac{\ln(t) - \ln(t'_{50})}{\sigma'}\right]^2\right) \tag{B.7}$$

Writing

$$x = \ln(t) \tag{B.8}$$

$$\mu' = \ln(t'_{50}) \tag{B.9}$$

$$G'(x) = \ln(h'(\exp(x))) \tag{B.10}$$

we obtain:

$$G'(x) = -\ln(\sqrt{2\pi}\sigma') - x - 0.5 \left[\frac{\mu' - x}{\sigma'}\right]^2 \tag{B.11}$$

The derivative of G'(x) is

$$\frac{d}{dx}G'(x) = \frac{\mu' - x}{\sigma'^2} - 1 \tag{B.12}$$

G'(x) may be fitted to  $G_s(x)$  in the vicinity of the point a by setting

$$G'(x) = G_s(x) \tag{B.13}$$

and

$$\frac{d}{dx}G'(x) = \frac{d}{dx}G_s(x) \tag{B.14}$$

with $x = a$. $\sigma'$ is determined by solving the resulting transcendental equation

$$2\ln\left(\frac{n\sigma'}{\sigma}\right) = \left[\frac{\mu - a}{\sigma}\right]^2 \left[1 - \left(\frac{\sigma'}{\sigma}\right)^2\right] \tag{B.15}$$

With  $\sigma'$  known,  $\mu'$  may be determined as

$$\mu' = a + \left(\frac{\sigma'}{\sigma}\right)^2(\mu - a) \tag{B.16}$$

The median time to failure is

$$t_{50}' = \exp(\mu') \tag{B.17}$$

The results of the curve fit appear in Fig. 2.5. The normalized variation of  $t'_{50}$  and  $\sigma'$  with n appear in Figs. 2.6 and 2.7 respectively.
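The fitting procedure can be sketched numerically as follows. This is an illustrative sketch under stated assumptions, not the code used to produce the figures: the function names are hypothetical and bisection is simply one convenient root finder. Writing $r = \sigma'/\sigma$, the left side of (B.15) minus the right side increases monotonically with $r$, is non-positive at $r = 1/n$ and non-negative at $r = 1$, so a single root lies in that interval. $\mu'$ then follows from equating the slopes (B.6) and (B.12) at $x = a$.

```python
import math


def log_hazard(x, mu, sigma, n=1):
    """G(x) of (B.5); with n = 1 it is the single-element form (B.11)."""
    return (math.log(n) - math.log(math.sqrt(2.0 * math.pi) * sigma)
            - x - 0.5 * ((mu - x) / sigma) ** 2)


def fit_lognormal(n, mu, sigma, a):
    """Return (mu', sigma') of the single lognormal fitted at x = a."""
    z2 = ((mu - a) / sigma) ** 2

    def residual(r):
        # Left side minus right side of (B.15), with r = sigma'/sigma.
        return 2.0 * math.log(n * r) - z2 * (1.0 - r * r)

    lo, hi = 1.0 / n, 1.0          # residual(lo) <= 0 <= residual(hi)
    for _ in range(200):           # plain bisection
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    r = 0.5 * (lo + hi)
    sigma_fit = r * sigma
    mu_fit = a + r * r * (mu - a)  # slope matching at x = a
    return mu_fit, sigma_fit
```

With the fitted parameters, `log_hazard(a, mu_fit, sigma_fit)` should equal `log_hazard(a, mu, sigma, n)`, which is condition (B.13).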

## Appendix C

# VARIATION OF $t_{50}$ AND $\sigma$ WITH n

The Cdf of a series connected system of n identical elements is

$$P_s(t) = 1 - (1 - P(t))^n \tag{C.1}$$

Let  $t_{50s}$  be the Mtf and  $\sigma_s$  the Standard Deviation of the system lifetime. If  $t = t_{50s}$ , then  $P_s(t) = 0.5$ . Equation (C.1) may be solved for the corresponding value of P(t) as follows:

$$P(t_{50s}) = 1 - (1 - P_s(t_{50s}))^{\frac{1}{n}} \tag{C.2}$$

$$= 1 - (0.5)^{\frac{1}{n}} \tag{C.3}$$

Using equation (C.3), we may determine  $t_{50s}$  for a known distribution P(t). The normalized Mtf is shown as a function of n in Fig. C.1.
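For a lognormal element distribution, (C.3) can be inverted directly with the standard normal quantile function. The sketch below is an illustration only; the function name `normalized_mtf` is hypothetical. Since $P(t) = \Phi\left(\ln(t/t_{50})/\sigma\right)$, the normalized Mtf is $t_{50s}/t_{50} = \exp\left(\sigma\,\Phi^{-1}(1 - 0.5^{1/n})\right)$.

```python
import math
from statistics import NormalDist


def normalized_mtf(n, sigma):
    """Return t50s / t50 for a series system of n identical lognormal elements."""
    p = 1.0 - 0.5 ** (1.0 / n)        # element Cdf value at t = t50s, eq. (C.3)
    z = NormalDist().inv_cdf(p)       # standard normal quantile of p
    return math.exp(sigma * z)
```

For n = 1 the ratio is 1, and it decreases as n grows, since a longer chain of elements fails earlier.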

Determining  $\sigma_s$  of the minimum order statistic is complex for the lognormal distribution. The following approximate method was used. The prototype normal distribution P(x), where  $x = \ln(t)$  was approximated by a 3-parameter Weibull distribution, with c = 3.288 [Johnson 1970]:

$$P(x) = 1 - \exp\left(-\left[\frac{x-\beta}{\alpha}\right]^{c}\right) \tag{C.4}$$

Substituting (C.4) in (C.1) yields the Cdf of the system, which is also a Weibull distribution:

$$P_s(x) = 1 - \exp\left(-\left[\frac{x-\beta}{\alpha'}\right]^c\right) \tag{C.5}$$

where

$$\alpha' = \alpha n^{-\frac{1}{c}} \tag{C.6}$$



Figure C.1:  $t_{50s}/t_{50}$  vs. n

The standard deviation of  $P_s(x)$  is

$$\sigma_s = \alpha n^{-\frac{1}{c}} \left[ \Gamma(2c^{-1} + 1) - (\Gamma(c^{-1} + 1))^2 \right]^{\frac{1}{2}} \tag{C.7}$$

and therefore

$$\frac{\sigma_s}{\sigma} = n^{-\frac{1}{c}} = n^{-0.304} \tag{C.8}$$

Fig. C.2 shows the variation in  $\sigma_s/\sigma$  as a function of n.



Figure C.2:  $\sigma_s/\sigma$  vs. n

## Appendix D

## RELIABILITY ANALYSIS OF A 3100 GATE CMOS STANDARD CELL DEVICE

# D.1 Approximate Models of Contact, Via and Step Segments

It is assumed that the failure rates of these segment types have the same dependency on time, current density and temperature as a Run segment. In all three cases, a conductor crosses a beveled discontinuity. It is assumed that the Aluminum has been deposited to a uniform height d in the vertical direction. If the height of the discontinuity is h and the bevel angle  $\Theta$ , then the metal thickness on the sidewall is

$$d_o = d\cos(\Theta) \tag{D.1}$$

The length of the sidewall is

$$L_o = \frac{h}{\sin(\Theta)} \tag{D.2}$$

Only the sidewall is considered to contribute to the failure rate. Steps, Vias and Contacts are therefore represented as Runs having length =  $L_o$  and thickness =  $d_o$ .



Figure D.1: Layout of  $V_{DD}$  and Ground buses

# D.2 Derivation of The Failure Rate of a Power Bus Section

The  $V_{DD}$  and Ground bus structures are shown in Fig. D.1. Gates are assumed to be uniformly distributed along each branch of the bus, a distance  $L_g$  apart. Each gate draws a current I from the  $V_{DD}$  bus and feeds it into the Ground bus. Numbering the bus segments from 1 (furthest from the bonding pad) to n, the current in the j-th segment is jI.

It is assumed that  $J \leq 5.0 \times 10^5 A/cm^2$ , therefore  $J_{eff} \approx J$ . From equation (2.17), the Mtf of the j-th segment is

$$t_{50j} = \frac{\gamma}{jI} \tag{D.3}$$

where

$$\gamma = GWd \exp(\frac{E_a}{kT}) \tag{D.4}$$

The failure rate of a bus branch is

$$h_B(t) = \sum_{j=1}^n \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(-0.5 \left[\frac{\ln(t) - \ln(\frac{\gamma}{jI})}{\sigma}\right]^2\right) \tag{D.5}$$

It is possible to derive a closed form expression for  $h_B(t)$ . Let  $x = \ln(t)$ . The failure rate of the j-th segment may be written as follows:

$$h_{j}(t) = \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(\sigma^{-2}\left[-0.5x^{2} + x\ln(\gamma) - x\ln(jI) - 0.5(\ln(\gamma) - \ln(jI))^{2}\right]\right) \tag{D.6}$$

$$= \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(\sigma^{-2}\left[-0.5x^2 + x\ln(\gamma) - 0.5\ln^2(\gamma) + \ln(jI)\left(\ln(\gamma) - x - 0.5\ln(jI)\right)\right]\right) \tag{D.7}$$

$$= \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(-0.5\left[\frac{\ln(t) - \ln(\gamma)}{\sigma}\right]^{2}\right) \times \exp\left(\frac{\ln(\gamma) - \ln(t) - 0.5\ln(jI)}{\sigma^{2}}\ln(jI)\right) \tag{D.8}$$

$$= \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(-0.5\left[\frac{\ln(t) - \ln(\gamma)}{\sigma}\right]^2\right) \times \exp(\Phi(jI)\ln(jI)) \tag{D.9}$$

where

$$\Phi(jI) = \frac{\ln\left(\frac{\gamma}{t\sqrt{jI}}\right)}{\sigma^2} \tag{D.10}$$

Therefore

$$h_B(t) = \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(-0.5 \left[\frac{\ln(t) - \ln(\gamma)}{\sigma}\right]^2\right) \sum_{j=1}^n (jI)^{\Phi(jI)} \tag{D.11}$$

$$= C(t) \sum_{j=1}^n (jI)^{\Phi(jI)} \tag{D.12}$$

The summation may be approximated by an integral for large n, with j a continuous variable:

$$h_B(t) \approx C(t) \int_0^n (jI)^{\Phi(jI)} dj \tag{D.13}$$

$$= \frac{C(t)\sigma^2 n}{\ln(\frac{\gamma}{nIt})} \exp\left(\frac{\ln(nI)\ln(\frac{\gamma}{\sqrt{nIt}})}{\sigma^2}\right) \tag{D.14}$$
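The factorization leading from (D.5) to (D.12) can be verified numerically. The following sketch is illustrative only: the function names are hypothetical, and any values supplied for $\gamma$, $I$, $\sigma$ and $n$ are arbitrary test inputs rather than measured data. It evaluates the bus failure rate once as the direct sum (D.5) and once in the factored form $C(t)\sum_j (jI)^{\Phi(jI)}$.

```python
import math


def h_bus_direct(t, gamma, current, sigma, n):
    """Sum of segment failure rates, eq. (D.5), with t50j = gamma / (j * I)."""
    total = 0.0
    for j in range(1, n + 1):
        z = (math.log(t) - math.log(gamma / (j * current))) / sigma
        total += math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)
    return total


def h_bus_factored(t, gamma, current, sigma, n):
    """Same quantity in the factored form of eqs. (D.11) and (D.12)."""
    z0 = (math.log(t) - math.log(gamma)) / sigma
    c_t = math.exp(-0.5 * z0 * z0) / (math.sqrt(2.0 * math.pi) * sigma * t)
    total = 0.0
    for j in range(1, n + 1):
        ji = j * current
        phi = math.log(gamma / (t * math.sqrt(ji))) / sigma ** 2  # eq. (D.10)
        total += ji ** phi
    return c_t * total
```

The two functions should return the same value for any positive inputs, up to floating-point rounding.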

## Appendix E

## A TEST CHIP FOR CALIBRATION OF RELIABILITY MODELS

A test chip for calibrating the reliability models used in this thesis was designed. This device contains the following test structures:

- 1. METAL1 OVER POLY BARS:  $32\mu m \times 250$
- 2. METAL1 OVER POLY BARS:  $16\mu m \times 250$
- 3. METAL1 OVER POLY BARS:  $8\mu m \times 250$
- 4. METAL1 OVER POLY BARS:  $4\mu m \times 250$
- 5. METAL1 OVER POLY BARS:  $2\mu m \times 250$
- 6. 52 VIAS:  $8\mu m \times 8\mu m$
- 7. 44 VIAS:  $4\mu m \times 16\mu m$
- 8. 66 VIAS:  $4\mu m \times 8\mu m$
- 9. 82 VIAS:  $4\mu m \times 4\mu m$
- 10. 76 VIAS:  $2\mu m \times 8\mu m$
- 11. 100 VIAS:  $2\mu m \times 4\mu m$
- 12. 110 VIAS:  $2\mu m \times 2\mu m$
- 13. 152 CONTACTS:  $2\mu m \times 2\mu m$

- 14. 64 CONTACTS:  $8\mu m \times 8\mu m$
- 15. 64 CONTACTS:  $4\mu m \times 16\mu m$
- 16. 88 CONTACTS:  $4\mu m \times 8\mu m$
- 17. 106 CONTACTS:  $4\mu m \times 4\mu m$
- 18. 106 CONTACTS:  $2\mu m \times 8\mu m$
- 19. 134 CONTACTS:  $2\mu m \times 4\mu m$
- 20. RUNS (M2):  $4\mu m \times 1000 \mu m$
- 21. RUNS (M2):  $8\mu m \times 1000 \mu m$
- 22. RUNS (M2):  $16\mu m \times 1000\mu m$
- 23. RUNS (M2):  $32\mu m \times 1000\mu m$
- 24. RUNS (M1):  $2\mu m \times 1000 \mu m$
- 25. RUNS (M1):  $4\mu m \times 1000 \mu m$
- 26. RUNS (M1):  $8\mu m \times 1000 \mu m$
- 27. RUNS (M1):  $16\mu m \times 1000 \mu m$
- 28. RUNS (M1):  $32\mu m \times 1000\mu m$

The layout of the test chip appears in Fig. E.1.





#### Appendix F

## PUBLICATIONS BY THE AUTHOR WHICH RELATE TO THIS THESIS

- D.F. Frost, K.F. Poole, "A Method for Predicting VLSI-Device Reliability using Series Models for Failure Mechanisms", IEEE Trans. Reliability, vol. R-36, June 1987, pp 234 - 242.
- D.F. Frost, K.F. Poole, D.A. Haeussler, "Reliant: a Reliability Analysis Tool for VLSI Interconnects", Custom Integrated Circuits Conf., New York: IEEE, 1988, pp 27.8.1 - 27.8.4.
  - The authors were invited to submit this paper, in an extended form, for publication in IEEE Journal of Solid State Circuits. This paper has recently been accepted for publication.
- D.F. Frost, K.F. Poole, "Estimation of VLSI Interconnect Reliability using a Circuit Simulator", Southeastern Symposium on Systems Theory (SSST-87), New York: IEEE, 1987, 5 pages.
- K.G. Kemp, K.F. Poole, D.F. Frost, "Failure Rate Prediction for Defect Enhanced Electromigration Wearout of Metal Interconnects", submitted to IEEE Trans. Reliab., Jan. 1988.

Copies of these papers appear on the following pages.

#### A Method for Predicting VLSI-Device Reliability Using Series Models for Failure Mechanisms

David F. Frost, Member IEEE, Clemson University, Clemson

Kelvin F. Poole, Member IEEE, Clemson University, Clemson

Key Words—Order statistic, VLSI device, Failure mechanism.

Reader Aids-

Purpose: Advance the state-of-the-art
Special math needed for explanations: Statistics
Special math needed to use results: None

Results useful to: IC design engineers, CAD tool developers.

Abstract—A series model is used to determine the intrinsic reliability of an integrated circuit. An analysis of electromigration in the interconnect system of a 200 000 transistor VLSI device, shows that the failure rate exceeds 10 FIT (failures per 10<sup>9</sup> hours) within 2 years when operating at a temperature of 80° C. These results indicate the importance of fundamental wear-out mechanisms as factors in VLSI device reliability, under usual operating conditions. The analysis, as applied to a generic chip, predicts that temperature, burn-in, and complexity all adversely affect the device reliability.

The paper demonstrates the feasibility of using the information available in the design database together with specific failure models to predict (during the design phase) the reliability of an IC. These techniques can be used to develop a CAD tool for reliability prediction.

#### 1. INTRODUCTION

The reliability of an integrated circuit (IC) is the probability that it will perform its required function under stated conditions for a stated period of time [1]. Methods of enhancing the reliability of ICs generally fall into one of three categories:

- Improving the reliability of the part by better design of its internal components and/or better manufacturing methods;
- Using more effective screening procedures;
- Using active or standby redundancy within the IC, enabling it to perform its function despite the failure of some internal components.

The reliability of an IC is traditionally pictured in terms of the bath-tub curve of failure rate versus time (figure 1). Burn-in is used to remove devices that contain gross built-in flaws which normally fail during the infant-mortality phase. Physical mechanisms which cause device failure are often modeled by an Arrhenius relationship:

$$\lambda(T) = \lambda(T_R) \exp\left[\frac{E_a}{k}\left(\frac{1}{T_R} - \frac{1}{T}\right)\right] \tag{1.1}$$





Fig. 1. Qualitative description of failure rate versus time.

Notation

 $\lambda(T)$  failure rate at temperature T

T temperature

 $T_R$  reference temperature

 $E_a$  activation energy for the particular failure mechanism

k Boltzmann's constant

Operation at an elevated temperature increases the failure rate, thus accelerating the passage of the infant-mortality phase.

The reliability of devices after screening is currently predicted using semi-empirical failure rate models based on the measured lifetimes of a large number of devices. By far the most widely-used of these models is Mil-Hdbk-217D [2]:

$$\lambda = \Pi_Q \left[ C_1 \Pi_T \Pi_V + (C_2 + C_3) \Pi_E \right] \Pi_L \quad \text{(failures per } 10^6 \text{ hrs.)} \tag{1.2}$$

Notation

 $\Pi_Q$  quality factor dependent on burn-in procedure applied

 $\Pi_T$  temperature acceleration factor

 $\Pi_V$  voltage derating stress factor

 $\Pi_E$  application environment factor

 $\Pi_L$  learning factor

 $C_1, C_2, C_3$  complexity failure-rates dependent on the number of equivalent gates, number of pins, and package type.

Values for these constants are tabulated [2] for various devices and technologies. Data for this model are constantly under revision to accommodate new technologies, and proposals for improvements to the model appear from time to time [3 - 8].

Correct application of Mil-Hdbk-217 requires an understanding of the underlying assumptions and inherent limitations of this type of model [9], which was developed to answer the system design engineer's need to predict the reliability of a system containing a large number of components, including ICs. The individual IC is treated strictly as a single component and the model does not relate the failure rate to its specific internal structure (eg, mask layout) or process parameters. The mask layout is reflected indirectly in the complexity factor  $C_1$ , and all constants in the equation have been determined as average values for a specific family of fabrication processes (eg, CMOS). The IC designer requires a model which can predict the reliability-implications of structural design decisions such as scaling the dimensions of a transistor or using narrower conductors. The Mil-Hdbk-217 model is obviously of no value in making such predictions. Also, its ability to predict the reliability of current and future VLSI devices is open to suspicion. For example, consider commercial grade MOS VLSI devices manufactured in a mature technology, operating at room temperature in a ground, benign environment. Application of (1.2) yields the results shown in table I.

Table I
Failure Rate vs. Gate Count for MOS Devices According to Mil-Hdbk-217D.

| G (gate count)  | λ (FIT) |
|-----------------|---------|
| 10<sup>3</sup>  | 1268    |
| 10<sup>4</sup>  | 3134    |
| 10<sup>5</sup>  | 8501    |
| 10<sup>6</sup>  | 23960   |

1 FIT = 1 failure/10<sup>9</sup> hours

These results are unrealistic, as they imply that a typical 16-bit microprocessor ( $10^4$  gates) in a personal computer application would have a failure rate of 3134 FIT, which is unacceptably high. The authors of the model are aware of this and a new VLSI reliability model for G > 3000 is currently under development [10].

This paper shows that by taking wear-out failure into account, it is possible to analyze a VLSI design and to provide an accurate assessment (during the design phase) of the wear-out limited reliability of the IC. A series model for calculating the reliability of an IC is presented. Examples of the application of this method to electromigration in complex VLSI interconnects are given. Our results show that electromigration wear-out produces an unacceptably high failure rate for VLSI as we approach the 10<sup>6</sup>-transistor chip. The trends to increased complexity and concomitant reduction of feature size accelerate wear-out and could ultimately limit the useful life of the component.

#### 2. A SERIES MODEL OF IC RELIABILITY

#### Assumptions

- 1. The IC mask produces many basic elements that are not identically distributed.
- 2. The distributions of life for each element are known.
- 3. The failure of any element causes the IC to fail (series system).
- 4. The states of the elements (good, failed) are mutually statistically independent.

#### Notation

 $P_i(t)$  Cdf of element i

 $\lambda_i(t)$  failure rate of element i

 $\lambda_s(t)$  failure rate of the IC ("system")

n number of elements in the IC

 $F_1(t)$  Cdf corresponding to  $\lambda_s(t)$ 

The device failure-time Cdf is the well-known series formula:

$$F_1(t) = 1 - \prod_{i=1}^{n} (1 - P_i(t)). \tag{2.1}$$

The device failure rate can be determined using the fact that the failure rate of a series system equals the sum of the failure rates of its elements [21]:

$$\lambda_s(t) = \sum_{i=1}^n \lambda_i(t). \tag{2.2}$$

For $\lambda_1(t) = \lambda_2(t) = \dots = \lambda_n(t) = \lambda(t)$, eq. (2.2) reduces to:

$$\lambda_s(t) = n \, \lambda(t) \tag{2.3}$$

The series model, which is also a minimum-order-statistic model, can be applied at the chip level, but it also describes the relationship of an individual failure mechanism to the dimensions of the structure in which it occurs.

The physical processes which cause wear-out have been widely studied [11-20]. These include:

- oxide shorts [12-13];
- metallization failure due to electromigration or corrosion [14-16];
- threshold-voltage shifting effects in MOS devices [17-19];
- alpha-particle induced soft errors [20].

In general, failure mechanisms are reactions which cause the directed movement of a physical quantity such as material or charge. For electromigration, a void is formed in a conductor and catastrophic failure occurs when the cross-sectional area of the defect equals the cross-sectional area of the conductor. Consider the conductor as consisting of n "identical" elements connected in series, each potentially containing a defect. The conductor fails when any one element fails, and so the reliability of the conductor is that of the minimum order statistic of the n elements. A similar situation arises for oxide breakdown, where a defect develops through the oxide layer, leading to a catastrophic short circuit when the length of the defect is equal to the oxide thickness. A large dielectric area can be considered as a parallel connection of n small elements, with each element potentially enclosing a defect. The reliability of the n elements is once again given by the reliability of the minimum order statistic.

A defect occurs in an IC when a failure mechanism has proceeded for a sufficient time (the time to failure) to degrade the circuit performance beyond acceptable limits. Defects fall into two categories:

- Structural defects. These represent abrupt changes in circuit topology caused for example by a conductor becoming an open-circuit.
- Performance defects. Some failure mechanisms produce a continuous degradation of circuit performance until some threshold of acceptability is exceeded. This type of defect can be produced by hot electron injection, where the threshold voltage of an MOS transistor shifts with time until circuit operation becomes marginal.

From the perspective of modelling the IC time to failure, the two defect types may be treated in the same way.

#### 3. ELECTROMIGRATION

Electromigration in metal conductors has been widely studied during the last 15 years [14-16, 22-31]. When an electron current passes through a conductor, some of the momentum of the electrons is transferred to the metal atoms, resulting in a movement of metal in the direction of electron flow. When a flux divergence occurs, the rates of mass transport towards and away from a point differ and void or hillock formation results. Flux divergences may be caused by microscopic inhomogeneities in the conductor such as grain boundary triple points or grain size variations. The bulk of the published literature deals with straight conductors and it is assumed that the incidence of electromigration is related to grain-size effects. Flux divergence may also arise in more complex conductor patterns because of variations in the effective conductor width or thickness; for example, current crowding occurs on the inside of a 90° bend. The analysis at high current densities is further complicated by local thermal gradients which influence the rate of mass transport.

#### 3.1 Classification of Conductor Shapes

In applying the model to electromigration, each conductor is fractured into its component elements and the conductor failure rate is calculated as the sum of the failure rates of each individual element. Four basic element types are identified; see figure 2:

- straight segments of length L
- 90° bends
- steps caused by the non-planar surface beneath the conductor
- contact windows or vias.



Fig. 2. Basic conductor shapes found in integrated circuit interconnects: a) straight section, b) 90° bend, c) step over a surface discontinuity, d) contact window.

This analysis assumes that the states of all elements are statistically independent. This places two restrictions on the validity of the model.

- 1. It is limited to low current densities ( $< 10^6 A/cm^2$ ) where thermal effects are negligible.
- 2. The minimum element length must be greater than the length of the locality which influences the growth of a single defect. Assuming that voids grow along grain boundaries [22], the mean size of a defect is of the order of the mean grain size, typically less than 3 $\mu m$. LaCombe & Parks [23] found that a hillock always forms within 10 $\mu m$ of a void, indicating that interactions may occur over this distance. However, their measurements were conducted at a current density of $2 \times 10^6 \, A/cm^2$ and under those conditions thermal interaction would have played a role.

Early failures are associated with highly localized defects while late failures may involve mass transport over larger distances. In the remainder of this section, a basic element length of  $10 \, \mu \mathrm{m}$  is used and the results are valid for the first 10% of all failures.

#### 3.2 Analysis of Straight Segments

Consider a straight conductor element having length  $L_E = 10 \,\mu m$  and variable width W. Measurements of electromigration time to failure show a lognormal pdf of the form:

$$P_E(t) = \frac{1}{\sigma\sqrt{2\pi} t} \exp\left[-\frac{1}{2}\left(\frac{\ln t - \ln t_{50}}{\sigma}\right)^2\right], \quad (3.1)$$

$t_{50} \equiv$ median time to failure;

$\sigma \equiv$ standard deviation of $\ln t$, which is independent of time.

The median time to failure is a function of current density, temperature, and conductor width:

$$t_{50} = A(W) J^{-N} \exp\left[\frac{E_a}{kT}\right], \tag{3.2}$$

$J \equiv$ current density;

$E_a \equiv$ activation energy = 0.54 eV [24];

$N \equiv$ an exponent approximately equal to 1 for $J < 4 \times 10^5 \, A/cm^2$ [25];

$A(W) \equiv$ a material constant that is a function of width.

Based on the experimental data of Kinsbron [26] the following empirical expression for A(W) was derived for Al - 0.5 wt % Cu conductor, 250  $\mu m$  long and 5000 Å thick:

$$A(W) = 189(21.5W - 66 + 250W^{-1.7}) \tag{3.3}$$



Fig. 3. Variation of median time to failure (mtf) of n elements as a function of n (normalized to the mtf of a single element). Parameter  $\sigma$  is the standard deviation of a single element.
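The trend plotted in Fig. 3 can be reproduced numerically. The sketch below assumes n independent, identically distributed lognormal elements and solves eq. (2.1) for $F_1(t) = 0.5$ in closed form: the element Cdf must equal $1 - 0.5^{1/n}$ at the median of the minimum. The function name `median_ratio` is hypothetical.

```python
import math
from statistics import NormalDist


def median_ratio(n, sigma):
    """Median of the minimum of n lognormal lifetimes, normalized to the
    median (t50) of a single element with log-standard-deviation sigma."""
    # Setting 1 - (1 - P(t))**n = 0.5 gives the required element Cdf value.
    p_element = 1.0 - 0.5 ** (1.0 / n)
    # For a lognormal element, P(t) = Phi((ln t - ln t50) / sigma).
    z = NormalDist().inv_cdf(p_element)
    return math.exp(sigma * z)


# Hypothetical usage: ratio for a 250 um stripe treated as 25 elements of 10 um.
ratio_25 = median_ratio(25, 1.0)
```

Dividing a measured median lifetime of the long stripe by this ratio gives the implied median lifetime of one element; the ratio falls further below 1 as either n or $\sigma$ increases.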

Electromigration lifetime is generally measured on conductors several hundred microns long, as these fail more quickly than short conductors. The Kinsbron data were scaled to a $10 \ \mu m$ long conductor by solving (2.1) for $F_1(t) = 0.5$ and n = 250/10 = 25. The scaling ratio depends on $\sigma$ (see figure 3), which is in turn a function of W. J is written in terms of width and thickness, as follows:





Fig. 4. Partitioning of a simple conductor joining two contact windows into its component shapes.

$$J = I \times 10^8 / Wd$$

 $I \equiv \text{current } (A);$ 

 $d = \text{thickness } (\mu m).$ 

The median time to failure of the basic conductor element is therefore

$$t_{50}(W, d, I, T) = 1.523 \times 10^{-5} \frac{Wd}{I} \left(W - 3.07 + \frac{11.63}{W^{1.7}}\right) \exp\left(\frac{5800}{T}\right) \tag{3.4}$$
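As an illustration, the current-density conversion and eq. (3.4) can be evaluated directly. This is a sketch only: the function names are hypothetical, W and d are taken in $\mu m$, I in amperes and T in kelvin as in the text, and the time unit of the result is that of the underlying fit, which this section does not state.

```python
import math


def current_density(current_a, width_um, thickness_um):
    """J = I * 1e8 / (W d): A/cm^2 for I in A and W, d in um
    (1 um^2 = 1e-8 cm^2)."""
    return current_a * 1e8 / (width_um * thickness_um)


def t50_element(width_um, thickness_um, current_a, temp_k):
    """Eq. (3.4): median time to failure of a 10 um basic element."""
    width_term = width_um - 3.07 + 11.63 / width_um ** 1.7
    return (1.523e-5 * width_um * thickness_um / current_a
            * width_term * math.exp(5800.0 / temp_k))


# Hypothetical operating point: 3 um wide, 0.6 um thick, 1 mA, 100 C.
j = current_density(1e-3, 3.0, 0.6)
t50 = t50_element(3.0, 0.6, 1e-3, 373.0)
```

Because N is taken as 1, doubling the current halves the predicted median lifetime, and the exponential term makes the result strongly temperature dependent.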

Kinsbron also measured $\sigma$ as a function of width. Scaling of this parameter to shorter conductor lengths may also be achieved using the *series* model, but this is complex for the lognormal distribution. A much simpler scaling procedure is possible by noting that $\sigma$ is the standard deviation of an s-normal distribution P(x), where $x = \ln(t)$. This s-normal distribution may be closely approximated by the following 3-parameter Weibull distribution, with c=3.288 [27]:

$$P(x) = 1 - \exp[-\{(x - \xi_0)/\alpha\}^c]. \tag{3.5}$$

Substituting (3.5) in (2.1) yields the Cdf of the device failure time, which is also a Weibull distribution with the same shape parameter c and a reduced scale parameter $\alpha'$:

$$F_1(t) = 1 - \exp\left[-\left\{\frac{x - \xi_0}{\alpha'}\right\}^{c}\right], \tag{3.6}$$

$$\alpha' \equiv \frac{\alpha}{n^{1/c}}.$$

The standard deviation of  $F_1(t)$  is:

$$\sigma_n = \frac{\alpha}{n^{1/c}} \left[ \Gamma(2c^{-1} + 1) - \left[ \Gamma(c^{-1} + 1) \right]^2 \right]^{1/2}, \tag{3.7}$$

and therefore

$$\frac{\sigma_n}{\sigma} = n^{-1/c} = n^{-0.304}. \tag{3.8}$$

The Kinsbron data were scaled using (3.8) and the following function was fitted to the result:

$$\sigma(W) = \frac{2.192}{W^{2.625}} + 0.787 \tag{3.9}$$
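The scaling rule (3.8) and the fitted function (3.9) are simple enough to sketch directly. The function names are hypothetical; the constants are those quoted in the text, with W in $\mu m$.

```python
C_WEIBULL = 3.288  # Weibull shape parameter used to approximate the s-normal


def sigma_single_element(sigma_n, n, c=C_WEIBULL):
    """Invert eq. (3.8): recover the single-element sigma from the value
    sigma_n measured on a conductor made of n elements in series."""
    return sigma_n * n ** (1.0 / c)


def sigma_element(width_um):
    """Eq. (3.9): fitted sigma of a 10 um basic element as a function of width."""
    return 2.192 / width_um ** 2.625 + 0.787


# Hypothetical usage: a 250 um test stripe corresponds to n = 25 elements.
scale_25 = sigma_single_element(1.0, 25)
```

Since $n^{1/c} > 1$ for $n > 1$, a short element always has a larger $\sigma$ than the long stripe on which the measurement was made, and (3.9) shows $\sigma$ rising steeply as the width falls towards 1 $\mu m$.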

Let the failure rate of the basic element be  $\lambda_E(t, W, d, I, T)$ . If the analysis is limited to the first 10% of failures  $(P_E(t) \le 0.1)$ , the following approximation may be made, with an error of less than 10%:

$$\lambda_{E}(t, W, d, I, T) \approx P_{E}(t) = \frac{1}{\sigma(W)\sqrt{2\pi}\, t} \exp\left[-\frac{1}{2}\left(\frac{\ln t - \ln t_{50}(W, d, I, T)}{\sigma(W)}\right)^{2}\right]. \tag{3.10}$$

The failure rate of a conductor of length L is simply:

$$\lambda(t, W, d, I, T) = \frac{L}{L_E} \lambda_E(t, W, d, I, T) = n_E \lambda_E(t, W, d, I, T) \tag{3.11}$$

$n_E \equiv$ number of basic elements.
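A short sketch of (3.10) and (3.11) follows. It also evaluates the exact lognormal hazard rate so that the size of the approximation can be checked; the function names and the numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_pdf(t, t50, sigma):
    """Eq. (3.10): lognormal density, used as the early-life failure rate."""
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * t)


def lognormal_hazard(t, t50, sigma):
    """Exact failure rate: density divided by the surviving fraction."""
    cdf = NormalDist().cdf((math.log(t) - math.log(t50)) / sigma)
    return lognormal_pdf(t, t50, sigma) / (1.0 - cdf)


def conductor_failure_rate(t, length_um, t50, sigma, element_um=10.0):
    """Eq. (3.11): n_E = L / L_E basic elements in series."""
    return (length_um / element_um) * lognormal_pdf(t, t50, sigma)


# Hypothetical check at a time where about 7% of elements have failed.
t50, sigma = 1000.0, 1.0
t = t50 * math.exp(-1.5 * sigma)
ratio = lognormal_pdf(t, t50, sigma) / lognormal_hazard(t, t50, sigma)
```

The ratio of the approximation to the exact hazard equals the surviving fraction, so it stays above 0.9 for as long as no more than 10% of elements have failed.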

#### 3.3 Analysis of Bends, Steps and Windows

In expanding the analysis to include bends, oxide steps, and contact windows it is assumed that the failure rates of these elements have the same dependency on t, I, and T as does the basic element. Each feature x is modelled as having the same reliability as a straight conductor of length  $L_x$ , thickness  $d_x$ , and width  $W_x$ .

Refer to figure 4; the failure rate of a simple conductor which conveys a current *I* between two contact windows is:

$$\lambda_c = \frac{2L_W}{L_E} \lambda_E(t, W_W, d_W, I, T) + n_b \frac{L_b}{L_E} \lambda_E(t, W_b, d, I, T) + \frac{n_o L_o}{L_E} \lambda_E(t, W, d_o, I, T) + \frac{L}{L_E} \lambda_E(t, W, d, I, T), \tag{3.12}$$

where

$L_W$, $W_W$, $d_W$ are the equivalent length, width, and thickness of the contact windows, respectively;

 $n_b$ ,  $L_b$ ,  $W_b$  are the number of bends and equivalent length and width of each bend, respectively;

 $n_o$ ,  $L_o$ ,  $d_o$  are the number of oxide steps, and equivalent length and thickness of each step, respectively.
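Equation (3.12) can be assembled from the element model of section 3.2. The sketch below is illustrative only: the function and argument names are hypothetical, the example dimensions are invented, and all lengths are in $\mu m$.

```python
import math

L_E = 10.0  # basic element length in um (section 3.1)


def t50_element(w, d, i, temp_k):
    """Eq. (3.4)."""
    return (1.523e-5 * w * d / i * (w - 3.07 + 11.63 / w ** 1.7)
            * math.exp(5800.0 / temp_k))


def sigma_element(w):
    """Eq. (3.9)."""
    return 2.192 / w ** 2.625 + 0.787


def lambda_e(t, w, d, i, temp_k):
    """Eq. (3.10): failure rate of one basic element."""
    s = sigma_element(w)
    z = (math.log(t) - math.log(t50_element(w, d, i, temp_k))) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi) * t)


def conductor_rate(t, i, temp_k, length, w, d,
                   window, n_bends, bend, n_steps, step):
    """Eq. (3.12). window = (L_W, W_W, d_W); bend = (L_b, W_b);
    step = (L_o, d_o). Two contact windows terminate the conductor."""
    l_w, w_w, d_w = window
    l_b, w_b = bend
    l_o, d_o = step
    return (2.0 * l_w / L_E * lambda_e(t, w_w, d_w, i, temp_k)
            + n_bends * l_b / L_E * lambda_e(t, w_b, d, i, temp_k)
            + n_steps * l_o / L_E * lambda_e(t, w, d_o, i, temp_k)
            + length / L_E * lambda_e(t, w, d, i, temp_k))


# Hypothetical conductor: 200 um long, 3 um wide, 0.6 um thick, 1 mA, 100 C.
straight_only = conductor_rate(1e4, 1e-3, 373.0, 200.0, 3.0, 0.6,
                               (0.0, 3.0, 0.6), 0, (2.4, 3.4), 0, (0.6, 0.3))
full = conductor_rate(1e4, 1e-3, 373.0, 200.0, 3.0, 0.6,
                      (3.0, 3.0, 0.3), 2, (2.4, 3.4), 4, (0.6, 0.3))
```

Each feature contributes in proportion to its equivalent length, so a short step with a thinned cross-section can outweigh a much longer straight run.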

A number of researchers have investigated the effect of width and length on the reliability of straight conductors having widths of 1  $\mu m$  or more. However, relatively little reliability data is available for submicron conductors on the one hand and corners, steps and contact windows on the other. As linewidths decrease below 1  $\mu m$ , random variations in linewidth (caused by photolithographic defects or poorly controlled etching processes) will play an increasingly important role in conductor reliability. This issue has not been addressed in the literature and no correlation between reliability and defect density has yet been reported. We are working on a theoretical model for electromigration in a randomly defected conductor.

Returning to bends, steps, and contact windows, the factors which can affect the reliability of these elements include:

- accelerated failure at grain boundaries due to locally increased flux density
- flux divergence due to current crowding
- stress build-up in the passivation layer over a step
- contact electromigration at Al-Si interfaces (in the case of contact windows).

All these effects can be taken into account in models for these segments, which are then included in (3.12). Because of a lack of published experimental data, only the effect of average flux density on grain-boundary migration is considered. The average path length around a 90° bend was taken as 0.785 W, while the average width is 1.12 W. These values were used for $L_b$ and $W_b$ respectively.

A simple, first-order model was also used for the case of a conductor running over a beveled step-discontinuity. The height of the step is h and the bevel angle is $\theta$. Assuming that the metal was deposited with a uniform height d in the vertical direction, the thickness on the sidewall of the step is:

$$d_o = d\cos\theta. \tag{3.13}$$

The length of the stepped section is:

$$L_o = \frac{h}{\sin \theta} \,. \tag{3.14}$$

Therefore the step is considered as equivalent to a straight section of length  $L_o$  and thickness  $d_o$ . This model is used to describe the reliability of conductors running over oxide steps and contact window connections.
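The equivalent dimensions of bends and steps can be computed as follows. This is a minimal sketch of (3.13), (3.14) and the bend values quoted above; the function names and the example geometry are hypothetical.

```python
import math


def step_equivalent(d_um, h_um, theta_deg):
    """Return (L_o, d_o) for a beveled step of height h and bevel angle theta.

    Eq. (3.14): L_o = h / sin(theta); eq. (3.13): d_o = d * cos(theta).
    """
    theta = math.radians(theta_deg)
    return h_um / math.sin(theta), d_um * math.cos(theta)


def bend_equivalent(w_um):
    """Return (L_b, W_b) for a 90 degree bend in a conductor of width W."""
    return 0.785 * w_um, 1.12 * w_um


# Hypothetical step: 0.6 um metal over a 0.5 um step with a 60 degree bevel.
l_o, d_o = step_equivalent(0.6, 0.5, 60.0)
l_b, w_b = bend_equivalent(3.0)
```

The model predicts a vanishing sidewall thickness as $\theta$ approaches 90°, so it should only be applied to steps with a finite bevel.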

The IC interconnect reliability can be predicted based on the mask layout as described in a mask-description-language file and the predicted branch currents as determined by a circuit simulator. Such a program is under development as an adjunct to an existing suite of IC design tools.



Fig. 5. Mask layout of a section of a 3200 gate standard cell circuit, whose reliability is considered here.

#### 4. EXAMPLES

The following is an approximate analysis of an actual IC layout on which some conductor characteristics were measured. The circuit was a 3100-gate CMOS standard-cell design, with double-layer metallization and a minimum feature size of 2 $\mu m$ (figure 5). The I/O interfaces were not considered in the calculation. Parameters of the two interconnect layers are listed in table II. Conductor width was 5 $\mu m$ on the (upper) Metal-2 layer and 3 $\mu m$ and 4 $\mu m$ on the (lower) Metal-1 layer, except for the power supply and ground buses. The latter are in the form of interdigitated tree structures with differing widths in the trunk, branch, and twig sections (figure 6a). As a rough approximation, it was assumed that all logic interconnects carry the same current I. This does not apply to power and ground buses, where current levels would increase steadily across the length of a branch or twig. The failure rate of a power bus section is derived in the appendix.



Fig. 6. a) Power supply and ground buses of the standard cell device, showing the interdigitated tree-like structure.

b) Simple model of a bus segment.

The circuit was analyzed by applying (3.11) to each conductor segment and equation (A-7) to each branch of the power supply and ground buses. The overall failure rate is shown in figure 7, as a function of time. If we take a failure rate of 10 FIT as an acceptable maximum level [32], we can see that even at a chip temperature of 100 °C, the interconnect system of this device is highly reliable. The 10 FIT level is not exceeded for more than 30 years, compared to usual product lifetimes of 5-10 years. A breakdown of the failure rate data shows that the power and ground buses contributed roughly 60% of the total, followed by the contact windows with 30%. Straight conductors account for less than 2% of the total, despite the fact that they contribute 84% of the total conductor length. Interesting as these results are, they depend strongly on the layout of the IC and do not apply to all devices.

TABLE II
Specifications of Circuit Interconnect Layers

| Layer                           | Width (µm) | Length (µm) | Number of corners | Number of steps | Number of vias/contacts | Thickness (Å) |
|---------------------------------|------------|-------------|-------------------|-----------------|-------------------------|---------------|
| Metal 1                         | 4          | 367800      | 4590              | 55296           | 36018 contacts          | 6000          |
|                                 | 3          | 656640      | 2349              | 0               | 7884 vias               |               |
| Metal 2                         | 5          | 1107500     | 5580              | 62137           | 16935 vias              | 8000          |
| Gate & interconnect polysilicon |            |             |                   |                 |                         | 3500          |
| Interlevel oxides               |            |             |                   |                 |                         | 6000          |

Fig. 7. Failure rate $\lambda$ (in FITS) as a function of time for the interconnect system of the 3200 gate standard cell device.

Having described the reliability analysis of an actual IC of moderate complexity, we conclude by extending our analysis to a complex generic VLSI circuit. The parameters for this circuit are summarized in table III, and the results are shown in figure 8. The interconnect system now reaches a failure rate of 10 FIT within two years for chip temperatures above 80 °C. Comparison with the previous example shows that increases in complexity and decreases in linewidths lead inevitably to a degradation of reliability through conductor wear-out. This wear-out process cannot be avoided by rigorous screening procedures: a high temperature burn-in, in fact, accelerates the wear-out. Figure 8 shows how a 168-hour burn-in at 175 °C reduces the time to reach a 10-FIT failure rate at 60 °C from seven to five years. The traditional maximum permissible current density of $10^5 \ A/cm^2$ is not exceeded anywhere in the device.

TABLE III
Specifications for the Generic VLSI Circuit

| Parameter               | Value   |
|-------------------------|---------|
| Number of transistors   | 200 000 |
| Minimum conductor width | 1.5 µm  |
| Total conductor length  | 10 m    |
| Number of bends         | 50 000  |
| Number of steps         | 470 000 |
| Number of contacts      | 600 000 |
| Number of vias          | 397 000 |
| Gate current            | 3E-6 A  |



Fig. 8. Failure rate $\lambda$ (in FITS) as a function of time for the 200 000 gate generic VLSI device.

Our results are based on experimental data available in the literature. In some cases, data are lacking and first-order models were used instead. The complete characterization of all important features of an IC design, in a form suitable for inclusion in an overall reliability prediction program, still lies in the future.

#### 5. CONCLUSIONS

- 1. At the VLSI level of integration, wear-out mechanisms dominate and the failure rate is not a constant, contrary to the assumptions of the Mil-Hdbk-217D model, which is based on the flat part of the bathtub curve. A series model for failure, making use of the best available models describing wear-out mechanisms, is therefore a better approach to determining the reliability of a VLSI device.
- 2. The example of a generic chip shows a general method of analyzing interconnects and the approach is technology-independent. This approach provides information to the designer as to how the reliability of a component is affected by factors under his control. For example, if vias are the major contributors to the high failure rate in a particular design, the failure rate could be reduced by a different layout containing fewer vias. This type of analysis will therefore be an important factor in assessing future designs.
- 3. The results for a 200 000-transistor device show that wear-out is a problem for future VLSI devices even when only one wear-out mechanism (electromigration) is considered. Lifetimes of less than 2 years for devices to reach a failure rate of 10 FIT are not acceptable.
- 4. More physical data and more accurate models for the wear-out mechanisms are required to improve the accuracy of the calculation and hence the assurance which can be placed in the reliability predictions. Large sample sizes must be evaluated and particular emphasis must be placed on fitting distributions to the early failures.
- 5. We have demonstrated a technique which provides valuable information to designers as to the reliability of a VLSI circuit. The approach is well suited to integration with an existing set of design tools as most of the data required resides in the design data base. The remaining data are obtained from usual process monitoring and they must be incorporated in a reliability database for the manufacturing process.
- 6. A reliability analysis program will be essential for the development of automated IC layout software, such as routers and silicon compilers.

#### APPENDIX: Derivation of the Failure Rate of a Power-Bus Section

Figure 6b shows a simplified model of a power-bus section with width $W_B$. Gates are assumed to be uniformly distributed along the bus, with a distance $L_g$ between adjacent gate connections. Each gate draws a current I from a power bus, and feeds a current I into a ground bus. The current in the first bus segment (furthest from the bonding pad) is I, in the second 2I, in the third 3I, etc. Numbering the bus segments from 1 to n, the current in segment i is $i \cdot I$. The failure rate of bus segment i may be written in terms of the basic-element failure rate as:

$$\lambda_{i} = \frac{L_{g}}{L_{E}} \lambda_{E}(t, W_{B}, d, iI, T) = \frac{L_{g}}{L_{E}} \, \frac{1}{\sigma \sqrt{2\pi}\, t} \exp \left[ -\frac{1}{2} \left( \frac{\ln t - \ln t_{50}}{\sigma} \right)^{2} \right] \tag{A-1}$$

Let  $x \equiv \ln t$ ,  $\mu \equiv \ln t_{50}$ :

The median time to failure from (3.2) is:

$$t_{50} = \alpha (iI)^{-N} \tag{A-2}$$

$$\alpha = A(W)(Wd)^N \exp\left[\frac{E_a}{kT}\right] \tag{A-3}$$

Therefore $\mu = \ln[\alpha(iI)^{-N}]$. Substituting into (A-1) and writing $1/t = \exp(-x)$, we obtain:

$$\lambda_{i} = \frac{L_g}{L_E\,\sigma\sqrt{2\pi}} \exp\left[-\frac{x^{2}}{2\sigma^{2}} - x + \frac{x\ln(\alpha)}{\sigma^{2}} - \frac{\ln^{2}(\alpha)}{2\sigma^{2}}\right] \cdot \exp\left[\left(\frac{N\ln(\alpha)}{\sigma^{2}} - \frac{Nx}{\sigma^{2}} - \frac{N^{2}\ln(iI)}{2\sigma^{2}}\right)\ln(iI)\right]$$

$$= \frac{L_g}{L_E\,\sigma\sqrt{2\pi}} \exp \left[ \frac{-\ln^2(t) - 2\sigma^2 \ln(t) + 2\ln(t)\ln(\alpha) - \ln^2(\alpha)}{2\sigma^2} \right] (iI)^{\gamma(iI)} \tag{A-4}$$

$$= C(t)\,(iI)^{\gamma(iI)}, \tag{A-5}$$

where C(t) collects all factors in (A-4) that do not depend on i, and

$$\gamma(iI) = \frac{2N\ln(\alpha) - 2Nx - N^2\ln(iI)}{2\sigma^2} = \frac{N}{\sigma^2} \ln \left\{ \frac{\alpha (iI)^{-N/2}}{t} \right\}. \tag{A-6}$$

For n bus segments in series, the total failure rate is:

$$\lambda_B = C(t) \sum_{i=1}^n (iI)^{\gamma(iI)}. \tag{A-7}$$

For large values of n, this summation is time-consuming and may be approximated by an integral, with i as a continuous variable:

$$\lambda_B \approx C(t) \int_0^n (iI)^{\gamma(iI)} di \tag{A-8}$$

$$= \frac{C(t) \sigma^2 n}{N} \cdot \frac{\exp\left[\frac{N}{\sigma^2}\ln(nI)\ln\left(\frac{\alpha(nI)^{-N/2}}{t}\right)\right]}{\ln\left(\frac{\alpha}{nIt}\right)} \tag{A-9}$$
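The factored sum (A-7) can be checked numerically against a direct sum of (A-1) over the bus segments. The sketch below does this for hypothetical parameter values; the function names are illustrative, and C(t) is taken to include the $L_g/L_E$ prefactor of (A-1). The closed-form approximation (A-9) is not evaluated here.

```python
import math


def bus_rate_direct(t, n, i_gate, alpha, sigma, exp_n, l_g, l_e=10.0):
    """Sum eq. (A-1) over segments 1..n, with t50 from eq. (A-2)."""
    total = 0.0
    for i in range(1, n + 1):
        t50 = alpha * (i * i_gate) ** (-exp_n)
        z = (math.log(t) - math.log(t50)) / sigma
        total += math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * t)
    return l_g / l_e * total


def bus_rate_factored(t, n, i_gate, alpha, sigma, exp_n, l_g, l_e=10.0):
    """Eq. (A-7): C(t) * sum over i of (iI)**gamma(iI)."""
    x, ln_a = math.log(t), math.log(alpha)
    c_t = (l_g / (l_e * sigma * math.sqrt(2.0 * math.pi))
           * math.exp((-x * x - 2.0 * sigma ** 2 * x + 2.0 * x * ln_a - ln_a * ln_a)
                      / (2.0 * sigma ** 2)))
    total = 0.0
    for i in range(1, n + 1):
        cur = i * i_gate
        # Eq. (A-6): exponent depends on the segment current.
        gamma = exp_n / sigma ** 2 * math.log(alpha * cur ** (-exp_n / 2.0) / t)
        total += cur ** gamma
    return c_t * total


# Hypothetical values: 50 gates at 3 uA each, 20 um apart, alpha = 1, N = 1.
args = dict(t=1e4, n=50, i_gate=3e-6, alpha=1.0, sigma=0.9, exp_n=1.0, l_g=20.0)
direct = bus_rate_direct(**args)
factored = bus_rate_factored(**args)
```

Both functions return the same value to within rounding, which confirms that (A-4) to (A-6) are an exact rearrangement of (A-1); only the integral in (A-8) and (A-9) introduces an approximation.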

#### 6. ACKNOWLEDGMENT

The assistance of the General Electric Company, Research Triangle Park, North Carolina is gratefully acknowledged. We are pleased to thank the *Editors* and referees for their helpful comments.

#### REFERENCES

- [1] P. D. T. O'Connor, *Practical Reliability Engineering*, Heyden, 1981, p 3.
- [2] Mil-Hdbk-217D, Reliability Prediction of Electronic Equipment, Reliability Analysis Center, RADC, 1982.
- [3] J. E. Arsenault, D. C. Roberts, "MOS semi-conductor random access memory failure rate," *Microelectron. Reliab.*, vol 19, 1979, pp 81-88.
- [4] P. D. T. O'Connor, "Microelectronic system reliability prediction," IEEE Trans. Reliability, vol R-32, 1983 Apr, pp 9-13.
- [5] S. Palo, "Reliability prediction of micro-circuits," Microelectron. Reliab., vol 23, 1983, pp 283-296.
- [6] D. M. Pantic, "Maturity factors in predicting failure rate for linear integrated circuits," *IEEE Trans. Reliability*, vol R-33, 1984 Aug, pp 208-212.
- [7] H. C. Rickers, P. F. Manno, "Microprocessor and LSI microcircuit reliability — Prediction model," *IEEE Trans. Reliability*, vol R-29, 1980 Aug, pp 196-202.
- [8] P. Jaaskelainan, "LSI reliability prediction based on time," Microelectron. Reliab., vol 20, 1980, pp 351-356.
- [9] H. Goldberg, Extending the Limits of Reliability Theory, Wiley Interscience, 1981, pp 58-64.
- [10] W. K. Denson, private communication.
- [11] D. G. Edwards, "Testing for MOS IC failure modes," IEEE Trans. Reliability, vol R-31, 1982 Apr, pp 9-18.
- [12] D. S. Peck, C. H. Zierdt, "The reliability of semiconductor devices in the Bell System," Proc. IEEE, vol 62, 1974 Feb, pp 185-211.
- [13] E. S. Anolick, G. R. Nelson, "Low field time-dependent dielectric integrity," *IEEE Trans. Reliability*, vol R-29, 1980 Aug, pp 217-221.
- [14] F. M. d'Heurle, P. S. Ho, "Electromigration in thin films," Thin Films: Interdiffusion and Reactions (Ed: Poate, Tu & Mayer), Wiley Interscience, 1978, pp 243-304.
- [15] J. R. Black, "Electromigration of Al-Si alloy films," Proc. 16 Int. Reliab. Phys. Symp., IEEE, 1978, pp 233-240.
- [16] P. B. Ghate, "Electromigration-induced failures in VLSI interconnects," Proc. 20 Int. Reliab. Phys. Symp., IEEE, 1982, p 292.
- [17] N. E. Lycoudes, C. C. Childers, "Semiconductor instability failure mechanisms," *IEEE Trans. Reliability*, vol R-29, 1980 Aug, pp 237-249.
- [18] B. Eitan, D. Frohman-Bentchkowsky, "Hot-electron injection in n-channel MOS devices," *IEEE Trans. Electron Dev.*, vol 28, 1981 Mar, pp 328-340.
- [19] E. Takeda, et al., "Submicrometer MOSFET structure for minimizing hot-carrier generation," IEEE J. Solid State Circuits, vol 17, 1982 Apr, pp 241-247.
- [20] G. Sai-Halasz, M. R. Wordeman, R. H. Dennard, "Alpha-particle-induced soft error rate in VLSI circuits," *IEEE Trans. Electron Dev.*, vol 29, 1982 Apr, pp 725-731.

- [21] H. A. David, Order Statistics, 2nd Ed., John Wiley & Sons, 1970, p
- [22] B. N. Agarwala, M. T. Attardo, A. P. Ingraham, "Dependence of electromigration-induced failure time on length and width of aluminum thin films conductors," J. App. Phys., vol 41, 1970, pp 3954-3960.
- [23] D. J. LaCombe, E. L. Parks, "The distribution of electromigration failures," Proc. 24 Int. Reliab. Phys. Symp., IEEE, 1986, pp 1-6.
- [24] J. R. Black, "Physics of electromigration," Proc. 12 Annual Reliab. Phys. Symp., IEEE, 1974, pp 142-149.
- [25] J. W. McPherson, "Stress dependent activation energy," Proc. 24 Int. Reliab. Phys. Symp., IEEE, 1986, pp 12-18.
- [26] E. Kinsbron, "A model for the width dependence of electromigration lifetimes in aluminum thin-film stripes," App. Phys. Lett., vol 36, 1980, pp 968-970.
- [27] N. L. Johnson, S. Kotz, Continuous Univariate Distributions 1, Houghton-Mifflin, 1970, p 253.
- [28] R. J. Miller, "Electromigration failure under pulse test conditions," Proc. 16 Int. Reliab. Phys. Symp., IEEE, 1978, pp 241-247.
- [29] G. A. Scoggan, B. N. Agarwala, P. P. Peressini, A. Brouillard, "Width dependence of electromigration life in Al-Cu, Al-Cu-Si and Ag conductors," *Proc.* 13 Int. Reliab. Phys. Symp., IEEE, 1975, pp 151-158.
- [30] S. Vaidya, T. T. Sheng, A. K. Sinha, "Linewidth dependence of electromigration in evaporated Al-0.5% Cu.," App. Phys. Lett., vol 36, 1980, pp 464-466.
- [31] S. S. Iyer, C.-Y. Ting, "Electromigration studies of submicrometer linewidth Al-Cu conductors," *IEEE Trans. Electron Dev.*, vol 31, 1984, pp 1468-1472.
- [32] Workshop on Submicrometer Device Reliability, Clemson University, Clemson, 1985 November 6-7.

#### **AUTHORS**

David F. Frost; Department of Electrical & Computer Engineering; Clemson University; Clemson, South Carolina 29634-0915 USA.

David Frost (M'81) was born in Cape Town, South Africa on 1951 November 5. He holds the degrees of BSc from the University of Stellenbosch (1974) and MEng from the University of Pretoria (1979), both in Electrical Engineering. Since 1979 he has held a teaching position in the Department of Electrical and Electronics Engineering, University of Stellenbosch. He is on sabbatical leave at Clemson University, South Carolina. His research interests include the design, reliability and testing of integrated circuits.

Dr. Kelvin F. Poole; Department of Electrical & Computer Engineering; Clemson University; Clemson, South Carolina 29634-0915 USA.

Kelvin Poole (M'85) was born in Durban, South Africa on 1943 February 10. He holds the degrees of MSc from the University of Natal, Durban and PhD from the University of Manchester, UK, both in Electrical Engineering. Dr. Poole is an Associate Professor at Clemson University, South Carolina and is interested in the design of integrated circuits and VLSI reliability.

#### RELIANT: A RELIABILITY ANALYSIS TOOL FOR VLSI INTERCONNECTS

David F Frost\*, Kelvin F Poole and David A Haeussler

E&CE Department Clemson University Clemson, SC 29634

#### **ABSTRACT**

RELIANT is a CAD tool which predicts the failure rate of integrated circuit conductors. A circuit layout, device models and electromigration process data are inputs to RELIANT. The interconnect patterns in a Caltech Intermediate Format (CIF) file are fractured into a number of characteristic segment types. An equivalent circuit is extracted and SPICE is used to determine the transient currents in each segment. Using parametric models for electromigration damage, the failure rate of the system is computed. RELIANT provides designers with feedback on the reliability hazards of a design. Results show the application of the tool to a standard cell CMOS component. For modelling large VLSI interconnect systems, the incorporation of a switch-level simulator is discussed.

#### INTRODUCTION

The goal of RELIANT is to provide a prediction of interconnect reliability during the design phase, by modelling the effect of wearout due to electromigration. It has been shown that for the most important failure mechanisms, the shrinking of layout design rules accelerates the wearout process [Woods 1984]. Similarly, an increase in circuit complexity has a negative impact on overall reliability. It is therefore becoming increasingly important to predict the intrinsic (wearout-limited) lifetime of VLSI circuits, particularly in view of the high cost of traditional methods of reliability qualification by burn-in.

The failure rate of integrated circuits is usually described by means of the "bath-tub" curve of instantaneous failure rate vs. time, shown in Fig. 1. The onset of wearout may be quantified by establishing a criterion for the maximum allowable failure rate. A goal of 10 FIT (1 FIT = $10^{-9}$ failures/hr.) has been proposed by the Semiconductor Research Corporation [SRC 1985].

This failure rate is attained in a time much less than the median time to failure $(t_{50})$ for most conductors. Therefore, $t_{50}$ predictions alone are not a reliable indicator of intrinsic lifetime. The rate of failure when $t \ll t_{50}$ is strongly influenced by the shape and standard deviation $(\sigma)$ of the time-to-failure distribution. These parameters are in turn influenced by factors such as conductor dimensions [Kinsbron 1980] and the presence of defects [Kemp 1988]. The approach used in this paper is to determine the instantaneous failure rate as a function of time, using data about the physical form and dimensions of conductors and the electrical stress (i.e. current density) applied to them. This method yields an assessment of reliability, in contrast to previously reported work [Hall 1986] which used $t_{50}$ as a criterion for adjusting conductor widths.

Fig. 1: Bathtub curve of failure rate vs. time.

<sup>\*</sup> Currently at Stellenbosch University, Stellenbosch, South Africa.

#### OVERVIEW OF RELIANT

RELIANT predicts the instantaneous failure rate of the interconnect pattern as a function of time. The method used is based on the principle of fracturing the interconnect pattern into a number of statistically independent conductor segments. The assumption of statistical independence is valid when $t \ll t_{50}$ [LaCombe 1986] and when the current density is low enough to avoid significant thermal interaction. This places an upper limit on current density of approximately $10^6 \, A/cm^2$. Five commonly-occurring segment types are identified:

- i) straight runs;
- ii) steps resulting from a discontinuity in the wafer surface (when a Metal1 conductor crosses a Polysilicon stripe, for example);
- iii) contact windows;
- iv) vias between Metal1 and Metal2;
- v) bonding pads.

These segment types are shown in Fig. 2. Each occurrence of a segment type is characterized by a set of physical parameters, such as length and width.

The relationships between $t_{50}$, $\sigma$ and the physical dimensions of each segment type are determined experimentally, using test structures. When the layout is fractured, an equivalent circuit which reflects the physical topology of the interconnect pattern is extracted. A circuit simulator is then used to determine the transient current flowing in each segment. The instantaneous current density j(t) is given by the instantaneous current i(t), divided by the nominal cross-sectional area of the segment. This time-varying current density is reduced to a single effective value $J_{eff}$, which is used to determine $t_{50}$ for each segment. The instantaneous failure rate is then determined. Assuming that no part of the interconnect pattern is redundant, a minimum order statistical method may be used to compute the failure rate of the interconnect system. This method has been described in a previous paper [Frost 1987].



Fig. 2: Segment types.



Fig. 3: Structure of RELIANT.

The structure of RELIANT is shown in Fig. 3. It consists of three main modules, EXTREM, COMBINE and SIRPRICE. EXTREM fractures the interconnect patterns contained in a CIF layout description file into segments. It produces a database containing a description of each segment (type and physical dimensions) and a SPICE-compatible netlist which includes all parasitic interconnect resistances and capacitances. COMBINE adds external generators and loads to the extracted netlist. SIRPRICE (which includes a modified version of SPICE2G.6) uses this netlist to simulate the current waveforms in each segment and compute the failure rate. A database query language interface enables the failure rate of specific segments, nets or modules to be investigated interactively.

#### **EXTREM**

EXTREM fractures the interconnect pattern in three phases. During the first phase, the CIF layout file (\*.CIF) is parsed. Wires and Polygons are reduced to collections of Boxes and the hierarchy is flattened. In the second phase, the positions of all abutments, contact windows, vias and bonding pads are established. The dominant current flow direction in each box is determined and all steps orthogonal to this direction are identified. In the third phase, each box is scanned in the direction of current flow and the dimensions and type of each segment are computed. Straight runs link up segments of other types. A record for each segment is stored in a physical data base file (\*.DB1). Active devices are identified and a SPICE-compatible netlist, including a $\pi$-section RC network for each physical branch of the interconnect pattern, is generated (\*.EXT).

#### **COMBINE**

The user defines device models, analysis specifications and external components such as voltage sources and load resistors, by means of the \*.SPC file. COMBINE adds these definitions to the extracted netlist. Labels on the bonding pads of the layout are used to establish links to the external node numbers.

#### SIRPRICE

SIRPRICE accepts the complete netlist produced by COMBINE (\*.SPR) and calls a modified version of SPICE 2G.6. This module performs a transient simulation of the extracted circuit and generates indexed files which contain the current-time data for each resistor corresponding to a branch in the interconnect pattern. The cross-sectional area of each segment is determined from the physical data base file and an effective current density is computed as follows:

$$J_{eff} = \frac{1}{\Gamma t} \int_{0}^{t} \sinh(\Gamma j(t))\, dt$$
 (1)

where  $\Gamma$  is a constant [McPherson 1986]. The median time to failure is given by the following equation:

$$t_{50} = \frac{G}{J_{eff}} \exp\left(\frac{E_a}{kT}\right)$$
 (2)

where  $E_a$  is the activation energy, k is Boltzmann's constant and T is absolute temperature. G is a factor dependent on the physical dimensions of the segment. G and  $\sigma$  are determined from the information in the physical data base file. The failure rate is then calculated, using a suitable failure distribution. Experimental results of electromigration testing show a lognormal dependence and hence this distribution is used in the example discussed in the next section.

SIRPRICE produces various output files: \*.LIS contains an ASCII listing of the failure rate of the whole circuit and the total for each segment type. \*.DB2 is a second database file which includes the reliability data. This provides an interface to other program modules. \*.DTR is an interface to a query language which may be used to interrogate the reliability data base.

#### EXPERIMENTAL RESULTS

RELIANT was used to determine the failure rate of a CMOS one-bit counter. The layout of this circuit is shown in Fig. 4, and the results of the analysis appear in Fig. 5. An analysis of the results indicates that the steps are the major contributor to failure rate. The overall failure rate for this simple circuit containing twenty-six active devices is very low: $7 \times 10^{-11}$ FIT after 20 years.

The effect of layout and complexity is seen by comparing the failure rates for the input stage of this counter with the results in Fig. 5. A failure rate of  $4.5 \times 10^{-17}$  FIT after 20 years is predicted for the four-transistor input circuit and the major contribution is due to contacts.



Fig. 4: Layout of a one-bit binary counter.



Fig. 5: Failure rate vs. time for the one-bit counter.

#### RELIABILITY PREDICTION OF COMPLEX INTERCONNECT PATTERNS

The use of SPICE limits the application of RELIANT to VLSI cells. CAD tools for assessing the current densities in interconnect systems reported by other researchers [Hall 1986], [Hohol 1986] share this limitation. A major consideration in the development of RELIANT was that reliability analysis should, where possible, be performed concurrently with simulation for design verification purposes. To meet this objective in the case of VLSI circuits, a new method for determining the current data required to assess the electromigration damage has been investigated. This method consists of extracting approximate current waveforms from a switch-level simulation of a MOS circuit. Transistors are represented by linear static and dynamic resistances and linear capacitors. Interconnects are represented by resistors and capacitors. The analysis method is based on network functions of RC trees. After each transition, the resultant RC networks are analyzed to determine the final state and static currents in all branches. They are then analyzed to determine delay and transient current waveforms. The initial charge distribution is also taken into account. Preliminary results show that an effective current for assessing the electromigration damage can be predicted.

#### **CONCLUDING REMARKS**

A CAD tool called RELIANT for predicting the failure rate of interconnect systems has been presented. RELIANT requires a circuit layout and experimentally determined values of $t_{50}$ and $\sigma$ for all conductor segment types.

Our results show that the reliability of interconnects depends on the specific details of the layout. The program can be used to optimize the layout for reliability, by indicating the features of the interconnect pattern which make the largest contribution to the total failure rate.

#### **REFERENCES**

[Frost 1987] D.F. Frost, K.F. Poole, "A Method for Predicting VLSI-Reliability using Series Models for Failure Mechanisms", IEEE Trans. Reliability, vol R-36, June 1987, pp 234 - 242.

[Hall 1986] J.E. Hall, D.E. Hocevar, P. Yang, M.J. McGraw, "SPIDER: A CAD System for Checking Current Density and Voltage Drop in VLSI Metallization Patterns", Proc. IEEE Int. Conf. on Computer-aided Design, Nov. 1986.

[Hohol 1986] T.S. Hohol, L.A. Glasser, "RELIC: A Reliability Simulator for Integrated Circuits", Proc. IEEE Int. Conf. on Computer-aided Design, Nov. 1986.

[Kemp 1988] K.G. Kemp, K.F. Poole and D.F. Frost, "Failure Rate Prediction for Defect Enhanced Electromigration Wearout of Metal Interconnects", submitted to IEEE Trans. Reliab., Jan. 1988.

[Kinsbron 1980] E. Kinsbron, "A Model for the Width Dependence of Electromigration Lifetimes in Aluminum Thin-film Stripes", App. Phys. Lett., vol 36, 1980, pp 968 - 970.

[LaCombe 1986] D.J. LaCombe, E.L. Parks, "The Distribution of Electromigration Failures", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986.

[SRC 1985] Workshop on Submicrometer Device Reliability, Clemson University, Clemson, Nov. 6-7, 1985.

[Woods 1984] M.H. Woods, "The Implications of Scaling on VLSI Reliability", Tutorial notes: 22nd Int. Reliab. Phys. Symp., New York: IEEE, 1984, pp. 6-1 to 6-30.

#### **ACKNOWLEDGEMENTS**

The authors wish to acknowledge that this work is supported in part by the SRC under contract No. 87-MP-082.

The assistance of the General Electric Company, Research Triangle Park, North Carolina is gratefully acknowledged.


David F. Frost and Kelvin F. Poole

Department of Electrical and Computer Engineering Clemson University, Clemson, SC 29634-0915 (803)656-5925

#### Abstract

A quantitative model for the reliability of a system of interconnects is presented. A circuit simulator is used to accurately predict device currents, from which the reliability of individual conductor segments is determined. The failure rate of the interconnect system is then calculated using a minimum order statistical approach. An example shows the application of this technique to a CMOS circuit.

#### Introduction

As the minimum feature size of VLSI devices steadily decreases, there is a corresponding decrease in the reliability of these devices. Scaling increases the failure rate associated with all the most important known failure mechanisms, such as electromigration, ESD, time-dependent oxide breakdown and hot-carrier effects. It is therefore increasingly important to develop accurate, quantitative models for these mechanisms, in order to optimize circuit designs for reliability. These models must be embedded in reliability analysis software which forms part of the regular suite of IC design tools.

In most digital circuit designs, a set of standard active devices is designed once and used repeatedly throughout the circuit. Immunity to device-related failure such as the hot-electron effect is designed in at the device level. Interconnects are qualitatively different to devices, because the interconnect layout of each circuit is unique and must be individually optimized for reliability. Because of the complexity of interconnect patterns, a systems approach to reliability is adopted in this paper. This approach is based on the partitioning of the interconnect mask layout into its component parts.

The most significant failure mode in Aluminum alloy VLSI interconnects is open-circuit failure due to current-induced electromigration. When current flows through a conductor, electrons collide with metal ions, transferring some momentum to the latter. This causes a flux of metal ions in the direction of electron flow. Divergences in this flux arise at local inhomogeneities such as grain boundary triple points or vacancy sites. These divergences result in the formation of voids and hillocks in the metal of the conductor. When the cross-sectional area of a void equals that of the conductor, an open circuit failure results. Other forms of electromigration occur at the metal-semiconductor interfaces within contact windows.

Designers have traditionally limited electromigration damage by restricting the maximum current density to $10^5$ A/cm². There are two essential problems with this approach. Firstly, the downward pressure on design rules makes it increasingly difficult to maintain this conservative limit on current density. Secondly, observance of this rule does not enable the designer to quantitatively predict reliability, nor does it provide any insight into the design trade-off between reliability and area. Increasingly, circuit layout is performed by automated software tools such as routers and silicon compilers, using layout-optimizing cost functions. Quantitative models for failure mechanisms are essential if cost functions which include reliability are to be developed for a future generation of silicon compilers.

#### An Order Statistical Model of Interconnect Reliability

Fig. 1 shows a typical conductor on the surface of a VLSI chip. A number of segments may be identified, such as contact windows to the underlying devices, straight runs of various lengths, 90° bends and steps over discontinuities in the underlying surface. These shapes are shown in more detail in Fig. 2. The majority of VLSI conductor patterns may be broken down into a set of basic segments such as these. The reliability of each segment is expressed in terms of statistical parameters such as median time to failure and standard deviation. These parameters are determined experimentally using special purpose test structures and accelerated testing techniques.

Fig. 1: Partitioning of a conductor into segments.

Fig. 2: Conductor segments considered: a) straight segments, b) 90° bends, c) steps over discontinuities, d) contact windows.

If there is no redundancy in the circuit, the entire chip will fail when any single conductor segment fails. It will also be assumed that there is no interaction between the segments, i.e. the probability of failure of any segment is independent of all other segments. Under these conditions, interconnects form a series-connected system and a minimum order statistical model for reliability is appropriate. The probability of interconnect failure is therefore equal to the probability of failure of the segment having the shortest lifetime. The failure rate of the interconnect system is equal to the sum of the failure rates of all conductor segments:

$$h_{s}(t) = \sum_{i=1}^{n} h_{i}(t)$$
 (1)

The requirement of non-interaction between segments restricts the applicability of this model to current densities below $1\times10^{6}$ A/cm², where negligible thermal interaction due to self-heating occurs. Also, the minimum segment length must be greater than the length of the locality which influences the growth of a single defect. This is approximately equal to the median grain size of the Aluminum in the conductor, usually 1 µm or less. Boundaries between segments must be chosen along equipotential lines.

#### Reliability Parameters for Electromigration

Electromigration failure is generally described by a lognormal probability density function

$$p(t) = \frac{1}{\sigma \sqrt{2\pi}\, t} \exp \left[ -\frac{1}{2} \left( \frac{\ln(t) - \ln(t_{50})}{\sigma} \right)^{2} \right]$$
(2)

where  $t_{50}$  = median time to failure and  $\sigma$  = standard deviation. In general,  $t_{50}$  is much greater than the operating life of the circuit and only the early failures are important. If only the first 10% of failures are considered, then the failure rate is approximately equal to the probability density function. The failure rate of the interconnect system is then

$$h_{s}(t) = \sum_{i=1}^{n} \frac{1}{\sigma_{i} \sqrt{2\pi}\, t} \exp \left[ -\frac{1}{2} \left( \frac{\ln(t) - \ln(t_{50,i})}{\sigma_{i}} \right)^{2} \right]$$
(3)
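As an illustration of equation (3), the following minimal sketch sums lognormal densities over a list of segments. The function names and the three (t50, sigma) pairs are hypothetical and are not measured data; times are taken to be in hours so that the result can be quoted in FIT.

```python
import math

def lognormal_pdf(t, t50, sigma):
    """Lognormal probability density of time to failure, as in equation (2)."""
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * t)

def system_failure_rate(t, segments):
    """Equation (3): sum of segment densities (early-failure approximation).

    segments is a list of (t50, sigma) pairs, one per conductor segment.
    """
    return sum(lognormal_pdf(t, t50, sigma) for t50, sigma in segments)

# Hypothetical segment parameters (t50 in hours); not measured data.
segments = [(5.0e6, 1.0), (8.0e6, 0.9), (2.0e7, 1.2)]
t = 10 * 8760.0  # ten years expressed in hours
h_s = system_failure_rate(t, segments)
print(f"h_s = {h_s:.3e} failures/hr = {h_s * 1e9:.3e} FIT")
```

Because the segments are treated as a series system, adding a segment can only increase the computed failure rate.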

The median time to failure is a function of temperature, current and conductor dimensions. The following expression has been derived from measured data for a segment 10 µm long:

$$t_{50} = \frac{1.523\mathrm{E}{-5}}{J^{\,n(J)}} \left[w - 3.07 + \frac{11.63}{w^{1.7}}\right] \exp\left[\frac{5800}{T}\right] \tag{4}$$

where J is the current density in A/cm², w is the width in µm, and T is the absolute temperature. The exponent n(J) is given by the following expression:

$$n(J) = \tau J \coth (\tau J)$$
 (5)

where $\tau = 2\times10^{-6}$ cm²/A. Equation (4) is applicable to DC currents, whereas conductors in ICs generally do not carry only direct current. In analog circuits, currents are often described by sinusoidal or other periodic functions, while in digital circuits they tend to switch between discrete values with exponential transitions resulting from the charge and discharge of circuit capacitances. These current waveforms are easily predicted using a circuit simulator and the resultant data made available in the design data base. In order to use this data, the following expression for $t_{50}$ has been derived in terms of time-varying current:

$$t_{50} = \frac{1.523\mathrm{E}{-5} \left[w - 3.07 + \frac{11.63}{w^{1.7}}\right] \exp\left[\frac{5800}{T}\right]}{\sum_{j=1}^{m} \frac{p_{j}}{t_{jf}} \int_{0}^{t_{jf}} \left[\frac{i(t_{j})}{A}\right]^{n(i(t_{j})/A)} dt_{j}} \tag{6}$$

Equation (6) is based on the assumption of a static relationship between current density and electromigration damage. The denominator contains the summation of the electromigration damage produced by operating the circuit in m different modes. A circuit simulator is used to simulate each operating mode. The circuit operation in mode j is simulated from time $t_j = 0$ to $t_j = t_{jf}$. The current $i(t_j)$ is the output data of each simulation, while $p_j$ is a weighting factor representing the probability of the circuit being in mode j during normal operation. The constant A is the cross-sectional area (cm²), which is written as

$$A = (w d) 1E-8 \tag{7}$$

where w and d are the width and thickness, respectively (in µm).
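The calculation described by equations (4) to (7) can be sketched as follows. This is a minimal illustration and not the authors' program: it assumes that the integrand of equation (6) is the current density raised to the power n(J), that only the magnitude of the current matters, and that trapezoidal integration of the simulator samples is adequate. All function names are hypothetical.

```python
import math

TAU = 2.0e-6  # cm^2/A, value quoted after equation (5)

def exponent_n(j):
    """Current-density exponent n(J) = tau*J*coth(tau*J), equation (5)."""
    x = TAU * abs(j)
    if x < 1e-8:
        return 1.0  # limit of x*coth(x) as x -> 0
    return x / math.tanh(x)

def geometry_factor(w_um, temp_k):
    """Width and temperature terms shared by equations (4) and (6) (assumed form)."""
    return 1.523e-5 * (w_um - 3.07 + 11.63 / w_um ** 1.7) * math.exp(5800.0 / temp_k)

def t50_dc(j, w_um, temp_k):
    """Equation (4): median time to failure for a DC current density j (A/cm^2)."""
    return geometry_factor(w_um, temp_k) / j ** exponent_n(j)

def t50_time_varying(modes, w_um, d_um, temp_k):
    """Assumed reading of equation (6).

    modes is a list of (p_j, times, currents): the mode probability, the
    simulation time points (s) and the simulated currents (A).
    """
    area = w_um * d_um * 1e-8  # equation (7), cm^2
    damage = 0.0
    for p, times, currents in modes:
        integral = 0.0
        for k in range(1, len(times)):
            # Trapezoidal rule on |i/A| ** n(|i/A|); using the magnitude is an assumption.
            j0 = abs(currents[k - 1]) / area
            j1 = abs(currents[k]) / area
            f0 = j0 ** exponent_n(j0) if j0 > 0 else 0.0
            f1 = j1 ** exponent_n(j1) if j1 > 0 else 0.0
            integral += 0.5 * (f0 + f1) * (times[k] - times[k - 1])
        damage += p * integral / (times[-1] - times[0])
    return geometry_factor(w_um, temp_k) / damage
```

A useful consistency check is that a constant current waveform in a single mode with p = 1 gives the same result as the DC expression.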

The standard deviation of the conductor segment is a function of width only:

$$\sigma = \frac{2.192}{w^{2.625}} + 0.787$$
 (8)

The results in equations (6) and (8) may be extended to straight conductors of any length by dividing the conductors into a number of 10 µm segments. Other conductor shapes such as corners, steps and contact windows may be approximately modelled as straight sections having equivalent values of width, thickness and length (this ignores the effect of stress concentration caused by the shape of segment). These values are then used in equation (3) to compute the overall interconnect failure rate.
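The extension to straight conductors of arbitrary length can be sketched as below. The sketch assumes that equation (8) reads sigma = 2.192 / w^2.625 + 0.787, that every 10 µm segment of the run shares the same t50 and sigma, and that a partial segment is rounded up to a whole one; the function names are hypothetical.

```python
import math

def sigma_from_width(w_um):
    """Assumed reading of equation (8): sigma as a function of width (um)."""
    return 2.192 / w_um ** 2.625 + 0.787

def segment_failure_rate(t, t50, sigma):
    """Lognormal density used as the early-life failure rate of one segment."""
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * t)

def straight_run_failure_rate(t, length_um, w_um, t50):
    """Failure rate of a straight run treated as a series of 10 um segments."""
    n_segments = math.ceil(length_um / 10.0)  # rounding up is an assumption
    return n_segments * segment_failure_rate(t, t50, sigma_from_width(w_um))
```

Under these assumptions the failure rate of a straight run scales linearly with its length.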

#### Examples

A computer program has been written to solve equations (6) and (8). The input data for this program is a SPICE<sup>5</sup> output data file containing the current-time data collected during a transient simulation. At present, conductor dimensions are entered manually; a circuit extractor will eventually automate this task.

The circuit diagram and layout of a CMOS 4-bit Carry Look Ahead Unit are shown in Figs. 3 and 4 respectively. This is a standard cell component which forms part of a cell library in a double layer metal, 1.5µm technology. The design was simulated using SPICE and transient device current data was collected in several output files. An example of a simulated current waveform (in this case, the current in the power supply bus) is shown in Fig. 5. The designer enters the physical parameters of the conductor which he is designing, including an initial estimate of width. He also supplies the weighting factor $p_j$ applicable to each simulation. The failure rate for the conductor is calculated and if the designer is not satisfied he can input a new value of width. The total failure rate for all interconnects, $h_s$, is calculated using equation (3). Fig. 6 shows the failure rate of the Carry Look Ahead Unit as a function of time. The median time to failure is much greater than 15 years and so the failure rate increases monotonically with time over the useful lifetime of the component. A failure rate of 10 FITS (1 FIT = $10^{-9}$ failures/hr.) is considered to be the maximum acceptable value. In this case, the failure rate is still less than 1 FIT after 10 years of operation and so the design of this cell is highly reliable. However, Fig. 6 also shows that if 119 of these cells were produced in a 17x7 grid on the surface of a 300 mil x 300 mil chip, the failure rate of the chip would reach 10 FITS in 7½ years. This clearly shows the impact of increased complexity on reliability. The chip failure rate increases not only because of the summation of a greater number of cell failure rates, but also because the current level in the power supply and ground buses increases with increasing number of gates. While the failure rates of individual cells can be calculated in advance, the failure rate of the entire chip can only be determined once the placement and routing is complete. As die sizes increase, the design of power supply and ground buses may become increasingly important for reliability.

Fig. 3: Circuit diagram of Carry Look Ahead Unit.

Fig. 4: Layout of Carry Look Ahead Unit.

Fig. 5: Current in power supply bus vs. time.



Fig. 6: Failure rate vs. time for a single cell and for a 17x7 cell array.

#### Conclusions

A technique for predicting the reliability of a system of VLSI interconnects has been presented. The models used can accommodate complex VLSI interconnect patterns and actual operating conditions. This technique is useful to IC designers who wish to design interconnects for an optimum tradeoff between area and reliability. It is also of value in the development of new CAD tools for automated layout.

#### Acknowledgements

The assistance of David Abercrombie in preparing the example used, is gratefully acknowledged.

#### References

- [1] M.H. Woods, "The implications of scaling on VLSI reliability", Tutorial notes: 22nd Int. Reliab. Phys. Symp., New York: IEEE, 1984, pp. 6-1 to 6-30.
- [2] D.F. Frost and K.F. Poole, "An order statistical method for the prediction of VLSI device reliability using models for failure mechanisms", to be published in IEEE Trans. Reliab.
- [3] J.W. McPherson, "Stress dependent activation energy", Proc. 24th Int. Reliab. Phys. Symp., New York: IEEE, pp. 12-18, 1986.
- [4] D.F. Frost and K.F. Poole, "A model for the median time to failure of VLSI interconnects carrying time-dependent currents", submitted for publication to IEEE Trans. Reliab.
- [5] A. Vladimirescu, K. Zhang, A.R. Newton, D.O. Pederson, A. Sangiovanni-Vincentelli, SPICE User's Guide, Berkeley: Dept. of Elec. Eng. & Comp. Sci., Univ. of Calif. at Berkeley, 1983.

<u>Title</u>: Failure rate prediction for defect enhanced electromigration wearout of metal interconnects

<u>Authors</u>: Kevin G. Kemp, Student member IEEE Clemson University, Clemson.

Kelvin F. Poole, Member IEEE Clemson University, Clemson.

David F. Frost, Member IEEE Stellenbosch University, S.A.

Key words: Failure rate, reliability prediction, defects, electromigration

#### Reader Aids:

Purpose: Advance state of the art

Special math needed for explanations: Statistics

Special math needed to use results: None

Results useful to: IC design engineers, CAD tool developers

#### Abstract

A method which predicts the effect of defects on the failure rate of conductors due to electromigration wearout is proposed. Topographic defects and grain boundary triple points are identified as the major contributing defects. A random spatial distribution of both defect types is assumed, and a range of defect sizes is used. This analysis predicts that random topographic defects significantly increase conductor failure rate while they have little effect on the median time to failure (T50). In addition, predictions of lifetime versus stripe width agree with other published results and show that the increased lifetime of narrower stripes is due to the blocking mechanism of bamboo type structures. However, this increase is limited by topographic defects, which thus impose a minimum achievable stripe width.

#### 1 Introduction

The lifetime of a conductor subject to electromigration wearout is determined by the physical structure of the conductor and the operating conditions (current density and temperature) applied to it [1]. In this analysis we will consider as a defect any random feature which enhances the failure by electromigration of the conductor. It is generally accepted that electromigration failure is initiated at grain boundary triple points where there is a mass flow divergence of conductor material [1,2]. These triple points are the first kind of defect with which we are concerned. We also consider macroscopic features such as topographic defects introduced during processing which are responsible for mass flow divergences that contribute significantly to conductor failure [3]. In this category we include photolithographic and other random "spot" defects\* which eliminate a portion

<sup>\*</sup>These partial open circuits are simply less extreme cases of fatal conductor defects which contribute to manufactured yield loss. In this analysis we are concerned only with these partial defects which result in failure of the device after it has been in operation for a length of time.

of the conductor and thus increase current density in the immediate vicinity of the defect. This analysis considers the contribution of both grain boundary triple points and topographic defects to conductor failure by electromigration and determines the effect of both on the failure rate of conductors.

#### 2 Theoretical considerations

In developing a method to account for the influence of defects on electromigration we make use of an elemental model which considers the failure probability of a short length $l_E$ of conductor material. A given stripe of width w and length L is treated as a series connection of N elements of length $l_E$. Since failure of the stripe is determined by the failure of the weakest element, the failure probability of the stripe is given by the minimum order statistic of the N elements [2]. Thus the survival probability of the stripe is the product of the survival probabilities of each of the individual elements. Similarly it may be argued that if each element is subject to a number of independent failure modes then the failure of that element is given by the minimum order statistic of those failure modes. In general then, the probability of failure as a function of time F(t) is given by:

$$F(t) = 1 - \prod_{i} \prod_{j} [1 - F_{ij}(t)]$$
 (1)

where  $[1-F_{ij}(t)]$  is the survival probability at time t of the ith element due to failure mode j. In evaluating reliability we are most often concerned with the failure of only the first few devices (usually the first ten percent). In this case the above expression may be reduced to:

$$F(t) = \sum_{i} \sum_{j} F_{ij}(t)$$
 (2)

and the instantaneous failure or hazard rate h(t) is: [2]

$$h(t) = \frac{f(t)}{1 - F(t)} = \sum_{i} \sum_{j} h_{ij}(t)$$
(3)

This failure rate h(t) is important in that it provides a useful measure of the reliability of a device or system, and is often used in reliability specifications.
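The quality of the small-probability approximation can be checked numerically. The sketch below compares the product form of F(t) with the sum in equation (2) for a hypothetical stripe of 1000 elements with three failure modes each; the element probabilities are illustrative only.

```python
def stripe_failure_exact(element_probs):
    """Product form: F = 1 - prod(1 - F_ij) over all elements and modes."""
    survival = 1.0
    for f_ij in element_probs:
        survival *= 1.0 - f_ij
    return 1.0 - survival

def stripe_failure_approx(element_probs):
    """Equation (2): F is approximately sum(F_ij) while F remains small."""
    return sum(element_probs)

# Hypothetical stripe: 1000 elements x 3 failure modes, each with F_ij = 1e-5.
probs = [1e-5] * 3000
exact = stripe_failure_exact(probs)
approx = stripe_failure_approx(probs)
print(f"exact = {exact:.6f}, approx = {approx:.6f}")
```

The sum always overestimates the exact value slightly, and the two agree closely as long as the cumulative failure probability stays within the first few percent.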

In carrying out such an analysis it is necessary to carefully define the various modes which contribute to the failure mechanism under consideration.

The individual failure modes considered in this analysis are:

- i. Failure by bulk electromigration of an "ideal" element containing no grain boundaries and no geometric defects.
- ii. Failure by grain boundary electromigration of an element which contains only grain boundary triple points.
- iii. Failure of an element due to a topographic defect only.

For each of these failure modes a model is used to express the failure probability of the element (as a function of time) in terms of its geometry and operating conditions. It should however be noted that we are not concerned with the accuracy of the models used in this paper, but rather are presenting a methodology for determining the contribution of each failure mode to the overall reliability.

#### 2.1 Element with no defects

The model for the failure of an elemental conductor follows a lognormal density function given by: [1,4]

$$f(t) = \frac{1}{\sqrt{2\pi}\, \sigma t} \exp \left[ -\frac{1}{2} \left[ \frac{\ln t - \ln T_{50}}{\sigma} \right]^2 \right]$$
 (4)

where: f(t) = failure probability as a function of time

t = time

σ = lognormal standard deviation

 $T_{50}$  = median time to failure

In practice it is not possible to make a conductor element perfectly free of grain boundary triple points and topographic defects; therefore it is not usually possible to measure these bulk $T_{50}$ and $\sigma$ values (although the bulk $T_{50}$ value may be estimated by comparing bulk and grain boundary electromigration activation energies [5]). Since the bulk $T_{50}$ value is significantly larger than that for stripes containing grain boundaries, failure due to grain boundary electromigration predominates and the bulk $T_{50}$ and $\sigma$ values are not significant in the analysis except to set the lifetime upper limit.

#### 2.2 Element with grain boundaries only

The preferential failure occurring at grain boundary triple points is also modeled using a lognormal density function, for which the  $T_{50}$  and  $\sigma$  values may be determined from lifetests on conductor stripes. In this analysis a random distribution of triple points over the conductor is assumed in order to determine the probability of an individual element containing a triple point. The triple point density is determined by the mean grain size  $(x_G)$ , thus:

$$Pr\{\text{triple point}\} = 1 - \exp \left( \frac{-w\ell_E}{x_G^{2}} \right)$$
 (5)

where: w = stripe width

 $\ell_E$  = element length

x<sub>G</sub> = average grain size
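A minimal sketch of equation (5), assuming that it reads 1 - exp(-w·ℓ_E / x_G²), i.e. that triple points are scattered at random with a density of one per grain area. The function name is hypothetical and all lengths must share the same unit.

```python
import math

def triple_point_probability(w, l_e, x_g):
    """Probability that an element of area w * l_e contains a triple point."""
    return 1.0 - math.exp(-w * l_e / x_g ** 2)

# Illustrative values only: 2 um stripe, 0.5 um element, 1 um mean grain size.
print(triple_point_probability(2.0, 0.5, 1.0))
```

With this form the probability rises towards one for elements much larger than a grain and falls towards zero for elements much smaller than a grain.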

The model also takes into account the probability of "bamboo" structure blocking grains [6,7] which effectively prevent grain boundary electromigration by restricting the distance over which material may be transported. This probability is modeled as:

$$Pr\{\text{blocking grain}\} = \begin{cases} 1 - \exp \left[ \dfrac{-\ell_{C}(x_{G}-w)}{x_{G}^{2}} \right], & w < x_{G} \\ 0, & w > x_{G} \end{cases}$$
(6)

where:  $\ell_C$  = mean critical distance for mass transport. The effective triple point probability is thus given by

#### 2.3 Element with topographic defects only

The contribution of topographic defects is also found by considering a random spatial distribution, except that a range of possible defect sizes is taken into account. For the purpose of this model, these defects are all assumed to be caused by particles, with a size distribution given by [8]:

$$f(x) = \begin{cases} A/x^{3}, & x > x_{0} \\ Bx, & x < x_{0} \end{cases} \qquad (7)$$

where: f(x) = density of defects of diameter x

A, B = constants

 $x_0$  = minimum reproducible spot size

The  $A/x^3$  function corresponds to the distribution of particle sizes according to the standard environmental class curves of MIL STD 209B [9], and  $x_0$  represents a minimum reproducible spot size determined by the resolution of the photolithographic process.
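The particle size distribution of equation (7) can be explored with the short sketch below. It assumes that the two branches join continuously at $x_0$, which fixes B = A / x0⁴; this continuity condition and the function names are assumptions made for illustration and are not stated in the text.

```python
def defect_size_density(x, x0, a=1.0):
    """Equation (7) with B chosen so that the branches meet at x0 (assumption)."""
    b = a / x0 ** 4
    return a / x ** 3 if x >= x0 else b * x

def fraction_larger_than(x, x0):
    """Fraction of all defects with diameter greater than x.

    Follows from integrating the density above; the total area is A / x0**2,
    so the constant A cancels.
    """
    if x >= x0:
        return 0.5 * (x0 / x) ** 2
    return 1.0 - 0.5 * (x / x0) ** 2

# Example: share of defects at least twice the minimum reproducible spot size.
print(fraction_larger_than(2.0, 1.0))
```

Under the continuity assumption exactly half of all defects are larger than $x_0$, and the share falls off with the inverse square of the diameter beyond that point.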

The defect size distribution is calculated from the particle size distribution by considering the probability of a given spot falling on a conductor element. This closely follows the method adopted by Stapper [8], except that we are concerned here with defects that eliminate only a portion of the conductor. This analysis thus takes into account defects which are smaller than the conductor width, as well as those which fall on the edge of the conductor, and yields the defect size distribution function depicted in Fig. 1.

The defect model also takes into account the effect on conductor lifetime for each size of defect. The observed relationship

$$T_{50} = A J^{-n} \exp(E/kT)$$
 (with n constant) (8)

is applicable to current densities below 1E6 A/cm² [10]. However, if a significant portion of a conductor has been removed by a defect then temperature gradients in the vicinity of the defect significantly increase this factor. Results reported by [3], [10] and [11] were used to determine a relationship between defect size and failure time, which is expressed as:

$$T_{50}(w) = T_{50}(w_0) \exp(1 - \frac{w_0}{w}) \qquad w \le w_0$$
 (9)

where:  $w_0$  = nominal stripe width

w = remaining stripe width at the defect

 $T_{50}$  (w) = median time to failure of defected element

 $T_{50}(w_0)$ = median time to failure of undefected element

The failure probability of the element due to topographic defects is then calculated from the above lifetime and defect size distribution functions.
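Equation (9) is easily evaluated for a range of defect sizes, as in the following sketch. The function name and the numerical values are illustrative only.

```python
import math

def defected_t50(t50_nominal, w_nominal, w_remaining):
    """Equation (9): median life of an element narrowed to w_remaining."""
    if not 0.0 < w_remaining <= w_nominal:
        raise ValueError("remaining width must lie in (0, w_nominal]")
    return t50_nominal * math.exp(1.0 - w_nominal / w_remaining)

# Illustrative sweep: fraction of the nominal width left at the defect.
for fraction in (1.0, 0.75, 0.5, 0.25):
    ratio = defected_t50(1.0, 2.0, 2.0 * fraction)
    print(f"remaining width {fraction:.2f} w0 -> T50 ratio {ratio:.4f}")
```

The lifetime is unchanged when no material is removed and falls steeply as the remaining width approaches zero.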

#### 3 Comparison with experimental results

#### $T_{50}$ and $\sigma$ variation with linewidth

The above models were used to simulate the lifetime of stripes having the same dimensions as those in the experiments conducted by [6] and [7]. The variation of  $T_{50}$  with linewidth, depicted in Fig. 2, closely follows that reported by these authors. The "U" shape located around w=2 $\mu$ m derives from the probability distribution of grain boundary triple points and "blocking" grains, which is in exact agreement with the mechanism described by Kinsbron [6]. Values for  $\sigma$  were found to increase with decreasing stripe width, as reported by [6] and [7].

However, further simulation reveals that the increasing lifetime of narrow stripes in the presence of topographic defects is abruptly reversed at some point (typically below 0.1 $\mu$ m), and that  $T_{50}$  approaches zero as the width is reduced further (Fig. 3). This reversal is due to an increase in the severity of topographic defects for narrower stripes, and the location of the turning point is determined by the topographic defect density and size distribution. Topographic defect densities of the order of  $0.5/cm^2$  per level are required for modern semiconductor processes [12], and this value produces a turning point in the vicinity of 0.1 $\mu$ m. It is also noted that topographic defects do not affect the  $T_{50}$  and  $\sigma$  values of stripes that are wider than this minimum value imposed by topographic defects. Interconnect stripes produced in modern semiconductor processes are of the order of 2-5 $\mu$ m, therefore the effect of topographic defects is not apparent from  $T_{50}$  measurements carried out on such stripes, provided they are manufactured under low topographic defect density conditions.

#### Effect of topographic defects on the lognormal failure curve

Although we have shown that topographic defects do not significantly change  $T_{50}$  for a conductor stripe, they nevertheless have an important effect on F(t), the time dependent cumulative failure probability. This failure probability is usually plotted on lognormal axes to reflect the approximately lognormal nature of the electromigration wearout mechanism, and Fig. 4 shows a series of lognormal plots obtained from our simulations using various topographic defect densities. At relatively high densities (>1E3/cm²) a deviation in the form of a "tail" appears at the lower end of the lognormal curve, which results from the early failure of those devices which contain topographic defects. Such "tails" are evident in some published lifetest results [3,4,7], and on the basis of this simulation we believe that these published results reflect the early failure of devices due to topographic defects.

#### Effect of topographic defects on failure rate

The early failures which cause the "tail" in the cumulative failure probability also result in a corresponding increase in the instantaneous failure rate h(t). This is true even at very low defect densities (of the order of 0.5/cm²), where the deviation from the lognormal curve is not readily apparent. Simulations of instantaneous failure rate for stripes containing topographic defects yield curves which have a "bathtub" shape, identical to that referred to in the literature [9]. Fig. 5 shows a series of such failure rate curves plotted on lognormal axes for different conductor lengths and topographic defect densities. Each curve results from the combined effect of two functions: a decreasing failure rate due to the accelerated wearout of a small fraction of defected devices, and a mound-shaped (lognormal) function due to wearout of the remaining undefected ones. The most seriously defected devices fail during the initial decreasing portion of the bathtub curve, and may be screened out during burn-in testing, while the lowest point on the curve represents the minimum failure rate achievable during the lifetime of the conductor. In addition, the initial decreasing portion of the failure rate curve corresponds to the lower end of the (lognormal) cumulative failure curve where the deviation from the lognormal is observed.
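The two-component picture described above can be illustrated numerically. The following is a minimal sketch only: it assumes a simple mixture of two lognormal populations (a small defected fraction and an undefected remainder) rather than the defect size distribution model used in this paper, and every parameter value and function name is hypothetical, chosen purely to make the shape visible.

```python
import math


def lognormal_pdf(t, t50, sigma):
    """Lognormal probability density with median t50 and shape sigma."""
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(t, t50, sigma):
    """Lognormal cumulative failure probability F(t)."""
    z = (math.log(t) - math.log(t50)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixture_hazard(t, frac_def, t50_def, sigma_def, t50_good, sigma_good):
    """Instantaneous failure rate h(t) = p(t) / (1 - F(t)) for a population
    in which a fraction frac_def of devices is defected."""
    pdf = (frac_def * lognormal_pdf(t, t50_def, sigma_def)
           + (1.0 - frac_def) * lognormal_pdf(t, t50_good, sigma_good))
    cdf = (frac_def * lognormal_cdf(t, t50_def, sigma_def)
           + (1.0 - frac_def) * lognormal_cdf(t, t50_good, sigma_good))
    return pdf / (1.0 - cdf)


# Hypothetical values: 1% defected devices with a short median life,
# 99% undefected devices with a long median life (times in hours).
PARAMS = dict(frac_def=0.01, t50_def=10.0, sigma_def=1.0,
              t50_good=1.0e5, sigma_good=0.5)

for t in (10.0, 1.0e3, 1.0e5):
    print(f"t = {t:8.0f} h   h(t) = {mixture_hazard(t, **PARAMS):.3e} per hour")
```

With these assumed values the failure rate is high at early times, falls to a minimum once the defected devices have failed, and rises again as the undefected population wears out, which is the qualitative bathtub behaviour discussed above.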

#### Implications for interconnect reliability

It is apparent from Fig. 5 that the minimum failure rate (at the lowest point on the curve) increases with both conductor length and topographic defect density. The reliability objective for VLSI circuits is to meet a failure rate criterion of 10 FITs (1 FIT = 1E-9 failures/hour) during the operational lifetime of the circuit [9]. Topographic defects therefore play an important part in determining whether this requirement will be satisfied in respect of the interconnect system of such a circuit. The significance of the method presented here is that it provides a means of determining the relative effect of different types of defects on conductor failure rate, and thus allows more meaningful predictions to be made of interconnect reliability than has previously been possible.

#### 4 Conclusion

- 1. Our analysis shows that the trend of increasing  $T_{50}$  as predicted by experimental results from progressively narrower conductors is eventually limited by topographic defects.
- 2. Topographic defects are responsible for early device failures and hence lifetest measurements which yield  $T_{50}$  and  $\sigma$  do not provide the information necessary to predict failure rates over the entire lifetime of a device. However, our analysis shows that it is possible to predict early failure rates in the presence of defects from a knowledge of the defect distribution and the failure characteristics of defected devices.
- 3. The aim of this work is to develop a method of predicting the reliability of the interconnect system for a VLSI device. The method developed to assess the effect of defects on the failure rate of a single conductor will be combined with that described in an earlier paper [13] to determine the overall failure rate of a complex system.

#### 5 References

- [1] M.J. Attardo, R. Rutledge and R.C. Jack, "Statistical metallurgical model for electromigration failure in aluminum thin-film conductors," J. Appl. Phys., 42, 1971, p. 4343.
- [2] B.N. Agarwala, M.J. Attardo and A.P. Ingraham, "Dependence of electromigration-induced failure time on length and width of aluminum thin-film conductors," J. Appl. Phys., 41, 1970, p. 3954.
- [3] J.R. Lloyd, P.M. Smith and G.S. Prokop, "The role of metal and passivation defects in electromigration-induced damage in thin film conductors," Thin Solid Films, 93, 1982, p. 385.
- [4] P.M. Smith, J.R. Lloyd and G.S. Prokop, "Lot-to-lot variations in electromigration performance for thin film microcircuits," J. Vac. Sci. Technol., A 2 (2), 1984, p. 220.
- [5] R.E. Hummel, R.T. Dehoff and H.J. Geier, "Activation energy for electrotransport in thin aluminum films by resistance measurements," J. Phys. Chem. Solids, 37, 1976, p. 73.
- [6] E. Kinsbron, "A model for the width dependence of electromigration lifetimes in aluminum thin-film stripes," Appl. Phys. Lett., 36, 1980, p. 968.
- [7] S.S. Iyer and C.-Y. Ting, "Electromigration lifetime studies of submicrometer-linewidth Al-Cu conductors," IEEE Transactions on Electron Devices, 31 No. 10, 1984, p. 1468.
- [8] C.H. Stapper, "Modeling of defects in integrated circuit photolithographic patterns," IBM J. Res. Develop., Vol. 28, No. 4, 1984, p. 461.
- [9] M.H. Woods, "MOS VLSI reliability and yield trends," Proc. IEEE, Vol. 74, No. 12, 1986, p. 1715.

- [10] J.D. Venables and R.G. Lye, "A statistical model for electromigration induced failure in thin film conductors," Proc. 10th Ann. Rel. Phys. Symp. IEEE, New York, 1972, p. 159.
- [11] R.A. Sigsbee, "Electromigration and metallization lifetimes," J. Appl. Phys., 44, 1973, p. 2533.
- [12] J.F. McDonald et al, "Yield of wafer-scale interconnections," VLSI Systems Design, December 1986, p. 62.
- [13] D.F. Frost and K.F. Poole, "A method for predicting VLSI-device reliability using series models for failure mechanisms," IEEE Trans. on Reliability, Vol. R-36, No. 2, 1987, p. 234.












# A CAD TOOL FOR THE PREDICTION OF VLSI INTERCONNECT RELIABILITY

by

David Frank Frost

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Electronic Engineering, University of Natal 1988

#### Abstract

This thesis proposes a new approach to the design of reliable VLSI interconnects, based on predictive failure models embedded in a software tool for reliability analysis.

A method for predicting the failure rate of complex integrated circuit interconnects subject to electromigration is presented. This method is based on the principle of fracturing an interconnect pattern into a number of statistically independent conductor segments. Five commonly-occurring segment types are identified: straight runs, steps resulting from a discontinuity in the wafer surface, contact windows, vias and bonding pads. The relationship between the median time-to-failure (Mtf) of each segment and its physical dimensions, temperature and current density is determined. This model includes the effect of time-varying current density. The standard deviation of lifetime is also determined as a function of dimensions. A minimum order statistical method is used to compute the failure rate of the interconnect system. This method, which is applicable to current densities below  $10^6 A/cm^2$ , combines mask layout and simulation data from the design data base with process data to calculate failure rates.

A suite of software tools called Reliant (RELIability Analyzer for iNTerconnects), which implements the algorithms described above, is presented. Reliant fractures a conductor pattern into segments and extracts electrical equivalent circuits for each segment. The equivalent circuits are used in conjunction with a modified version of the SPICE circuit simulator to determine the currents in all segments and to compute reliability. An interface to a data base query system provides the capability to access reliability data interactively. The performance of Reliant is evaluated, based on two CMOS standard cell layouts. Test structures for the calibration of the reliability models are provided.

Reliant is suitable for the analysis of leaf cells containing a few hundred transistors. For MOS VLSI circuits, an alternative approach based on the use of an event-driven switch-level simulator is presented.

# Autobiography

David Frank Frost was born in Cape Town, South Africa on 5 November, 1951. He matriculated from Hottentots-Holland High School, Somerset West in 1969. He received the B.Sc. (Eng.) degree from the University of Stellenbosch in 1974 and the M.Eng. (Electronic) degree from the University of Pretoria in 1979. From 1976 to 1978, he was employed in the Microelectronics Division of the National Electrical Engineering Research Institute, designing analog and digital bipolar integrated circuits. Since 1979, he has held the position of Senior Lecturer in the Department of Electrical & Electronic Engineering at the University of Stellenbosch. He has presented a variety of courses in electronics and IC processing and design, and was responsible for the establishment of processing and CAD laboratories at Stellenbosch. In 1986 he spent a 12 month sabbatical period in the Department of Electrical & Computer Engineering at Clemson University in the USA.

The work described in this thesis was carried out at Stellenbosch and Clemson during the period June 1985 to June 1988.

### Preface

The development of CAD tools for the prediction of VLSI reliability is an emerging field of research. It has developed amid growing concern in the semiconductor industry about the reliability of the increasingly complex integrated circuits (ICs) being produced today. Historically, ICs have always been considered to be components having a high inherent reliability; in fact, the move toward VLSI has been made possible by rapid increases in the reliability of the devices produced. However, reductions in minimum feature size and the use of thinner oxides and shallower diffusions produce increased stresses within ICs, accelerating the process of wearout. The probability of IC failure due to wearout is further enhanced by an exponential growth in the complexity of a single die. The growing importance of Application Specific Integrated Circuits (ASICs), with their limited production volumes, is forcing the industry to re-evaluate costly traditional methods of reliability qualification through testing and burn-in procedures. These factors increase the need for CAD-based methods to predict the reliability of an integrated circuit before manufacture.

This is the first work to propose and implement a Reliability Analysis Software Tool for predicting the failure rate of an IC during the design phase. Several original contributions to knowledge are contained in this work and these are summarized below.

- The application of a system reliability model to the problem of IC interconnect reliability is discussed in chapter 3 and the following points should be noted.
  - 1. The identification of 5 commonly-occurring features of interconnect patterns, called segment types. These are straight runs, steps (caused when a conductor crosses a discontinuity in the wafer surface), contact windows, vias and bonding pads. The interconnect pattern is fractured into a collection of statistically independent conductor segments, each of which may be classified according to the 5 types mentioned above.
  - 2. The development of suitable reliability models for these segments.
  - 3. The use of a minimum order statistical approach to calculate the reliability of the interconnect pattern, when subject to electromigration.
  - 4. The evaluation of interconnect reliability in terms of an actual reliability figure-of-merit (instantaneous failure rate), as opposed to the common practice of considering only the current density in each conductor. The method developed in this thesis is superior to the current-density approach in three respects. Firstly, considering only the current density ignores the complex dependence of median time to failure (Mtf) and standard deviation (σ) on conductor dimensions. Secondly, the effect of circuit complexity is not taken into account. This is of primary importance in VLSI. Finally, the interconnect failure rates obtained here may be easily combined with similar figures for other failure modes. The approach used is therefore consistent with the long-term goal of estimating the reliability of the whole IC during the design phase.
- A suite of software tools called Reliant (RELIability Analyzer for iNTerconnects) has been developed. It fractures the interconnect pattern into segments, extracts the equivalent circuit of each branch and uses a circuit simulator (SPICE) to determine the reliability of all segments. Reliant includes an interface to a data base query system which may be used to access reliability data interactively.
- A method for determining approximate current waveforms in a MOS VLSI circuit using a switch-level simulator is presented. This approach has a considerable speed advantage when compared to the use of a circuit simulator and enables reliability data to be collected concurrently with the process of design verification.

This thesis represents the author's original work and has not been submitted in any form to another University for the purposes of obtaining a degree. Where use has been made of work carried out by others, this has been duly acknowledged in the text.

### Acknowledgements

The author would like to thank the following for their contributions to this work.

- My supervisor, Prof. K.F. Poole, for his constant enthusiasm, support and encouragement.
- Professors J.W. Lathrop and J.W. Harrison of Clemson University for useful discussions on electromigration and reliability.
- Kevin Kemp, for assistance with debugging Reliant and preparing the circuit examples in Chapter 4.
- David Haeussler, for coding part of the LoadQuadTree procedure used in Extrem.
- The University of Stellenbosch, for granting me sabbatical leave for 10½ months during 1986 to work on this project.
- The financial support of the Foundation for Research Development, the Harry Crossley Bursary Fund and Clemson University is also gratefully acknowledged.

## Contents

- 1 INTEGRATED CIRCUIT RELIABILITY
  - 1.1 Introduction
  - 1.2 Failure Mechanisms in Integrated Circuits
    - 1.2.1 Electromigration (EM) in Thin Metal Films
    - 1.2.2 Time-dependent Dielectric Breakdown of Gate Oxides
    - 1.2.3 Threshold-voltage Shifting Effects in MOS Devices
    - 1.2.4 Alpha-particle Induced Soft Errors
    - 1.2.5 Electrostatic Discharge (ESD)
  - 1.3 A Statistical Approach to Design for Reliability
  - 1.4 Integrating Design-for-Reliability with the CAD Environment
  - 1.5 Previous Work
  - 1.6 Summary of this Thesis
- 2 A RELIABILITY MODEL OF AN INTERCONNECT SYSTEM
  - 2.1 Introduction
  - 2.2 A System Model of the Reliability of VLSI Interconnect Patterns
  - 2.3 Characterizing the Reliability of Conductor Segments
  - 2.4 Minimum Order Statistics
  - 2.5 Statistical Models of Electromigration Failure
    - 2.5.1 Distribution of Conductor Lifetimes
    - 2.5.2 Mtf Dependency on DC Current Density
    - 2.5.3 Mtf and Current Pulses
    - 2.5.4 A General Model for Electromigration Caused by a Time-varying Current Density
    - 2.5.5 The Dependency of $t_{50}$ and $\sigma$ on Physical Dimensions
  - 2.6 Summary
  - 2.7 An Example
- 3 RELIANT: A RELIABILITY ANALYZER FOR INTEGRATED CIRCUIT INTERCONNECTS
  - 3.1 Overview
- 4 EVALUATION OF RELIANT
  - 4.1 Calibrating the Reliability Models
- 5 REVIEW
- 6 REFERENCES
- A DERIVATION OF THE FAILURE RATE OF A SERIES-CONNECTED SYSTEM
- B FITTING A LOGNORMAL DISTRIBUTION TO THE MINIMUM ORDER STATISTIC
- C VARIATION OF $t_{50}$ AND $\sigma$ WITH $n$
- D RELIABILITY ANALYSIS OF A 3100 GATE CMOS STANDARD CELL DEVICE
  - D.1 Approximate Models of Contact, Via and Step Segments
  - D.2 Derivation of The Failure Rate of a Power Bus Section
- E A TEST CHIP FOR CALIBRATION OF RELIABILITY MODELS
- F PUBLICATIONS BY THE AUTHOR WHICH RELATE TO THIS THESIS

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | Bathtub curve of instantaneous failure rate vs. time |
| 2.1 | A n-component series-connected system |
| 2.2 | A n-component parallel-connected system |
| 2.3 | Set of conductor segment types used to model reliability |
| 2.4 | $p(t)$ and $P(t)$ for the lognormal distribution $(t_{50} = 1, \sigma = 1)$ |
| 2.5 | $\log(p(t))$ vs. $\log(t)$ for minimum order statistic and lognormal distribution |
| 2.6 | $t_{50}'/t_{50}$ vs. $n$ |
| 2.7 | $\sigma'/\sigma$ vs. $n$ |
| 2.8 | $\delta T$ vs. $J$ (conductor thickness = $0.5\mu m$, dielectric thickness = $1\mu m$) |
| 2.9 | A series of unidirectional, rectangular current pulses |
| 2.10 | $t_{50}$ vs. conductor width $W$ |
| 2.11 | $\sigma$ vs. conductor width $W$ |
| 2.12 | Typical conductor grain structure for different conductor widths |
| 2.13 | Failure rate of interconnects (3100 gate CMOS circuit) |
| 3.1 | Block diagram of Reliant |
| 3.2 | Block diagram of Extrem |
| 3.3 | An example showing various intersections |
| 3.4 | Determination of dominant current flow direction |
| 3.5 | Segmentation and extraction of example in Fig. 3.3 |
| 3.6 | Equivalent circuit for the example in Fig. 3.5 |
| 3.7 | Multiple storage, adaptive quad tree |
| 4.1 | User interaction with Reliant |
| 4.2 | Spice input file geinput.spc |
| 4.3 | A part of Sirprice input file geinput.spr |
| 4.4 | Layout of GEINPUT |
| 4.5 | Failure rate vs. time for GEINPUT |
| 4.6 | Layout of GEBINC |
| 4.7 | Failure rate vs. time for GEBINC |
| 4.8 | Identification of highly-stressed segments using Datatrieve |
| 4.9 | Charge movement in a MOS circuit |
| 4.10 | Current source model of power bus |
| C.1 | $t_{50s}/t_{50}$ vs. $n$ |
| C.2 | $\sigma_s/\sigma$ vs. $n$ |
| D.1 | Layout of $V_{DD}$ and $Ground$ buses |
| E.1 | Layout of test chip |

## List of Tables

| Table | Caption |
|-------|---------|
| 2.1 | 3100 Gate CMOS standard cell circuit |
| 4.1 | Dependencies of $t_{50}$ and $\sigma$ for each Segment Type |
| 4.2 | Summary: Reliant analysis results for GEINV and GEBINC |


# List of Symbols

| Symbol            | Meaning                                                                      |
|-------------------|------------------------------------------------------------------------------|
| $E_a$             | Activation energy $(eV)$                                                     |
| k                 | Boltzmann's constant: $8.62 \times 10^{-5} eV/K$                             |
| T                 | Absolute temperature $(K)$                                                   |
| q                 | Electron charge $(-1.602 \times 10^{-19}C)$                                  |
| $\kappa$          | Thermal conductivity of SiO2: $1.4W/m.K$                                     |
| $\rho$            | Resistivity of Aluminum $(3 \times 10^{-10} \Omega.m)$                       |
| J                 | Current density $(A/cm^2)$                                                   |
| t                 | Time                                                                         |
| p(t)              | Probability density function (Pdf)                                           |
| P(t)              | Failure probability or cumulative distribution function (Cdf)                |
| R(t)              | Reliability                                                                  |
| h(t)              | Instantaneous failure rate or hazard function $(FIT)$                        |
| $t_{50}$          | Median time to failure (Mtf)                                                 |
| $\sigma$          | Standard deviation (Sigma)                                                   |
| $N(x,\mu,\sigma)$ | Normal distribution in $x$ having mean $\mu$ and standard deviation $\sigma$ |
| Lambda            | Minimum feature size of fabrication process                                  |

## Chapter 1

## INTEGRATED CIRCUIT RELIABILITY

### 1.1 Introduction

The reliability of any piece of equipment may be defined as the probability that it will perform its required function under stated conditions for a stated period of time [O'Connor 1981]. Reliability has always been an important consideration for the designer of electronic systems, particularly those systems having military or aerospace applications. A major factor which limits the complexity of all systems is the maximum reliability achievable within the constraints of the available technology. The great technological advances in electronic engineering such as the development of the transistor (1947) and the integrated circuit (1958) resulted in dramatic improvements in the reliability of electronic equipment, along with improved performance and reduced cost. Each of these developments therefore made possible an increase in system complexity, while maintaining acceptable levels of reliability. However, since 1958 the development of IC technology has resembled an exponential function of time, rather than a series of step functions. As evidence of this, we may consider the annual doubling in the number of active devices realizable on a single chip. This growth has largely been due to reductions in minimum feature size, although improved process control has also allowed the maximum chip size to grow to approximately 10mm by 15mm. ICs containing in excess of 100 000 transistors are now common.

Figure 1.1: Bathtub curve of instantaneous failure rate vs. time

This trend has important implications for reliability. Firstly, it will be shown that device scaling results in an increase in electrical stress, with a resultant loss of reliability due to stress-induced wearout mechanisms. Secondly, if the individual devices or interconnects are subject to failure according to some random distribution, an increase in complexity (e.g. number of active devices on the chip) must necessarily increase the probability of failure at any given time. This is a fundamental issue which will be considered in detail in chapter 2 of this thesis, where a model for the relationship between reliability and complexity will be presented. It may be mentioned here in passing that when ICs were first developed, it was widely held that the reliability of an IC was independent of the number of active devices on the die, because they were all manufactured simultaneously in a single manufacturing process. This argument is based on a deterministic view of the IC in which, if one active device is "good", all will be "good". In fact, failure remains a random process and no matter how well controlled the manufacturing process may be, a distribution of times-to-failure for the devices on a single die will always be observed.
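The effect of complexity on failure probability can be illustrated with the textbook series-system relation. The sketch below is not the model developed in chapter 2; it assumes that all devices fail independently and share the same failure probability $p$ at the time of interest, so that the chip fails with probability $1 - (1 - p)^n$. The function name and the numerical values are hypothetical.

```python
import math


def chip_failure_probability(p_device, n_devices):
    """Probability that at least one of n independent devices has failed,
    given that each has failed with probability p_device.

    Computed as 1 - (1 - p)^n, using log1p/expm1 to stay accurate
    when p is very small.
    """
    return -math.expm1(n_devices * math.log1p(-p_device))


# Assumed per-device failure probability of one in a million.
p = 1.0e-6
for n in (1, 1_000, 100_000):
    print(f"n = {n:7d}   P(chip failed) = {chip_failure_probability(p, n):.6f}")
```

Under these assumptions a per-device failure probability that looks negligible in isolation becomes significant once the device count reaches the levels quoted above.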

The reliability of an IC is traditionally pictured in terms of the bathtub curve of instantaneous failure rate versus time (Fig. 1.1). The unit of failure rate is the FIT (1 FIT = 1 failure per $10^9$ hours, or 0.1% per million hours). Three regions may be discerned on this graph. During the infant mortality phase, ICs exhibit a decreasing failure rate. Failures which occur here are the result of gross built-in flaws or defects in devices or interconnects, which fail rapidly when stressed. Only a small percentage of ICs contain such defects and as they are removed by failure, the failure rate of the remaining components decreases toward zero. During the wearout phase, failure of "good" (i.e. defect-free) components through wearout becomes significant and an increase in failure rate is observed. It must be emphasized that the same wearout mechanisms may be responsible for many of the infant mortality failures, with the presence of defects acting as an acceleration factor. There is also a close relationship between these early reliability failures and yield failures caused by defects. For example, a severe photolithographic defect may cause an open circuit in a conductor, which is classified as a yield failure [Stapper 1983], [Maly 1985]. A less severe defect of the same kind may only cause a local narrowing of the conductor, which would eventually lead to a reliability failure when the conductor was stressed.
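The FIT unit defined above lends itself to a short worked conversion. The following sketch uses hypothetical helper names; the expected-failure estimate additionally assumes a constant failure rate over the period and a population large enough that removing failed parts can be neglected.

```python
def fit_to_failures_per_hour(fit):
    """1 FIT = 1 failure per 1e9 device-hours."""
    return fit * 1.0e-9


def fit_to_percent_per_million_hours(fit):
    """Express a FIT value as a percentage of devices failing per 1e6 hours."""
    return fit_to_failures_per_hour(fit) * 1.0e6 * 100.0


def expected_failures(fit, n_devices, hours):
    """Approximate number of failures in a population of n_devices
    operated for the given number of hours at a constant failure rate."""
    return fit_to_failures_per_hour(fit) * n_devices * hours


HOURS_PER_YEAR = 8760  # 365 days of continuous operation

print(fit_to_percent_per_million_hours(1.0))            # percent per million hours for 1 FIT
print(expected_failures(100.0, 10_000, HOURS_PER_YEAR))  # failures per year, 10 000 parts at 100 FIT
```

For example, under these assumptions a population of 10 000 components at 100 FIT would be expected to produce roughly nine failures per year of continuous operation.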

Between the aforementioned two regions on the bathtub curve the failure rate approaches its minimum value, which is the optimum region for operation of the IC. Temperature is a strong acceleration factor in most wearout mechanisms and operation at an elevated temperature may be used to accelerate the passage of the infant mortality phase. Burn-in procedures are commonly used in this way to qualify the reliability of ICs for military and aerospace applications. A typical burn-in procedure consists of operating the component at  $85^{\circ}C$  for 168 hours. Reliability qualification by burn-in can greatly increase the cost of a component, particularly in the case of small production volumes. A failure rate limit of 100 FIT is widely used, and 10 FIT has been proposed as a target for the near future.

### 1.2 Failure Mechanisms in Integrated Circuits

The most common IC failure mechanisms reported in the literature will now be reviewed. The list is not exhaustive and no attempt is made to classify the failure mechanisms in order of importance. Only qualitative descriptions are given here; mathematical models for electromigration are presented in Chapter 2.

### 1.2.1 Electromigration (EM) in Thin Metal Films

When an electron current flows through a metal conductor, it exerts two forces on the metal ions: an electrostatic force resulting from the interaction between the negatively charged electrons and the positively charged ions and a friction force commonly known as the "electron wind" force. In metals, which are good conductors, the electron wind force dominates [d'Heurle 1978-1]. The result of this force is a movement of material in the direction of electron current flow. Electromigration has been observed to occur by lattice diffusion in several bulk metals. However, in the case of thin polycrystalline films of Al and Au, the dominant mechanism for mass transport is migration along grain boundaries. The ratio between grain boundary flux and lattice flux has been estimated at  $10^6$  for Al [d'Heurle 1970].

Grain boundary electromigration may lead to the failure of a conductor when a flux divergence occurs at a point in the material. A local flux divergence may occur at the interface between three adjacent metal grains in a polycrystalline film (the so-called "triple point"). It may also result from variations in grain size or diffusion coefficient. Temperature gradients caused by localized heating also influence the rate and direction of mass transport. The most common failure mode is the formation of a void, which grows in size until its diameter equals the width of the conductor, causing an open circuit. Prior to this catastrophic failure, the increase in resistance caused by the growth of the void may gradually degrade the performance of the circuit. A different failure mode occurs when the flux divergence causes mass accumulation, leading to the formation of a hillock or whisker. This may result in a short circuit to an adjacent conductor, or the rupture of an overlying dielectric.

The process of electromigration is essentially current driven and the operating current density is the primary stress factor. The median time to failure (Mtf) of a conductor is inversely proportional to current density J when J is less than  $10^5 A/cm^2$  and decreases more rapidly for higher current densities. The current densities occurring in modern high-speed VLSI circuits (particularly bipolar) may equal or exceed this value. For example, in a conductor  $2\mu m$  wide and  $0.5\mu m$  thick, a current of only 1mA is sufficient to produce a current density of  $10^5 A/cm^2$ . The downward pressure on design rules can be expected to aggravate the situation [Woods 1984].
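The arithmetic behind the 1mA example can be sketched in a few lines of Python. The function names are illustrative and not part of the thesis, and a uniform current distribution over a rectangular cross-section is assumed.

```python
UM_TO_CM = 1e-4  # one micrometre expressed in centimetres


def current_density(current_a, width_um, thickness_um):
    """Average current density in A/cm^2 for a rectangular cross-section."""
    area_cm2 = (width_um * UM_TO_CM) * (thickness_um * UM_TO_CM)
    return current_a / area_cm2


def min_width_um(current_a, thickness_um, j_max=1e5):
    """Narrowest conductor width (um) that keeps J at or below j_max."""
    width_cm = current_a / (j_max * thickness_um * UM_TO_CM)
    return width_cm / UM_TO_CM


if __name__ == "__main__":
    # The example from the text: 1 mA in a 2 um x 0.5 um conductor.
    print(current_density(1e-3, 2.0, 0.5))
    print(min_width_um(1e-3, 0.5))
```

The second function inverts the same relation, which is how a maximum current density rule translates into a minimum conductor width once the current is known.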

Temperature is an acceleration factor in electromigration. Activation energies of the order of 0.54eV for grain boundary electromigration have been reported by several authors (see [Black 1974], for example). The statistical distribution of time to failure has generally been found to be lognormal. The median time to failure and standard deviation are strongly influenced by the dimensions of the conductor and the median grain size [Agarwala 1970], [Scoggan 1975], [Kinsbron 1980], [Vaidya 1980], [Iyer 1984].

### 1.2.2 Time-dependent Dielectric Breakdown of Gate Oxides

Since the development of the MOS transistor, there has been a steady decline in the thickness of the gate oxide used. In the case of transistors used as logic switches, thinner oxides were sought in order to reduce threshold voltages [Sze 1981]. In dynamic RAMs, thinner oxides compensate for the loss of storage capacitance due to shrinking layout dimensions. Oxide thicknesses of 200Å - 300Å are typical of modern MOS processes, with 100Å - 200Å layers being used in 1 Mbit DRAMs [Baglee 1986-1]. The absolute breakdown field strength of silicon dioxide is approximately 10MV/cm. The field strength in a dielectric 100Å thick, with 5V applied, is 5MV/cm, or 50% of the absolute breakdown value. As the field strength approaches the breakdown value, there is an increase in both the conduction through the oxide due to Fowler-Nordheim tunneling and the incidence of oxide failure. Both these phenomena are associated with the presence of defects in the oxide, such as trapped particles or pinholes. The distribution of time to failure is lognormal, with applied voltage the primary stress factor. Activation energies of 0.6eV to 0.8eV have been reported for an Al-SiO<sub>2</sub> system [Anolick 1980].
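The field-strength figure quoted above follows directly from dividing the applied voltage by the oxide thickness. A minimal Python sketch, with illustrative function names and the 10MV/cm breakdown value taken from the text as a default, is:

```python
ANGSTROM_TO_CM = 1e-8  # one angstrom expressed in centimetres


def oxide_field_mv_per_cm(voltage_v, thickness_angstrom):
    """Uniform electric field across the oxide, in MV/cm."""
    field_v_per_cm = voltage_v / (thickness_angstrom * ANGSTROM_TO_CM)
    return field_v_per_cm / 1e6


def fraction_of_breakdown(voltage_v, thickness_angstrom,
                          breakdown_mv_per_cm=10.0):
    """Field expressed as a fraction of the absolute breakdown field."""
    return oxide_field_mv_per_cm(voltage_v, thickness_angstrom) / breakdown_mv_per_cm


if __name__ == "__main__":
    # The example from the text: 5 V across a 100 angstrom oxide.
    print(oxide_field_mv_per_cm(5.0, 100.0))
    print(fraction_of_breakdown(5.0, 100.0))
```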

### 1.2.3 Threshold-voltage Shifting Effects in MOS Devices

Some physical phenomena may give rise to a shift in the threshold voltage  $V_T$  of a MOS transistor after manufacture. This is not a catastrophic failure, but a sufficiently large shift in  $V_T$  will cause the circuit to malfunction. The time to failure depends on the criterion of failure, i.e. the maximum  $V_T$  shift which will still allow the particular circuit to function correctly. Threshold shifts are caused by mobile ions trapped in the gate oxide [Lycoudes 1980] and hot-carrier injection from the channel into the gate oxide [Eitan 1981], [Takeda 1982], [Sabnis 1986]. The incidence of the former is related to sodium ion contamination during the fabrication process and may be largely eliminated through careful processing. Hot-carrier injection is prevalent in short-channel transistors, where large electric fields occur in the vicinity of the drain. When the electric field exceeds approximately 100kV/cm, the carriers absorb more energy from the field than they are able to lose through scattering and their net energy level increases, relative to the energy levels of the conduction and valence bands. Because these carriers are no longer in thermal equilibrium with the lattice, they are known as "hot carriers". The hot carriers produce more electron-hole pairs through impact ionization. Once a certain energy threshold is reached, this process becomes self-sustaining and avalanche multiplication begins. In an n-channel device, most of the electrons produced by impact ionization are collected by the drain, while the holes flow to the substrate to form the substrate current. The magnitude of the substrate current may be used as a measure of the incidence of hot-carrier formation in transistors. A small percentage of the hot carriers will have sufficient energy to surmount the  $Si-SiO_2$  energy barrier and may be injected into the oxide. Both holes and electrons may enter the oxide in this manner, although the effect of hot electrons has been more widely reported. The energy barrier for electrons is 3eV, as opposed to 4eV for holes, which suggests that most hot-carrier injection is the result of electrons. An injected charge  $\delta Q$  gives rise to a shift in threshold voltage of  $\delta V = \delta Q/C_{ox}$ . The density of fast states at the  $Si\text{-}SiO_2$  interface near the drain is also increased.

When constant-voltage scaling is applied to the device, the magnitude of the electric field increases and with it the likelihood of hot-carrier injection. The time to failure has been found to be a negative exponential function of applied voltage [Sabnis 1986]. Solutions to this problem which have been proposed are the graded drain transistor [Takeda 1982] and the lightly doped drain (LDD) transistor [Ogura 1981]. Both these techniques attempt to reduce the electric field in the vicinity of the drain. Unfortunately, it has been found that structures fabricated in this manner are more susceptible to damage by electrostatic discharge (see paragraph 1.2.5) [Duvvury 1986].

### 1.2.4 Alpha-particle Induced Soft Errors

The phenomenon of soft errors in DRAMs caused by incident  $\alpha$ -particles was first reported in 1979 [May 1979]. These particles, which originate in the metals used in the packaging and interconnect layers, may have high energy levels. When an  $\alpha$ -particle passes through silicon, it produces electron-hole pairs by impact ionization along its path. This series of electron-hole pairs is known as an  $\alpha$ -particle track. The charge generated in this manner may cause a soft error if it is collected by a memory storage cell, or if it reaches one of the bit lines during the time interval when the bit lines are floating. The latter effect may be reduced by minimizing the ratio of bit-line floating time to cycle time. As the size of DRAM storage cells decreases, the critical charge  $Q_{crit}$  needed to change the state of the cell is reduced accordingly. Therefore scaling the memory cell increases its susceptibility to soft errors.

Some techniques for reducing the Soft Error Rate (SER) of DRAM cells by structural modification have been proposed. The so-called HI-C technique [Tasch 1978] uses a shallow implant below the storage node to increase the capacitance and to reduce the storage node depletion volume. The result is a memory cell with an increased critical charge and a reduced ability to collect carriers generated by  $\alpha$ -particles. Sai-Halasz has proposed a blanket buried n-type grid in the p-type substrate of the DRAM, which acts as a collector for the excess electrons [Sai-Halasz 1982]. The use of trench capacitor structures has also been found to reduce the SER [Ishiuchi 1986], [Baglee 1986-2].

### 1.2.5 Electrostatic Discharge (ESD)

Damage to an IC through ESD may occur when a static charge buildup on an external object is allowed to discharge through one of the pins on the integrated circuit. Such static charges are caused by triboelectric generation or electrostatic induction [Moss 1986]. The results of the discharge may include dielectric breakdown, junction breakdown, metallization damage, latchup, soft errors and the creation of latent defects, which fail at some future time. Methods of avoiding ESD damage fall into two categories:

- Avoidance of static charge buildup by improved handling and assembly techniques;
- Design of protection networks for the input and output pins. Typical VLSI input protection networks include a thick-oxide nMOS transistor providing primary protection and a diffused resistor and a field plate diode for secondary protection. For output protection, it is possible to design the output driver transistor to safely withstand 5kV discharges [Duvvury 1986].

The trend toward VLSI has increased ESD susceptibility, due to the use of thinner dielectrics and reduced conductor spacings.

### 1.3 A Statistical Approach to Design for Reliability

It has been shown for several important failure mechanisms that the shrinking of layout design rules has a negative impact on reliability. Razdan and Strojwas define layout design rules as follows [Razdan 1986]:

Layout design rules are constraints placed on the designer by the process capabilities. The rules are set such that chips following the rules will have an acceptable yield and the number of circuits which may be placed on the chip is maximized.

This definition should be extended, by including the requirement that the design rules should also result in circuits having an acceptable reliability.

The requirements for optimization of yield and reliability may be quite different. In developing design rules based on yield considerations, the aim is to determine a set of dimensional constraints (such as minimum size, minimum overlap, minimum spacing) for each mask layer which will guarantee some minimum yield figure, given that the circuit is subject to catastrophic failure caused by shorts, breakdown or parasitic effects. Design rules are determined by considering the effect on circuit operation of disturbances in the fabrication process. These disturbances include local variations such as spot defects and global variations caused by underetching or misalignment. The design rules determined in this way generally have the property that they are absolute, in that they do not depend on the electrical variables (voltage, current) in particular parts of the circuit. They are also global, being valid for all parts of a layout or several layouts using the same technology.

Design rules for reliability, on the other hand, must be designed to limit the electrical stress applied to each part of the circuit. For example, a design rule for electromigration which has been widely used in the past states that the current density in a conductor may not exceed  $10^5 A/cm^2$ . This rule is a relative one, because it requires a priori information about the current in a conductor before the conductor width can be established. It is also a local rule, in that its application will result in different conductor widths being used in different parts of the circuit.

While design-for-reliability rules such as this have proven valuable to designers in the past, they have limitations. Firstly, they do not give any quantitative indication of the reliability of a structure (e.g. a conductor or transistor) designed according to the stated rules. Consequently, the designer is not able to predict the reliability of the structure during the design phase, nor is it possible to combine the reliability figures for all structures in order to predict the overall reliability of the IC. Secondly, these rules do not explicitly indicate the relationship between reliability and chip surface area and so it is not possible to globally optimize the trade-off between chip size and reliability. The downward pressure on design rules may demand that future designs be optimized in this manner. Thirdly, it is often difficult to interpret the rules correctly. For example, it is not clear how the maximum current density rule should be applied to circuits where currents are complex functions of time.

This thesis proposes that design-for-reliability should not be based solely on design rules, but that predictive models for failure mechanisms be used to predict the actual reliability of each device or conductor. The choice of a critical dimension should be based on its actual reliability implications. This may be weighed against other considerations, such as area or circuit performance. If this approach is applied to all the significant failure modes, it will become possible to predict the reliability of the IC as a whole, during the design phase.

This approach offers two significant benefits for the IC manufacturer:

- the ability to produce more optimized designs, and
- the ability to predict product reliability without resorting to costly qualification procedures.

### 1.4 Integrating Design-for-Reliability with the CAD Environment

To be effective, this design-for-reliability strategy must be supported by the CAD environment. Software tools must be developed to provide reliability data to the design data base, which may be used to create new designs, or to modify existing ones. Much of the data required for reliability prediction is already available in the design data base and new tools should make use of this wherever possible.

Two paradigms for CAD tools supporting design-for-reliability are proposed.

- A Reliability Analysis Tool: This tool makes use of the layout description file, simulation input waveforms and process information to compute the reliability of a chip in the presence of various failure mechanisms. Reliability is evaluated in a piecewise manner, with reliability data on specific failure mechanisms or structures stored individually in the data base. It is possible to interrogate this data base interactively.
- Synthesis Tools: Tools which perform the operations of device design, placement, routing and compaction, taking reliability into consideration. These tools are designed to optimize the trade-off between reliability and other design parameters.

The Reliant program described in Chapter 3 is an example of a reliability analysis tool.

### 1.5 Previous Work

When work on this thesis began in June 1985, no reports of software tools for IC reliability prediction existed in the literature. In June 1986, the author and Prof. K.F. Poole submitted a paper for publication in the IEEE Transactions on Reliability, describing methods for the implementation of such a program [Frost 87]. In December 1986, two papers were presented at the International Conference on Computer-aided Design (ICCAD-86) which partially addressed the problem. [Hall 1986] described a program called SPIDER, which determines the current density in interconnects using a modified finite difference approach. [Hohol 1986] described RELIC, a program which computes cumulative wear functions for a number of failure modes. These papers address aspects of the problem, but do not provide a predictive method of assessing the reliability of a VLSI design, in terms of hazard rate, probability density function (Pdf) or cumulative distribution function (Cdf). To the author's knowledge, this work is therefore the first to offer a solution to this problem.

### 1.6 Summary of this Thesis

Mathematical and statistical models for IC interconnect reliability are developed in Chapter 2. System models for reliability are introduced and a method for predicting the reliability of an IC, based on a systems approach, is presented. This model is used to predict the reliability of a VLSI interconnect system which is subject to electromigration. The limitations on the applicability of this model are discussed in detail. Models for the dependence of interconnect reliability parameters on current density, physical dimensions and temperature are developed, based on a critical analysis of the available literature. Methods for determining these parameters are presented.

Chapter 3 is devoted to the design of the Reliant software analysis tool for VLSI interconnects, which is based on the mathematical and statistical models developed in Chapter 2. The structure of this program is described and the choice of algorithms and data structures is explained.

In Chapter 4, the performance of Reliant is evaluated using two VLSI leaf cells. The limitations of reliability prediction tools based on circuit simulators are discussed and an alternative approach to MOS VLSI prediction using a switch-level simulator is proposed.

Chapter 5 summarizes the results achieved and suggests some directions for future research.

### Chapter 2

## A RELIABILITY MODEL OF AN INTERCONNECT SYSTEM

### 2.1 Introduction

In this chapter, a reliability model of a VLSI interconnect system is developed. The interconnect system is subject to wearout due to electromigration induced void formation.

There were several reasons for the choice of this particular reliability problem. Interconnects are becoming an increasingly important factor in VLSI, occupying a large percentage of the die area and limiting the speed performance. It is expected, therefore, that interconnect reliability will have a major influence on overall chip reliability. The large number of papers on electromigration which have appeared since 1970 support this view and also provide a useful source of experimental data. A significant feature of electromigration is that it is not confined to individual active devices. As the interconnect pattern is more closely associated with circuit design than with device design, a method for designing reliable interconnects is of particular value to the designers of circuits and software tools for automated layout.

The method was developed with the following goals in mind.

- It should be "design-sensitive", reflecting the factors which are influenced by the circuit designer, rather than the process engineer.
- The algorithms used should be simple enough to enable complex circuits to be analyzed in a reasonable time.

- Because the electromigration process is not yet fully understood, the best available physical models should be used. Software should be constructed in such a way that better models may be incorporated as they become available.

### 2.2 A System Model of the Reliability of VLSI Interconnect Patterns

The interconnect patterns found in VLSI circuits are highly complex. However, these patterns generally consist of collections of a small number of features, such as straight conductors, contact cuts and vias. These features will be referred to as *segments*. The reliability of each segment depends on the current density in the segment and its dimensions. Assume that

- the reliability of each segment is known;
- the reliability of each segment is independent of all other segments.

The interconnect pattern as a whole may then be viewed as a system of segments, and the reliability thereof may be determined using the theory of system reliability.

Fig. 2.1 shows a block diagram of an *n*-component series-connected system. A system of this type has no redundancy and will fail when any one of its components fails. The reliability of a series-connected system is equal to that of the minimum order statistic of the reliability of its components. The minimum order statistic represents the "weakest" component in the system. In Fig. 2.2, a parallel-connected system having *n* parallel paths is shown. This circuit possesses redundancy, as all *n* components must fail to cause system failure. The reliability of this system is described by the reliability of the maximum order statistic, or "strongest" component. The reliability of complex systems may be determined by the decomposition of parallel and serial paths.
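The series and parallel cases described above can be illustrated with a short Python sketch. It assumes statistically independent components, as the text does, and the function names are illustrative rather than taken from Reliant.

```python
from math import prod


def series_reliability(reliabilities):
    """Series system: survives only if every component survives."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Parallel system: fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


if __name__ == "__main__":
    components = [0.9, 0.9]
    print(series_reliability(components))    # weaker than either component
    print(parallel_reliability(components))  # stronger than either component

    # Decomposition of a mixed system: two parallel paths followed by
    # one component in series.
    print(series_reliability([parallel_reliability([0.9, 0.9]), 0.95]))
```

The last call shows the decomposition idea: a parallel group is first collapsed into a single equivalent reliability, which is then treated as one element of a series chain.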

The pattern of a complete interconnect mask layer may be treated as a series-connected system if and only if the failure of any segment therein will lead to the failure of the IC. This restriction therefore excludes chips having built-in redundancy at the functional level. Furthermore, the interconnect pattern itself may not contain redundant paths. In the remainder of this thesis, non-redundant circuits will be assumed.

In defining a set of segment types, the following factors were considered.

- The set must be comprehensive, i.e. it must include all features commonly occurring in VLSI interconnect patterns.
- It must be sufficiently "fine-grained" to model the simplest features which may occur individually in an interconnect pattern.
- The condition of mutual statistical independence of the segments (i.e. the reliability of any segment is independent of the states of all other segments) must always be satisfied.
- It must be possible to characterize the reliability of each segment type independently by means of test structure measurements.

Figure 2.1: An n-component series-connected system

The following five segment types were chosen, based on these considerations (see Fig. 2.3).

- A straight conductor of length L, width W. This will be referred to as a Run segment.
- A Contact segment containing a contact cut of width W, height H, and a surrounding area of metal. A contact only occurs on the lower metal layer (Metal1) in the case of a multilayer metallization process.
- A Via segment, containing a via cut of width W, height H, and a surrounding area of metal. A via only occurs in a multilayer metallization process, between Metal1 and Metal2 layers.
- A Step segment, which occurs when a metal track runs over a discontinuity in the underlying surface. This segment type was included to model the conductor thinning and stress concentration which may occur in such cases. In the following discussion, a step will only be considered to have occurred when a Metal1 conductor crosses a Polysilicon line at a 90° angle. It is assumed that the intermetal dielectric has sufficiently planarized the surface that steps will not occur on Metal2. Note that the length L of this segment is defined as being in the direction of current flow and the width W in the orthogonal direction.
- A Pad segment, which is used to model the reliability of a bonding pad.

Figure 2.2: An n-component parallel-connected system

Each segment has some dimensional parameters associated with it. The process of fracturing the interconnect pattern into a set of segments consists of identifying instances of the segment primitives and determining the dimensional parameters of each instance. The requirement of statistical independence places two restrictions on the points at which a conductor is fractured.



Figure 2.3: Set of conductor segment types used to model reliability

- The length of a segment must be greater than a minimum value. A void which forms in a grain boundary will generally have a diameter smaller than the median grain size, which is typically 3μm or less. Also, there is usually a hillock of material associated with the formation of a void, which may drift some distance along the conductor. For statistical independence, the minimum segment size must contain the region of void and hillock growth. For the first 10% of failures, it has been found that voids are generally smaller than the grain size and that when a hillock forms, it always does so within 10μm of a void [LaCombe 1986]. It appears reasonable, therefore, to take 3μm as the absolute minimum segment length, with 10μm as a more conservative value.
- The flux divergence must be approximately zero at the interface between two segments and the current flow direction must be perpendicular to the interface. This ensures that the current flow pattern in a segment is not influenced by the patterns in adjacent segments.

A restriction must also be placed on the maximum current density, to avoid thermal interaction. This is discussed in paragraph 2.5.2. The segment dimensions shown in Fig. 2.3 are derived as follows.

Runs: The segment has a length  $L \geq 10\mu m$  and width  $W$ .

Contacts and Vias: These segments consist of a cut of width  $W_c$  and length  $L_c$ . The surrounding metal width is taken as  $2\Lambda$ , where  $\Lambda$  is the minimum feature size of the fabrication process. The overall metal width is therefore  $W = W_c + 4\Lambda$ . The distance from the interface to the edge of the cut is taken as  $W$ , at which point the flux divergence is approximately zero [Horowitz 1983]. The overall metal length is therefore  $L = L_c + W_c + 4\Lambda$ .

Steps: A step has a width  $W$  and a length  $L \geq 3\mu m$ .

Pads: A pad has a length  $L_p$ , which is the length of the bond wire contact area, and a width  $W_p$ .
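The dimension rules above lend themselves to a small data structure. The following Python sketch is an illustration only: the class and function names are hypothetical and do not reflect Reliant's internal representation, and the formulas are copied directly from the Contacts and Vias definition, with the minimum feature size Lambda passed in as `lam`.

```python
from dataclasses import dataclass

MIN_RUN_LENGTH_UM = 10.0   # conservative minimum length for a Run segment
MIN_STEP_LENGTH_UM = 3.0   # absolute minimum length, used for a Step segment


@dataclass
class CutSegment:
    """A Contact or Via segment, described by its cut and the process Lambda."""
    cut_width: float   # W_c, in um
    cut_length: float  # L_c, in um
    lam: float         # Lambda, minimum feature size, in um

    @property
    def metal_width(self):
        # W = W_c + 4*Lambda (a surround of 2*Lambda on each side of the cut)
        return self.cut_width + 4.0 * self.lam

    @property
    def metal_length(self):
        # L = L_c + W_c + 4*Lambda, as stated in the text
        return self.cut_length + self.cut_width + 4.0 * self.lam


def is_valid_run(length_um):
    """True if a straight conductor is long enough to be a Run segment."""
    return length_um >= MIN_RUN_LENGTH_UM


def is_valid_step(length_um):
    """True if a step is long enough to be treated as an independent segment."""
    return length_um >= MIN_STEP_LENGTH_UM


if __name__ == "__main__":
    via = CutSegment(cut_width=2.0, cut_length=2.0, lam=1.0)
    print(via.metal_width, via.metal_length)
    print(is_valid_run(8.0), is_valid_step(8.0))
```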

### 2.3 Characterizing the Reliability of Conductor Segments

The approach developed in the preceding paragraph assumes that the reliability of each segment is known. This data must be acquired by means of tests on a range of test structures. For accurate reliability prediction, the reliability parameters of the manufacturing process must be fully characterized in this manner, on a regular basis.

### 2.4 Minimum Order Statistics

Models for the reliability of the minimum order statistic of a collection of segments will now be presented.

Each segment has a probability density function (Pdf) of lifetime, p(t). The associated probability that the segment has failed, or cumulative distribution function (Cdf), is P(t). Taking the time at which stress is applied to the segment as 0, the Cdf is the integral of the Pdf from time 0 to time t:

$$P(t) = \int_0^t p(\tau)d\tau \tag{2.1}$$

The reliability of the segment R(t), which is the probability that it has not failed by time t, is

$$R(t) = 1 - P(t) \tag{2.2}$$

The hazard function, or instantaneous failure rate h(t) is the rate at which segments would fail, if a large number of segments were evaluated.

$$h(t) = \frac{p(t)}{1 - P(t)} \tag{2.3}$$

Consider an interconnect pattern consisting of n segments, each having a different Pdf  $p_i(t)$  and Cdf  $P_i(t)$ . The Cdf of the interconnect pattern  $P_s(t)$  is given by

$$P_s(t) = 1 - \prod_{i=1}^{n} (1 - P_i(t)) \tag{2.4}$$

For the special case where all segments are identical, this reduces to

$$P_s(t) = 1 - (1 - P(t))^n \tag{2.5}$$

It can also be shown that the instantaneous failure rate  $h_s(t)$  of the interconnect pattern is simply the sum of the failure rates of the individual segments:

$$h_s(t) = \sum_{i=1}^n h_i(t) \tag{2.6}$$

The proof of this theorem which is found in most statistics textbooks uses a Markov chain model, where each element has an exponential lifetime distribution. In fact, this is a general result which is independent of the nature of the distribution used. A proof of this theorem appears in Appendix A.
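The additivity of failure rates can also be checked numerically. The sketch below is not the proof from Appendix A; it simply compares the sum of the segment hazard functions with a hazard computed from the system Cdf of equation 2.4 by numerical differentiation. Lognormal segment lifetimes and the three parameter pairs are arbitrary choices made for illustration.

```python
import math


def lognormal_pdf(t, t50, sigma):
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)


def lognormal_cdf(t, t50, sigma):
    z = (math.log(t) - math.log(t50)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def segment_hazard(t, t50, sigma):
    """h(t) = p(t) / (1 - P(t)) for one segment (equation 2.3)."""
    return lognormal_pdf(t, t50, sigma) / (1.0 - lognormal_cdf(t, t50, sigma))


def system_cdf(t, segments):
    """P_s(t) = 1 - prod(1 - P_i(t)) for a series system (equation 2.4)."""
    survive = 1.0
    for t50, sigma in segments:
        survive *= 1.0 - lognormal_cdf(t, t50, sigma)
    return 1.0 - survive


def system_hazard_numeric(t, segments, rel_step=1e-6):
    """System hazard from a central-difference estimate of dP_s/dt."""
    dt = t * rel_step
    p_s = (system_cdf(t + dt, segments) - system_cdf(t - dt, segments)) / (2.0 * dt)
    return p_s / (1.0 - system_cdf(t, segments))


if __name__ == "__main__":
    # Hypothetical (t50, sigma) pairs for three dissimilar segments.
    segments = [(1.0, 0.5), (2.0, 0.7), (5.0, 1.0)]
    t = 0.8
    summed = sum(segment_hazard(t, t50, s) for t50, s in segments)
    print(summed, system_hazard_numeric(t, segments))
```

The two printed values should agree to within the accuracy of the finite-difference estimate, whatever distributions are substituted for the lognormal.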

As mentioned in Chapter 1, a failure rate of 10FIT has been proposed as a target for the chip as a whole. This extremely low failure rate is attained only when  $P_i(t) \ll 1$  for every segment. When considering only these "early failures", the following approximations may be made:

$$P_s(t) \approx \sum_{i=1}^n P_i(t) \tag{2.7}$$

$$h_i(t) \approx p_i(t) \tag{2.8}$$

$$p_s(t) \approx \sum_{i=1}^n p_i(t) \tag{2.9}$$
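The quality of approximation 2.7 can be gauged with a short Python sketch that compares the exact series-system Cdf of equation 2.4 with the simple sum. The segment count and failure probability used here are arbitrary illustrative values, and the function names are not taken from the thesis.

```python
import math


def exact_system_cdf(segment_cdfs):
    """Equation 2.4, evaluated with log1p/expm1 to stay accurate for tiny P_i."""
    log_survival = sum(math.log1p(-p) for p in segment_cdfs)
    return -math.expm1(log_survival)


def approx_system_cdf(segment_cdfs):
    """Equation 2.7: valid only while every P_i is much smaller than 1."""
    return sum(segment_cdfs)


if __name__ == "__main__":
    # 1000 identical segments, each with a one-in-a-million failure probability.
    early = [1e-6] * 1000
    print(exact_system_cdf(early), approx_system_cdf(early))

    # The same comparison when the segments are no longer in the early regime.
    late = [0.05] * 1000
    print(exact_system_cdf(late), approx_system_cdf(late))
```

In the second case the sum exceeds 1, which is not a valid probability, so the approximation must be restricted to the early-failure regime described in the text.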

The method described here is not limited to conductor failure. It may be applied across several failure mechanisms, provided that the aforementioned assumptions remain valid. Equations 2.4 and 2.6 may be used to predict the Cdf and failure rate of the IC, taking all failure mechanisms into account.

### 2.5 Statistical Models of Electromigration Failure

The choice of distributions for the conductor segments, and the dependency of statistical parameters on various factors will now be considered.

### 2.5.1 Distribution of Conductor Lifetimes

In the following discussion, the lifetime of a conductor segment refers to the time at which an open-circuit occurs. However, the analysis method presented here may also be applied to other lifetime definitions, e.g. the time at which a 10% resistance increase has taken place.

In the literature, the lognormal distribution is used by most authors to model electromigration failure (see [Black 1974], for example). This distribution is generally found to fit experimental lifetest data well. The form of the lognormal Pdf is

$$p(t) = \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(-\frac{1}{2} \left[\frac{\ln t - \ln t_{50}}{\sigma}\right]^2\right) \tag{2.10}$$

The lognormal distribution is characterized by two parameters, the median time to failure (Mtf)  $t_{50}$  and the standard deviation  $\sigma$ . The lognormal distribution is formed by substituting  $x = \ln(t)$  and  $\mu = \ln(t_{50})$  in a normal distribution  $N(x, \mu, \sigma)$ . Note that unlike the normal distribution, the median is not equal to the mean in a lognormal distribution. Also, the parameter  $\sigma$  is the square root of the variance of the prototype normal distribution and not of the lognormal distribution itself. Fig. 2.4 shows the Pdf and Cdf of a lognormal distribution having  $t_{50} = 1$  and  $\sigma = 1$ .
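The substitution  $x = \ln(t)$  makes the lognormal distribution easy to evaluate with a standard normal distribution. The Python sketch below uses `statistics.NormalDist` from the standard library; the function names are illustrative. It shows that the Cdf equals 0.5 at  $t_{50}$ , that the mean exceeds the median (using the standard lognormal result that the mean is  $t_{50}\exp(\sigma^2/2)$ ), and how to find the time by which a given fraction of segments has failed.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit standard deviation


def lognormal_cdf(t, t50, sigma):
    """P(t) for a lognormal lifetime with median t50 and shape sigma."""
    return _STD_NORMAL.cdf((math.log(t) - math.log(t50)) / sigma)


def time_to_fraction_failed(fraction, t50, sigma):
    """Time at which the given fraction of the population has failed."""
    return t50 * math.exp(sigma * _STD_NORMAL.inv_cdf(fraction))


def lognormal_mean(t50, sigma):
    """Mean lifetime; larger than the median whenever sigma > 0."""
    return t50 * math.exp(0.5 * sigma ** 2)


if __name__ == "__main__":
    t50, sigma = 1.0, 1.0  # the parameters used for Fig. 2.4
    print(lognormal_cdf(t50, t50, sigma))
    print(lognormal_mean(t50, sigma))
    print(time_to_fraction_failed(0.001, t50, sigma))
```

The last function is the one most relevant to early failures: for a fixed median, a larger  $\sigma$  moves the 0.1% failure time to much earlier values.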

Figure 2.4: p(t) and P(t) for the lognormal distribution  $(t_{50} = 1, \sigma = 1)$ 

The theoretical basis for the use of the lognormal distribution to model thermally-activated failure mechanisms such as electromigration is now considered. In a polycrystalline metal thin film with an ideally textured grain, the atomic flux along a grain boundary is [d'Heurle 1978-2]:

$$J_b = \frac{N_b \delta D_b Z_b^* q \varepsilon}{dkT} \tag{2.11}$$

where  $N_b$  is the atomic density,  $\delta$  is the effective width of the boundary, d is the average grain size,  $D_b$  is the diffusion coefficient,  $Z_b^*q$  is the effective charge,  $\varepsilon$  is the applied electric field, k is Boltzmann's constant and T is the absolute temperature. The diffusion coefficient follows an Arrhenius law with activation energy  $E_a$ :

$$D_b = D_o \exp\left(-\frac{E_a}{kT}\right) \tag{2.12}$$

Substituting 2.12 in 2.11 and writing  $\varepsilon = \rho J$  where  $\rho$  is the resistivity of the metal and J the electron current density, we obtain

$$J_b = \frac{N_b \delta Z_b^* q \rho J D_o}{dkT} \exp\left(-\frac{E_a}{kT}\right) \tag{2.13}$$

The time to failure  $t_L$  is inversely proportional to the rate of mass transport. Neglecting the absolute temperature term below the line in equation (2.13), the following expression for  $t_L$  is obtained:

$$t_L = \frac{C}{J} \exp\left(\frac{E_a}{kT}\right) \tag{2.14}$$

where C is a constant. The inverse exponential dependence on absolute temperature (Arrhenius relationship) shows the influence of thermal energy level on atomic mobility.
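One practical consequence of equation 2.14 is that the ratio of lifetimes at two temperatures, for the same current density, does not depend on C or J. The Python sketch below computes this temperature acceleration factor. The function name is illustrative, the 0.54eV activation energy is the value quoted in Chapter 1, and the two temperatures are arbitrary examples.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann's constant in eV/K


def acceleration_factor(ea_ev, temp_use_c, temp_stress_c):
    """Ratio t_L(use) / t_L(stress) from equation 2.14 at equal current density."""
    t_use_k = temp_use_c + 273.15
    t_stress_k = temp_stress_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Lifetime at 85 C relative to lifetime at 150 C, with Ea = 0.54 eV.
    print(acceleration_factor(0.54, 85.0, 150.0))
```

A factor greater than 1 means the segment lasts that many times longer at the lower temperature, which is the basis for using elevated temperature in accelerated life tests.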

Equation (2.14) shows that  $t_L$  would be lognormally distributed if the thermal energy levels are normally distributed. Therefore, apart from its usefulness in describing measured data, the use of the lognormal distribution would appear to have a good theoretical basis. However, an inconsistency arises when considering conductors of different lengths. As the lognormal distribution does not have the property of closure, a series-connected system of segments, each having a lognormal lifetime distribution, will have a lifetime distribution which is not lognormal. This poses a theoretical problem when comparing conductors having different lengths. Consider a conductor of length L, which has a lognormal lifetime distribution. If two of these conductors are connected in series, the resulting conductor of length 2L will not have a lognormal lifetime distribution. In practice, the lognormal distribution is used to fit conductor lifetime data regardless of the conductor length.

It has been shown empirically that the minimum order statistic produces an approximately lognormal distribution for early failures (see Appendix B). A lognormal distribution is fitted to the minimum order statistical distribution in the region  $t \ll t_{50}$  by choosing appropriate values of  $t_{50}$  and  $\sigma$ . These fitted parameter values are called  $t_{50}'$  and  $\sigma'$  respectively. Fig. 2.5 shows the result for  $n=10^2$ ,  $10^4$  and  $10^6$  elements. A good correspondence is obtained over 7 decades of p(t), for all values of n. This is consistent with the experimental observations of LaCombe, et al that the early failures are lognormally distributed for various conductor lengths [LaCombe 1986]. Figs. 2.6 and 2.7 show the normalized variation of  $t_{50}'$  and  $\sigma'$  with n, respectively. These graphs may be used to determine the parameters of a single via from the parameters of a via chain test structure.
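The minimum order statistic for n identical, independent elements can be illustrated with a short Python sketch. The function names and the parameter values ($t_{50}=1$, $\sigma=0.5$, $t=0.2$) are illustrative assumptions and are not taken from Appendix B.

```python
import math

def lognormal_cdf(t, t50, sigma):
    """Failure probability of one element with median t50 and shape sigma."""
    z = math.log(t / t50) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def series_cdf(t, t50, sigma, n):
    """Failure probability of n identical, independent elements in series.

    The chain fails when its weakest element fails, so
    P(t) = 1 - (1 - p(t))**n  (the minimum order statistic).
    """
    p = lognormal_cdf(t, t50, sigma)
    if p >= 1.0:
        return 1.0
    # log1p/expm1 keep the result accurate when p is very small
    return -math.expm1(n * math.log1p(-p))

# Illustrative values only: early-failure probability at t = 0.2 * t50
for n in (1, 100, 10_000):
    print(n, series_cdf(0.2, 1.0, 0.5, n))
```

The early-failure probability grows with n, which is why the fitted median $t_{50}'$ of a long chain lies below the median of a single element.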

#### 2.5.2 Mtf Dependency on DC Current Density

The Mtf of the lifetime distribution is described by equation 2.14, with constant C taking a specific value. This constant will be denoted by the symbol G, to indicate its dependence on geometric factors:

$$t_{50} = \frac{G}{J} \exp\left(\frac{E_a}{kT}\right) \tag{2.15}$$



Figure 2.5:  $\log(p(t))$  vs.  $\log(t)$  for minimum order statistic and lognormal distribution

This first-order theoretical expression does not adequately model the dependence of Mtf on current density at all current levels. The following expression, originally due to [Black 1968], has been widely used:

$$t_{50} = \frac{G}{J^n} \exp\left(\frac{E_a}{kT}\right) \tag{2.16}$$

where n is an exponent whose reported values in the literature vary from 1 to 5 [Huntingdon 1961], [Black 1968], [Blair 1970]. The measured values of n show an increase from 1 for  $J \leq 10^5 A/cm^2$  to 5 for  $J \geq 2 \times 10^6 A/cm^2$ . The theoretical expression 2.15 therefore only agrees with measured results for  $J \leq 10^5 A/cm^2$ . In addition to the problem of choosing the correct value of n, variations in the value of  $E_a$  with stress have also been reported [Partridge 1985].
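The sensitivity of an extrapolation to the exponent n can be seen in a small Python sketch of equation (2.16). The test and use conditions and the value $E_a = 0.6$ eV are hypothetical, chosen only to illustrate the calculation.

```python
import math

K_EV = 8.617e-5  # Boltzmann constant in eV/K

def black_t50(J, T, G, n, Ea):
    """Equation (2.16): t50 = G / J**n * exp(Ea / kT), with T in kelvin."""
    return G / J**n * math.exp(Ea / (K_EV * T))

def acceleration_factor(J_use, T_use, J_test, T_test, n, Ea):
    """Ratio t50(use) / t50(test); the geometry factor G cancels."""
    return (black_t50(J_use, T_use, 1.0, n, Ea)
            / black_t50(J_test, T_test, 1.0, n, Ea))

# Hypothetical conditions: test at 2e6 A/cm^2 and 473 K,
# use at 2e5 A/cm^2 and 358 K, assumed Ea = 0.6 eV.
for n in (1, 2, 3):
    af = acceleration_factor(2e5, 358.0, 2e6, 473.0, n, Ea=0.6)
    print(f"n = {n}: acceleration factor = {af:.3g}")
```

With a tenfold difference in current density, each unit increase in n changes the predicted lifetime at use conditions by a factor of ten.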

Figure 2.6:  $t'_{50}/t_{50}$  vs. n

Modelling the effects of stress and temperature on Mtf is extremely important for accurate reliability prediction, because measurements are always performed on test structures at elevated temperatures and current densities. The results of these tests are extrapolated to normal operating conditions using expressions such as 2.16. The use of incorrect values of n and  $E_a$  can lead to highly inaccurate predictions when extrapolating over several hundred degrees of temperature and two or three decades of current density. Recently, McPherson proposed that the variations in n and  $E_a$  may be accounted for by using a generalized Eyring model for electromigration, in which the reaction rate is determined by the stress-dependent free energy of activation [McPherson 1986]. Writing McPherson's result in a slightly different form, we obtain the following expression for Mtf:

$$t_{50} = \frac{G}{J_{eff}} \exp\left(\frac{E_a}{kT}\right) \tag{2.17}$$

where

$$J_{eff} = J_{eff}(J) = \frac{\sinh(\Psi J)}{\Psi} \tag{2.18}$$

and

$$\Psi = \Psi_0 + \frac{\Psi_1}{kT} \tag{2.19}$$

Figure 2.7:  $\sigma'/\sigma$  vs. n

Constants  $\Psi_0$ ,  $\Psi_1$  and  $E_a$  are determined by measurement. The effective current density  $J_{eff}$  is asymptotically equal to the nominal current density J, for  $J \ll 1/\Psi$  ($\approx 5 \times 10^5 A/cm^2$). For J above this value,  $J_{eff}$  increases exponentially. Equation 2.17 provides the best model for Mtf currently available.
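The behaviour of equation (2.18) on either side of $1/\Psi$ can be checked numerically. The following Python sketch takes $\Psi$ as a fixed input equal to the reciprocal of the $5 \times 10^5 A/cm^2$ figure quoted above; the temperature dependence of equation (2.19) is not modelled, and the function name is an assumption.

```python
import math

def j_eff(J, psi):
    """Equation (2.18): effective current density sinh(psi*J)/psi."""
    return math.sinh(psi * J) / psi

PSI = 1.0 / 5e5  # cm^2/A, from 1/Psi ~ 5e5 A/cm^2 quoted in the text

for J in (1e4, 1e5, 5e5, 1e6, 5e6):
    print(f"J = {J:.0e} A/cm^2  ->  J_eff/J = {j_eff(J, PSI) / J:.4g}")
```

The ratio $J_{eff}/J$ stays close to unity at low current density and grows steeply once J exceeds $1/\Psi$.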

The existence of a current density threshold in Aluminum, below which no electromigration takes place, has been proposed by Blech [Blech 1976]. This hypothesis is based on measurements of atom drift velocity using the Blech-Kinsbron edge displacement method [Blech 1975]. The test structure consisted of a stripe of Aluminum deposited on a layer of Titanium Nitride. An electric current will tend to concentrate in the stripe because of its lower resistivity, and the stripe is displaced in the direction of current flow. The rate of displacement is used as an estimate of average ion drift velocity. In testing conductors in this way, Blech found that no displacement occurred at current densities less than  $1.1 \times 10^5 A/cm^2$  for a  $115 \mu m$  long conductor and that this threshold increased with decreasing conductor length. English, et al have repeated this experiment with fine-grained Aluminum and report a threshold which is a factor of 4 lower [English 1983].

Blech explained this effect by suggesting that a compressive stress buildup occurs as the current-carrying portion of the conductor is forced against the non-current carrying portion beyond the anode. The total force applied to the conductor by the electron current is proportional to the product of the current and the conductor length, and this force must exceed the elastic deformation limit of the metal before hillocks form at the anode.

The current density threshold concept has also been used in models of interconnect failure due to void formation [Gardner 1987], [Hohol 1986]. It has not been used here, for the following reasons:

- The conclusion that the overall length of the conductor influences the rate of mass flow, does not correlate with the observation by LaCombe, et al that electromigration failure is a local phenomenon. The Blech-Kinsbron structure will always produce a global movement of material, which will be related to the total force applied to the conductor. Void formation, on the other hand, is determined by the conditions pertaining at a grain boundary triple-point. Therefore the extrapolation of results from the former case to the latter case does not have a sound basis.
- In the case of a conductor whose width is less than the median grain size, the conductor is effectively divided into "bamboo" sections, and the overall conductor length can have no influence on void formation at triple points.

When considering the effect of current density on Mtf of conductor segments, the possibility of thermal interaction between neighboring segments must be considered. As statistical independence is a requirement, the maximum current density must be limited to prevent significant self-heating. Consider a conductor of length L, width W and thickness T on the surface of a Silicon wafer. The insulating  $SiO_2$  layer has a thickness H. The thermal resistance of the dielectric layer is given by the following equation, if fringing effects are ignored:

$$R_{th} = \frac{H}{\kappa LW} \tag{2.20}$$

where  $\kappa$  is the thermal conductivity of  $SiO_2$ . If a current I flows through the conductor, the dissipation is

$$P_d = I^2 R = J^2 \rho L W T \tag{2.21}$$

where  $\rho$  is the resistivity of Aluminum. Assuming an isothermal substrate, the rise in conductor temperature due to self-heating is

$$\delta T = P_d R_{th} = \frac{\rho T H J^2}{\kappa} \tag{2.22}$$



Figure 2.8:  $\delta T$  vs. J (conductor thickness =  $0.5\mu m$ , dielectric thickness =  $1\mu m$ )

The rise in temperature as a function of J is shown in Fig. 2.8 for a conductor  $0.5\mu m$  thick, on a  $1\mu m$  dielectric. The self-heating effect (and hence the possibility of thermal interaction) increases rapidly above  $J=10^6 A/cm^2$ . This places an upper bound on the current density for which accurate results may be expected from the method used.
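Equation (2.22) can be evaluated directly, as in the Python sketch below. The resistivity of the Aluminum film and the thermal conductivity of $SiO_2$ are assumed typical values, not the values used to produce Fig. 2.8.

```python
RHO_AL = 3.0e-6      # ohm*cm, assumed resistivity of an Aluminum film
KAPPA_SIO2 = 0.014   # W/(cm*K), assumed thermal conductivity of SiO2

def self_heating_rise(J, thickness_cm, dielectric_cm,
                      rho=RHO_AL, kappa=KAPPA_SIO2):
    """Equation (2.22): temperature rise in kelvin for J in A/cm^2."""
    return rho * thickness_cm * dielectric_cm * J**2 / kappa

# Geometry of Fig. 2.8: 0.5 um conductor on 1 um of dielectric
for J in (1e5, 1e6, 1e7):
    dT = self_heating_rise(J, 0.5e-4, 1.0e-4)
    print(f"J = {J:.0e} A/cm^2  ->  delta T = {dT:.3g} K")
```

Because the rise is proportional to $J^2$, each decade of current density raises the self-heating by two decades.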

#### 2.5.3 Mtf and Current Pulses

Thus far, the models for Mtf presented have assumed a constant (DC) current density J flowing through a conductor segment. In order to accurately predict reliability of real-world ICs, the models must include the effect of a time-varying current density j(t). The problem of developing a model for a general time-varying current density has not been solved yet, but several workers have investigated unidirectional and bidirectional current pulses [English 1972], [Bobbio 1974], [Kinsbron 1978], [Schoen 1980], [English 1983], [Towner 1983], [Harrison 1988]. These sources were investigated, with the aim of generalizing the results obtained to time-varying waveforms other than perfectly rectangular pulses.

Figure 2.9: A series of unidirectional, rectangular current pulses

Fig. 2.9 shows a series of unidirectional, rectangular current pulses of amplitude  $J_p$  having a repetition frequency f = 1/T and pulse width  $t_1$ . The duty cycle d is

$$d = \frac{t_1}{T} \tag{2.23}$$

where  $0 \le d \le 1$ . Let  $t_{50dc}$  be the lifetime of the conductor when a continuous (DC) current of  $J_p$  is applied and  $t_{50p}$  the lifetime when the current pulse series is applied. A second lifetime  $t_{50p}^o$  may be defined as the total "on" time during lifetime  $t_{50p}$:

$$t_{50p}^o = d\,t_{50p} \tag{2.24}$$

If the DC model for Mtf is applied to the pulse series (which is equivalent to assuming that electromigration is a quasi-static process), then only the total "on" time will influence the lifetime of the conductor. The lifetime in terms of "on" time  $t_{50p}^o$  should therefore be independent of duty cycle and could be determined using equation (2.15). The actual lifetime of the conductor would be inversely proportional to d.

$$t_{50p} = \frac{G}{dJ_{eff}(J_p)} \exp\left(\frac{E_a}{kT}\right) = \frac{t_{50dc}}{d} \tag{2.25}$$

Attempts to verify equation (2.25) experimentally have been reported in the literature. Generally, equations of the following form have been used to fit the results obtained:

$$t_{50p} = \frac{t_{50dc}}{d^n} \tag{2.26}$$

Towner, et al tested Aluminum conductors using a 1kHz square wave with duty cycles of 25%, 50% and 75% and having a peak value of  $2 \times 10^6 A/cm^2$  [Towner 1983]. Their results showed a lifetime enhancement for small values of d, which was modeled empirically by equation (2.26) with n equal to 2.

Miller tested Aluminum conductors at a repetition frequency of 250 kHz, a peak current density of  $4 \times 10^6 A/cm^2$  and duty cycles from 25% to 100% [Miller 1978]. His results are best approximated by n=3.25.

Schoen has attempted to explain Miller's results by proposing the existence of a damage relaxation mechanism, which allows the damage accumulating during the pulse "on" time to be removed during the "off" period [Schoen 1980]. While the possibility of a relaxation mechanism cannot be ruled out, Schoen's analysis assumes total reversibility of the accumulated damage, which is unlikely. Furthermore, in order to match the strong dependence of  $t_{50p}$  on d, he was forced to use an unrealistically small value for the time constant of the relaxation process (3.6ms).

English and Kinsbron have argued that temperature gradients due to self-heating are responsible for the divergent results reported by other workers [English 1983]. They have attempted to avoid this problem by measuring electromigration mass transport at current density levels less than  $10^6 A/cm^2$ , using the Blech-Kinsbron edge displacement technique. Their results are attractive, in that they indicate n=1 as predicted by the DC model. However, this result must be treated with caution because of the indirect method of measurement used.

Little work on the effect of bidirectional current pulses on conductor lifetime has been reported. The lifetime of conductors carrying an alternating current is greatly increased, compared to the DC case [d'Heurle 1971]. This is indicative of a partial reversibility of electromigration damage, when the direction of current flow is reversed. Bobbio, et al have used resistance measurements to demonstrate this effect [Bobbio 1974]. The time constant associated with the reversal was of the order of several hours. However, they did not relate this effect to conductor lifetimes.

From the results summarized above, it is clear that there are large variations in the results obtained by different workers and much of the data is inconclusive. A coherent model for pulse-induced electromigration which is supported by experimental data does not exist at present, although one is currently being developed [Harrison 1988]. As the goal of this thesis is to present a method for prediction of interconnect reliability, it was decided to use the best model currently available, but to structure the algorithms and software tools in such a way that improved models could be included, as they are developed.

#### 2.5.4 A General Model for Electromigration Caused by a Time-varying Current Density

The model used in this thesis is based on the DC model for Mtf. The damage function f(t) is defined as that fraction of the median lifetime of the conductor which has been consumed by damage at time t. From equation (2.17) it follows that

$$f(t) = \frac{t}{t_{50}} = \frac{tJ_{eff}(j(t))}{G} \exp\left(-\frac{E_a}{kT}\right) \tag{2.27}$$

During a time interval  $\delta t$  at time t, the damage function is increased by an amount  $\delta f$ , where

$$\delta f = \frac{J_{eff}(j(t))}{G} \exp\left(-\frac{E_a}{kT}\right) \delta t \tag{2.28}$$

Therefore the total damage to time t is

$$f(t) = \frac{1}{G} \left( \int_0^t J_{eff}(j(\tau)) d\tau \right) \exp\left( -\frac{E_a}{kT} \right) \tag{2.29}$$

From equation (2.27), the Mtf may be written in terms of f(t) as follows:

$$t_{50} = \frac{t}{f(t)} = \frac{G}{\frac{1}{t} \int_0^t J_{eff}(j(\tau)) d\tau} \exp\left(\frac{E_a}{kT}\right) \tag{2.30}$$

Equation (2.30) reduces to equation (2.25), if j(t) is a square wave having peak value  $J_p$  and duty cycle d and the integral is evaluated across one or more cycles of j(t). This provides the most pessimistic estimate of lifetime for this case, as there is no lifetime enhancement for small values of d included in the model. When considering bidirectional current flow, the integral below the line in equation (2.30) is replaced by its absolute value:

$$t_{50} = \frac{t}{f(t)} = \frac{G}{\left|\frac{1}{t} \int_0^t J_{eff}(j(\tau)) d\tau\right|} \exp\left(\frac{E_a}{kT}\right) \tag{2.31}$$



Figure 2.10:  $t_{50}$  vs. conductor width W

This model is probably optimistic for the case of a conductor carrying a symmetrical current waveform, as it predicts an infinite lifetime for such a conductor, independent of the current amplitude.
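The averaging in equations (2.30) and (2.31) can be sketched numerically for a sampled waveform. The Python below is a minimal illustration, assuming uniformly spaced samples covering whole cycles; the values of $\Psi$, G, $E_a$ and T are hypothetical and the function names are not those used in Reliant.

```python
import math

K_EV = 8.617e-5  # Boltzmann constant in eV/K

def j_eff(J, psi):
    """Equation (2.18): effective current density."""
    return math.sinh(psi * J) / psi

def t50_time_varying(samples, psi, G, Ea, T, bidirectional=False):
    """Equations (2.30)/(2.31) for uniformly spaced samples of j(t)."""
    avg = math.fsum(j_eff(j, psi) for j in samples) / len(samples)
    if bidirectional:
        avg = abs(avg)  # equation (2.31)
    if avg == 0.0:
        return math.inf  # no net damage accumulates in this model
    return G / avg * math.exp(Ea / (K_EV * T))

# Hypothetical parameters, for illustration only
PSI, G, EA, TEMP, JP = 2e-6, 1.0, 0.6, 358.0, 1e5

dc = [JP] * 100                   # continuous current
pulsed = [JP] * 25 + [0.0] * 75   # unidirectional pulses, d = 0.25
ac = [JP] * 50 + [-JP] * 50       # symmetrical bidirectional waveform

t_dc = t50_time_varying(dc, PSI, G, EA, TEMP)
print(t50_time_varying(pulsed, PSI, G, EA, TEMP) / t_dc)
print(t50_time_varying(ac, PSI, G, EA, TEMP, bidirectional=True))
```

For the pulsed waveform the lifetime ratio equals 1/d, in agreement with equation (2.25), while the symmetrical waveform gives a vanishing average and hence an unbounded predicted lifetime.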

#### 2.5.5 The Dependency of $t_{50}$ and $\sigma$ on Physical Dimensions

Finally, the effect of the physical dimensions on  $t_{50}$  and  $\sigma$  of conductor segments will be considered.

#### Run Segments

The relationship between conductor width,  $t_{50}$  and  $\sigma$  has been widely studied for the case of straight conductors. Figs. 2.10 and 2.11 show  $t_{50}$  and  $\sigma$  as functions of conductor width W, determined experimentally by [Kinsbron 1980]. Similar results have been reported by other workers [Scoggan 1975], [Vaidya 1980], [Iyer 1984].  $t_{50}$  shows a linear decrease with decreasing W, until W is approximately equal to the median grain size  $(2 - 5\mu m)$ . For narrower conductors, the lifetime is improved.  $\sigma$  increases monotonically with decreasing W, but the rate of increase is much greater when W is less than the median grain size.

Figure 2.11:  $\sigma$  vs. conductor width W

The form of these functions may be explained in terms of the number of grain boundary triple points which contribute to void formation across the width of a conductor (see Fig. 2.12). When the conductor is several grain sizes wide, the formation of an open circuit requires that a sufficient number of triple points be aligned across the width of the conductor. As the width increases, the probability of this occurring is reduced and the lifetime is enhanced. Also, the wider the conductor, the smaller the variation in lifetime will be between different segments, implying a small value of  $\sigma$ . If W is less than the median grain size, the conductor acquires a "bamboo"-like grain structure, in which the migration of metal is blocked at intervals by grain boundaries perpendicular to the direction of current flow. The narrower the conductor, the smaller the probability of including a triple point within a segment and so  $t_{50}$  increases rapidly with decreasing width.  $\sigma$  increases at small widths because of the greater variation in lifetime between segments which contain triple points and those which do not. A Monte Carlo simulation of void growth at triple points has provided good qualitative agreement with this model [Nikawa 1981]. A contributing factor to the increase in  $\sigma$  at submicron linewidths is the presence of local width reductions due to photolithographic defects.

Figure 2.12: Typical conductor grain structure for different conductor widths

Empirical equations may be used to model the dependencies of the geometry factor G and  $\sigma$  on W. Expressions of the following form provide a good fit with measured data in the literature [Frost 1987]:

$$G = G(W) = C_1 W + C_2 + \frac{C_3}{W^n} \tag{2.32}$$

$$\sigma = \sigma(W) = \frac{C_4}{W^m} + C_5 \tag{2.33}$$

where  $C_1$ ,  $C_2$ ,  $C_3$ ,  $C_4$ ,  $C_5$ , n and m are constants.

The effect of length on  $t_{50}$  and  $\sigma$  has also been widely studied. Conductor length is modeled implicitly in the minimum order statistical method used here. A conductor of length L is modeled as n unit-length conductors in series, where the unit length is  $L_E = 10 \mu m$:

$$n = \frac{L}{L_E} \tag{2.34}$$

The variations in  $t_{50}$  and  $\sigma$  with n are derived in Appendix C. The variations in effective Mtf  $(t'_{50})$  and Sigma  $(\sigma')$  with n were discussed in Section 2.5.1.

#### Contact, Via, Step and Pad Segments

Little information on the reliability of these structures is available in the literature. Prokop and Joseph studied the geometry-dependence of Al-Si contact reliability [Prokop 1972], but their results have not been verified by other workers.

### 2.6 Summary

A method of analyzing the reliability of an interconnect pattern has been presented. This may be summarized as follows.

- The interconnect pattern is fractured into a collection of segments: Runs, Contacts, Vias, Steps and Pads.
- The Mtf of each segment is determined using equation 2.31. The geometry factor G is determined by the segment type and its dimensions.
- Sigma is determined by the segment type and its dimensions.
- A lognormal distribution is assumed for each segment.
- The reliability of the interconnect system is determined as the minimum order statistic of the reliability of all the segments.
- The method is limited to current densities less than  $10^6 A/cm^2$ .
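The final step of the summary can be illustrated with a simplified Python sketch. It relies on the standard result that the hazard rates of independent series elements add, and expresses the total in FIT (failures per $10^9$ hours). The segment parameters in the example are hypothetical, and the sketch omits the fitted parameters $t_{50}'$ and $\sigma'$ used in the thesis.

```python
import math

def lognormal_hazard(t, t50, sigma):
    """Instantaneous failure rate f(t) / (1 - F(t)) of a lognormal lifetime."""
    z = math.log(t / t50) / sigma
    pdf = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))
    return pdf / survival

def system_failure_rate_fit(t_hours, segments):
    """Failure rate in FIT (failures per 1e9 hours) of a series system.

    segments is an iterable of (t50_hours, sigma, count) tuples, one per
    segment class. For independent segments the hazard rates add.
    """
    hazard = sum(count * lognormal_hazard(t_hours, t50, sigma)
                 for t50, sigma, count in segments)
    return hazard * 1e9

# Hypothetical segment classes: (t50 in hours, sigma, number of segments)
example = [(1e8, 0.5, 1000), (5e7, 0.7, 200)]
print(system_failure_rate_fit(10 * 8760.0, example))  # after 10 years
```

Listing one tuple per segment class makes it straightforward to see which class dominates the total, which is the kind of breakdown reported in the example of Section 2.7.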

### 2.7 An Example

An approximate analysis was performed on the interconnect mask of an IC, of which some conductor characteristics were measured. The circuit was a 3100-gate CMOS standard cell design, with double layer metallization and a minimum feature size of  $2\mu m$ . Some parameters of the interconnect layers are summarized in Table 2.1. A constant current of  $3\mu A$  was assumed for all logic interconnects, and the current in the Ground and  $V_{DD}$  buses calculated accordingly. The geometry factor for Runs was based on the data of [Kinsbron 1980]. Simple geometrical models were used to determine geometry factors for other segment types (see Appendix D).

| Layer   | Min. Width $(\mu m)$ | Steps | Vias/Contacts  | Thickness (Å) | Total Length $(\mu m)$ |
|---------|----------------------|-------|----------------|---------------|------------------------|
| Metal 1 | 4                    | 55296 | 36018 contacts | 6000          | 367800                 |
| Metal 1 | 3                    | 0     | 7884 vias      | 6000          | 656640                 |
| Metal 2 | 5                    | 62137 | 16935 vias     | 8000          | 1107500                |

Table 2.1: 3100 Gate CMOS standard cell circuit

The interconnect failure rate is shown in Fig. 2.13, as a function of time. The interconnect pattern is seen to be highly reliable: a failure rate of 10 FIT is not exceeded for more than 30 years. A breakdown of the failure rate data shows that the power and ground buses contributed roughly 60% of the total, followed by the contact windows with 30%. Runs in logic interconnects contribute less than 2% of the total, despite the fact that they comprise 84% of the total conductor length.



Figure 2.13: Failure rate of interconnects (3100 gate CMOS circuit)

## Chapter 3

## RELIANT: A RELIABILITY ANALYZER FOR INTEGRATED CIRCUIT INTERCONNECTS

### 3.1 Overview

In this chapter, the design of the Reliant suite of software tools is described. Predicting the reliability of an interconnect pattern requires a data base consisting of the following information:

- the mask layout,
- current waveforms for each conductor,
- process reliability data,
- an environmental specification.

A high priority when designing Reliant was to develop a tool which could easily be integrated into a typical CAD environment. This is important because reliability prediction tools are not currently used by circuit designers, and they will not be easily accepted if they are cumbersome and difficult to use. Accordingly, the following principles were laid down.

- When Reliant makes use of data already available in the design environment, it should be able to access this data in its existing form, without additional user input.
- The time required to compute reliability must not significantly increase the overall design time. Where possible, reliability calculation must take place concurrently with other forms of design verification.

Figure 3.1: Block diagram of Reliant

Fig. 3.1 shows a block diagram of Reliant. It is assumed that the circuit designer has created a layout of the IC in Caltech Intermediate Form (CIF) [Mead 1981]. This file is called designname.cif, where designname represents the name of the design to be analyzed. It is also assumed that the designer has written a SPICE input file with which the design may be simulated. This file (designname.spc) must include device models, analysis specifications and definitions of all external components which will be connected to the circuit. A description of the file format may be found in [Vladimirescu 1981].

Reliant consists of three modules: Extrem, Combine and Sirprice. Extrem fractures the interconnect pattern into segments and stores the segment properties in a database file (designname.db1). It simultaneously extracts an equivalent circuit from the layout, modelling the parasitic series resistances and capacitances of all conductive layers. A partial transistor extraction is also performed. Currently, this module has been implemented for nMOS and CMOS technologies. A SPICE-compatible netlist description of the extracted circuit is saved in designname.ext. This file contains a resistor for every conductor segment identified by Extrem.

In order to simulate the extracted circuit, the external components, models and analysis specifications must be added. The Combine module uses the definitions in *designname*.spc, together with the extracted netlist, to create an augmented netlist called *designname*.spr.

Sirprice simulates circuit behavior and stores the current waveform for each segment. This information is used together with the segment database to calculate the failure rate of each segment. Sirprice produces several output data files. The file designname.dat is the normal SPICE output file, and designname.lis is a print file listing the failure rate of each segment type, for the first 20 years. File designname.db2 is a reliability database which provides an interface to future postprocessors, and segments.dat is an interface to a commercial database package (VAX Datatrieve) which is used to examine the database interactively.

Reliant was developed in a VAX/VMS environment. It consists of approximately 8000 lines of VAX Pascal, plus some modifications to the Fortran 77 source code of Spice 2G.5.

### 3.2 Extrem: A Circuit Extractor for Electromigration Modelling

#### 3.2.1 Overview

Extrem has two functions. Firstly, it fractures the interconnect pattern of each metal layer into a set of segments and determines the type and dimensions of each segment occurrence. Secondly, it creates a netlist from which the circuit may be simulated and the current in each segment determined. Extrem processes the layout in three phases. During the first phase, the CIF layout file is parsed. Wires and polygons are reduced to collections of boxes and the hierarchy is flattened. In the second phase, the intersections of objects on the various mask layers with one another are determined. In the third phase, the segments and equivalent circuit are determined from the list of intersections for each box. Segmentation of a layout  $\Omega$  may be represented by the following four mappings:

$$\Omega(H) \to \Omega(C) \tag{3.1}$$

$$\Omega(C) \to \Omega(I) \tag{3.2}$$

$$\Omega(I) \to \Omega(S) \tag{3.3}$$

$$\Omega(I) \to \Omega(T, R, C) \tag{3.4}$$
A general set of hierarchically organized CIF objects  $\Omega(H)$  is mapped onto a set of fully instantiated Boxes and Labels,  $\Omega(C)$ .  $\Omega(C)$  is mapped onto a set of intersections  $\Omega(I)$ , which in turn is mapped onto a set of segment instances  $\Omega(S)$  in a data base. The intersections are also mapped onto a set of transistors, parasitic resistances and capacitances  $\Omega(T,R,C)$ , which forms the extracted circuit. Each segment in the data base is linked to a resistor in the extracted circuit, by means of a unique ID number.

#### 3.2.2 Algorithms for Segmentation and Circuit Extraction

Fig. 3.2 shows a block diagram of Extrem. The CIF Parser performs the mapping in equation (3.1). All CIF commands and some common user extensions are parsed. Cell calls and transformations are fully instantiated. Polygons, boxes and wires are checked for non-Manhattan line segments, and error messages indicating the coordinates of such segments are generated. Only Manhattan objects are processed. Polygons and wires are fractured into sets of boxes before processing. Round flashes are ignored.

The segmentation algorithm will now be described in more detail. In the following analysis, it is assumed that the parser has

- fully instantiated all symbol calls;
- replaced Polygons and Wires with collections of Boxes;
- removed any non-Manhattan geometry.

These assumptions limit the set of objects which must be processed to rectilinear Boxes and Labels. An IC layout may now be described as a set of objects

$$C = \{c_j(x_{min}, y_{min}, x_{max}, y_{max}, \Theta, \lambda) \mid j = 1, 2, 3, \ldots, n_s\} \tag{3.5}$$

where  $(x_{min}, y_{min})$  and  $(x_{max}, y_{max})$  are the corner coordinates of the object,  $\Theta$  is a text string and  $\lambda$  is the mask layer. If an object  $c_j$  is a Box,  $\Theta$  will be a null string. If  $c_j$  is a Label,  $(x_{min}, y_{min}) = (x_{max}, y_{max})$ .

Figure 3.2: Block diagram of Extrem

Each mask layer  $\lambda$  has the property of Type, which defines its function in the fabrication process. The set of possible values of Type depends on the technology used. For example, a p-Well CMOS process with two metal layers would be defined as follows: If  $T_l$  represents the Type of the l-th mask layer, then

$$T_l \in \{Metal1, Metal2, Poly, Diffusion, Cuts, Glass, Vias, Pwell, Pplus, BuriedContact, Labels\} \tag{3.6}$$

For an nMOS depletion load process with double layer metal,

$$T_l \in \{Metal1, Metal2, Poly, Diffusion, Cuts, Glass, Vias, Implant, BuriedContact, Labels\} \tag{3.7}$$

The set of layer types having the property of electrical conduction is a subset of the full set of layer types. For the MOS processes described above, the conductive layers are *Metal1*, *Metal2*, *Diffusion* and *Poly*.

The Labels layer is reserved for Label objects, which are used to define the bonding pads of an IC. The other layers contain only Boxes.

The electrical characteristics of an IC are determined by the relative positions of the mask objects. Two objects intersect if they have at least one point in common in two-dimensional Cartesian space. The identification and classification of intersections forms the basis of the segmentation algorithm. If two Manhattan rectangles intersect, the area of intersection is also a Manhattan rectangle. The following algorithm creates a record of every intersection of an object on layer i with an object on layer j.

#### Algorithm 3.1: Find All Intersections

For each object on layer i Do

For each object on layer j Do

If  $(x_{min}^i \leq x_{max}^j)$  And  $(x_{max}^i \geq x_{min}^j)$  And  $(y_{min}^i \leq y_{max}^j)$  And  $(y_{max}^i \geq y_{min}^j)$  Then

Begin

Determine intersection type;

Create intersection record with  $x_{min} = max(x_{min}^i, x_{min}^j)$ ,  $y_{min} = max(y_{min}^i, y_{min}^j)$ ,  $x_{max} = min(x_{max}^i, x_{max}^j)$ ,  $y_{max} = min(y_{max}^i, y_{max}^j)$ ;

End;
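Algorithm 3.1 can be sketched in Python as follows. This is a minimal illustration and not the VAX Pascal code of Extrem; the `Box` record and the `find_intersections` name are assumptions, and the classification step is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Box:
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    layer: str

def find_intersections(layer_i: Iterable[Box], layer_j: Iterable[Box]) -> List[Box]:
    """Return one record per intersecting pair, as in Algorithm 3.1.

    Boxes that merely touch along an edge are reported too; their
    intersection record is a degenerate rectangle of zero width or height.
    """
    boxes_j = list(layer_j)
    records = []
    for a in layer_i:
        for b in boxes_j:
            if (a.xmin <= b.xmax and a.xmax >= b.xmin and
                    a.ymin <= b.ymax and a.ymax >= b.ymin):
                records.append(Box(max(a.xmin, b.xmin), max(a.ymin, b.ymin),
                                   min(a.xmax, b.xmax), min(a.ymax, b.ymax),
                                   f"{a.layer}&{b.layer}"))
    return records

metal = [Box(0, 0, 10, 4, "Metal1")]
cuts = [Box(2, 1, 4, 3, "Cuts"), Box(20, 20, 22, 22, "Cuts")]
print(find_intersections(metal, cuts))
```

The nested loops compare every pair of objects, so the cost grows with the product of the object counts on the two layers.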

The intersections are classified according to the types of the layers containing the objects. For nMOS and CMOS technologies, the following intersection types are defined.

- Def. 3.1: If  $T_i = T_j$  and  $T_i$ ,  $T_j \in \{conductive layers\}$  then the intersection is an Abutment. Note that overlapping objects on the same layer have been removed by preprocessing. Therefore intersecting objects on the same layer can only abut along a single edge.
- Def. 3.2: If  $(T_i = Cuts \text{ and } T_j = Diffusion)$  or  $(T_i = Diffusion \text{ and } T_j = Cuts)$  then the intersection type is ContactDiff.
- Def. 3.3: If  $(T_i = Cuts \text{ and } T_j = Poly)$  or  $(T_i = Poly \text{ and } T_j = Cuts)$  then the intersection type is ContactPoly.

- Def. 3.4: If  $(T_i = Cuts \text{ and } T_j = Metal1)$  or  $(T_i = Metal1 \text{ and } T_j = Cuts)$  then the intersection type is ContactM1.
- Def. 3.5: If  $(T_i = Vias \text{ and } T_j = Metal1)$  or  $(T_i = Metal1 \text{ and } T_j = Vias)$  then the intersection type is ViaM1.
- Def. 3.6: If  $(T_i = Vias \text{ and } T_j = Metal2)$  or  $(T_i = Metal2 \text{ and } T_j = Vias)$  then the intersection type is ViaM2.
- Def. 3.7: If  $(T_i = Poly \text{ and } T_j = Diffusion)$  or  $(T_i = Diffusion \text{ and } T_j = Poly)$  and the intersection is enclosed by objects on the Pplus layer, then the intersection type is Pchannel.
- Def. 3.8: If  $(T_i = Poly \text{ and } T_j = Diffusion)$  or  $(T_i = Diffusion \text{ and } T_j = Poly)$  and the intersection is not enclosed by objects on the Pplus layer, then the intersection type is Nchannel.
- Def. 3.9: If  $(T_i = Poly \text{ and } T_j = Diffusion)$  or  $(T_i = Diffusion \text{ and } T_j = Poly)$  and the intersection is enclosed by objects on the *Implant* layer, then the intersection type is Ndepletion.
- Def. 3.10: If  $(T_i = Metal1 \text{ and } T_j = Labels)$  or  $(T_i = Metal2 \text{ and } T_j = Labels)$  or  $(T_i = Labels \text{ and } T_j = Metal1)$  or  $(T_i = Labels \text{ and } T_j = Metal2)$ , then the intersection type is Pin.
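Definitions 3.1 to 3.10 amount to a lookup on the unordered pair of layer types, with extra enclosure tests for transistor channels. The Python sketch below is one possible encoding; the function name, the boolean enclosure flags, and the choice to test *Implant* before *Pplus* are assumptions made for illustration.

```python
CONDUCTIVE = {"Metal1", "Metal2", "Poly", "Diffusion"}

# Defs. 3.2 to 3.6 and 3.10, keyed on the unordered pair of layer types
PAIR_TYPES = {
    frozenset({"Cuts", "Diffusion"}): "ContactDiff",
    frozenset({"Cuts", "Poly"}): "ContactPoly",
    frozenset({"Cuts", "Metal1"}): "ContactM1",
    frozenset({"Vias", "Metal1"}): "ViaM1",
    frozenset({"Vias", "Metal2"}): "ViaM2",
    frozenset({"Labels", "Metal1"}): "Pin",
    frozenset({"Labels", "Metal2"}): "Pin",
}

def classify(t_i, t_j, in_pplus=False, in_implant=False):
    """Return the intersection type for layer types t_i and t_j, or None."""
    if t_i == t_j:
        # Def. 3.1: same conductive layer
        return "Abutment" if t_i in CONDUCTIVE else None
    pair = frozenset({t_i, t_j})
    if pair == frozenset({"Poly", "Diffusion"}):
        # Defs. 3.7 to 3.9; testing Implant first is an assumption
        if in_implant:
            return "Ndepletion"
        return "Pchannel" if in_pplus else "Nchannel"
    return PAIR_TYPES.get(pair)

print(classify("Metal1", "Cuts"))
print(classify("Poly", "Diffusion", in_pplus=True))
```

Using a `frozenset` as the key makes the lookup symmetric in i and j, which mirrors the paired conditions in each definition.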

The identification and classification of intersections provides the basis for determining the current flow patterns in the conductive layers. This is illustrated by the example in Fig. 3.3. If we consider a single box on a conductive layer, the intersections represent areas where the electrical variables are modified by interaction with the variables of other objects. For example, at an Abutment, current may enter or leave the box horizontally. At a ViaM1, ViaM2, ContactM1, ContactPoly or ContactDiff there is a vertical current flow between the box and a different conductive layer. At a Pchannel, Nchannel or Ndepletion intersection, the horizontal current flow in a Diffusion box is modulated by the voltage on the intersecting Poly. The Pin intersection represents a bonding wire connecting to the external environment. When an intersection is identified, a node number is assigned to it. A copy of the intersection record, including the node number, is attached to both objects.

Figure 3.3: An example showing various intersections

Before the segments of a conducting layer can be determined, the dominant current flow direction in each object must be established. As can be seen from the example, current flow is not always parallel to the x and y axes, even when the layout is limited to Manhattan geometry. Furthermore, when there are more than two intersections the current flow pattern depends on the relative magnitudes of the current components at the intersections. Assuming that these boundary values are known, an accurate numerical solution is possible by solving Laplace's equation in two dimensions [Horowitz 1983], [Barke 1985]. This approach was rejected for the following reasons:

- it is computationally intensive;
- a priori information about the boundary values is not available;
- it is only necessary to determine the dominant current flow direction, in order to replace the object with a collection of segments. The current flow direction at every point in the object is not required.



Figure 3.4: Determination of dominant current flow direction

The segments defined in Chapter 2 are Manhattan structures, symmetrical about the direction of current flow. Therefore the axis direction (x or y) which most closely approximates the real current flow direction in each part of the object must be found. A probabilistic method of determining current flow direction may be derived by considering the structures in Fig. 3.4. If there are n intersections, the total number of potential current paths between two intersections is

$$C_2^n = \frac{n!}{2!(n-2)!} \tag{3.8}$$

When the intersections of an object are widely scattered in the x-direction but not in the y-direction, these potential current vectors lie close to a line parallel to the x-axis, described by the equation

$$y = y_{avg} = \frac{1}{n} \sum_{i=1}^{n} y_i \tag{3.9}$$

Similarly, if the scatter is greatest in the y-direction, the potential current flow vectors are approximated by the line

$$x = x_{avg} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{3.10}$$

The variances of the x and y coordinates of the centers of the intersections provide a measure of the scatter in each direction.

Algorithm 3.2: Finding the Dominant Direction of Current Flow Within an Object

- 1. Determine  $x_{avg}$  and  $y_{avg}$  using equations 3.9 and 3.10.
- 2. Determine Var(x) and Var(y) as follows:

$$Var(x) = \sum_{i=1}^{n} (x_i - x_{avg})^2 \tag{3.11}$$

$$Var(y) = \sum_{i=1}^{n} (y_i - y_{avg})^2 \tag{3.12}$$

- 3. If Var(x) > Var(y) then dominant direction = x, else dominant direction = y.
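The steps of Algorithm 3.2 can be sketched in a few lines of Python. The function name and the representation of intersection centers as `(x, y)` tuples are assumptions made for illustration. The comparison selects the axis along which the centers are most widely scattered, in line with the discussion leading to Equations 3.9 and 3.10, and a tie is resolved in favor of y.

```python
from math import comb


def dominant_direction(centers):
    """centers: list of (x, y) intersection centers; returns 'x' or 'y'."""
    n = len(centers)
    x_avg = sum(x for x, _ in centers) / n              # Eq. 3.10
    y_avg = sum(y for _, y in centers) / n              # Eq. 3.9
    var_x = sum((x - x_avg) ** 2 for x, _ in centers)   # Eq. 3.11
    var_y = sum((y - y_avg) ** 2 for _, y in centers)   # Eq. 3.12
    return "x" if var_x > var_y else "y"


# Three intersections spread along x, with almost no scatter in y.
centers = [(0, 10), (40, 12), (90, 9)]
potential_paths = comb(len(centers), 2)                 # Eq. 3.8
direction = dominant_direction(centers)
```

Because only the relative size of the two sums matters, the missing 1/n normalization in Equations 3.11 and 3.12 has no effect on the result.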

Before proceeding with the segmentation of the current object, the possibility of steps in the *Metal1* layer must be investigated (it is necessary to determine current flow directions in the *Metal1* layer before steps can be identified).

Def. 3.11: A Step intersection occurs when the edge of a Poly object crosses a Metal1 object at right angles to the direction of current flow in the Metal1 object.

Steps are detected as follows. The *Metal1* and *Poly* layers are checked for intersections, using Algorithm 3.1. The edges of the overlap region are compared with the width of the *Metal1* object. If an edge crosses more than 75% of the conductor width, an intersection record is generated. A single *Metal1* conductor crossing a *Poly* stripe will contain two *Step* intersections.
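One possible reading of the 75% rule is sketched below in Python. The function name, the rectangle tuples `(xmin, ymin, xmax, ymax)` and the interpretation of "an edge crosses the conductor" as the extent of the overlap region measured across the conductor are assumptions of this sketch, not a description of the Extrem source code.

```python
def find_steps(metal, poly, direction, threshold=0.75):
    """Return the positions, along the current axis, of Step intersections.

    metal, poly: rectangles (xmin, ymin, xmax, ymax);
    direction: dominant current direction in the metal box, 'x' or 'y'.
    """
    ox0, oy0 = max(metal[0], poly[0]), max(metal[1], poly[1])
    ox1, oy1 = min(metal[2], poly[2]), min(metal[3], poly[3])
    if ox0 >= ox1 or oy0 >= oy1:
        return []                                   # the two boxes do not overlap
    if direction == "x":
        width = metal[3] - metal[1]                 # conductor width is measured in y
        crossing = oy1 - oy0                        # length of the Poly edge over the metal
        edges = [e for e in (poly[0], poly[2]) if metal[0] < e < metal[2]]
    else:
        width = metal[2] - metal[0]
        crossing = ox1 - ox0
        edges = [e for e in (poly[1], poly[3]) if metal[1] < e < metal[3]]
    return edges if crossing > threshold * width else []
```

A Poly stripe lying completely across a Metal1 conductor, for example `find_steps((0, 0, 100, 4), (40, -5, 50, 10), "x")`, returns the two edge positions 40 and 50, corresponding to the two Step intersections mentioned above.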

The segmentation of all Box objects on conductive layers is now performed. First, the intersection records for a given Box are sorted in ascending order of their center coordinates along the dominant direction of current flow. As there are typically fewer than 10 intersections per box in a VLSI layout, a simple bubble sort is used. In order to extract the features occurring on all conductive layers, a larger set of segments is defined:

$$\begin{aligned} SegmentTypes = \{ & MetalRun, NonMetalRun, \\ & ContactDiff, ContactPoly, \\ & ContactM1, Step, ViaM1, \\ & ViaM2, Pad, Pchannel, \\ & Nchannel, Ndepletion\} \end{aligned} \tag{3.13}$$

The following partial mapping between intersections and segments exists:

$$ContactDiff \rightarrow ContactDiff \tag{3.14}$$

$$ContactPoly \rightarrow ContactPoly \tag{3.15}$$

$$ContactM1 \rightarrow ContactM1 \tag{3.16}$$

$$Step \rightarrow Step \tag{3.17}$$

$$ViaM1 \rightarrow ViaM1 \tag{3.18}$$

$$ViaM2 \rightarrow ViaM2 \tag{3.19}$$

$$Pin \rightarrow Pad \tag{3.20}$$

$$Pchannel \rightarrow Pchannel \tag{3.21}$$

$$Nchannel \rightarrow Nchannel \tag{3.22}$$

$$Ndepletion \rightarrow Ndepletion \tag{3.23}$$

Each intersection produces a segment of a specific type. The unmapped segment types are the *MetalRuns* and *NonMetalRuns*. These occur between the intersections and are used to link up the segments of other types. The five metal interconnect segment types defined in Chapter 2 form the following subset of *SegmentTypes*:

$$\begin{aligned} MetalSegmentTypes = \{ & MetalRun, ContactM1, \\ & Step, ViaM2, Pad\} \end{aligned} \tag{3.24}$$

A record for each element of MetalSegmentTypes is stored in the data base, for use during the reliability prediction phase. This record has the format:

Extrem assigns values to all the parameters in Segment except for Mtf, Sigma and Failure Rate, which are determined by Sirprice.

The segmentation algorithm may be summarized as follows.

#### Algorithm 3.3: Segmentation of an Object

```
For i := 1 To n Do
     Begin
     Create a CurrentSegment record using the
     intersection → segment mapping;
     With CurrentSegment Do
           Begin
           Determine the dimensions;
           Determine the series resistance and parallel
           capacitance;
           If Type ∈ MetalSegmentTypes Then
                 save in designname.db1;
           Write π-section RC equivalent network to
           designname.ext;
           End;
     If i > 1 Then
           Begin
           Create a Run segment linking the right (top) edge
           of the PreviousSegment to the left (bottom) edge
           of the CurrentSegment;
           With RunSegment Do
                 Begin
                 Determine the dimensions;
                 Calculate the series resistance and parallel
                 capacitance;
                 If Type ∈ MetalSegmentTypes Then
                       save in designname.db1;
                 Write π-section RC equivalent network to
                 designname.ext;
                 End;
           End;
     PreviousSegment := CurrentSegment;
     End;
```

Figure 3.5: Segmentation and extraction of example in Fig. 3.3

Fig. 3.5 illustrates the segmentation of the layout example in Fig. 3.3. The segment ID number is shown in the middle of each segment. Fig. 3.6 shows the extracted equivalent circuit for this layout.
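The geometric part of Algorithm 3.3 may be sketched in Python as follows. The class and function names are hypothetical, the built-in sort is used in place of the bubble sort, and the resistance, capacitance and file output steps are omitted. The sketch emits each linking run before the segment it leads to, so that the result is in geometric order along the current direction.

```python
from dataclasses import dataclass

# Eqs. 3.14-3.23: every mapped intersection type keeps its name, except Pin.
INTERSECTION_TO_SEGMENT = {"Pin": "Pad"}


@dataclass
class Intersection:
    kind: str      # intersection type, e.g. "ContactM1"
    center: float  # center coordinate along the dominant current direction
    size: float    # extent along that direction


def segment_box(intersections, run_type="MetalRun"):
    """Return (segment type, start, end) tuples ordered along the current direction."""
    ordered = sorted(intersections, key=lambda s: s.center)
    segments, previous_end = [], None
    for inter in ordered:
        start = inter.center - inter.size / 2
        end = inter.center + inter.size / 2
        if previous_end is not None:
            # Run segment linking the previous segment to the current one.
            segments.append((run_type, previous_end, start))
        kind = INTERSECTION_TO_SEGMENT.get(inter.kind, inter.kind)
        segments.append((kind, start, end))
        previous_end = end
    return segments
```

A box with n mapped intersections therefore yields n segments of specific type and n - 1 runs between them.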



Figure 3.6: Equivalent circuit for the example in Fig. 3.5

The netlist is defined as a SPICE subcircuit called designname, plus an instance X1 of this subcircuit. The user-defined Labels on the bonding pads are the external net names and these are mapped onto the internal node names which have been assigned by Extrem. Channel segments are represented by instances of the default pMOS and nMOS devices. Any other segment i is represented by a $\pi$-section network consisting of $R_i$, $C_{iL}$ and $C_{iU}$ or an L-section $R_i$, $C_i$. Resistors representing a metal layer are designated R\$i. The Sirprice simulator automatically stores the currents in all resistors having the R\$ prefix, for use in the reliability calculation (see paragraph 3.4).
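The naming convention for the π-section elements can be illustrated with a short Python sketch. The function name and arguments are hypothetical, and the equal division of the segment capacitance between the two ends is an assumption of this sketch.

```python
def pi_section(i, node_a, node_b, resistance, capacitance, metal=False):
    """Return the SPICE element lines for segment i between node_a and node_b."""
    r_name = f"R${i}" if metal else f"R{i}"   # the R$ prefix marks metal resistors
    half = capacitance / 2                    # assumption: equal split between the ends
    return [
        f"C{i}L {node_a} 0 {half:.1E}",
        f"C{i}U {node_b} 0 {half:.1E}",
        f"{r_name} {node_a} {node_b} {resistance:.1E}",
    ]


lines = pi_section(93, 41, 88, 2.4, 8.0e-15)
```

With these arguments the sketch produces the element lines `C93L 41 0 4.0E-15`, `C93U 88 0 4.0E-15` and `R93 41 88 2.4E+00`.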

### 3.2.3 Data Structures for Region Queries

Because of the large number of intersections which must be found in a VLSI design, appropriate data structures for region queries must be used. If n objects are stored in a linear list, the problem of finding all objects intersecting a given window requires O(n) computation time. A number of other data structures have been developed for region queries which yield performance better than O(n). These techniques fall into three groups.

- 1. Scan-line Techniques [McCreight 1980], [Gupta 1982], [Ullman 1984]. In scan-line techniques, the static two-dimensional intersection problem is transformed into a dynamic one-dimensional intersection problem. A scan line parallel to the x-axis is drawn across the layout. The x<sub>min</sub> and x<sub>max</sub> coordinates of any objects which intersect the scan line form a set of intervals which may intersect the window. To find the intersecting intervals rapidly, a balanced search tree is used. The scan line then advances one coordinate step in the y-direction. If an object no longer intersects the scan line, its interval is deleted from the current interval set, while new intervals are added. The process is repeated until the scan line has traversed all the objects. The scan line algorithm will report the objects intersecting a window in $O(\log n + i)$ time, where n is the number of objects and i the number of intersections found.
- 2. Quad Trees [Brown 1986], [Finkel 1974]. Quad trees are widely used for the manipulation of 2-dimensional geographical data. A quad tree is built by recursively dividing a coordinate space into 4 equally-sized quadrants, and then classifying the objects according to the quadrant in which they fall. The root node of the tree represents the coordinate space of the whole layout. Each node has four pointers, each pointing to a quadrant. If an object falls entirely within a single quadrant, it is loaded onto the tree at the corresponding node. In a perfect quad tree, the quadrants are subdivided and the objects redistributed among the child quadrants until each quadrant contains only one object. This may result in a very large data structure containing little data. More efficient memory usage is obtained by making the tree building algorithm adapt according to the number of objects in a particular part of the tree. A threshold is placed on the number of objects allowed on a node and when this is exceeded, the node is subdivided and the objects are redistributed. This results in an unbalanced tree which is deepest in the areas having the greatest density of objects. Intersection searching time will depend on the number of levels traversed, but is  $O(\log(n))$  on average. Objects which are bisected by the boundaries of a quad are dealt with by maintaining bisector lists at each node, or by multiple storage in adjacent quads.
- 3. Multi-dimensional Binary Search Trees [Rosenberg 1985], [Bentley 1975]. These data structures, which are also known as k-d trees, are used in data bases for associative queries based on multiple search keys. K-d trees may be used in the present application by defining the four corner coordinates of a rectangle as keys. Intersection searching time is $O(\log(n))$. A full description of k-d trees is beyond the scope of this thesis. Rosenberg has reported that this structure is faster than an adaptive quad tree at finding intersections, but has a higher memory usage.

The multiple storage, adaptive quad tree was chosen because it is easy to implement, provides good performance and uses memory efficiently. Fig. 3.7 depicts the structure of a tree before and after the intersection search. The tree nodes are defined as follows:

Node = Record

MinquadHor, MinquadVer, MaxquadHor, MaxquadVer: Integer;

Internal\_Node: Boolean;

PtrToLLBoxes: ListBox\_pntr;

Count: Integer;

PtrTosubNode1, PtrTosubNode2, PtrTosubNode3, PtrTosubNode4: NodePntr;

End;

The first four variables define the extents of the quad. PtrToLLBoxes is a pointer to the list of objects attached to the node. Instead of storing multiple representations of an object crossing a quad boundary, multiple pointers to the same object record are used. This reduces the memory requirement considerably, but precautions must be taken to prevent multiple reporting of the same object from different quads. In a quad tree, an internal node contains pointers to subnodes, while a leaf node contains a pointer to a list of objects. Internal\_Node is a flag which identifies the node as internal or leaf. The tree is initialized with the root node a leaf. Count is a tally of the number of objects listed on the node, which is incremented when an object is stored at the node. When Count reaches a threshold (in this case 30), the node is redefined as internal and four subnodes are created. The objects are redistributed among the subnodes. Because a node can change state from leaf to internal, the same data structure is used for both node types. If Internal\_Node is True, PtrTosubNode1 through PtrTosubNode4 point to the 4 subnodes. If Internal\_Node is False, PtrToLLBoxes points to a linked list of pointers to object records. The linked list elements are defined as follows:



Figure 3.7: Multiple storage, adaptive quad tree

ListBox = Record

Ptr\_To\_BoxInfo: Box;

Ptr\_To\_Next : ListBox\_pntr;

End;

The pointer Ptr\_To\_BoxInfo points to BoxInfo, a complete record of an object.

BoxInfo = Record

MinBoxHor: Integer; MinBoxVer: Integer; MaxBoxHor: Integer; MaxBoxVer: Integer;

BoxLabel: Packed Array [1..8] Of Char;

Mark: Boolean;

AllIntersectionsFound: Boolean; IntersectionList: IntersectionPointer;

End;

BoxInfo contains the object's coordinates and label. Mark is a flag, which is set when the object is found for the first time during a search operation. The search algorithm only reports intersections with unmarked objects, preventing multiple reporting of the same intersection. The price paid for using multiple representation is that all Mark flags must be reset before every search, which entails an extra traversal of the quad tree. The flag AllIntersectionsFound is set when an object has been checked for intersection against all other objects. IntersectionList points to a linked list of intersection records, each having the following form:
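The behavior of the multiple storage, adaptive quad tree can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a transcription of the Extrem data structures: the class and function names are hypothetical, Python lists replace the linked lists, the Mark flags are reset from a flat list of boxes rather than by a tree traversal, and a size guard (not mentioned in the text) stops subdivision of very small quads.

```python
THRESHOLD = 30  # leaf capacity quoted in the text


def rects_overlap(a, b):
    """True if rectangles (xmin, ymin, xmax, ymax) a and b touch or overlap."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


class Box:
    """Shared object record; leaves hold references to it, not copies."""
    def __init__(self, rect, label=""):
        self.rect = rect
        self.label = label
        self.mark = False  # set when the box is first reported in a search


class Node:
    def __init__(self, extent, threshold=THRESHOLD):
        self.extent = extent
        self.threshold = threshold
        self.internal = False  # every node starts as a leaf
        self.boxes = []        # used while the node is a leaf
        self.subnodes = []     # used once the node is internal

    def insert(self, box):
        if not rects_overlap(box.rect, self.extent):
            return
        if self.internal:
            for sub in self.subnodes:  # one reference in every quad the box touches
                sub.insert(box)
            return
        self.boxes.append(box)
        x0, y0, x1, y1 = self.extent
        # Size guard (an addition of this sketch): do not split tiny quads.
        if len(self.boxes) >= self.threshold and x1 - x0 > 1 and y1 - y0 > 1:
            xm, ym = (x0 + x1) // 2, (y0 + y1) // 2
            self.subnodes = [Node(e, self.threshold) for e in
                             ((x0, y0, xm, ym), (xm, y0, x1, ym),
                              (x0, ym, xm, y1), (xm, ym, x1, y1))]
            self.internal = True
            pending, self.boxes = self.boxes, []
            for b in pending:          # redistribute the objects among the subnodes
                for sub in self.subnodes:
                    sub.insert(b)

    def search(self, window, found):
        if not rects_overlap(self.extent, window):
            return
        if self.internal:
            for sub in self.subnodes:
                sub.search(window, found)
            return
        for b in self.boxes:
            if not b.mark and rects_overlap(b.rect, window):
                b.mark = True          # prevents a second report from another quad
                found.append(b)


def query(root, all_boxes, window):
    """Report each box intersecting window exactly once."""
    for b in all_boxes:                # reset every Mark flag before the search
        b.mark = False
    found = []
    root.search(window, found)
    return found
```

A box that straddles a quad boundary is referenced from several leaves, and the `mark` flag ensures that `query` returns it only once.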

```
Intersection = Record
     Next: IntersectionPointer;
     Done: Boolean;
     Size: Array[Direction] Of Integer;
     Centre: Array[Direction] Of Integer;
     NodeNumber: Integer;
     Case Tipe: IntersectionType Of
          ContactDiff, ContactPoly,
          ContactM1, ViaM1, ViaM2: (CoincidentNode: Integer);
          Pin: (PinNumber: Integer);
          Step: (UpperNode: Integer);
     End;
```

When an intersection is found, an intersection record is added to the intersection list of both objects.

### 3.2.4 Technology Dependence

Up to ten different technologies may be defined in the module Technology.Pas, which is then linked to the rest of the modules making up Extrem. Currently, only a p-well CMOS process with double layer metal has been implemented. Other MOS processes could be included with minor modifications. Bipolar processes would require new transistor extraction algorithms.

## 3.3 Preparing the Netlist for Simulation

The user prepares a SPICE file called designname.spc. This file must contain the following:

- .MODEL definitions for the transistors;
- .TRAN analysis specifications and options;
- external components such as voltage sources and load resistors;
- a definition of subcircuit designname (this may be empty);
- an instance X1 of subcircuit designname.

The node numbers used in the subcircuit call must correspond to the labels on the bonding pads of the layout. The subcircuit is a convenient way of "encapsulating" the IC and preventing the user from duplicating node names assigned by the extractor. Typically, the user will first simulate the circuit as designed, to verify its functionality. Once functionality is established, the layout is generated and the equivalent circuit extracted using Extrem. The Combine program then copies all the information in the .spc file to designname.spr, but replaces the contents of the subcircuit definition with the circuit extracted by Extrem. The extracted circuit is then simulated using Sirprice, simultaneously verifying both layout correctness and interconnect reliability.
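The substitution performed by Combine can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that everything between the .SUBCKT line and the matching .ENDS line, including any continuation line carrying the node list, is supplied by the extracted text.

```python
def combine(spc_text, extracted_text, designname):
    """Replace the body of .SUBCKT designname in spc_text with extracted_text."""
    out, inside = [], False
    for line in spc_text.splitlines():
        words = line.split()
        head = words[0].upper() if words else ""
        is_target = (head == ".SUBCKT" and len(words) > 1
                     and words[1].upper() == designname.upper())
        if not inside and is_target:
            inside = True
            out.append(line)                          # keep the .SUBCKT line itself
            out.extend(extracted_text.splitlines())   # extracted circuit replaces the body
        elif inside and head == ".ENDS":
            inside = False
            out.append(line)
        elif not inside:
            out.append(line)                          # everything outside is copied unchanged
    return "\n".join(out)
```

Lines outside the subcircuit definition, such as the sources, the .MODEL cards and the analysis statements, pass through unchanged.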

## 3.4 Sirprice: A Simulator for Electromigration Damage

Sirprice performs a simulation of the extracted circuit and determines the reliability of all conductor segments. The functioning of this module is illustrated by the following Pascal pseudo-code program.

```
Begin
Rspice(designname, Success);
If Success Then
     Begin
      Initialize Failure Rates of whole circuit, Runs, Vias,
      Contacts, Steps and Pads to 0;
      For I := 1 To NumberOfSegments Do
           Begin
           Compute Mtf(I);
           Compute Sigma(I);
           For Time := 1 To 20 Years Do
           Compute Failure Rate of Segment I;
           Store Mtf, Sigma and Failure Rate of Segment I;
           Update Failure Rate for Segments of this type;
            Update Failure Rate for all Segment types;
           End;
      Create Datatrieve File;
      End
Else
      Write Diagnostic Message;
End;
```

RSPICE is a slightly modified version of SPICE2G.5. The following alterations to the Fortran 77 source code have been made.

- RSPICE is defined as a subroutine, and can be called from within a Pascal procedure. A flag is returned to indicate the error status of the simulation.
- At every internal timestep, the time and the current in each resistor R\$i are saved in a linked list.
- The analysis temperature is saved.

If RSPICE terminates successfully, the reliability of each *Metal1* and *Metal2* segment is computed, using the methods set out in Chapter 2. This involves the integration of information from the segment data base (.db1) file, current-time data collected by RSPICE and the reliability characteristics of the technology used. The latter is provided by a technology definition module Stechnology.Pas, which contains all the constants used in the equations discussed below.

The Mtf is determined using Equation (2.31), with the geometry factor G given by Equation (2.32) and T equal to the analysis temperature specified by the user in the .spc file. $\sigma$ is determined using Equation (2.33). The failure rate is then calculated for each of the first 20 years. Mtf, $\sigma$ and the 20 failure rate values are added to the segment record and stored in a new data base file, designname.db2. Running totals of the 20 failure rate values for each segment type and the total values for the interconnect pattern are updated. Finally, a call interface to VAX Datatrieve is generated from the .db2 data base [DEC 1981]. The application of Datatrieve for interactive examination of reliability data is described in the next section.
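The accumulation of failure rates may be illustrated by the following Python sketch. Equations (2.31) to (2.33) are not reproduced here, so the median lifetime and $\sigma$ of each segment are taken as given inputs. The sketch assumes a lognormal lifetime distribution, which is commonly used for electromigration but is an assumption as far as this section is concerned, and it treats the interconnect as a series system of independent segments, so that the segment failure rates add. Function names are hypothetical. One FIT is one failure per $10^9$ device-hours.

```python
import math

HOURS_PER_YEAR = 8760.0


def lognormal_failure_rate_fit(t_hours, t50_hours, sigma):
    """Instantaneous failure rate (FIT) at time t for a lognormal lifetime."""
    z = math.log(t_hours / t50_hours) / sigma
    pdf = math.exp(-0.5 * z * z) / (sigma * t_hours * math.sqrt(2.0 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))    # 1 - cumulative probability
    return 1e9 * pdf / survival                       # failures per 1e9 hours


def interconnect_failure_rates(segments, years=20):
    """segments: list of (t50_hours, sigma). Returns total FIT for years 1..years."""
    totals = []
    for year in range(1, years + 1):
        t = year * HOURS_PER_YEAR
        # Independent segments in series: the failure rates are summed.
        totals.append(sum(lognormal_failure_rate_fit(t, t50, s)
                          for t50, s in segments))
    return totals
```

The same loop, restricted to the segments of one type, gives the per-type running totals.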

## Chapter 4

## EVALUATION OF RELIANT

## 4.1 Calibrating the Reliability Models

Calibration of Reliant for a specific fabrication process requires the lifetime measurement of a number of test structures. Table 4.1 summarizes the dependencies of  $t_{50}$  and  $\sigma$  on the various constants which must be determined. If simple linear regression models for dependency on W, L,  $W_c$  and  $L_c$  are assumed, a total of 14 test structures and 26 independent tests are required. The sample size for each test must provide statistically meaningful results, with the emphasis on the early failures. An evaluation of a fabrication process is beyond the scope of this thesis, whose goal is to present methods and tools for use by designers in an industrial environment. Many semi-conductor manufacturers include electromigration monitors in their process evaluation programs and may have sufficient data available to perform a calibration. In the absence of such data, Appendix E details a comprehensive test chip which may be used for this purpose.

| Segment Type   | $t_{50}$              | $\sigma$   | No. of Structures | No. of Tests |
|----------------|-----------------------|------------|-------------------|--------------|
| Run (*Metal1*) | $W, E_a, \Psi$        | $W$        | 2                 | 4            |
| Run (*Metal2*) | $W, E_a, \Psi$        | $W$        | 2                 | 4            |
| Step           | $W, E_a, \Psi$        | $W$        | 2                 | 4            |
| Contact        | $W_c, L_c, E_a, \Psi$ | $W_c, L_c$ | 3                 | 5            |
| Via            | $W_c, L_c, E_a, \Psi$ | $W_c, L_c$ | 3                 | 5            |
| Pad            | $W, E_a, \Psi$        | $W$        | 2                 | 4            |
| Total          |                       |            | 14                | 26           |

Table 4.1: Dependencies of  $t_{50}$  and  $\sigma$  for each Segment Type

| Parameter                          | GEINV                 | GEBINC               |
|------------------------------------|-----------------------|----------------------|
| Analysis temperature               | $27^{\circ}C$         | $80^{\circ}C$        |
| Run time (VAX 11/785):             |                       |                      |
| 1) Spice                           | 33 s                  | 1520 s               |
| 2) Extrem                          | 1.7 s                 | 7.4 s                |
| 3) Combine                         | 0.5 s                 | 1.7 s                |
| 4) Sirprice                        | 42 s                  | 1610 s               |
| 5) Reliant (2+3+4)                 | 44.2 s                | 1619 s               |
| 6) Reliant/Spice                   | 1.33:1                | 1.06:1               |
| No. of nodes                       | 88                    | 522                  |
| No. of transistors                 | 4                     | 26                   |
| Area                               | $2727 \mu m^2$        | $12827 \mu m^2$      |
| No. of segments                    | 100                   | 601                  |
| No. of metal segments              | 58                    | 351                  |
| Failure rate (FIT) @ t = 20 yrs.   | $4.2 \times 10^{-15}$ | $9.7 \times 10^{-2}$ |

Table 4.2: Summary: Reliant analysis results for GEINV and GEBINC

## 4.2 Two Examples

Reliant was used to evaluate the reliability of a CMOS standard cell 4-bit binary ripple counter with reset [GE 1986]. A CIF layout file was supplied by the manufacturers. This file was first modified with a mask editor to remove some non-Manhattan geometry. The BINR4 counter consists of a double inverter input stage (which buffers the clock and generates its complement) and four counter stages. First, the four-transistor input stage (GEINPUT) was simulated for two clock cycles at a temperature of 27°C. Fig. 4.1 shows the user interaction with Reliant.

The user's simulation file geinput.spc appears in Fig. 4.2 and Fig. 4.3 shows the Sirprice input file geinput.spr. Bonding pads were added to the layout before simulation, as shown in Fig. 4.4. Fig. 4.5 is a graph of the tabulated results in geinput.lis. The failure rate is  $4.2 \times 10^{-15} FIT$  after 20 years and the major contributors to the failure rate are the contacts.

A single counter stage plus the double inverter (GEBINC) was then analyzed for two clock cycles at a temperature of $80^{\circ}C$. The layout and analysis results appear in Figs. 4.6 and 4.7 respectively. The failure rate is $9.7 \times 10^{-2} FIT$ after 20 years and the major contributors are the steps.

```
$ r extrem
EXTREM Version 1.0
The following technologies have been defined:
( 1) cmospw2m
Technology?
Design Name?
geinput
Include parasitic capacitances? (Y/N)
Parsing geinput.cif....
No errors detected in CIF file
Building quad search trees....
Finding all intersections between layers....
Extracting conductor segments and equivalent circuit....
Segment records stored in geinput.DB1
Spice netlist stored in geinput.EXT
CIF output stored in geinput.K
Total cpu time: 1.6E+00 seconds
$ r combine
COMBINE Version 1.0
Design Name?
geinput
Reading external circuit and analysis specs from geinput.SPC....
Spice netlist stored in geinput.SPR
Total cpu time: 4.5E-01 seconds
$ r sirprice
SIRPRICE Version 1.0
The following technologies have been defined:
(1) cmospw2m
Technology?
Design Name?
geinput
Simulating circuit in geinput.SPR....
Simulation completed.
Regular Spice output stored in geinput.DAT
Calculating reliability...
Creating Datatrieve records...
Total cpu time: 4.2E+01 seconds
```

Figure 4.1: User interaction with Reliant

```
USERS SIMULATION OF GEINPUT
VIN 3 0 PWL(0 0 1N 5 3N 5 4N 0 6N 0)
VDD 1 0 DC 5
VSS 2 0 DC 0
CL1 4 0 0.01P
CL2 5 0 0.01P
X1
+ 2 1 5 4 3
+GEINPUT
.SUBCKT GEINPUT
+ 2 1 5 4 3
M1 4 3 1 1 PMOS L=3U W=22U
M2 4 3 2 2 NMOS L=3U W=22U
M3 5 4 2 2 NMOS L=3U W=22U
M4 5 4 1 1 PMOS L=3U W=22U
.ENDS GEINPUT
.MODEL NMOS NMOS LEVEL=3 RSH=0 TOX=275E-10 LD=0.16E-6 XJ=0.14E-6
+ CJ=1.6E-4 CJSW=1.8E-10 UO=550 VTO=1.022 CGSO=1.3E-10
+ CGDO=1.3E-10 NSUB=4E15 NFS=1E10
+ VMAX=12E4 PB=0.7 MJ=0.5 MJSW=0.3 THETA=0.06 KAPPA=0.4 ETA=0.14
.MODEL PMOS PMOS LEVEL=3 RSH=0 TOX=275E-10 LD=0.3E-6 XJ=0.42E-6
+ CJ=7.7E-4 CJSW=5.4E-10 UO=180 VTO=-1.046 CGSO=4E-10
+ CGDO=1.3E-10 TPG=-1 NSUB=7E15 NFS=1E10
+ VMAX=12E4 PB=0.7 MJ=0.5 MJSW=0.3 ETA=0.06 THETA=0.03 KAPPA=0.4
.OPTIONS NODE
.TRAN .05N 6N
.PRINT TRAN V(3) V(4) V(5)
.END
```

Figure 4.2: Spice input file geinput.spc

```
USERS SIMULATION OF GEINPUT
VIN 3 0 PWL(0 0 1N 5 3N 5 4N 0 6N 0)
VDD 1 0 DC 5
VSS 2 0 DC 0
CL1 4 0 0.01P
CL2 5 0 0.01P
* Circuit extracted from geinput.cif by Extrem
X1
+ 2 1 5 4 3
+GEINPUT
.SUBCKT GEINPUT
+ 2 3 1 4 5
C1 57 0 6.2E-14
R$1 1 57 1.6E-07
C2L 6 0 6.2E-15
C2U 57 0 6.2E-15
R89 52 86 1.0E-06
M3 87 42 86 999 PMOS L=3U W=22U
R90 87 55 1.0E-06
R91 55 41 1.0E-06
C93L 41 0 4.0E-15
C93U 88 0 4.0E-15
R93 41 88 2.4E+00
M4 89 43 88 999 PMOS L=3U W=22U
R94 89 53 1.0E-06
R95 53 51 1.0E-06
C96L 41 0 1.0E-15
C96U 56 0 1.0E-15
R96 41 56 1.1E+01
VPBULK 999 0 5
.ENDS GEINPUT
.MODEL NMOS NMOS LEVEL=3 RSH=0 TOX=275E-10 LD=0.16E-6 XJ=0.14E-6
+ CJ=1.6E-4 CJSW=1.8E-10 UO=550 VTO=1.022 CGSO=1.3E-10
+ CGDO=1.3E-10 NSUB=4E15 NFS=1E10
+ VMAX=12E4 PB=0.7 MJ=0.5 MJSW=0.3 THETA=0.06 KAPPA=0.4 ETA=0.14
.MODEL PMOS PMOS LEVEL=3 RSH=0 TOX=275E-10 LD=0.3E-6 XJ=0.42E-6
+ CJ=7.7E-4 CJSW=5.4E-10 UO=180 VTO=-1.046 CGSO=4E-10
+ CGDO=1.3E-10 TPG=-1 NSUB=7E15 NFS=1E10
+ VMAX=12E4 PB=0.7 MJ=0.5 MJSW=0.3 ETA=0.06 THETA=0.03 KAPPA=0.4
.OPTIONS NODE
.TRAN .05N 6N
.PRINT TRAN V(3) V(4) V(5)
.END
```

Figure 4.3: A part of Sirprice input file geinput.spr

Figure 4.4: Layout of GEINPUT

Table 4.2 summarizes the results of the two analyses. For both circuits, the major contribution to run time was the circuit simulation phase. Circuit extraction and netlist generation amounted to 5% of the total run time for GEINPUT and 0.5% for GEBINC. Extractor efficiency is usually evaluated in terms of the number of transistors extracted per second. Because Extrem extracts segments rather than devices, efficiency must be measured in segments per second. The extraction rate was 59 segments/s for GEINPUT and 81 segments/s for GEBINC. These results compare favorably with those reported elsewhere [Gupta 1982].

The run time of Sirprice is greater than that of Spice, because of the additional reliability calculations which are performed. The larger data structures also lead to more page faults in a virtual memory system. The time difference is O(n), where n is the number of nodes. As the simulation time is  $O(n^2)$ , this difference becomes negligibly small for large circuits.

Figure 4.5: Failure rate vs. time for GEINPUT

Fig. 4.8 shows how the Datatrieve interface was used to optimize the interconnect reliability of GEBINC. First, segment 79 was found to have the highest failure rate at a specific time (arbitrarily chosen as 20 years). The failure rate of this segment was $7.775 \times 10^{-3} FIT$. All segments having a failure rate greater than $5.0 \times 10^{-3}$ FIT were then found. The 11 segments thus identified had a total failure rate of $8.1 \times 10^{-2} FIT$, which represents 81% of the total failure rate for all 351 segments. When their center coordinates were listed, these were found to be clustered along a line between points (66,31.5) and (85.5,31.5). On examination of the layout with a mask editor it was found that the 11 segments belonged to a single conductor, which had a width of $3\mu m$. This width was increased to $4\mu m$ and the analysis repeated. The failure rate after 20 years was then $4.5 \times 10^{-3} FIT$, representing a twentyfold improvement for a mere 0.6% increase in area.
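The arithmetic behind these figures can be checked with a few lines of Python. The LAMBDA20 values are transcribed from the Datatrieve session in Fig. 4.8, and the variable names are chosen for illustration only.

```python
# LAMBDA20 values (FIT at 20 years) of the 11 segments listed in Fig. 4.8.
lambda20 = [7.5505e-3, 7.5667e-3, 7.6158e-3, 6.0731e-3, 7.6322e-3, 7.6815e-3,
            6.1255e-3, 7.6980e-3, 7.7201e-3, 7.7366e-3, 7.7754e-3]

worst = max(lambda20)               # failure rate of the most highly stressed segment
total = sum(lambda20)               # combined failure rate of the 11 segments
improvement = 9.7e-2 / 4.5e-3       # failure rate before and after widening the conductor
```

The sum evaluates to approximately $8.12 \times 10^{-2}$ FIT and the ratio to approximately 21.6, in agreement with the rounded values quoted above.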

## 4.3 Limitations

The following limitations of Reliant have been identified.

- 1. The probabilistic method of determining current flow direction sometimes produces transistors with the length and width exchanged. This occurs when the channel width is much larger than the source or drain width. A solution would be to use conventional circuit extraction techniques to identify the transistors.
- 2. Initially, Spice DC convergence problems (pivot element < PIVTOL) were experienced with GEBINC. This was cured by placing a lower limit of $1\Omega$ on the value of any extracted resistor. Transient convergence problems were eliminated by ramping $V_{DD}$.
- 3. Because of the fine grain of the segmentation algorithm, even simple layouts result in extracted circuits containing several hundred nodes. For the two examples shown, there are 20 to 25 extracted nodes per transistor. The use of SPICE to simulate these circuits limits the application of Sirprice to VLSI cells containing no more than a few hundred transistors. As an experiment, nodal capacitances were omitted. This resulted in a 20% improvement in speed but produced failure rates which differed by orders of magnitude from the results described above. An alternative solution would be to compact the nodes within a conductor branch, by redistributing the nodal capacitances to the branch ends and combining the branch resistors in series. A third possibility is to make use of event-driven simulation techniques to speed up the reliability analysis. This option is discussed in the following paragraph.

Figure 4.6: Layout of GEBINC

Figure 4.7: Failure rate vs. time for GEBINC

## 4.4 Extending the VLSI Capability of Reliant

Because the time required to solve a set of n circuit equations is  $O(n^2)$ , the behavior of logic circuits containing more than a few hundred elements is usually analyzed using event-driven simulation techniques. In an event-driven simulator, the nodal voltages and impedance levels are discretized to represent a number of pre-defined logic states. An event occurs whenever the state of a node changes. The effect of the event on other nodes is given by Boolean expressions defining the logic functions of the circuit and time delays associated with each state change. When an event occurs, the next state of all nodes is evaluated and a list of pending events is compiled for nodes whose state will change. The time delays are computed and the pending events are scheduled by means of a queue.
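The event-scheduling mechanism can be illustrated by a small Python sketch. It is a generic illustration of the technique rather than a model of any particular simulator: the function name and data layout are hypothetical, only the gates driven by a changed node are re-evaluated (a common refinement of evaluating all nodes), and pending events are never cancelled.

```python
import heapq


def simulate(gates, delays, initial, stimuli, t_end):
    """Event-driven simulation of a gate network.

    gates:   {output node: (Boolean function, [input nodes])}
    delays:  {output node: delay of the driving gate}
    initial: {node: 0 or 1}
    stimuli: [(time, node, value)] applied to primary inputs
    Returns the list of (time, node, value) state changes.
    """
    state = dict(initial)
    queue = list(stimuli)
    heapq.heapify(queue)                      # pending events ordered by time
    fanout = {}
    for out, (_, inputs) in gates.items():
        for name in inputs:
            fanout.setdefault(name, []).append(out)
    history = []
    while queue:
        time, node, value = heapq.heappop(queue)
        if time > t_end:
            break
        if state[node] == value:
            continue                          # no state change, hence no event
        state[node] = value
        history.append((time, node, value))
        for out in fanout.get(node, []):
            func, inputs = gates[out]
            new = func(*(state[i] for i in inputs))
            if new != state[out]:             # schedule the change after the gate delay
                heapq.heappush(queue, (time + delays[out], out, new))
    return history


# Two inverters in series, with delays of 2 and 3 time units.
gates = {"b": (lambda a: 1 - a, ["a"]), "c": (lambda b: 1 - b, ["b"])}
history = simulate(gates, {"b": 2, "c": 3}, {"a": 0, "b": 1, "c": 0},
                   [(0, "a", 1)], t_end=10)
```

The work done is proportional to the number of state changes rather than to the size of a set of simultaneous circuit equations.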

Two kinds of event-driven simulator are in general use: the logic simulator and the switch-level simulator. In the former, the circuit is modeled as a collection of modules, each having a primitive Boolean function, e.g. AND, OR. As each Boolean function may correspond to a circuit containing several transistors, there is not a close correspondence between the logic simulator model of circuit function and the topology of the circuit.

```
DTR> READY SEGMENTS READ
DTR> FIND ALL SEGMENTS
[351 records found]
DTR> FIND SEGMENTS WITH LAMBDA20=MAX(LAMBDA20)
[1 record found]
DTR> SELECT
DTR> PRINT ID, SEGMENTTYPE, XMIN, YMIN, LAMBDA20
    ID   SEGMENTTYPE    XMIN     YMIN    LAMBDA20
    79   STEP           10650    3000    7.7754E-03
DTR> FIND ALL SEGMENTS WITH LAMBDA20>5E-3
[11 records found]
DTR> SELECT
DTR> PRINT ALL LAMBDA20
 LAMBDA20
 7.5505E-03
 7.5667E-03
 7.6158E-03
 6.0731E-03
 7.6322E-03
 7.6815E-03
 6.1255E-03
 7.6980E-03
 7.7201E-03
 7.7366E-03
 7.7754E-03
DTR> PRINT ALL SEGMENTTYPE, (XMIN+XMAX)/2, (YMIN+YMAX)/2
SEGMENTTYPE
 STEP          6600.000    3150.000
 STEP          6900.000    3150.000
 STEP          7800.000    3150.000
 STRAIGHT      7350.000    3150.000
 STEP          8100.000    3150.000
 STEP          9000.000    3150.000
 STRAIGHT      8550.000    3150.000
 STEP          9300.000    3150.000
 STEP          9700.000    3150.000
 STEP         10000.000    3150.000
 STEP         10700.000    3150.000
DTR>
```

Figure 4.8: Identification of highly-stressed segments using Datatrieve

### 4.4.1 Switch-level Simulation Techniques

The switch-level simulator represents each MOS transistor as a switch whose state (open, closed) is determined by the state of a controlling node (0,1) [Bryant 1984], [Hayes 1984]. Therefore, no abstractions are made about the network topology, only about the transistor model. In most switch-level simulators, the conductance of the transistor in the on state is also modeled. This parameter is sometimes referred to as the strength of the transistor and is represented by a set of discrete values. The state of a node is determined by an ordering of the strengths of the transistors connected to the node. In Terman's RNL simulator, a semi-analog approach is followed. The total resistances between a given node,  $V_{DD}$  and Ground are determined and a voltage divider method is used to determine the voltage on the node. A threshold function is applied to this voltage to determine the logic state of the node. Nodal capacitances to ground are used together with the transistor conductance values to compute the delay at each node, by means of a single time constant RC model. The variation in channel resistance with  $V_{ds}$  is modeled by defining static and dynamic resistors for each transistor [Terman 1985].
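The semi-analog node evaluation described above can be sketched as follows. This is an illustration of the voltage divider and threshold idea only, not code taken from RNL: the function names, the 5 V supply and the threshold voltages are assumed values.

```python
INF = float("inf")

def node_state(r_up, r_down, vdd=5.0, v_low=1.5, v_high=3.5):
    """Return (voltage, logic state) of a node.

    r_up   -- total resistance from the node to VDD (ohms), INF if no path
    r_down -- total resistance from the node to Ground (ohms), INF if no path
    The thresholds v_low and v_high are illustrative values only.
    """
    if r_up == INF and r_down == INF:
        return None, "X"                 # floating node: not resolved here
    if r_up == INF:
        voltage = 0.0
    elif r_down == INF:
        voltage = vdd
    else:
        voltage = vdd * r_down / (r_up + r_down)   # voltage divider
    if voltage <= v_low:
        return voltage, "0"
    if voltage >= v_high:
        return voltage, "1"
    return voltage, "X"

def rc_delay(r_drive, c_node):
    """Single time constant delay estimate (seconds) for one node."""
    return r_drive * c_node
```

For example, a 9 kΩ pull-up against a 1 kΩ pull-down gives 0.5 V, which the threshold function maps to logic 0.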

### 4.4.2 A Method for Estimating Interconnect Reliability using a Switch-level Simulator

Switch-level simulators do not provide the user with explicit voltage or current waveform information, although an approximate nodal voltage value or waveform may be implicitly assumed in the methods of next state and delay determination. The possibility of extracting sufficient information about branch current waveforms to estimate interconnect reliability is now considered. It is assumed that circuit extraction has been performed on the layout, and the equivalent circuit contains the following elements:

- MOS transistors defined by linear static and dynamic resistances;
- parasitic interconnect resistances;
- a capacitance from each node to ground.

The steady-state behavior of the network is derived from a linear network consisting of transistor static resistances and parasitic interconnect resistances. The steady-state nodal voltages after each event may be determined by one of the following methods.

- If all resistive networks are proper trees to  $V_{DD}$  or Ground, equivalent resistances to each of the global nodes may be determined by a tree search algorithm. This is the method used in the RNL simulator.
- For general resistive networks, the circuit equations must be solved.
  However, this could be done much faster than in Spice because the
  networks are linear and only those parts of a network affected by an
  event need be analyzed.

With all node voltages known, the steady-state currents in the interconnect resistors may be determined.

The transient behavior of the network is derived from a linear network consisting of transistor dynamic resistances, parasitic interconnect resistances and a capacitor  $C_i$  to ground from each node i. Transient currents are the result of charging or discharging of the nodal capacitances. The damage function expression in equation (2.29) may be written as follows:

$$f(t) = B \int_0^t J_{eff}(j(\tau)) d\tau \tag{4.1}$$

where B is a constant. Therefore

$$f(t) = B \int_0^t \sinh(\Psi j(\tau)) d\tau \tag{4.2}$$

$$= B \int_0^t \frac{1}{\Psi} \left( \Psi j(\tau) + (\Psi j(\tau))^2 + \cdots \right) d\tau \tag{4.3}$$

$$= B\left[\frac{Q}{A} + \frac{\Psi}{A^2 2!} \int_0^t i^2(\tau) d\tau + \cdots\right] \tag{4.4}$$

(4.5)

where A is the cross-sectional area of the conductor and Q is the total charge through the conductor. Therefore as a first approximation

$$f(t) \approx \frac{BQ}{A} \tag{4.6}$$

Figure 4.9: Charge movement in a MOS circuit

This approximation is accurate if $j(t) \ll 1/\Psi = 5.0 \times 10^5 A/cm^2$. When current densities are below this threshold (a reasonable assumption for logic interconnects in MOS circuits) it is only necessary to predict the total charge movement accurately. The precise current waveform is not important. The charge movement into node i is

$$Q_i = C_i \left( V_i^{\infty} - V_i^0 \right) \tag{4.7}$$

where  $V_i^0$  and  $V_i^\infty$  are the initial and final voltages on node i, respectively. If the resistive network is a proper tree, the charge movements may be determined by searching the tree and accumulating the charges from the leaf nodes to the root. This is illustrated by the example in Fig. 4.9.
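The accumulation of charge from the leaf nodes to the root may be sketched as follows. The data layout (dictionaries keyed by node name) and the function name `branch_charges` are assumptions made for this illustration; the recursion is adequate for small trees only.

```python
def branch_charges(children, cap, v_initial, v_final, root):
    """Charge flowing through the branch that feeds each node of a tree.

    children  -- dict: node -> list of child nodes (root is the driven end)
    cap       -- dict: node -> capacitance to ground C_i (farads)
    v_initial -- dict: node -> voltage before the event
    v_final   -- dict: node -> steady-state voltage after the event
    Returns a dict: node -> charge (coulombs) through the resistor that
    connects the node to its parent, i.e. Q_i of the node itself plus the
    charges of every node below it.
    """
    totals = {}

    def visit(node):
        q = cap[node] * (v_final[node] - v_initial[node])   # equation (4.7)
        for child in children.get(node, []):
            q += visit(child)            # add charge drawn by the subtree
        totals[node] = q
        return q

    visit(root)
    return totals
```

Dividing each accumulated charge by the conductor cross-section A and multiplying by B gives the first-order damage estimate of equation (4.6) for that branch.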

Nets such as  $V_{DD}$  and Ground are considered global if the voltage at every point in the net is independent of the current flowing into or out of the net. Once the static and dynamic currents in the logic networks have been determined, the equivalent circuits of the global nets may be analyzed with the logic currents modeled as current sources (see Fig. 4.10).



Figure 4.10: Current source model of power bus

### Chapter 5

### REVIEW

This thesis has demonstrated the feasibility of predicting the reliability of VLSI interconnects during the design phase. Models for failure rate of conductors based on a lognormal distribution of lifetime have been developed and a methodology for determining the failure rate of complex interconnect patterns presented. The intrinsic lifetime of IC interconnects may be determined in this manner. The implementation of this methodology in a software tool for reliability analysis has been described and the functioning of the tool has been demonstrated for two VLSI leaf cells. It has been shown that reliability analysis may be achieved concurrently with layout verification by simulating an extracted equivalent circuit, and that the overhead for reliability prediction is small. However, the use of a circuit simulator limits the size of circuits which may be analyzed to a few hundred transistors at most. A method of overcoming this limitation has been proposed. This method uses an event-driven switch-level simulator to model the logic interconnects, and global nets are accommodated using a simple current-source model.

The role of local and global variations in conductor width due to process disturbances (e.g. spot defects, over/underetching) has not been considered in this thesis. A worst-case analysis of global variations may be made by preprocessing the mask description file to include maximum overetching of conductors. Spot defects are related to the incidence of early failures and the inclusion of these effects necessitates the use of a different lifetime distribution to model infant mortality. It should be noted that the methodology proposed here places no restriction on the distribution used. A paper (co-authored by the present author) describing the distribution of early EM failures is currently in the review process and a draft copy is included in Appendix F. This aspect needs further study, for example, to develop efficient means of parameter estimation for the early-failure distribution.

In this regard, the relationship between yield failures due to spot defects and early reliability failures should be investigated. These phenomena are obviously closely related, and the high cost of lifetime testing makes the characterization of the early lifetime distribution from yield data an extremely attractive possibility.

In its present form, Reliant provides a good indication of the intrinsic lifetime and relative reliability of conductor segments, and should prove a useful design tool in an industrial environment, for identifying highly stressed areas of an interconnect layout. Historically, minimum feature sizes and failure rates measured in industry have shown a steady decrease, while circuit complexity has increased dramatically. These results would appear to contradict the conclusions reached in this thesis that decreasing linewidths and increasing complexity must lead to higher failure rates. It must be borne in mind that currently, Reliant only models the wearout portion of the bathtub curve. The measured decrease in failure rates is ascribable to a reduction in defect density and concomitant decrease in infant mortality failures. The inclusion of local and global process variations into Reliant would produce a bathtub-shaped failure rate prediction, with results more in line with those measured in industry.

The Reliant program has the potential to be extended in three directions. For MOS VLSI circuits, the switch-level simulation techniques presented in Chapter 4 may be implemented. This extension could probably be based on an existing simulator such as RNL, but would definitely require some internal modifications to this tool. To address the bipolar area, new device extraction algorithms must be written. Because of the high current densities existing in ECL circuits, this may prove to be the main area of application for Reliant. Finally, a long term goal should be the inclusion of all significant failure modes to provide an overall reliability prediction.

### Chapter 6

### REFERENCES

[Agarwala 1970] B. N. Agarwala, M.T. Attardo, A.P. Ingraham, "Dependence of Electromigration-induced Failure Time on Length and Width of Aluminum Thin Film Conductors", J. App. Phys., vol. 41, 1970, pp 3954 - 3960.

[Anolick 1980] E.S. Anolick, G.R. Nelson, "Low Field Time-dependent Dielectric Integrity", IEEE Trans. Reliability, vol. R-29, Aug. 1980, pp 217 - 221.

[Baglee 1986-2] D.A. Baglee, "Reliability of Trench Capacitors for VLSI Memories", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 215 - 219.

[Baglee 1986-1] D.A. Baglee, "Oxide Reliability in VLSI Technology", Tutorial notes: 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 6-1 to 6-17.

[Barke 1985] E. Barke, "Resistance Calculation from Mask Artwork Data by Finite Element Method", 22nd Design Automation Conf., New York: IEEE, 1985, pp 305-311.

[Black 1968] J.R. Black, "Electromigration Failure Modes in Aluminum Metallization for Semiconductor Devices", Proc. IEEE, vol. 57, 1969, p 1587.

[Black 1974] J.R. Black, "Physics of Electromigration", Proc. 12th Annual Reliab. Phys. Symp., IEEE, 1974, pp 142 - 149.

[Blair 1970] J.C. Blair, P.B. Ghate, C.T. Haywood, "Electromigration induced Failures in Aluminum Film Conductors", Appl. Phys. Lett., vol. 17, 1970, p 281.

[Blech 1976] I.A. Blech, "Electromigration in Thin Aluminum Films on Titanium Nitride", J. App. Phys., vol. 47, Apr. 1976, pp 1203 - 1208.

[Blech 1975] I.A. Blech, E.Kinsbron, "Electromigration in Thin Gold Films on Molybdenum Surfaces", Thin Solid Films, vol. 25, 1975, p 327.

[Bobbio 1974] A. Bobbio, A. Ferro, O. Saracco, "Electromigration Failure in Al Thin Films under Constant and Reversed DC Powering", IEEE Trans. Reliability, vol. R-23, no. 3, Aug. 1974, pp 194 - 201.

[Brown 1986] R.L. Brown, "Multiple Storage Quad Trees: A Simpler Faster Alternative To Bisector List Quad Trees", IEEE Trans. Computer-aided Design, vol. CAD-5, no. 3, July 1986, pp 413 - 419.

[Bryant 1984] R.E. Bryant, "A Switch-Level Model and Simulator for MOS Digital Systems", IEEE Trans. Computers, vol. C-33, no. 2, Feb. 1984, pp 160 - 177.

[d'Heurle 1970] F.M. d'Heurle, I. Ames, "Electromigration in Single-crystal Aluminum Films", Appl. Phys. Lett., vol. 16, p 80.

[d'Heurle 1978-1] F.M. d'Heurle, P.S. Ho, "Electromigration in Thin Films", Thin Films - Interdiffusion and Reactions (Ed: Poate, Tu & Mayer), Wiley-Interscience, 1978, p 244.

[d'Heurle 1978-2] ibid, p 248.

[David 1970] H.A. David, Order Statistics, 2nd Ed., John Wiley & Sons, 1970, p 22.

[DEC 1981] VAX-11 Datatrieve Call Interface Manual, Merrimack, NH: Digital Equipment Corp., 1981.

[Duvvury 1986] C. Duvvury, R.A. McPhee, D.A. Baglee, R.N. Rountree, "ESD Protection Reliability in  $1\mu m$  CMOS Technology", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 199 - 205.

[Eitan 1981] B. Eitan, D. Frohman-Bentchkowsky, "Hot-electron Injection in n-Channel MOS Devices", IEEE Trans. Electron Dev., vol. 28, Mar. 1981, pp 328 - 340.

[English 1972] A.T. English, K.L. Tai, P.A. Turner, "Electromigration in Conductor Stripes under Pulsed DC Powering", Appl. Phys. Lett., vol. 21, no. 8, Oct. 1972, pp 397 - 398.

[English 1983] A.T. English, E. Kinsbron, "Electromigration Transport Mobility Associated with Pulsed Direct Current in Fine-grained Evaporated Al - 0.5%Cu Thin Films", J. App. Phys., vol. 54, Jan. 1983, pp 275 - 280.

[Finkel 1974] R.A. Finkel, J.L. Bentley, "Quad Trees: A Data Structure for Retrieval on Composite Keys", Acta Informatica, vol. 4, 1974, pp 1 - 9.

[Frost 1987] D.F. Frost, K.F. Poole, "A Method for Predicting VLSI-Device Reliability using Series Models for Failure Mechanisms", IEEE Trans. Reliability, vol. R-36, June 1987, pp 234 - 242.

[Gardner 1987] D.S. Gardner, J.D. Meindl, K.C. Saraswat, "Interconnection and Electromigration Scaling Theory", IEEE Trans. Electron Devices, vol. ED-34, no. 3, March 1987, pp 633 - 643.

[GE 1986] Macrocell Library 3.1, Document #RTP600R01, Research Triangle Park, NC:1986, General Electric Company, pp 7.4.22 - 7.4.24.

[Gupta 1982] A. Gupta, R.W. Hon, Two papers on Circuit Extraction, Research Report CMU-CS-82-147, Pittsburgh: Dept. of Computer Science, Carnegie-Mellon University, 1982, pp 1 - 10.

[Hall 1986] J.E. Hall, D.E. Hocevar, P. Yang, M.J. McGraw, "SPIDER: A CAD System for Checking Current Density and Voltage Drop in VLSI Metallization Patterns", Proc. IEEE Int. Conf. on Computer-aided Design, Nov. 1986.

[Harrison 1988] J.W. Harrison, private communication.

[Hayes 1984] J.P. Hayes, "Fault Modeling for Digital MOS Integrated Circuits", IEEE Trans. Computer-aided Design, vol. CAD-3, no. 3, July 1984, pp 200 - 207.

[Hohol 1986] T.S. Hohol, L.A. Glasser, "RELIC: A Reliability Simulator for Integrated Circuits", Proc. IEEE Int. Conf. on Computer-aided Design, Nov. 1986.

[Horowitz 1983] M. Horowitz, R.W. Dutton, "Resistance Extraction from Mask Layout Data", IEEE Trans. Computer-aided Design, vol. CAD-2, no. 3, July 1983, pp 145-150.

[Huntingdon 1961] H.B. Huntington, A.R. Grone, "Current Induced Marker Motion in Gold Wires", J. Phys. Chem. Solids, vol. 20, 1961, p 76.

[Ishiuchi 1986] H. Ishiuchi, T. Watanabe, T. Tanaka, K. Kishi, M. Ishikawa, N. Goto, K. Kohyama, H. Noji, O.Ozawa, "Soft Error Rate Reduction in Dynamic Memory with Trench Capacitor Cell", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 235 - 238.

[Iyer 1984] S.S. Iyer, C-Y. Ting, "Electromigration Studies of Submicrometer Linewidth Al-Cu Conductors", IEEE Trans. Electron Dev., vol. 31, 1984, pp 1468 -1472.

[Johnson 1970] N.L. Johnson, S. Katz, Continuous Univariate Distributions: 1, Houghton-Mifflin, 1970, p 253.

[Kemp 1988] K.G. Kemp, K.F. Poole and D.F. Frost, "Failure Rate Prediction for Defect Enhanced Electromigration Wearout of Metal Interconnects", submitted to IEEE Trans. Reliab., Jan. 1988.

[Kinsbron 1978] E. Kinsbron, C.M. Melliar-Smith, A.T. English, T. Chynoweth, "Failure of Small Thin Film Conductors due to High Current-density Pulses", 16th Ann. Reliab. Phys. Symp., New York: IEEE, 1978, pp 248 - 254.

[Kinsbron 1980] E. Kinsbron, "A Model for the Width Dependence of Electromigration Lifetimes in Aluminum Thin Film Stripes", App. Phys. Lett., vol. 36, 1980, pp 968 - 970.

[LaCombe 1986] D.J. LaCombe, E.L. Parks, "The Distribution of Electromigration Failures", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 1 - 6.

[Lycoudes 1980] N.E. Lycoudes, C.C. Childers, "Semiconductor Instability Failure Mechanisms", IEEE Trans. Reliability, vol. R-29, Aug. 1980, pp 237 - 249.

[Maly 1985] W. Maly, "Modeling of Lithography Related Yield Losses for CAD of VLSI Circuits", IEEE Trans. Computer-aided Design, vol. CAD-4, no. 3, July 1985, pp 166 - 177.

[McCreight 1980] E.M. McCreight, "Efficient Algorithms for Enumerating Intersecting Intervals and Rectangles", Research Report CSL-80-9, Palo Alto, CA: Xerox PARC, 1980.

[McPherson 1986] J.W. McPherson, "Stress-dependent Activation Energy", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 12-18.

[Mead 1981] C. Mead and L. Conway (Eds.), Introduction to VLSI Systems, Reading, Mass.: Addison-Wesley, 1980.

[Miller 1978] R.J. Miller, "Electromigration Failure under Pulse Test Conditions", 16th Ann. Reliab. Phys. Symp., New York: IEEE, 1978, pp 241 - 247.

[Nikawa 1981] K. Nikawa, "Monte Carlo Calculations Based on the Generalized Electromigration Failure Model", 19th Int. Reliab. Phys. Symp., New York: IEEE, 1981, pp 175 - 181.

[O'Connor 1981] P.D.T. O'Connor, Practical Reliability Engineering, Heyden, 1981, p3.

[Ogura 1981] S. Ogura, P.J. Tsang, W.W. Walker, D.L. Critchlow, J.F. Shepard, "Elimination of Hot Electron Gate Current by the Lightly Doped Drain-source Structure", IEDM Tech. Digest, 1981, pp 651 - 654.

[Partridge 1985] J. Partridge, G. Littlefield, "Aluminum Electromigration Parameters", 23rd Int. Reliab. Phys. Symp., New York: IEEE, 1985, p 119.

[Prokop 1972] G.S. Prokop, R.R. Joseph, "Electromigration Failure at Aluminum-Silicon Contacts", J. Appl. Phys., vol. 43, no. 6, June 1972, pp 2595 - 2602.

[Razdan 1986] R. Razdan, A.J. Strojwas, "A Statistical Design Rule Developer", IEEE Trans. Computer-aided Design, vol. CAD-5, no. 4, Oct. 1986, pp 508 - 520.

[Rosenberg 1985] J.B. Rosenberg, "Geographical Data Structures Compared: A Study of Data Structures Supporting Region Queries", IEEE Trans. Computer-aided Design, vol. CAD-4, no. 1, Jan. 1985, pp 53 - 67.

[Sabnis 1986] A.G. Sabnis, "Hot Carrier Damage Mechanisms", Tutorial notes: 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986, pp 1.1 - 1.21.

[Sai-Halasz 1982] G. Sai-Halasz, M.R. Wordeman, R.H. Dennard, "Alpha-particle-induced Soft Error Rate in VLSI Circuits", IEEE Trans. Electron Dev., vol. 29, Apr 1982, pp 725 - 731.

[Schoen 1980] J.M Schoen, "A Model of Electromigration Failure under Pulsed Condition", J. Appl. Phys., vol. 51, no. 1 Jan. 1980, pp 508 - 512.

[Scoggan 1975] G.A. Scoggan, B.N. Agarwala, P.P. Peressini, A. Brouillard, "Width Dependence of Electromigration Life in Al-Cu, Al-Cu-Si and Ag Conductors", Proc. 13th Int. Reliab. Phys. Symp., IEEE, 1975, pp 151 - 158.

[Stapper 1983] C.H. Stapper, "Modeling of Integrated Circuit Defect Densities", IBM J. Res. Develop., vol. 27, Nov. 1983, pp 549 - 557.

[Sze 1981] S.M. Sze, Physics of Semiconductor Devices, 2nd Edition, Wiley-Interscience, 1981, p 378.

[Takeda 1982] E. Takeda, et al, "Submicrometer MOSFET Structure for Minimizing Hot-carrier Generation", IEEE J. Solid State Circuits, vol. 17, Apr. 1982, pp 241 - 247.

[Tasch 1978] A.F. Tasch, Jr., P.K. Chatterjee, H-S. Fu, T.C. Holloway, "The HI-C RAM Cell Concept", IEEE Trans. Electron. Dev., vol. ED-25, no. 1, Jan. 1978, pp 33 - 41.

[Terman 1985] C. Terman, RNL 4.2 User's Guide, Seattle, WA: UW/NW VLSI Consortium, Sieg Hall, FR-35, University of Washington, 1985, pp 2 - 9.

[Towner 1983] J.M. Towner, E.P. van de Ven, "Aluminum Electromigration under Pulsed DC Conditions", 21st Int. Reliab. Phys. Symp., New York: IEEE, 1983, pp 36 - 39.

[Ullman 1984] J.D. Ullman, Computational Aspects of VLSI, Rockville, MD: Computer Science Press, 1984, pp 382 - 393.

[Vaidya 1980] S. Vaidya, T.T. Sheng, A.K. Sinha, "Linewidth Dependence of Electromigration in Evaporated Al-0.5%Cu", App. Phys. Lett., vol. 36, 1980, pp 464 - 466.

[Vladimirescu 1981] A. Vladimirescu, K. Zhang, A.R. Newton, D.O. Pederson, A. Sangiovanni-Vincentelli, SPICE User's Guide, Berkeley, CA: Dept. of Electrical Engineering and Computer Sciences, University of California, 1981.

[Woods 1984] M.H. Woods, "The Implications of Scaling on VLSI Reliability", Tutorial notes: 22nd Int. Reliab. Phys. Symp., New York: IEEE, 1984, pp 6-1 to 6-30.

# Appendix A

# DERIVATION OF THE FAILURE RATE OF A SERIES-CONNECTED SYSTEM

$$h_s(t) = \frac{p_s(t)}{1 - P_s(t)} \tag{A.1}$$

$$= \frac{\frac{d}{dt}P_s}{1 - P_s(t)} \tag{A.2}$$

$$= \frac{\frac{d}{dt} \left(1 - \prod_{i=1}^{n} \left[1 - P_i(t)\right]\right)}{\prod_{j=1}^{n} \left[1 - P_j(t)\right]} \tag{A.3}$$

$$= \frac{\sum_{i=1}^{n} \left( \frac{d}{dt} P_i(t) \prod_{k=1}^{i-1} \left[ 1 - P_k(t) \right] \prod_{l=i+1}^{n} \left[ 1 - P_l(t) \right] \right)}{\prod_{i=1}^{n} \left[ 1 - P_i(t) \right]} \tag{A.4}$$

$$= \sum_{i=1}^{n} \frac{\frac{d}{dt} P_i(t)}{[1 - P_i(t)]} \tag{A.5}$$

$$= \sum_{i=1}^{n} h_i(t) \tag{A.6}$$
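The result (A.6) may be checked numerically. The sketch below compares the system failure rate obtained directly from (A.1) to (A.3), using a finite-difference derivative, with the sum of the element failure rates. The three lognormal elements and the evaluation time are arbitrary values chosen for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lognormal_cdf(t, t50, sigma):
    return STD_NORMAL.cdf((math.log(t) - math.log(t50)) / sigma)

def lognormal_pdf(t, t50, sigma):
    z = (math.log(t) - math.log(t50)) / sigma
    return STD_NORMAL.pdf(z) / (sigma * t)

def hazard(t, t50, sigma):
    """Failure rate h_i(t) = p_i(t) / (1 - P_i(t)) of one element."""
    return lognormal_pdf(t, t50, sigma) / (1.0 - lognormal_cdf(t, t50, sigma))

def series_hazard_direct(t, elements, dt=1e-4):
    """h_s(t) from (A.1)-(A.3), with a central difference for dP_s/dt."""
    def p_sys(time):
        survive = 1.0
        for t50, sigma in elements:
            survive *= 1.0 - lognormal_cdf(time, t50, sigma)
        return 1.0 - survive
    dp = (p_sys(t + dt) - p_sys(t - dt)) / (2.0 * dt)
    return dp / (1.0 - p_sys(t))

# Arbitrary illustrative elements: (t50, sigma) pairs.
elements = [(10.0, 0.5), (20.0, 0.8), (15.0, 0.3)]
t = 8.0
direct = series_hazard_direct(t, elements)
summed = sum(hazard(t, t50, s) for t50, s in elements)
print(direct, summed)
```

The two printed values agree to within the truncation error of the finite difference.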

# Appendix B

# FITTING A LOGNORMAL DISTRIBUTION TO THE MINIMUM ORDER STATISTIC

It will be shown that the minimum order statistic predicts approximately lognormal behavior for early failures. Consider a conductor consisting of n identical segments, each having parameters  $t_{50}$  and  $\sigma$ . From equations (2.6), (2.8) and (2.10), the failure rate of the conductor based on a series model is

$$h_s(t) \approx \frac{n}{\sqrt{2\pi}\sigma t} \exp\left(-\frac{1}{2} \left[\frac{\ln(t) - \ln(t_{50})}{\sigma}\right]^2\right) \tag{B.1}$$

Writing

$$x = \ln(t) \tag{B.2}$$

$$\mu = \ln(t_{50}) \tag{B.3}$$

$$G_s(x) = \ln(h_s(\exp(x))) \tag{B.4}$$

yields a second order polynomial in x:

$$G_s(x) = \ln(n) - \ln(\sqrt{2\pi}\sigma) - x - 0.5 \left[\frac{\mu - x}{\sigma}\right]^2 \tag{B.5}$$

The derivative of  $G_s(x)$  is

$$\frac{d}{dx}G_s = \frac{\mu - x}{\sigma^2} - 1\tag{B.6}$$

Alternatively, the conductor may be modeled as a single element described by a lognormal distribution with parameters  $t'_{50}$  and  $\sigma'$ :

$$h'(t) \approx \frac{1}{\sqrt{2\pi}\sigma't} \exp\left(-\frac{1}{2} \left[\frac{\ln(t) - \ln(t'_{50})}{\sigma'}\right]^2\right) \tag{B.7}$$

Writing

$$x = \ln(t) \tag{B.8}$$

$$\mu' = \ln(t'_{50}) \tag{B.9}$$

$$G'(x) = \ln(h'(\exp(x))) \tag{B.10}$$

we obtain:

$$G'(x) = -\ln(\sqrt{2\pi}\sigma') - x - 0.5 \left[\frac{\mu' - x}{\sigma'}\right]^2 \tag{B.11}$$

The derivative of G'(x) is

$$\frac{d}{dx}G'(x) = \frac{\mu' - x}{\sigma'^2} - 1 \tag{B.12}$$

G'(x) may be fitted to  $G_s(x)$  in the vicinity of the point a by setting

$$G'(x) = G_s(x) \tag{B.13}$$

and

$$\frac{d}{dx}G'(x) = \frac{d}{dx}G_s(x) \tag{B.14}$$

with x = a. $\sigma'$ is determined by solving the resultant transcendental equation

$$2\ln\left(\frac{n\sigma'}{\sigma}\right) = \left[\frac{\mu - a}{\sigma}\right]^2 \left[1 - \left(\frac{\sigma'}{\sigma}\right)^2\right] \tag{B.15}$$

With $\sigma'$ known, $\mu'$ may be determined from (B.14) as

$$\mu' = a + \left(\frac{\sigma'}{\sigma}\right)^2 (\mu - a) \tag{B.16}$$

The median time to failure is

$$t_{50}' = \exp(\mu') \tag{B.17}$$

The results of the curve fit appear in Fig. 2.5. The normalized variation of $t'_{50}$ and $\sigma'$ with n appears in Figs. 2.6 and 2.7 respectively.
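The fitting procedure may be sketched numerically as follows. Equation (B.15) is solved for $\sigma'$ by bisection, which is one possible choice of root finder and not necessarily the one used to produce the figures; $\mu'$ then follows from the slope-matching condition (B.14). The function name and the bracketing interval are assumptions made for this illustration, which is intended for moderate values of n.

```python
import math

def fit_lognormal(n, t50, sigma, a):
    """Fit a single lognormal (t50', sigma') to n series segments at x = a.

    n     -- number of identical segments (n >= 1)
    t50   -- median time to failure of one segment
    sigma -- lognormal standard deviation of one segment
    a     -- fitting point, expressed as ln(t)
    """
    mu = math.log(t50)
    k = ((mu - a) / sigma) ** 2

    def residual(s):
        # Left side minus right side of (B.15); increases monotonically in s.
        return 2.0 * math.log(n * s / sigma) - k * (1.0 - (s / sigma) ** 2)

    lo, hi = sigma * 1e-12, sigma        # residual(lo) < 0 <= residual(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    sigma_p = 0.5 * (lo + hi)
    # Equal slopes at x = a, from (B.6) and (B.12).
    mu_p = a + (sigma_p / sigma) ** 2 * (mu - a)
    return math.exp(mu_p), sigma_p
```

For n = 1 the fit returns the original parameters, as expected. For n > 1 it returns a smaller $\sigma'$, and when the fitting point lies below the segment median (a < μ) a smaller $t'_{50}$ as well.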

# Appendix C

# VARIATION OF $t_{50}$ AND $\sigma$ WITH n

The Cdf of a series-connected system of n identical elements is

$$P_s(t) = 1 - (1 - P(t))^n \tag{C.1}$$

Let  $t_{50s}$  be the Mtf and  $\sigma_s$  the Standard Deviation of the system lifetime. If  $t = t_{50s}$ , then  $P_s(t) = 0.5$ . Equation (C.1) may be solved for the corresponding value of P(t) as follows:

$$P(t_{50s}) = 1 - (1 - P_s(t_{50s}))^{\frac{1}{n}} \tag{C.2}$$

$$= 1 - (0.5)^{\frac{1}{n}} \tag{C.3}$$

Using equation (C.3), we may determine  $t_{50s}$  for a known distribution P(t). The normalized Mtf is shown as a function of n in Fig. C.1.
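For a lognormal P(t), equation (C.3) can be inverted with the standard normal quantile function. The sketch below is one way of evaluating the normalized Mtf; the function name is an assumption, and `statistics.NormalDist` is used for the quantile.

```python
import math
from statistics import NormalDist

def normalized_mtf(n, sigma):
    """t50s / t50 for n identical lognormal elements in series."""
    p = 1.0 - 0.5 ** (1.0 / n)           # equation (C.3)
    z = NormalDist().inv_cdf(p)          # standard normal quantile of p
    # P(t) = Phi((ln t - ln t50) / sigma), so ln(t50s / t50) = sigma * z.
    return math.exp(sigma * z)
```

With a single element the ratio is 1, and it decreases steadily as n grows.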

Determining  $\sigma_s$  of the minimum order statistic is complex for the lognormal distribution. The following approximate method was used. The prototype normal distribution P(x), where  $x = \ln(t)$  was approximated by a 3-parameter Weibull distribution, with c = 3.288 [Johnson 1970]:

$$P(x) = 1 - \exp\left(-\left[\frac{x-\beta}{\alpha}\right]^{c}\right) \tag{C.4}$$

Substituting (C.4) in (C.1) yields the Cdf of the system, which is also a Weibull distribution:

$$P_s(x) = 1 - \exp\left(-\left[\frac{x-\beta}{\alpha'}\right]^c\right) \tag{C.5}$$

where

$$\alpha' = \alpha n^{-\frac{1}{c}} \tag{C.6}$$



Figure C.1:  $t_{50s}/t_{50}$  vs. n

The standard deviation of  $P_s(x)$  is

$$\sigma_s = \alpha n^{-\frac{1}{c}} \left[ \Gamma(2c^{-1} + 1) - (\Gamma(c^{-1} + 1))^2 \right]^{\frac{1}{2}} \tag{C.7}$$

and therefore

$$\frac{\sigma_s}{\sigma} = n^{-\frac{1}{c}} = n^{-0.304} \tag{C.8}$$

Fig. C.2 shows the variation in  $\sigma_s/\sigma$  as a function of n.
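The ratio may be evaluated directly from the Weibull approximation. The sketch below uses the standard result that the standard deviation of a Weibull distribution with scale α and shape c is α times the square root of Γ(2/c + 1) − Γ(1/c + 1)², so the Gamma-function factor cancels in the ratio. The function names are assumptions.

```python
import math

C = 3.288   # Weibull shape parameter quoted in the text

def weibull_std(alpha, c=C):
    """Standard deviation of a Weibull distribution (scale alpha, shape c)."""
    g1 = math.gamma(1.0 / c + 1.0)
    g2 = math.gamma(2.0 / c + 1.0)
    return alpha * math.sqrt(g2 - g1 * g1)

def sigma_ratio(n, c=C):
    """sigma_s / sigma for n elements, using alpha' = alpha * n**(-1/c)."""
    return weibull_std(n ** (-1.0 / c), c) / weibull_std(1.0, c)
```

For example, `sigma_ratio(10)` is approximately 0.50, so ten elements in series roughly halve the standard deviation of log-lifetime.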



Figure C.2:  $\sigma_s/\sigma$  vs. n

# Appendix D

# RELIABILITY ANALYSIS OF A 3100 GATE CMOS STANDARD CELL DEVICE

# D.1 Approximate Models of Contact, Via and Step Segments

It is assumed that the failure rates of these segment types have the same dependency on time, current density and temperature as a Run segment. In all three cases, a conductor crosses a beveled discontinuity. It is assumed that the Aluminum has been deposited to a uniform height d in the vertical direction. If the height of the discontinuity is h and the bevel angle  $\Theta$ , then the metal thickness on the sidewall is

$$d_o = d\cos(\Theta) \tag{D.1}$$

The length of the sidewall is

$$L_o = \frac{h}{\sin(\Theta)} \tag{D.2}$$

Only the sidewall is considered to contribute to the failure rate. Steps, Vias and Contacts are therefore represented as Runs having length =  $L_o$  and thickness =  $d_o$ .
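Equations (D.1) and (D.2) translate directly into a small helper. The function name and the use of degrees for the bevel angle are choices made for this illustration.

```python
import math

def sidewall_run(d, h, theta_deg):
    """Equivalent Run (length, thickness) for a beveled discontinuity.

    d         -- metal thickness deposited in the vertical direction
    h         -- height of the discontinuity (same unit as d)
    theta_deg -- bevel angle in degrees, 0 < theta_deg <= 90
    """
    theta = math.radians(theta_deg)
    thickness = d * math.cos(theta)      # d_o, equation (D.1)
    length = h / math.sin(theta)         # L_o, equation (D.2)
    return length, thickness
```

A steeper bevel shortens the sidewall but thins the metal on it; at 90 degrees the model predicts a sidewall thickness that vanishes.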



Figure D.1: Layout of  $V_{DD}$  and Ground buses

# D.2 Derivation of The Failure Rate of a Power Bus Section

The $V_{DD}$ and Ground bus structures are shown in Fig. D.1. Gates are assumed to be uniformly distributed along each branch of the bus, a distance $L_g$ apart. Each gate draws a current I from the $V_{DD}$ bus and feeds it into the Ground bus. Numbering the bus segments from 1 (furthest from the bonding pad) to n, the current in the j-th segment is jI.

It is assumed that  $J \leq 5.0 \times 10^5 A/cm^2$ , therefore  $J_{eff} \approx J$ . From equation (2.17), the Mtf of the j-th segment is

$$t_{50j} = \frac{\gamma}{jI} \tag{D.3}$$

where

$$\gamma = GWd \exp(\frac{E_a}{kT}) \tag{D.4}$$

The failure rate of a bus branch is

$$h_B(t) = \sum_{j=1}^n \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(-0.5 \left[\frac{\ln(t) - \ln(\frac{\gamma}{jI})}{\sigma}\right]^2\right) \tag{D.5}$$

It is possible to derive a closed form expression for  $h_B(t)$ . Let  $x = \ln(t)$ . The failure rate of the j-th segment may be written as follows:

$$h_{j}(t) = \frac{1}{\sqrt{2\pi}\sigma t} \exp(\sigma^{-2}[-0.5x^{2} + x\ln(\gamma) - x\ln(jI) - 0.5(\ln(\gamma) - \ln(jI))^{2}]) \tag{D.6}$$

$$= \frac{1}{\sqrt{2\pi}\sigma t} \exp(\sigma^{-2}[-0.5x^2 + x\ln(\gamma) - 0.5\ln^2(\gamma) + \ln(jI)(\ln(\gamma) - x - 0.5\ln(jI))]) \tag{D.7}$$

$$= \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(-0.5\left[\frac{\ln(t) - \ln(\gamma)}{\sigma}\right]^{2}\right) \times \exp\left(\frac{\ln(\gamma) - \ln(t) - 0.5\ln(jI)}{\sigma^{2}}\ln(jI)\right) \tag{D.8}$$

$$= \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(-0.5\left[\frac{\ln(t) - \ln(\gamma)}{\sigma}\right]^2\right) \times \exp(\Phi(jI)\ln(jI)) \tag{D.9}$$

where

$$\Phi(jI) = \frac{\ln\left(\frac{\gamma}{t\sqrt{jI}}\right)}{\sigma^2} \tag{D.10}$$

Therefore

$$h_B(t) = \frac{1}{\sqrt{2\pi}\sigma t} \exp\left(-0.5 \left[\frac{\ln(t) - \ln(\gamma)}{\sigma}\right]^2\right) \sum_{j=1}^n (jI)^{\Phi(jI)} \tag{D.11}$$

$$= C(t) \sum_{j=1}^n (jI)^{\Phi(jI)} \tag{D.12}$$

The summation may be approximated by an integral for large n, with j a continuous variable:

$$h_B(t) \approx C(t) \int_0^n (jI)^{\Phi(jI)} dj \tag{D.13}$$

$$= \frac{C(t)\sigma^2 n}{\ln(\frac{\gamma}{nIt})} \exp\left(\frac{\ln(nI)\ln(\frac{\gamma}{t\sqrt{nI}})}{\sigma^2}\right) \tag{D.14}$$
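The factorization leading to (D.11) may be checked numerically by comparing it with the direct summation (D.5). The sketch below does this for arbitrary illustrative values of t, n, I, γ and σ, all in consistent but unspecified units. It does not evaluate the integral approximation (D.14).

```python
import math

def h_bus_sum(t, n, current, gamma, sigma):
    """Bus-branch failure rate by direct summation, equation (D.5)."""
    norm = math.sqrt(2.0 * math.pi) * sigma * t
    total = 0.0
    for j in range(1, n + 1):
        z = (math.log(t) - math.log(gamma / (j * current))) / sigma
        total += math.exp(-0.5 * z * z) / norm
    return total

def h_bus_factored(t, n, current, gamma, sigma):
    """The same quantity as C(t) * sum of (jI)**Phi(jI), equation (D.11)."""
    z0 = (math.log(t) - math.log(gamma)) / sigma
    c_t = math.exp(-0.5 * z0 * z0) / (math.sqrt(2.0 * math.pi) * sigma * t)
    total = 0.0
    for j in range(1, n + 1):
        ji = j * current
        phi = math.log(gamma / (t * math.sqrt(ji))) / sigma ** 2   # (D.10)
        total += ji ** phi
    return c_t * total

# Arbitrary illustrative values.
direct = h_bus_sum(50.0, 20, 0.01, 100.0, 0.5)
factored = h_bus_factored(50.0, 20, 0.01, 100.0, 0.5)
print(direct, factored)
```

The two forms agree to floating-point precision, which confirms the algebra of (D.6) to (D.11) for these values.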

# Appendix E

# A TEST CHIP FOR CALIBRATION OF RELIABILITY MODELS

A test chip for calibrating the reliability models used in this thesis was designed. This device contains the following test structures:

1. METAL1 OVER POLY BARS: $32\mu m \times 250$
2. METAL1 OVER POLY BARS: $16\mu m \times 250$
3. METAL1 OVER POLY BARS: $8\mu m \times 250$
4. METAL1 OVER POLY BARS: $4\mu m \times 250$
5. METAL1 OVER POLY BARS: $2\mu m \times 250$
6. 52 VIAS: $8\mu m \times 8\mu m$
7. 44 VIAS: $4\mu m \times 16\mu m$
8. 66 VIAS: $4\mu m \times 8\mu m$
9. 82 VIAS: $4\mu m \times 4\mu m$
10. 76 VIAS: $2\mu m \times 8\mu m$
11. 100 VIAS: $2\mu m \times 4\mu m$
12. 110 VIAS: $2\mu m \times 2\mu m$
13. 152 CONTACTS: $2\mu m \times 2\mu m$
14. 64 CONTACTS: $8\mu m \times 8\mu m$
15. 64 CONTACTS: $4\mu m \times 16\mu m$
16. 88 CONTACTS: $4\mu m \times 8\mu m$
17. 106 CONTACTS: $4\mu m \times 4\mu m$
18. 106 CONTACTS: $2\mu m \times 8\mu m$
19. 134 CONTACTS: $2\mu m \times 4\mu m$
20. RUNS (M2): $4\mu m \times 1000\mu m$
21. RUNS (M2): $8\mu m \times 1000\mu m$
22. RUNS (M2): $16\mu m \times 1000\mu m$
23. RUNS (M2): $32\mu m \times 1000\mu m$
24. RUNS (M1): $2\mu m \times 1000\mu m$
25. RUNS (M1): $4\mu m \times 1000\mu m$
26. RUNS (M1): $8\mu m \times 1000\mu m$
27. RUNS (M1): $16\mu m \times 1000\mu m$
28. RUNS (M1): $32\mu m \times 1000\mu m$

The layout of the test chip appears in Fig. E.1.





# Appendix F

# PUBLICATIONS BY THE AUTHOR WHICH RELATE TO THIS THESIS

- D.F. Frost, K.F. Poole, "A Method for Predicting VLSI-Device Reliability using Series Models for Failure Mechanisms", IEEE Trans. Reliability, vol. R-36, June 1987, pp 234 - 242.
- D.F. Frost, K.F. Poole, D.A. Haeussler, "Reliant: a Reliability Analysis Tool for VLSI Interconnects", Custom Integrated Circuits Conf., New York: IEEE, 1988, pp 27.8.1 - 27.8.4.
  - The authors were invited to submit this paper, in an extended form, for publication in IEEE Journal of Solid State Circuits. This paper has recently been accepted for publication.
- D.F. Frost, K.F. Poole, "Estimation of VLSI Interconnect Reliability using a Circuit Simulator", Southeastern Symposium on Systems Theory (SSST-87), New York: IEEE, 1987, 5 pages.
- K.G. Kemp, K.F. Poole, D.F. Frost, "Failure Rate Prediction for Defect Enhanced Electromigration Wearout of Metal Interconnects", submitted to IEEE Trans. Reliab., Jan. 1988.

Copies of these papers appear on the following pages.

# A Method for Predicting VLSI-Device Reliability Using Series Models for Failure Mechanisms

David F. Frost, Member IEEE, Clemson University, Clemson

Kelvin F. Poole, Member IEEE, Clemson University, Clemson

Key Words: Order statistic, VLSI device, Failure mechanism.

Reader Aids:

- Purpose: Advance the state-of-the-art
- Special math needed for explanations: Statistics
- Special math needed to use results: None
- Results useful to: IC design engineers, CAD tool developers.

Abstract: A series model is used to determine the intrinsic reliability of an integrated circuit. An analysis of electromigration in the interconnect system of a 200 000 transistor VLSI device shows that the failure rate exceeds 10 FIT (failures per $10^9$ hours) within 2 years when operating at a temperature of 80° C. These results indicate the importance of fundamental wear-out mechanisms as factors in VLSI device reliability, under usual operating conditions. The analysis, as applied to a generic chip, predicts that temperature, burn-in, and complexity all adversely affect the device reliability.

The paper demonstrates the feasibility of using the information available in the design database together with specific failure models to predict (during the design phase) the reliability of an IC. These techniques can be used to develop a CAD tool for reliability prediction.

#### 1. INTRODUCTION

The reliability of an integrated circuit (IC) is the probability that it will perform its required function under stated conditions for a stated period of time [1]. Methods of enhancing the reliability of ICs generally fall into one of three categories:

- Improving the reliability of the part by better design of its internal components and/or better manufacturing methods;
  - Using more effective screening procedures;
- Using active or standby redundancy within the IC, enabling it to perform its function despite the failure of some internal components.

The reliability of an IC is traditionally pictured in terms of the bath-tub curve of failure rate versus time (figure 1). Burn-in is used to remove devices that contain gross built-in flaws, which normally fail during the infant-mortality phase. Physical mechanisms which cause device failure are often modeled by an Arrhenius relationship:

$$\lambda(T) = \lambda(T_R) \exp\left[\frac{E_a}{k}\left(\frac{1}{T_R} - \frac{1}{T}\right)\right] \tag{1.1}$$





Fig. 1. Qualitative description of failure rate versus time.

Notation

 $\lambda(T)$  failure rate at temperature T

T temperature

 $T_R$  reference temperature

 $E_a$  activation energy for the particular failure mechanism

k Boltzmann's constant

Operation at an elevated temperature increases the failure rate, thus accelerating the passage of the infant-mortality phase.
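As a rough illustration of this acceleration, the sketch below evaluates the ratio $\lambda(T)/\lambda(T_R)$, assuming the Arrhenius relationship takes the common form $\exp[(E_a/k)(1/T_R - 1/T)]$. The function name and the example temperatures and activation energy are illustrative assumptions, not values from the paper.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant k in eV/K


def arrhenius_acceleration(e_a_ev, temp_k, ref_temp_k):
    """Ratio lambda(T) / lambda(T_R) for the assumed Arrhenius form."""
    exponent = (e_a_ev / BOLTZMANN_EV_PER_K) * (1.0 / ref_temp_k - 1.0 / temp_k)
    return math.exp(exponent)


# Hypothetical example: E_a = 0.5 eV, stress at 125 C relative to 55 C.
factor = arrhenius_acceleration(0.5, 398.15, 328.15)
```

A factor greater than one means that the stress temperature ages the device faster than the reference temperature, which is the effect exploited by burn-in.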

The reliability of devices after screening is currently predicted using semi-empirical failure rate models based on the measured lifetimes of a large number of devices. By far the most widely-used of these models is Mil-Hdbk-217D [2]:

$$\lambda = \Pi_Q \left[ C_1 \Pi_T \Pi_V + (C_2 + C_3) \Pi_E \right] \Pi_L \quad \text{(failures per } 10^6 \text{ hours)} \tag{1.2}$$

Notation

 $\Pi_Q$  quality factor dependent on burn-in procedure applied

 $\Pi_T$  temperature acceleration factor

 $\Pi_V$  voltage derating stress factor

 $\Pi_E$  application environment factor

 $\Pi_L$  learning factor

 $C_1, C_2, C_3$  complexity failure-rates dependent on the number of equivalent gates, number of pins, and package type.

Values for these constants are tabulated [2] for various devices and technologies. Data for this model are constantly under revision to accommodate new technologies, and proposals for improvements to the model appear from time to time [3 - 8].

Correct application of Mil-Hdbk-217 requires an understanding of the underlying assumptions and inherent limitations of this type of model [9], which was developed to answer the system design engineer's need to predict the reliability of a system containing a large number of components, including ICs. The individual IC is treated strictly as a single component and the model does not relate the failure rate to its specific internal structure (eg, mask layout) or process parameters. The mask layout is reflected indirectly in the complexity factor  $C_1$ , and all constants in the equation have been determined as average values for a specific family of fabrication processes (eg, CMOS). The IC designer requires a model which can predict the reliability-implications of structural design decisions such as scaling the dimensions of a transistor or using narrower conductors. The Mil-Hdbk-217 model is obviously of no value in making such predictions. Also, its ability to predict the reliability of current and future VLSI devices is open to suspicion. For example, consider commercial grade MOS VLSI devices manufactured in a mature technology, operating at room temperature in a ground, benign environment. Application of (1.2) yields the results shown in table I.

Table I
Failure Rate vs. Gate Count for MOS Devices
According to Mil-Hdbk-217D.

| G (gate count)  | λ (FIT) |
|-----------------|---------|
| 10<sup>3</sup>  | 1268    |
| 10<sup>4</sup>  | 3134    |
| 10<sup>5</sup>  | 8501    |
| 10<sup>6</sup>  | 23960   |

1 FIT = 1 failure/10<sup>9</sup> hours

These results are unrealistic, as they imply that a typical 16-bit microprocessor ( $10^4$  gates) in a personal computer application would have a failure rate of 3134 FIT, which is unacceptably high. The authors of the model are aware of this and a new VLSI reliability model for G > 3000 is currently under development [10].

This paper shows that by taking wear-out failure into account, it is possible to analyze a VLSI design and to provide an accurate assessment (during the design phase) of the wear-out limited reliability of the IC. A series model for calculating the reliability of an IC is presented. Examples of the application of this method to electromigration in complex VLSI interconnects are given. Our results show that electromigration wear-out produces an unacceptably high failure rate for VLSI as we approach the 10<sup>6</sup>-transistor chip. The trends to increased complexity and concomitant reduction of feature size accelerate wear-out and could ultimately limit the useful life of the component.

### 2. A SERIES MODEL OF IC RELIABILITY

#### Assumptions

- 1. The IC mask produces many basic elements that are not identically distributed.
- 2. The distributions of life for each element are known.
- 3. The failure of any element causes the IC to fail (series system).
- 4. The states of the elements (good, failed) are mutually statistically independent.

#### Notation

 $P_i(t)$  Cdf of element i

 $\lambda_i(t)$  failure rate of element i

 $\lambda_s(t)$  failure rate of the IC ("system")

n number of elements in the IC

 $F_1(t)$  Cdf corresponding to  $\lambda_s(t)$ 

The device failure-time Cdf is the well-known series formula:

$$F_1(t) = 1 - \prod_{i=1}^{n} (1 - P_i(t)). \tag{2.1}$$

The device failure rate can be determined using the fact that the failure rate of a series system equals the sum of the failure rates of its elements [21]:

$$\lambda_s(t) = \sum_{i=1}^n \lambda_i(t). \tag{2.2}$$

For $\lambda_1(t) = \lambda_2(t) = \dots = \lambda_n(t) = \lambda(t)$, eq. (2.2) reduces to:

$$\lambda_s(t) = n \,\lambda(t) \tag{2.3}$$

The series model, which is also a minimum-order-statistic model, can be applied at the chip level, but it also describes the relationship of an individual failure mechanism to the dimensions of the structure in which it occurs.
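A minimal sketch of (2.1) and (2.2) follows. The element failure rates are hypothetical constant rates, chosen only because the series Cdf then has a closed form against which the product formula can be checked.

```python
import math


def series_cdf(element_cdfs):
    """Eq. (2.1): Cdf of a series system from the element Cdf values at one time."""
    survival = 1.0
    for p in element_cdfs:
        survival *= 1.0 - p
    return 1.0 - survival


def series_failure_rate(element_rates):
    """Eq. (2.2): the system failure rate is the sum of the element failure rates."""
    return sum(element_rates)


# Hypothetical constant-rate elements (failures per hour), for illustration only.
rates = [1.0e-9, 2.0e-9, 5.0e-9]
t = 1.0e5  # hours
cdfs = [1.0 - math.exp(-r * t) for r in rates]

system_cdf = series_cdf(cdfs)
# For constant rates the series Cdf must equal 1 - exp(-sum(rates) * t).
expected_cdf = 1.0 - math.exp(-series_failure_rate(rates) * t)
```

The two values agree because the product of the element survival probabilities is the exponential of the summed rates, which is the content of (2.2).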

The physical processes which cause wear-out have been widely studied [11-20]. These include:

- oxide shorts [12-13];
- metallization failure due to electromigration or corrosion [14-16];
- threshold-voltage shifting effects in MOS devices [17-19];
- alpha-particle induced soft errors [20].

In general, failure mechanisms are reactions which cause the directed movement of a physical quantity such as material or charge. For electromigration, a void is formed in a conductor and catastrophic failure occurs when the cross-sectional area of the defect equals the cross-sectional area of the conductor. Consider the conductor as consisting of n "identical" elements connected in series, each potentially containing a defect. The conductor fails when any one element fails, and so the reliability of the conductor is that of the minimum order statistic of the n elements. A similar situation arises for oxide breakdown, where a defect develops through the oxide layer, leading to a catastrophic short circuit when the length of the defect is equal to the oxide thickness. A large dielectric area can be considered as a parallel connection of n small elements, with each element potentially enclosing a defect. Because a short in any one element causes failure, the reliability of the n elements is once again given by the reliability of the minimum order statistic.

A defect occurs in an IC when a failure mechanism has proceeded for a sufficient time (the time to failure) to degrade the circuit performance beyond acceptable limits. Defects fall into two categories:

- Structural defects. These represent abrupt changes in circuit topology caused for example by a conductor becoming an open-circuit.
- Performance defects. Some failure mechanisms produce a continuous degradation of circuit performance until some threshold of acceptability is exceeded. This type of defect can be produced by hot electron injection, where the threshold voltage of an MOS transistor shifts with time until circuit operation becomes marginal.

From the perspective of modelling the IC time to failure, the two defect types may be treated in the same way.

#### 3. ELECTROMIGRATION

Electromigration in metal conductors has been widely studied during the last 15 years [14-16, 22-31]. When an electron current passes through a conductor, some of the momentum of the electrons is transferred to the metal atoms, resulting in a movement of metal in the direction of electron flow. When a flux divergence occurs, the rates of mass transport towards and away from a point differ and void or hillock formation results. Flux divergences may be caused by microscopic inhomogeneities in the conductor such as grain boundary triple points or grain size variations. The bulk of the published literature deals with straight conductors and it is assumed that the incidence of electromigration is related to grain-size effects. Flux divergence may also arise in more complex conductor patterns because of variations in the effective conductor width or thickness; current crowding occurs on the inside of a 90° bend, for example. The analysis at high current densities is further complicated by local thermal gradients which influence the rate of mass transport.

### 3.1 Classification of Conductor Shapes

In applying the model to electromigration, each conductor is fractured into its component elements and the conductor failure rate is calculated as the sum of the failure rates of each individual element. Four basic element types are identified; see figure 2:

- straight segments of length L
- 90° bends
- steps caused by the non-planar surface beneath the conductor
- contact windows or vias.



Fig. 2. Basic conductor shapes found in integrated circuit interconnects: a) straight section, b) 90° bend, c) step over a surface discontinuity, d) contact window.

This analysis assumes that the states of all elements are statistically independent. This places two restrictions on the validity of the model.

- 1. It is limited to low current densities ( $< 10^6 A/cm^2$ ) where thermal effects are negligible.
- 2. The minimum element length must be greater than the length of the locality which influences the growth of a single defect. Assuming that voids grow along grain boundaries [22], the mean size of a defect is of the order of the mean grain size, typically less than 3  $\mu m$ . LaCombe & Parks [23] found that a hillock always forms within 10  $\mu m$  of a void, indicating that interactions may occur over this distance. However, their measurements were conducted at a current density of  $2 \times 10^6 \, A/cm^2$  and under those conditions thermal interaction would have played a role.

Early failures are associated with highly localized defects while late failures may involve mass transport over larger distances. In the remainder of this section, a basic element length of  $10 \, \mu \mathrm{m}$  is used and the results are valid for the first 10% of all failures.

### 3.2 Analysis of Straight Segments

Consider a straight conductor element having length  $L_E = 10 \,\mu m$  and variable width W. Measurements of electromigration time to failure show a lognormal pdf of the form:

$$P_E(t) = \frac{1}{\sigma\sqrt{2\pi} t} \exp\left[-\frac{1}{2}\left(\frac{\ln t - \ln t_{50}}{\sigma}\right)^2\right], \quad (3.1)$$

 $t_{50} \equiv$ median time to failure

 $\sigma \equiv$ standard deviation of ln(time), which is independent of time.
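The lognormal pdf of (3.1) and its Cdf can be sketched as follows. The function names are illustrative, and the Cdf uses the standard-normal Cdf from Python's `statistics` module.

```python
import math
from statistics import NormalDist


def lognormal_pdf(t, t50, sigma):
    """Eq. (3.1): lognormal pdf of the time to failure."""
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * t)


def lognormal_cdf(t, t50, sigma):
    """Cdf matching eq. (3.1): standard-normal Cdf of the scaled log time."""
    z = (math.log(t) - math.log(t50)) / sigma
    return NormalDist().cdf(z)


# By definition of the median, half of the elements have failed at t = t50.
half = lognormal_cdf(1000.0, 1000.0, 0.8)
```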

The median time to failure is a function of current density, temperature, and conductor width:

$$t_{50} = A(W) J^{-N} \exp\left[\frac{E_a}{kT}\right], \tag{3.2}$$

 $J \equiv$ current density

 $E_a \equiv$ activation energy = 0.54 eV [24]

 $N \equiv$ an exponent approximately equal to 1 for $J < 4 \times 10^5 \, A/cm^2$ [25]

 $A(W) \equiv$ a material constant that is a function of width.

Based on the experimental data of Kinsbron [26] the following empirical expression for A(W) was derived for Al - 0.5 wt % Cu conductor, 250  $\mu m$  long and 5000 Å thick:

$$A(W) = 189(21.5W - 66 + 250W^{-1.7}) \tag{3.3}$$



Fig. 3. Variation of median time to failure (mtf) of n elements as a function of n (normalized to the mtf of a single element). Parameter  $\sigma$  is the standard deviation of a single element.
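The normalized median described in the caption of figure 3 follows from (2.1) when all n elements are independent and share the same lognormal distribution. Setting $F_1(t) = 0.5$ gives $\Phi(z) = 1 - 0.5^{1/n}$ for the scaled log time $z$, so the ratio of the n-element median to the single-element median is $\exp[\sigma\,\Phi^{-1}(1 - 0.5^{1/n})]$. The sketch below evaluates this ratio; the function name is an assumption for illustration.

```python
import math
from statistics import NormalDist


def normalized_median(n, sigma):
    """Median life of n identical series elements, relative to one element.

    Solves 1 - (1 - Phi(z))**n = 0.5 for z, then maps z back to a time ratio.
    """
    p = 1.0 - 0.5 ** (1.0 / n)
    return math.exp(sigma * NormalDist().inv_cdf(p))


# Ratio for a 250 um line treated as 25 elements of 10 um each.
ratio_25 = normalized_median(25, 0.9)  # sigma = 0.9 is a hypothetical value
```

A ratio below one reflects the fact that a long conductor fails sooner than any one of its elements, and a larger $\sigma$ makes the ratio smaller.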

Electromigration lifetime is generally measured on conductors several hundred microns long, as these fail more quickly than short conductors. The Kinsbron data were scaled to a  $10 \ \mu m$  long conductor by solving (2.1) for  $F_1(t) = 0.5$  and n = 250/10 = 25. The scaling ratio depends on  $\sigma$  (see figure 3), which is in turn a function of W. J is written in terms of width and thickness, as follows:





Fig. 4. Partitioning of a simple conductor joining two contact windows into its component shapes.

$$J = I \times 10^8 / Wd$$

 $I \equiv \text{current } (A);$ 

 $d = \text{thickness } (\mu m).$ 

The median time to failure of the basic conductor element is therefore

$$t_{50}(W, d, I, T) = 1.523 \times 10^{-5} \frac{Wd}{I} \left(W - 3.07 + \frac{11.63}{W^{1.7}}\right) \exp\left(\frac{5800}{T}\right) \tag{3.4}$$
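Equations (3.3) and (3.4), together with the expression for J, translate directly into code. The sketch below assumes W and d in microns, I in amperes and T in kelvin; the time unit of $t_{50}$ is whatever unit the fitted constants carry, which the text does not state explicitly. Function names are illustrative.

```python
import math


def a_of_width(w_um):
    """Eq. (3.3): empirical width dependence fitted to the Kinsbron data."""
    return 189.0 * (21.5 * w_um - 66.0 + 250.0 * w_um ** -1.7)


def current_density(i_amp, w_um, d_um):
    """J in A/cm^2 for current in A and cross-section dimensions in microns."""
    return i_amp * 1.0e8 / (w_um * d_um)


def t50_element(w_um, d_um, i_amp, temp_k):
    """Eq. (3.4): median time to failure of one 10 um basic element."""
    shape = w_um - 3.07 + 11.63 / w_um ** 1.7
    return 1.523e-5 * (w_um * d_um / i_amp) * shape * math.exp(5800.0 / temp_k)


# Hypothetical example: 3 um wide, 0.6 um thick, 0.1 mA, 80 C.
t50_example = t50_element(3.0, 0.6, 1.0e-4, 353.15)
```

Because N is taken as 1, doubling the current halves the median life, and the bracketed width term of (3.4) is the bracket of (3.3) divided by 21.5.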

Kinsbron also measured  $\sigma$  as a function of width. Scaling of this parameter to shorter conductor lengths may also be achieved using the *series* model, but this is complex for the lognormal distribution. A much simpler scaling procedure is possible by noting that  $\sigma$  is the standard deviation of a s-normal distribution P(x), where  $x = \ln(t)$ . This s-normal distribution may be closely approximated by the following 3-parameter Weibull distribution, with c=3.288 [27]:

$$P(x) = 1 - \exp[-\{(x - \xi_0)/\alpha\}^c]. \tag{3.5}$$

Substituting (3.5) in (2.1), yields the Cdf of the device failure time which is also a Weibull distribution:

$$F_1(t) = 1 - \exp\left[-\left\{\frac{x - \xi_0}{\alpha'}\right\}^{c}\right], \tag{3.6}$$

$$\alpha' \equiv \frac{\alpha}{n^{1/c}}.$$

The standard deviation of  $F_1(t)$  is:

$$\sigma_n = \frac{\alpha}{n^{1/c}} \left[ \Gamma(2c^{-1} + 1) - \left[ \Gamma(c^{-1} + 1) \right]^2 \right]^{1/2}, \tag{3.7}$$

and therefore

$$\frac{\sigma_n}{\sigma} = n^{-1/c} = n^{-0.304}. \tag{3.8}$$

The Kinsbron data were scaled using (3.8) and the following function was fitted to the result:

$$\sigma(W) = \frac{2.192}{W^{2.625}} + 0.787 \tag{3.9}$$
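The scaling rule (3.8) and the fitted function (3.9) can be sketched as below. The helper names are assumptions; the constants are those quoted in the text.

```python
WEIBULL_SHAPE_C = 3.288  # shape parameter of the Weibull approximation [27]


def sigma_ratio(n, c=WEIBULL_SHAPE_C):
    """Eq. (3.8): sigma of n series elements divided by sigma of one element."""
    return n ** (-1.0 / c)


def sigma_of_width(w_um):
    """Eq. (3.9): sigma of the 10 um basic element as a function of width in um."""
    return 2.192 / w_um ** 2.625 + 0.787


# A 250 um test line corresponds to n = 25 basic elements, so a measured
# sigma is divided by sigma_ratio(25) to estimate the single-element sigma.
measured_sigma = 0.5  # hypothetical measurement on a 250 um line
element_sigma = measured_sigma / sigma_ratio(25)
```

Since the ratio is below one for n > 1, the single-element distribution is always broader than the distribution measured on the long test line.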

Let the failure rate of the basic element be  $\lambda_E(t, W, d, I, T)$ . If the analysis is limited to the first 10% of failures (element Cdf $\le 0.1$), the failure rate, which is the pdf divided by the survival probability, may be approximated by the pdf $P_E(t)$ of (3.1) with an error of at most 10%:

$$\lambda_E(t, W, d, I, T) \approx P_E(t) = \frac{1}{\sigma(W)\sqrt{2\pi}\,t} \exp\left[-\frac{1}{2}\left(\frac{\ln t - \ln t_{50}(W, d, I, T)}{\sigma(W)}\right)^{2}\right]. \tag{3.10}$$

The failure rate of a conductor of length L is simply:

$$\lambda(t, W, d, I, T) = \frac{L}{L_E} \lambda_E(t, W, d, I, T) = n_E \lambda_E(t, W, d, I, T), \tag{3.11}$$

 $n_E \equiv$ number of basic elements.
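Combining (3.4), (3.9), (3.10) and (3.11) gives a compact routine for the failure rate of a straight conductor. This is a sketch under the early-failure approximation only (first 10% of failures); function names and the example dimensions are hypothetical, and t must be in the same time unit as $t_{50}$.

```python
import math

ELEMENT_LENGTH_UM = 10.0  # basic element length L_E used in the text


def t50_element(w_um, d_um, i_amp, temp_k):
    """Eq. (3.4): median time to failure of one basic element."""
    shape = w_um - 3.07 + 11.63 / w_um ** 1.7
    return 1.523e-5 * (w_um * d_um / i_amp) * shape * math.exp(5800.0 / temp_k)


def sigma_of_width(w_um):
    """Eq. (3.9): standard deviation of ln(time) for one basic element."""
    return 2.192 / w_um ** 2.625 + 0.787


def element_failure_rate(t, w_um, d_um, i_amp, temp_k):
    """Eq. (3.10): failure rate approximated by the lognormal pdf."""
    sigma = sigma_of_width(w_um)
    z = (math.log(t) - math.log(t50_element(w_um, d_um, i_amp, temp_k))) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * t)


def conductor_failure_rate(t, length_um, w_um, d_um, i_amp, temp_k):
    """Eq. (3.11): n_E = L / L_E basic elements in series."""
    n_e = length_um / ELEMENT_LENGTH_UM
    return n_e * element_failure_rate(t, w_um, d_um, i_amp, temp_k)


# Hypothetical example: 500 um line, 3 um wide, 0.6 um thick, 0.1 mA, 80 C.
rate_element = element_failure_rate(8.76e4, 3.0, 0.6, 1.0e-4, 353.15)
rate_line = conductor_failure_rate(8.76e4, 500.0, 3.0, 0.6, 1.0e-4, 353.15)
```

If time is measured in hours, multiplying the result by 10<sup>9</sup> expresses it in FIT.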

### 3.3 Analysis of Bends, Steps and Windows

In expanding the analysis to include bends, oxide steps, and contact windows it is assumed that the failure rates of these elements have the same dependency on t, I, and T as does the basic element. Each feature x is modelled as having the same reliability as a straight conductor of length  $L_x$ , thickness  $d_x$ , and width  $W_x$ .

Refer to figure 4; the failure rate of a simple conductor which conveys a current *I* between two contact windows is:

$$\begin{aligned} \lambda_c = {} & \frac{2L_W}{L_E} \lambda_E(t, W_W, d_W, I, T) + n_b \frac{L_b}{L_E} \lambda_E(t, W_b, d, I, T) \\ & + \frac{n_o L_o}{L_E} \lambda_E(t, W, d_o, I, T) + \frac{L}{L_E} \lambda_E(t, W, d, I, T). \end{aligned} \tag{3.12}$$

where

 $L_W$ ,  $W_W$ ,  $d_W$  are the equivalent length, width, and thicknesses of the contact windows, respectively;

 $n_b$ ,  $L_b$ ,  $W_b$  are the number of bends and equivalent length and width of each bend, respectively;

 $n_o$ ,  $L_o$ ,  $d_o$  are the number of oxide steps, and equivalent length and thickness of each step, respectively.

A number of researchers have investigated the effect of width and length on the reliability of straight conductors having widths of 1  $\mu m$  or more. However, relatively little reliability data is available for submicron conductors on the one hand and corners, steps and contact windows on the other. As linewidths decrease below 1  $\mu m$ , random variations in linewidth (caused by photolithographic defects or poorly controlled etching processes) will play an increasingly important role in conductor reliability. This issue has not been addressed in the literature and no correlation between reliability and defect density has yet been reported. We are working on a theoretical model for electromigration in a randomly defected conductor.

Returning to bends, steps, and contact windows, the factors which can affect the reliability of these elements include:

- accelerated failure at grain boundaries due to locally increased flux density
- flux divergence due to current crowding
- stress build-up in the passivation layer over a step
- contact electromigration at Al-Si interfaces (in the case of contact windows).

All these effects can be taken into account in models for these segments, which are then included in (3.12). Because of a lack of published experimental data, only the effect of average flux density on grain-boundary migration is considered. The average path length around a 90° bend was taken as 0.785 W, while the average width is 1.12 W. These values were used for  $L_b$  and  $W_b$  respectively.

A simple, first-order model was also used for the case of a conductor running over a beveled step-discontinuity. The height of the step is h and the bevel angle is  $\theta$ . Assuming that the metal was deposited with a uniform height d in the vertical direction, the thickness on the sidewall of the step is:

$$d_o = d\cos\theta. \tag{3.13}$$

The length of the stepped section is:

$$L_o = \frac{h}{\sin \theta} \,. \tag{3.14}$$

Therefore the step is considered as equivalent to a straight section of length  $L_o$  and thickness  $d_o$ . This model is used to describe the reliability of conductors running over oxide steps and contact window connections.
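The equivalent dimensions of bends and steps can be computed as in the sketch below, which follows (3.13), (3.14) and the bend factors quoted earlier in this section. The function names and the example step geometry are illustrative assumptions.

```python
import math


def bend_equivalent(width_um):
    """Equivalent (length, width) of a 90 degree bend in a conductor of width W."""
    return 0.785 * width_um, 1.12 * width_um


def step_equivalent(step_height_um, bevel_angle_deg, thickness_um):
    """Equivalent (length, thickness) of a conductor crossing a beveled step.

    Eq. (3.13): d_o = d * cos(theta); eq. (3.14): L_o = h / sin(theta).
    """
    theta = math.radians(bevel_angle_deg)
    d_o = thickness_um * math.cos(theta)
    l_o = step_height_um / math.sin(theta)
    return l_o, d_o


# Hypothetical step: 0.6 um high, 60 degree bevel, 0.8 um metal thickness.
l_o, d_o = step_equivalent(0.6, 60.0, 0.8)
```

A steeper bevel shortens the stepped section but thins the metal on the sidewall, which raises the local current density for a given current.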

The IC interconnect reliability can be predicted from the mask layout, as described in a mask-description-language file, and the branch currents predicted by a circuit simulator. Such a program is under development as an adjunct to an existing suite of IC design tools.



Fig. 5. Mask layout of a section of a 3200 gate standard cell circuit, whose reliability is considered here.

#### 4. EXAMPLES

The following is an approximate analysis of an actual IC layout on which some conductor characteristics were measured. The circuit was a 3100-gate CMOS standard-cell design, with double-layer metallization and a minimum feature size of 2  $\mu m$  (figure 5). The I/O interfaces were not considered in the calculation. Parameters of the two interconnect layers are listed in table II. Conductor width was 5  $\mu m$  on the (upper) Metal-2 layer and 3  $\mu m$  and 4  $\mu m$  on the (lower) Metal-1 layer, except for the power supply and ground buses. The latter are in the form of interdigitated tree structures with differing widths in the trunk, branch, and twig sections (figure 6a). As a rough approximation, all logic interconnects were assumed to carry the same current I. This does not apply to power and ground buses, where current levels increase steadily along the length of a branch or twig. The failure rate of a power bus section is derived in the appendix.



Fig. 6. a) Power supply and ground buses of the standard cell device, showing the interdigitated tree-like structure.

b) Simple model of a bus segment.

The circuit was analyzed by applying (3.11) to each conductor segment and (A-7) to each branch of the power supply and ground buses. The overall failure rate is shown in figure 7 as a function of time. If a failure rate of 10 FIT is taken as an acceptable maximum level [32], then even at a chip temperature of 100 °C the interconnect system of this device is highly reliable. The 10 FIT level is not exceeded for more than 30 years, compared to usual product lifetimes of 5-10 years. A breakdown of the failure rate data shows that the power and ground buses contributed roughly 60% of the total, followed by the contact windows with 30%. Straight conductors account for less than 2% of the total failure rate, despite the fact that they contribute 84% of the total conductor length. Interesting as these results are, they depend strongly on the layout of the IC and do not apply to all devices.

TABLE II
Specifications of Circuit Interconnect Layers

| Layer                           | Width (µm) | Length (µm) | Number of Corners | Number of Steps | Number of Vias/Contacts | Thickness (Å) |
|---------------------------------|------------|-------------|-------------------|-----------------|-------------------------|---------------|
| Metal 1                         | 4          | 367800      | 4590              | 55296           | 36018 Contacts          | 6000          |
|                                 | 3          | 656640      | 2349              | 0               | 7884 Vias               |               |
| Metal 2                         | 5          | 1107500     | 5580              | 62137           | 16935 Vias              | 8000          |
| Gate & Interconnect Polysilicon |            |             |                   |                 |                         | 3500          |
| Interlevel Oxides               |            |             |                   |                 |                         | 6000          |

Fig. 7. Failure rate  $\lambda$  (in FITS) as a function of time for the interconnect system of the 3200 gate standard cell device.

Having described the reliability analysis of an actual IC of moderate complexity, we conclude by extending our analysis to a complex generic VLSI circuit. The parameters for this circuit are summarized in table III, and the results are shown in figure 8. The interconnect system now reaches a failure rate of 10 FIT within two years for chip temperatures above 80 °C. Comparison with the previous example shows that increases in complexity and decreases in linewidths lead inevitably to a degradation of reliability through conductor wear-out. This wear-out process cannot be avoided by rigorous screening procedures: a high temperature burn-in, in fact, accelerates the wear-out. Figure 8 shows how a 168-hour burn-in at 175 °C reduces the time to reach a 10-FIT failure rate at 60 °C from seven to five years. The traditional maximum permissible current density of  $10^5 \ A/cm^2$  is not exceeded anywhere in the device.
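The size of the burn-in penalty can be checked with a back-of-envelope calculation. The sketch below assumes that burn-in simply consumes life at a rate set by the temperature term $\exp(5800/T)$ of (3.4), with the current unchanged; this is a simplification for illustration, not the procedure used to generate figure 8.

```python
import math

HOURS_PER_YEAR = 8760.0


def equivalent_use_hours(burn_in_hours, burn_in_temp_c, use_temp_c, ea_over_k=5800.0):
    """Hours at the use temperature that age the metal as much as the burn-in.

    Uses only the exp(5800 / T) temperature dependence of eq. (3.4).
    """
    t_burn = burn_in_temp_c + 273.15
    t_use = use_temp_c + 273.15
    acceleration = math.exp(ea_over_k * (1.0 / t_use - 1.0 / t_burn))
    return burn_in_hours * acceleration


# 168 h at 175 C, expressed as equivalent operating time at 60 C.
lost_years = equivalent_use_hours(168.0, 175.0, 60.0) / HOURS_PER_YEAR
```

Under these assumptions the burn-in consumes about 1.7 years of life at 60 °C, which is of the same order as the two-year reduction quoted above.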

TABLE III
Specifications for the Generic VLSI Circuit

| Parameter               | Value                 |
|-------------------------|-----------------------|
| Number of transistors   | 200 000               |
| Minimum conductor width | 1.5 µm                |
| Total conductor length  | 10 m                  |
| Number of bends         | 50 000                |
| Number of steps         | 470 000               |
| Number of contacts      | 600 000               |
| Number of vias          | 397 000               |
| Gate current            | 3 × 10<sup>-6</sup> A |


Fig. 8. Failure rate  $\lambda$  (in FITS) as a function of time for the 200 000-transistor generic VLSI device.

Our results are based on experimental data available in the literature. In some cases, data are lacking and first-order models were used instead. The complete characterization of all important features of an IC design, in a form suitable for inclusion in an overall reliability prediction program, still lies in the future.

#### 5. CONCLUSIONS

- 1. At the VLSI level of integration, wear-out mechanisms dominate and the failure rate is not constant, contrary to the assumptions of the Mil-Hdbk-217D model, which is based on the flat part of the bathtub curve. A series model for failure, making use of the best available models describing wear-out mechanisms, is therefore a better approach to determining the reliability of a VLSI device.
- 2. The example of a generic chip shows a general method of analyzing interconnects and the approach is technology-independent. This approach provides information to the designer as to how the reliability of a component is affected by factors under his control. For example, if vias are the major contributors to the high failure rate in a particular design, the failure rate could be reduced by a different layout containing fewer vias. This type of analysis will therefore be an important factor in assessing future designs.
- 3. The results for a 200 000-transistor device show that wear-out is a problem for future VLSI devices even when only one wear-out mechanism (electromigration) is considered. Lifetimes of less than 2 years before a device reaches a failure rate of 10 FIT are not acceptable.
- 4. More physical data and more accurate models for the wear-out mechanisms are required to improve the accuracy of the calculation and hence the assurance which can be placed in the reliability predictions. Large sample sizes must be evaluated and particular emphasis must be placed on fitting distributions to the early failures.
- 5. We have demonstrated a technique which provides valuable information to designers as to the reliability of a VLSI circuit. The approach is well suited to integration with an existing set of design tools as most of the data required resides in the design data base. The remaining data are obtained from usual process monitoring and they must be incorporated in a reliability database for the manufacturing process.
- 6. A reliability analysis program will be essential for the development of automated IC layout software, such as routers and silicon compilers.

# APPENDIX: Derivation of the Failure Rate of a Power-Bus Section

Figure 6b shows a simplified model of a power-bus section with width  $W_B$ . Gates are assumed to be uniformly distributed along the bus, with a distance  $L_g$  between adjacent gate connections. Each gate draws a current I from a power bus, and feeds a current I into a ground bus. The current in the first bus segment (furthest from the bonding pad) is I, in the second 2I, in the third 3I, etc. Numbering the bus segments from 1 to n, the current in segment i is  $i \cdot I$ . The failure rate of bus segment i may be written in terms of the basic-element failure rate as:

$$\lambda_{i} = \frac{L_{g}}{L_{E}} \lambda_{E}(t, W_{B}, d, iI, T) = \frac{L_{g}}{L_{E}} \frac{1}{\sigma \sqrt{2\pi}\, t} \exp \left[ -\frac{1}{2} \left( \frac{\ln t - \ln t_{50}}{\sigma} \right)^{2} \right] \tag{A-1}$$

Let  $x \equiv \ln t$ ,  $\mu \equiv \ln t_{50}$ :

The median time to failure from (3.2) is:

$$t_{50} = \alpha (iI)^{-N} \tag{A-2}$$

$$\alpha = A(W)(Wd)^N \exp\left[\frac{E_a}{kT}\right]$$
 (A-3)

Therefore  $\mu = \ln[\alpha(iI)^{-N}]$ . Substituting into (A-1) and writing $1/t = \exp(-x)$, we obtain:

$$\begin{aligned}
\lambda_{i} &= \frac{L_g}{L_E} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{x^{2}}{2\sigma^{2}} - x + \frac{x\ln(\alpha)}{\sigma^{2}} - \frac{\ln^{2}(\alpha)}{2\sigma^{2}}\right] \cdot \exp\left[\left(\frac{N\ln(\alpha)}{\sigma^{2}} - \frac{Nx}{\sigma^{2}} - \frac{N^{2}\ln(iI)}{2\sigma^{2}}\right)\ln(iI)\right] \\
&= \frac{L_g}{L_E} \frac{1}{\sigma\sqrt{2\pi}} \exp \left[ \frac{-\ln^2(t) - 2\sigma^2 \ln(t) + 2\ln(t)\ln(\alpha) - \ln^2(\alpha)}{2\sigma^2} \right] (iI)^{\gamma(iI)}
\end{aligned} \tag{A-4}$$

$$\lambda_i = C(t)(iI)^{\gamma(iI)}, \tag{A-5}$$

where $C(t)$ collects the factors of (A-4) that do not depend on $i$, and

$$\gamma(iI) = \frac{2N\ln(\alpha) - 2Nx - N^2\ln(iI)}{2\sigma^2} = \frac{N}{\sigma^2} \ln \left\{ \frac{\alpha (iI)^{-N/2}}{t} \right\}. \tag{A-6}$$

For n bus segments in series, the total failure rate is:

$$\lambda_B = C(t) \sum_{i=1}^n (iI)^{\gamma(iI)}. \tag{A-7}$$

For large values of n, this summation is time-consuming and may be approximated by an integral, with i as a continuous variable:

$$\lambda_B \approx C(t) \int_0^n (iI)^{\gamma(iI)} di \tag{A-8}$$

$$\lambda_B \approx \frac{C(t) \sigma^2 n}{N} \cdot \frac{\exp\left[\frac{N}{\sigma^2}\ln(nI)\ln\left(\frac{\alpha(nI)^{-N/2}}{t}\right)\right]}{\ln\left(\frac{\alpha}{nIt}\right)} \tag{A-9}$$
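The factorization in (A-5) and the sum (A-7) can be checked numerically. The sketch below evaluates each segment rate directly from (A-1) and (A-2), and again as $C(t)(iI)^{\gamma(iI)}$. It assumes that $C(t)$ includes the $L_g/L_E$ prefactor of (A-1), and all parameter values are hypothetical.

```python
import math


def segment_rate_direct(t, i_seg, current, alpha, sigma, n_exp, lg_over_le):
    """Eq. (A-1) with the median time to failure taken from eq. (A-2)."""
    t50 = alpha * (i_seg * current) ** (-n_exp)
    z = (math.log(t) - math.log(t50)) / sigma
    return lg_over_le * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * t)


def c_of_t(t, alpha, sigma, lg_over_le):
    """Current-independent factor C(t) of eq. (A-5), assumed to include L_g/L_E."""
    x, ln_a = math.log(t), math.log(alpha)
    exponent = (-x * x - 2.0 * sigma ** 2 * x + 2.0 * x * ln_a - ln_a * ln_a) / (2.0 * sigma ** 2)
    return lg_over_le * math.exp(exponent) / (sigma * math.sqrt(2.0 * math.pi))


def gamma_exponent(t, i_seg, current, alpha, sigma, n_exp):
    """Eq. (A-6): exponent applied to the segment current i*I."""
    return (n_exp / sigma ** 2) * math.log(alpha * (i_seg * current) ** (-n_exp / 2.0) / t)


def bus_rate(t, n_seg, current, alpha, sigma, n_exp, lg_over_le):
    """Eq. (A-7): sum of the segment rates over n bus segments in series."""
    c = c_of_t(t, alpha, sigma, lg_over_le)
    return c * sum(
        (i * current) ** gamma_exponent(t, i, current, alpha, sigma, n_exp)
        for i in range(1, n_seg + 1)
    )


# Hypothetical parameters, chosen only to exercise the formulas.
params = dict(current=1.0e-4, alpha=1.0e3, sigma=1.0, n_exp=1.0, lg_over_le=2.0)
total_factored = bus_rate(1.0e5, 20, **params)
total_direct = sum(segment_rate_direct(1.0e5, i, **params) for i in range(1, 21))
```

The two totals agree to rounding error, which confirms that the factored form is an exact rewriting of (A-1) rather than an approximation; only the integral in (A-8) introduces an approximation.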

#### 6. ACKNOWLEDGMENT

The assistance of the General Electric Company, Research Triangle Park, North Carolina, is gratefully acknowledged. We are pleased to thank the *Editors* and referees for their helpful comments.

#### REFERENCES

- [1] P. D. T. O'Connor, *Practical Reliability Engineering*, Heyden, 1981, p 3.
- [2] Mil-Hdbk-217D, Reliability Prediction of Electronic Equipment, Reliability Analysis Center, RADC, 1982.
- [3] J. E. Arsenault, D. C. Roberts, "MOS semi-conductor random access memory failure rate," *Microelectron. Reliab.*, vol 19, 1979, pp 81-88.
- [4] P. D. T. O'Connor, "Microelectronic system reliability prediction," IEEE Trans. Reliability, vol R-32, 1983 Apr, pp 9-13.
- [5] S. Palo, "Reliability prediction of micro-circuits," Microelectron. Reliab., vol 23, 1983, pp 283-296.
- [6] D. M. Pantic, "Maturity factors in predicting failure rate for linear integrated circuits," *IEEE Trans. Reliability*, vol R-33, 1984 Aug, pp 208-212.
- [7] H. C. Rickers, P. F. Manno, "Microprocessor and LSI microcircuit reliability — Prediction model," *IEEE Trans. Reliability*, vol R-29, 1980 Aug, pp 196-202.
- [8] P. Jaaskelainan, "LSI reliability prediction based on time," Microelectron. Reliab., vol 20, 1980, pp 351-356.
- [9] H. Goldberg, Extending the Limits of Reliability Theory, Wiley Interscience, 1981, pp 58-64.
- [10] W. K. Denson, private communication.
- [11] D. G. Edwards, "Testing for MOS IC failure modes," IEEE Trans. Reliability, vol R-31, 1982 Apr, pp 9-18.
- [12] D. S. Peck, C. H. Zierdt, "The reliability of semiconductor devices in the Bell System," Proc. IEEE, vol 62, 1974 Feb, pp 185-211.
- [13] E. S. Anolick, G. R. Nelson, "Low field time-dependent dielectric integrity," *IEEE Trans. Reliability*, vol R-29, 1980 Aug, pp 217-221.
- [14] F. M. d'Heurle, P. S. Ho, "Electromigration in thin films," Thin Films: Interdiffusion and Reactions (Ed: Poate, Tu & Mayer), Wiley Interscience, 1978, pp 243-304.
- [15] J. R. Black, "Electromigration of Al-Si alloy films," Proc. 16 Int. Reliab. Phys. Symp., IEEE, 1978, pp 233-240.
- [16] P. B. Ghate, "Electromigration-induced failures in VLSI interconnects," Proc. 20 Int. Reliab. Phys. Symp., IEEE, 1982, p 292.
- [17] N. E. Lycoudes, C. C. Childers, "Semiconductor instability failure mechanisms," *IEEE Trans. Reliability*, vol R-29, 1980 Aug, pp 237-249.
- [18] B. Eitan, D. Frohman-Bentchkowsky, "Hot-electron injection in n-channel MOS devices," *IEEE Trans. Electron Dev.*, vol 28, 1981 Mar, pp 328-340.
- [19] E. Takeda, et al., "Submicrometer MOSFET structure for minimizing hot-carrier generation," IEEE J. Solid State Circuits, vol 17, 1982 Apr, pp 241-247.
- [20] G. Sai-Halasz, M. R. Wordeman, R. H. Dennard, "Alpha-particle-induced soft error rate in VLSI circuits," *IEEE Trans. Electron Dev.*, vol 29, 1982 Apr, pp 725-731.

- [21] H. A. David, Order Statistics, 2nd Ed., John Wiley & Sons, 1970, p
- [22] B. N. Agarwala, M. T. Attardo, A. P. Ingraham, "Dependence of electromigration-induced failure time on length and width of aluminum thin films conductors," J. App. Phys., vol 41, 1970, pp 3954-3960.
- [23] D. J. LaCombe, E. L. Parks, "The distribution of electromigration failures," Proc. 24 Int. Reliab. Phys. Symp., IEEE, 1986, pp 1-6.
- [24] J. R. Black, "Physics of electromigration," Proc. 12 Annual Reliab. Phys. Symp., IEEE, 1974, pp 142-149.
- [25] J. W. McPherson, "Stress dependent activation energy," Proc. 24 Int. Reliab. Phys. Symp., IEEE, 1986, pp 12-18.
- [26] E. Kinsbron, "A model for the width dependence of electromigration lifetimes in aluminum thin-film stripes," App. Phys. Lett., vol 36, 1980, pp 968-970.
- [27] N. L. Johnson, S. Katz, Continuous Univariate Distributions 1, Houghton-Mifflin, 1970, p 253.
- [28] R. J. Miller, "Electromigration failure under pulse test conditions," Proc. 16 Int. Reliab. Phys. Symp., IEEE, 1978, pp 241-247.
- [29] G. A. Scoggan, B. N. Agarwala, P. P. Peressini, A. Brouillard, "Width dependence of electromigration life in Al-Cu, Al-Cu-Si and Ag conductors," Proc. 13 Int. Reliab. Phys. Symp., IEEE, 1975, pp 151-158.
- [30] S. Vaidya, T. T. Sheng, A. K. Sinha, "Linewidth dependence of electromigration in evaporated Al-0.5% Cu.," App. Phys. Lett., vol 36, 1980, pp 464-466.
- [31] S. S. Iyer, C.-Y. Ting, "Electromigration studies of submicrometer linewidth Al-Cu conductors," *IEEE Trans. Electron Dev.*, vol 31, 1984, pp 1468-1472.
- [32] Workshop on Submicrometer Device Reliability, Clemson University, Clemson, 1985 November 6-7.

#### **AUTHORS**

David F. Frost; Department of Electrical & Computer Engineering; Clemson University; Clemson, South Carolina 29634-0915 USA.

David Frost (M'81) was born in Cape Town, South Africa on 1951 November 5. He holds the degrees of BSc from the University of Stellenbosch (1974) and MEng from the University of Pretoria (1979), both in Electrical Engineering. Since 1979 he has held a teaching position in the Department of Electrical and Electronics Engineering, University of Stellenbosch. He is on sabbatical leave at Clemson University, South Carolina. His research interests include the design, reliability and testing of integrated circuits.

Dr. Kelvin F. Poole; Department of Electrical & Computer Engineering; Clemson University; Clemson, South Carolina 29634-0915 USA.

Kelvin Poole (M'85) was born in Durban, South Africa on 1943 February 10. He holds the degrees of MSc from the University of Natal, Durban and PhD from the University of Manchester, UK, both in Electrical Engineering. Dr. Poole is an Associate Professor at Clemson University, South Carolina and is interested in the design of integrated circuits and VLSI reliability.

## RELIANT: A RELIABILITY ANALYSIS TOOL FOR VLSI INTERCONNECTS

David F Frost\*, Kelvin F Poole and David A Haeussler

E&CE Department Clemson University Clemson, SC 29634

#### **ABSTRACT**

RELIANT is a CAD tool which predicts the failure rate of integrated circuit conductors. A circuit layout, device models and electromigration process data are inputs to RELIANT. The interconnect patterns in a Caltech Intermediate Format (CIF) file are fractured into a number of characteristic segment types. An equivalent circuit is extracted and SPICE is used to determine the transient currents in each segment. Using parametric models for electromigration damage, the failure rate of the system is computed. RELIANT provides designers with feedback on the reliability hazards of a design. Results show the application of the tool to a standard cell CMOS component. For modelling large VLSI interconnect systems, the incorporation of a switch-level simulator is discussed.

### INTRODUCTION

The goal of RELIANT is to provide a prediction of interconnect reliability during the design phase, by modelling the effect of wearout due to electromigration. It has been shown that for the most important failure mechanisms, the shrinking of layout design rules accelerates the wearout process [Woods 1984]. Similarly, an increase in circuit complexity has a negative impact on overall reliability. It is therefore becoming increasingly important to predict the intrinsic (wearout-limited) lifetime of VLSI circuits, particularly in view of the high cost of traditional methods of reliability qualification by burn-in.

The failure rate of integrated circuits is usually described by means of the "bath-tub" curve of instantaneous failure rate vs. time, shown in Fig. 1. The onset of wearout may be quantified by establishing a criterion for the maximum allowable failure rate. A goal of 10 FIT (1 FIT = 10<sup>-9</sup> failures/hr.) has been proposed by the Semiconductor Research Corporation [SRC 1985].

This failure rate is attained in a time much less than the median time to failure  $(t_{50})$  for most conductors. Therefore,  $t_{50}$  predictions alone are not a reliable indicator of intrinsic lifetime. The rate of failure when $t \ll t_{50}$ is strongly influenced by the shape and standard deviation  $(\sigma)$  of the time-to-failure distribution. These parameters are in turn influenced by factors such as conductor dimensions [Kinsbron 1980] and the presence of defects [Kemp 1988]. The approach used in this paper is to determine the instantaneous failure rate as a function of time, using data about the physical form and dimensions of conductors and the electrical stress (i.e. current density) applied to them. This method yields an assessment of reliability, in contrast to previously reported work [Hall 1986] which used  $t_{50}$  as a criterion for adjusting conductor widths.

Fig. 1: Bathtub curve of failure rate vs. time.

<sup>\*</sup> Currently at Stellenbosch University, Stellenbosch, South Africa.

## OVERVIEW OF RELIANT

RELIANT predicts the instantaneous failure rate of the interconnect pattern as a function of time. The method used is based on the principle of fracturing the interconnect pattern into a number of statistically independent conductor segments. The assumption of statistical independence is valid when $t \ll t_{50}$ [LaCombe 1986] and when the current density is low enough to avoid significant thermal interaction. This places an upper limit on current density of approximately 10<sup>6</sup> A/cm<sup>2</sup>. Five commonly-occurring segment types are identified:

- i) straight runs;
- ii) steps resulting from a discontinuity in the wafer surface (when a Metal1 conductor crosses a Polysilicon stripe, for example);
- iii) contact windows;
- iv) vias between Metal1 and Metal2;
- v) bonding pads.

These segment types are shown in Fig. 2. Each occurrence of a segment type is characterized by a set of physical parameters, such as length and width.

The relationships between  $t_{50}$ ,  $\sigma$  and the physical dimensions of each segment type are determined experimentally, using test structures. When the layout is fractured, an equivalent circuit which reflects the physical topology of the interconnect pattern is extracted. A circuit simulator is then used to determine the transient current flowing in each segment. The instantaneous current density j(t) is given by the instantaneous current i(t), divided by the nominal cross-sectional area of the segment. This time-varying current density is reduced to a single effective value  $J_{eff}$ , which is used to determine  $t_{50}$  for each segment. The instantaneous failure rate is then determined. Assuming that no part of the interconnect pattern is redundant, a minimum order statistical method may be used to compute the failure rate of the interconnect system. This method has been described in a previous paper [Frost 1987].



Fig. 2: Segment types.



Fig. 3: Structure of RELIANT.

The structure of RELIANT is shown in Fig. 3. It consists of three main modules, EXTREM, COMBINE and SIRPRICE. EXTREM fractures the interconnect patterns contained in a CIF layout description file into segments. It produces a database containing a description of each segment (type and physical dimensions) and a SPICE-compatible netlist which includes all parasitic interconnect resistances and capacitances. COMBINE adds external generators and loads to the extracted netlist. SIRPRICE (which includes a modified version of SPICE2G.6) uses this netlist to simulate the current waveforms in each segment and compute the failure rate. A database query language interface enables the failure rate of specific segments, nets or modules to be investigated interactively.

## **EXTREM**

EXTREM fractures the interconnect pattern in three phases. During the first phase, the CIF layout file (\*.CIF) is parsed. Wires and Polygons are reduced to collections of Boxes and the hierarchy is flattened. In the second phase, the positions of all abutments, contact windows, vias and bonding pads are established. The dominant current flow direction in each box is determined and all steps orthogonal to this direction are identified. In the third phase, each box is scanned in the direction of current flow and the dimensions and type of each segment are computed. Straight runs link up segments of other types. A record for each segment is stored in a physical data base file (\*.DB1). Active devices are identified and a SPICE-compatible netlist, including a  $\pi$-section RC network for each physical branch of the interconnect pattern, is generated (\*.EXT).
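The π-section representation of a branch can be illustrated with a short Python sketch. It is not EXTREM's code: the function name `pi_section`, the card naming scheme and the sheet resistance and area capacitance values are all assumptions chosen for illustration. The branch resistance is placed in series and the branch capacitance is split equally between the two end nodes.

```python
def pi_section(name, node_a, node_b, length_um, width_um,
               r_sheet=0.05, c_area=3.0e-17):
    """Return SPICE cards modelling one interconnect branch as a pi-section.

    r_sheet: sheet resistance in ohm/square (assumed value).
    c_area:  capacitance to substrate in F/um^2 (assumed value).
    """
    # Series resistance: number of squares times sheet resistance.
    resistance = r_sheet * length_um / width_um
    # Half of the total branch capacitance is lumped at each end node.
    half_cap = 0.5 * c_area * length_um * width_um
    return [
        f"C{name}A {node_a} 0 {half_cap:.4g}",
        f"R{name} {node_a} {node_b} {resistance:.4g}",
        f"C{name}B {node_b} 0 {half_cap:.4g}",
    ]


# Hypothetical branch: 100 um long, 2 um wide, between nodes 12 and 13.
cards = pi_section("SEG1", 12, 13, length_um=100.0, width_um=2.0)
print("\n".join(cards))
```

Keeping one resistor per physical branch is what later allows the simulated resistor current to be mapped back to a segment record.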

#### **COMBINE**

The user defines device models, analysis specifications and external components such as voltage sources and load resistors, by means of the \*.SPC file. COMBINE adds these definitions to the extracted netlist. Labels on the bonding pads of the layout are used to establish links to the external node numbers.

#### SIRPRICE

SIRPRICE accepts the complete netlist produced by COMBINE (\*.SPR) and calls a modified version of SPICE 2G.6. This module performs a transient simulation of the extracted circuit and generates indexed files which contain the current-time data for each resistor corresponding to a branch in the interconnect pattern. The cross-sectional area of each segment is determined from the physical data base file and an effective current density is computed as follows:

$$J_{eff} = \frac{1}{\Gamma t} \int_{0}^{t} \sinh(\Gamma j(t))\, dt \tag{1}$$

where  $\Gamma$  is a constant [McPherson 1986]. The median time to failure is given by the following equation:

$$t_{50} = \frac{G}{J_{eff}} \exp\left(\frac{E_a}{kT}\right) \tag{2}$$

where  $E_a$  is the activation energy, k is Boltzmann's constant and T is absolute temperature. G is a factor dependent on the physical dimensions of the segment. G and  $\sigma$  are determined from the information in the physical data base file. The failure rate is then calculated, using a suitable failure distribution. Experimental results of electromigration testing show a lognormal dependence and hence this distribution is used in the example discussed in the next section.
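As an illustration of the final step, the following Python sketch evaluates the instantaneous failure rate of a single segment from its $t_{50}$ and $\sigma$, assuming a lognormal time-to-failure distribution. The function names and the numerical values of $t_{50}$ and $\sigma$ are hypothetical and are not RELIANT output. The example also shows why $t_{50}$ alone is a poor indicator: two segments with the same $t_{50}$ but different $\sigma$ have very different failure rates when $t \ll t_{50}$.

```python
import math


def lognormal_pdf(t, t50, sigma):
    """Lognormal probability density of the time to failure."""
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * t)


def lognormal_cdf(t, t50, sigma):
    """Cumulative probability of failure before time t."""
    z = (math.log(t) - math.log(t50)) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def hazard_fit(t_hours, t50_hours, sigma):
    """Instantaneous failure rate in FIT (1 FIT = 1e-9 failures/hr)."""
    density = lognormal_pdf(t_hours, t50_hours, sigma)
    survival = 1.0 - lognormal_cdf(t_hours, t50_hours, sigma)
    return density / survival / 1.0e-9


HOURS_PER_YEAR = 8760.0
t = 20 * HOURS_PER_YEAR          # evaluation time: 20 years
t50 = 2000 * HOURS_PER_YEAR      # hypothetical median time to failure
for sigma in (0.5, 1.0):         # hypothetical standard deviations
    print(f"sigma = {sigma}: {hazard_fit(t, t50, sigma):.3g} FIT")
```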

SIRPRICE produces various output files: \*.LIS contains an ASCII listing of the failure rate of the whole circuit and the total for each segment type. \*.DB2 is a second database file which includes the reliability data. This provides an interface to other program modules. \*.DTR is an interface to a query language which may be used to interrogate the reliability data base.

## EXPERIMENTAL RESULTS

RELIANT was used to determine the failure rate of a CMOS one-bit counter. The layout of this circuit is shown in Fig. 4, and the results of the analysis appear in Fig. 5. An analysis of the results indicates that the steps are the major contributor to failure rate. Overall failure rates for this simple circuit containing twenty-six active devices are very low,  $7 \times 10^{-11}$  FIT after 20 years.

The effect of layout and complexity is seen by comparing the failure rates for the input stage of this counter with the results in Fig. 5. A failure rate of  $4.5 \times 10^{-17}$  FIT after 20 years is predicted for the four-transistor input circuit and the major contribution is due to contacts.



Fig. 4: Layout of a one-bit binary counter.



Fig. 5: Failure rate vs. time for the one-bit counter.

# RELIABILITY PREDICTION OF COMPLEX INTERCONNECT PATTERNS

The use of SPICE limits the application of RELIANT to VLSI cells. CAD tools for assessing the current densities in interconnect systems reported by other researchers [Hall 1986], [Hohol 1986] share this limitation. A major consideration in the development of RELIANT was that reliability analysis should, where possible, be performed concurrently with simulation for design verification purposes. To meet this objective in the case of VLSI circuits, a new method for determining the current data required to assess the electromigration damage has been investigated. This method consists of extracting approximate current waveforms from a switch-level simulation of a MOS circuit. Transistors are represented by linear static and dynamic resistances and linear capacitors. Interconnects are represented by resistors and capacitors. The analysis method is based on network functions of RC trees. After each transition, the resultant RC networks are analyzed to determine the final state and static currents in all branches. They are then analyzed to determine delay and transient current waveforms. The initial charge distribution is also taken into account. Preliminary results show that an effective current for assessing the electromigration damage, can be predicted.

#### **CONCLUDING REMARKS**

A CAD tool called RELIANT for predicting the failure rate of interconnect systems has been presented. RELIANT requires a circuit layout and experimentally determined values of  $t_{50}$  and  $\sigma$  for all conductor segment types.

Our results show that the reliability of interconnects depends on the specific details of the layout. The program can be used to optimize the layout for reliability, by indicating the features of the interconnect pattern which make the largest contribution to the total failure rate.

#### **REFERENCES**

[Frost 1987] D.F. Frost, K.F. Poole, "A Method for Predicting VLSI-Reliability using Series Models for Failure Mechanisms", IEEE Trans. Reliability, vol R-36, June 1987, pp 234 - 242.

[Hall 1986] J.E. Hall, D.E. Hocevar, P. Yang, M.J. McGraw, "SPIDER: A CAD System for Checking Current Density and Voltage Drop in VLSI Metallization Patterns", Proc. IEEE Int. Conf. on Computer-aided Design, Nov. 1986.

[Hohol 1986] T.S. Hohol, L.A. Glasser, "RELIC: A Reliability Simulator for Integrated Circuits", Proc. IEEE Int. Conf. on Computer-aided Design, Nov. 1986.

[Kemp 1988] K.G. Kemp, K.F. Poole and D.F. Frost, "Failure Rate Prediction for Defect Enhanced Electromigration Wearout of Metal Interconnects", submitted to IEEE Trans. Reliab., Jan. 1988.

[Kinsbron 1980] E. Kinsbron, "A Model for the Width Dependence of Electromigration Lifetimes in Aluminum Thin-film Stripes", App. Phys. Lett., vol 36, 1980, pp 968 - 970.

[LaCombe 1986] D.J. LaCombe, E.L. Parks, "The Distribution of Electromigration Failures", 24th Int. Reliab. Phys. Symp., New York: IEEE, 1986.

[SRC 1985] Workshop on Submicrometer Device Reliability, Clemson University, Clemson, Nov. 6-7, 1985.

[Woods 1984] M.H. Woods, "The Implications of Scaling on VLSI Reliability", Tutorial notes: 22nd Int. Reliab. Phys. Symp., New York: IEEE, 1984, pp. 6-1 to 6-30.

#### **ACKNOWLEDGEMENTS**

The authors wish to acknowledge that this work is supported in part by the SRC under contract No. 87-MP-082.

The assistance of the General Electric Company, Research Triangle Park, North Carolina is gratefully acknowledged.


David F. Frost and Kelvin F. Poole

Department of Electrical and Computer Engineering Clemson University, Clemson, SC 29634-0915 (803)656-5925

#### Abstract

A quantitative model for the reliability of a system of interconnects is presented. A circuit simulator is used to accurately predict device currents, from which the reliability of individual conductor segments is determined. The failure rate of the interconnect system is then calculated using a minimum order statistical approach. An example shows the application of this technique to a CMOS circuit.

#### Introduction

As the minimum feature size of VLSI devices steadily decreases, there is a corresponding decrease in the reliability of these devices. Scaling increases the failure rate associated with all the most important known failure mechanisms, such as electromigration, ESD, time-dependent oxide breakdown and hot-carrier effects. It is therefore increasingly important to develop accurate, quantitative models for these mechanisms, in order to optimize circuit designs for reliability. These models must be embedded in reliability analysis software which forms part of the regular suite of IC design tools.

In most digital circuit designs, a set of standard active devices is designed once and used repeatedly throughout the circuit. Immunity to device-related failures such as hot-electron effects is designed in at the device level. Interconnects are qualitatively different to devices, because the interconnect layout of each circuit is unique and must be individually optimized for reliability. Because of the complexity of interconnect patterns, a systems approach to reliability is adopted in this paper. This approach is based on the partitioning of the interconnect mask layout into its component parts.

The most significant failure mode in Aluminum alloy VLSI interconnects is open-circuit failure due to current-induced electromigration. When current flows through a conductor, electrons collide with metal ions, transferring some momentum to the latter. This causes a flux of metal ions in the direction of electron flow. Divergences in this flux arise at local inhomogeneities such as grain boundary triple points or vacancy sites. These divergences result in the formation of voids and hillocks in the metal of the conductor. When the cross-sectional area of a void equals that of the conductor, an open circuit failure results. Other forms of electromigration occur at the metal-semiconductor interfaces within contact windows.

Designers have traditionally avoided electromigration damage by limiting the maximum current density to $10^5$ A/cm². There are two essential problems with this approach. Firstly, the downward pressure on design rules makes it increasingly difficult to maintain this conservative limit on current density. Secondly, observance of this rule does not enable the designer to quantitatively predict reliability, nor does it provide any insight into the design trade-off between reliability and area. Increasingly, circuit layout is performed by automated software tools such as routers and silicon compilers, using layout-optimizing cost functions. Quantitative models for failure mechanisms are essential if cost functions which include reliability are to be developed for a future generation of silicon compilers.

# An Order Statistical Model of Interconnect Reliability

Fig. 1 shows a typical conductor on the surface of a VLSI chip. A number of segments may be identified, such as contact windows to the underlying devices, straight runs of various lengths, 90° bends and steps over discontinuities in the underlying surface. These shapes are shown in more detail in Fig. 2. The majority of VLSI conductor patterns may be broken down into a set of basic segments such as these. The reliability of each segment is expressed in terms of statistical parameters such as median time to failure and standard deviation. These parameters are determined experimentally using special purpose test structures and accelerated testing techniques.

Fig. 1: Partitioning of a conductor into segments.

Fig. 2: Conductor segments considered: a) straight segments, b) 90° bends, c) steps over discontinuities, d) contact windows.

If there is no redundancy in the circuit, the entire chip will fail when any single conductor segment fails. It will also be assumed that there is no interaction between the segments, i.e. the probability of failure of any segment is independent of all other segments. Under these conditions, interconnects form a series-connected system and a minimum order statistical model for reliability is appropriate. The probability of interconnect failure is therefore equal to the probability of failure of the segment having the shortest lifetime. The failure rate of the interconnect system is equal to the sum of the failure rates of all conductor segments:

$$h_{s}(t) = \sum_{i=1}^{n} h_{i}(t)$$
 (1)

The requirement of non-interaction between segments restricts the applicability of this model to current densities below  $4\times10^{\circ}$  A/cm², where negligible thermal interaction due to self-heating occurs. Also, the minimum segment length must be greater than the length of the locality which influences the growth of a single defect. This is approximately equal to the median grain size of the Aluminum in the conductor, usually 1 µm or less. Boundaries between segments must be chosen along equipotential lines.

#### Reliability Parameters for Electromigration

Electromigration failure is generally described by a lognormal probability density function

$$p(t) = \frac{1}{\sigma \sqrt{2\pi}\, t} \exp \left[ -\frac{1}{2} \left[ \frac{\ln(t) - \ln(t_{50})}{\sigma} \right]^{2} \right] \tag{2}$$

where  $t_{50}$  = median time to failure and  $\sigma$  = standard deviation. In general,  $t_{50}$  is much greater than the operating life of the circuit and only the early failures are important. If only the first 10% of failures are considered, then the failure rate is approximately equal to the probability density function. The failure rate of the interconnect system is then

$$h_{s}(t) = \sum_{i=1}^{n} \frac{1}{\sigma_{i} \sqrt{2\pi}\, t} \exp \left[ -\frac{1}{2} \left[ \frac{\ln(t) - \ln(t_{50,i})}{\sigma_{i}} \right]^{2} \right] \tag{3}$$
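The summation over segments can be sketched in a few lines of Python. This is an illustration only: the function names and the three $(t_{50}, \sigma)$ pairs are hypothetical, and the lognormal density is used as the early-life failure rate of each segment, as described in the text. The per-segment terms also show which segment dominates the total.

```python
import math


def segment_rate(t, t50, sigma):
    """Lognormal density, used as the early-life failure rate of one segment."""
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * t)


def system_rate(t, segments):
    """Failure rate of a series system: the sum over all segments."""
    return sum(segment_rate(t, t50, sigma) for t50, sigma in segments)


# Hypothetical segments: (t50 in hours, sigma).
segments = [(2.0e6, 0.8), (5.0e6, 1.1), (1.0e7, 0.9)]
t = 10 * 8760.0  # 10 years in hours

h_s = system_rate(t, segments)
contributions = [segment_rate(t, t50, sigma) for t50, sigma in segments]
worst = contributions.index(max(contributions))
print(f"system failure rate: {h_s / 1.0e-9:.3g} FIT")
print(f"largest contribution from segment {worst}")
```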

The median time to failure is a function of temperature, current and conductor dimensions. The following expression has been derived from measured data for a segment 10 µm long:

$$t_{50} = \frac{1.523\mathrm{E}{-5}}{J^{n(J)}} \left[ w - 3.07 + \frac{11.63}{w^{1.7}} \right] \exp\left[\frac{5800}{T}\right] \tag{4}$$

where J is the current density in A/cm², w is the width in µm, and T is the absolute temperature. The exponent n(J) is given by the following expression:

$$n(J) = \tau J \coth(\tau J) \tag{5}$$

where  $\tau = 2\times10^{-6}$  cm²/A. Equation (4) is applicable to DC currents, whereas conductors in ICs generally do not carry only direct current. In analog circuits, currents are often described by sinusoidal or other periodic functions, while in digital circuits they tend to switch between discrete values with exponential transitions resulting from the charge and discharge of circuit capacitances. These current waveforms are easily predicted using a circuit simulator and the resultant data made available in the design data base. In order to use this data, the following expression for  $t_{50}$  has been derived in terms of time-varying current:

$$t_{50} = \frac{1.523\mathrm{E}{-5} \left[w - 3.07 + \frac{11.63}{w^{1.7}}\right] \exp\left[\frac{5800}{T}\right]}{\sum_{j=1}^{m} \frac{p_{j}}{t_{jf}} \int_{0}^{t_{jf}} \left[\frac{i(t_{j})}{A}\right]^{n(i(t_{j})/A)} dt_{j}} \tag{6}$$

Equation (6) is based on the assumption of a static relationship between current density and electromigration damage. The denominator contains the summation of the electromigration damage produced by operating the circuit in m different modes. A circuit simulator is used to simulate each operating mode. The circuit operation in mode j is simulated from time  $t_j = 0$  to  $t_j = t_{jf}$. The current  $i(t_j)$  is the output data of each simulation, while  $p_j$  is a weighting factor representing the probability of the circuit being in mode j during normal operation. The constant A is the cross-sectional area  $(cm^2)$, which is written as

$$A = (w\,d) \times 10^{-8} \tag{7}$$

where w and d are the width and thickness, respectively (in µm).
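The calculation of $t_{50}$ from simulated current waveforms can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' program: the integrand is taken to be $[i(t_j)/A]^{n(i(t_j)/A)}$, the width term is read as $w - 3.07 + 11.63/w^{1.7}$, the absolute value of the current is used so that negative currents can be handled, and the integral is evaluated with the trapezoidal rule. The function names and the waveform samples are hypothetical, and the time unit of the result is whatever unit the fitted constant implies, which is not stated in the text.

```python
import math

TAU = 2.0e-6  # cm^2/A, value quoted with equation (5)


def n_exponent(j):
    """Current-density exponent n(J) = tau*J*coth(tau*J), equation (5)."""
    x = TAU * abs(j)
    # coth(x) -> 1/x as x -> 0, so n(J) -> 1 in the low-current limit.
    return 1.0 if x < 1.0e-8 else x / math.tanh(x)


def damage_rate(j):
    """Assumed static damage term J**n(J); abs() is an added assumption."""
    return abs(j) ** n_exponent(j)


def t50_time_varying(modes, w_um, d_um, temp_k):
    """Median time to failure for a set of weighted operating modes.

    modes: list of (p_j, times, currents); times and currents are
    equal-length sample lists from a transient simulation (s and A).
    """
    area_cm2 = w_um * d_um * 1.0e-8  # equation (7): um^2 -> cm^2
    numerator = (1.523e-5 * (w_um - 3.07 + 11.63 / w_um ** 1.7)
                 * math.exp(5800.0 / temp_k))
    denominator = 0.0
    for p_j, times, currents in modes:
        rates = [damage_rate(i / area_cm2) for i in currents]
        # Trapezoidal integration of the damage rate over the simulation.
        integral = sum(0.5 * (rates[k] + rates[k + 1]) * (times[k + 1] - times[k])
                       for k in range(len(times) - 1))
        denominator += p_j * integral / (times[-1] - times[0])
    return numerator / denominator


# Hypothetical example: a 3 um x 1 um conductor at 400 K in two modes.
idle = (0.7, [0.0, 1.0e-8], [1.0e-3, 1.0e-3])
busy = (0.3, [0.0, 5.0e-9, 1.0e-8], [1.0e-3, 6.0e-3, 1.0e-3])
t50_mixed = t50_time_varying([idle, busy], 3.0, 1.0, 400.0)
t50_idle = t50_time_varying([(1.0, idle[1], idle[2])], 3.0, 1.0, 400.0)
print(t50_mixed < t50_idle)
```

For a single mode carrying a constant current the denominator reduces to $J^{n(J)}$, so the sketch reproduces the DC expression.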

The standard deviation of the conductor segment is a function of width only:

$$\sigma = \frac{2.192}{w^{2.625}} + 0.787 \tag{8}$$

The results in equations (6) and (8) may be extended to straight conductors of any length by dividing the conductors into a number of 10 µm segments. Other conductor shapes such as corners, steps and contact windows may be approximately modelled as straight sections having equivalent values of width, thickness and length (this ignores the effect of stress concentration caused by the shape of segment). These values are then used in equation (3) to compute the overall interconnect failure rate.

#### Examples

A computer program has been written to solve equations (6) and (8). The input data for this program is a SPICE output data file containing the current-time data collected during a transient simulation. At present, conductor dimensions are entered manually; a circuit extractor will eventually automate this task.

The circuit diagram and layout of a CMOS 4-bit Carry Look Ahead Unit are shown in Figs. 3 and 4 respectively. This is a standard cell component which forms part of a cell library in a double layer metal, 1.5 µm technology. The design was simulated using SPICE and transient device current data was collected in several output files. An example of a simulated current waveform (in this case, the current in the power supply bus) is shown in Fig. 5. The designer enters the physical parameters of the conductor which he is designing, including an initial estimate of width. He also supplies the weighting factor  $p_j$  applicable to each simulation. The failure rate for the conductor is calculated and if the designer is not satisfied he can input a new value of width. The total failure rate for all interconnects  $h_s$  is calculated using equation (3). Fig. 6 shows the failure rate of the Carry Look Ahead Unit as a function of time. The median time to failure is much greater than 15 years and so the failure rate increases monotonically with time over the useful lifetime of the component. A failure rate of 10 FIT (1 FIT =  $10^{-9}$  failures/hr.) is considered to be the maximum value acceptable. In this case, the failure rate is still less than 1 FIT after 10 years of operation and so the design of this cell is highly reliable.

However, Fig. 6 also shows that if 119 of these cells were produced in a 17x7 grid on the surface of a 300 mil x 300 mil chip, the failure rate of the chip would reach 10 FIT in 7½ years. This clearly shows the impact of increased complexity on reliability. The chip failure rate increases not only because of the summation of a greater number of cell failure rates, but also because the current level in the power supply and ground buses increases with increasing number of gates. While the failure rates of individual cells can be calculated in advance, the failure rate of the entire chip can only be determined once the placement and routing is complete. As die sizes increase, the design of power supply and ground buses may become increasingly important for reliability.

Fig. 3: Circuit diagram of Carry Look Ahead Unit.

Fig. 4: Layout of Carry Look Ahead Unit.

Fig. 5: Current in power supply bus vs. time.



Fig. 6: Failure rate vs. time for a single cell and for a 17x7 cell array.

#### Conclusions

A technique for predicting the reliability of a system of VLSI interconnects has been presented. The models used can accommodate complex VLSI interconnect patterns and actual operating conditions. This technique is useful to IC designers who wish to design interconnects for an optimum tradeoff between area and reliability. It is also of value in the development of new CAD tools for automated layout.

#### Acknowledgements

The assistance of David Abercrombie in preparing the example used is gratefully acknowledged.

## References

- [1] M.H. Woods, "The implications of scaling on VLSI reliability", Tutorial notes: 22nd Int. Reliab. Phys. Symp., New York: IEEE, 1984, pp. 6-1 to 6-30.
- [2] D.F. Frost and K.F. Poole, "An order statistical method for the prediction of VLSI device reliability using models for failure mechanisms", to be published in IEEE Trans. Reliab.
- [3] J.W. McPherson, "Stress dependent activation energy", Proc. 24th Int. Reliab. Phys. Symp., New York: IEEE, pp. 12-18, 1986.
- [4] D.F. Frost and K.F. Poole, "A model for the median time to failure of VLSI interconnects carrying time-dependent currents", submitted for publication to IEEE Trans. Reliab.
- [5] A. Vladimirescu, K. Zhang, A.R. Newton, D.O. Pederson, A. Sangiovanni-Vincentelli, SPICE User's Guide, Berkeley: Dept. of Elec. Eng. & Comp. Sci., Univ. of Calif. at Berkeley, 1983.

<u>Title</u>: Failure rate prediction for defect enhanced electromigration wearout of metal interconnects

<u>Authors</u>: Kevin G. Kemp, Student member IEEE Clemson University, Clemson.

Kelvin F. Poole, Member IEEE Clemson University, Clemson.

David F. Frost, Member IEEE Stellenbosch University, S.A.

Key words: Failure rate, reliability prediction, defects, electromigration

## Reader Aids:

Purpose: Advance state of the art

Special math needed for explanations: Statistics

Special math needed to use results: None

Results useful to: IC design engineers, CAD tool developers

## Abstract

A method which predicts the effect of defects on the failure rate of conductors due to electromigration wearout is proposed. Topographic defects and grain boundary triple points are identified as the major contributing defects. A random spatial distribution of both defect types is assumed, and a range of defect sizes is used. This analysis predicts that random topographic defects significantly increase conductor failure rate while they have little effect on the median time to failure (T50). In addition, predictions of lifetime versus stripe width agree with other published results and show that the increased lifetime of narrower stripes is due to the blocking mechanism of bamboo type structures. However, this increase is limited by topographic defects, which thus impose a minimum achievable stripe width.

## 1 Introduction

The lifetime of a conductor subject to electromigration wearout is determined by the physical structure of the conductor and the operating conditions (current density and temperature) applied to it [1]. In this analysis we will consider as a defect any random feature which enhances the failure by electromigration of the conductor. It is generally accepted that electromigration failure is initiated at grain boundary triple points where there is a mass flow divergence of conductor material [1,2]. These triple points are the first kind of defect with which we are concerned. We also consider macroscopic features such as topographic defects introduced during processing which are responsible for mass flow divergences that contribute significantly to conductor failure [3]. In this category we include photolithographic and other random "spot" defects\* which eliminate a portion

<sup>\*</sup>These partial open circuits are simply less extreme cases of fatal conductor defects which contribute to manufactured yield loss. In this analysis we are concerned only with these partial defects which result in failure of the device after it has been in operation for a length of time.

of the conductor and thus increase current density in the immediate vicinity of the defect. This analysis considers the contribution of both grain boundary triple points and topographic defects to conductor failure by electromigration and determines the effect of both on the failure rate of conductors.

## 2 Theoretical considerations

In developing a method to account for the influence of defects on electromigration we make use of an elemental model which considers the failure probability of a short length  $l_E$  of conductor material. A given stripe of width w and length L is treated as a series connection of N elements of length  $l_E$. Since failure of the stripe is determined by the failure of the weakest element, the failure probability of the stripe is given by the minimum order statistic of the N elements [2]. Thus the survival probability of the stripe is the product of the survival probabilities of the individual elements. Similarly it may be argued that if each element is subject to a number of independent failure modes then the failure of that element is given by the minimum order statistic of those failure modes. In general then, the probability of failure as a function of time F(t) is given by:

$$F(t) = 1 - \prod_{i} \prod_{j} [1 - F_{ij}(t)] \tag{1}$$

where  $[1-F_{ij}(t)]$  is the survival probability at time t of the ith element due to failure mode j. In evaluating reliability we are most often concerned with the failure of only the first few devices (usually the first ten percent). In this case the above expression may be reduced to:

$$F(t) = \sum_{i} \sum_{j} F_{ij}(t)$$
 (2)

and the instantaneous failure or hazard rate h(t) is: [2]

$$h(t) = \frac{f(t)}{1 - F(t)} = \sum_{i} \sum_{j} h_{ij}(t)$$
(3)

This failure rate h(t) is important in that it provides a useful measure of the reliability of a device or system, and is often used in reliability specifications.
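The relationship between the exact series expression and its early-failure approximation can be checked numerically. The following is a minimal sketch only; the function names and the element probabilities are hypothetical values chosen for illustration, not data from this work.

```python
def stripe_failure_exact(f_ij):
    """Series model: F = 1 - prod_i prod_j (1 - F_ij), at one fixed time t."""
    survival = 1.0
    for modes in f_ij:          # loop over elements i
        for f in modes:         # loop over failure modes j
            survival *= 1.0 - f
    return 1.0 - survival


def stripe_failure_early(f_ij):
    """Early-failure approximation: F ~ sum_i sum_j F_ij."""
    return sum(f for modes in f_ij for f in modes)


# Hypothetical stripe: 100 elements, each with three failure-mode
# probabilities at the time of interest (illustrative values only).
f_ij = [[1e-6, 2e-5, 5e-6] for _ in range(100)]
exact = stripe_failure_exact(f_ij)
approx = stripe_failure_early(f_ij)
relative_error = (approx - exact) / exact
```

For these assumed values the sum slightly overestimates the exact result, with a relative error of roughly 0.1 percent, which illustrates why the approximation is acceptable when only the first few percent of failures are of interest.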

In carrying out such an analysis it is necessary to carefully define the various modes which contribute to the failure mechanism under consideration.

The individual failure modes considered in this analysis are:

- i. Failure by bulk electromigration of an "ideal" element containing no grain boundaries and no geometric defects.
- ii. Failure by grain boundary electromigration of an element which contains only grain boundary triple points.
- iii. Failure of an element due to a topographic defect only.

For each of these failure modes a model is used to express the failure probability of the element (as a function of time) in terms of its geometry and operating conditions. It should, however, be noted that we are not concerned with the accuracy of the models used in this paper, but rather are presenting a methodology for determining the contribution of each failure mode to the overall reliability.

### 2.1 Element with no defects

The model for the failure of an elemental conductor follows a lognormal density function given by: [1,4]

$$f(t) = \frac{1}{\sqrt{2\pi} \sigma t} \exp \left[ -\frac{1}{2} \left[ \frac{\ln t - \ln T_{50}}{\sigma} \right]^2 \right]$$
 (4)

where: f(t) = failure probability density as a function of time

t = time

σ = lognormal standard deviation

 $T_{50}$  = median time to failure

In practice it is not possible to make a conductor element perfectly free of grain boundary triple points and topographic defects; it is therefore not usually possible to measure these bulk $T_{50}$ and $\sigma$ values, although the bulk $T_{50}$ value may be estimated by comparing bulk and grain boundary electromigration activation energies [5]. Since the bulk $T_{50}$ value is significantly larger than that for stripes containing grain boundaries, failure due to grain boundary electromigration predominates and the bulk $T_{50}$ and $\sigma$ values are not significant in the analysis except to set the upper limit on lifetime.
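As an illustration, the lognormal density (in its standard form, with a negative exponent), the corresponding cumulative probability and the hazard rate can be evaluated as follows. This is a sketch using only the Python standard library; the function names and the numerical values of $T_{50}$ and $\sigma$ are assumptions for illustration.

```python
import math


def lognormal_pdf(t, t50, sigma):
    """Lognormal failure density f(t) with median t50 and shape sigma."""
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)


def lognormal_cdf(t, t50, sigma):
    """Cumulative failure probability F(t) of the same distribution."""
    z = (math.log(t) - math.log(t50)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hazard_rate(t, t50, sigma):
    """Instantaneous failure rate h(t) = f(t) / (1 - F(t))."""
    return lognormal_pdf(t, t50, sigma) / (1.0 - lognormal_cdf(t, t50, sigma))


# Illustrative values only: median life of 1000 hours, sigma of 0.5.
f_median = lognormal_pdf(1000.0, 1000.0, 0.5)
F_median = lognormal_cdf(1000.0, 1000.0, 0.5)
h_median = hazard_rate(1000.0, 1000.0, 0.5)
```

At the median time half of the population has failed, so the hazard rate there is twice the density.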

### 2.2 Element with grain boundaries only

The preferential failure occurring at grain boundary triple points is also modeled using a lognormal density function, for which the  $T_{50}$  and  $\sigma$  values may be determined from lifetests on conductor stripes. In this analysis a random distribution of triple points over the conductor is assumed in order to determine the probability of an individual element containing a triple point. The triple point density is determined by the mean grain size  $(x_G)$ , thus:

$$\Pr\{\text{triple point}\} = 1 - \exp \left[ \frac{-w \ell_E}{x_G^{2}} \right]$$
 (5)

where: w = stripe width

 $\ell_E$  = element length

$x_G$ = average grain size
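Reading Eq. (5) as $1 - \exp(-w \ell_E / x_G^2)$, i.e. the probability of at least one triple point in an element of area $w \ell_E$ when triple points are randomly scattered with a density of the order of $1/x_G^2$, the calculation can be sketched as below. The function name and the dimensions used are hypothetical.

```python
import math


def triple_point_probability(width, element_length, grain_size):
    """Probability that an element contains at least one triple point.

    All three lengths must be in the same unit (e.g. micrometres).
    """
    return 1.0 - math.exp(-width * element_length / grain_size ** 2)


# Illustrative values only: 2 um stripe, 1 um element, 1 um mean grain size.
p_wide = triple_point_probability(2.0, 1.0, 1.0)
p_narrow = triple_point_probability(0.5, 1.0, 1.0)
```

Under this reading the probability falls as the stripe is narrowed relative to the grain size, since a smaller element area intercepts fewer triple points.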

The model also takes into account the probability of "bamboo" structure blocking grains [6,7] which effectively prevent grain boundary electromigration by restricting the distance over which material may be transported. This probability is modeled as:

$$\Pr\{\text{blocking grain}\} = \begin{cases} 1 - \exp \left[ \dfrac{-\ell_{C}(x_{G}-w)}{x_{G}^{2}} \right], & w < x_{G} \\ 0, & w > x_{G} \end{cases}$$
 (6)

where:  $\ell_C$  = mean critical distance for mass transport. The effective triple point probability is thus given by

### 2.3 Element with topographic defects only

The contribution of topographic defects is also found by considering a random spatial distribution, except that a range of possible defect sizes is taken into account. For the purpose of this model, these defects are all assumed to be caused by particles, with a size distribution given by [8]:

$$f(x) = \begin{cases} A/x^3, & x > x_0 \\ Bx, & x < x_0 \end{cases}$$
 (7)

where: f(x) = density of defects of diameter x

A, B = constants

 $x_0$  = minimum reproducible spot size

The  $A/x^3$  function corresponds to the distribution of particle sizes according to the standard environmental class curves of MIL STD 209B [9], and  $x_0$  represents a minimum reproducible spot size determined by the resolution of the photolithographic process.
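The constants A and B are not given numerically here. As a sketch, if one assumes that the two branches of Eq. (7) meet continuously at $x_0$ and that the density is normalised to unit area, then $A = x_0^2$ and $B = 1/x_0^2$. Both assumptions, and the function names, are introduced for illustration only; in practice the density would be scaled by the measured defect density.

```python
def defect_size_density(x, x0):
    """Normalised form of Eq. (7): continuous at x0, integrates to 1."""
    if x <= 0.0:
        return 0.0
    if x < x0:
        return x / x0 ** 2          # B*x with B = 1/x0**2 (assumed)
    return x0 ** 2 / x ** 3         # A/x**3 with A = x0**2 (assumed)


def fraction_larger_than(x, x0):
    """Fraction of defects whose diameter exceeds x (analytic integral)."""
    if x <= 0.0:
        return 1.0
    if x < x0:
        return 1.0 - x ** 2 / (2.0 * x0 ** 2)
    return x0 ** 2 / (2.0 * x ** 2)


# Illustrative value only: minimum reproducible spot size of 0.5 um.
x0 = 0.5
peak_density = defect_size_density(x0, x0)
half_above = fraction_larger_than(x0, x0)
tail_1um = fraction_larger_than(1.0, x0)
```

With this normalisation half of all defects are larger than $x_0$, and the fraction larger than a given size falls off as the inverse square of that size.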

The defect size distribution is calculated from the particle size distribution by considering the probability of a given spot falling on a conductor element. This closely follows the method adopted by Stapper [8], except that we are concerned here with defects that eliminate only a portion of the conductor. This analysis thus takes into account defects which are smaller than the conductor width, as well as those which fall on the edge of the conductor, and yields the defect size distribution function depicted in Fig. 1.

The defect model also takes into account the effect on conductor lifetime for each size of defect. The observed relationship

$$T_{50} = A J^{-n} \exp(E/kT)$$
 (with n constant) (8)

is applicable to current densities below 1E6 A/cm² [10]. However, if a significant portion of a conductor has been removed by a defect then temperature gradients in the vicinity of the defect significantly increase this factor. Results reported by [3], [10] and [11] were used to determine a relationship between defect size and failure time, which is expressed as:

$$T_{50}(w) = T_{50}(w_0) \exp(1 - \frac{w_0}{w}) \qquad w \le w_0$$
 (9)

where:  $w_0$  = nominal stripe width

w = remaining stripe width at the defect

$T_{50}(w)$ = median time to failure of defected element

$T_{50}(w_0)$ = median time to failure of undefected element

The failure probability of the element due to topographic defects is then calculated from the above lifetime and defect size distribution functions.
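Equations (8) and (9) can be exercised with a short sketch such as the following. The function names are hypothetical, and the prefactor, exponent, activation energy and temperature are placeholder values rather than calibrated process data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def median_lifetime(prefactor, current_density, exponent,
                    activation_energy_ev, temperature_k):
    """Eq. (8): T50 = A * J**(-n) * exp(E / kT)."""
    thermal = math.exp(activation_energy_ev / (BOLTZMANN_EV_PER_K * temperature_k))
    return prefactor * current_density ** (-exponent) * thermal


def defected_median_lifetime(t50_nominal, remaining_width, nominal_width):
    """Eq. (9): T50(w) = T50(w0) * exp(1 - w0/w), valid for w <= w0."""
    if not 0.0 < remaining_width <= nominal_width:
        raise ValueError("remaining width must satisfy 0 < w <= w0")
    return t50_nominal * math.exp(1.0 - nominal_width / remaining_width)


# Placeholder values: n = 2, E = 0.5 eV, T = 400 K.
t50_low_j = median_lifetime(1.0, 1e5, 2.0, 0.5, 400.0)
t50_high_j = median_lifetime(1.0, 2e5, 2.0, 0.5, 400.0)

# A defect that removes half the width of a 2 um stripe.
t50_half_width = defected_median_lifetime(1000.0, 1.0, 2.0)
```

With n = 2, doubling the current density reduces the median lifetime by a factor of four, and by Eq. (9) a defect that removes half of the stripe width reduces it by a further factor of e.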

## 3 Comparison with experimental results

### $T_{50}$ and $\sigma$ variation with linewidth

The above models were used to simulate the lifetime of stripes having the same dimensions as those in the experiments conducted by [6] and [7]. The variation of  $T_{50}$  with linewidth, depicted in Fig. 2, closely follows that reported by these authors. The "U" shape located around w=2 $\mu$ m derives from the probability distribution of grain boundary triple points and "blocking" grains, which is in exact agreement with the mechanism described by Kinsbron [6]. Values for  $\sigma$  were found to increase with decreasing stripe width, as reported by [6] and [7].
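The blocking-grain contribution to this width dependence can be tabulated directly from Eq. (6). The sketch below assumes a mean grain size of 2 µm and a critical length of 10 µm; both numbers, and the function name, are hypothetical and are not the values used in the simulations reported here.

```python
import math


def blocking_grain_probability(width, grain_size, critical_length):
    """Eq. (6): probability of a blocking grain within the critical length.

    Returns zero when the stripe is at least as wide as the mean grain size.
    """
    if width >= grain_size:
        return 0.0
    exponent = -critical_length * (grain_size - width) / grain_size ** 2
    return 1.0 - math.exp(exponent)


# Sweep of stripe widths in micrometres (illustrative values only).
widths_um = [0.5, 1.0, 2.0, 4.0]
table = [
    (w, blocking_grain_probability(w, grain_size=2.0, critical_length=10.0))
    for w in widths_um
]
```

For these assumed values the blocking probability is zero at and above the grain size and rises towards unity as the stripe becomes narrower, which is the behaviour that lengthens the lifetime of narrow stripes.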

However, further simulation reveals that the increasing lifetime of narrow stripes in the presence of topographic defects is abruptly reversed at some point (typically below 0.1 $\mu$ m), and that  $T_{50}$  approaches zero as the width is reduced further (Fig. 3). This reversal is due to an increase in the severity of topographic defects for narrower stripes, and the location of the turning point is determined by the topographic defect density and size distribution. Topographic defect densities of the order of  $0.5/cm^2$  per level are required for modern semiconductor processes [12], and this value produces a turning point in the vicinity of 0.1 $\mu$ m. It is also noted that topographic defects do not affect the  $T_{50}$  and  $\sigma$  values of stripes that are wider than this minimum value imposed by topographic defects. Interconnect stripes produced in modern semiconductor processes are of the order of 2-5 $\mu$ m, therefore the effect of topographic defects is not apparent from  $T_{50}$  measurements carried out on such stripes, provided they are manufactured under low topographic defect density conditions.

### Effect of topographic defects on the lognormal failure curve

Although we have shown that topographic defects do not significantly change  $T_{50}$  for a conductor stripe, they nevertheless have an important effect on F(t), the time dependent cumulative failure probability. This failure probability is usually plotted on lognormal axes to reflect the approximately lognormal nature of the electromigration wearout mechanism, and Fig. 4 shows a series of lognormal plots obtained from our simulations using various topographic defect densities. At relatively high densities (>1E3/cm²) a deviation in the form of a "tail" appears at the lower end of the lognormal curve, which results from the early failure of those devices which contain topographic defects. Such "tails" are evident in some published lifetest results [3,4,7], and on the basis of this simulation we believe that these published results reflect the early failure of devices due to topographic defects.

### Effect of topographic defects on failure rate

The early failures which cause the "tail" in the cumulative failure probability also result in a corresponding increase in the instantaneous failure rate h(t). This is true even at very low defect densities (of the order of 0.5/cm²), where the deviation from the lognormal curve is not readily apparent. Simulations of instantaneous failure rate for stripes containing topographic defects yield curves which have a "bathtub" shape, identical to that referred to in the literature [9]. Fig. 5 shows a series of such failure rate curves plotted on lognormal axes for different conductor lengths and topographic defect densities. Each curve results from the combined effect of two functions: a decreasing failure rate due to the accelerated wearout of a small fraction of defected devices, and a mound-shaped (lognormal) function due to wearout of the remaining undefected ones. The most seriously defected devices fail during the initial decreasing portion of the bathtub curve, and may be screened out during burn-in testing, while the lowest point on the curve represents the minimum failure rate achievable during the lifetime of the conductor. In addition, the initial decreasing portion of the failure rate curve corresponds to the lower end of the (lognormal) cumulative failure curve where the deviation from the lognormal is observed.
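The origin of the bathtub shape can be illustrated with a deliberately simplified stand-in for the full simulation: a two-population mixture in which a small fraction of devices is defected and wears out early, while the rest follow the undefected lognormal. The mixture form, the function names and all parameter values below are assumptions made for illustration.

```python
import math


def lognormal_pdf(t, t50, sigma):
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)


def lognormal_cdf(t, t50, sigma):
    z = (math.log(t) - math.log(t50)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixture_hazard(t, defect_fraction, defected, undefected):
    """Hazard rate of a two-population mixture.

    `defected` and `undefected` are (t50, sigma) pairs.
    """
    p = defect_fraction
    f = p * lognormal_pdf(t, *defected) + (1.0 - p) * lognormal_pdf(t, *undefected)
    F = p * lognormal_cdf(t, *defected) + (1.0 - p) * lognormal_cdf(t, *undefected)
    return f / (1.0 - F)


# Illustrative values only: 1% defected devices with a 10 h median life,
# undefected devices with a 10 000 h median life.
params = dict(defect_fraction=0.01, defected=(10.0, 1.0), undefected=(1e4, 0.5))
h_early = mixture_hazard(10.0, **params)     # dominated by defected devices
h_mid = mixture_hazard(1000.0, **params)     # floor of the bathtub
h_late = mixture_hazard(1e4, **params)       # wearout of undefected devices
```

For these assumed parameters the failure rate at 1000 h is several orders of magnitude lower than at either 10 h or 10 000 h, reproducing the decreasing early portion and the later wearout rise described above.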

### Implications for interconnect reliability

It is apparent from Fig. 5 that the minimum failure rate (at the lowest point on the curve) increases with both conductor length and topographic defect density. The reliability objective for VLSI circuits is to meet a failure rate criterion of 10 FITs (1 FIT = 1E-9 failures/hour) during the operational lifetime of the circuit [9]. Topographic defects therefore play an important part in determining whether this requirement will be satisfied in respect of the interconnect system of such a circuit. The significance of the method presented here is that it provides a means of determining the relative effect of different types of defects on conductor failure rate, and thus allows more meaningful predictions to be made of interconnect reliability than has previously been possible.
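Checking a predicted failure rate against the 10 FIT objective is a matter of unit conversion, and because hazard rates add in the series model the contributions of several conductors can simply be summed. A minimal sketch follows, with hypothetical function names and made-up conductor failure rates.

```python
FITS_PER_FAILURE_PER_HOUR = 1e9  # 1 FIT = 1E-9 failures/hour


def to_fits(failure_rate_per_hour):
    """Convert a failure rate in failures/hour to FITs."""
    return failure_rate_per_hour * FITS_PER_FAILURE_PER_HOUR


def system_fits(conductor_rates_per_hour):
    """Series model: the system failure rate is the sum of the parts."""
    return sum(to_fits(rate) for rate in conductor_rates_per_hour)


def meets_target(conductor_rates_per_hour, target_fits=10.0):
    """True if the summed failure rate is within the FIT budget."""
    return system_fits(conductor_rates_per_hour) <= target_fits


# Made-up minimum failure rates (failures/hour) for three conductors.
rates = [2e-9, 3e-9, 1e-9]
total = system_fits(rates)
within_budget = meets_target(rates)
```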

## 4 Conclusion

- 1. Our analysis shows that the trend of increasing  $T_{50}$  as predicted by experimental results from progressively narrower conductors is eventually limited by topographic defects.
- 2. Topographic defects are responsible for early device failures and hence lifetest measurements which yield $T_{50}$ and $\sigma$ do not provide the information necessary to predict failure rates over the entire lifetime of a device. However, our analysis shows that it is possible to predict early failure rates in the presence of defects from a knowledge of the defect distribution and the failure characteristics of defected devices.
- 3. The aim of this work is to develop a method of predicting the reliability of the interconnect system for a VLSI device. The method developed to assess the effect of defects on the failure rate of a single conductor will be combined with that described in an earlier paper [13] to determine the overall failure rate of a complex system.

## 5 References

- [1] M.J. Attardo, R. Rutledge and R.C. Jack, "Statistical metallurgical model for electromigration failure in aluminum thin-film conductors," J. Appl. Phys., 42, 1971, p. 4343.
- [2] B.N. Agarwala, M.J. Attardo and A.P. Ingraham, "Dependence of electromigration-induced failure time on length and width of aluminum thin-film conductors," J. Appl. Phys., 41, 1970, p. 3954.
- [3] J.R. Lloyd, P.M. Smith and G.S. Prokop, "The role of metal and passivation defects in electromigration-induced damage in thin film conductors," Thin Solid Films, 93, 1982, p. 385.
- [4] P.M. Smith, J.R. Lloyd and G.S. Prokop, "Lot-to-lot variations in electromigration performance for thin film microcircuits," J. Vac. Sci. Technol., A 2 (2), 1984, p. 220.
- [5] R.E. Hummel, R.T. Dehoff and H.J. Geier, "Activation energy for electrotransport in thin aluminum films by resistance measurements," J. Phys. Chem. Solids, 37, 1976, p. 73.
- [6] E. Kinsbron, "A model for the width dependence of electromigration lifetimes in aluminum thin-film stripes," Appl. Phys. Lett., 36, 1980, p. 968.
- [7] S.S. Iyer and C.-Y. Ting, "Electromigration lifetime studies of submicrometer-linewidth Al-Cu conductors," IEEE Transactions on Electron Devices, 31 No. 10, 1984, p. 1468.
- [8] C.H. Stapper, "Modeling of defects in integrated circuit photolithographic patterns," IBM J. Res. Develop., Vol. 28, No. 4, 1984, p. 461.
- [9] M.H. Woods, "MOS VLSI reliability and yield trends," Proc. IEEE, Vol. 74, No. 12, 1986, p. 1715.

- [10] J.D. Venables and R.G. Lye, "A statistical model for electromigration induced failure in thin film conductors," Proc. 10th Ann. Rel. Phys. Symp. IEEE, New York, 1972, p. 159.
- [11] R.A. Sigsbee, "Electromigration and metallization lifetimes," J. Appl. Phys., 44, 1973, p. 2533.
- [12] J.F. McDonald et al, "Yield of wafer-scale interconnections," VLSI Systems Design, December 1986, p. 62.
- [13] D.F. Frost and K.F. Poole, "A method for predicting VLSI-device reliability using series models for failure mechanisms," IEEE Trans. on Reliability, Vol. R-36, No. 2, 1987, p. 234.









