Topic

Modelling failure rate

The reliability of a system is the consequence of the reliability of all its parts, electronic and mechanical, not forgetting the reliability of the solder joint itself. Although formal rigour is not normally commercially viable, making reliability predictions for systems is usually part of the military designer’s mandate.

Figures for the failure rate of a system (or for its Mean Time To Fail, which for a constant failure rate is simply the reciprocal of the failure rate) can in theory be predicted from the failure rates of the individual components. MIL-HDBK-217F¹ is an example of a specification that suggests how this might be done. Its failure rate models are specific to particular device types, but are generally of the form

$\lambda _p  = \lambda _b \pi _Q \pi _E \pi _A  \ldots $

where $\lambda_p$ is the part failure rate under the operating environmental conditions, $\lambda_b$ is the base failure rate at the device temperature, and $\pi_Q$, $\pi_E$, $\pi_A$, … are factors that take into account the quality screening level of the part, the equipment environment and the application severity, respectively.
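To make the product-form model concrete, here is a minimal sketch of how part failure rates might be combined into a system figure. It assumes a series system (any part failure fails the system, so rates simply add), a constant failure rate (so MTTF = 1/λ), and rates in failures per 10⁶ hours. The part list, the numeric base rates and the π values are all hypothetical illustrations, not figures from MIL-HDBK-217, and the function names are invented for this example.

```python
from math import prod

# HYPOTHETICAL part data: base failure rate (failures per 10^6 hours) and
# pi factors. Illustrative numbers only, NOT taken from MIL-HDBK-217.
PARTS = {
    "resistor":  {"lambda_b": 0.002, "pi": {"Q": 3.0, "E": 4.0, "A": 1.0}},
    "capacitor": {"lambda_b": 0.005, "pi": {"Q": 3.0, "E": 4.0, "A": 1.5}},
    "ic":        {"lambda_b": 0.010, "pi": {"Q": 2.0, "E": 4.0, "A": 1.0}},
}


def part_failure_rate(lambda_b: float, pi_factors: dict[str, float]) -> float:
    """lambda_p = lambda_b * pi_Q * pi_E * pi_A * ... (product-form model)."""
    return lambda_b * prod(pi_factors.values())


def system_failure_rate(parts: dict) -> float:
    """Series system assumption: any part failing fails the system, so rates add."""
    return sum(part_failure_rate(p["lambda_b"], p["pi"]) for p in parts.values())


def mttf_hours(lambda_per_million_hours: float) -> float:
    """Constant failure rate assumption: MTTF is the reciprocal of the rate."""
    return 1e6 / lambda_per_million_hours


if __name__ == "__main__":
    lam = system_failure_rate(PARTS)
    print(f"system failure rate: {lam:.3f} per 10^6 h")
    print(f"MTTF: {mttf_hours(lam):,.0f} h")
```

Keeping each π factor as a named entry makes it easy to see which condition (quality, environment, application) dominates a part's contribution.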

1 This specification and its supplements are available for free download from a number of sites – we used the Society of Reliability Engineers’ web site at http://www.sre.org/pubs/ – but even the PDFs total over 20MB, so make sure you have a fast connection.


MIL-HDBK-217 has formed the basis for many other published databases and methods for predicting the reliability of electronic systems. However, all these methods have been rightly criticised as a means of predicting system-level reliability, on a number of grounds.

Of these criticisms, one of the most cogent is that MIL-HDBK-217 bases its relationship of failure rate to temperature on the Arrhenius model:

\[\lambda_p = K \exp\left( -\frac{E}{kT} \right)\]

where K is a constant, E the ‘activation energy’ for the process, k Boltzmann’s constant, and T the absolute temperature.
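Because K cancels when two temperatures are compared, the model is usually applied as an acceleration factor, AF = exp[(E/k)(1/T₁ − 1/T₂)]. The sketch below shows what the Arrhenius model *predicts*, not what modern parts actually do (see O'Connor's comments below). The activation energy of 0.7 eV and the 25 °C and 85 °C temperatures are hypothetical choices for illustration, and the function names are invented for this example.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant k, in eV/K


def arrhenius_rate(k_const: float, e_act_ev: float, temp_c: float) -> float:
    """lambda = K * exp(-E / kT), with T converted from Celsius to kelvin."""
    t_kelvin = temp_c + 273.15
    return k_const * math.exp(-e_act_ev / (BOLTZMANN_EV_PER_K * t_kelvin))


def acceleration_factor(e_act_ev: float, t_ref_c: float, t_use_c: float) -> float:
    """Ratio lambda(T_use) / lambda(T_ref); the constant K cancels out."""
    return (arrhenius_rate(1.0, e_act_ev, t_use_c)
            / arrhenius_rate(1.0, e_act_ev, t_ref_c))


if __name__ == "__main__":
    # HYPOTHETICAL inputs: 0.7 eV activation energy, 25 C versus 85 C.
    af = acceleration_factor(0.7, 25.0, 85.0)
    print(f"predicted rate increase from 25 C to 85 C: x{af:.0f}")
```

With these assumed inputs the model predicts a rate increase of roughly 96 times for a 60 °C rise. That is exactly the kind of steep temperature dependence O'Connor argues real components do not show.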

Quote

“The Arrhenius formula that relates physical and chemical process rates to temperature has been used to describe the relationship between temperature and time to failure for electronic components, and is the basis of methods for predicting the reliability of electric systems. However, this is an erroneous application, since, for the great majority of modern electronic components, most failure mechanisms are not activated or accelerated by temperature increase. And the materials and processes used are stable up to temperatures well in excess of those recommended for use. The reason why the relationship seemed to hold is probably because, in the early years of microcircuit technology, quality control standards were not as high, and therefore a fairly large proportion of components were observed to fail at higher temperatures. However, current data do not show such a relationship.”

O’Connor, 2002


As O’Connor points out, the assumed temperature dependence on which much forecasting is based is probably no longer valid. This may well be the reason why the results of testing real products are so much better than those predicted by MIL-HDBK-217 (Figure 1).

Figure 1: Failure rate plotted against temperature for electronic components (after O'Connor, 2002)


In practice, base failure rates for individual components (and joints) are extrapolated from experimental data, and a model similar to that of MIL-HDBK-217 is used to predict how the resulting system failure rate will change with operating conditions. The standards also embody some general pointers to good practice, or at least clear justifications for what many designers would consider common sense!
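The extrapolation step described above can be sketched as a two-stage calculation. First, a point estimate of the base rate is taken as observed failures divided by total unit-hours on test. Second, that rate is scaled by the ratio of the environment factors for field and test conditions. The test numbers and π values used here are hypothetical, and the function names are invented for illustration.

```python
def base_rate_per_million_hours(failures: int, units: int, hours_each: float) -> float:
    """Point estimate: observed failures / total unit-hours, scaled to 10^6 h.

    With zero observed failures this estimate is zero, which is why
    confidence bounds are normally used instead of the bare point estimate.
    """
    unit_hours = units * hours_each
    if unit_hours <= 0:
        raise ValueError("total unit-hours must be positive")
    return failures / unit_hours * 1e6


def extrapolate(rate: float, pi_test: float, pi_field: float) -> float:
    """Scale a rate measured under test conditions to field conditions
    by the ratio of (hypothetical) environment factors."""
    return rate * pi_field / pi_test


if __name__ == "__main__":
    # HYPOTHETICAL test: 1000 joints for 5000 h each, 2 failures observed.
    lam_test = base_rate_per_million_hours(2, 1000, 5000.0)
    # HYPOTHETICAL environment factors: benign lab = 1, harsher field use = 4.
    lam_field = extrapolate(lam_test, pi_test=1.0, pi_field=4.0)
    print(f"test: {lam_test:.2f}, field: {lam_field:.2f} per 10^6 h")
```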

Another problem with reliability prediction is that it is only as good as the information it is based on. Many of the figures given in MIL-HDBK-217 are now extremely conservative: combining these with acceleration factors that are also conservative will produce reliability estimates that are way off the mark – "it will never fly, Wilbur!" Nevertheless, the exercise is worthwhile, because it clearly highlights the risk factors and enables the designer to take avoiding action.
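Because the prediction is a product of terms, pessimism in each input multiplies rather than adds. The sketch below pads a hypothetical base rate and three hypothetical π factors with modest safety margins (3×, 2×, 1.5× and 2×) to show how quickly the overall estimate drifts. All values and function names are illustrative assumptions.

```python
from math import prod


def predicted_rate(lambda_b: float, factors: list[float]) -> float:
    """Product-form prediction: base rate times every pi factor."""
    return lambda_b * prod(factors)


def padded_prediction(lambda_b: float, factors: list[float],
                      margins: list[float]) -> float:
    """Same prediction with each input multiplied by its own safety margin.

    margins[0] pads lambda_b; margins[1:] pad the factors in order.
    """
    if len(margins) != len(factors) + 1:
        raise ValueError("need one margin for lambda_b plus one per factor")
    padded = [f * m for f, m in zip(factors, margins[1:])]
    return predicted_rate(lambda_b * margins[0], padded)


if __name__ == "__main__":
    # HYPOTHETICAL best-estimate inputs (per 10^6 h) and modest margins.
    factors = [1.0, 2.0, 1.5]
    margins = [3.0, 2.0, 1.5, 2.0]
    best = predicted_rate(0.01, factors)
    padded = padded_prediction(0.01, factors, margins)
    print(f"over-estimate: {padded / best:.1f}x")  # prints "over-estimate: 18.0x"
```

No single margin here exceeds 3×, yet the combined prediction is 18 times too pessimistic, the product of the margins.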

Yet another difficulty in performing reliability assessment is the lack of current information. Reliability testing is all about proving that the product is capable of meeting its intended goals. However, we cannot afford to carry out reliability testing on every product. We therefore need a test regime that can be applied across the board, to allow results to be compared, and to give a means of predicting reliability, based on experience with comparable products and similar designs. This becomes particularly important as the industry adopts alternative practices, such as lead-free soldering, in order to meet environmental objectives, and also develops new and smaller designs of package in order to meet market demands.
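One reason we cannot test every product is the sheer test time needed to *prove* a low failure rate. The sketch below uses a common zero-failure demonstration calculation. It assumes a constant failure rate (exponential lifetimes), so the chance of seeing no failures in T total unit-hours at rate λ is exp(−λT). To claim λ ≤ λ₀ at confidence C we therefore need T ≥ −ln(1 − C)/λ₀. The target rate, confidence level, sample size and function name are hypothetical.

```python
import math


def zero_failure_test_hours(target_rate_per_hour: float, confidence: float) -> float:
    """Total failure-free unit-hours needed to claim rate <= target at the
    given confidence, assuming a constant failure rate (exponential lifetimes)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if target_rate_per_hour <= 0.0:
        raise ValueError("target rate must be positive")
    return -math.log(1.0 - confidence) / target_rate_per_hour


if __name__ == "__main__":
    # HYPOTHETICAL goal: 1 failure per 10^6 h, demonstrated at 90% confidence.
    total = zero_failure_test_hours(1e-6, 0.90)
    units = 100  # HYPOTHETICAL number of assemblies on test
    print(f"{total:,.0f} unit-hours, i.e. {total / units:,.0f} h per unit")
```

With these assumptions, about 2.3 million failure-free unit-hours are needed. Even with 100 assemblies on test, that is roughly 23,000 hours each, well over two years.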

IPC-9701 has been developed to create specific test methods for evaluating performance and reliability for surface mount technology, and in particular the attachment of Chip Scale Packages. Its tests are designed to replicate actual use environments of the assemblies, and to give information on the reliability of solder attachment to a number of different circuit structures, embracing flexible and flex-rigid circuits as well as the more traditional rigid substrates. Expect to hear more about this standard!

Finally, be aware that a number of suppliers offer Reliability Prediction Analysis services aimed at helping designers make critical decisions on reliability, maintainability and quality. Examples of such companies are: MTain (http://www.mtain.com/relia/relpred.htm), Quanterion Solutions (http://quanterion.com/index.asp) and Relex (http://www.relexsoftware.com/). If you need to calculate reliability yourself, a number of software tools exist, some of which can be downloaded on a trial basis.

Self Assessment Question

SAQ

A customer has asked you to make a high-reliability assembly, and is concerned to achieve a specific failure rate that is probably lower than can be obtained using standard commercial parts. Explain to him how he might do this.

