Design for Thermal Issues

Unit 1: The need for thermal management

Unlike the electromagnetic fields you may have studied in our modules on signal integrity and EMC, heat and the flow of heat are much more tangible; real effects about which everyone has personal experience and an intuitive understanding of the issues. So we have an immediate grasp of the fact that, the more heat we put in, the hotter something will get, and the smaller the volume into which the heat is concentrated, the hotter the parts will become. After all, this is what we did as children, when we used sunlight concentrated through a magnifying glass to set paper alight.

In electronic applications, however, just to know that something isn’t burning is helpful, but not sufficient to ensure a long life! What we need are ways of estimating the likely temperature of a circuit element under given operating conditions. And, as that element nears its maximum acceptable temperature, our estimate needs to be ever more accurate. In this respect, thermal management is no different from any other activity where one is “operating near the limits”. So we can expect to find that there are limitations to simple approximations, and that we will need to carry out more precise simulations in order to get a better understanding of the thermal flows within our environment.

This first Unit is a “gentle introduction to the topic”, discussing the sources of heat, where it goes and what it does to the circuit, before overviewing how heat can be removed and how thermal design is tackled.


Where heat comes from

Heat may be generated within the ‘system of interest’, and it can also flow into the system from the external environment and, conversely, from the system to the environment. When equilibrium is reached, the heat leaving the system balances the heat generated within it plus any heat flowing in, so that there is no net gain or loss.

However, to make sense of that statement we have both to define the ‘system of interest’ and to be clear about what heat is. Taking the second of these first, most commentators dislike the use of the term ‘heat’ as a noun, preferring to think of heat flow into the system (‘heating’) or out of the system (‘cooling’) and of the thermal energy (‘heat energy’) that is associated with the system.

As with potential energy, the thermal energy of a body is the result of the work that has been done on it – if we push a heavy ball up the side of a hill with a rough surface, most of the energy we transfer to the ball will be potential energy, but the friction against the ground will result in heating of the ball. So most of the energy we expend will be effective, and only a small percentage dissipated as heat. As we will see, electronic products are very similar, in that they are relatively efficient in converting input energy (in this case electrical energy) into effective action, but there is still a small percentage of energy that is wasted and converted to heat.

The idea that energy in its various forms is interchangeable is the first of the Laws of Thermodynamics that are discussed at this link. Or, as Flanders and Swann put it, “Heat is work, and work is heat, and that’s the First Law”! We return to the idea of conversion efficiency later in the Unit.

But how about the ‘system of interest’? The fact is that thermal analysis can be carried out at any level of detail: the student of geophysics can apply the concepts to estimating the average temperature of the Earth; as designers of electronic equipment, we can look at the local temperature in a very small area of the surface of a silicon die, at a complete integrated circuit package, at a board assembly, at the array of boards in a complete system cabinet, or at a room stuffed with electronic equipment, such as we might find in a server complex. Whatever the scale, the same basic concepts apply, although the level of detail is different.

Most of the energy drawn by an electronic product is converted to useful work, whether this is the generation of motion (as with a motor), the radiation of energy (as in a cell phone), or the sound emitted through a loudspeaker. Some of these conversions have a higher efficiency than others. For example, a power supply may have a conversion efficiency greater than 95%, whereas a hi-fi audio amplifier may struggle to reach 30% and the efficiency of the actual transducer element (the loudspeaker) in converting electrical energy to acoustic power is quite small.

As far as the electronics is concerned, much of the inefficiency of circuit elements derives from their departures from the ‘perfect component’:

1 Unless they are superconductors (which are outside the scope of this module), some ohmic loss is inevitable.

We will be looking at this in more detail in Unit 2, but be aware, even at this stage, that the losses are frequency-dependent, generally increasing as the frequency increases, and that all the energy losses eventually end up as heat, first within the component itself, and eventually in the system as a whole.

Not only is heat generated within the product, but it arrives at the product through an interaction with the environment. Although Unit 5 deals in more detail with the mechanisms of heat transfer, you will be familiar with convection and radiation as they affect an everyday item such as a transistor radio. Left on the beach, the radio will be warmed by radiation from the sun and cooled by the gentle breezes! And you will also be familiar with the way in which fans are used in a computer enclosure to transfer heat; a type of convection referred to as ‘forced convection’. But convection can heat as well as cool; you may have noticed that the drives in the top of your PC tower tend to be significantly warmer than those at the bottom of the stack.

Conduction is less obvious in an electronic context than it is in the kitchen – leave a wooden stirring spoon in a saucepan being heated, and it is much more friendly to the fingertips than a metal spoon, because the latter conducts heat much better! Yet it is (mostly) conduction that warms the table underneath your laptop computer, transferring heat from hot areas to cold through board and enclosure.

So our ‘system of interest’ will exchange heat with the environment by radiation, convection and conduction. In most of the examples so far, the amount of heat transferred is modest, and the increases in temperature small. But we shouldn’t forget that very substantial amounts of energy can be involved. For example, just how much heat comes from the sun?

Exercise

Engineers are supposed to be good at making estimates, so try and estimate how much ‘solar gain’ a square metre of surface might experience. Hint: you might find a web search helpful here.

Think about the scale of the problem, and make your own estimate before looking at our solution.

 

Were you surprised at the value? We were! Of course, the amount of heat flux depends on a number of influences:

For all these reasons, the amount of external solar radiation received is very variable, although in some circumstances it may be extremely important to take into account. And not all such radiation is absorbed. We know from common experience that light-coloured or reflective objects stay much cooler than matt black ones.

Despite what we have said about the potential for heating by radiation, a more significant influence on most products will be heat inputs from the local environment, whether from other electronics or from external sources of heat. Whilst hand-held equipment is likely to operate in a benign environment, and office equipment usually enjoys similar surroundings, the same cannot be said for critical applications in areas such as automotive and aerospace. Of these, arguably the more aggressive is the ‘under-hood’ environment (‘under-bonnet’ for speakers of UK English!). Here a high maximum temperature is combined with wide temperature excursion, whilst the product is simultaneously subjected to vibration and shock. We will be saying more about the overall environment in Unit 16, when we consider environmental and testing issues in the context of enclosure design.

Finally, in terms of heat sources, we must remember the effect on the individual item of the ‘equipment practice’ employed. Heat may come from adjacent components, or from nearby circuits, and the distribution of heat will depend on the layout of the board, its arrangement within the racking, and even in some cases on the way that equipment is arranged within a room. The physical arrangement of heat-producing and heat-sensitive elements within the overall system can markedly affect the extent of the thermal challenge.

Note

One aspect of design which needs careful consideration is the influence of faults or over-stress conditions. For example, during the module we will be touching on the challenge of “over-clocking”, and on what happens if filters get blocked or fans fail.

In fact, we would always recommend that you consider potential failure modes for any component or design. If you know enough about how things fail, then you may be able to avoid things like the “laptop on fire” problem that was widely reported in 2006 (see my blog of 22 August 2006). This was a (rare) example of something that became a thermal nightmare without actually being a thermal management challenge!

 


Where heat goes

Heat will flow from a body in the same ways in which it arrives:

Radiation is usually not a major contributor to heat loss, unless the system is operated at substantially elevated temperatures – more about this in Unit 5. However, within environments such as reflow furnaces, radiation plays a part in the soldering process, even though the bulk of heat is transferred by convection.

Can you remember a really hot day when you felt there was nothing you could do to get cool? [Depending on where you go on holiday, you may need a long memory for this!] And remember that, when you had found somewhere cooler, you were able to keep in the shade (less radiation) and fan yourself (convection) as well as downing a cold beer (mostly conduction!). This is a reminder that heat will only transfer from the ‘system of interest’ if it has somewhere cooler to go. We know intuitively that heat flows from a hot body to a cool body, cooling in the process, but also warming the cooler body. This is stated more formally as the Second Law of Thermodynamics.

Key information

If you haven’t already looked at our information on these Laws, we suggest you do so now. Note that the Second Law has considerable implications for life on Earth, not just the life of electronic circuits.

 

As we have indicated, an equilibrium is generally established where the heat entering our system and leaving it are in balance. However, this rarely happens immediately, so non-steady-state conditions are important in electronics. For example, much equipment is turned on (when it heats up) and later turned off (when it cools down). Whether or not the system reaches equilibrium during the on part of the cycle will depend on the relative rate of energy gain and loss and on the characteristics and mass of the system.

We define the ‘specific heat’ of a material as the number of energy units needed to raise a unit mass through a unit of temperature, and the ‘thermal capacity’ of a system (or any element of a system) as the energy required to raise the system (or element) through a unit of temperature. Hence the thermal capacity of a body is its specific heat multiplied by its mass. Real electronic products are complex combinations of materials, but it is possible to work out a ‘thermal capacity’ for a complete body, based on the masses of the materials it incorporates and the specific heat of each.
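As a rough illustration of that last point, the thermal capacity of a composite body can be sketched as the sum of mass times specific heat over its constituent materials. The function name, the parts list and the masses below are hypothetical; the specific heats are approximate room-temperature figures (the laminate value in particular is an assumption), and the sketch ignores any variation with temperature.

```python
def thermal_capacity(parts):
    """Return the thermal capacity in J/K of a body made of several materials.

    `parts` is an iterable of (mass_kg, specific_heat_J_per_kgK) pairs.
    """
    return sum(mass_kg * specific_heat for mass_kg, specific_heat in parts)


# Hypothetical assembly: all masses are illustrative, not measured values.
assembly = [
    (0.050, 900.0),   # 50 g aluminium heat sink, c of roughly 900 J/(kg.K)
    (0.030, 385.0),   # 30 g copper, c of roughly 385 J/(kg.K)
    (0.100, 1100.0),  # 100 g board laminate, c assumed to be about 1100 J/(kg.K)
]

capacity_j_per_k = thermal_capacity(assembly)

# Energy needed to raise the whole assembly by 10 K, assuming uniform heating.
energy_for_10k_j = capacity_j_per_k * 10.0

print(f"Thermal capacity: {capacity_j_per_k:.1f} J/K")
print(f"Energy for a 10 K rise: {energy_for_10k_j:.0f} J")
```

Treating the body as a single lump in this way assumes that it heats uniformly, which is reasonable for a first estimate but not for detailed design.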

Supplementary Information

Read this short information sheet on specific heat and thermal capacity. It contains some typical values and an important note about units.

 

Self-Assessment Question

This is an example of the role of specific heat capacity and the importance of heat dissipation.

A chip generates 50 mW of thermal power and has dimensions 5 mm × 1 mm × 1 mm. If it is perfectly insulated on all sides and operates for 30 seconds, calculate the maximum rise in average temperature that it will experience. Assume that the density of silicon is 2,330 kg·m⁻³ and that its specific heat capacity is 705 J·kg⁻¹·K⁻¹.

 

Click here to view the answer
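Once you have attempted the question, a short script is one way of checking your working. The sketch below assumes that all of the dissipated energy goes into heating the silicon uniformly, so that the result is the rise above the starting temperature rather than an absolute temperature; the function and variable names are our own.

```python
def temperature_rise(power_w, time_s, volume_m3, density_kg_m3, specific_heat_j_kgk):
    """Temperature rise in K of a perfectly insulated, uniformly heated body."""
    mass_kg = density_kg_m3 * volume_m3
    thermal_capacity_j_per_k = mass_kg * specific_heat_j_kgk
    energy_j = power_w * time_s
    return energy_j / thermal_capacity_j_per_k


# Values taken from the question, converted to SI units.
chip_volume_m3 = 5e-3 * 1e-3 * 1e-3  # 5 mm x 1 mm x 1 mm
rise_k = temperature_rise(
    power_w=50e-3,
    time_s=30.0,
    volume_m3=chip_volume_m3,
    density_kg_m3=2330.0,
    specific_heat_j_kgk=705.0,
)

print(f"Temperature rise: {rise_k:.1f} K")
```

A rise of this size from a dissipation of only 50 mW shows how little thermal capacity a small die has, and why a path for the heat to escape matters so much.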

Most specific heats (and, correspondingly, thermal capacities) are functions of temperature, but the functions are smooth and curvilinear. However, if you heat a product through an extended range, you also come across changes of state, from solid to liquid to gas. Not only do solid, liquid and gas have different specific heats, but each phase transition is associated with a ‘latent heat’, energy that is either absorbed or released during the phase transition, even though this takes place at a constant temperature. This latent heat represents the energy difference between a relatively ordered state (solid or liquid) and a less ordered state (liquid or gas) in which the molecules are freer to move and are in a higher-energy condition. Whilst a fuse absorbs energy by heating a wire until it liquefies, fortunately most electronics operate at lower temperatures! However, the cooling effect of evaporation of a liquid is used for cooling electronics in some special circumstances, as we will consider in Unit 13.


What heat does to the circuit

If you have ever watched war-time radio operators in a movie, you will have some appreciation of the problem that earlier equipment experienced with drift; tuning at radio frequencies was highly dependent on a number of factors, including temperature change. Nowadays, drift is a much more unusual experience. This is because, with the wide use of digital technologies, the circuit function is less dependent on the physical characteristics of the components. Nevertheless, there are more subtle changes with temperature that can still affect the detail of the operation, and may occasionally lead to system failure.

Passive components, such as resistors, capacitors and inductors, typically change in value with temperature, at rates which may be as low as 10 ppm/°C, but in certain cases may be one thousand times higher. So, where components are used for purposes such as precision filters, drift with temperature may present a challenge. This drift is a reversible change, with the value returning to the original once the temperature has reverted, but permanent drift may also happen in the longer term. The magnitude of such permanent drift in characteristics often depends on the temperature of operation of the component, and hence on the thermal design of the circuit.
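To give a feel for these figures, the reversible drift can be sketched with a simple linear temperature-coefficient model. This is an assumption made for illustration: real components may have non-linear characteristics, and the function name, the 10 kΩ resistor and the 50 °C rise are hypothetical.

```python
def drifted_value(nominal, tempco_ppm_per_degc, delta_t_degc):
    """Component value after a temperature change, using a linear tempco model."""
    return nominal * (1.0 + tempco_ppm_per_degc * 1e-6 * delta_t_degc)


nominal_ohms = 10_000.0   # hypothetical 10 kilohm resistor
delta_t = 50.0            # hypothetical 50 degC rise above the reference temperature

# A stable part at 10 ppm/degC, and one a thousand times worse.
stable = drifted_value(nominal_ohms, 10.0, delta_t)
poor = drifted_value(nominal_ohms, 10_000.0, delta_t)

print(f"10 ppm/degC part:     {stable:.1f} ohms")
print(f"10,000 ppm/degC part: {poor:.1f} ohms")
```

Under these assumptions the stable part moves by 0.05% and the poor one by 50%, which is why the choice of component matters in a precision filter.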

Active components show a number of changes with temperature that affect operation. For example, the bulk resistance of silicon reduces as the temperature rises, as does the forward voltage of a bipolar junction. At the same time, leakage currents typically increase.

Changes in characteristics with temperature and time can usually be compensated for during the design phase, and drift is in any case becoming less important with the increased use of digital technology. However, higher temperatures, and in particular repeated temperature excursions, can result in catastrophic failure, rather than the parametric failure that is a typical consequence of drift. Whilst the relationship between failure rate and temperature varies considerably between component types, there are few failure causes that are not ‘accelerated’ (that is, made worse) by increases in temperature.

This applies whether or not the component is operated within its recommended temperature limits, but the probability of failure increases markedly when devices have to be operated outside their normal specification. This problem isn’t just confined to those who try to squeeze every ounce of performance from their PC (a visit to http://www.overclockers.com/ is enlightening); users in the defence industry frequently need to make use of COTS (Commercial Off The Shelf) components because equivalents to a military specification are no longer available.


Trends in power dissipation

When electronic circuitry was powered by valves, the heat that these dissipated had a severe adverse impact on reliability: the electrolytic capacitors “dried out”, resistors drifted high in value, and film and paper capacitors deteriorated. In fact, one of the immediate benefits of the transistor revolution (apart from the smaller size) was a reduction in power consumption; transistors operated at low voltage, reducing both dissipation and the stress on components, and no longer wasted energy in heating a filament to a temperature at which it would emit electrons.

That dramatic reduction in circuit dissipation occurred in the 1950s, and was continued, albeit at a much lower rate, during the next 30 years, as circuits were developed that used progressively lower voltages and lower currents. Nowhere was this more true than in digital circuitry, where the adoption of CMOS logic for most medium-speed applications reduced the energy involved in making a logical transition.

Over the past 20 years, other influences have been at work, affecting both the amount of energy dissipated and the energy density. The first of these is a consequence of Moore’s Law, first formulated in 1965 by Gordon Moore (who went on to co-found Intel Corporation and later became its CEO) and named after him. The prediction, in its commonly quoted form, was of a continuing doubling of circuit complexity every 18 months.

Figure 1: Growth in complexity of Intel processors


Although repeated doubling quickly results in very high figures – remember the story of the reward offered to the man who invented chess – Moore’s Law has proved an accurate predictor of trends for more than 30 years. From time to time the fainthearted have predicted the end of the seemingly-endless increase, but the ultimate has not yet been reached. See Moore’s 2003 presentation “No exponential is forever … but we can delay ‘forever’” at this link (PDF file, 2,005 KB).
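The arithmetic of repeated doubling is easily checked. The sketch below assumes a fixed doubling period of 18 months, as quoted above, and uses the usual form of the chess story (one grain on the first square, doubling on each of the 64 squares); the function name is our own.

```python
def growth_factor(years, doubling_period_years=1.5):
    """Factor by which complexity grows if it doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# Thirty years at one doubling every 18 months is twenty doublings.
factor_30_years = growth_factor(30)

# Chess-board story: 1 + 2 + 4 + ... over 64 squares, which equals 2**64 - 1.
grains_on_board = sum(2 ** square for square in range(64))

print(f"Growth over 30 years: about {factor_30_years:,.0f} times")
print(f"Grains on the chess board: {grains_on_board:,}")
```

Twenty doublings give a factor of just over a million, which puts the scale of the change in component complexity into perspective.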

As predicted by Moore, component complexity has changed quite remarkably over that period. And, as complexity has increased, so has the power dissipated by the key elements in any electronic product, the complex integrated circuits. These are not only the microprocessors that provide programmable intelligence, but many of the support chips and devices that perform specialised functions such as digital signal processing.

Figure 2: Exponential growth in power of Intel microprocessors

Source: Intel (May 2002)

At the same time as complexity has increased, so has the clock speed. Although many advances have been made in the efficiency of digital circuits, the inescapable fact is that power dissipation is related to the number of transitions made, so that the higher speed equates to higher dissipation. Browse for information on the trends in benchmark PCs – what was “state-of-the-art” only a few years ago is now ready for the museum! Change in clock speed alone means that most personal computers now have multiple fans, and thermal management has become a key consideration.
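The link between transitions and dissipation is often approximated, for CMOS logic, by the relation P ≈ α·C·V²·f, where α is the fraction of the circuit switching on each clock cycle, C the switched capacitance, V the supply voltage and f the clock frequency. This relation is not derived in this Unit, and the numerical values below are purely hypothetical; the sketch also ignores leakage and short-circuit currents.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """Approximate CMOS switching power in watts: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical device: 10% activity, 1 nF total switched capacitance.
base = dynamic_power(0.1, 1e-9, 1.2, 1e9)       # 1.2 V supply, 1 GHz clock
faster = dynamic_power(0.1, 1e-9, 1.2, 2e9)     # same device at 2 GHz
lower_v = dynamic_power(0.1, 1e-9, 0.9, 2e9)    # 2 GHz, supply reduced to 0.9 V

print(f"1 GHz at 1.2 V: {base * 1e3:.0f} mW")
print(f"2 GHz at 1.2 V: {faster * 1e3:.0f} mW")
print(f"2 GHz at 0.9 V: {lower_v * 1e3:.0f} mW")
```

In this simplified model, doubling the clock frequency doubles the dissipation, whereas a reduction in supply voltage pays back as the square, which is one reason why supply voltages have fallen as clock speeds have risen.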

Activity

Read the material on power progression and thermal management in Emerging directions for packaging technologies, Ravi Mahajan, et al., Intel Technology Journal, Volume 06, Issue 02, 16 May 2002 (PDF, 670KB).

Notice the extremely large increases in dissipation (Mahajan’s Figure 2 is on a logarithmic scale), and the ways in which these are handled.

 

Greater complexity and higher operating speeds have also been accompanied by pressures from end-users to reduce the overall package size as well as to increase functionality. As a result, not only are more complex integrated circuits frequently used, but they are incorporated in smaller packages with more closely-packed leads.

And there is rarely much space between components: not only is the local power density higher, but some components will provide physical barriers against cooling air. So we must consider the layout and positioning of the board with regard to the cooling system, and not just its overall dissipation.

Most of what we have said about Moore’s Law and higher operating speeds applies primarily to digital circuitry. However, analogue active components and integrated circuits have also increased in complexity, operating speed and power rating. This is particularly true of transmission components used in applications such as mobile phones. An additional complication is provided by these often being supplied as pre-packaged RF modules, with the end-user purchasing a ready-made tested function, in the same way that power supplies have traditionally been sold.

Of course, power problems are not just confined to printed circuit assemblies. At the high-power end of the spectrum, the dissipation in components such as magnetrons can be very significant. This is not because these are inefficient, but because even a small fraction of a per cent of energy lost can be important where the component handles a very significant amount of energy. Whilst it is beyond the scope of this course, be aware that, for some applications, short bursts of power of megawatt proportions are not unusual. And, of course, very short pulses give no opportunity for equilibrium conditions to be established.


How excess heat can be removed

With more and more power to dissipate in less and less volume, removing excess heat has to be a deliberate policy, and not just left to chance and ‘adventitious ventilation’. So we seek to assist natural processes by such methods as:

In a typical high-dissipation assembly, several of these approaches will be taken.

As well as improving cooling in these ways, air-flow will be increased. Firstly, natural ventilation is maximised by optimal placement of the heat-generating components. But most systems will have forced ventilation. Often a heat-generating component such as a microprocessor will be fitted with a heat sink with integral fan, and a separate larger fan used to ventilate the complete enclosure. In such cases, the overall flow of heat can be difficult to estimate, other than by sophisticated simulation.

When we come to look at the detail, there will be an emphasis both on the range of materials that can be used, and their thermal properties, and on the structures that can be devised to maximise cooling at minimal cost. This is a particular issue with heat sinks and their use and manufacture.

But keep in mind that all the methods discussed are about distributing excess heat, not removing it. If you are able to influence a design in its early stage, then you should also consider whether it is possible to reduce the energy dissipated by the circuit.

Activity

Thermal issues are becoming ever more important. To demonstrate the challenges as they relate to computer applications, read the first part of Richard Chu’s presentation at this link (PDF file 2,155 KB):

Contrast the traditional and new thermal design requirements in Slides 5 and 6

Consider the impact of cooling on relatively modest products in Slide 7

Look at the dissipation trends in Slides 8–10

Finally, look at the limitations of current cooling technologies and the struggle to take away heat that is projected for the near future.

In later Units we will be looking at further slides as illustrations of cooling practice.

 


How thermal design is tackled

The generation of heat within an electronic assembly is unavoidable, so we need to start early during the design to think about the extent of the problem and how it might be tackled. Whilst avoiding the problem by reducing the dissipation is always worth considering, for many applications the choice of semiconductor technology and clock speed will determine the total dissipation. But how that total dissipation is managed is up to the designer.

Many designers will use prior experience to suggest best practice, or to indicate, by means of “rules of thumb”, shortcuts to conservative design such as the maximum dissipation for a particular enclosure type. However, there are dangers in this approach, particularly when working towards the upper limits of dissipation for a design. For this we may need to carry out at least some calculations, and probably perform some finite element modelling of the product in order to be certain that there is enough “thermal headroom”. This concept is illustrated in Figure 3.

Figure 3: Illustration of thermal headroom
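As a first-pass numerical illustration of thermal headroom, the sketch below uses a single lumped thermal resistance between the component and its surroundings. That model is an assumption not developed in this Unit, and the function names and all of the figures are hypothetical; it is a rough estimate of the kind that calculation or simulation would later refine.

```python
def estimated_temperature(ambient_degc, power_w, thermal_resistance_k_per_w):
    """Steady-state component temperature from a lumped thermal resistance."""
    return ambient_degc + power_w * thermal_resistance_k_per_w


def thermal_headroom(max_allowed_degc, estimated_degc):
    """Margin in K between the maximum allowed and the estimated temperature."""
    return max_allowed_degc - estimated_degc


# Hypothetical figures for illustration only.
component_temp = estimated_temperature(
    ambient_degc=40.0,
    power_w=5.0,
    thermal_resistance_k_per_w=12.0,
)
headroom = thermal_headroom(max_allowed_degc=125.0, estimated_degc=component_temp)

print(f"Estimated temperature: {component_temp:.0f} degC")
print(f"Thermal headroom: {headroom:.0f} K")
```

The smaller the headroom, the more accurate the estimate of temperature needs to be, which is where more detailed modelling earns its keep.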


Activity

Review this report of Staktek’s experience, and see how they applied thermal modelling in three ways: to evaluate a new design concept, to look at the effect of making changes to a package, and to improve a material.

 

As you can see from the example, thermal design is appropriate at all stages during a product’s life cycle, from the original concept through to enhancement of a production item, but thermal modelling becomes especially important at the limits of technology.

Thermal design is equally ‘holistic’, in the sense that it applies to every aspect of a product: its design, components and materials. It also affects the way in which an assembly is tested, and it impacts markedly on the long-term reliability of the design. We hope that, as a result of studying this module, you will start early during the design process, look at every aspect of the design, and use appropriate calculation and simulation to resolve thermal issues.


Resources for this Unit

Each of these lists is in the order in which the material is referenced in the Unit text. However, note that links to SAQ answers are not included!

Needed for activities

Recommended supplementary material

Optional links and information
