Climate Change
As you read this, the 26th United Nations Climate Change Conference (aka COP26) is underway in Glasgow.
COP26 is an important event. Inside, 25,000 delegates, including 120 heads of state, are discussing how to reduce greenhouse gas emissions and slow global warming. Outside, thousands of protesters, including Greta Thunberg, are pressing folks on the inside for less talk and more action. "No more blah blah blah", as Greta put it.
In this newsletter I want to discuss the role of statistics in understanding global warming, as it's the aspect of climate change that attracts the most public attention, and one of the main reasons people convene, or protest, meetings like COP26. In other words, instead of calling out some misuse of statistics, as I often do in these newsletters, I want to share some examples of how stats helped us identify and contend with an environmental crisis. I believe that three or four centuries from now, if humanity still exists, historians will remember that statistics played a role in saving the earth and its inhabitants. (Is that melodramatic? Probably. It doesn't mean I'm wrong.)
Stats on global warming abound. For example, we read that from 2015 through 2020, each year was probably the warmest on record, and July 2021 was probably the warmest month since 1880. We hear projections for the future that vary in the details but, assuming no changes to current policies, all sound grim. And there are widely discussed targets, such as the famous 2 degrees. Building on a report from the Intergovernmental Panel on Climate Change (IPCC), many policy experts now hold that limiting global warming to a 2 degree C increase over pre-industrial levels would allow the earth to remain habitable for us, albeit increasingly damaged. That's a low but perhaps attainable bar.
The 2 degree figure has become a sort of mantra, and in a sense it's one of those "magic numbers", like 10,000 steps, that's nice and neat and slightly misleading. After all, (a) an increase of 2.001 or 1.999 degrees would probably lead to the same outcomes, (b) everyone agrees that our goal should be the lowest possible increase, and (c) temperature targets distract the public from effects of climate change, like ocean acidification, that aren't substantially temperature-driven.
At the same time, that 2 degree target and stats like it are helpful, because they provide the international community some focus for their conversations. Limiting global warming to 2 degrees (or, in the language of the Paris Agreement, “well below” 2 degrees) above pre-industrial levels may be enormously complicated, but at least the goal is clear, and if you find this limit unrealistic, as some experts now do, you can raise the number and still continue a conversation that could, in theory, be productive.
My focus here is not on the temperature targets, but rather on the role of stats in documenting actual temperature changes that have already occurred. Ultimately, COP26 attendees and protesters have come to Glasgow owing to recent spikes in global temperature (and evidence that the increases are mostly anthropogenic – i.e., caused by people).
The statistic I'll be focusing on is global mean surface temperature (GMST). How exactly is GMST measured?
Consider annual GMST for a moment. You might think this stat is calculated in a relatively straightforward way: Just average the average daily temperatures recorded for all the world's cities, oceans, and unpopulated areas. Weight those averages according to the size of each city or region. Then take the average of those averages for the year, and you're done.
If you calculated GMST this way, your estimate would be fairly accurate, but only from a cosmic perspective. A visitor from another planet, for instance, might find the estimate both accurate and useful. We earthlings need much more precision, given that just a few degrees of increase spells our demise, and that each year the fractional increases are accumulating. In short, we need estimates of GMST change expressed at the level of fractions of degrees.
Measuring GMST with such precision is extremely challenging. Ultimately, what makes it challenging is the inherent variability of temperature. All day long, temperature changes from moment to moment and from place to place. Within any given place – say, one city block – it varies according to relatively stable factors (e.g., the density of the buildings on each side of the street) as well as transitory ones (e.g., the amount of vehicular traffic). It varies according to how far from the land or water you've placed your measuring device. And the amount of variability is quite large: at any given moment, the hottest and coldest places on earth differ by over 100 degrees.
Dealing with these kinds of variability is part of what makes the calculation of GMST mind-blowingly complicated (at least to me). Here's the simplest way I can put it: Scientists obtain estimates of GMST by dividing the earth's surface into grid cells, calculating average temperatures within each cell, then averaging the averages. (Math people: Those are integrated averages.) For example, one of the influential estimates of GMST comes from NASA's Goddard Institute for Space Studies' Surface Temperature Analysis (GISTEMP), which divides the earth into cells of 2 degrees latitude by 2 degrees longitude. (From where you're sitting, that's probably about 138 miles by 110 miles; the east-west dimension shrinks the closer you are to the poles.) The "M", or "mean", in the acronym GMST is calculated in different ways in different models, but it's often grounded in averages of high and low daily temperatures at each measurement point in each cell. The "S", or "surface", in GMST actually means "near the surface", because land temperatures are usually recorded by weather stations roughly 5 feet above the ground, while sea temperatures are usually recorded by devices on ships, buoys, etc. ranging from less than an inch to over 60 feet below the waterline.
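To make "averaging the averages" concrete, here's a minimal sketch in Python. It assumes we already have one average temperature per 2-degree grid cell (the values below are random placeholders, not real data), and it weights each cell by the cosine of its latitude, since cells cover less area toward the poles. Real models integrate over far messier station data, but the final step looks roughly like this:

```python
import numpy as np

# Hypothetical grid: one average temperature (deg C) per 2-degree cell.
# Rows are latitude bands, columns are longitude bands.
lats = np.arange(-89, 90, 2)                        # 90 cell-center latitudes
rng = np.random.default_rng(0)
cell_means = rng.normal(14, 10, size=(90, 180))     # stand-in data, not real

# Cells cover less area near the poles, so weight each latitude band
# by the cosine of its latitude before averaging the averages.
weights = np.broadcast_to(
    np.cos(np.radians(lats))[:, None], cell_means.shape
)
gmst_estimate = np.average(cell_means, weights=weights)
print(f"Toy global mean: {gmst_estimate:.2f} deg C")
```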
In order to explain a bit more about the calculation of GMST, I'll share some examples of how statistical procedures are essential to the creation of temperature estimates, as well as to the comparison of results across models. Again, at the risk of sounding melodramatic, stats have been essential for determining that the world is heating up, and for sounding the alarm that both the rate and the extent of temperature increases are dangerous.
1. Interpolation
Ideally, scientists would analyze temperature data for every area within each grid cell, but the data fall short of this ideal. There aren't enough measuring devices in all parts of the globe, and data aren't available for every moment of the day. In short, global temperature measurements have substantial gaps across both time and space.
One approach for dealing with missing data is simply to treat them as missing and calculate averages from whatever data are available (as is done, for example, in the UK's influential GMST dataset). A more common approach is interpolation, or the creation of numbers that represent what the missing values are most likely to be. A number of statistical techniques allow interpolation to be done systematically. I'll focus here on spatial interpolation and just two approaches.
When we lay people imagine how interpolation works, we tend to think of the simplest form of a technique called inverse distance weighting (IDW). It's pretty easy to understand. Imagine two weather stations 100 miles apart. If Station Alpha calculates an average daily temperature of 20 degrees C and Station Beta calculates an average daily temperature of 26 degrees C, then we can assume, for example, that at a point exactly halfway between the stations, the average daily temperature is 23 degrees. In other words, 23 degrees would be our interpolated value. Easy enough (and if you're mathematically inclined, you can imagine a simple linear function that predicts temperature based on distance from each station).
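In code, the two-station case is one line of arithmetic; here's a tiny Python sketch (the function is mine, purely for illustration):

```python
def interpolate_between(temp_a, temp_b, dist_to_a, dist_to_b):
    """Linearly interpolate a temperature between two weather stations."""
    total = dist_to_a + dist_to_b
    return temp_a * (dist_to_b / total) + temp_b * (dist_to_a / total)

# Station Alpha reads 20 C, Station Beta reads 26 C, 100 miles apart;
# estimate the temperature at the point 50 miles from each station.
print(interpolate_between(20, 26, 50, 50))  # 23.0
```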
In temperature interpolations, IDW is applied to the more complicated case in which many known temperature values are used to calculate an unknown value. The core assumption is that the closer together in space two places are, the more similar their temperatures will be. For example, suppose that, using the figure below, we want to estimate the average temperature at the exact middle of the circle, where the two straight lines cross. Each colored dot in the circle represents an average temperature recording from one weather station. We can interpolate the unknown value with a formula that gives greater weight to values obtained from stations closer to the unknown point. (In this figure, for example, values for the red dots would be most heavily weighted.) This is a simplified version of what GISTEMP does.
Source: ArcGIS Pro
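Here's a minimal sketch of the many-station version, assuming station coordinates and readings live in NumPy arrays. The power parameter (2 is a common default) controls how quickly a station's influence fades with distance. To be clear, this shows the core idea of IDW, not GISTEMP's actual code:

```python
import numpy as np

def idw(station_xy, station_temps, target_xy, power=2):
    """Inverse-distance-weighted temperature estimate at target_xy."""
    dists = np.linalg.norm(station_xy - target_xy, axis=1)
    if np.any(dists == 0):               # target sits exactly on a station
        return station_temps[dists == 0][0]
    weights = 1.0 / dists**power         # closer stations get larger weights
    return np.sum(weights * station_temps) / np.sum(weights)

# Hypothetical stations (x, y in miles) and their average temps (deg C).
stations = np.array([[0.0, 0.0], [100.0, 0.0], [40.0, 80.0]])
temps = np.array([20.0, 26.0, 22.0])
print(idw(stations, temps, np.array([50.0, 30.0])))
```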
IDW isn't ideal, because not all recorded measurements in a grid cell will be close enough for interpolation to be useful (look at the lower right quadrant of the circle above, for example). Also, IDW doesn't incorporate uncertainty into its estimates. Why does uncertainty matter? For one thing, measuring devices aren't evenly spaced (as you can see in the figure). IDW will treat an interpolated value derived from a set of densely clustered measurements the same as one derived from measurements taken far apart in space, even though in the latter case we have much less certainty about the interpolated value's accuracy. Models like GISTEMP don't directly quantify this kind of uncertainty.
What can be done about the problem? More statistics, of course! If you're not a stats person, I've probably already tried your patience, so I'll just briefly add that there are methods of interpolation that do incorporate this kind of uncertainty. For example, Kriging is a regression-like method for interpolating unknown values from known ones that takes into account uncertainties arising from the distances between measurement points. At the same time, Kriging filters out redundancies from tightly clustered measurement points that would otherwise bias estimates for those areas. Models such as Berkeley Earth's use Kriging in conjunction with more complicated procedures (more on the Berkeley model shortly).
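For the statistically adventurous: Kriging is essentially Gaussian process regression, so a general-purpose library like scikit-learn can stand in for a sketch. The payoff is the standard deviation returned alongside each interpolated value, which is exactly the uncertainty IDW throws away. (Toy data again; Berkeley Earth's actual pipeline is far more elaborate.)

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical station coordinates (miles) and average temps (deg C).
stations = np.array([[0.0, 0.0], [100.0, 0.0], [40.0, 80.0], [90.0, 90.0]])
temps = np.array([20.0, 26.0, 22.0, 24.0])

# The RBF kernel encodes "nearby places have similar temperatures";
# WhiteKernel allows for measurement noise at each station.
kernel = RBF(length_scale=50.0) + WhiteKernel(noise_level=0.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(stations, temps)

# Unlike IDW, we get an uncertainty estimate at every interpolated point.
point = np.array([[50.0, 30.0]])
mean, std = gp.predict(point, return_std=True)
print(f"Estimate: {mean[0]:.1f} deg C, +/- {std[0]:.1f}")
```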
In sum, statistics is playing a small but essential role in saving humanity by helping us make up numbers!
2. Data selection & adjustment
Rush Limbaugh once claimed that global warming doesn't exist. His argument was that air conditioning has become more prevalent in recent decades, and people have gotten used to it, so when they step outside, it feels hotter than before.
Scientists don't need to respond to lunatics, but climate change skeptics have latched onto some seemingly reasonable concerns about GMST estimates. These include: (a) calculation of estimates based on too few weather stations, (b) inaccuracies in reports from individual stations, and (c) systematic biases resulting from growing urbanization. (The claim here is that since urban areas are often warmer than rural areas, due to urban heat island effects, growing urbanization has led to overestimates of global warming.)
Over the past decade, the renowned organization Berkeley Earth has used advanced statistical methods that enable (a) the inclusion of data from more weather stations, and (b) the assessment of station-level accuracy, with corrections for stations whose temperature estimates prove inaccurate. Inaccuracies are detected when a station's measurements are highly discrepant from those of nearby stations for no apparent reason, or when there are large discontinuities over time at a single station (e.g., because the station has moved or changed its measurement tools). In addition, Berkeley Earth, like other teams, has (c) conducted GMST analyses that exclude urban areas as well as rural areas near urban ones. After making these and other adjustments, Berkeley Earth researchers have documented warming trends consistent with what other groups have observed. The researchers have also shown that limited sampling, station-level inaccuracies, urban heat island effects, and even the adjustment of historical data based on new findings all have small effects on specific estimates but do not appreciably affect overall estimates of global warming.
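To give a flavor of how a discrepant station might be flagged, here's a toy version of the neighbor-comparison idea (my illustration, not Berkeley Earth's algorithm): compare each station's reading to the median of its nearby stations and flag large, unexplained deviations.

```python
import numpy as np

def flag_discrepant_stations(station_xy, temps, radius=100.0, threshold=3.0):
    """Flag stations that disagree sharply with their neighbors.

    A station is suspect if its reading differs from the median of
    stations within `radius` miles by more than `threshold` degrees.
    """
    flags = np.zeros(len(temps), dtype=bool)
    for i, point in enumerate(station_xy):
        dists = np.linalg.norm(station_xy - point, axis=1)
        neighbors = (dists > 0) & (dists <= radius)
        if neighbors.sum() >= 3:         # need enough neighbors to judge
            flags[i] = abs(temps[i] - np.median(temps[neighbors])) > threshold
    return flags
```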
In sum, by using statistical approaches that improve data selection and adjustment, Berkeley Earth and others have refuted some of the most sensible challenges to global warming raised by skeptics.
As for Rush Limbaugh...air conditioning does contribute to urban heat island effects, but global warming isn’t the delusion of people who’ve grown accustomed to cool rooms.
3. Model convergence
So far I've shared a brief, incomplete list of the ways statistical procedures support individual models of GMST. Stats also help us compare GMST estimates across models.
Different organizations and research teams use different data and different methods for estimating GMST and GMST changes. Remarkably, their descriptions of prior and present trends tend to be highly similar, as illustrated by the figure below.
Source: NASA GISS
The key takeaway from this figure is that the lines overlap quite closely (notice that the entire y-axis ranges from 0 to 1 degree C). In this figure, the average GMST for 1951-1980 is taken as the reference. Thus, you can see that at the highest point, just prior to 2020, annual GMST was nearly 1 degree C above the 1951-1980 mean. (More on this in the Appendix.)
If you've taken some stats, you might be wondering: What's the correlation between GMST estimates across models? We can't meaningfully calculate that value because, statistically speaking, doing so would represent a combination of cheating and confusion: there's much, though not total, overlap in the datasets each group used, and there were both similarities and differences in their analytical approaches. However, looking at the magnitude of the differences between model estimates at any point in time yields remarkably tiny values, particularly after 1920. Although I'm not an expert, I see no divergences across any 20-year period in the past century that would have any practical importance. These are persuasive data. The gold standard for truth in science is replication – different research teams obtaining the same results using somewhat different samples, different approaches to measurement, and different techniques of analysis. What you see here are multiple replications of past and present GMST estimates.
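If you wanted to quantify that agreement yourself, one crude but telling metric is the spread between the highest and lowest model estimate in each year, along these lines (a sketch assuming each group's anomaly series has been aligned to the same years and baseline):

```python
import numpy as np

def max_yearly_spread(series_by_model):
    """Largest disagreement among models in each year.

    `series_by_model` maps a model name to its annual anomaly
    array (deg C); all arrays must cover the same years.
    """
    stacked = np.vstack(list(series_by_model.values()))
    return stacked.max(axis=0) - stacked.min(axis=0)
```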
As for projections of future trends, we won't know until we get there, but it's worth noting that two years ago, a team of researchers examined the performance of climate models published between 1970 and 2007 and found that most models predicted subsequent GMST change quite accurately, with no systematic bias toward under- or overestimation. So, given that current models closely converge, we should be concerned about global warming, and, more broadly, climate change.
Postscript
Public perceptions of climate change are evolving. The UN Development Programme's People's Climate Vote, a survey of 1.2 million people across 50 countries, reveals that most of the world's citizens consider climate change problematic. Among Americans, although Republicans tend to be less concerned than Democrats, there's now a majority bipartisan consensus that climate change is a problem requiring greater action from our government. At the same time, about a third of Americans say that climate change is not affecting their community much or at all, and the percentage is higher for those who live inland. This, I think, reflects a "seeing is believing" mentality: If you don't see it (i.e., experience it first-hand), you're less likely to believe it. Thus, climate change skeptics continue to cite evidence of average temperatures decreasing in certain parts of the world (e.g., in their own cities).
Having lived in Dallas for 30 years, I can tell you that the winters there are shorter, warmer, and less snowy than they used to be. But that's not how to respond to a climate skeptic. GMST is a global stat – increases in GMST are a global problem – and the need to take this problem seriously is evidenced by the graph above, not by anyone's subjective and/or geography-specific observations.
For advice on talking to climate change skeptics, including those who counter scientific generalizations with personal anecdotes, see here.
For tips on reducing your carbon footprint, see here.
For guidance on encouraging elected officials to do their part (and links for who to contact), see here.
Thanks for reading!
Appendix: Temperature anomalies
If you're statistically inclined, you may have noticed that I didn't give any specific values for GMSTs. For example, I talked about the goal of limiting GMST increase to 2 degrees C over pre-industrial levels, rather than citing a numerical goal such as limiting the GMST to, say, 16 degrees C. You may have also noticed that in the figure showing convergence across climate models, convergence is not given for GMST but for "Temperature Anomaly". Other details of that figure might seem puzzling as well.
Briefly, most analyses of global temperature trends don't focus on GMSTs themselves (which are absolute values) but rather on "anomalies", or departures from normal GMST values for a given region over a lengthy period of time. Anomalies may be higher or lower than the normal values, which we can call references. For example, in the aforementioned figure, the average GMST for 1951 through 1980 is used as the reference. By representing the 1951-1980 average with a 0 on the y-axis, the researchers effectively centered the GMST data, and the figure shows how far below or above the 1951-1980 mean each model's estimate falls for each time period.
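In code, converting absolute temperatures to anomalies is just subtracting a baseline mean; here's a sketch, assuming an annual GMST series indexed by year:

```python
import numpy as np

def to_anomalies(years, gmst, ref_start=1951, ref_end=1980):
    """Convert absolute annual GMSTs to anomalies vs. a reference period."""
    in_ref = (years >= ref_start) & (years <= ref_end)
    baseline = gmst[in_ref].mean()    # e.g., the 1951-1980 mean
    return gmst - baseline            # 0 means "equal to the baseline"
```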
The reference in GISTEMP's model is 1951-1980; other models use different time periods. For example, the goal of limiting GMST to 2 degrees C above pre-industrial levels treats the period prior to 1880 as the reference (although models differ in the exact details of how "pre-industrial” is defined).
Why look at anomalies rather than absolute temperatures? Mainly because anomalies provide more accurate information across large areas. Consider, for example, a mountain beside a valley. Absolute temperatures at the mountain top and in the valley will differ substantially (and won't be perfectly correlated), and the absolute temperature will differ again as you cross to a hill on the other side of the valley. In contrast, studies show that anomalies tend to be consistent across a region of several hundred miles that would include this mountain, valley, and hill. Thus, researchers can look at the larger area and obtain more reliable information about temperature change over time. Since the ultimate goal is to understand climate change, many researchers find it preferable to consider anomalies rather than more variable absolute temperatures, which tell us more about weather than climate.