SLC winter pollution comparisons 2007-2017



SLC is famous for its winter air pollution (specifically PM2.5). We want to nail down if it’s getting better. In our last post we looked at hourly measurements from Hawthorne Elementary in SLC proper. The trend was encouragingly downward! But, that post only looked at one site and didn’t focus on the winter (when the problem is at its worst).

The plan

Let’s look at 2007-2017 daily data to compare Rose Park (EPA site 49-035-3010) and Hawthorne Elementary (site 49-035-3006), both of which are in SLC proper and 5.2 miles apart as the crow flies. We’ll work up to a winter by winter comparison over the last ten years.

Get the data

We download the data from here and again use these parameters:

Let’s read the two flat files into R and combine.

library(tidyverse)  # read_csv, dplyr verbs, ggplot2
library(scales)     # date_format and date_breaks, used in the plots

# While I could have combined this into one file, I didn't want to mask
# how the original .txt data looked coming from EPA website (reproducibility!)
dfH <- read_csv('../data/SLCPM2_5/SLCHawthorne_PM2.5_Daily_1999-2017.csv')
dfRP <- read_csv('../data/SLCPM2_5/SLCRosePark_PM2.5_Daily_2007-2017.csv')

# Stack the two sites into one long data frame
df <- dplyr::bind_rows(dfH, dfRP)

The analysis

Summarize and check the data

For brevity, we’ve loaded in our data and now we’ll grab data from the most robust Hawthorne sensor and select only the helpful columns.

dfinal <- df %>% 
  # Hawthorne has several options; grab the most complete record
  filter(POC == 1, 
         `AQS Parameter Desc` == 'PM2.5 - Local Conditions') %>% 
  # Change the Site column to be more readable
  mutate(Site = ifelse(`Site Num` == 3006, 'Hawthorne', 'Rose Park')) %>% 
  mutate(Date = as.Date(`Date Local`, format="%m/%d/%Y")) %>%
  rename(Measurement = `Sample Measurement`) %>%
  select(Date, Site, Measurement)

# Look at the data
head(dfinal, 4)
## # A tibble: 4 x 3
##   Date       Site      Measurement
##   <date>     <chr>           <dbl>
## 1 2017-11-30 Hawthorne        21.2
## 2 2017-11-30 Rose Park        20.1
## 3 2017-11-29 Hawthorne        11.3
## 4 2017-11-29 Rose Park        12.4

Nice and simple. Note that this tibble is in the handy tidy format: you could add any number of sites without the annoyance of adding new columns. Let’s now count the number of observations per site per year, to see how much data might be missing.

dfinal %>% 
  mutate(Year = lubridate::year(Date)) %>%
  filter(Year %in% 2007:2011) %>%
  group_by(Site, Year) %>% 
  summarize(`Measurements Per Site` = length(Measurement))
## # A tibble: 10 x 3
## # Groups: Site [?]
##    Site       Year `Measurements Per Site`
##    <chr>     <dbl>                   <int>
##  1 Hawthorne  2007                     341
##  2 Hawthorne  2008                     362
##  3 Hawthorne  2009                     351
##  4 Hawthorne  2010                     305
##  5 Hawthorne  2011                     344
##  6 Rose Park  2007                     255
##  7 Rose Park  2008                     356
##  8 Rose Park  2009                     320
##  9 Rose Park  2010                     331
## 10 Rose Park  2011                     346

Note that we weirdly have fewer than 365 observations per year (but fortunately still enough to work with). Not sure why these sites have such gaps–anyone?
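To turn those counts into an explicit number of missing days, something like the following could work. This is only a sketch on made-up data: `toy` is a hypothetical stand-in for `dfinal`, and the `Observed`, `Expected` and `Missing` column names are ours, not EPA's.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for dfinal: one site, all of 2008, first 10 days dropped
toy <- data.frame(
  Date = seq(as.Date("2008-01-01"), as.Date("2008-12-31"), by = "day"),
  Site = "Hawthorne",
  Measurement = 10
) %>%
  slice(-(1:10))

coverage <- toy %>%
  mutate(Year = year(Date)) %>%
  group_by(Site, Year) %>%
  # Count distinct days so duplicate rows for a day are not double counted
  summarize(Observed = n_distinct(Date), .groups = "drop") %>%
  # Compare against the calendar length of each year
  mutate(Expected = ifelse(leap_year(Year), 366, 365),
         Missing = Expected - Observed)

coverage
```

Counting distinct dates rather than rows is a deliberate choice here, in case a site reports more than one value on the same day.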

Two-site comparison over time

Now that we’ve simplified the dataset and feel good about its coverage over time, let’s see how well the measurements from Hawthorne and Rose Park correspond. While there are statistical ways to do this, let’s simply plot.

ggplot(dfinal, aes(x = Date, y = Measurement, color = Site)) +
  geom_line() +
  labs(title = "3-Day Running Average PM2.5 in SLC from 2007 - 2017") +
  scale_colour_manual(values = c("black", "red")) +
  scale_y_log10(name = "PM2.5 µg/m3", 
                limits = c(1,NA), 
                breaks = c(1, 5, 10, 15)) +
  scale_x_date(name = "Year-Month",
               labels = date_format("%Y-%m"),
               limits = c(as.Date('2007-01-01'),NA),
               breaks = date_breaks("1 year"))

While some data are missing, there is fairly good agreement between the Hawthorne and Rose Park daily PM2.5 values from 2007-2017. And it appears that winter air quality has generally been improving at both sites.
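The plot title above mentions a 3-day running average, while the code shown plots the daily values directly. If you wanted to compute such an average yourself, one possible approach is a trailing mean per site. This is a sketch on made-up numbers, not necessarily how the original figure was produced; `toy` and the `Avg3` column are hypothetical names.

```r
library(dplyr)

# Hypothetical stand-in for dfinal: five consecutive days at two sites
toy <- data.frame(
  Date = rep(seq(as.Date("2017-01-01"), by = "day", length.out = 5), times = 2),
  Site = rep(c("Hawthorne", "Rose Park"), each = 5),
  Measurement = c(3, 6, 9, 12, 15, 2, 4, 6, 8, 10)
)

smoothed <- toy %>%
  group_by(Site) %>%
  arrange(Date, .by_group = TRUE) %>%
  # Trailing 3-row mean; stats::filter is spelled out because dplyr masks filter().
  # This treats rows as consecutive days, so gaps in the record would need
  # to be filled with NA first.
  mutate(Avg3 = as.numeric(stats::filter(Measurement, rep(1 / 3, 3), sides = 1))) %>%
  ungroup()

smoothed
```

The first two rows of each site come back as `NA`, since there aren't yet three days to average.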

Differences after loess smoothing

Even though Hawthorne and Rose Park mostly track together, how do they differ overall? To check longer-term differences, let’s plot with a loess smoother (which is built into ggplot2).

ggplot(dfinal, aes(x = Date, y = Measurement, color = Site)) +
  geom_smooth(method = 'loess') +
  labs(title = "Smoothed PM2.5 in SLC from 1999 - 2017") +
  scale_colour_manual(values = c("black", "red")) +
  scale_y_continuous(name = "PM2.5 µg/m3",
                     breaks = c(6, 8, 10, 12)) +
  scale_x_date(name = "Year",
               limits = c(as.Date('1999-01-01'), NA),
               labels = date_format("%Y"), 
               breaks = date_breaks("1 year"))

Overall, note the downward Hawthorne trend since ~2004! Interestingly Rose Park PM2.5 levels are typically higher than those at Hawthorne Elementary (which is in the Liberty Wells neighborhood). Let’s zoom a bit, since Rose Park data starts so much later.

ggplot(dfinal, aes(x = Date, y = Measurement, color = Site)) +
  geom_smooth(method = 'loess') +
  labs(title = "Smoothed PM2.5 in SLC from 2008 - 2017") +
  scale_colour_manual(values = c("black", "red")) +
  scale_y_continuous(name = "PM2.5 µg/m3",
                     breaks = c(8, 10, 12)) +
  scale_x_date(name = "Year-Month",
               labels = date_format("%Y-%m"), 
               breaks = date_breaks("1 year")) +
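  # Zooming w/o affecting smoother
  # (the x limits here are an assumption based on the plot title)
  coord_cartesian(xlim = as.Date(c('2008-01-01', '2017-12-31')),
                  expand = FALSE)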

Over much of the last six years there’s been a difference of ~1 µg/m3 between the sites. While that level of background difference doesn’t seem like much, this smoothing likely hides shorter-term spikes; what’s scary is that even small differences in low-level PM2.5 exposure can lead to adverse health effects.

While this is speculation, the heightened PM2.5 levels in Rose Park could be due to proximity to the local refinery, which is just 1.5 miles east of the air quality measurement site (although it’s totally lost on me why the difference is exacerbated in 2012).
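If you’d rather put a number on the between-site gap than eyeball the smoothed curves, one option is to pair up the two sites by day and look at the differences. Here’s a rough sketch on invented values (`toy`, `paired`, `mean_gap` and `site_cor` are hypothetical names), assuming one measurement per site per day:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for dfinal; Hawthorne is missing on the third day
toy <- data.frame(
  Date = rep(as.Date(c("2017-01-01", "2017-01-02", "2017-01-03")), times = 2),
  Site = rep(c("Hawthorne", "Rose Park"), each = 3),
  Measurement = c(10, 20, NA, 11, 22, 30)
)

paired <- toy %>%
  # One row per day, one column per site
  pivot_wider(names_from = Site, values_from = Measurement) %>%
  # Keep only days where both sites reported
  filter(!is.na(Hawthorne), !is.na(`Rose Park`)) %>%
  mutate(Difference = `Rose Park` - Hawthorne)

mean_gap <- mean(paired$Difference)                      # average daily gap
site_cor <- cor(paired$Hawthorne, paired$`Rose Park`)    # how closely the sites track
```

Restricting to days with both readings matters: otherwise a gap at one site during a bad inversion could skew the comparison.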

Winter comparisons for both sites

Now, let’s make the year-over-year winter comparison crystal clear. Note that because we’re only comparing winters, we need a winter column that allows us to use faceting in ggplot2. (Thanks to Mike Levy for the seasonal support here.)

dividers <- as.Date(paste0(2007:2017, "06", "01"), format = "%Y%m%d")

dfinal <- dfinal %>%
  mutate(Season = cut(as.numeric(Date), 
                      breaks = as.numeric(dividers), # June 1 of each year separates the winters
                      include.lowest = TRUE),
         Season = as.numeric(Season),
         Month = lubridate::month(Date)) %>% 
  filter(Date > as.Date('2006-06-01'),   
         Month %in% c(1,2,11,12), # Only care about winter months
         !is.na(Season))          # This avoids an NA facet panel
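
To see what the season numbering does, here is a tiny base-R sketch on a few hand-picked dates. It assumes the June 1 `dividers` are what gets passed to `cut()` as breaks; `example_dates` and `season` are just illustrative names.

```r
# June 1 of each year: everything between two dividers belongs to one winter
dividers <- as.Date(paste0(2007:2017, "06", "01"), format = "%Y%m%d")

example_dates <- as.Date(c("2007-12-15", "2008-01-15", "2008-11-15", "2017-11-30"))

# cut() returns a factor of intervals; as.numeric() turns it into 1, 2, 3, ...
season <- as.numeric(cut(as.numeric(example_dates),
                         breaks = as.numeric(dividers),
                         include.lowest = TRUE))

data.frame(Date = example_dates, Season = season)
```

December 2007 and January 2008 land in the same season, which is the whole point of splitting on June 1 instead of on the calendar year. Dates after the last divider fall outside every interval and come back as `NA`.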

And now we plot PM2.5 levels for ten winters at Hawthorne and Rose Park.

# Add custom winter label for faceting
season_names <- c(
  '1' = "2007-08",
  '2' = "2008-09",
  '3' = "2009-10",
  '4' = "2010-11",
  '5' = "2011-12",
  '6' = "2012-13",
  '7' = "2013-14",
  '8' = "2014-15",
  '9' = "2015-16",
  '10' = "2016-17")

ggplot(dfinal, aes(x = Date, y = Measurement)) + 
  geom_smooth(method = "loess") +
  scale_y_log10(name = "PM2.5 µg/m3",
                breaks = c(3, 5, 7, 10, 15, 20, 30, 40, 50)) +
  scale_x_date(name = "Winter Month",
               labels = date_format("%m"),
               breaks = date_breaks("1 month")) +
  # Zooming w/o affecting smoother
  coord_cartesian(ylim = c(3,40)) +
  facet_grid(Site ~ Season, 
             scales = "free", 
             labeller = labeller(Season = as_labeller(season_names)))

First, note how similar the patterns are between Rose Park and Hawthorne for a given winter–it’s heartening to see the signal is roughly what we’d expect. There is some missing data, however, which is why there’s so much gray in Hawthorne for 2014-15.


Overall, for Hawthorne Elementary (which has the most accurate measurements in the Salt Lake Valley), it’s pretty clear that at least the last two winters have had nearly the lowest PM2.5 levels of the last 10 years. Note that winter 2010-2011 had remarkable air as well. Rose Park confirms this.

While a couple of relatively pollution-free winters could be due to meteorological conditions (i.e., fewer high-pressure ridges), the fact that this decline in PM2.5 in the Salt Lake Valley is corroborated by the longer-term trends shown above gives me confidence that this is a true decline in background pollution.
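
For anyone who wants numbers to go with the facets, a per-winter summary is one way to compare seasons. This is a sketch on invented values; `toy` stands in for the winter-only `dfinal`, and using the median is our choice here, not something from the plots above.

```r
library(dplyr)

# Hypothetical stand-in for the winter-only dfinal: two sites, two winters
toy <- data.frame(
  Site = rep(c("Hawthorne", "Rose Park"), each = 4),
  Season = rep(c(1, 1, 2, 2), times = 2),
  Measurement = c(10, 30, 5, 7, 12, 32, 6, 10)
)

winter_summary <- toy %>%
  group_by(Site, Season) %>%
  # Median is less sensitive than the mean to a few extreme inversion days
  summarize(Median = median(Measurement, na.rm = TRUE),
            Days = n(),
            .groups = "drop")

winter_summary
```

Keeping the `Days` count alongside the median makes it easier to spot winters where the summary rests on a thin record.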

What does this mean for policy advocacy in Utah? This downward trend is encouraging and, when combined with the fact that Salt Lake County is still a serious EPA PM2.5 non-attainment area, it gives us reason to believe that brave policy changes can eventually make SLC safe enough for all folks to go outside every winter day. Doesn’t seem too ambitious, right?

Note, see here to track the 2018 Utah Legislative session.