25  Visualizing your data for reporting

Visualizing data is becoming a much greater part of journalism. Large news organizations are creating graphics desks that create complex visuals with data to inform the public about important events.

To do it well is a course on its own. And not every story needs a feat of programming and art. Sometimes, you can help yourself and your story by just creating a quick chart, which helps you see patterns in the data that wouldn’t otherwise surface.

Good news: one of the best libraries for visualizing data is in the tidyverse and it’s pretty simple to make simple charts quickly with just a little bit of code. It’s called ggplot2.

Let’s revisit some data we’ve used in the past and turn it into charts. First, let’s load libraries. When we load the tidyverse, we get ggplot2, and we’ll need lubridate, too.

library(tidyverse)
library(lubridate)

The dataset we’ll use is 911 overdose calls from Baltimore County in 2022. Let’s load it.

baltco_911_calls <- read_csv("data/baltco_911_calls.csv")
Rows: 3822 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): address
dbl  (1): callnumber
date (1): date
time (1): time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

25.1 Bar charts

The first kind of chart we’ll create is a simple bar chart.

It’s a chart designed to show differences between things – the magnitude of one thing, compared to the next thing, and the next, and the next.

So if we have thing, like a county, or a state, or a group name, and then a count of that group, we can make a bar chart.

So what does the chart of the top months with the most 911 overdose calls look like?

First, we’ll add a month column to our dataframe using lubridate.

baltco_911_calls_by_month <- baltco_911_calls |>
  mutate(month = month(date, label=TRUE)) |> 
  group_by(month) |> 
  summarize(total_calls = n()) |> 
  arrange(desc(total_calls))

baltco_911_calls_by_month
# A tibble: 12 × 2
   month total_calls
   <ord>       <int>
 1 Aug           366
 2 May           365
 3 Mar           364
 4 Apr           357
 5 Jul           339
 6 Oct           334
 7 Jun           314
 8 Jan           293
 9 Nov           293
10 Sep           289
11 Dec           271
12 Feb           237

Now let’s create a bar chart using ggplot.

With ggplot, the first thing we’ll always do is draw a blank canvas that will house our chart. We start with our dataframe name, and then (|>) we invoke the ggplot() function to make that blank canvas. All this does is make a gray box, the blank canvas that will hold our chart.

baltco_911_calls_by_month |>
  ggplot()

Next we need to tell ggplot what kind of chart to make.

In ggplot, we work with two key concepts called geometries (abbreviated frequently as geom) and aesthetics (abbreviated as aes).

Geometries are the shape that the data will take; think of line charts, bar charts, scatterplots, histograms, pie charts and other common graphics forms.

Asesthetics help ggplot know what component of our data to visualize – why we’ll visualize values from one column instead of another.

In a bar chart, we first pass in the data to the geometry, then set the aesthetic.

In the codeblock below, we’ve added a new function, geom_bar().

Using geom_bar() – as opposed to geom_line() – says we’re making a bar chart.

Inside of that function, the asthetic, aes, says which columns to use in drawing the chart.

We’re setting the values on the x axis (horizontal) to be the name of the county. We set weight to total loans, and it uses that value to “weight” or set the height of each bar.

One quirk here with ggplot.

After we’ve invoked the ggplot() function, you’ll notice we’re using a + symbol. It means the same thing as |> – “and then do this”. It’s just a quirk of ggplot() that after you invoke the ggplot() function, you use + instead of |>. It makes no sense to me either, just something to live with.

baltco_911_calls_by_month |>
  ggplot() +
  geom_bar(aes(x=month, weight=total_calls))

This is a very basic chart. But it’s hard to derive much meaning from this chart, because the counties aren’t ordered from highest to lowest by total_loans. We can fix that by using the reorder() function to do just that:

baltco_911_calls_by_month |>
  ggplot() +
  geom_bar(aes(x=reorder(month,total_calls), weight=total_calls))

This is a little more useful. But the bottom is kind of a mess, with overlapping names. We can fix that by flipping it from a vertical bar chart (also called a column chart) to a horizontal one. coord_flip() does that for you.

baltco_911_calls_by_month |>
  ggplot() +
  geom_bar(aes(x=reorder(month,total_calls), weight=total_calls)) +
  coord_flip()

Is this art? No. Does it quickly tell you something meaningful? It does.

We’re mainly going to use these charts to help us in reporting, so style isn’t that important.

But it’s worth mentioning that we can pretty up these charts for publication, if we wanted to, with some more code. To style the chart, we can change or even modify the “theme”, a kind of skin that makes the chart look better.

It’s kind of like applying CSS to html. Here I’m changing the theme slightly to remove the gray background with one of ggplot’s built in themes, theme_minimal()

baltco_911_calls_by_month |>
  ggplot() +
  geom_bar(aes(x=reorder(month,total_calls), weight=total_calls)) +
  coord_flip() + 
  theme_minimal()

The ggplot universe is pretty big, and lots of people have made and released cool themes for you to use. Want to make your graphics look kind of like The Economist’s graphics? There’s a theme for that.

First, you have to install and load a package that contains lots of extra themes, called ggthemes.

#install.packages('ggthemes')
library(ggthemes)

And now we’ll apply the economist theme from that package with theme_economist()

baltco_911_calls_by_month |>
  ggplot() +
  geom_bar(aes(x=reorder(month,total_calls), weight=total_calls)) +
  coord_flip() + 
  theme_economist()

Those axis titles are kind of a mess. Let’s change “count” on the x axis to “net change” and change “reorder(County,TOTAL_DIFF)” to “county”. And while we’re at it, let’s add a basic title and a source as a caption. We’ll use a new function, labs(), which is short for labels.

baltco_911_calls_by_month |>
  ggplot() +
  geom_bar(aes(x=reorder(month,total_calls), weight=total_calls)) +
  coord_flip() + 
  theme_economist() +
  labs(
    title="More 911 Overdose Calls in Warmer Months",
    x = "month",
    y = "total calls",
    caption = "source: Baltimore County"
  )

Viola. Not super pretty, but good enough to show an editor to help them understand the conclusions you reached with your data analysis.

25.2 Line charts

Let’s look at how to make another common chart type that will help you understand patterns in your data.

Line charts can show change over time. It works much the same as a bar chart, code wise, but instead of a weight, it uses a y.

baltco_911_calls_by_date <- baltco_911_calls |>
  group_by(date) |>
  summarise(
    total_calls=n()
  )

baltco_911_calls_by_date
# A tibble: 366 × 2
   date       total_calls
   <date>           <int>
 1 2022-02-06           8
 2 2022-02-07          15
 3 2022-02-08           7
 4 2022-02-09           5
 5 2022-02-10           5
 6 2022-02-11           8
 7 2022-02-12           9
 8 2022-02-13           7
 9 2022-02-14          11
10 2022-02-15           7
# ℹ 356 more rows

And now let’s make a line chart to look for patterns in this data.

We’ll put the date on the x axis and total calls on the y axis.

baltco_911_calls_by_date |>
  ggplot() + 
  geom_line(aes(x=date, y=total_calls))

It’s not super pretty, but there’s a bit of a pattern here: the number of calls fluctuates between 5 and 20 a day for most of this period, and then jumps way up at certain points during the year. In particular there are spikes in July, early October and January.

Right now, it’s kind of hard to see specifics, though. When did some of those smaller spikes and troughs happen?

We can’t really tell. So let’s modify the x axis to have one tick mark and label per month. We can do that with a function called scale_x_date().

We’ll set the date_breaks to appear for every week; if we wanted every month, we’d say date_breaks = “1 month”. We can set the date to appear as month abbreviated name (%b) and day (%d).

baltco_911_calls_by_date |>
  ggplot() + 
  geom_line(aes(x=date, y=total_calls)) + 
  scale_x_date(date_breaks = "1 week", date_labels = "%b %d")

Those are a little hard to read, so we can turn them 45 degrees to remove the overlap using the theme() function for styling. With “axis.text.x = element_text(angle = 45, hjust=1)” we’re saying, turn the date labels 45 degrees.

baltco_911_calls_by_date |>
  ggplot() + 
  geom_line(aes(x=date, y=total_calls)) + 
  scale_x_date(date_breaks = "1 week", date_labels = "%b %d") +
  theme(
    axis.text.x = element_text(angle = 45,  hjust=1)
  )

Again, this isn’t as pretty as we could make it. But by charting this, we can quickly see a pattern that can help guide our reporting.

We’re just scratching the surface of what ggplot can do, and chart types. There’s so much more you can do, so many other chart types you can make. But the basics we’ve shown here will get you started.