33 Finishing touches
The output from ggplot is good, but not great. We need to add some pieces to it. The elements of a good graphic are:
- Headline
- Chatter
- The main body
- Annotations
- Labels
- Source line
- Credit line
That looks like:
33.1 Graphics vs visual stories
While the elements above are nearly required in every chart, they aren’t when you are making visual stories.
- When you have a visual story, things like credit lines can become a byline.
- In visual stories, source lines are often a note at the end of the story.
- Graphics don’t always get headlines – sometimes just labels, letting the visual story headline carry the load.
An example from The Upshot. Note how the charts don’t have headlines, source or credit lines.
33.2 Getting ggplot closer to output
Let’s explore fixing up ggplot’s output before we send it to a finishing program like Adobe Illustrator. We’ll need a graphic to work with first.
library(tidyverse)
library(ggrepel)
Here’s the data we’ll use: college football team scoring from 2009-2018.
For this walkthrough:
Let’s load them and join them together.
<- read_csv("data/scoringoffense.csv") scoring
Rows: 1253 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Name
dbl (9): G, TD, FG, 1XP, 2XP, Safety, Points, Points/G, Year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
<- read_csv("data/totaloffense.csv") total
Rows: 1253 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Name
dbl (8): G, Rush Yards, Pass Yards, Plays, Total Yards, Yards/Play, Yards/G,...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
<- total |> left_join(scoring, by=c("Name", "Year")) offense
We’re going to need this later, so let’s grab Maryland’s 2018 stats from this dataframe.
<- offense |>
umd filter(Name == "Maryland") |>
filter(Year == 2018)
We’ll start with the basics.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey")
Let’s take changing things one by one. The first thing we can do is change the figure size. Sometimes you don’t want a square. We can use the knitr
output settings in our chunk to do this easily in our notebooks.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey")
Now let’s add a fit line.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey") +
geom_smooth(data=offense, aes(x=`Yards/G`, y=`Points/G`), method=lm, se=FALSE)
And now some labels.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey") +
geom_smooth(data=offense, aes(x=`Yards/G`, y=`Points/G`), method=lm, se=FALSE) +
labs(
x="Total yards per game",
y="Points per game",
title="Maryland's overperforming offense",
subtitle="The Terps' offense was the strength of the team. They overperformed.",
caption="Source: NCAA | By Derek Willis"
)
Let’s get rid of the default plot look and drop the grey background.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey") +
geom_smooth(data=offense, aes(x=`Yards/G`, y=`Points/G`), method=lm, se=FALSE) +
labs(
x="Total yards per game",
y="Points per game",
title="Maryland's overperforming offense",
subtitle="The Terps' offense was the strength of the team. They overperformed.",
caption="Source: NCAA | By Derek Willis"
+
) theme_minimal()
Off to a good start, but our text has no real heirarchy. We’d want our headline to stand out more. So let’s change that. When it comes to changing text, the place to do that is in the theme element. There are a lot of ways to modify the theme. We’ll start easy. Let’s make the headline bigger and bold.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey") +
geom_smooth(data=offense, aes(x=`Yards/G`, y=`Points/G`), method=lm, se=FALSE) +
labs(
x="Total yards per game",
y="Points per game",
title="Maryland's overperforming offense",
subtitle="The Terps' offense was the strength of the team. They overperformed.",
caption="Source: NCAA | By Derek Willis") +
theme_minimal() +
theme(
plot.title = element_text(size = 20, face = "bold")
)
Now let’s fix a few other things – like the axis labels being too big, the subtitle could be bigger and lets drop some grid lines.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey") +
geom_smooth(data=offense, aes(x=`Yards/G`, y=`Points/G`), method=lm, se=FALSE) +
labs(
x="Total yards per game",
y="Points per game",
title="Maryland's overperforming offense",
subtitle="The Terps' offense was the strength of the team. They overperformed.",
caption="Source: NCAA | By Derek Willis") +
theme_minimal() +
theme(
plot.title = element_text(size = 20, face = "bold"),
axis.title = element_text(size = 8),
plot.subtitle = element_text(size=10),
panel.grid.minor = element_blank()
)
Missing from this graph is the context that the headline promises. Where is Maryland? We haven’t added it yet. So let’s add a point and a label for it.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey") +
geom_smooth(data=offense, aes(x=`Yards/G`, y=`Points/G`), method=lm, se=FALSE) +
labs(
x="Total yards per game",
y="Points per game",
title="Maryland's overperforming offense",
subtitle="The Terps' offense was the strength of the team. They overperformed.",
caption="Source: NCAA | By Derek Willis") +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 8),
plot.subtitle = element_text(size=10),
panel.grid.minor = element_blank()
+
) geom_point(data=umd, aes(x=`Yards/G`, y=`Points/G`), color="red") +
geom_text_repel(data=umd, aes(x=`Yards/G`, y=`Points/G`, label="Maryland 2018"))
If we’re happy with this output – if it meets all of our needs for publication – then we can simply export it as a png file. We do that by adding + ggsave("plot.png", width=5, height=2)
to the end of our code. Note the width and the height are from our knitr parameters at the top – you have to repeat them or the graph will export at the default 7x7.
If there’s more work you want to do with this graph that isn’t easy or possible in R but is in Illustrator, simply change the file extension to pdf
instead of png
. The pdf will open as a vector file in Illustrator with everything being fully editable.
33.3 Waffle charts require special attention
Frequently in my classes, students use the waffle charts library quite extensively to make graphics. This is a quick walkthough on how to get a waffle chart into a publication ready state.
library(waffle)
Let’s look at the offensive numbers from the 2021 Maryland v. Illinois football game. Maryland won 20-17, even though the Terps outgained the Illini by 113 yards. You can find the official stats on the NCAA’s website.
I’m going to make two vectors for each team and record rushing yards and passing yards.
<- c("Rushing"=131, "Passing"=350, 113)
md <- c("Rushing"=183, "Passing"=185, 0) il
So what does the breakdown of Maryland’s night look like? How balanced was the offense?
The waffle library can break this down in a way that’s easier on the eyes than a pie chart. We call the library, add the data, specify the number of rows, give it a title and an x value label, and to clean up a quirk of the library, we’ve got to specify colors.
ADDITIONALLY
We can add labels and themes, but you have to be careful. The waffle library is applying it’s own theme, but if we override something they are using in their theme, some things that are hidden come back and make it worse. So here is an example of how to use ggplot’s labs
and the theme to make a fully publication ready graphic.
waffle(md/10, rows = 5, xlab="1 square = 10 yards", colors = c("black", "red", "white")) +
labs(
title="Maryland vs Illinois on offense",
subtitle="The Terps couldn't get much of a running game going.",
caption="Source: NCAA | Graphic by Derek Willis") +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 10),
axis.title.y = element_blank()
)
But what if we’re using a waffle iron? And what if we want to change the output size? It gets tougher.
Truth is, I’m not sure what is going on with the sizing. You can try it and you’ll find that the outputs are … unpredictable.
Things you need to know about waffle irons:
- They’re a convenience method, but all they’re really doing is executing two waffle charts together. If you don’t apply the theme to both waffle charts, it breaks.
- You will have to get creative about applying headline and subtitle to the top waffle chart and the caption to the bottom.
- Using ggsave doesn’t work either. So you’ll have to use R’s pdf output.
Here is a full example. I start with my waffle iron code, but note that each waffle is pretty much a self contained thing. That’s because a waffle iron isn’t really a thing. It’s just a way to group waffles together, so you have to make each waffle individually. My first waffle has the title and subtitle but no x axis labels and the bottom one has not title or subtitle but the axis labels and the caption.
iron(
waffle(
/10,
mdrows = 2,
xlab="Maryland",
colors = c("black", "red", "white")) +
labs(
title="Maryland vs Illinois: By the numbers",
subtitle="The Terps couldn't run, the Illini could.") +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 10),
axis.title.y = element_blank()
),waffle(
/10,
ilrows = 2,
xlab="Illinois\n1 square = 10 yards",
colors = c("orange", "blue", "white")) +
labs(caption="Source: NCAA | Graphic by Derek Willis")
)
33.4 Advanced text wrangling
Sometimes, you need a little more help with text than what is easily available. Sometimes you want a little more in your finishing touches. Let’s work on some issues common in projects that can be fixed with new new libraries: multi-line chatter, axis labels that need more than just a word, axis labels that don’t fit, and additional text boxes.
First things first, we’ll need to install ggtext
with install.packages. Then we’ll load it.
library(ggtext)
Let’s go back to our scatterplot above. As created, it’s very simple, and the chatter doesn’t say much. Let’s write chatter that instead of being super spare is more verbose.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey") +
geom_smooth(data=offense, aes(x=`Yards/G`, y=`Points/G`), method=lm, se=FALSE) +
labs(
x="Total yards per game",
y="Points per game",
title="Maryland's overperforming offense",
subtitle="The Terps' offense was the strength of the team, having to overcome a defense that often struggled to stop opponents. If you compare the offense to every other offense and how many points they score vs the number of yards they roll up, Maryland actually overperformed.",
caption="Source: NCAA | By Derek Willis") +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 8),
plot.subtitle = element_text(size=10),
panel.grid.minor = element_blank()
+
) geom_point(data=umd, aes(x=`Yards/G`, y=`Points/G`), color="red") +
geom_text_repel(data=umd, aes(x=`Yards/G`, y=`Points/G`, label="Maryland 2018"))
You can see the problem right away – it’s too long and gets cut off. One way to fix this is to put \n
where you think the line break should be. That’s a newline character, so it would add a return there. But with ggtext, you can use simple HTML to style the text, which opens up a lot of options. We can use a
to break the line. The other thing we need to do is in the theme element, change the element_text
for plot.subtitle to element_markdown
.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey") +
geom_smooth(data=offense, aes(x=`Yards/G`, y=`Points/G`), method=lm, se=FALSE) +
labs(
x="Total yards per game",
y="Points per game",
title="Maryland's overperforming offense",
subtitle="The Terps' offense was the strength of the team, having to overcome a defense that often struggled to stop<br/>opponents. If you compare the offense to every other offense and how many points they score vs the number of<br/> yards they roll up, Maryland actually overperformed.",
caption="Source: NCAA | By Derek Willis") +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 8),
plot.subtitle = element_markdown(size=10),
panel.grid.minor = element_blank()
+
) geom_point(data=umd, aes(x=`Yards/G`, y=`Points/G`), color="red") +
geom_text_repel(data=umd, aes(x=`Yards/G`, y=`Points/G`, label="Maryland 2018"))
With ggtext, there’s a lot more you can do with CSS, like change the color of text, that I don’t recommend. Also, there’s only a few HTML tags that have been implemented. For example, you can’t add links because the a
tag hasn’t been added.
Another sometimes useful thing you can do is add much more explanation to your axis labels. This is going to be a silly example because “Points per game” is pretty self-explanatory, but roll with it. First, we create an unusually long y axis label, then, in theme, we add some code to axis.title.y
.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey") +
geom_smooth(data=offense, aes(x=`Yards/G`, y=`Points/G`), method=lm, se=FALSE) +
labs(
x="Total yards per game",
y="Points per game is an imperfect metric of offensive efficiency because defenses and special teams score points as well.",
title="Maryland's overperforming offense",
subtitle="The Terps' offense was the strength of the team, having to overcome a defense that often struggled to stop<br/>opponents. If you compare the offense to every other offense and how many points they score vs the number of<br/> yards they roll up, Maryland actually overperformed.",
caption="Source: NCAA | By Derek Willis") +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 8),
plot.subtitle = element_markdown(size=10),
panel.grid.minor = element_blank(),
axis.title.y = element_textbox_simple(
orientation = "left-rotated",
width = grid::unit(2.5, "in")
)+
) geom_point(data=umd, aes(x=`Yards/G`, y=`Points/G`), color="red") +
geom_text_repel(data=umd, aes(x=`Yards/G`, y=`Points/G`, label="Maryland 2018"))
One last advanced trick: Adding a text box explainer in the graphic. This should be used in somewhat rare circumstances – you don’t want to pollute your data space with lots of text. If your graphic needs so much explainer text, you should be asking yourself hard questions about if your chart is clearly telling a story.
To add a text box explainer, you need to add a geom_textbox
to your chart. The code below does that, and also adds a geom_point
to anchor the box to a spot.
ggplot() +
geom_point(data=offense, aes(x=`Yards/G`, y=`Points/G`), color="grey") +
geom_smooth(data=offense, aes(x=`Yards/G`, y=`Points/G`), method=lm, se=FALSE) +
geom_textbox(
aes(x=225,
y=50,
label="Dots above the blue line indicate offenses that scored more points than their yards per game would suggest they should.",
orientation = "upright",
hjust=0,
vjust=1), width = unit(2.8, "in")) +
geom_point(aes(x=225, y=50), size=2) +
labs(
x="Total yards per game",
y="Points per game",
title="Maryland's overperforming offense",
subtitle="The Terps' offense was the strength of the team, having to overcome a defense that often struggled to stop<br/>opponents. If you compare the offense to every other offense and how many points they score vs the number of<br/> yards they roll up, Maryland actually overperformed.",
caption="Source: NCAA | By Derek Willis") +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
axis.title = element_text(size = 8),
plot.subtitle = element_markdown(size=10),
panel.grid.minor = element_blank()
+
) geom_point(data=umd, aes(x=`Yards/G`, y=`Points/G`), color="red") +
geom_text_repel(data=umd, aes(x=`Yards/G`, y=`Points/G`, label="Maryland 2018"))