Sports Data Analysis and Visualization

Code, data and visuals for storytellers

Author

Matt Waite & Derek Willis

Published

August 22, 2024

1 Throwing cold water on hot takes

Why do teams struggle? There are lots of potential reasons: injuries, athletes in the wrong position, poor execution. Or it could be external factors: well-prepared opponents, the weather, the altitude or, of course, the refs.

You could turn the question around: why do teams succeed? Again, there are plenty of possibilities that get tossed around on talk radio, on the sports pages and across social media. A lot of hot takes.

The more fundamental question that this course will empower you to answer is this: what do teams and athletes do? Using data, you’ll learn to ask questions and visualize the answers, ranging across sports and scenarios. What did the 2021-22 Maryland men’s lacrosse team do well en route to the national championship? How has the transfer portal (and additional eligibility) changed the nature of programs? In football, do penalties have any relationship on scoring?

To get into these and other questions, we’ll use a lot of different tools and techniques, but this class rests on three pillars:

  1. Simple, easy to understand statistics …
  2. … produced using simple code …
  3. … visualized simply to reveal new and interesting things in sports.

Do you need to be a math whiz to read this book? No. I’m not one either. What we’re going to look at is pretty basic, but that’s also why it’s so powerful.

Do you need to be a computer science major to write code? Nope. I’m not one of those either. But anyone can think logically, and write simple code that is repeatable and replicable.

Do you need to be an artist to create compelling visuals? I think you see where this is going. No. I can barely draw stick figures, but I’ve been paid to make graphics in my career. With a little graphic design know how, you can create publication worthy graphics with code.

1.1 Requirements and Conventions

This book is all in the R statistical language. To follow along, you’ll do the following:

  1. Install the R language on your computer. Go to the R Project website, click download R and select a mirror closest to your location. Then download the version for your computer.

  2. Install RStudio Desktop. The free version is great.

Going forward, you’ll see passages like this:

install.packages("tidyverse")

Don’t do it now, but that is code that you’ll need to run in your RStudio. When you see that, you’ll know what to do: click the green arrow.

1.2 About this book

This book is the collection of class materials for the Fall 2024 JOUR479X course in the Philip Merrill College of Journalism at the University of Maryland. There’s some things you should know about it:

  • It is free for students.
  • The topics will remain the same but the text is going to be constantly tinkered with.
  • What is the work of the author is copyright Derek Willis 2024 & Matt Waite 2019-2023.
  • The text is Attribution-NonCommercial-ShareAlike 4.0 International Creative Commons licensed. That means you can share it and change it, but only if you share your changes with the same license and it cannot be used for commercial purposes. I’m not making money on this so you can’t either.
  • As such, the whole book – authored in Quarto – is open sourced on Github. Pull requests welcomed!