Looking at the worst games of the 2019 college football season

I’m going to take an in-depth look at the worst games of the 2019 college football season.

First, I need to load the tidyverse library and then read in my data: a log of every game from the 2019 college football season.

library(tidyverse)

badlogs <- read_csv("~/Desktop/SPMC 350 Files/homework/data/badfootballlogs19.csv")

First, I need to fix the Result column. Right now each value packs the outcome and the score together, like W (24-20), and I want separate columns called Outcome and Score. The separate function handles that by splitting on the space.

badlogs %>% 
  separate(Result, into=c("Outcome", "Score"), sep=" ")
## # A tibble: 1,662 × 52
##     Game Date   HomeAway Opponent Outcome Score PassingCmp PassingAtt PassingPct
##    <dbl> <chr>  <chr>    <chr>    <chr>   <chr>      <dbl>      <dbl>      <dbl>
##  1     1 8/24/… N        Miami (… W       (24-…         17         27       63  
##  2     2 9/7/19 <NA>     Tenness… W       (45-…         30         36       83.3
##  3     3 9/14/… @        Kentucky W       (29-…         21         30       70  
##  4     4 9/21/… <NA>     Tenness… W       (34-…         24         34       70.6
##  5     5 9/28/… <NA>     Towson   W       (38-…         24         28       85.7
##  6     6 10/5/… <NA>     Auburn   W       (24-…         25         39       64.1
##  7     7 10/12… @        Louisia… L       (28-…         24         44       54.5
##  8     8 10/19… @        South C… W       (38-…         21         33       63.6
##  9     9 11/2/… N        Georgia  L       (17-…         21         33       63.6
## 10    10 11/9/… <NA>     Vanderb… W       (56-…         27         40       67.5
## # … with 1,652 more rows, and 43 more variables: PassingYds <dbl>,
## #   PassingTD <dbl>, RushingAtt <dbl>, RushingYds <dbl>, RushingAvg <dbl>,
## #   RushingTD <dbl>, OffensivePlays <dbl>, OffensiveYards <dbl>,
## #   OffenseAvg <dbl>, FirstDownPass <dbl>, FirstDownRush <dbl>,
## #   FirstDownPen <dbl>, FirstDownTotal <dbl>, Penalties <dbl>,
## #   PenaltyYds <dbl>, Fumbles <dbl>, Interceptions <dbl>, TotalTurnovers <dbl>,
## #   TeamFull <chr>, TeamURL <chr>, DefPassingCmp <dbl>, DefPassingAtt <dbl>, …
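To see what separate is doing on its own, here is a minimal sketch on a made-up three-row table. The toy data frame, its team names, and its scores are assumptions for illustration, not rows from the real file.

```r
library(tidyr)

# Hypothetical rows that mimic the "W (24-20)" shape of the Result column
toy <- data.frame(
  Team   = c("A", "B", "C"),
  Result = c("W (24-20)", "L (28-42)", "W (45-0)"),
  stringsAsFactors = FALSE
)

# Split on the single space: the letter goes to Outcome, the rest to Score.
# The original Result column is dropped.
toy_split <- separate(toy, Result, into = c("Outcome", "Score"), sep = " ")
toy_split
```

The Score column still carries its parentheses at this point, which is why the cleanup step is needed next.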

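Now that I have the Result column separated out into Outcome and Score, I want to get rid of the parentheses that surround the scores. This is where I’ll use the gsub function, once for each parenthesis. Setting fixed = TRUE tells gsub to treat the parenthesis as a literal character instead of as part of a regular expression.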

badlogs %>% 
  separate(Result, into=c("Outcome", "Score"), sep=" ") %>%
  mutate(Score = gsub(")", "", Score, fixed = TRUE)) %>%
  mutate(Score = gsub("(", "", Score, fixed = TRUE))
## # A tibble: 1,662 × 52
##     Game Date   HomeAway Opponent Outcome Score PassingCmp PassingAtt PassingPct
##    <dbl> <chr>  <chr>    <chr>    <chr>   <chr>      <dbl>      <dbl>      <dbl>
##  1     1 8/24/… N        Miami (… W       24-20         17         27       63  
##  2     2 9/7/19 <NA>     Tenness… W       45-0          30         36       83.3
##  3     3 9/14/… @        Kentucky W       29-21         21         30       70  
##  4     4 9/21/… <NA>     Tenness… W       34-3          24         34       70.6
##  5     5 9/28/… <NA>     Towson   W       38-0          24         28       85.7
##  6     6 10/5/… <NA>     Auburn   W       24-13         25         39       64.1
##  7     7 10/12… @        Louisia… L       28-42         24         44       54.5
##  8     8 10/19… @        South C… W       38-27         21         33       63.6
##  9     9 11/2/… N        Georgia  L       17-24         21         33       63.6
## 10    10 11/9/… <NA>     Vanderb… W       56-0          27         40       67.5
## # … with 1,652 more rows, and 43 more variables: PassingYds <dbl>,
## #   PassingTD <dbl>, RushingAtt <dbl>, RushingYds <dbl>, RushingAvg <dbl>,
## #   RushingTD <dbl>, OffensivePlays <dbl>, OffensiveYards <dbl>,
## #   OffenseAvg <dbl>, FirstDownPass <dbl>, FirstDownRush <dbl>,
## #   FirstDownPen <dbl>, FirstDownTotal <dbl>, Penalties <dbl>,
## #   PenaltyYds <dbl>, Fumbles <dbl>, Interceptions <dbl>, TotalTurnovers <dbl>,
## #   TeamFull <chr>, TeamURL <chr>, DefPassingCmp <dbl>, DefPassingAtt <dbl>, …
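The two gsub calls could probably be folded into one. As a sketch of that alternative (not what I ran above), a regular-expression character class matches either parenthesis in a single pass. The scores vector here is made up for the example.

```r
# Hypothetical score strings in the same "(24-20)" shape
scores <- c("(24-20)", "(45-0)", "(28-42)")

# "[()]" is a regular-expression character class matching either parenthesis,
# so fixed = TRUE has to be left off for this version
clean <- gsub("[()]", "", scores)
clean
```

Either approach should give the same result. The two-call version with fixed = TRUE is easier to read if regular expressions are unfamiliar.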

Looks good. Now, I want to split the Score column into two columns named TeamScore and OpponentScore, again using separate, this time splitting on the hyphen. Both new columns come out as text, so I need to mutate TeamScore and OpponentScore into numeric values. I’ll save the result as a new dataframe called notsobadlogs.

badlogs %>% 
  separate(Result, into=c("Outcome", "Score"), sep=" ") %>%
  mutate(Score = gsub(")", "", Score, fixed = TRUE)) %>%
  mutate(Score = gsub("(", "", Score, fixed = TRUE)) %>%  
  separate(Score, into=c("TeamScore", "OpponentScore"), sep="-") %>%
  mutate(TeamScore = as.numeric(TeamScore), OpponentScore = as.numeric(OpponentScore)) -> notsobadlogs
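As an aside, separate has a convert argument that can do the type conversion in the same step. This is a sketch of that option on made-up scores, not the code used for this post. Note that it produces whole-number (integer) columns, where as.numeric gives doubles.

```r
library(tidyr)

# Hypothetical cleaned scores, already stripped of parentheses
toy <- data.frame(Score = c("24-20", "45-0", "28-42"), stringsAsFactors = FALSE)

# convert = TRUE asks separate to guess the type of each new column,
# so the text "24" becomes the number 24
toy_scores <- separate(toy, Score, into = c("TeamScore", "OpponentScore"),
                       sep = "-", convert = TRUE)
str(toy_scores)
```

The explicit mutate with as.numeric is a little longer, but it makes the conversion visible in the code.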

Now, for the exciting part: I’m going to look at the worst games in college football. I need to mutate a new field called Differential, which is TeamScore minus OpponentScore.

notsobadlogs %>%
  mutate(Differential = TeamScore - OpponentScore) -> notsobadlogs

Next, I’m creating a dataframe called worstgames where the differential is greater than 65 points.

worstgames <- notsobadlogs %>%
  filter(Differential > 65)
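One thing worth noticing about this filter: Differential is signed, so it only picks up a blowout from the winner’s side. Here is a small sketch on made-up scores, assuming the logs hold one row per team per game, so the same game can show up twice. The team names and scores are hypothetical.

```r
library(dplyr)

# Made-up scores: the same hypothetical 70-0 game logged from both sidelines,
# plus one close game
toy <- data.frame(
  Team          = c("Winner", "Loser", "Other"),
  TeamScore     = c(70, 0, 24),
  OpponentScore = c(0, 70, 20),
  stringsAsFactors = FALSE
)

toy <- mutate(toy, Differential = TeamScore - OpponentScore)

# Only the winning side's row has a differential above 65;
# the losing side's row sits at -70 and is filtered out
blowouts <- filter(toy, Differential > 65)
blowouts
```

That behavior is handy here, since each lopsided game is kept once and not double counted.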

Finally, let’s create a scatterplot with TeamScore on the X axis and OpponentScore on the Y axis. Every game goes on the chart in grey, and then I’m going to add our worstgames data frame on top in red.

ggplot() + geom_point(data = notsobadlogs, aes(x = TeamScore, y = OpponentScore), color = "grey") + 
  geom_point(data = worstgames, aes(x = TeamScore, y = OpponentScore), color = "red")
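Another way to get the same grey-and-red effect, sketched here on made-up scores, is to flag the blowouts in a column and map color to that flag in a single layer. The toy data frame and the Blowout column name are assumptions for this example only.

```r
library(ggplot2)

# Hypothetical scores; two of the four games clear the 65-point cutoff
toy <- data.frame(
  TeamScore     = c(70, 24, 31, 66),
  OpponentScore = c(0, 20, 28, 0)
)
toy$Blowout <- (toy$TeamScore - toy$OpponentScore) > 65

# One layer, with color driven by the logical flag
p <- ggplot(toy, aes(x = TeamScore, y = OpponentScore, color = Blowout)) +
  geom_point() +
  scale_color_manual(values = c("FALSE" = "grey", "TRUE" = "red"))
p
```

I stuck with two layers above because the second geom_point is always drawn on top, so the red dots can’t get buried under grey ones, and no legend is added.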

To circle our worstgames, I need a library called ggalt. After loading it, I’ll use its geom_encircle function.

library(ggalt)
ggplot() + geom_point(data = notsobadlogs, aes(x = TeamScore, y = OpponentScore), color = "grey") + 
  geom_point(data = worstgames, aes(x = TeamScore, y = OpponentScore), color = "red") + 
  geom_encircle(data = worstgames, aes(x = TeamScore, y = OpponentScore), s_shape=0.1, expand=0.045, colour="black")

Now, let’s add a title, subtitle, labels, a caption, and other small details to make this chart pop.

ggplot() + 
  geom_point(data = notsobadlogs, aes(x = TeamScore, y = OpponentScore), color = "grey") + 
  geom_point(data = worstgames, aes(x = TeamScore, y = OpponentScore), color = "red") + 
  geom_encircle(data = worstgames, aes(x = TeamScore, y = OpponentScore), s_shape=0.1, expand=0.045, colour="black") +
  labs(title="The worst games in the 2019 college football season", subtitle="These four games showcased some very dominant offensive football and some very lousy defense.", caption="Source: Sports-Reference | By Ethan Peterson")  + 
  theme_minimal() + 
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    axis.title = element_text(size = 8), 
    plot.subtitle = element_text(size=10),
    plot.caption = element_text(face = "bold.italic"),
    panel.grid.minor = element_blank()
    )