# Heatmap of the hottest final digits of historical football scores.

I heard about a friendly office pool based on the outcome of the Super Important Internationally Televised Football Championship happening this weekend. Rather than betting on the actual winner of the game, everyone picks a handful of combination of two single digits, like (2, 1) or (9, 3). Nobody can have pick the same combinations, but if I choose (2, 1), then (1, 2) is still up for grabs. Each digit corresponds to the last digit in each team’s final score, so if I choose (2, 1) and the final score is 42-21, I win.

It’s reasonable to think that some combinations are more likely than others, but since I’m not very familiar with football, I didn’t have a good sense of what combinations would make the best bets. Naturally, I had to throw a bunch of data into R and make a graph! Below you can see the probabilities of all possible combinations. The data is from 14,417 professional football scores found on http://www.pro-football-reference.com.

Not all of the scores are from NFL games — there are some scores from AFPA, the AAFC, the AFL in here, too. Maybe this will add some funny outcomes in there if they had different scoring systems. Still, the results look reasonable at first glance. Scores ending in 3, 4, and 7 seem to be the best bets. That (7, 0) is so hot right now. (7, 0). R code is below.

library(stringr) library(ggplot2) data <- read.table("nflscores.csv", header=T, sep=",") # I grabbed all scores from http://www.pro-football-reference.com, ## This is gonna get ugly data <- cbind(data, t(matrix(as.numeric(unlist(str_split(data$Score, pattern="-"))), nrow=2, ncol=997))) names(data) <- c("rank", "score", "ptsw", "ptsl", "pttot", "ptdif", "count", "lastdate", "score1", "score2") data$d1 <- data$score1 %% 10 # get last digits data$d2 <- data$score2 %% 10 scores.df <- expand.grid(lastdigit1=c(0:9), # create a grid of all combinations lastdigit2=c(0:9)) scores.df$count <- NA ## I warned you! for(i in 0:9){ # loop through for(j in 0:9){ scores.df$count[which(scores.df$lastdigit1==i & scores.df$lastdigit2==j)] <- sum(data$count[which(data$d1==i & data$d2==j)]) } } scores.df$prob <- scores.df$count / sum(scores.df$count) # prob = freq of a given outcome/total number of games svg("FinalDigitsPlot.svg", width=450, height=390) score.plot <- ggplot(data=scores.df, aes(x=lastdigit1, y=lastdigit2, colour=prob)) score.plot + geom_point(shape=15, size=15) + # rather than use heatmap(), I plotted squares colored by prob theme_set(theme_bw()) + scale_colour_gradient(name="Probability", low="grey70", high="red") + scale_x_continuous(limits=c(0, 9), breaks=0:9) + scale_y_continuous(limits=c(0, 9), breaks=0:9) + labs(x="Last digit (winner)", y="Last digit (loser)") + ggtitle("Final Digits from 14,417 Pro Football Games") dev.off()

Created by Pretty R at inside-R.org