For you, it’s the beginning of college basketball season. For me it’s the beginning of “kenpom says…” season. The time of year when people on social media or legitimate media will dismiss some team’s ranking in my system by saying “even kenpom says…” and following that with something kenpom has definitely not said.
The most common accusations fall into two groups.
1. Preseason ratings have too much weight for Team X during the season.
2. Past seasons’ ratings have too much weight for Team X in the preseason ratings.
But kenpom will not say either of those things, because history tells us that including more data is more useful than not including it. We can prove this in two ways: one involves actual data, and the other is a theoretical approach using too much math.
Let’s start with the less-mathy approach. Florida Atlantic is the 37th team in the 64+ team era to be ranked in the preseason AP top ten after not appearing in the top 10 of any weekly AP poll for the previous five seasons. The previous 36 teams collectively finished in the final AP top ten a total of 10 times.
If this doesn’t sound like very many, it’s because it’s not. Based on teams’ historic rates of finishing in the final top ten given their preseason rank, we would have expected 16.8 of those 36 teams to appear in the final top ten. So teams that start in the top ten without a recent track record have under-performed.
And teams outside the traditional power conferences have performed even worse. There were nine such cases in this group and none of them ended up in the final top ten. (You can see the full list of teams in the appendix below.)
I didn’t really like cutting this off at the top ten, because FAU is tenth and every other team in the sample was ranked at or above them. However, if we include teams ranked in the preseason top 15, the conclusion doesn’t change. That increases the group to 67 teams, and just 14 of them finished in the top ten, compared to a historic expectation of 24.1. History matters.
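If you want to see the mechanics of that expectation, here’s a minimal sketch. The per-rank rates below are placeholders I made up for illustration, not the actual historical rates behind the 16.8 figure; the point is just that the expected count is the sum of each team’s probability of a top-ten finish given its preseason rank.

```python
# Rough sketch of the expectation math above. The per-rank rates here are
# hypothetical placeholders; the real calculation uses the actual historical
# rate at which teams with each preseason rank finish in the final top ten.
hypothetical_top10_rate = {
    2: 0.85, 5: 0.70, 6: 0.65, 7: 0.60, 8: 0.55, 9: 0.50, 10: 0.45,
}

# Preseason ranks of the 36 teams in the appendix below.
preseason_ranks = [10, 9, 8, 10, 9, 10, 8, 6, 8, 10, 7, 10, 9, 10, 7, 9, 8, 8,
                   10, 8, 10, 10, 5, 5, 8, 9, 7, 6, 9, 8, 10, 6, 7, 5, 8, 2]

# The expected number of final top-ten finishes is just the sum of each team's
# probability given its preseason rank. With the real historical rates
# (not shown here), this sum works out to the 16.8 quoted above.
expected = sum(hypothetical_top10_rate[r] for r in preseason_ranks)
print(f"Expected top-ten finishes (placeholder rates): {expected:.1f}")
```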
I don’t want to overstate the under-performance, though. Over half (34) of the 67 finished in the top 20. Many of these teams still had solid seasons. Some even had great ones. But teams with a lack of recent success were less likely to live up to their preseason rating. Previous seasons (beyond last season) really are useful in making predictions for the upcoming season.
This is because a single season is merely suggestive of a team’s true ability. There’s a lot of human variation from game to game, and while the good and bad breaks tend to cancel out over time, for some teams they don’t. Even by the end of a 30-ish game season, there are teams who have been affected by random chance in a way that significantly distorts the picture of who they are.
We actually have an idea of how much variation there is from game to game, and we can use that to figure out how much error there is in a team’s end-of-season rating. Please join me on an exciting, er, excruciating mathematical journey. (Most of you will want to skip ahead to the graphs. My apologies to those who don’t.)
The betting market is the best publicly available predictor of games, but it is only so good. People like to say “Vegas knows” when a game’s final score hits the point spread exactly, but the reality is that this rarely happens, and often a game’s final score is nowhere near the spread. Last season the final point spread had a root mean squared error (RMSE) of 11.3, if this site is to be believed.
If RMSE is foreign to you, the mean absolute error was 8.87, so the average miss on the spread was about nine points. Nine! The market knows as much as can be known, and double-digit misses on the final score are quite common. You can infer some important things from this, including the accuracy of a ratings system heavily based on point spreads.
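As a quick aside, those two numbers are consistent with each other: for roughly normal errors, the mean absolute error is about 0.8 times the RMSE. A one-liner to check (the normality assumption is mine):

```python
import math

rmse = 11.3  # last season's point spread RMSE, per the site referenced above
# For normally distributed errors, MAE = sqrt(2/pi) * RMSE ~= 0.8 * RMSE.
print(round(math.sqrt(2 / math.pi) * rmse, 2))  # ~9.0, close to the reported 8.87
```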
The general idea is this - if you assume an RMSE of 11 for each game[1], the chance that we get an accurate representation of a team after one game is exceptionally small. Let’s define an “accurate representation” as an absolute error of less than one point per 100 possessions in a team’s adjusted efficiency margin (AdjEM). Well, there’s a 7% chance that a single game will provide an accurate representation of a team’s ability.
With more games comes more confidence, but even after 10 games, you only have about a 22% chance of getting an accurate representation. And after 35 games - the length of a season for many teams - there’s a 39% chance of very good accuracy.
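Here’s a small sketch of where those percentages come from. The assumptions are mine, following the logic in footnote [1] below: a per-team, per-game error of about 7.7 points, which is roughly 11.3 points per 100 possessions if a typical game has around 68 possessions. Averaging n games shrinks that error by a factor of sqrt(n), and the chance of landing within one point falls out of the normal distribution:

```python
import math

SD_PER_GAME = 11.3  # assumed per-game error in AdjEM terms (points per 100 possessions)

def chance_within(points, n_games, sd_per_game=SD_PER_GAME):
    """Chance a team's observed efficiency margin after n_games is within
    `points` of its true value, assuming independent, roughly normal
    game-level errors."""
    sd_of_mean = sd_per_game / math.sqrt(n_games)
    # P(|error| < points) for a normal distribution = erf(points / (sd * sqrt(2)))
    return math.erf(points / (sd_of_mean * math.sqrt(2)))

for n in (1, 10, 35):
    print(n, round(chance_within(1, n), 2))
# Roughly 0.07, 0.22, and 0.40 -- in line with the 7%, 22%, and 39% figures above.
```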
You need a lot of games to be confident that a team’s AdjEM is within one point of its true value. Like a lot more than 35 games. The following graph shows the mean absolute error for a team’s rating by game number.
And the next graph displays what percentage of all teams’ ratings will have a mean absolute error of less than 1 (or 3) by game number.
At the end of a 35-game season, most teams’ calculated AdjEM will be within 3 points of the truth, but 12% (about 40 teams) will still be off by more than that. And that’s a lot! Even after a full season, we only know so much about the true ability of a team.
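Under the same assumptions as the sketch above, the curves in both graphs have a closed form, so you can reproduce their shape without any simulation: the expected absolute error of a team’s rating after n games, and the share of teams within 1 or 3 points.

```python
import math

SD_PER_GAME = 11.3  # same assumed per-game error as in the sketch above

def expected_abs_error(n_games):
    """Expected absolute error of a rating after n_games (normal errors)."""
    return math.sqrt(2 / math.pi) * SD_PER_GAME / math.sqrt(n_games)

def share_within(points, n_games):
    """Share of teams whose rating error is smaller than `points`."""
    sd_of_mean = SD_PER_GAME / math.sqrt(n_games)
    return math.erf(points / (sd_of_mean * math.sqrt(2)))

for n in (5, 15, 25, 35):
    print(n, round(expected_abs_error(n), 2),
          round(share_within(1, n), 2), round(share_within(3, n), 2))
# At n = 35, share_within(3, 35) is about 0.88: roughly 12% of teams are
# still off by more than three points, as noted above.
```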
Knowing that the errors in a single game are huge and that errors over the whole season can still be significant, we’d be foolish to assume the season-ending rating is a pure piece of information.[2] This is why preseason ratings carry weight well into the season and why it’s useful to use multiple previous seasons in the preseason ratings themselves.[3]
We can’t totally trust that the previous season was reality, but the last two or three seasons provide a more reliable estimate of where a program is. Going back four or five seasons is even better.[4] This is especially relevant to FAU, which was 17th last season but, with a similar roster, was 129th two seasons ago.
The truth is obviously closer to last season (and the most recent season gets more weight in my ratings), but ignoring seasons prior to that makes predictions worse. We can’t really know which teams’ previous seasons were more real than others, but as the exercise using the AP poll indicates, the likely candidates figure to be teams that had an outlier season relative to their recent history, with bonus points if it was also an outlier relative to their conference.
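For a concrete (and deliberately oversimplified) picture of what “more weight on the most recent season, but not all of it” looks like, here’s a toy blend of past seasons. The weights are made up for illustration; they are not the actual weights in my preseason ratings, which also consider other information.

```python
# Toy example only: blend a program's recent seasons into one preseason
# estimate, with the most recent season weighted the most. These weights are
# hypothetical and are NOT the actual formula behind the preseason ratings.
def blended_estimate(adj_em_by_season, weights=(0.50, 0.25, 0.15, 0.10)):
    """adj_em_by_season: most recent season first (e.g. AdjEM values)."""
    used = adj_em_by_season[:len(weights)]
    w = weights[:len(used)]
    return sum(v * wt for v, wt in zip(used, w)) / sum(w)

# A hypothetical FAU-like profile: one outlier season after several modest ones.
print(blended_estimate([20.0, 5.0, 3.0, 2.0]))  # 11.9 -- pulled well below 20
```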
Anyway, I think most people truly understand that, because it seems like the people who rank FAU the highest have to explain that they are doing so because the Owls “deserve” to be ranked there, rather than because they truly believe the Owls are one of the 5 or 10 best teams in the country.
First of all, God complex, anyone? Like, look at me, I’m an AP voter! I get to be the arbiter of what teams deserve! But it also kind of sets up FAU to be a perceived failure. Based on history, it’s going to be super difficult to live up to those expectations. Last season was amazing, defying the odds on many levels. It could happen again, but taking in all the information we have available, a repeat would also be amazing and odds-defying.
I’m a big fan of the wisdom of (intelligent) crowds, and the Massey composite of computer rankings[5] has FAU 21st as of this writing, while the H.U.M.A.N. poll has them 20th. FanDuel lists FAU with the 20th-best odds of winning a national title.
Against that consensus from people who aren’t worried about what a team might deserve, all but one AP voter had FAU 19th or better. It’s another indication that the AP voters do not have enough diversity of thought to start the season, and that they’re more willing to ignore history than the computers or the common kenpom subscriber. But kenpom would never say he endorses this way of thinking.
Appendix
Teams ranked in the AP preseason top 10 since the 1985 season without having appeared in the top 10 of any poll in the previous five seasons. (Teams from outside the top six conferences are bolded.)
| Season | Preseason rank | Team | Final rank |
|---|---|---|---|
| 1986 | 10 | Auburn | NR |
| 1987 | 9 | **Navy** | NR |
| 1988 | 8 | Missouri | NR |
| 1988 | 10 | **Wyoming** | 13 |
| 1990 | 9 | Arkansas | 7 |
| 1991 | 10 | Ohio State | 5 |
| 1993 | 8 | **Memphis** | NR |
| 1994 | 6 | California | 16 |
| 1994 | 8 | **Temple** | 12 |
| 1994 | 10 | Minnesota | 23 |
| 1995 | 7 | Maryland | 10 |
| 1995 | 10 | Florida | NR |
| 1996 | 9 | Mississippi St | 19 |
| 1998 | 10 | **Xavier** | 23 |
| 1999 | 7 | **Temple** | NR |
| 1999 | 9 | Tennessee | 20 |
| 2000 | 8 | Florida | 13 |
| 2001 | 8 | Illinois | 4 |
| 2001 | 10 | Seton Hall | NR |
| 2002 | 8 | Missouri | NR |
| 2002 | 10 | **St. Josephs** | NR |
| 2003 | 10 | **Xavier** | 12 |
| 2006 | 5 | Villanova | 3 |
| 2007 | 5 | LSU | NR |
| 2007 | 8 | Georgetown | 8 |
| 2009 | 9 | Notre Dame | NR |
| 2012 | 7 | Vanderbilt | 20 |
| 2013 | 6 | NC State | NR |
| 2014 | 9 | Oklahoma St | NR |
| 2016 | 8 | Oklahoma | 7 |
| 2018 | 10 | USC | NR |
| 2019 | 6 | Tennessee | 6 |
| 2019 | 7 | Nevada | 20 |
| 2021 | 5 | Iowa | 8 |
| 2021 | 8 | Illinois | 2 |
| 2022 | 2 | UCLA | 11 |
1. The betting line has an RMSE of 11.3, but it’s not perfect and adjusts to new information. So a betting line with perfect knowledge is going to have an RMSE of less than 11.3. Probably not much less, though. My guess is 11, and if we assume both teams are equally responsible for this error, then each team has an RMSE of about 7.7 (sqrt(11^2/2)). And that’s the value I used to compute the error in each team’s ratings based on the number of games played. Like I said, excruciating math.
2. This is why I avoid putting stock in “Team X is ranked Y since Date Z” factoids. If it were better to ignore a team’s early games, ratings systems would do that. But none do, because everyone who’s ever designed a ratings system finds out that in order to get the most accurate rating for a team, all the data matters. Every single game matters. Past seasons matter. Overtime matters. Games played after you got a lucky break in the tournament matter. It all matters. Some of it matters more than others, but it all matters.
3. This is what makes college hoops fun. With the season so short, there is still a lot of mystery entering the tournament.
4. I’ve found that going back more than five seasons does not help.
5. It also includes the AP and the USA Today coaches poll for some reason. Did you know the coaches poll still exists? Did you know that USA Today still exists?