Every year around this time, the 350+ Division I basketball teams wrap up their regular seasons and, save for the 32 conference champions who receive automatic bids, leave their fates in the hands of the aptly named Selection Committee. Later today, this committee will finish poring over each team to select the 36 at-large teams that complete the field of 68. It’s a supposedly painstaking process that takes the committee members many days and long nights to arrive at their selections.
It’s also completely unnecessary. Actually, worse than that. It’s detrimental to the process. I’m here to tell you why.
Resume vs Quality
Before I get to the specifics of the pros and cons of a committee of humans versus a computer selecting the NCAA Tournament teams, it is important to discuss the difference between a team’s resume/body of work/strength of record (a descriptive measure) and their true strength/team quality (a predictive measure). I’ve discussed this in the past, and in the years since, this distinction has thankfully started to garner some more discussion and understanding. But it is of the utmost importance to the Selection question that we understand and accept this distinction between the two types of measures.
Here is the brief, simplistic summary:
- If you want to answer a question such as “who would win between Team A and Team B if they played on a neutral court?” then you want a predictive measure, such as Ken Pomeroy’s team ratings at KenPom.com or ESPN’s BPI. As an example of how predictive metrics work, a team’s scoring margin is often a better predictor of its future win-loss record than its actual win-loss record is.
- If you want to answer a question such as “which team’s record against their strength of schedule was most impressive?” then you want a descriptive measure, such as my Achievement Ratings or ESPN’s Strength of Record. These usually take the form of either a “WAB” (Wins Above Baseline) metric or a “difficulty to achieve” metric that measures how hard it is to achieve a team’s record against its schedule. This is the type of measurement we should be using to determine which teams most deserve a tournament selection. It should be a reward for what a team has earned on the court.
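To make the “difficulty to achieve” idea concrete, here is a minimal Python sketch. It assumes each game is an independent win/loss event and that we already know, for every game on a team’s schedule, the probability that a bubble-quality team would win it. The function names are hypothetical, and this is not how ESPN’s Strength of Record or my Achievement Ratings are actually computed; it simply calculates how likely a bubble team would be to match or beat the team’s win total against the same schedule.

```python
from __future__ import annotations

from typing import Sequence


def win_distribution(win_probs: Sequence[float]) -> list[float]:
    """Return dist where dist[k] is the probability of exactly k wins.

    Assumes games are independent, so the total number of wins follows a
    Poisson-binomial distribution, built up here one game at a time.
    """
    dist = [1.0]  # before any games are played: 0 wins with certainty
    for p in win_probs:
        nxt = [0.0] * (len(dist) + 1)
        for wins, prob in enumerate(dist):
            nxt[wins] += prob * (1.0 - p)  # bubble team loses this game
            nxt[wins + 1] += prob * p      # bubble team wins this game
        dist = nxt
    return dist


def difficulty_to_achieve(win_probs: Sequence[float], actual_wins: int) -> float:
    """Probability that a bubble-quality team wins at least `actual_wins`
    games against this schedule. Smaller values mean a harder record."""
    dist = win_distribution(win_probs)
    return sum(dist[actual_wins:])


# Hypothetical 4-game schedule of coin-flip games, team went 4-0.
print(difficulty_to_achieve([0.5] * 4, 4))  # 0.0625
```

A record that a bubble team would match only rarely is harder to achieve, so lower values indicate a more impressive resume.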
I often point to professional team sports, where nobody bats an eye when a “worse” team with a better record makes the playoffs over a better team who ended with a worse record. There really should be no debate about this question.
Objective vs Subjective
Once we settle on a resume-style, objective system for selection, the need to involve humans in the process almost completely disappears. Generally, the trade-off with humans versus computers is that humans can assess more complex factors, but at the cost of introducing bias, inconsistency, and opacity into the process. In a predictive system, there would be more use for things like the always-controversial “eye test” to identify teams that may be better or worse than their statistics, or to assess complex issues like how much injuries affected a team. In a resume system, the main factor is a team’s win-loss record, which is simple and objective and therefore requires no additional human interpretation.
The one area where humans may be of some use is in determining the difficulty of an opponent. In a resume system, teams’ win-loss records are graded against the strength of their schedules, which combines how good their opponents were and where the games were played. The latter is easy to account for, but humans could potentially help assess the former. Purely objective team strength metrics could be used, but humans could theoretically add value by assessing injuries or other factors that affect the level of competition. However, any such adjustments should be made systematically to limit biases and inconsistencies as much as possible.
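As a rough illustration of how opponent quality and game site could be combined, the sketch below turns an opponent’s rating and the venue into the chance that a bubble-quality team would win that game. It assumes ratings expressed as points better than an average team and a normally distributed final margin; the home-court value, the margin spread, and the bubble rating are hypothetical numbers chosen for illustration, not calibrated estimates. Any systematic adjustment for injuries would enter through `opponent_rating`.

```python
import math

# Hypothetical parameters, chosen only for illustration:
HOME_COURT_POINTS = 3.5  # assumed home-court advantage, in points
MARGIN_SD = 11.0         # assumed standard deviation of a game's final margin

# Site adjustment from the perspective of the team whose schedule is graded.
SITE_ADJUSTMENT = {
    "home": HOME_COURT_POINTS,
    "neutral": 0.0,
    "away": -HOME_COURT_POINTS,
}


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bubble_win_prob(bubble_rating: float, opponent_rating: float, site: str) -> float:
    """Chance that a bubble-quality team beats this opponent at this site.

    Ratings are assumed to be points better than an average team (e.g. an
    adjusted efficiency margin); any objective rating could be swapped in.
    """
    expected_margin = bubble_rating - opponent_rating + SITE_ADJUSTMENT[site]
    return normal_cdf(expected_margin / MARGIN_SD)


# Same opponent, three venues: the road game is the hardest for a bubble team.
for site in ("home", "neutral", "away"):
    print(site, round(bubble_win_prob(10.0, 20.0, site), 3))
```

Summing these per-game probabilities over a schedule gives the bubble team’s expected wins, which is the baseline a resume metric grades a team’s actual record against.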
Benefits of removing humans from the selection process
The previous section touched on the general trade-offs between humans and computers, but let’s talk specifically about why removing humans from the selection process would be almost exclusively beneficial.
- Eliminate bias and inconsistency: When humans try to assess teams for selection, they will unavoidably introduce biases and inconsistencies, whether consciously or subconsciously. They may, for whatever reason, prefer Team A to Team B and, say, overvalue Team A’s good wins and discount its bad losses while doing the opposite for Team B.
- Avoid miscalibration, double-counting, and shortcuts that reduce accuracy: Assessing something as complex as the schedules and performances of dozens of teams across an entire season is really hard. Out of necessity, humans take shortcuts to simplify the picture; breaking a team’s wins and losses down by Quadrant is one example. However, this comes at a cost: less accuracy. Grouping opponents by quadrant means that beating Duke at Cameron Indoor counts the same as beating the #50 team on a neutral site. Clearly those should not count the same. It’s also impossible for humans to know how much to weight all the different factors. How should 5 wins against teams ranked in the 100s compare to a win against a top 10 team plus 4 wins against teams ranked outside the top 250? Given that a computer can do this consistently, perfectly, and automatically, all we are doing is dragging down the accuracy of our process by simplifying things so that humans can try to replicate what a computer can do with no need for simplification.
- Standardize the selection criteria: Even worse than trying but failing to apply the selection criteria correctly is the fact that the guidelines for selecting teams are so laughably vague that committee members likely vary wildly in the actual criteria they are trying to apply, which compounds the inconsistency even further.
- Make the system simple, clear, transparent, and deterministic: When the NFL gets to the final week of its regular season, every team knows where it stands. Teams know their record, their competitors’ records, and what the outcome of each scenario will be. In the NCAA selection world, teams are left in the dark: they don’t know where they currently stand, what future outcomes will do, or even exactly what the committee values. They have some idea, of course, but there’s a large amount of randomness involved. An objective system would solve all of these problems and make it clear and transparent where each team stands and why. All of the relevant factors and weights would be determined and known ahead of time. No longer would teams be confused about whether they should prioritize wins or schedule strength, whether good wins matter more than avoiding bad losses, or whether their late-season surge matters more than, less than, or the same as their early-season struggles. Throughout the season, anyone could produce up-to-date rankings, and teams would know exactly where they stand and why they rank where they do.
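The quadrant shortcut is easy to express in code, which also makes its coarseness easy to see. This Python sketch assumes the NCAA’s NET quadrant cutoffs as they are commonly published (Quadrant 1 covers home games against the top 30, neutral-site games against the top 50, and road games against the top 75); treat the thresholds as an assumption and check them against the current rules before relying on them.

```python
# Highest opponent rank that still falls in Quadrants 1, 2, and 3 at each site;
# anything beyond the last cutoff is Quadrant 4. Assumed NET thresholds.
QUADRANT_CUTOFFS = {
    "home": (30, 75, 160),
    "neutral": (50, 100, 200),
    "away": (75, 135, 240),
}


def quadrant(opponent_rank: int, site: str) -> int:
    """Return the quadrant (1-4) for a game, given the opponent's rank and the
    site from the perspective of the team whose resume is being evaluated."""
    for q, cutoff in enumerate(QUADRANT_CUTOFFS[site], start=1):
        if opponent_rank <= cutoff:
            return q
    return 4


print(quadrant(1, "away"), quadrant(50, "neutral"), quadrant(51, "neutral"))  # 1 1 2
```

Under this scheme a road win over the #1 team and a neutral-site win over #50 are both simply “Quadrant 1,” while a neutral-site win over #51 drops to Quadrant 2, even though the last two games are nearly identical in difficulty.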
Aside from the possible use of humans to aid the schedule strength calculation mentioned earlier, there is virtually no need for human involvement during the season. Factors and weights should be decided ahead of time so that everyone knows what the rules are, the rules are applied in a consistent and predictable manner, and the results are transparent and available at all times.
Proposed objective system
It’s pretty clear what the basics of an objective, resume-based system should be: start with a team’s win-loss record and then adjust it for the difficulty of its schedule. Such a system is not only better and more accurate than the current one, it is also extremely simple and understandable, significantly more so than the current system.
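Because the inputs and rules would be fixed in advance, the selection step itself reduces to a sort. The sketch below is a minimal illustration, assuming a precomputed resume score for every team (WAB or a strength-of-record value, higher is better) and a known set of automatic qualifiers; the function name and the alphabetical tie-breaker are hypothetical placeholders for whatever rule would actually be announced.

```python
from __future__ import annotations

from typing import AbstractSet, Mapping

AT_LARGE_BIDS = 36  # 68-team field minus 32 automatic bids


def select_at_large(
    resume_scores: Mapping[str, float],
    auto_bids: AbstractSet[str],
    n_bids: int = AT_LARGE_BIDS,
) -> list[str]:
    """Pick at-large teams purely by a pre-announced resume score.

    Teams that already hold an automatic bid are skipped. Ties are broken
    alphabetically (a hypothetical rule) so the result is fully deterministic.
    """
    candidates = [team for team in resume_scores if team not in auto_bids]
    candidates.sort(key=lambda team: (-resume_scores[team], team))
    return candidates[:n_bids]


# Tiny hypothetical example: 4 teams, 1 automatic qualifier, 2 at-large bids.
scores = {"A": 5.0, "B": 3.0, "C": 3.0, "D": -1.0}
print(select_at_large(scores, {"A"}, n_bids=2))  # ['B', 'C']
```

Anyone with the same scores can rerun this and get the same field, and the highest-scoring candidate not returned is, by definition, the first team out.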
If desired, the system could include some adjustments for things like conference championships, good wins, or recent play. We can debate whether they should be included or not, but the main point is that they would be decided ahead of time, known to every team, and applied consistently.
To me, the perfect system is a “Strength of Record” metric that evaluates a team’s record against its schedule and determines the implied quality of the team from that, such as Seth Burns’ Parcells metric. For example, going 28-2 against a mid-major schedule is indicative of the same quality of team as going 24-6 against an ACC team’s schedule (for illustrative purposes only). A similar system that is simpler but slightly less accurate is a Wins Above Baseline system, where you determine what record a bubble team would have against each team’s schedule and then simply compare that to the team’s actual record. If a bubble team would go 18-12 against Duke’s schedule and Duke went 27-3, Duke is +9 WAB. Your team missed out on the tournament? Just look at its WAB and you can see exactly how many wins short of inclusion it was.
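Once the per-game bubble win probabilities are known, Wins Above Baseline reduces to a subtraction. The sketch below assumes those probabilities already account for opponent strength and game site; the function name and the sample schedule are hypothetical, constructed only to reproduce the 18-12 bubble expectation from the Duke example.

```python
from typing import Sequence


def wins_above_bubble(actual_wins: int, bubble_win_probs: Sequence[float]) -> float:
    """Actual wins minus the wins a bubble-quality team would be expected to
    get against the same schedule (one win probability per game)."""
    expected_bubble_wins = sum(bubble_win_probs)
    return actual_wins - expected_bubble_wins


# Hypothetical 30-game schedule on which a bubble team expects 18 wins (18-12).
duke_schedule = [0.5] * 18 + [0.75] * 12
print(wins_above_bubble(27, duke_schedule))  # 9.0
```

A team with a WAB of -1.5 at selection time would, under this scheme, have needed roughly two more wins against the same schedule to reach the bubble line.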
It’s time to stop using the eye test and predictive measures. It’s time to eliminate the ridiculously and unnecessarily complicated “team sheets”. It’s time to move to an objective, resume-based system to stop the committee from, consciously or not, punishing certain teams to the benefit of others. It’s time to abolish the Selection Committee.