The Case for Objective Selection

Every year around this time, the 350+ Division I basketball teams wrap up their regular seasons and–save for the 32 conference champions who receive automatic bids–leave their fates in the hands of the aptly named Selection Committee. Later today, this committee will finish poring over each team to select the 36 at-large teams to complete the field of 68. It’s a supposedly painstaking process that takes many days and long nights for the committee members to arrive at their selections.

It’s also completely unnecessary. Actually, worse than that. It’s detrimental to the process. I’m here to tell you why.

Resume vs Quality

Before I get to the specifics of the pros and cons of a committee of humans versus a computer selecting the NCAA Tournament teams, it is important to discuss the difference between a team’s resume/body of work/strength of record (a descriptive measure) and their true strength/team quality (a predictive measure). I’ve discussed this in the past, and in the years since, this distinction has thankfully started to garner some more discussion and understanding. But it is of the utmost importance to the Selection question that we understand and accept this distinction between the two types of measures.

Here is the brief, simplistic summary:

  • If you want to answer a question such as “who would win between Team A and Team B if they played on a neutral court?” then you want a measure of predictiveness, such as Ken Pomeroy’s team ratings at KenPom.com or ESPN’s BPI. A classic example of how predictive metrics work: a team’s scoring margin is often more predictive of its future win-loss record than its actual win-loss record is.
  • If you want to answer a question such as “which team’s record against their strength of schedule was most impressive?” then you want a descriptive measure, such as my Achievement Ratings or ESPN’s Strength of Record. These usually take the form of either a “WAB” (Wins Above Baseline) metric or a “difficulty to achieve” metric that measures how hard it is to achieve a team’s record against their schedule. This is the type of measurement that we should be using to determine which teams are most deserving of a selection to the tournament. It should be a reward for what a team has earned on the court.

I often point to professional team sports, where nobody bats an eye when a “worse” team with a better record makes the playoffs over a better team who ended with a worse record. There really should be no debate about this question.

Objective vs Subjective

Once we settle on a resume-style, objective system for selection, the necessity to involve humans in the process goes almost completely away. Generally, the trade-off between humans and computers is that humans can assess more complex factors, but at the expense of introducing bias, inconsistency, and opacity into the process. In a predictive system, there would be more use for things like the always-controversial “eye test” to identify teams that may be better or worse than their statistics suggest, or to assess complex issues like how much injuries affected a team. In a resume system, the main factor is a team’s win-loss record, which is simple and objective and therefore requires no additional human interpretation.

The one area where humans may be of some use is in determining the difficulty of an opponent. In a resume system, teams’ win-loss records are graded against the strength of their schedule, which is a combination of how good their opponents were and where the games were played. The latter is easy to account for, but the former could potentially use humans to augment. Purely objective team strength metrics could be used, but humans could theoretically add value by assessing injuries or other factors that affect the level of competition. However, doing so should be done in a systematic manner to ensure that biases and inconsistencies are limited as much as possible.
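To make that concrete, here is a minimal sketch of how a single game’s difficulty could be graded from the opponent’s strength and the venue. This is illustrative only, not anyone’s actual formula: the point-based ratings, the three-point home-court value, and the logistic constant are all assumptions.

    import math

    HOME_COURT_POINTS = 3.0  # assumed average value of playing at home, in points
    SCALE = 11.0             # assumed constant converting point margins to win probability

    def baseline_win_prob(opp_rating, venue):
        """Probability that a bubble-quality 'baseline' team (rated 0) beats an
        opponent rated opp_rating points above average, given where the game is
        played from the baseline team's perspective ('home', 'away', 'neutral')."""
        venue_adj = {"home": HOME_COURT_POINTS, "away": -HOME_COURT_POINTS, "neutral": 0.0}[venue]
        expected_margin = venue_adj - opp_rating
        return 1.0 / (1.0 + math.exp(-expected_margin / SCALE))

Whether opp_rating comes straight from a ratings system or gets nudged by humans for injuries and roster changes, everything downstream of this per-game number can remain fully automated.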

Benefits of removing humans from the selection process

The previous section touched on the general trade-offs between humans and computers, but let’s talk specifically about why removing humans from the selection process would be almost exclusively beneficial.

  • Eliminate bias and inconsistency: When humans try to assess teams for selection, it’s unavoidable that they will introduce biases and inconsistencies, whether consciously or subconsciously. They may for some reason prefer Team A to Team B and, say, overvalue Team A’s good wins while discounting its bad losses, but do the opposite for Team B.
  • Avoid miscalibration, double-counting, and shortcuts that reduce accuracy: Assessing something as complex as the schedule and performance of dozens of teams across an entire season is really hard. Out of necessity, humans take shortcuts to help simplify the picture–breaking a team’s wins and losses down by Quadrant is an example. However, this comes with a cost, which is less accuracy. Grouping opponents by quadrant means that beating Duke at Cameron Indoor counts the same as beating the #50 team on a neutral site. Clearly those should not count the same (see the sketch after this list). It’s also impossible for humans to know how much to weight all the different factors–how should 5 wins against teams ranked in the 100s compare to a win against a top 10 team and 4 other sub-250 wins? Given that we can do this consistently, perfectly, and automatically without humans, all we are doing is adding a drag on the accuracy of our process by simplifying things for humans so they can try to replicate what a computer can do without any need for simplification.
  • Standardize the selection criteria: Even worse than trying but failing to apply the selection criteria correctly is the fact that the guidelines for selecting teams are so laughably vague that committee members likely vary wildly in the actual criteria they are trying to apply, which just exacerbates the inconsistency exponentially.
  • Make the system simple, clear, transparent, and deterministic: When the NFL gets to the final week of their regular season, every team knows where they stand. They know their record, their competitors’ records, and what the outcome of each scenario will be. In the NCAA selection world, teams are left in the dark–they don’t know where they currently stand, what future outcomes will do, or even exactly what the committee values. They have some idea, of course, but there’s a large amount of randomness involved. An objective system would solve all of these problems and make it clear and transparent where each team stands and why. All of the relevant factors and weights are determined and known ahead of time. No longer will teams be confused about whether they should prioritize wins or schedule strength, whether good wins matter more than avoiding bad losses, or whether their late-season surge matters more, less, or the same as their early-season struggles. Throughout the season, up-to-date rankings could be produced by anyone, and teams would know exactly where they stand and why they rank where they do.
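For a sense of how much information the quadrant shortcut throws away, here is the Duke example in numbers, using the same illustrative win-probability model sketched above (the ratings are made up):

    import math

    # Two hypothetical "Quadrant 1" wins for a bubble-quality team:
    #   a road win at the #1 team (rated, say, +28 points above average)
    #   a neutral-court win over the #50 team (rated, say, +10)
    p_win_at_no1     = 1 / (1 + math.exp((28 + 3) / 11))  # ~0.06
    p_win_neutral_50 = 1 / (1 + math.exp(10 / 11))        # ~0.29

Both wins land in the same quadrant, yet one is roughly five times harder to earn than the other. A per-game measure keeps that distinction; a quadrant count erases it.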

Besides what I alluded to earlier about using humans to aid in the schedule strength calculation, there is virtually no need for human involvement during the season. Factors and weights should be decided ahead of time so that everyone knows what the rules are, they are applied in a consistent and predictable manner, and the results are transparent and available at all times.

Proposed objective system

It’s pretty clear what the basics of an objective, resume-based system should consist of. Start with a team’s win-loss record and then adjust it for the difficulty of their schedule. Not only is this more accurate than what is currently used, it is also far simpler and more understandable.

If desired, the system could include some adjustments for things like conference championships, good wins, or recent play. We can debate whether they should be included or not, but the main point is that they would be decided ahead of time, known to every team, and applied consistently.

To me, the perfect system is a “Strength of Record” metric that evaluates a team’s record against its schedule and determines the implied quality of the team from that, such as Seth Burns’ Parcells metric. For example, going 28-2 against a mid-major schedule is indicative of the same quality of team as going 24-6 against an ACC team’s schedule (for illustrative purposes only). A similar system that is simpler but slightly less accurate is a Wins Above Baseline system, where you determine what record a bubble team would have against each team’s schedule, and then simply compare that to the team’s actual record. If a bubble team would go 18-12 against Duke’s schedule, and Duke went 27-3, then Duke is +9 WAB. Your team missed out on the tournament? Well, just look at their WAB and you can see exactly how many wins short of inclusion they were.
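Here is what the WAB version looks like as code, again as a sketch rather than a spec. The per-game probabilities would come from an agreed-upon model (something like the baseline_win_prob sketch earlier); the 30-game schedule and the flat 0.6 probabilities below are placeholders chosen only to reproduce the 18-win baseline from the example above.

    def wins_above_baseline(baseline_game_probs, actual_wins):
        """WAB: a team's actual wins minus the wins a bubble-quality baseline
        team would be expected to collect against the exact same schedule."""
        expected_wins = sum(baseline_game_probs)
        return actual_wins - expected_wins

    # Placeholder schedule: the baseline team projects to 18 wins in 30 games,
    # and the team in question went 27-3.
    schedule_probs = [0.6] * 30
    print(wins_above_baseline(schedule_probs, 27))  # -> 9.0, i.e. +9 WAB

A Strength of Record metric would instead ask what quality of team most plausibly produces that record against that schedule, but it starts from the same per-game difficulty inputs.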

It’s time to stop using the eye test and predictive measures. It’s time to eliminate the ridiculously and unnecessarily complicated “team sheets”. It’s time to move to an objective, resume-based system to stop the committee from, consciously or not, punishing certain teams to the benefit of others. It’s time to abolish the Selection Committee.


3 Responses to The Case for Objective Selection

  1. GW says:

    WAB/SOR metrics should be explained to the committee and used by them in their evaluation of teams*, but these aren’t infallible numbers, as there is no “objective” metric that can avoid human bias/judgment. You are merely replacing the judgment of the committee with the judgment of someone’s computer model. The same epistemic gap is present regardless of how mathematically complicated you make the ranking system. Not only do computer rankings have to decide how to properly incorporate margin of victory, injuries, a team’s improvement, roster issues, game control, and home court advantage into their results; we can’t truly know when we “have it right.”

    We’ve conducted this experiment before in football. The BCS system ranked teams based on a predetermined model, which inevitably produced years of controversy and led to the formula being tweaked on multiple occasions. Computer models were used in order to make the process more accurate and reduce bias, yet quite often they too would fail to do what we were promised. This went on for nearly two decades until college football realized it should do it like basketball, and thus a committee was created for its new playoff system. There’s always going to be controversy; that’s the nature of attempting to decide between teams of almost identical strength who play different schedules. It comes down to marginal differences that aren’t universally agreed upon. It’s unavoidable.

    Let’s pretend the committee read your proposal and secretly agreed to select/seed the field based on this “objective” resume metric. When it came time to make the bracket, a committee member might inform the others that Seton Hall had earned a 10 seed since it had the 39th best resume according to ESPN BPI’s Strength of Record. The committee members smile at each other, knowing that the bias and subjectivity have now been eliminated. They have accomplished what their predecessors were unable to do, which is fairly select the field of 68. A feeling of warmth comes over the room…at least for a few seconds. And then, glancing down at his computer, another committee member informs the group that the resume metric he was looking at had Seton Hall at #50 (in WAB on Bart Torvik), and thus Seton Hall should miss the tournament by a few slots. Silence takes over the room for an uncomfortable amount of time.

    That highlights the hole in this scheme. Even “objective” ranking systems don’t fully agree with one another. You are still getting controversy, except now we’re arguing about which computer model should be used to pick the resume.

    Now for most teams there will be general agreement between systems (within 2-3 places of each other) that might only disagree by a seed line at most. But then again, the committee is also mostly aligned not only with WAB/SOR metrics but also with the bracketologists’ consensus brackets. I believe this year the consensus bracket got 67/68 teams correct (it had TCU in instead of Belmont). The seed lines were also fairly accurate; I only saw one team that was seeded more than 2 lines from what the consensus predicted (Iowa).

    In short, the committee does a fairly good job. It is still necessary, and will likely always be necessary in a sport with 350+ teams playing vastly different schedules to fill a field of 68.

    *For this reason I think UNC Greensboro should have made it in this year. They have a fairly comfortable WAB margin (+1.9) as well as an SOR that would estimate them to be a 9 seed. While the committee shouldn’t just take that seed and plug them in, for the aforementioned reasons, WAB/SOR/Parcells should be used as a sanity check. Perhaps if the committee was more aware of this metric (and perhaps they are, who knows), it would be enough to nudge their first-four-out status into a last-four-in tournament berth. At the same time, I feel very strongly that WAB favors teams who play weaker schedules, because most computer models assume normal distributions that don’t quite capture reality at the tails (the worst teams are likely worse than computer models show, and the best teams are likely better than computer models show). If this is true, then UNC G’s WAB is likely closer to 0 than to 2.

  2. Monte says:

    I think you are making 4 points:

    1) A computer model is still subjective just like a human.

    True, but that’s not really the point here. The point is that the committee has varying definitions that are unknown and likely applied with bias and inconsistency. An objective model would have subjectivity in its creation, but FROM THAT POINT FORWARD, would not have any. At that point, the criteria are known and transparent to everyone. This is infinitely better.

    2) Not all “resume” metrics agree with each other.

    Not a big deal; we just have to settle on one and go with it. It doesn’t matter if it’s “imperfect” as long as it’s objective and consistent.

    3) Objective systems have been tried before and don’t work.

    I’d argue, in the case of the BCS, it was a flaw in implementation. College football would also be better off with an objective system and not a committee. Also, professional sports have forever determined playoff seeding with objective systems (usually “winning percentage”). College sports can’t do that due to varying schedule lengths and competition, but it just means you need to add a little extra calculation like Strength of Record metrics do.

    4) The committee ends up close enough to a good answer so there’s no need to replace them.

    The committee doesn’t add anything positive. The argument that they aren’t hugely negative does not mean they should be kept.
