BCS Series: Review of Billingsley ratings

Next up in my review of the computer ranking systems in the BCS is Richard Billingsley. He gives a much more detailed explanation of his ratings on his website, so I will pull out the pertinent parts of his explanation and comment on them. Let’s start with his summary.

 I guess in a sense, my rankings are not only about who the “best team” is, but also about who is the “most deserving” team.

This is a decent start. As I have touched on before, I believe that postseason play (whether it be the BCS, the NCAA Tournament, or the NFL playoffs) should be a reward for the most deserving teams as opposed to the “best” teams. However, people get into trouble when they try to satisfy both types of ratings: predictive and descriptive. By straddling the line, ratings suffer from trying to do too many things. Focusing on answering just one question will provide the best, purest, most useful answer.

Starting Position– This is one of the most hotly debated subjects in rankings. Starting position DOES have an impact in rankings, especially in human polls where it is a HUGE advantage, and it does make a slight difference in my rankings…. I believe having a starting position is best, but starting everyone equal is not logical to me. We know through observation of past seasons that some teams are stronger than others. No disrespect to the Vandals, but in 2007 Idaho was not as strong a team as Texas. If we know this in advance, to a high degree of accuracy, then ranking Texas and Idaho equal is not only illogical, it is unfair to Texas and completely (in my mind) skews any hope of an accurate strength of schedule…If a team finishes #10 in 2007, they start #10 in 2008.

We’ve run into our first issue. You absolutely cannot use anything but the current year’s information to rate teams for the BCS. Can you imagine if the NFL gave the Packers an extra win for winning the Super Bowl last year? We are trying to reward the teams that have the best season; letting in information from anywhere else is simply incorrect. Yes, 2008 Texas was likely better than 2008 Idaho, but we don’t need to guess at that, especially using only last year’s rating. Auburn was the best team in the nation last year, but they lost Heisman Trophy winner and #1 NFL draft pick Cam Newton. Clearly they aren’t the same team. We can let teams prove on the field that they are better. As we’ll see later, Billingsley has a reason for doing this, but that reason is also flawed, and in this case two wrongs do not make a right.

Accumulating Points– My system is the only one I am aware of that uses an “accumulating” value system. It was designed this way to emphasize a team’s most recent game as the AP and Coaches do. As a result, a team only gets credit for playing an opponent ONE TIME. Whatever happens to that opponent from that point forward is “water under the bridge.” The greatest example I could ever use to defend this philosophy came just last year (2007) in a scenario involving Oregon. After beating #4 USC and #5 Arizona State on successive weekends the Ducks rose to #3 in the Billingsley Report (#3 AP, #3 Coaches). After losing QB Dennis Dixon Oregon fell to Arizona, UCLA and Oregon State. Every time Oregon lost, USC and Arizona State suffered in some computer rankings. I don’t agree with that methodology. The Trojans and Sun Devils played an Oregon team that was playing some of the best football in the nation during those games. They should not have to suffer because of an injury that happened to Oregon after the fact.

This is one of the reasons why Billingsley uses a starting position for each team. His point system uses only the information available before each game is played, so without a starting position he would give a team that beat Texas in Week 1 the same credit as a team that beat Idaho, which is clearly wrong. However, there is no need to restrict yourself to the information available at the time of the game. By Week 5, we know that Texas is better than Idaho and can credit every team that beat Texas accordingly.
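To make the difference concrete, here is a minimal Python sketch that scores the same two Week 1 wins twice: once with only the ratings known at kickoff (everyone starting equal) and once with ratings updated by Week 5. The rating numbers and the `win_credit` rule (credit proportional to the opponent’s rating) are hypothetical illustrations, not Billingsley’s actual point values.

```python
# Hypothetical ratings on a 0-100 scale; not Billingsley's actual numbers.
AT_KICKOFF = {"Texas": 50.0, "Idaho": 50.0}  # no starting position: everyone equal
BY_WEEK_5 = {"Texas": 80.0, "Idaho": 20.0}   # after several weeks of results


def win_credit(opponent: str, ratings: dict) -> float:
    """Hypothetical rule: credit for a win is proportional to the opponent's rating."""
    return ratings[opponent] / 100.0


# Week 1 wins scored only with what was known before the games were played.
snapshot = {opp: win_credit(opp, AT_KICKOFF) for opp in ("Texas", "Idaho")}

# The same Week 1 wins re-scored with what we know by Week 5 (hindsight).
hindsight = {opp: win_credit(opp, BY_WEEK_5) for opp in ("Texas", "Idaho")}

print("snapshot: ", snapshot)
print("hindsight:", hindsight)
```

With the snapshot ratings, beating Texas and beating Idaho are worth the same; re-scoring with the Week 5 ratings separates them without any reference to last season.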

Billingsley argues the case of Oregon, which climbed to #3 before losing Dennis Dixon and clearly becoming a worse team. In a perfect world, I agree that if we knew exactly how good Oregon was each week, we should credit their opponents accordingly. However, we don’t know that, and we certainly cannot estimate it in any objective way. Billingsley’s method creates more issues than it solves. First, in the hand-picked example he gives, the injury occurred and Oregon got worse. But what about a team that gets a player back from injury or suspension. Then the opposite is true, and we’d get a better idea of that team’s strength by looking at their performance after that game, not before. Second, we’re simply throwing out information. For example, in evaluating a Week 3 game, why throw out all games the teams play the rest of the season? Certainly that information is useful in determining the quality of each team. And finally, regarding the starting position, which is a better guess at a team’s strength at the start of the season: last year’s performance, or its performance over the rest of this season? I’d much rather estimate the strength of Texas and Idaho based on their performance in the current season.
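The “throwing out information” point can be sketched in a few lines of Python. This hypothetical `strength_estimate` function estimates a team’s strength in one game from its other games, using only earlier games (a pre-game snapshot), only later games, or every other game in the season. Using average point margin as the strength measure, and the margins themselves, are assumptions for illustration only.

```python
def strength_estimate(margins: dict, target_week: int, mode: str = "season") -> float:
    """Estimate a team's strength in `target_week` from its *other* games.

    margins: hypothetical data mapping week -> point margin in that game.
    mode: "before" (earlier games only, like a pre-game snapshot),
          "after" (later games only), or "season" (every other game).
    Strength here is simply the average margin of the games used.
    """
    if mode not in ("before", "after", "season"):
        raise ValueError(f"unknown mode: {mode!r}")
    used = [
        margin
        for week, margin in margins.items()
        if week != target_week  # never use the game being evaluated
        and not (mode == "before" and week > target_week)
        and not (mode == "after" and week < target_week)
    ]
    if not used:
        raise ValueError("no other games available for this mode")
    return sum(used) / len(used)


# Hypothetical team that gets its starter back from injury before Week 4.
margins = {1: -7.0, 2: -3.0, 3: -10.0, 4: 14.0, 5: 21.0, 6: 17.0}
for mode in ("before", "after", "season"):
    print(mode, round(strength_estimate(margins, target_week=4, mode=mode), 2))
```

For the Week 4 opponent, the before-only estimate describes the team without its starter, while the after-only estimate reflects the healthier team that actually took the field; the full-season estimate uses every game except the one being evaluated.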

 If a team is playing a #89 team, they cannot earn more points than a team with an equal record playing a #50 opponent, or a #10 opponent etc.

This I can agree with for the most part. However, I would drop the “than a team with an equal record” part and just say “than any team”. Preferably, the constraint would simply be that teams earn more points for beating better opponents.

If a team has a bye week, their rating does not change, with two exceptions. A special rule is in place (in the head to head section) that allows an undefeated team to ALWAYS be ranked ahead of every opponent they have beaten, and allows any team experiencing a bye week to remain ahead of a team they had just beaten the week before.

I don’t like special rules, but if there is one place for them it is in college football for undefeated teams. There may be a better way to go about this, but for now I’ll accept the first part of this. I’ll come back to the second part about bye weeks later.

Strength of opponent– This is another great topic of discussion. The value placed on the strength of an opponent is (as it should be) the core of most computer rankings. My system is unique in its calculation of strength of schedule as most models use wins and losses and I do not. I use an opponent’s RANK and RATING instead.

Agreed, with a caveat: his rank and rating are themselves determined by wins, losses, and strength of schedule, so wins and losses still enter his system indirectly, and I think most systems attempt to incorporate strength of schedule as well. Still, this is a good method. In my achievement ratings, I prefer to go one step further and determine the strength of the opponent using the best predictive measure of a team’s strength: their “true” strength. This may require more than wins, losses, and schedule.

Instituting deductions for losses– Remaining undefeated is paramount in my system. A team with no losses has, in effect, a “ticket to the top ten” as long as they are playing a reasonable schedule. With no losses a team receives “full earnings” of their “available opponent value”, but each loss creates a percentage of deduction.

Billingsley goes on to give an example of a 1-loss team getting less credit for beating a #30 team than an undefeated team gets for beating a #35 team. Why? I see no reason for this. The 1-loss team has already been penalized for its loss; there is no reason to keep penalizing it, especially in a system like his that accumulates points. As I understand it, his system penalizes a team much more for losing early than for losing late, because a team that loses early carries the deduction into every game it plays afterward. This seems patently unfair and unnecessary.
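The early-versus-late asymmetry is easy to see in a toy accumulating system. In this minimal Python sketch, each loss cuts the points from every later win by a fixed percentage. The 10% `LOSS_DEDUCTION` and the equal opponent values are hypothetical, since the exact deduction schedule isn’t given in the excerpt above.

```python
LOSS_DEDUCTION = 0.10  # hypothetical: each loss trims every later win by 10%


def season_points(results: list) -> float:
    """Accumulate points over a season, game by game in order.

    results: list of (opponent_value, won) tuples, one per game.
    A win earns the opponent's value, reduced by LOSS_DEDUCTION for each
    loss already on the record. A loss earns nothing but counts against
    every win that follows it.
    """
    points, losses = 0.0, 0
    for opponent_value, won in results:
        if won:
            points += opponent_value * max(0.0, 1.0 - LOSS_DEDUCTION * losses)
        else:
            losses += 1
    return points


schedule = [10.0] * 10  # ten equally strong opponents
lose_early = [(value, week != 0) for week, value in enumerate(schedule)]  # loses game 1
lose_late = [(value, week != 9) for week, value in enumerate(schedule)]   # loses game 10

print("loses first game:", season_points(lose_early))
print("loses last game: ", season_points(lose_late))
```

Both teams finish 9-1 against identical opponents, but the team that lost its opener ends with 81 points versus 90, because its single loss is deducted from nine wins instead of none.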

Site of the game– I realize some computers do not take the site of the game into consideration, but I believe it is important. The reward, once again is slight, but it is still a consideration. I believe that playing at Tennessee in front of 106,000 fans screaming Rocky Top is more difficult than playing in front of 15,000 at Rice stadium. There are some who say any form of measuring the value of the site of a game is biased, but I disagree. My scale is based on information available to the general public through the NCAA and is evaluated by stadium size and average attendance over a 5 year period.

Again, I see this as unnecessary and subjective. Billingsley may feel that playing in front of Tennessee’s crowd is more difficult than playing in front of Rice’s, but I’d like to see his proof. I agree that whether a game is played at home or away should be included, but trying to account for the specific venue is extremely tricky and dangerous. Without seeing exactly how he does this, it is impossible to determine whether his process is reasonable, but my guess is that it does more harm than good. Another consideration: if Tennessee’s opponent receives more credit for playing in a tougher venue, does Tennessee receive less credit because it is getting more help from its home crowd? If not, that is inconsistent.
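Here is a minimal Python sketch of what a consistent venue adjustment could look like, assuming a system that works with point margins (Billingsley’s may not) and a hypothetical per-venue `venue_edge` measured in points. Whatever is credited to the visiting team is taken away from the home team, so the adjustment is zero-sum.

```python
def venue_adjusted_margins(home_margin: float, venue_edge: float) -> tuple:
    """Return (home_adjusted, away_adjusted) point margins for one game.

    home_margin: points by which the home team won (negative if it lost).
    venue_edge: hypothetical estimate of how many points the venue is worth
                to the home team.
    The visitor gains exactly the credit the home team gives up.
    """
    home_adjusted = home_margin - venue_edge
    away_adjusted = -home_margin + venue_edge
    return home_adjusted, away_adjusted


# Hypothetical: home team wins by 3 at a venue assumed to be worth 4 points.
home, away = venue_adjusted_margins(home_margin=3.0, venue_edge=4.0)
print(home, away)
```

A 3-point home win at a venue assumed to be worth 4 points becomes a 1-point adjusted loss for the home team and a 1-point adjusted win for the visitor. Giving the visitor extra credit without the matching deduction for the home team is the inconsistency described above.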

Instituting head to head rules– The most powerful part of the program states that if certain criteria is met in regards to wins, losses, ratings and rankings that the winner of a game will be ranked ahead of the loser in the next poll. This is guaranteed for one week only.

This seems very ad hoc. What “criteria” have to be met? Whatever they are, this rule will certainly have many unintended consequences. Beating the #1 team will vault a team high up the rankings. Billingsley likely included the “criteria” to limit this, but in doing so he has created some odd breaking points. For instance, let’s say a team has to be ranked within 50 spots of its opponent to get this reward (I am just making this up for argument’s sake; I don’t know what his criteria are). Say the normal points a team would receive for the win would move it up 10 spots on average. However, being within 50 spots moves it ahead of the team it beat. So if team #52 beats #1, it will move up to, say, #42. But if #51 beats #1, it will skyrocket into the top 10 (say #1 drops to #10 with a loss; then #51 would have to move to #9 at worst!). Introducing arbitrary rules into a rating system will usually cause issues.
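The breaking point can be reproduced with a toy version of the rule. This Python sketch uses the made-up numbers from the paragraph above (a 50-spot window, a normal 10-spot rise, and the beaten #1 falling to #10); none of these are Billingsley’s actual criteria, and `new_rank` is a hypothetical helper.

```python
THRESHOLD = 50       # hypothetical "within 50 spots" criterion
NORMAL_JUMP = 10     # hypothetical average rise from the points for a win
LOSER_NEW_RANK = 10  # hypothetical: the beaten #1 falls to #10


def new_rank(winner_rank: int, loser_rank: int) -> int:
    """Winner's next rank under a hypothetical head-to-head rule."""
    normal = max(1, winner_rank - NORMAL_JUMP)
    if winner_rank - loser_rank <= THRESHOLD:
        # Rule fires: the winner must sit ahead of the team it just beat.
        return min(normal, LOSER_NEW_RANK - 1)
    return normal


for rank in (50, 51, 52, 53):
    print(f"#{rank} beats #1 -> #{new_rank(rank, 1)}")
```

One spot of difference before the game (#51 versus #52) turns into a 33-spot gap afterward (#9 versus #42).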

Coming back to the bye-week rule from above, another arbitrary rule: a team that has a bye week gets the benefit of being rated ahead of the team it beat for two weeks instead of one. Even if the beaten team plays and wins that week, it could be forced to drop below the idle team that beat it the week before.

In spite of any issues in the power rating, the system still holds an average 76% of higher ranked teams beating lower ranked opponents over its 37 year history.

This goes back to one of my first points. This system should be concerned with producing the best ratings to determine the most deserving teams. The percentage of higher-rated teams that beat lower-rated teams measures how predictive the rankings are, so it should be of no concern here and has no bearing on the validity of the system.


Richard Billingsley’s ratings start with a good premise: determining the most deserving teams in college football to play in the national title game. However, his ratings suffer from some fatal flaws.

  1. Using a starting position for teams based on last season is simply incorrect and should not be allowed by the BCS in any of its computer systems. Teams should be rated solely on what they have done in the current season.
  2. Crediting or debiting teams based solely on the strength of their opponent going into the matchup is not ideal. While it may capture some cases where a team gets worse due to injury, it will miss cases where a team gets players back from injury. In addition, information from future games is simply thrown out when it would be useful in better evaluating the strength of the teams. Rating systems have the benefit of hindsight; they should use it.
  3. Penalizing teams repeatedly for the same loss is unfair to teams that lose early.
  4. His adjustment for the site of the game is subjective and may not be weighted appropriately.
  5. Ad hoc rules for undefeated teams, bye weeks, and head-to-head wins are unnecessary and arbitrary. They create unintended consequences that hurt the overall integrity of the system.

At the end of this set of reviews, I plan on looking at how similar the rankings are between systems. Without that, it’s impossible to tell exactly how much some of these issues affect the ratings. However, I believe that the flaws I have discussed are too serious to allow the system to be included in the BCS computer ratings.

This entry was posted in BCS Series, College Football, Football, review, team evaluation.

6 Responses to BCS Series: Review of Billingsley ratings

  1. Daryl says:

    This rating is in the BCS? Wow.

  2. Ryan L. says:

    Cool analysis Monte. Pretty convincing argument against the Billingsley ratings.

  3. Patrick says:

    Thanks for the post, I appreciate the insight. I think your analysis in regard to using last season’s results for this season is spot on. I’ll have to disagree with you about SOS, though:
    “First, in the hand-picked example he gives, the injury occurred and Oregon got worse. But what about a team that gets a player back from injury or suspension. Then the opposite is true, and we’d get a better idea of that team’s strength by looking at their performance after that game, not before. Second, we’re simply throwing out information.”

    While I agree that throwing out information is certainly wrong, I think the important point here is that Oregon’s opponents only get credit for beating the team that they played. I think it works both ways. In the example you gave, Oregon got worse, and this hurt USC’s and ASU’s rankings later in the year even though they played a stronger team, while later opponents played a worse one. In the opposite example, let’s say Oregon instead had someone return from injury and got better; USC and ASU should still only be evaluated for the team it faced and thus their ranking shouldn’t increase.

    As you allude to, computer systems have a terrible time incorporating changes in team dynamics (e.g., injuries, coaching changes) mid-season. Do you try to adjust a team’s ranking based on the change, or do you simply assume the team is the same (even though we know it is not)? I think this is the important distinction between the AP and Coaches polls, which try to predict the value of these adjustments, and the computer systems, which should provide objective (albeit somewhat generalized) predictions.

  4. Monte says:

    Patrick, I may be misinterpreting your argument so correct me if I’m wrong.

    You say that “USC and ASU should still only be evaluated for the team it faced and thus their ranking shouldn’t increase”. My question is, how do you know what Oregon team they faced? If you aren’t given injury information (which would have to be subjectively incorporated anyway), the only objective way to deduce a team’s ability is by their performance in other games they played.

    First, is recency important? In other words, is a game one week away a better indicator of that team’s strength than one two weeks away? I would guess yes, but not by much. But even if it is, I see no reason why the week BEFORE is any more indicative than the week AFTER. And certainly 5 weeks before wouldn’t be more predictive than 1 week after, right?

    Second, you now have to decide whether incorporating recency is worth throwing out information. Even if you include all games, if you weight them in any way, you are essentially throwing out information. If recency exists as a factor, its benefit still has to outweigh the cost of throwing out information. If you weight one game half as much as another, you have just reduced your sample size from 2 to 1.5.

    To really answer this question, I think the best way is to see which approach best predicts each team’s performance in a given game. What do you think would predict that best? Only their previous games? Only their ensuing games? The entire season? The entire season with games weighted by weeks away (i.e., the games the week before and the week after are both 1 week away, etc.)? I would guess the last two options would far outperform either of the first two. That answer is important and certainly warrants some research; perhaps I’ll take a look.

  5. NewsToTom says:

    Personally, I prefer Billingsley’s only explanation. Really, here’s how he describes his system:

    “I wrote the program myself and it’s not written using fancy math equations, just simple addition, subtraction, multiplication and division. It’s the RULES that make the system unique and the rules are MY RULES. Rules that make sense from a fan’s perspective. Rules that come from 32 years of experience in which I researched the ENTIRE 132 years of College Football.”

    Thankfully, for better or for worse, the BCS is primarily interested in a matchup between the top two teams in the AP Poll and gets tweaked whenever that’s not the BCS Championship Game, so the computer rankings don’t really matter.

  6. NewsToTom says:

    And that should’ve been Billingsley’s “old” explanation, not “only.” Yeesh.
