Introducing STeloR and the MMA-DS data-based ranking system
After 100+ hours of programming and checking/rechecking, STeloR’s full rollout is done!
Let's start off with the math. STeloR is built on the concept of a traditional Elo algorithm. With the basic format set up, I got to work adjusting and developing an MMA-based system that would accurately rate fighter contributions.
A quick overview of how an Elo system works. Elo algorithms are traditionally used in zero-sum situations, i.e. where one side has to win and the other has to lose. A starting score is first assigned to player A and player B; in this example, we'll use 800. If the two players have equal scores coming into the matchup, each has a 50% probability of victory. If player A wins, we subtract that probability (.5) from the outcome (1), which gives us 1 - .5 = .5. We then multiply that number by a weighting factor called the K factor. For this example we'll use K = 20, which gives us 20 * .5 = 10. Player A’s score would then increase from 800 to 810 after that victory, and player B, who lost, would see theirs change by (0 - .5) * 20 = -10, leaving them with a score of 790 at the end of the game. At a high level, Elo algorithms estimate the probability of an event occurring and then reward or punish players based on how what actually happened compares to that likelihood.
So let’s convert this to MMA. At the highest level, someone has to win a fight. We could therefore build an Elo algorithm on fight results, since they pass the zero-sum requirement, but it would lack context, and few sports are as context-dependent as MMA. The solution is to break a fight apart into its smallest possible components. With our current data, a round is the smallest piece we can look at.
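The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the STeloR code itself: the function names are hypothetical, and the logistic curve with a 400-point scale is the conventional Elo choice, assumed here because the write-up does not say which curve STeloR uses.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under the conventional logistic Elo curve.

    The 400-point scale is the usual Elo convention, assumed here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 20.0):
    """Return the new (rating_a, rating_b) after one zero-sum result."""
    p_a = expected_score(rating_a, rating_b)
    outcome_a = 1.0 if a_won else 0.0
    delta = k * (outcome_a - p_a)  # what A gains is exactly what B loses
    return rating_a + delta, rating_b - delta


new_a, new_b = update(800, 800, a_won=True)
print(new_a, new_b)  # 810.0 790.0
```

With equal ratings the expected score is .5, so the winner gains K * .5 = 10 and the loser gives up the same amount, matching the 810 and 790 in the example.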
So now that we have that set, we need to figure out how we are rating fighters. I believe there are six true outcomes for a round.
Winning a round
Losing a round
Knocking out your opponent
Getting knocked out
Submitting your opponent
Getting submitted
Next, we must convert that into an Elo format. Winning a round is zero-sum but options 3 to 6 are not. There doesn’t have to be a knockout or submission in a round, so how do we account for that? My argument: set the odds to the probabilistic likelihood. What does that mean? We can count the number of rounds in heavyweight history that ended with a knockout and divide it by the total number of rounds, which gives us the baseline probability that any given heavyweight round ends in a knockout. If I adjust the starting scores to reflect this average rate, our rankings start to mimic reality to a much greater extent.
That leaves us with five STeloR context scores:
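Here is one way the base-rate idea could look in code. Treat it as a sketch under assumptions: the round counts are made-up placeholders rather than real heavyweight data, the function names are hypothetical, and converting a base rate into a starting rating gap via the logistic Elo curve is my guess at one workable approach, not necessarily how STeloR sets its starting scores.

```python
import math


def base_rate(finish_rounds: int, total_rounds: int) -> float:
    """Share of rounds that ended with the finish in question."""
    if total_rounds <= 0:
        raise ValueError("total_rounds must be positive")
    return finish_rounds / total_rounds


def rating_gap_for_probability(p: float, scale: float = 400.0) -> float:
    """Rating gap (attacker minus defender) at which a logistic Elo curve
    predicts probability p. The 400-point scale is the usual convention,
    assumed here."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return scale * math.log10(p / (1.0 - p))


# Placeholder counts for illustration only, not real UFC data.
ko_rate = base_rate(finish_rounds=150, total_rounds=1000)
gap = rating_gap_for_probability(ko_rate)
print(ko_rate, gap)
```

Because a knockout is much less likely than 50/50, the gap comes out negative: the average KO score has to start well below the average KOd score for the expected probability to match the historical rate.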
PTS (winning rounds)
KO (knockout likelihood)
KOd (knockout defense)
SUB (submission likelihood)
SUBd (submission defense)
So now we have our starting place, but there is more context that needs to be added in. A striker fighting another striker should have a much lower submission probability than a heavy grappler fighting a striker. Fighters should be rewarded and punished fairly, based on the context-dependent probability of each outcome. This introduces 16 context-specific situations and their corresponding probabilities, giving each fighter 16 context-specific STeloR scores. Those 16 situations are then split by weight class to account for the competition level at the fighter's specific weight class.
For every round in UFC history, I matched the two fighters up using the above logic, but I ran into a problem. Fighters change weight classes, so we can’t just set them at their starting weight class and let them go. We need to add and subtract context as we go.
My solution to that is what I called the lego method (that was how I imagined it in my head). Instead of setting each fighter's starting scores to the average of their first weight class, we set them to 0 for all five metrics. When a fighter has their first round, we add their starting score (0) to the average score for the weight class in the fight-specific context (e.g. striker vs striker) for each of the five scores. This gives us the formula PTS = starting PTS + average PTS.
Let's go back to the previous example where fighters A and B both faced off with a starting score of 800 and ended with scores of 810 and 790 respectively. What I did instead was store each one's STeloR PTS score as the difference between starting and ending, i.e. +10 and -10. In the next round, we add those stored scores to the average for the new context and come out with the correct starting score while accounting for changes in weight class. This also has the added benefit of allowing us to compare Francis Ngannou's knockout power with Conor McGregor's. When we subtract the context out of their actual rankings, we are able to see how these fighters stack up on the five traits on a pound-for-pound basis. Going back and forth between contexts was a huge step in the development of this ranking system. From there, we rank fighters by their peak and current STeloR totals after summing all 16 context gains.
So now what? We have our PERFECT MATHEMATICAL RANKING SYSTEM, right? No. I have identified multiple issues that I am working to resolve, and an updated STeloR ranking system will account for them. I want to go point by point to explain each problem and why I decided not to change anything for the time being.
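The lego method can be sketched roughly as follows. Everything here is illustrative: the class and function names are hypothetical, the context averages are placeholder numbers, and the sketch keeps a single offset per metric instead of the 16 context-specific scores so the add-and-subtract idea stays visible.

```python
from dataclasses import dataclass, field

METRICS = ("PTS", "KO", "KOd", "SUB", "SUBd")

# Placeholder averages keyed by (weight class, context); not real STeloR values.
CONTEXT_AVERAGES = {
    ("heavyweight", "striker_vs_striker"): {"PTS": 800.0},
    ("light_heavyweight", "striker_vs_striker"): {"PTS": 750.0},
}


@dataclass
class Fighter:
    name: str
    # Every fighter starts at 0 on all five metrics.
    offsets: dict = field(default_factory=lambda: {m: 0.0 for m in METRICS})


def effective_rating(fighter, weight_class, context, metric):
    """Stored offset plus the context average: PTS = starting PTS + average PTS."""
    return fighter.offsets[metric] + CONTEXT_AVERAGES[(weight_class, context)][metric]


def record_pts_round(winner, loser, weight_class, context, k=20.0):
    """Update only the stored offsets, so the context can be swapped later."""
    r_w = effective_rating(winner, weight_class, context, "PTS")
    r_l = effective_rating(loser, weight_class, context, "PTS")
    p_w = 1.0 / (1.0 + 10 ** ((r_l - r_w) / 400.0))  # conventional Elo curve, assumed
    delta = k * (1.0 - p_w)
    winner.offsets["PTS"] += delta
    loser.offsets["PTS"] -= delta


a, b = Fighter("A"), Fighter("B")
record_pts_round(a, b, "heavyweight", "striker_vs_striker")
print(a.offsets["PTS"], b.offsets["PTS"])  # 10.0 -10.0
print(effective_rating(a, "light_heavyweight", "striker_vs_striker", "PTS"))  # 760.0
```

Because only the offsets are stored, moving fighter A to a different weight class just means adding the same +10 to a different average, and comparing offsets directly gives the pound-for-pound view.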
The Jose Aldo Problem
If you look at the all-time featherweight rankings, Jose Aldo is incredibly low on the list. This is because the ranking system only uses UFC fights for the results, meaning Jose’s first fight in the UFC counts as his first fight ever in STeloR. The same applies to fighters who came over from Strikeforce and the other major, older promotions that modern fighters draw their lineage from. I am going to create context-specific systems to rate contributions in WEC, Strikeforce, Pride, etc., so we can get a more accurate picture of MMA history. This will strengthen the system, since competition levels will be more accurate over time.
The Khabib Problem
Khabib presents an interesting case because he, like many other fighters, went on an undefeated streak to win the belt and then retired on top. Elo algorithms don’t especially like this because they want as many rounds as possible to home in on the true value of the fighter over time. Khabib ranks artificially low at #3 all time, behind Dustin Poirier and Donald Cerrone, due to his lack of rounds. I don’t think anyone would argue that peak Cerrone beats peak Khabib, but Elo algorithms require more rounds than Khabib has provided.
The Beneil Dariush Problem
I knew right away looking at the lightweight rankings that I was going to get a flood of messages on this one. Beneil ranks #5 all time at lightweight behind Khabib but ahead of Tony Ferguson among others. The reason is that Beneil fights consistently good competition and fights a lot. His long career of consistently beating good competition helps his case in the algorithm. I have more thoughts on this concept and I make an argument later that maybe we should adjust our expectations as MMA fans away from momentum and more towards consistency. I believe this was the big problem with all of those who overlooked the Dustin Poirier vs Conor McGregor fight.
One of the problems I've brought up in earlier articles is how to address the lack of fights in the women’s featherweight division. I have decided to roll those fights into the women’s bantamweight context but still rank the fighters in the correct weight class. Thus, a women’s featherweight fight uses the average performance of bantamweight division fighters to determine Elo probabilities.
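A small sketch of that split between the division a fighter is ranked in and the division whose averages feed the probabilities. The dictionary, key strings and function name are hypothetical, chosen only to illustrate the idea.

```python
# Divisions too thin to supply their own averages borrow another
# division's context. Key names are hypothetical.
CONTEXT_DIVISION = {
    "womens_featherweight": "womens_bantamweight",
}


def context_division(ranking_division: str) -> str:
    """Division whose averages are used for the probabilities.

    The fighter is still ranked under ranking_division itself.
    """
    return CONTEXT_DIVISION.get(ranking_division, ranking_division)


print(context_division("womens_featherweight"))  # womens_bantamweight
print(context_division("womens_bantamweight"))   # womens_bantamweight
```

Keeping the mapping in one place means any other thin division could be handled the same way without touching the rating logic.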
The March Madness Problem
I addressed this a bit above, but MMA as a whole has a problem akin to March Madness. In many ways, we mentally rank fighters based on single-elimination fights, which leads to tons of statistical outliers. I don’t think many would say that Francis Ngannou is a more complete or technical mixed martial artist than Curtis Blaydes, but Ngannou has beaten Blaydes twice now, so he sits above him in the rankings. I think this is faulty logic. Rankings should be based on talent, performance and expectation, not just what we colloquially call MMA Math or, in other words, "who beat whom". Should we artificially rank Yoel Romero below every middleweight title holder in history because he never won the belt? Or do we rate him based on performance and keep in mind the context that he never got it done? Should Donald Cerrone be put below Khabib, Conor, Rafael dos Anjos and Frankie Edgar because he never won the belt, or should we rate his dominance independent of context and call it a day? I lean towards the latter. We need to be able to add and remove context when the argument dictates it without resorting to MMA math.
The Decay Problem
Fighters decay at insane rates. A year and a half ago, Tony Ferguson was viewed as one of the only fighters possibly capable of stopping Khabib’s dominance. Now, he is fighting Beneil Dariush and trying to make one last run at the title. Tyron Woodley was coming off a dominant win over Darren Till and then reeled off a 15-round losing streak, with many asking whether another fight is a good idea. How do I, as a data scientist, account for this decay and quantify it? Some fighters go out on top, like GSP and Mighty Mouse. Others, like Woodley, stand against the fence for 15 rounds hoping to counterpunch their way into a finish. I don’t have an answer yet but am working on this question constantly. I have many ideas, but they are in the nascent stages and I need more time.
If you look at our current middleweight rankings, Chris Weidman is #8, just behind Khamzat Chimaev. I don’t think anyone sees that as a close competition, but Weidman has not been active enough to see his ranking decrease, so I have to figure out a way to reduce his score over time to account for this decay. There are many wrinkles to this question, and I think it provides the perfect opportunity for our first MMA-DS YouTube video and podcast. I am working on the logistics of both, so sign up for our newsletter to get updated immediately when it happens!
So now to finish off this insanely long write-up. What does this all mean? This is a major step forward for the MMA analytics community. Building out a data-science-based ranking system has been my dream since the day I first came up with the idea for MMA-DS. Our rankings page will be kept up-to-date automatically by the data architecture I have built. I am working on further applications of this system, and I am going to be churning out a bunch of interesting content diving into the different fighters, weight classes and concepts. Make sure to sign up for our newsletter to be kept up-to-date!