Utilizing the Judging Model 1.0 to gain insight from MMA Data
One of the main goals of data science is to derive insights from data. I have spent a lot of time brainstorming further uses for the Judging Model 1.0, and one of the more interesting ones turned into this article: based on past results, which judges were the biggest negative outliers, and conversely, which were the most consistently accurate?
To avoid the issue of massive outliers, I only accepted judges with at least 10 final decisions, i.e. judges who scored at least 10 fights that went to a decision, whether majority, split, or unanimous. Before I get into the results, I think it is important to explain the math behind my criteria and use this as an opportunity to further describe how the model makes decisions.
First up, the math.
I find it is easiest to explain these concepts with examples and analogies, so you will find my writing rife with both. Let’s say Judge X has judged a fight between two fighters. Judge X believes that fighter 1 won two out of three rounds, giving fighter 1 a score of 29-28, or 2-1 in rounds, as the losing fighter won a single round. The way the data is currently captured, I am not able to see which rounds the judge said each fighter won, but I am able to say that he judged 2 of the 3 as wins. I then ran that same fight through the judging model to see how it graded the fight as a whole. The judging model gave the hypothetical winning fighter all three rounds, a 30-27 (3-0) victory. I would assign that judge an outlier score of 1, meaning that he was off by one round.
Let’s investigate the inverse: what if the model instead gave the losing fighter a 29-28 (2-1) victory? That would also be a one-round swing mathematically, so in this instance the judge would again be given an outlier score of 1. It is important to note that what I am doing mathematically is taking the absolute value of the difference between the model’s round count and the judge’s actual scorecard.
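To make the arithmetic concrete, here is a minimal sketch of the per-fight outlier score described above (the function name is my own, for illustration, not from the model’s codebase):

```python
def outlier_score(judge_rounds: int, model_rounds: int) -> int:
    """Rounds of disagreement for one fight: the absolute difference
    between the rounds the judge and the model awarded the same fighter.
    (Illustrative helper; name is mine, not the model's.)"""
    return abs(judge_rounds - model_rounds)

# Judge scores the fight 2-1 for fighter 1; the model scores it 3-0.
print(outlier_score(2, 3))  # -> 1

# Inverse case: the model gives fighter 1 only one round (a 2-1 loss).
print(outlier_score(2, 1))  # -> 1
```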
Now take the above concepts, repeat them across all decisions in UFC history and all three judges per fight, and we have our data set. A smart question to ask is why we should trust the model’s decision making. I would direct the reader to my article on the model here. But to dive a little deeper than the original article, let’s discuss the math behind the model and why it is trustworthy.
At the highest level, the model takes in data from every decision in UFC history and tries to pinpoint the decision the average judge would make. Breaking this down further, imagine that a fight like last Saturday’s main event between Chan Sung Jung and Brian Ortega was judged by every single judge in UFC history. There would certainly be some variability in round scoring, but overall we would expect the cards to be roughly the same. This model functions in much the same way: it takes in data from every decision and tries to be the last word in judging by averaging together the scores of all these judges and rendering its decision on a round-by-round basis.
The value of machine learning in this context is that it can take in all of these rounds of data and process them at the speed of computing, rather than having an individual manually score rounds and inject their own biases. Once I had cleaned and formatted the data so that it could be fed into the learning algorithm to create the Judging Model 1.0, it took less than five minutes for the prototype to be ready to go.
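As an illustration of what that training step might look like, here is a hedged sketch. The article does not specify the real model’s features or algorithm, so the per-round statistics and the logistic-regression choice below are assumptions for demonstration only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-round features from fighter 1's perspective:
# [significant-strike differential, takedown differential].
# These features, and logistic regression itself, are illustrative
# stand-ins, not the actual internals of Judging Model 1.0.
X = np.array([
    [25, 1], [18, 0], [30, 2], [12, 1],          # rounds fighter 1 won
    [-20, -1], [-15, 0], [-28, -2], [-10, -1],   # rounds fighter 1 lost
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])           # 1 = round to fighter 1

model = LogisticRegression().fit(X, y)

# Score a hypothetical 3-round fight round by round, then roll the
# calls up into a 10-9-per-round scorecard.
rounds = np.array([[22, 1], [5, 0], [-18, -1]])
calls = model.predict(rounds)
score_f1 = sum(10 if c == 1 else 9 for c in calls)
score_f2 = sum(9 if c == 1 else 10 for c in calls)
print(f"{score_f1}-{score_f2}")
```

The point is the workflow, not the specific classifier: once the rounds are cleaned into a feature matrix, fitting a model like this on even thousands of rounds takes seconds to minutes.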
Now that the math is out of the way, time for the fun part. Based on the scoring system I outlined above, I fed each round through my model and compared it to the judges’ results. To make judges comparable, I created the following metric:
Total outlier score / Total fights judged
Descriptively, this metric tells us, on average, how many rounds per fight a judge’s card deviated from the model. Predictively, we can also use it to estimate how many rounds we expect the judge to miss in future fights. If a judge has a 2.0 average outlier score over 100 fights judged, we should expect that judge to be off by roughly two rounds per fight going forward, barring any improvement in their skill level.
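The metric itself is a one-liner; here is a quick sketch (the function name is mine, the article just gives the ratio):

```python
def average_outlier(total_outlier_score: int, total_fights_judged: int) -> float:
    """Average rounds of disagreement with the model per fight judged."""
    return total_outlier_score / total_fights_judged

# A judge who disagreed with the model on 200 rounds across
# 100 decisions averages 2.0 rounds of disagreement per fight.
print(average_outlier(200, 100))  # -> 2.0
```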
Now for the results. The five worst judges with at least 10 fights are:

1. Vinicius Lins: 2.18
2. Susan Thomas-Gitlin: 1.33
3. Alejandro Rochin: 1.26
4. Tim Vannatta: 1.16
5. Nelson Hamilton: 1.15
Vinicius Lins’ score of 2.18 is particularly heinous, considering the maximum possible value is 3 for a judge who never scores a five-round main event (it would be 5 for a judge who only scored five-round main events). Luckily, Vinicius has only 11 fights, so I would guess there is no need to worry about his contributions in the future.
To find more bad judges, I upped the minimum fight count from 10 to 100 to see who the worst judges are with substantially more experience:

1. Nelson Hamilton: 1.15
2. Jeff Mullen: 1.13
3. Cecil Peoples: 0.92
4. Doug Crosby: 0.88
5. Richard Bertrand: 0.87
Just for fun, I wanted to add in Sal D’Amato, the iron man of judges, who has overseen 440 fights in his distinguished career with an average error of 0.77. For reference, the judge with the second-most fights overseen is Chris Lee at 320, and third place is Derek Cleary at 268. Sal is in rarefied air when it comes to judging.
Now for a more positive spin, let’s look at the judge with the lowest average round error among those with more than 100 fights scored. Three cheers for the aforementioned Derek Cleary, who comes in with a 0.66 ratio!
This was a really fun exercise for me to go through, and I would love to hear any interesting thoughts or questions the reader may have. Make sure to follow us on our socials and send me any messages or concerns!