Rebuilding the FaST models with improved clustering

Dec. 22, 2021

Hello everyone! Before Gustavo gets into his first article on MMA-DS.com (Woo hoo!), I wanted to quickly introduce him. Gustavo and I met on Reddit, where we discussed some of the finer points of unsupervised learning with respect to my FaST models. I quickly realized that I was way out of my depth in comparison to Gustavo and knew right away I had to hire him. Gustavo is getting his master’s degree in AI and he’s doing his thesis on Deep Clustering, which uses neural networks to cluster data in finer detail than the methods I used in my analysis. I set Gustavo to work on rebuilding the FaST models and here are the results! I will allow the master to now take the metaphorical stage and dive into his findings!

Hello everyone! I am Gustavo and as Jason said above I am a master’s student from Portugal. I was hired by MMA-DS.com to work as a contract data scientist and since I’m working on Unsupervised Learning (extracting knowledge from unlabeled data using machine learning), I thought maybe I could help in that regard. I ended up making some cool discoveries about what different strategies fighters employ by using some advanced clustering techniques.

Jason started out by trying to identify what different types of strategies fighters have. To do this, he collected a dataset where each data point is a round (from the perspective of a fighter) described by (1) the percentage of time a fighter spends on his feet and (2) the percentage of time the fighter spends in grappling control. I recommend you read his breakdown before reading mine but this article is supposed to be self-contained so you’re all good.

As any Data Scientist will tell you, I started out by looking at the data. Lucky for me, there are only two features, so we can easily visualize the data in 2D:

Let's take a step back and remember what we’re trying to do: identify different types of strategies. In unsupervised learning, we can use what are called “clustering algorithms” to achieve this. These algorithms basically try to identify different clusters of data, where a data point (a round in our case) should be similar to other points in its cluster and different from points in other clusters. In the following paragraphs, up until the first table, I will describe the methods and rationale I used to cluster the rounds and find different strategies.

In the scatter plot above, it’s hard to distinguish clusters of points since they’re all spread out pretty evenly. This makes the job harder for any clustering algorithm. To help our clustering algorithm, we can use a UMAP transformation, which will make similar data points stand closer to each other. A cool detail about UMAP is that you can tell it to pay more attention to local structure (the details) or global structure (the big picture). In this case, I want to have a global view of how different fighters approach fights and I’m not really interested in the tiny differences between each approach. Let’s take a look at the result.

Wowza, look at that! Much better. We can clearly see some intertwined but distinct clusters in the middle. By the way, don’t think too much about what the axes mean. This is a transformation so they kind of lose meaning (don’t worry though, it will all make sense).

At first glance, I would say that there are about 4 distinct areas, which roughly look like this.

Now comes another issue: what clustering algorithm should I use? The classic K-means is appropriate when we want to identify globular clusters, which is (approximately) what we have. The problem, however, is that K-means can be influenced by noise, i.e. the points that don’t seem to belong to any cluster.

Here’s a look at how K-means clusters this data (Note: I had to add a 6th cluster, which turned out to be cluster 0, so that all that noise on the right wouldn’t affect the performance):

Looks like K-means agrees with my initial take, but there’s still one issue: all of those noisy points around our clusters. They will affect how we perceive the strategies, moving the average grappling control and striking times away from their true values. If you want to understand what a noisy example is, just think of Ryan Hall and his allergy to staying standing.
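To make the noise problem concrete, here is a minimal sketch of plain K-means (Lloyd's algorithm) in pure Python. The points are made-up (% control, % standing) pairs, not the real dataset, and the starting centroids are passed in by hand to keep the run deterministic; a real analysis would use a library implementation.

```python
from math import dist

def kmeans(points, centroids, max_iters=100):
    """Plain Lloyd's algorithm on 2D points, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its old centroid).
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Hypothetical rounds as (% control, % standing); values are illustrative only.
strikers = [(0, 100), (2, 96), (4, 92)]
grapplers = [(70, 20), (74, 16), (78, 18)]
noise = [(30, 70)]            # one round that fits neither group
start = [(0, 100), (70, 20)]  # hand-picked initial centroids

clean = kmeans(strikers + grapplers, start)
noisy = kmeans(strikers + grapplers + noise, start)
print("striker centroid without noise:", clean[0])
print("striker centroid with noise:   ", noisy[0])
```

K-means has no notion of "this point belongs nowhere", so the stray round gets assigned to the striker cluster and drags its centroid away from where the real strikers sit.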

One thing we can do is to use HDBSCAN instead, which is a density-based algorithm, i.e. it sees clusters as areas of higher density and leaves out noise. Here’s how this fancy algorithm clusters the data.
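The density idea can be sketched with plain DBSCAN, a simpler relative of HDBSCAN that uses one fixed radius. This is not HDBSCAN itself (which effectively explores many density levels), and the points, `eps`, and `min_pts` below are hypothetical values picked for illustration.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if it sits in a sparse area."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # not dense enough to start a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # core point: keep growing the cluster
    return labels

# Hypothetical rounds as (% control, % standing).
rounds = [(0, 100), (1, 99), (2, 98), (1, 97),   # dense striking group
          (75, 15), (74, 17), (76, 16), (73, 18),  # dense grappling group
          (40, 60)]                                # isolated round
labels = dbscan(rounds, eps=5, min_pts=3)
print(labels)
```

The isolated round gets the label -1 instead of being forced into a cluster, which is the behavior that keeps the cluster averages clean.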

Now, all of that surrounding noise is left out. This comes at the expense of leaving out some points that probably should be included, but now we have more accurate average values. Pretty cool. Another way to get a noise-robust solution would be to take the median value, instead of the average, of the clusters found by K-means. In a way, this is actually better, because K-means depends less than HDBSCAN on UMAP’s transformation, which can be a bit unstable. The noise will still be there, though (we’re just working around it), and it could affect the search for tactics inside each strategy in the future. For example, if we want to know the differences between striking approaches, it’s better if we only have examples that are really representative of strikers. So, for now, we’ll stick with HDBSCAN. If you try to replicate this and can’t get such a clean transformation, I would recommend the K-means approach though.
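As a quick illustration of why the median is the more noise-robust summary, here is a small sketch with made-up control-time percentages for one cluster, where the last round is an outlier.

```python
from statistics import mean, median

# Hypothetical % control time for rounds in one "striker" cluster.
# The last value stands in for a noisy round that K-means lumped in.
control_pct = [1.0, 2.0, 2.5, 3.0, 3.5, 60.0]

avg = mean(control_pct)
med = median(control_pct)
print(f"mean:   {avg:.2f}%")
print(f"median: {med:.2f}%")
```

One odd round is enough to pull the mean far above what a typical member of the cluster looks like, while the median barely moves.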

Anyway, let’s go back to our clusters. You might be looking at the last plot and thinking that cluster 0 just looks like noise. Although it looks small, it actually has more than two thousand data points, or roughly 10% of the data, which is hard to tell since it’s very compact. Regarding cluster 1, however, I do tend to believe it is mostly noise as it is pretty spread out and small, so we’ll ignore it.

Let’s see what the average values of each cluster look like (see how that weird transformation actually translated into clusters that make sense?).

Cluster    % Control Time    % Standing
0          0%                100%
2          7.2%              82.6%
3          14.5%             63.7%
4          30.7%             30.6%
5          75%               16.8%

The first thing Jason and I thought about is whether clusters 0 and 2 really represent different strategies. The 10% of time fighters in cluster 2 spent on their backs corresponds to just 30 seconds, so we’re just looking at a fighter who got taken down for a bit and got back up. Conor McGregor and Derrick Lewis would be proud. As to the 7.2% of time they spent in control, it corresponds to just 22 seconds, so it’s probably the result of some scrambling to get back to the feet. This is a good example of an algorithm, in this case UMAP, thinking that two subsets of data are very different (look at how far clusters 0 and 2 are from each other) when, in reality, they’re very similar. In this case, UMAP thinks they’re different because fighters in cluster 0 managed to stay on their feet for the entirety of the round, which you can argue is a big difference if you don’t have the context. What I'm trying to say is that UMAP is a casual.
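The seconds quoted above follow from simple arithmetic. Here is a minimal sketch, assuming a five-minute round (300 seconds) and assuming that whatever time is neither standing nor in control is spent on the back.

```python
ROUND_SECONDS = 5 * 60  # assumption: a five-minute round

def pct_to_seconds(pct, round_seconds=ROUND_SECONDS):
    """Convert a percentage of the round into seconds."""
    return pct / 100 * round_seconds

control_pct, standing_pct = 7.2, 82.6           # cluster 2 averages from the table
on_back_pct = 100 - control_pct - standing_pct  # assumption: the rest is time on the back

print(f"on back: {on_back_pct:.1f}% = about {pct_to_seconds(on_back_pct):.0f} s")
print(f"control: {control_pct:.1f}% = about {pct_to_seconds(control_pct):.0f} s")
```

Both numbers come out at roughly half a minute or less, which is why cluster 2 reads as a striker who briefly hit the mat rather than a separate strategy.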

After merging those 2 clusters, we get the following classes.

Class                 % Control Time    % Standing
Striking Heavy        2.4%              94.2%
Balanced Striker      14.5%             63.7%
Grappling Willing     30.7%             30.6%
Grappling Dominant    75%               16.8%
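One way to read the merged "Striking Heavy" row is as a size-weighted average of the two original clusters. The sketch below is an assumption about how the merge works, and the 2:1 size ratio is hypothetical: the exact cluster sizes aren't listed here, and that ratio is simply the one that reproduces the merged values in the table.

```python
def merge_clusters(clusters):
    """Size-weighted average of (n_rounds, control_pct, standing_pct) tuples."""
    total = sum(n for n, _, _ in clusters)
    control = sum(n * c for n, c, _ in clusters) / total
    standing = sum(n * s for n, _, s in clusters) / total
    return control, standing

# Relative sizes are hypothetical (2:1); the averages come from the cluster table.
all_standing = (2, 0.0, 100.0)      # the cluster with 0% control, 100% standing
brief_takedown = (1, 7.2, 82.6)     # the cluster with 7.2% control, 82.6% standing

control, standing = merge_clusters([all_standing, brief_takedown])
print(f"{control:.1f}% control, {standing:.1f}% standing")
```

With those weights the result lands on 2.4% control and 94.2% standing, matching the merged row.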

If you’ve read Jason’s analysis, you’ll notice there’s one small difference: the "grappling willing" class. Jason found a class called "balanced grappler" instead, which corresponds to 42% control time and 37% striking time. On the other hand, this new "grappling willing" class corresponds to a fighter who spends roughly a third of the round each on their feet, in control, and on their back. The key difference here is that this is a fighter who is willing to grapple, as he spends around 70% of the round on the ground, but is on top for a bit less than half of that time.
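The time split for each class can be worked out from the table. This sketch assumes the round divides cleanly into standing, in control, and on the back, and it treats "in control" as "on top on the ground"; both are simplifications for illustration.

```python
# (% control, % standing) per class, taken from the table above.
classes = {
    "Striking Heavy": (2.4, 94.2),
    "Balanced Striker": (14.5, 63.7),
    "Grappling Willing": (30.7, 30.6),
    "Grappling Dominant": (75.0, 16.8),
}

def time_split(control_pct, standing_pct):
    """Return (% on back, % on the ground, share of ground time spent in control)."""
    on_back = 100 - control_pct - standing_pct  # assumption: the remainder is on the back
    ground = control_pct + on_back
    top_share = control_pct / ground if ground else 0.0
    return on_back, ground, top_share

for name, (control_pct, standing_pct) in classes.items():
    on_back, ground, top_share = time_split(control_pct, standing_pct)
    print(f"{name:<20} on back {on_back:5.1f}%  ground {ground:5.1f}%  on top {top_share:.0%} of ground time")
```

The "Grappling Willing" row is the only one where time on the back is as large as time in control, which is what sets it apart from the other classes.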

This is an important distinction because there are, in fact, a lot of fighters who like to grapple even if they are not always on top. Think of guys like Tony Ferguson or Brian Ortega. In fact, even if a fighter does mind being on their back, you’ll still see a lot of fights where the grappling is close but both fighters are still willing to engage in that area.

Other than that, we’ve also better separated the heavy approaches and now have more robust average values. In the future, I will hone Jason's work further on the tactical side, so stay tuned!