The Library is Open: Season 9

Season 9 of RuPaul’s Drag Race has started! Drag Race is a reality competition show hosted by the legendary drag queen RuPaul to find the next drag superstar. Each season, between 12 and 14 queens compete in challenges, which can include sewing together new looks for the runway, acting in parody scenes of gay cult classics, singing, or dancing. At the end of each episode, one queen is named the winner of the challenge and the two queens who fall in the bottom must lip-sync for their lives. They perform a lip-sync to a pre-chosen song on the runway for RuPaul, and whoever impresses her most gets to stay, while the other is asked to sashay away.

Last year, I used machine learning algorithms to predict the outcome of season 8. I introduced five different algorithms to see how well each one, as well as their average ranking, predicted the season. I figured I would use those same algorithms to predict season 9 as well. This exercise was originally inspired by Alex Hanna’s survival analysis of season 5 of Drag Race. I used the data she collected as a base for my own analysis and added the data for seasons 6 through 9. Briefly, the variables are:

  • Age of the queen
  • Whether the queen is Puerto Rican
  • Whether the queen is Plus Size
  • The total number of main challenges a queen won during the season
  • The total number of times a queen was among the top queens for the challenge, but did not win the challenge
  • The total number of times a queen was among the worst queens for the challenge, but did not lip-sync
  • The total number of times a queen had to lip-sync for her life (including the lip-sync that she sashayed away from)
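
As a rough illustration of how one row of this data might be represented, here is a minimal sketch. The class name, field names, and example values are hypothetical and are not taken from the actual data set; the sketch only shows the seven variables flattened into the numeric vector that a model would consume.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass
class QueenRecord:
    """One row of the data: the seven variables listed above (names are hypothetical)."""
    age: int
    puerto_rican: bool
    plus_size: bool
    wins: int       # main challenges won
    highs: int      # among the top queens, but did not win
    lows: int       # among the worst queens, but did not lip-sync
    lipsyncs: int   # lip-syncs, including the one she sashayed away from

    def as_features(self) -> List[float]:
        """Flatten the record into a numeric feature vector."""
        return [float(value) for value in astuple(self)]


# Made-up values for illustration only; this is not a real contestant.
example = QueenRecord(age=30, puerto_rican=False, plus_size=False,
                      wins=1, highs=0, lows=0, lipsyncs=0)
print(example.as_features())
```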

Meet the Queens

Let’s begin the challenge by meeting our contestants. First up is Support Vector Machines, a classifier with a pretty intuitive algorithm. Imagine you plot points on a two-dimensional graph. A support vector machine (SVM) attempts to separate the groups defined by the labels using a line or curve that maximizes the distance between the dividing line and the closest points. If you have more than two features (as is often the case), the same thing happens but in a higher-dimensional space.
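
To make the "maximize the distance to the closest points" idea concrete, here is a small sketch. It does not train an SVM; it only scores a few hand-picked candidate lines by their margin, which is the quantity an SVM tries to maximize. The points and candidate lines are made up for illustration.

```python
import math


def margin(w, b, points):
    """Smallest distance from any point to the line w[0]*x + w[1]*y + b = 0."""
    norm = math.hypot(w[0], w[1])
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)


def separates(w, b, group_a, group_b):
    """True if group_a lies strictly on one side of the line and group_b on the other."""
    def side(p):
        return w[0] * p[0] + w[1] * p[1] + b
    return all(side(p) < 0 for p in group_a) and all(side(p) > 0 for p in group_b)


# Two made-up groups of points on a two-dimensional graph.
group_a = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
group_b = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]
points = group_a + group_b

# Candidate dividing lines, written as (w, b) for w[0]*x + w[1]*y + b = 0.
candidates = {
    "x + y = 4": ((1.0, 1.0), -4.0),
    "x + y = 5.5": ((1.0, 1.0), -5.5),
    "x = 3": ((1.0, 0.0), -3.0),
}

valid = {name: wb for name, wb in candidates.items()
         if separates(wb[0], wb[1], group_a, group_b)}
best = max(valid, key=lambda name: margin(valid[name][0], valid[name][1], points))
print(best)
```

All three candidates separate the groups, but they leave different amounts of breathing room around the closest points. A real SVM searches over every possible line rather than a short list.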

The next to enter the workroom is Gaussian Naive Bayes, an algorithm that is not as intuitive as SVM but is faster and simpler to implement. Gaussian naive Bayes assumes that the data for each label are generated from a simple Gaussian (or normal) distribution. Using Bayes’ theorem, along with some simplifying assumptions (which make it naive), the algorithm uses the features and labels to estimate the Gaussian distributions, which it then uses to make its predictions.
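
The estimate-then-predict steps can be sketched in plain Python as follows. This is a simplified illustration, not the implementation used for these predictions; the two features (wins and lip-syncs) and the "top"/"bottom" labels are made-up toy data.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior plus a per-feature mean and variance for each label."""
    rows_by_label = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_label[label].append(row)
    model = {}
    for label, rows in rows_by_label.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The small constant keeps a variance from being exactly zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Pick the label with the highest log posterior (up to a shared constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # The "naive" step: features are treated as independent,
            # so their Gaussian log densities simply add up.
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


# Toy data: [wins, lip-syncs] for six imaginary queens.
X = [[3, 0], [2, 1], [4, 0], [0, 2], [1, 3], [0, 3]]
y = ["top", "top", "top", "bottom", "bottom", "bottom"]
model = fit_gaussian_nb(X, y)
print(predict(model, [3, 0]), predict(model, [0, 3]))
```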

Our third contestant is the Random Forest Classifier. Random forests are aggregations of decision trees (get it!?). Decision trees are classifying algorithms composed of a series of decision points, splitting the data at each decision point to try to properly classify the data. Think of a game of Guess Who or Twenty Questions: you ask a series of yes/no questions to try to sort possibilities into different bins. Decision trees work the same way, with any number of possible bins. The problem with decision trees is that they tend to overthink (overfit) the data, meaning that they do a really good job of predicting the training data, but the decision points are specific to the training data and so they aren’t so good at predicting testing data. The solution is to split the training data itself into different subsets, create decision trees for each subset, and then average those trees together to create a “forest” that typically does a much better job with testing data than a single tree.
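
Here is a deliberately tiny sketch of the subsets-then-vote idea. It makes two simplifying assumptions that differ from a full random forest: each "tree" asks only a single yes/no question (a decision stump), and the subsets are bootstrap samples drawn with replacement, with no random selection of features. The data and labels are made up.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """A one-question decision tree: 'is feature f <= threshold t?'"""
    majority = Counter(labels).most_common(1)[0][0]
    best_errors, best_stump = len(rows) + 1, None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(l != left_label for l in left)
                      + sum(l != right_label for l in right))
            if errors < best_errors:
                best_errors, best_stump = errors, (f, t, left_label, right_label)
    # If every row is identical there is nothing to split on.
    return best_stump or (0, float("inf"), majority, majority)


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Majority vote across all the stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Made-up data: one numeric feature, low values leave early, high values leave late.
rows = [[float(x)] for x in range(10)]
labels = ["early" if x < 5 else "late" for x in range(10)]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [1.0]), predict_forest(forest, [8.0]))
```

Any single stump depends on which rows happened to land in its sample, but the vote across all of them is much more stable.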

The fourth contestant is the Random Forest Regressor, a drag sister of the random forest classifier. It works much the same way the classifier does, but rather than trying to predict unordered categories, it predicts continuous values.


Say hello to Neural Network. Neural networks are a family of methods that roughly simulate connections between neurons in a biological brain. The neural network used here consists of neurons that take some number of values as inputs, apply weights to these values (which can be adjusted in the learning process), then pass the weighted values through a logistic function to produce an output between 0 and 1. Neural networks consist of two or more layers of neurons (an input layer, an output layer, and zero or more hidden layers). The output layer must have the same number of neurons as the number of output values, but other layers can have any number of neurons. The network I use here has two layers, with five neurons in the input layer and fourteen neurons in the output layer (one for each possible place a queen could end up in).
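
A single pass through a network of that shape might look like the sketch below. The weights are random and untrained, the input values are made up, and reading the strongest output as the predicted place is an assumption made for illustration; the sketch only shows the weight, sum, and logistic steps.

```python
import math
import random


def logistic(z):
    """Squash any real number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, weights, biases):
    """Each output neuron weights the inputs, sums them, and applies the logistic function."""
    return [logistic(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


N_INPUTS, N_PLACES = 5, 14
rng = random.Random(0)

# Untrained, random weights; the learning process would adjust these.
weights = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_PLACES)]
biases = [0.0] * N_PLACES

features = [0.5, 0.0, 1.0, 0.25, 0.0]  # made-up, already scaled input values
scores = forward(features, weights, biases)

# Assumption: the output neuron with the highest score marks the predicted place.
predicted_place = scores.index(max(scores)) + 1
print(predicted_place)
```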

The Story So Far

The season began with a number of twists. The first was that no queen would be eliminated in the episode. The second was that a queen from a previous season would be coming back to the competition (though who it was would not be ruvealed until the second episode). As a result, I held off on asking the algorithms to make any predictions until we had our first eliminated queen.

The first episode introduced us to the 13 queens of season 9, as well as the patron saint of drag queens herself, Lady GaGa. Lady GaGa stopped into the workroom along with the 13 queens, and it took a few minutes before the queens realized it was actually her. She was the guest judge for this first episode and honestly looked like she was having the time of her life.

The queens were tasked with presenting two runway looks: the first inspired by their hometown and the second inspired by one of Lady GaGa’s own looks. Nina Bo’Nina Brown slayed both looks and took home the first win of the season.

Just as the episode ends, RuPaul announces that a fourteenth queen will be returning from a previous season to join the season 9 competition. A week later, we find out it’s season 8 Miss Congeniality and fan favorite, Cynthia Lee Fontaine! A month after leaving season 8, she was diagnosed with cancer. But now her cancer is in remission and she’s ready to come slay season 9.

Episode two has the queens split into two groups to perform a parody skit inspired by Bring It On. It is honestly meh: it doesn’t give the queens enough room for their personalities (comedy) to shine through, and so I’m still left wondering who they are underneath the drag. The runway is white-party themed, and Valentina wins the challenge for her energy in the cheer routine and her bridal-inspired runway look.


Jaymes Mansfield was faulted for being forgettable in the cheer routine and Kimora Blac got clocked for her too-nautical runway look. They both had to lip-sync for their lives to the B-52’s “Love Shack,” and Jaymes was the first queen of the season to sashay away.

The Predictions

I did some preliminary predictions before the season started, and the top 3 were predicted to be Charlie Hides, Farrah Moan, and Jaymes Mansfield. Obviously, the algorithms got Jaymes wrong. After the first two episodes, Charlie and Farrah don’t seem like the queens who would make it to the top 3, but they could surprise me. Of course, these predictions were not based on how the queens actually performed in the season, so what are the predictions now that we’ve got some wins, highs, lows, and lip-syncs?

The figure above shows where each algorithm places each queen. Ties are clustered together. On the far right are predictions based on the average placement of each queen across the five algorithms. Nina Bo’Nina Brown is predicted to come in first, with Valentina and Trinity Taylor filling out the top 3. Jaymes, who went home first, is predicted to last until midway through the season, when she goes home in a double elimination with Farrah Moan. The next queen predicted to go home is Peppermint, though my instincts tell me she’s likely to go further than predicted.
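
The average-placement column could be computed along these lines. This is a sketch under stated assumptions: the function name, the dictionary layout, and every placement below are made up for illustration, and the queens are placeholders rather than real predictions.

```python
def average_ranking(placements):
    """placements maps algorithm name -> {queen: predicted place}."""
    queens = list(next(iter(placements.values())))
    mean_place = {q: sum(p[q] for p in placements.values()) / len(placements)
                  for q in queens}
    # Rank by mean place; queens with equal means share a rank (a tie).
    ordered = sorted(mean_place, key=mean_place.get)
    ranks = {}
    for i, queen in enumerate(ordered):
        previous = ordered[i - 1] if i > 0 else None
        if previous is not None and mean_place[queen] == mean_place[previous]:
            ranks[queen] = ranks[previous]
        else:
            ranks[queen] = i + 1
    return ranks


# Made-up placements for four placeholder queens across five algorithms.
placements = {
    "svm":           {"Queen A": 1, "Queen B": 2, "Queen C": 3, "Queen D": 4},
    "naive_bayes":   {"Queen A": 1, "Queen B": 3, "Queen C": 2, "Queen D": 4},
    "rf_classifier": {"Queen A": 2, "Queen B": 1, "Queen C": 3, "Queen D": 4},
    "rf_regressor":  {"Queen A": 1, "Queen B": 3, "Queen C": 2, "Queen D": 4},
    "neural_net":    {"Queen A": 1, "Queen B": 3, "Queen C": 2, "Queen D": 4},
}
ranks = average_ranking(placements)
print(ranks)
```

In this toy example, Queen B and Queen C have the same average placement, so they share second place and the next queen is ranked fourth.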

So far, there’s simply not enough data to make accurate predictions, either with the algorithms or with my own gut instinct. It’ll be interesting to see how things progress through the season.

Further Reading