With the finale aired and a new Superstar crowned, I’m taking some time to see how the algorithms performed overall, comparing predictions across all eight seasons. If you missed my earlier blog posts, start with the first one to understand what’s going on in the rest of this post.
The finale brought many great moments, including Carol Channing raving over Bob’s Snatch Game performance, Nancy Grace forcefully claiming Acid Betty was robbed in that same challenge, and Margaret Cho wishing Kim Chi luck. Speaking of Kim Chi, each of the top three performed a choreographed lip-sync to a song written specifically for them by Lucian Piane, and Kim Chi slayed despite not being able to dance.
Also, Violet Chachki reminded us why she deserved to win last season with her high fash-on!
So how did the algorithms do overall? The table below contains the final predictions. Overall, not too bad: Kim Chi and Bob were both predicted to come out on top, with only one algorithm each predicting 2nd place instead of 1st. The best-performing algorithm was the Gaussian Naive Bayes, with a rank score of 0.932!
(The five right-hand columns are the placements predicted by each of the five algorithms.)

| Queen | Place | Season | Pred. 1 | Pred. 2 | Pred. 3 | Pred. 4 | Pred. 5 |
|---|---|---|---|---|---|---|---|
| Bob the Drag Queen | 1 | 8 | 1 | 1 | 1 | 2 | 1 |
| Chi Chi DeVayne | 4 | 8 | 5 | 4 | 3 | 4 | 5 |
| Cynthia Lee Fontaine | 10 | 8 | 8 | 11 | 12 | 11 | 11 |
Predicting Other Seasons
Throughout the season I had been training the algorithms on data from seasons 1-6, testing on season 7, and then predicting season 8. But I was curious how each algorithm would do at predicting all the other seasons. So I wrote a script that steps through each algorithm and season, training the algorithm on all the seasons other than the one being predicted, and then using it to predict the held-out season. For season 1, for example, each algorithm is trained on seasons 2-8 and then asked to predict season 1. I generated the following figure to plot the rank scores for each season and algorithm.
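The heart of that script is a leave-one-season-out loop. Here’s a minimal sketch of it, with random placeholder data standing in for the real track records and scikit-learn defaults in place of my actual model settings:

```python
# Leave-one-season-out: train each algorithm on every season except
# one, then predict the held-out season.  The features and placements
# below are random placeholders, not the real Drag Race data.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_queens = 100
X = rng.random((n_queens, 4))           # placeholder features (e.g. wins, highs, lows, lip-syncs)
y = rng.integers(1, 13, n_queens)       # placeholder final placements
season = rng.integers(1, 9, n_queens)   # which season each queen competed in

algorithms = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Random Forest Classifier": RandomForestClassifier(random_state=0),
    "Random Forest Regressor": RandomForestRegressor(random_state=0),
    "Support Vector Machine": SVC(),
    "Neural Network": MLPClassifier(max_iter=2000, random_state=0),
}

predictions = {}  # (algorithm name, held-out season) -> predicted placements
for name, model in algorithms.items():
    for held_out in range(1, 9):
        train = season != held_out            # boolean mask: every other season
        model.fit(X[train], y[train])
        predictions[(name, held_out)] = model.predict(X[~train])
```

Each held-out season plays the role that season 7 played during the weekly predictions: pure test data the model never saw during training.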
There are a few interesting things going on here. Season 4 was especially tough to predict for the Gaussian Naive Bayes and the Random Forest Classifier, and season 7 was also a challenge for them. The Support Vector Machine and the Random Forest Regressor are both consistently good at predicting seasons, and the Neural Network performs well from season 4 onward. So what did each algorithm predict would happen in each season? Let’s find out.
| Queen | Place | Season | Pred. 1 | Pred. 2 | Pred. 3 | Pred. 4 | Pred. 5 |
|---|---|---|---|---|---|---|---|
| Bebe Zahara Benet | 1 | 1 | 1 | 1 | 1 | 2 | 1 |
| Victoria "Porkchop" Parker | 9 | 1 | 9 | 8 | 9 | 9 | 9 |
The algorithms all seemed to agree that BeBe Zahara Benet would come out on top, but they also predicted that Ongina would join her there. Ongina’s elimination in season one was perhaps one of the hardest for RuPaul in all eight seasons; Ru had to excuse herself from the set before she could decide who to send home.
| Queen | Place | Season | Pred. 1 | Pred. 2 | Pred. 3 | Pred. 4 | Pred. 5 |
|---|---|---|---|---|---|---|---|
| Nicole Paige Brooks | 11 | 2 | 12 | 11 | 12 | 12 | 12 |
For season 2, the algorithms were pretty much in agreement about who would win. No other queen comes close to having the same average ranking as Tyra Sanchez.
| Queen | Place | Season | Pred. 1 | Pred. 2 | Pred. 3 | Pred. 4 | Pred. 5 |
|---|---|---|---|---|---|---|---|
| Stacy Layne Matthews | 8 | 3 | 8 | 2 | 7 | 6 | 8 |
What’s really interesting in season 3 is that the algorithms agreed Manila Luzon would be taking home the crown, but she placed second after Raja. By the end of the season, Manila’s and Raja’s win/high/low/lip-sync profiles were identical, so the algorithms not placing Raja first must be dinging her for her age (Raja was 36 during the season, while Manila was 28). I’d also like to point out that a favorite among my friends, Stacy Layne Matthews, was predicted to come in second by the Neural Network.
| Queen | Place | Season | Pred. 1 | Pred. 2 | Pred. 3 | Pred. 4 | Pred. 5 |
|---|---|---|---|---|---|---|---|
| Phi Phi O'Hara | 3 | 4 | 1 | 4 | 2 | 3 | 2 |
The algorithms reached a consensus in season 4 that Sharon Needles would be taking home the crown, with Chad Michaels a close second (Chad won the first season of Drag Race All Stars shortly after season 4). Interestingly, even though Willam was eliminated in seventh place for violating the rules (her husband was making unauthorized conjugal visits to the hotel where the queens were sequestered), and so presumably would otherwise have made it further in the season, the algorithms predicted she would place approximately where she actually did.
| Queen | Place | Season | Pred. 1 | Pred. 2 | Pred. 3 | Pred. 4 | Pred. 5 |
|---|---|---|---|---|---|---|---|
| Monica Beverly Hillz | 12 | 5 | 10 | 11 | 12 | 11 | 11 |
Season 5 had the algorithms placing the top three in the right order, on average. What seems to have dragged the rank scores down for this season is how far Honey Mahogany made it; she was predicted to go home first by most of the algorithms. Coco Montrese seems to have made it further than predicted, possibly because the drama between her and Alyssa Edwards made such good TV (also, Coco’s best moments were when she lip-synced).
| Queen | Place | Season | Pred. 1 | Pred. 2 | Pred. 3 | Pred. 4 | Pred. 5 |
|---|---|---|---|---|---|---|---|
| Bianca Del Rio | 1 | 6 | 1 | 1 | 1 | 1 | 1 |
| Trinity K. Bonet | 7 | 6 | 9 | 7 | 5 | 8 | 8 |
Season 6 was easy to predict. It was pretty obvious from the first time Bianca Del Rio walked into the workroom that she would be taking home the crown. Each episode after merely confirmed the inevitable, and the algorithms agreed: all five predicted Bianca to come in first.
| Queen | Place | Season | Pred. 1 | Pred. 2 | Pred. 3 | Pred. 4 | Pred. 5 |
|---|---|---|---|---|---|---|---|
| Jaidynn Diore Fierce | 8 | 7 | 8 | 6 | 8 | 9 | 6 |
| Mrs. Kasha Davis | 11 | 7 | 13 | 13 | 8 | 13 | 10 |
The algorithms had a difficult time deciding who would be in the top three in season 7. Max was predicted to be there by three algorithms, despite going home relatively early in the season. Four algorithms predicted Katya would make the top three, while only two predicted Pearl placing so high. This actually tracks pretty well with what people were expecting early in the season: Pearl was going to go home early, Max would make it pretty far, and Katya was going to make it to the top.
Overall, the Support Vector Machine has the highest average rank score across all eight seasons, followed by the Neural Network, the Random Forest Regressor, and the Random Forest Classifier, with the Gaussian Naive Bayes coming in last. Her strong performance in season 8 was largely a fluke compared to how she did across the rest of the seasons.
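Ranking the algorithms by average rank score is straightforward once the per-season scores are collected. A tiny sketch, with placeholder scores chosen only to reproduce the overall ordering described above (note the high season-8 placeholder for Gaussian Naive Bayes, mirroring the fluke):

```python
import numpy as np

# Placeholder per-season rank scores (8 seasons per algorithm),
# not the real values from the runs.
scores = {
    "Support Vector Machine":   [0.95, 0.93, 0.94, 0.92, 0.96, 0.95, 0.93, 0.94],
    "Neural Network":           [0.88, 0.90, 0.85, 0.95, 0.96, 0.94, 0.93, 0.95],
    "Random Forest Regressor":  [0.92, 0.91, 0.93, 0.90, 0.92, 0.91, 0.90, 0.92],
    "Random Forest Classifier": [0.88, 0.87, 0.89, 0.72, 0.90, 0.91, 0.78, 0.92],
    "Gaussian Naive Bayes":     [0.85, 0.84, 0.86, 0.68, 0.87, 0.88, 0.74, 0.93],
}
averages = {name: float(np.mean(s)) for name, s in scores.items()}
ranking = sorted(averages, key=averages.get, reverse=True)
```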
This was a fun exercise for me. I learned more about how each of the machine learning algorithms works, I practiced using Python for data analysis, and I got to use a whole bunch of gifs of drag queens. You can find the code I used for both the weekly predictions and the predictions in this blog post on my GitHub.