The Library is Closed

With the finale aired and a new Superstar crowned, I take some time to see how the algorithms perform overall, comparing predictions across all eight seasons. If you missed my earlier blog posts, see the first to understand what’s going on in the rest of this post.

The finale aired a couple weeks ago and we got our new Drag Race Superstar: Bob the Drag Queen!

The finale brought many great moments, including Carol Channing raving over Bob’s Snatch Game performance, Nancy Grace forcefully claiming Acid Betty was robbed in that same challenge, and Margaret Cho wishing Kim Chi luck. Speaking of Kim Chi, each of the top three performed a choreographed lip-sync to a song written specifically for them by Lucian Piane, and Kim Chi slayed despite not being able to dance.

Also, Violet Chachki reminded us why she deserved to win last season with her high fash-on!

Final Predictions

So how did the algorithms do overall? The table below contains the final predictions. Overall, not too bad. Kim Chi and Bob were both predicted to come out on top, with only one algorithm each predicting second instead of first. The best performing algorithm was the Gaussian Naive Bayes, with a rank score of 0.932!

Name                  Place  Season  GNB    NN     RFC    RFR    SVC
Bob the Drag Queen    1      8       1      1      1      2      1
Kim Chi               2      8       1      1      2      1      1
Naomi Smalls          3      8       1      3      7      6      3
Chi Chi DeVayne       4      8       5      4      3      4      5
Derrick Barry         5      8       4      4      4      3      4
Thorgy Thor           6      8       7      4      6      7      9
Robbie Turner         7      8       5      4      4      5      5
Acid Betty            8      8       8      9      9      8      9
Naysha Lopez          9      8       8      9      8      10     7
Cynthia Lee Fontaine  10     8       8      11     12     11     11
Dax ExclamationPoint  11     8       12     12     10     12     11
Laila McQueen         11     8       11     8      10     9      7
Rank Score                           0.932  0.898  0.868  0.898  0.857

Predicting Other Seasons

Throughout the season I had been training the algorithms on data from seasons 1-6, testing on season 7, then predicting season 8. But I was curious how each algorithm would do at predicting all the other seasons. So I wrote a script that steps through each algorithm and season, trains the algorithm on all the seasons other than the one it is predicting, and then uses the trained model to predict the held-out season. So for season one, all the algorithms were trained on seasons 2-8 and then asked to predict season 1. I generated the following figure to plot the rank scores for each season for each algorithm.

[Figure: rank scores for each season, by algorithm]
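
As a rough sketch (not the actual script), the hold-one-season-out loop described above could look something like this in Python. It assumes each model is an object with fit and predict methods, which is the interface scikit-learn estimators expose. The rank score is defined in an earlier post and isn't reproduced here, so the scorer is passed in as a parameter. The baseline model, the mean-absolute-error scorer, and the toy rows are all made-up stand-ins so that the example runs on its own.

```python
def leave_one_season_out(rows, make_model, score):
    """Score a model on each season after training on all the others.

    rows: list of (season, features, place) tuples, one per queen.
    make_model: callable returning a fresh object with fit(X, y) and predict(X).
    score: callable(actual_places, predicted_places) -> number.
    """
    results = {}
    for held_out in sorted({season for season, _, _ in rows}):
        train = [(x, y) for season, x, y in rows if season != held_out]
        test = [(x, y) for season, x, y in rows if season == held_out]
        model = make_model()  # fresh model so nothing carries over between seasons
        model.fit([x for x, _ in train], [y for _, y in train])
        predicted = model.predict([x for x, _ in test])
        results[held_out] = score([y for _, y in test], predicted)
    return results


class MeanPlaceBaseline:
    """Hypothetical stand-in model: always predicts the mean training place."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def mean_absolute_error(actual, predicted):
    """Stand-in scorer (lower is better); NOT the rank score used in this post."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up toy rows, not real show data: (season, [number of wins], final place).
toy_rows = [
    (1, [3], 1), (1, [0], 2),
    (2, [2], 1), (2, [1], 2),
    (3, [4], 1), (3, [0], 2),
]

scores = leave_one_season_out(toy_rows, MeanPlaceBaseline, mean_absolute_error)
print(scores)
```

Swapping in a real classifier or regressor only means passing a different make_model; the loop itself stays the same, which is what makes it easy to run once per algorithm.
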

There are a few interesting things going on here. Season 4 was especially tough to predict for Gaussian Naive Bayes and for the Random Forest Classifier. Season 7 was also a challenge for these algorithms. Support Vector Machine and Random Forest Regressor are both consistently good at predicting seasons, and Neural Network gets good past season 3. So what did each algorithm predict would happen in each season? Let’s find out.

Season 1

Name                        Place  Season  GNB  NN   RFC  RFR  SVC
BeBe Zahara Benet           1      1       1    1    1    2    1
Nina Flowers                2      1       3    3    4    4    2
Rebecca Glasscock           3      1       4    4    3    3    4
Shannel                     4      1       5    5    7    7    5
Ongina                      5      1       1    1    1    1    2
Jade                        6      1       6    7    6    6    7
Akashia                     7      1       6    5    5    5    5
Tammie Brown                8      1       6    9    8    8    8
Victoria "Porkchop" Parker  9      1       9    8    9    9    9

The algorithms all seemed to agree that BeBe Zahara Benet would come out on top, but they also predicted that Ongina would join her there. Ongina’s elimination in season one was perhaps one of the hardest for RuPaul in all eight seasons: she had to excuse herself from the set before she could decide who to send home.

Season 2

Name                  Place  Season  GNB  NN   RFC  RFR  SVC
Tyra Sanchez          1      2       1    1    2    1    1
Raven                 2      2       3    2    5    2    2
Jujubee               3      2       3    4    7    5    6
Tatianna              4      2       3    6    4    3    4
Pandora Boxx          5      2       3    4    1    8    5
Jessica Wild          6      2       2    7    2    4    2
Sahara Davenport      7      2       3    2    7    7    7
Morgan McMichaels     8      2       3    11   9    6    8
Sonique               9      2       10   8    6    10   8
Mystique              10     2       9    8    11   9    8
Nicole Paige Brooks   11     2       12   11   12   12   12
Shangela-2            12     2       10   10   9    11   11

For season 2, the algorithms were pretty much in agreement about who would win. No other queen comes close to Tyra Sanchez’s average ranking.
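
To put a number on that consensus, one option is to average each queen's predicted placement across the five algorithms. The sketch below does that with the values from the Season 2 table above. The function name is made up for illustration, and a plain mean is just one assumed way to combine the predictions.

```python
# Predicted placements copied from the Season 2 table,
# in column order: GNB, NN, RFC, RFR, SVC.
season_2_predictions = {
    "Tyra Sanchez": [1, 1, 2, 1, 1],
    "Raven": [3, 2, 5, 2, 2],
    "Jujubee": [3, 4, 7, 5, 6],
    "Tatianna": [3, 6, 4, 3, 4],
    "Pandora Boxx": [3, 4, 1, 8, 5],
    "Jessica Wild": [2, 7, 2, 4, 2],
    "Sahara Davenport": [3, 2, 7, 7, 7],
    "Morgan McMichaels": [3, 11, 9, 6, 8],
    "Sonique": [10, 8, 6, 10, 8],
    "Mystique": [9, 8, 11, 9, 8],
    "Nicole Paige Brooks": [12, 11, 12, 12, 12],
    "Shangela-2": [10, 10, 9, 11, 11],
}


def average_predicted_rank(predictions):
    """Return (name, mean predicted placement) pairs, best average first."""
    averages = {name: sum(ranks) / len(ranks) for name, ranks in predictions.items()}
    return sorted(averages.items(), key=lambda item: item[1])


ranking = average_predicted_rank(season_2_predictions)
for name, avg in ranking:
    print(f"{name:20s} {avg:.1f}")
```

Tyra Sanchez averages 1.2 across the five algorithms, and the next closest queen, Raven, averages 2.8.
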

Season 3

Name                  Place  Season  GNB  NN   RFC  RFR  SVC
Raja                  1      3       1    4    1    2    2
Manila Luzon          2      3       1    1    1    1    1
Alexis Mateo          3      3       4    5    3    3    3
Yara Sofia            4      3       4    2    4    5    5
Carmen Carrera        5      3       6    7    6    7    7
Shangela-3            6      3       3    5    5    4    3
Delta Work            7      3       8    9    7    10   8
Stacy Layne Matthews  8      3       8    2    7    6    8
Mariah                9      3       6    8    10   9    6
India Ferrah          10     3       11   10   13   13   10
Mimi Imfurst          11     3       8    10   9    8    11
Phoenix               12     3       11   10   11   12   11
Venus D-Lite          13     3       11   10   11   11   11

What’s really interesting in season 3 is that all five algorithms agreed that Manila Luzon would be taking home the crown, but she placed second after Raja. By the end of the season, Manila’s and Raja’s win/high/low/lipsync profiles were identical, and so the three algorithms not placing Raja first must be dinging her on her age (Raja was 36 during the season, while Manila was 28). I’d also like to point out that a favorite among my friends, Stacy Layne Matthews, was predicted to come in second by the Neural Network.

Season 4

Name                  Place  Season  GNB  NN   RFC  RFR  SVC
Sharon Needles        1      4       1    1    2    1    1
Chad Michaels         2      4       1    2    1    2    3
Phi Phi O'Hara        3      4       1    4    2    3    2
Latrice Royale        4      4       9    4    4    4    3
Kenya Michaels        5      4       4    3    13   9    5
DiDa Ritz             6      4       12   8    8    10   5
Willam                7      4       5    7    7    5    9
Jiggy Caliente        8      4       10   4    5    8    5
Milan                 9      4       5    10   5    7    5
Madame LaQueer        10     4       11   8    8    6    9
The Princess          11     4       5    10   8    11   13
Lashauwn Beyond       12     4       5    12   8    12   9
Alisa Summers         13     4       12   12   8    13   9

The algorithms in season 4 had a consensus that Sharon Needles would be taking home the crown, with Chad Michaels a close second (Chad won the first season of Drag Race All Stars shortly after season 4). Interestingly, even though Willam was eliminated at seventh place for violating the rules (her husband was making unauthorized conjugal visits to the hotel they were sequestered in), and so presumably would have made it further in the season, the algorithms predicted she would place approximately where she actually placed.

Season 5

Name                  Place  Season  GNB  NN   RFC  RFR  SVC
Jinkx Monsoon         1      5       1    1    1    3    1
Alaska                2      5       1    2    2    1    3
Roxxxy Andrews        3      5       1    5    2    2    1
Detox                 4      5       5    3    5    4    6
Coco Montrese         5      5       8    7    8    8    5
Alyssa Edwards        6      5       7    3    7    7    6
Ivy Winters           7      5       1    5    2    5    3
Jade Jolie            8      5       14   9    8    10   8
Lineysha Sparx        9      5       5    8    6    6    10
Honey Mahogany        10     5       10   14   14   13   11
Vivienne Pinay        11     5       10   9    8    9    8
Monica Beverly Hillz  12     5       10   11   12   11   11
Serena ChaCha         13     5       10   13   12   12   11
Penny Tration         14     5       9    11   8    14   11

Season 5 had the algorithms placing the top three in the right order, on average. What seems to have dragged the rank scores down for this season is how far Honey Mahogany made it – she was predicted to go home first by most of the algorithms. Coco Montrese seems to have made it further than predicted, possibly because the drama between her and Alyssa Edwards made such good TV (also, Coco’s best moments were when she lip-synced).

Season 6

Name                  Place  Season  GNB  NN   RFC  RFR  SVC
Bianca Del Rio        1      6       1    1    1    1    1
Adore Delano          2      6       1    1    3    2    3
Courtney Act          3      6       1    3    1    3    1
Darienne Lake         4      6       7    4    4    5    4
BenDeLaCreme          5      6       1    4    5    4    4
Joslyn Fox            6      6       7    4    5    6    7
Trinity K. Bonet      7      6       9    7    5    8    8
Laganja Estranja      8      6       5    9    8    7    9
Milk                  9      6       10   8    10   9    6
Gia Gunn              10     6       12   9    14   13   9
April Carrion         11     6       5    9    9    10   9
Vivacious             12     6       10   12   11   12   12
Magnolia Crawford     13     6       12   13   11   14   12
Kelly Mantle          14     6       14   14   11   11   12

Season 6 was easy to predict. It was pretty obvious from the first time Bianca Del Rio walked into the workroom that she would be taking home the crown. Each episode after merely confirmed the inevitable, and the algorithms agree. All five predicted Bianca to come in first.

Season 7

Name                  Place  Season  GNB  NN   RFC  RFR  SVC
Violet Chachki        1      7       1    1    3    1    1
Ginger Minj           2      7       8    1    1    2    2
Pearl                 3      7       1    3    6    6    6
Kennedy Davenport     4      7       1    4    3    5    2
Katya                 5      7       1    4    3    3    2
Trixie Mattel         6      7       8    7    8    7    6
Miss Fame             7      7       6    8    6    10   5
Jaidynn Diore Fierce  8      7       8    6    8    9    6
Max                   9      7       1    9    1    3    9
Kandy Ho              10     7       6    10   11   8    10
Mrs. Kasha Davis      11     7       13   13   8    13   10
Jasmine Masters       12     7       11   10   13   11   10
Sasha Belle           13     7       11   10   11   12   10
Tempest DuJour        14     7       13   14   13   14   10

The algorithms had a difficult time deciding who would be in the top three in season 7. Max was predicted to be there by three algorithms, despite going home relatively early in the season. Four algorithms predicted Katya making it to the top three, while only two predicted Pearl getting so high. This actually tracks pretty well with what people were expecting early on in the season – Pearl was going to go home early, Max would make it pretty far, and Katya was going to make it to the top.

Overall, Support Vector Machine has the highest average rank score across all eight seasons, followed by Neural Network, Random Forest Regressor, and Random Forest Classifier, with Gaussian Naive Bayes coming in last. Gaussian Naive Bayes’s strong showing in season 8 was largely a fluke compared to her performance in the rest of the seasons.

This was a fun exercise for me. I learned more about how each of the machine learning algorithms works, I practiced using Python for data analysis, and I got to use a whole bunch of gifs of drag queens. You can find the code I used for both the weekly predictions and the predictions in this blog post on my GitHub.