A Cultural Trend I Understood
I am a pop culture expert. And by that, I mean I am completely oblivious to pop culture. I never watch the news, I don’t keep up with the latest trends, and I rarely use Facebook.
I do, however, listen to pop music. I jam out to the latest Taylor Swift when I exercise/read/contemplate the infinitesimally small impact of my life as compared to the whole of the universe.
So when I stumbled across a video about the Millennial Whoop, I was fascinated. This was a culturally relevant topic I could actually relate to. In a nutshell, the Millennial Whoop is a “Wa-oh-wa-oh” sound pattern that is trending in pop songs – from Kings of Leon’s “Use Somebody” to Katy Perry’s “California Gurls”. Watch the MTV Video Music Awards or visit a local middle school dance and you’ll instantly recognize the sound.
The video made me wonder if there were other recognizable patterns in popular music. I stumbled upon the Four Chord theory – a chord progression (I-V-vi-IV) commonly used to produce Billboard hits. The Axis of Awesome’s popular (and awesome) music video demonstrates the chords in action, and an analysis of 1300 pop songs found that these four chords are indeed the most widely used.
In fact, the homogenization of popular music is a widely documented trend. Researchers analyzing a dataset of over 450,000 music recordings determined that popular music has become blander in terms of chords, number of novel harmony transitions, and “timbral palette” (the sounds of the instruments). The study also found that music has become louder as record producers compete to grab your attention.
Aside from patterns in the sound of popular music, I wondered if there were any patterns within song lyrics as well. Using a database of lyrics from pop songs of the last 50 years, I used a clustering algorithm to group songs based on their lyrical content (methodology at the end). Here are the clusters that emerged.
X-RATED SONGS (AKA BANNED MIDDLE SCHOOL SONGS)
Common Words: nigga, bitch, money, cause, right, started
Example Songs: Thong Song, Big Pimpin, In Da Club, Get Low, Moment 4 Life, No Mediocre, Believe Me, Rack City, Truffle Butter, Coco, Back to Back
The cluster with the dirtiest lyrics was also the most consistent. The X-Rated Songs cluster (aka Banned Middle School Songs) showed up every time I ran the analysis.
From “Thong Song” to “Rack City” to “Truffle Butter”, these songs ensured that you were constantly turning to Urban Dictionary to expand your vocabulary.
DANCE ANTHEMS (AKA SONGS THAT GET YOU TO DANCE BY TELLING YOU TO DANCE)
Common Words: dance, shake, floor, cause, girls, thing
Example Songs: Shake Ya Ass, I Hope You Dance, Rock Your Body, Lose Control, Lean Wit It Rock Wit It, Pop Champagne, In The Dark, Shower, Shut Up and Dance, Uma Thurman
The only thing people like more than dancing to music is dancing to music explicitly telling them to dance. There was an entire cluster of songs whose only purpose was to get people to shake their body parts.
Songs ranged from the encouraging (“I Hope You Dance”) to the demanding (“Shut Up and Dance”) to the downright instructional (“Lean Wit It Rock Wit It”).
SONGS ABOUT TONIGHT/THE NIGHT (AKA SONGS THAT ENCOURAGE SMART DECISION MAKING)
Common Words: night, tonight, right, little, cause, alright
Example Songs: Waiting for Tonight, It’s Five O’Clock Somewhere, Overnight Celebrity, Day N Nite, Last Friday Night, Die Young, Tonight Tonight, I Gotta Feeling, I Love College, Hotline Bling, Time of Our Lives, Heartbeat Song
Tonight is important and these songs won't let you forget it. While some songs simply celebrated the night for its possibilities (“Die Young”, “I Gotta Feeling”, “I Love College”), the majority of songs asked you to celebrate the night in a specific way (“Give Me Just One Night”, “Tonight I’m Lovin You”)... with someone else... if you get my drift.
BREAKUP SONGS (AKA “WE NO LONGER HAVE A RELATIONSHIP BUT I STILL HAVE FEELINGS FOR YOU” SONGS)
Common Words: heart, cause, breaking, every, leave, believe
Example Songs: Show Me the Meaning of Being Lonely, Gotta Get Thru This, Miss You, Everytime We Touch, Angel, Teenage Dream, Boulevard of Broken Dreams, From the Bottom of My Broken Heart, Bleeding Love, Brokenhearted, Elastic Heart
These songs are about offering your heart to someone (“Angel”, “Everytime We Touch”, “Teenage Dream”) and watching it be shattered into a million little pieces, ground into fine dust, and scattered across the Eastern Seaboard (“From the Bottom of My Broken Heart”, “Bleeding Love”, “Brokenhearted”).
Hopefully the stacks of money these artists made can help mend their broken hearts.
LOVE BALLADS (AKA SONGS GUYS PLAY TO APPEAR DEEP)
Common Words: thing, everything, world, something, friend, cause
Example Songs: Kryptonite, Chasing Cars, Hey There Delilah, Viva La Vida, Replay, Tattoo, Sweetest Girl, Payphone, Counting Stars, Hey Brother, Runaway Love, Dear Future Husband
Reconnect with your sensitive side through this cluster of songs. There are a lot of rock/alternative hits that every teenage boy knows how to play on guitar: “Chasing Cars”, “Hey There Delilah”, and “Viva La Vida”.
CRAZY SONGS (AKA “IS THIS PROMOTING A HEALTHY RELATIONSHIP” SONGS)
Common Words: think, crazy, cause, dreamed, better, maybe
Example Songs: A Thousand Miles, Just Like A Pill, Can’t Get You Out of my Head, Toxic, Lovestoned, Your Love is my Drug, Irreplaceable, Before He Cheats, Womanizer, Cooler Than Me, Bad Blood
There’s a thin, light grey, dotted line between being crazy about someone and just plain crazy. This grouping of songs straddles that line. Some songs use slightly disturbing analogies to describe the artist's love life (“Toxic”, “Lovestoned”, “Your Love is My Drug”).
Other songs talk about the consequences of ending a relationship - from the relatively tame “Irreplaceable” (“Everything you own in a box to the left”) to the much more aggressive “Before He Cheats” (“I dug my key into the side / Of his pretty little souped-up four-wheel drive”).
BONUS CLUSTER: SONGS ABOUT “LIGHTS” (AKA “SERIOUSLY, JUST SONGS ABOUT LIGHTS”)
Common Words: lights, shine, cause, night, dreamed, inside
Example Songs: Gimme the Light, Green Light, Firework, All of the Lights, Diamonds, Get Your Shine On, Dynamite, Flashing Lights, Shower
This cluster only showed up once when I ran my clustering program, but it was such a unique cluster that I thought I would include it.
All the songs talked about lighting in some way – yes, lighting (“Gimme the Light”, “Flashing Lights”, “All of the Lights”, “Diamonds”).
There are definitely opportunities to dig deeper into the actual meanings of the lyrics rather than just the words. I also think it would be interesting to see how the lyrics of pop songs have changed over time.
Nevertheless, I was pleasantly surprised by how well the clusters turned out. Although there were songs that didn’t perfectly fit into any of the clusters, the clusters that did emerge had clear patterns and were fairly distinct.
My methodology was inspired by a tutorial on clustering movies based on their synopses. I applied the same idea to song lyrics.
Someone had already created a dataset of pop song lyrics. In fact, Kaylin did much of the preliminary analysis I was planning on doing and seems way better at data analysis than I am. That's cool too I guess.
For each song in the dataset, I:
- Removed “stopwords”.
Words like “the”, “a”, “if”, etc.
- Stemmed the lyrics
Words like “loves” and “loved” were reduced to “love”
- Removed words with fewer than 5 letters
While not perfect, this was a quick way to filter out any remaining words that don’t really have any meaning (“don’t”, “want”, etc).
- Created a tf-idf matrix for the “corpus” of lyrics.
This is a fancy way of saying I determined the similarity of two songs by looking at words that were common between the lyrics of the two songs but not common across every song.
- Clustered the songs
I grouped similar songs together and found the words that the groupings were based on.
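To make the steps above a bit more concrete, here is a minimal Python sketch of the preprocessing and tf-idf stages using only the standard library. It is not the code behind this post: the tiny stopword list, the suffix-stripping `crude_stem` helper, the function names, and the three toy lyric snippets are all made up for illustration. A real pipeline would more likely use a proper stemmer and a library vectorizer.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are far longer).
STOPWORDS = {"the", "a", "an", "if", "and", "to", "of", "in", "on", "i", "me", "my", "you"}


def crude_stem(word):
    """Strip one common suffix; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(lyrics, min_letters=5):
    """Lowercase, drop stopwords, stem, then drop words under min_letters."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    words = [w.replace("'", "") for w in words]  # count letters only
    words = [w for w in words if w and w not in STOPWORDS]
    stems = [crude_stem(w) for w in words]
    return [s for s in stems if len(s) >= min_letters]


def tfidf_vectors(docs):
    """Map each song name to a {term: tf-idf weight} dictionary."""
    n_docs = len(docs)
    # Document frequency: in how many songs does each term appear?
    doc_freq = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        vectors[name] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical one-line "lyrics", invented for this example.
lyrics = {
    "song_a": "You keep breaking my heart every night",
    "song_b": "Broken hearts keep breaking tonight",
    "song_c": "Shine the lights, flashing lights tonight",
}
docs = {name: preprocess(text) for name, text in lyrics.items()}
vectors = tfidf_vectors(docs)

print(docs["song_a"])
print(cosine_similarity(vectors["song_a"], vectors["song_b"]))
print(cosine_similarity(vectors["song_a"], vectors["song_c"]))
```

In this toy example, songs A and B share stems such as “break” and “heart”, so their similarity is above zero, while A and C share nothing once the short words and stopwords are gone. Note that a word appearing in every song gets a weight of zero here, which is the “not common across every song” part of tf-idf.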
After testing different numbers of clusters, I found that 7 clusters worked reasonably well in providing meaningful clusters.
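For anyone curious what “testing different numbers of clusters” can look like in code, here is a rough sketch. Two loud assumptions: the post doesn't name the clustering algorithm, so this uses plain k-means, and it scores each k by inertia (total squared distance from each point to its cluster center), whereas I mostly judged the clusters by whether they made sense. The 2-D toy points stand in for tf-idf vectors, and `kmeans` and `best_inertia` are illustrative names.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, iterations=50):
    """Plain Lloyd's k-means with random starting centroids; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    inertia = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return labels, inertia


def best_inertia(points, k, restarts=5):
    """Lowest inertia over a few differently seeded runs."""
    return min(kmeans(points, k, seed=s)[1] for s in range(restarts))


# Toy data: three obvious blobs (made up for illustration).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]

inertia_by_k = {k: best_inertia(points, k) for k in range(1, 5)}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 2))
```

Inertia generally shrinks as k grows, so the number is only a hint. The usual approach is to look for the k where adding more clusters stops helping much, and then check whether the resulting groups are actually meaningful.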
Because clusters aren't completely stable, I ran the clustering program 5 times and compared the resulting clusters from each trial against each other.
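One wrinkle when comparing trials is that cluster numbers are arbitrary: cluster 0 in one run might be cluster 3 in the next. A simple way around that, sketched below, is to match clusters by how many songs they share (Jaccard overlap). This is one possible approach rather than a record of what I ran. The song titles come from this post, but the memberships in `run_1` and `run_2` are made up for illustration, as are the function names.

```python
def jaccard(a, b):
    """Overlap between two groups of song titles (1.0 means identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_matches(run_a, run_b):
    """For each cluster in run_a, find the most similar cluster in run_b."""
    matches = {}
    for name_a, songs_a in run_a.items():
        name_b = max(run_b, key=lambda name: jaccard(songs_a, run_b[name]))
        matches[name_a] = (name_b, jaccard(songs_a, run_b[name_b]))
    return matches


# Hypothetical output of two trials: cluster id -> song titles.
run_1 = {
    0: ["Toxic", "Lovestoned", "Womanizer"],
    1: ["Rock Your Body", "Lose Control", "Shut Up and Dance"],
}
run_2 = {
    0: ["Rock Your Body", "Lose Control", "Shut Up and Dance", "Womanizer"],
    1: ["Toxic", "Lovestoned"],
}

for cluster, (match, score) in best_matches(run_1, run_2).items():
    print(f"run 1 cluster {cluster} ~ run 2 cluster {match} (overlap {score:.2f})")
```

A cluster that keeps finding a high-overlap partner in every trial is a stable one. A cluster with no good match in the other trials showed up only in that run.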