The Lyrics Repetition Analyzer
I have just returned from a two-week vacation, which included a sizeable amount of road tripping. And with every great road trip come great tunes!
One of the albums I was excited to listen to was The Arcade Fire's newest release, Everything Now. It has received some mixed reviews, but I try to keep an open mind.
On first listen I actually really liked the album (and will listen many more times), but something stuck out to me. It seemed like several of the songs were just the same couple of words repeated for the whole track. The instrumentals were great, but the lyrics seemed like an afterthought in a few songs.
Now "same couple of words repeated" is actually a pretty decent description of song lyrics in general. So I was curious if this was an accurate observation, or just my perception being critically biased in light of reviews.
"I should pull the lyrics of the album, do some word counting, and compare to past albums" I proclaimed. My passenger reacted with mild to light enthusiasm.
As the idea matured, I decided word counting is not the best metric. Repeating the same few words throughout an album is not a problem if their order varies. So instead I looked at groupings of consecutive words, counting duplicate pairs, triplets, and quadruplets.
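To make the counting concrete, here is a minimal sketch of the idea in plain Python. It is not the script's actual code (which relies on NLTK); the function names, the simple regex tokenizer, and the exact definition of a "duplicate" (any occurrence of a word group after its first appearance) are assumptions made for illustration.

```python
from collections import Counter
import re


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def duplicate_rate(lyrics, n):
    """Repeated n-word groups as a percentage of all words in the lyrics.

    Assumption: an occurrence counts as a duplicate when the same group
    of n words has already appeared earlier in the text.
    """
    # Lowercase and keep only letters and apostrophes, so that
    # "Now," and "now" are treated as the same word.
    words = re.findall(r"[a-z']+", lyrics.lower())
    if not words:
        return 0.0
    counts = Counter(ngrams(words, n))
    # Every occurrence beyond the first one is a repeat.
    duplicates = sum(count - 1 for count in counts.values())
    return 100.0 * duplicates / len(words)


# Made-up lyrics, just to exercise the function.
sample = "la la la la hey la la"
for n in (2, 3, 4):
    print(n, round(duplicate_rate(sample, n), 1))
```

Dividing by the total word count (rather than by the number of word groups) matches the "percentage of all words" framing used for the graphs, but other normalizations would be just as defensible.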
Here is what I got:
In the figure above, you can see the duplicate rate of pairs, triplets, and quadruplets as a percentage of all words. As expected, Everything Now had a higher rate in all cases. Furthermore, it also had a higher relative rate of triplets and quadruplets than the other albums (the green and blue bars were closer to the height of the red bar).
So this gives me peace of mind, knowing that they were in fact repeating themselves more and I wasn't just influenced by the reviews.
Beyond The Arcade Fire
Throwing this graph together would have been a pretty easy task if I had assembled all the data by hand, generated some numbers, and used a tool like Google Docs or Excel to plot the graph. That would require a lot of work for each new artist though, and my burgeoning career as a music critic requires efficiency.
So I decided to automate the whole process by artist. This was done using a Python script that takes a musician's name, pulls all their available lyrics using the Genius API / web scraping (based on this Bigish Data post), gathers statistics for each album using the Natural Language Toolkit (NLTK), and plots a bar graph using matplotlib.
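The plotting step can be sketched roughly as follows. This is a hedged illustration, not the contents of plot_albums.py: the function names, the bar width, and the axis label are assumptions, and the colors are left to matplotlib's defaults.

```python
def bar_positions(num_albums, num_series, width=0.25):
    """x positions for grouped bars: one list of positions per series."""
    return [[album + series * width for album in range(num_albums)]
            for series in range(num_series)]


def plot_duplicate_rates(albums, rates, out_path):
    """Save a grouped bar chart.

    albums: list of album names.
    rates:  dict mapping a label (e.g. "pairs") to one value per album.
    """
    # Imported here so the layout helper above has no third-party dependency.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    width = 0.25
    positions = bar_positions(len(albums), len(rates), width)
    fig, ax = plt.subplots()
    for xs, (label, values) in zip(positions, rates.items()):
        ax.bar(xs, values, width=width, label=label)
    # Centre each album name under its group of bars.
    centre = width * (len(rates) - 1) / 2
    ax.set_xticks([i + centre for i in range(len(albums))])
    ax.set_xticklabels(albums, rotation=45, ha="right")
    ax.set_ylabel("Duplicates (% of all words)")
    ax.legend()
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

Calling plot_duplicate_rates with a list of album names and a dict holding one list of rates per group size would write a single PNG with one cluster of bars per album.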
This allowed me to look into several more bands, trying to spot trends and worst offenders. Here are a few more.
From these graphs we can see that Daft Punk repeat themselves a lot (as expected), that T Swift's most repetitious album by a long shot was 1989, and that The Beatles vary a lot from album to album.
Try it yourself
If you'd like to run this script yourself to track down albums, or modify it to count something else, the first step is to register with Genius to get an access token by following this link.
Click the "Generate Access Token" button. Copy the client id, client secret, and access token to a text file for future use.
Now you need to make sure you have the following Python modules installed:
nltk, requests, matplotlib, BeautifulSoup (bs4), and counter.
These can be installed by running the following commands with Python version 3.x:
$> python -m pip install requests
$> python -m pip install bs4
$> python -m pip install matplotlib
$> python -m pip install counter
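nltk (extra step; the corpora download below is run through the textblob package, so that needs to be installed too):
$> python -m pip install -U nltk
$> python -m pip install -U textblob
$> python -m textblob.download_corpora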
Copy the Scripts
Once the dependencies on your system are in place, you need to grab two Python files and put them in the same directory.
File 1: album_counter.py
File 2: plot_albums.py
In album_counter.py, edit the top 3 lines to contain your client id, client secret, and access token for the Genius API.
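For orientation, the credentials and the way the access token is typically used might look something like the sketch below. The variable names and the helper function are hypothetical (check album_counter.py for the real ones), and the request is built with the standard library rather than the requests module so that nothing is actually sent. The Genius API expects the token in an "Authorization: Bearer" header.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical names; use whatever album_counter.py actually defines.
CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"
ACCESS_TOKEN = "your-access-token"

GENIUS_API = "https://api.genius.com"


def build_search_request(artist, token=ACCESS_TOKEN):
    """Build (but do not send) a Genius search request for an artist."""
    url = f"{GENIUS_API}/search?{urlencode({'q': artist})}"
    # The access token authorizes the call as a bearer token.
    return Request(url, headers={"Authorization": f"Bearer {token}"})


request = build_search_request("Arcade Fire")
print(request.full_url)
```

If a request like this comes back with a 401 error, the access token is the first thing to double check.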
Run the Script
With these changes made, you should be properly authorized to run the script and generate artist info. This is done as follows (from the directory containing the Python files):
$> python album_counter.py "Arcade Fire"
This will run for a minute or two, printing out each album as it gets scraped. When the script completes, there will be two files in your working directory: "Arcade Fire.png" and "Arcade Fire.csv". These are the graph output and a CSV file, in case you would like to modify the visualization process yourself.
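If you do want to work with the CSV directly, a starting point could look like the sketch below. The column layout (an "album" column followed by numeric columns named "pairs", "triplets", and "quadruplets") is an assumption, since the real header may differ, and the numbers in the example are placeholders rather than real results.

```python
import csv
import io


def load_album_stats(csv_file):
    """Read rows into a dict keyed by album name.

    Assumes a header row with an 'album' column followed by numeric
    columns; the real file's layout may differ.
    """
    stats = {}
    for row in csv.DictReader(csv_file):
        album = row.pop("album")
        stats[album] = {key: float(value) for key, value in row.items()}
    return stats


def most_repetitive(stats, column):
    """Return the album with the highest value in the given column."""
    return max(stats, key=lambda album: stats[album][column])


# Placeholder numbers for illustration only, not real measurements.
example = io.StringIO(
    "album,pairs,triplets,quadruplets\n"
    "Album A,10.0,5.0,2.5\n"
    "Album B,20.0,15.0,12.5\n"
)
stats = load_album_stats(example)
print(most_repetitive(stats, "pairs"))
```

To use it on real output, open the generated file with open("Arcade Fire.csv", newline="") and pass the file object to load_album_stats, adjusting the column names to match the actual header.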