For "Rest of You," we were assigned to do text analysis on some part of our digital footprint. I thought that the music I listen to could provide some very revealing data. The music we listen to can become very personal in its meaning, and music is a very emotionally powerful medium. Song lyrics are an unusual genre of text, but I suspect they can reveal a lot.

I wrote a Python script to collect the lyrics that I most often hear. First, the script uses Spotify's API to fetch a list of my top 50 most-listened-to songs over the last six months (50 is a limit set by the API, so unfortunately the corpus of text I'm working with is a little small). Then, the script scrapes Genius for the song lyrics. This is actually a two-step process: Genius provides an API, but for copyright reasons it won't serve lyrics through it, so the script calls the API to find each song and then scrapes the lyrics from that song's webpage. Finally, it does a little cleanup of the text (for instance, removing section labels like "[Chorus]") and combines everything into one document.
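Here is a minimal sketch of that pipeline in Python. It is not the actual script (that is linked at the end of the post). The two endpoint URLs are Spotify's "top tracks" endpoint and Genius's search endpoint; the function names, the "first search hit is the right song" heuristic, and the section-label regex are assumptions for illustration. The HTML scraping step is deliberately left out, since it depends on Genius's page markup, which changes over time.

```python
import json
import re
import urllib.parse
import urllib.request

SPOTIFY_TOP_TRACKS = "https://api.spotify.com/v1/me/top/tracks"
GENIUS_SEARCH = "https://api.genius.com/search"

# A line consisting only of a bracketed label, e.g. "[Chorus]" or "[Verse 1: Dessa]".
SECTION_LABEL = re.compile(r"^[ \t]*\[[^\]\n]*\][ \t]*$", re.MULTILINE)


def top_tracks_url(limit=50, time_range="medium_term"):
    """Spotify 'top tracks' URL; medium_term covers roughly the last six months."""
    if not 1 <= limit <= 50:
        raise ValueError("Spotify returns at most 50 items per request")
    query = urllib.parse.urlencode({"time_range": time_range, "limit": limit})
    return f"{SPOTIFY_TOP_TRACKS}?{query}"


def genius_search_url(query):
    """Genius search URL for a 'title artist' string."""
    return f"{GENIUS_SEARCH}?{urllib.parse.urlencode({'q': query})}"


def get_json(url, token):
    """GET a JSON endpoint with an OAuth bearer token (needs network access and a real token)."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def track_queries(top_tracks_json):
    """Turn Spotify's top-tracks response into 'title artist' search strings for Genius."""
    return [f"{t['name']} {t['artists'][0]['name']}" for t in top_tracks_json["items"]]


def first_song_url(search_json):
    """URL of the first Genius search hit (the page to scrape), or None if there are no hits."""
    hits = search_json["response"]["hits"]
    return hits[0]["result"]["url"] if hits else None


def clean_lyrics(raw):
    """Drop section-label lines and squeeze the extra blank lines they leave behind."""
    text = SECTION_LABEL.sub("", raw)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def combine(all_lyrics):
    """Join every song's cleaned lyrics into one document, skipping songs with no lyrics."""
    return "\n\n".join(clean_lyrics(lyrics) for lyrics in all_lyrics if lyrics)
```

Chained together, the flow is: `get_json(top_tracks_url(), token)`, then `track_queries` to build search strings, `get_json(genius_search_url(q), token)` and `first_song_url` for each one, a scraping step (not shown) to pull the lyrics text from each page, and finally `combine` on the list of scraped lyrics.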

I was interested in discovering insights into my psyche, not just seeing which words I hear the most, so I decided to use the LIWC (Linguistic Inquiry and Word Count) text analysis tool. Here are the results I found most interesting, chosen from the dozens of measures the tool produces.

| Measure | Value |
| --- | --- |
| Word Count | 10224 |
| Analytical Thinking (0-100) | 46.73 |
| Clout (0-100) | 70.92 |
| Authenticity (0-100) | 65.23 |
| Emotional Tone (0 negative, 100 positive) | 33.53 |
| I | 6.13 |
| We | 2.07 |
| You | 3.52 |
| She/He | 1.31 |
| They | 0.54 |
| Anxiety | 0.36 |
| Anger | 0.78 |
| Sadness | 0.77 |
| Social Words | 11.5 |
| Female Referents | 0.94 |
| Male Referents | 1.17 |
| Cognitive Processes | 9.44 |
| Perceptual Processes | 3.6 |
| Body | 1.76 |
| Health/Illness | 0.84 |
| Sexuality | 0.12 |
| Drive/Need: Affiliation | 2.95 |
| Drive/Need: Achievement | 0.93 |
| Drive/Need: Power | 2.8 |
| Reward Focus | 1.76 |
| Risk Prevention Focus | 0.58 |
| Past Focus | 2.63 |
| Present Focus | 14.2 |
| Future Focus | 1.84 |
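LIWC itself is commercial software that ships with its own dictionaries, so it isn't reproduced here. As a rough illustration of what the non-summary rows mean, the hypothetical sketch below does the same kind of dictionary counting: each category's score is the percentage of all words that match one of that category's entries, where an entry ending in "*" matches any word starting with that stem. The three-category dictionary is made up and far smaller than LIWC's, and the four summary values (analytical thinking, clout, authenticity, tone) are derived differently and aren't covered by this sketch.

```python
import re

# Hypothetical toy dictionary (NOT LIWC's): category -> word entries.
# An entry ending in "*" matches any word that starts with the stem before the "*".
TOY_DICTIONARY = {
    "i": ["i", "me", "my", "mine", "myself"],
    "we": ["we", "us", "our", "ours", "ourselves"],
    "anger": ["hate", "mad", "angr*"],
}


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def entry_matches(word, entry):
    """Exact match, or prefix match for wildcard entries like "angr*"."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, dictionary):
    """Word count plus, for each category, the percentage of words that match it."""
    words = tokenize(text)
    scores = {"word_count": len(words)}
    for category, entries in dictionary.items():
        hits = sum(1 for w in words if any(entry_matches(w, e) for e in entries))
        scores[category] = round(100 * hits / len(words), 2) if words else 0.0
    return scores
```

Read this way, the "I" value of 6.13 means that roughly 627 of the 10,224 words (about 6 in every 100) are first-person singular pronouns.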

Apart from the four "summary values" (analytical thinking, clout, authenticity, and emotional tone), which LIWC reports on a 0-100 scale, the measures are percentages of the total word count, and it's hard to make much sense of them without a baseline to compare to. So I went to Spotify Charts and downloaded the table of the Top 200 songs in the U.S. (as of Feb 9, 2019). I ran these songs through the same Python scripts to get their lyrics from Genius, and analyzed those lyrics with LIWC as well. If the song lyrics I listen to most are a proxy for my mind, we'll treat the lyrics from the Top 200 chart as a proxy for the average American mind.
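The baseline run only needs a different source of song titles. A hypothetical sketch, assuming the downloaded chart file is a CSV with "Track Name" and "Artist" columns and possibly a note line above the header (these column names are an assumption; check the actual file), that turns it into the same kind of "title artist" search strings used for the Genius lookup:

```python
import csv


def chart_queries(csv_text, title_col="Track Name", artist_col="Artist"):
    """'title artist' search strings from a chart CSV; the column names are assumptions."""
    lines = csv_text.splitlines()
    # Skip any preamble (e.g. a note row) until the header row that names the title column.
    header_index = next((i for i, line in enumerate(lines) if title_col in line), None)
    if header_index is None:
        raise ValueError(f"no header row containing {title_col!r}")
    rows = csv.DictReader(lines[header_index:])
    return [f"{row[title_col]} {row[artist_col]}" for row in rows]
```

Each resulting string can then go through the same Genius lookup, scraping, and cleanup steps as my own top 50. Using `csv.DictReader` rather than splitting on commas keeps quoted titles that contain commas intact.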

| Measure | My Songs | Top Charting Songs |
| --- | --- | --- |
| Word Count | 10224 | 85429 |
| Analytical Thinking (0-100) | 46.73 | 41.63 |
| Clout (0-100) | 70.92 | 47.35 |
| Authenticity (0-100) | 65.23 | 64.53 |
| Emotional Tone (0 negative, 100 positive) | 33.53 | 14.63 |
| I | 6.13 | 8.95 |
| We | 2.07 | 0.84 |
| You | 3.52 | 3.92 |
| She/He | 1.31 | 1.22 |
| They | 0.54 | 0.47 |
| Anxiety | 0.36 | 0.18 |
| Anger | 0.78 | 1.81 |
| Sadness | 0.77 | 0.52 |
| Social Words | 11.5 | 11.06 |
| Female Referents | 0.94 | 1.49 |
| Male Referents | 1.17 | 0.72 |
| Cognitive Processes | 9.44 | 8.58 |
| Perceptual Processes | 3.6 | 3.31 |
| Body | 1.76 | 1.71 |
| Health/Illness | 0.84 | 0.63 |
| Sexuality | 0.12 | 0.66 |
| Drive/Need: Affiliation | 2.95 | 2 |
| Drive/Need: Achievement | 0.93 | 0.73 |
| Drive/Need: Power | 2.8 | 2.54 |
| Reward Focus | 1.76 | 1.75 |
| Risk Prevention Focus | 0.58 | 0.47 |
| Past Focus | 2.63 | 3.11 |
| Present Focus | 14.2 | 12.51 |
| Future Focus | 1.84 | 1.68 |
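To compare the two columns, a ratio is more telling than a raw difference, since the categories live on very different scales. A small sketch using a handful of rows copied from the table above; the choice of rows and the log-scale ranking are illustrative choices, not part of LIWC. Note that Clout and Emotional Tone are 0-100 scores rather than word percentages, so their ratios are only a loose guide.

```python
import math

# A few (my songs, top chart) rows copied from the comparison table above.
COMPARISON = {
    "Clout": (70.92, 47.35),
    "Emotional Tone": (33.53, 14.63),
    "I": (6.13, 8.95),
    "We": (2.07, 0.84),
    "Anxiety": (0.36, 0.18),
    "Anger": (0.78, 1.81),
    "Sexuality": (0.12, 0.66),
    "Drive/Need: Affiliation": (2.95, 2.0),
}


def ratios(table):
    """My value divided by the chart baseline, rounded to two decimals."""
    return {name: round(mine / chart, 2) for name, (mine, chart) in table.items()}


def most_different(table, n=3):
    """Names of the n measures whose ratio is furthest from 1 on a log scale."""
    def log_distance(name):
        mine, chart = table[name]
        return abs(math.log(mine / chart))
    return sorted(table, key=log_distance, reverse=True)[:n]
```

With these rows, the three largest departures from the chart baseline are sexuality (about 0.18x), "we" (about 2.46x), and anger (about 0.43x). Ranking by the absolute log of the ratio treats "twice as much" and "half as much" as equally large differences.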

I was surprised to find that the emotional tone of my music is over twice as positive as the theoretical average American's (or rather, less negative: 34 vs. 15, with both well below the neutral midpoint of 50). I was also surprised that "analytical thinking" scored about the same (47 vs. 42), since I'm a bit of a snob and tend to look down on popular "top 40" music as lacking lyrical depth. Maybe I'm wrong about that. It's not surprising to find similar "authenticity" scores (65 for both): so much of music is confessional and emotionally vulnerable that I would expect a high score there in any case.

The only dramatic difference between me and the average on the summary measures is "clout," and wow, what a difference: 71 for me, 47 on average. Either I think very highly of myself and listen to music that reflects that, or I seek out powerful-sounding music to compensate for a feeling of not having status or confidence. Speaking from my own self-knowledge, it's probably both. I'm very confident in certain areas of my life, but feel very low-status in others. I'm ambitious and view myself as having extraordinary potential, but I also set overly high expectations for myself and face constant self-doubt, and I often use music to renew my confidence. Looking through my list of top songs, I see a few from my "Pump Up" playlist that just ooze confidence, even arrogance: Dessa's "5 out of 6" and "Fighting Fish," and The Cinematic Orchestra's "All Things to All Men" (where the lyrics are really Roots Manuva's).

This page at the LIWC website describes the meaning of the four summary measures.

What else stands out? I use fewer "I" phrases than average but more "we" phrases, which Pennebaker would call a high-clout trait as well. Unsurprisingly, I'm twice as high as average in anxiety and noticeably higher in sadness. I'm a little surprised that my anger score is less than half the average American's, but I've been realizing in my own therapeutic process recently that I'm very uncomfortable with my own anger and work very hard to keep it hidden, so I don't really listen to angry music. Apparently I like music that talks more about other men and less about women than average does. Honestly, I'm not sure what to make of that. As expected, I'm more concerned with health and illness than average. I'm not surprised that my music scores low on sexuality (more repression there, if I had to summarize), but I am surprised by how extremely low it is by comparison: 0.12 versus 0.66. Then again, maybe that says more about popular music. I'm not surprised to see myself slightly high on all of the "drives/needs," and especially high on "need for affiliation." I'm surprised that I don't skew more towards "risk prevention" than I do, and I'm also surprised that my "past focus" isn't even lower. I'd describe my relationship to past, present, and future this way: I'm generally unhappy and stressed out in the present, so I'm constantly trying to put the past behind me and transform myself into a better future, which leaves me uncomfortable with nostalgia.

I'm not sure I learned anything about myself with this exercise that I didn't already know, but certain things were very striking.

You can get my Python scripts here. I followed this example for key parts of it.