Welcome to my website :)

🎵 About the project

While going through a break-up, I tried to distract myself with music, just like most of us do. However, I quickly realised that songs weren't much help: almost all of them were celebrating love and romantic feelings.

I started wondering:

Is love indeed the most common topic in all countries and languages?

Could it be that different countries prefer different topics? One could naively assume that money is a popular topic in the US and politics, for example, in Russia.

Which song languages are popular in different countries?

And since I had to choose a project for my Computational Linguistics class, I decided to explore this idea.

💬 Results

After running topic modeling on every country, I identified the following most common themes:

Topic 1

Love-related emotions/Relationships (who could have guessed?)

Topic 2

Urban life / Hip-hop culture / Gangsta life

Topic 3

Self-reflection, abstract topics

Topic 4

"Find", "better", "cry", "hurt", "stay"... Looks like a heartbreak to me

Topic 5

"Night", "crazy", "kiss", "tequila", "dancing"... Sounds like a wild party!

Topic 6

This topic appeared in Egypt only and deviates notably from other common topics.

Topic 7

This topic appeared in Greece only and deviates notably from other common topics.

Topic 8

This is a data artifact containing song and artist names.

Topic 9

This is a data artifact containing poorly translated words from low-resource languages.

As you can see, love in its different forms (and money) indeed rules the charts in practically all the countries!

To my disappointment, I didn't find very specific topics, like "food", "family", or something even more unexpected. Well, maybe the topics weren't that diverse to begin with? Maybe the most popular songs usually don't mention anything too eccentric?

However, there were isolated cases that diverged from the common topics. I would name them:

Politics and religion - a topic observed in Egypt

Violence and conflict - a topic observed in Greece

Sometimes clusters were also formed from noise and artifacts:

Song and artist names

Untranslated or poorly translated words (for low-resource languages)

I took the top 200 songs in each of the 38 countries. To select the songs, I used Spotify's weekly total charts and picked the most popular songs of all time* for each country.

Then I scraped the lyrics, translated them into English and preprocessed them using basic techniques (tokenization, lemmatization, lowercasing, removing punctuation and stop-words**). Finally, I applied topic modeling using LDA (Latent Dirichlet Allocation) to get the topics for each country.
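To make the preprocessing steps more concrete, here is a minimal sketch of lowercasing, tokenization, punctuation removal and stop-word filtering using only the Python standard library. The `preprocess` function and the tiny `STOP_WORDS` set are illustrative assumptions, not the exact code or word list used in the project, and lemmatization is left out because it needs an external library.

```python
import re

# Tiny illustrative stop-word list (an assumption; the real list is much longer).
STOP_WORDS = {"a", "an", "the", "and", "i", "you", "me", "my", "to", "of", "in", "is", "it"}


def preprocess(lyrics):
    """Lowercase the text, split it into word tokens and drop stop-words."""
    lowered = lyrics.lower()
    # Keep runs of letters (optionally with one inner apostrophe, e.g. "can't");
    # punctuation, digits and whitespace are discarded by this pattern.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", lowered)
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("Dancing in the night, I can't stop!"))
```

Lemmatization (mapping "dancing" to "dance", for example) would be one extra step applied to each token before the stop-word filter.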

Tools utilized:

Genius API - to scrape the lyrics of the songs;
Google Translate API - to translate the lyrics to English;
Gensim library - for topic modeling;
pyLDAvis library - to visualize the results;
Kworb.net - for the names of the songs

The very first results were pretty meaningless (image below), so I had to go through many iterations of preprocessing to start seeing some coherence in the topics.

As with many other unsupervised methods, LDA doesn't give us the name or the theme of a cluster directly: instead, we have to label the clusters ourselves. Naming the topics was one of the hardest parts! Click on the results to see what I found out.

*from the moment Spotify entered the country till March 2023

**I also used the filter_extremes function from the gensim module to eliminate the most common words (those that occur in more than 50% of the documents) and the least common, rather eccentric or overly specific words (those that occur in fewer than 3 documents), for example, names of entities and rare words (like “pathointelligence” or “bermuda triangle”). The function also keeps only the top N most frequent words from the remaining vocabulary (in my case, N was set to 15,000).
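The filtering described in this footnote can be sketched in plain Python as follows. This is a simplified re-implementation of the idea for illustration, not Gensim's own code: the function name `filter_vocabulary` is hypothetical, and its parameters are named after the arguments of Gensim's `Dictionary.filter_extremes(no_below=3, no_above=0.5, keep_n=15000)`, which is the call that corresponds to the settings above.

```python
from collections import Counter


def filter_vocabulary(docs, no_below=3, no_above=0.5, keep_n=15000):
    """Return the set of tokens that survive frequency-based filtering."""
    # Document frequency: the number of documents each token appears in.
    doc_freq = Counter(token for doc in docs for token in set(doc))
    max_docs = no_above * len(docs)
    # Drop tokens that are too rare (< no_below docs) or too common (> no_above share).
    kept = [tok for tok, df in doc_freq.items() if no_below <= df <= max_docs]
    # Keep only the keep_n most frequent survivors (ties broken alphabetically).
    kept.sort(key=lambda tok: (-doc_freq[tok], tok))
    return set(kept[:keep_n])


# Toy corpus: "love" is in every document, "bermuda" in only one.
docs = [
    ["love", "night"],
    ["love", "night"],
    ["love", "night", "bermuda"],
    ["love"],
    ["love"],
    ["love"],
]

vocabulary = filter_vocabulary(docs)
filtered_docs = [[tok for tok in doc if tok in vocabulary] for doc in docs]
print(vocabulary)
```

In this toy corpus, "love" is dropped as too common, "bermuda" as too rare, and only "night" (present in exactly half of the documents) remains.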

🎶 Most popular song topics around the world 🌍

Discover most popular song topics

Compare most popular song languages

Explore in detail the topics of each country