Spotify listen without affecting recommendations

Uncovering How the Spotify Algorithm Works

What's behind the Spotify algorithm? From NLP to content-based filtering to collaborative filtering.

Hucker Marius

Nov 23·12 min read

Source: Photo by Alexander Shatov on Unsplash

In 2008, Spotify started changing the music world by introducing its streaming service. Since then, music on CDs and DVDs has largely disappeared from our lives, and the music industry has changed a great deal.

Nowadays, Spotify is the biggest player in the music streaming market (with 365 million users and 165 million subscribers [1]), but it has to defend its position against American giants like Apple (Apple Music), Amazon (Amazon Music), and Google (YouTube Music). To do so, Spotify has to focus on two groups: artists and users. To deliver the best service to both, one thing sits at the heart of Spotify: algorithms and machine learning. The better Spotify understands its users and the better the customer experience is, the more users can be won over, converted to paying customers, and retained. In other words, data and algorithms are Spotify's opportunity to avoid being crushed between Apple, Amazon, and Google, and so far they do a pretty good job. So, let's take a closer look at how the Spotify algorithm works, but first the basics:

Spotify Home Screen

At the center of the Spotify recommendation system is the home screen, which is peppered with customized playlists and recommendations. The recommended playlists include Discover Weekly, B Side, Release Radar, your mixtapes, and many more. Other sections of the cleverly arranged home screen are Jump Back In, Recently Played, and Recommended for Today. The home screen is created and curated by an AI called BaRT (Bandits for Recommendations as Treatments), which is explained in detail further below. Sections like Jump Back In are also called shelves.

Where the recommendation system appears. Source: Own image.

In addition to these recommended playlists, there are the sections Recommended Songs (automatic playlist continuation) and You Might Also Like below each playlist and each album. So, you can find a customized section on almost every Spotify screen.

To recommend music to customers and to predict fitting songs on and off the home screen, Spotify has to rely on data. So let's take a look at the available data sources.

Data

The data gathered and used by Spotify is quite extensive. At Spotify, almost everything is tracked. You may have encountered a gentle form of this: Spotify's year in review. In this section you can see the minutes you listened over the year, all genres, preferred artists, newcomers, top playlists, and favorite podcasts, and honestly, this is only the tip of the iceberg. In the case of Spotify, most users are okay with being tracked because they benefit from a great user experience, nice stats, and top-notch recommendations.

Spotify's year in review. Source: Own image.

Now let's take a more detailed look at the data gathered, starting with the artists. Of course, Spotify stores all data entered by the artists: song names, descriptions, genres, images, lyrics, and song files. Next to this data from the provider side, Spotify gathers and tracks the data of the counterpart, the consumers. This data comprises consumers' listening history, skipped songs, how often a song has been played, stored playlists, downloaded music, social interactions such as sharing playlists or songs, and more. In addition to these two sources of internal data, Spotify probably also uses external data such as articles, blog posts, and other text about songs or artists.

Data & Models Spotify uses. Source: Own image

Spotify's Acquisitions

In 2014, Spotify acquired The Echo Nest, an MIT-related start-up for music intelligence, for 100 million dollars. By 2014, The Echo Nest already held more than a trillion data points about songs and artists. [4]

In 2012, The Echo Nest co-founder Brian Whitman said that their system scans 10 million music-related websites every single day to analyze what is trending and what is going on in the music market. After The Echo Nest, Spotify continued its M&A strategy and in 2015 acquired Seed Scientific, a data science and analytics consultancy, to gain knowledge and expertise in-house. [5] Another major acquisition on its AI path came in 2017, when Spotify incorporated Sonalytic, an audio detection startup. [6] You might know audio detection from Shazam or similar apps that help you identify songs whose names you do not know. Audio detection systems can also be used to prevent copyright violations. Spotify uses audio detection to enhance personalization in playlists and songs, to match songs with compositions, and to improve its publishing data system. [6]

Acquisitions of Spotify. Source: Own image.

All of these acquisitions have been, and still are, the foundation on which Spotify builds its sophisticated recommendation system.

Recommender System

Recommender systems are nothing new, and you encounter them on every corner of the Internet: Netflix, Google, Amazon, e-commerce shops, and so on. Whenever you get a recommendation for possibly fitting products, services, or people, a recommender system is at work. On LinkedIn, the recommender system suggests people based on your network, work history, and interests. The Netflix algorithm recommends movies and series that best fit your taste in film, and Amazon shows you similar or complementary products that other customers bought together with the product at hand.

So, what a recommender system does is deliver suggestions based on behavior or characteristics that the system has tracked. As the Spotify research team states, "Users are overwhelmed by the choice of what to watch, buy, read, and listen to online", and hence recommender systems are necessary to help users navigate the options and to facilitate the decision process [2].

Recommender System in Amazon. Source: Own image

Recommender systems can basically be divided into three types:

Collaborative filtering: Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future and that they will like similar kinds of items as they liked in the past.
Content-based: Content-based recommender systems build a model from the available features of items and users, such as a song's genre or a user's age and gender.
Hybrid: This type combines both worlds, content-based and collaborative filtering.
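To make the collaborative filtering idea concrete, here is a minimal user-based sketch in plain Python. It is not Spotify's implementation: the play counts, user names, and the helper functions cosine and recommend are all made up for illustration. It scores songs a user has not played yet by the plays of other users, weighted by how similar those users are.

```python
from math import sqrt

# Hypothetical play counts (user -> {song: plays}); all values are invented.
plays = {
    "ana": {"song_a": 5, "song_b": 4},
    "ben": {"song_a": 4, "song_b": 5, "song_c": 3},
    "cleo": {"song_d": 5, "song_c": 1},
}


def cosine(u, v):
    """Cosine similarity between two sparse play-count vectors."""
    shared = set(u) & set(v)
    dot = sum(u[s] * v[s] for s in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user, plays, top_n=1):
    """Rank unheard songs by the similarity-weighted plays of other users."""
    scores = {}
    for other, counts in plays.items():
        if other == user:
            continue
        sim = cosine(plays[user], counts)
        for song, n in counts.items():
            if song not in plays[user]:
                scores[song] = scores.get(song, 0.0) + sim * n
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ana", plays))  # ['song_c']
```

Here "ana" and "ben" agreed on two songs in the past, so the song that only "ben" has played is ranked first for "ana", while the song from "cleo", who shares no history with "ana", gets a score of zero.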

Spotify makes use of both collaborative filtering and content-based recommendation, but there is much more to explore. This was only a fairly generic introduction to the basics, so let's look in more detail at what's behind the algorithm!

BaRT of Spotify Explained: What's That?

The system behind the Spotify customization is called BaRT (Bandits for Recommendations as Treatments). The BaRT model has two modes: Exploitation and Exploration.

tl;dr: Exploitation means the system uses the information it has gathered about you (the user), such as your skips and your favorite songs. Exploration, in contrast, means the system suggests songs based on all the other information it can use, e.g. what other users listened to, what playlists have been built, what is trending, and what was recently published.

The exploitation mode is the conventional mode of recommender systems based on collaborative filtering. When enough historical data exists about the users and what they listened to, this mode works very well. It uses all the data available about the user and the item, e.g. songs skipped, how often a song has been played, shared songs, and playlists. One of the major problems of systems based only on exploitation is item relevance. If there is barely any data about a user or an item (in this case a certain song), e.g. because the song has not been played often, the system is uncertain whether to recommend it (exploit) or not (ignore).

Source: [2]

This is the point where the bandit algorithm comes into play and where the second mode steps into the spotlight: exploration. Since the exploitation mode does not perform well under uncertainty (not enough data), the exploration mode covers exactly these cases. As the authors put it, "Exploration recommends content with uncertain predicted user engagement for the purpose of gathering more information. The importance of exploration has been recognized in recent years, particularly in settings with new users, new items, non-stationary preferences, and attributes." [2]

In other words, if a new song has not been played often yet, the system needs data to verify whether the song has potential. So, Spotify's BaRT recommends such songs via exploration and gathers information about them. A recommendation is considered positive after 30 seconds of listening. This means that if you listen to a song for less than half a minute, the recommendation is counted as negative. If you listen for more than 30 seconds, it is counted as positive feedback.
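The 30-second rule can be turned into a binary reward signal. The following is only an illustrative sketch: the function names and the example durations are hypothetical, and treating a stream of exactly 30 seconds as positive is an assumption, since the text only covers "less than" and "more than" 30 seconds.

```python
def stream_feedback(seconds_listened, threshold=30.0):
    """Binary reward: 1 if the stream reached the threshold, else 0."""
    # Assumption: a stream of exactly 30 seconds counts as positive.
    return 1 if seconds_listened >= threshold else 0


def positive_rate(listen_durations):
    """Share of streams counted as positive; None if there is no data yet."""
    if not listen_durations:
        return None
    hits = sum(stream_feedback(s) for s in listen_durations)
    return hits / len(listen_durations)


# Hypothetical listening durations (in seconds) for one newly released song.
durations = [12.0, 45.5, 31.0, 8.2]
print(positive_rate(durations))  # 0.5
```

A song with no streams yet returns None rather than 0, which mirrors the point above: without data the system cannot tell whether the song is good or bad, so it has to explore.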

Source: [2]

As can be seen in the graph above, when there is low certainty about the relevance of a song, the exploration mode is necessary and more helpful than the exploitation mode.

The BaRT model is able to learn and predict satisfaction, which is measured for example in click-through rates and consumption probabilities. It constantly logs, retrains, and learns from its own mistakes. [2]

So, in other words, BaRT is based on reinforcement learning: it uses feedback to maximize user satisfaction and to correct its predicted recommendations. For this, the BaRT model uses a multi-armed bandit, which is trained to execute the action A that has the highest probability of receiving a reward R. Each choice of action thus draws on the previous actions and rewards. The goal of a multi-armed bandit (MAB) is "to choose actions that maximize the total sum of rewards" [2]. A conventional MAB completely ignores context such as time of day, device, playlist, and user features. That's why the Spotify team uses a contextual multi-armed bandit, which takes contextual information into account before it decides on an action.

Source: [2]

Four elements are crucial for a well-performing contextual bandit: the context, the reward model, the training procedure, and the exploitation-exploration policy. [2] The reward model takes three inputs: an explanation e (the reason why the item has been chosen), an item j (the song), and a context x.
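A logistic-regression reward over these three inputs can be sketched as follows. The exact split into a global bias plus item, explanation, and context terms is an assumption here and should be checked against [2]; σ is the logistic sigmoid.

```latex
r(j, e, x) = \sigma\!\left(\theta_{\text{global}} + \theta_j^{\top} 1_j + \theta_e^{\top} 1_e + \theta_x^{\top} x\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```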

The reward model. Source: [2]

where θ denotes the coefficients of the logistic regression and 1_i represents a one-hot vector of zeros with a single 1 at index i. [2]

Source: [2]

This reward function determines the action: in a given context x, the optimal action is the item-explanation pair (j, e) with the highest predicted reward.

Source: [2]

For the exploration approach, the authors use epsilon-greedy. This "gives equal probability mass to all non-optimal items in the validity set f(e, x) and (1 − ε) additional mass to the optimal action (j, e)." [2] The policy is set to "either exploit or explore the item and explanation simultaneously" [2].
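The epsilon-greedy policy described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the BaRT code: the coefficient dictionary theta, the song and explanation names, and the two-dimensional context are all hypothetical, and the reward is a simple logistic regression over item, explanation, and context.

```python
import math
import random


def predicted_reward(theta, item, explanation, context):
    """Logistic-regression reward with hypothetical, hand-set coefficients."""
    z = theta["global"] + theta["item"][item] + theta["explanation"][explanation]
    z += sum(w * x for w, x in zip(theta["context"], context))
    return 1.0 / (1.0 + math.exp(-z))


def action_probabilities(n_actions, best_index, epsilon):
    """Every valid action gets epsilon / n; the best one gets (1 - epsilon) on top."""
    probs = [epsilon / n_actions] * n_actions
    probs[best_index] += 1.0 - epsilon
    return probs


def epsilon_greedy(theta, valid_actions, context, epsilon=0.1, rng=random):
    """Choose an (item, explanation) pair: explore with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(valid_actions)  # explore: uniform over valid actions
    # exploit: take the action with the highest predicted reward
    return max(valid_actions,
               key=lambda a: predicted_reward(theta, a[0], a[1], context))


# Hypothetical coefficients, actions, and context (all values invented).
theta = {
    "global": -1.0,
    "item": {"song_a": 0.5, "song_b": 1.5},
    "explanation": {"because_you_liked": 0.2, "trending": -0.1},
    "context": [0.3, -0.2],
}
valid_actions = [(j, e) for j in theta["item"] for e in theta["explanation"]]
context = [1.0, 0.0]

print(epsilon_greedy(theta, valid_actions, context, epsilon=0.0))
print(action_probabilities(len(valid_actions), 2, epsilon=0.2))
```

Picking uniformly among all valid actions with probability ε and the best action otherwise yields exactly the probabilities that action_probabilities computes: ε/n for every non-optimal action and (1 − ε) + ε/n for the optimal one.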

Source: [2]

The training procedure itself is nothing special. BaRT retrains periodically in batch mode. [2]

Collaborative filtering works fine for artists who have already uploaded songs and for users who have already listened to some. But what if an artist is completely new and there is no listening data for their songs yet? This is called the cold-start problem: the recommendation system fails for new and unpopular songs.

How Spotify solves the cold start problem

In this case, the only thing that helps is analyzing the raw audio files. Even though this is probably the most complicated approach to recommendation and also quite compute-intensive, it plays a crucial role in creating an industry-leading recommender system.

Sander Dieleman, who worked at Spotify at the time, wrote a blog article in 2014 about a model that he created. It consisted of four convolutional layers and three dense layers. The inputs to the machine learning model are not the raw audio files but representations of them called spectrograms. Each song is converted into a mel spectrogram, which is an individual pattern, a kind of fingerprint (as you can see in the image below).

A spectrogram. Source: Spectogramm

Mel spectrograms are time-frequency plots of the audio. The idea behind the mel scale is to replicate the human hearing response.
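To give a feeling for the mel scale, the sketch below converts between Hz and mel using the widely used formula m = 2595 · log10(1 + f / 700). Which mel variant and which frequency range Dieleman's model used is not stated here, so the formula choice, the function names, and the example band layout are assumptions for illustration.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mel (2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_min, f_max):
    """Edges (in Hz) of n_bands bands that are equally wide on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]


# Hypothetical example: four bands between 0 Hz and 8000 Hz.
edges = mel_band_edges(4, 0.0, 8000.0)
widths = [b - a for a, b in zip(edges, edges[1:])]
print([round(e) for e in edges])
print([round(w) for w in widths])
```

The bands are equally wide in mel but get wider in Hz as the frequency rises. That is the sense in which the mel scale imitates human hearing: we resolve low frequencies much more finely than high ones.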

The network Dieleman illustrates in his blog post is the following (with the four conv layers and three dense layers): [6]

Source: [6]

As illustrated in the image by the small "MP" blocks between the conv layers, Dieleman placed a max-pooling layer after each convolutional layer. After the last of the four conv layers, he added a global temporal pooling layer, which pools across the entire time axis. The resulting features are then fed into the three fully connected dense layers with rectified linear units. The last of these layers outputs 40 latent factors.
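Global temporal pooling can be illustrated without any deep learning library. In the sketch below, a feature map is a list of time steps, each holding one activation per filter, and the whole time axis is collapsed into one fixed-length vector. Using the mean and the maximum as pooling statistics is an assumption made for this example, and the tiny feature map is invented.

```python
def global_temporal_pool(feature_map):
    """Collapse the time axis: per-filter mean, then per-filter maximum."""
    n_filters = len(feature_map[0])
    # Regroup the activations so that each list holds one filter over time.
    per_filter = [[frame[k] for frame in feature_map] for k in range(n_filters)]
    means = [sum(values) / len(values) for values in per_filter]
    peaks = [max(values) for values in per_filter]
    return means + peaks


# Hypothetical feature map: 3 time steps x 2 filters.
feature_map = [
    [0.0, 2.0],
    [1.0, 0.0],
    [2.0, 4.0],
]
print(global_temporal_pool(feature_map))  # [1.0, 2.0, 2.0, 4.0]
```

The output length depends only on the number of filters, not on the number of time steps, which is why the pooled features can be passed to dense layers of fixed size.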

"About 750000 gradient updates are performed in total. I don't remember exactly how long this particular architecture took to train, but all of the ones I've tried have taken between 18 and 36 hours." (Sander Dieleman)

Dieleman's network learns many different filters, up to 256 in the first conv layer alone. As he states, one filter learns vibrato singing, another ringing ambience, a third bass drum sounds, and another vocal thirds (multiple singers and voices). Other filters respond to noise, distortion, specific pitches, low-pitched drones, certain chords like the A chord, and many more.

As you might already have noticed, the article and explanation by Dieleman referred to here is from 2014, so it is pretty outdated. I do believe it is still a good way to understand how the Spotify algorithms work. Of course, 7 years later the models are far more complex, and Spotify has gained much more insight into how to track and analyze, but in its main traits the system might be similar.

Natural Language Processing and Music Intelligence

Did you think this was the last model? Nope. Of course, Spotify also uses NLP. As described above, Spotify acquired The Echo Nest and with it its music intelligence platform. This platform is based on web crawling and natural language processing. It helps Spotify track more than 10 million websites and analyze their content via NLP. These websites include blogs, artist websites, social media, forums, and many more. In the end, Spotify can observe discussions and track what's trending, what's new, what people like, and what people don't like. It can even detect which language is being used in order to track discussions in certain countries or regions. [8]

The system identifies descriptive terms and noun phrases related to songs or artists and classifies these keywords into cultural vectors and top terms.
This results in a set of terms that shows the importance of each term for the artist and song based on weights. [8]
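How weighted top terms could be derived from text can be sketched with a simple TF-IDF weighting. This is a stand-in technique chosen for illustration, not The Echo Nest's actual method, and the artist names and descriptive terms below are invented.

```python
import math
from collections import Counter

# Hypothetical descriptive terms found in web text about three artists.
docs = {
    "artist_x": ["dreamy", "synth", "pop", "dreamy", "nostalgic"],
    "artist_y": ["heavy", "guitar", "pop", "loud"],
    "artist_z": ["synth", "dance", "pop", "club"],
}


def top_terms(docs, artist, n=3):
    """Weight each term by TF-IDF and return the n highest-weighted terms."""
    n_docs = len(docs)
    # Document frequency: in how many artists' texts does a term occur?
    doc_freq = Counter(term for terms in docs.values() for term in set(terms))
    counts = Counter(docs[artist])
    total = sum(counts.values())
    weights = {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:n]


for term, weight in top_terms(docs, "artist_x"):
    print(f"{term}: {weight:.3f}")
```

A term like "pop" that appears for every artist gets a weight of zero, while a term that is frequent for one artist and rare elsewhere, such as "dreamy" here, ends up at the top of that artist's list.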

Source: https://notes.variogr.am/2012/12/11/how-music-recommendation-works-and-doesnt-work/

Conclusion

The three-fold strategy of relying on NLP, content-based filtering, and collaborative filtering is a quite sophisticated and also complex way to determine customers' music taste. In the future, Spotify will have to focus even more on analyzing audio data, which is more complex and compute-intensive than collaborative filtering. Nonetheless, all three types are very important, and with more and more data and more complex models, Spotify will be able to predict our music taste better and better.

From using Spotify privately, I can tell they already do quite a good job of offering a huge variety of customized content and recommendations: from the recommendation section to the year in review Stories, the mix of the week, and of course many more individualized playlists based on what I listen to and what other people with a similar music taste listen to. Of course, Apple Music or Amazon Music can also offer these services, but in the end, the quality of the recommendation system is crucial for the user experience. Spotify has not yet won the battle, but they have a clear strategy that seems promising. Honestly, I hope Spotify will prevail against the tech giants.

Sources

[1] Company Info (n.d.). https://newsroom.spotify.com/company-info/

[2] McInerney, J. et al. (2018). Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits. https://static1.squarespace.com/static/5ae0d0b48ab7227d232c2bea/t/5ba849e3c83025fa56814f45/1537755637453/BartRecSys.pdf

[3] https://chaitanyabelhekar.medium.com/recommendation-systems-a-walk-trough-33587fecc195

[4] https://www.br.de/puls/musik/aktuell/spotify-the-echo-nest-discover-weekly-100.html

[5] https://techcrunch.com/2015/06/24/pulling-the-data-rug-out-from-under-apple/

[6] https://techcrunch.com/2017/03/07/spotify-acquires-audio-detection-startup-sonalytic/

[7] https://www.univ.ai/post/spotify-recommendations

[8] https://outsideinsight.com/insights/how-ai-helps-spotify-win-in-the-music-streaming-world/