Ambeone Student’s Projects Gallery
Based on the Big Data Analytics & Machine Learning Techniques taught in Ambeone’s Programs
This is a Gallery of some glimpses Data Science projects done by recent Ambeone students as part of their program.In case you are interested to know more about a particular project/projects, you may contact us for details .
Data Science based Recommender System to recommend Artists from Spotify and creating their segmentation using Clustering
Submitted by: Zehra
Recommender System using Data Science for Spotify Artist Database
Overview
This Data Science project was based on a dataset from Spotify (available on Kaggle) with the attributes of 27,606 artists and the objectives were:
- To create an artist recommender system (correlation based)
- To classify artists into separate clusters based on their attributes (k-means clustering)
(https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-160k-tracks).
Variables
The dataset contains multiple attributes like Contains mean acousticness, danceability, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, valence, duration, popularity, genres for each artist in the spotify dataset.
These were used to construct a Recommender System and Artists Clusters.
Data Cleaning
- Some tracks had the same names and artists but had slightly varying attributes. These duplicates were removed.
- After removing all duplicate artists, 27,606 artists remained.
Key Results
Artist Recommender System
- The artist recommender system allows individuals to enter their preferred artists and the system recommends artists that highly correlate with the preferred artist’s attributes.
- Users can enter more than one preferred artist to get recommendations.
- When users enter more than one preferred artist, the system finds recommendations for each artist individually, and then returns a list of all artists after removing any duplicates.
Clustering Artists
- There are 27,606 artists in the dataset.
- While the silhouette method suggested two clusters as the optimal number, the gap method and the elbow method did not run because of memory issues.
- The artists were classified into 50 clusters. This was mainly because there are more than 50 unique genres mentioned in the artist-genre-kaggle.csv.
- between_SS / total_SS = 6 %
Business Applications
- The artist recommender can be used to recommend artists to streaming users.
- Furthermore, using the k-means clusters of artists, a YouTube channel can be created with curated playlists of similar artists. The channel can then be monetized for advertisements.