Using Python to scrape the Billboard Hot-100 playlist to generate a Spotify playlist
In this post I go over how I used Python to create a Spotify playlist from the tracks on the billboard.com Hot-100 chart, using the Spotify Web API.
At a high level, here is a quick overview of the tools and technologies used:
Beautiful Soup
This Python module allowed me to easily extract data from the billboard.com Hot-100 webpage. I chose this library because the billboard.com RSS feed did not contain sufficient data to pull the required track information.
I installed it using pip:
pip install beautifulsoup4
HTML page parsing
Before we dig into the Python code, it is important to look at the HTML taken from the billboard.com webpage.
Each track on the Hot-100 chart is contained in a parent span tag with the class chart-element__information.
The example below shows only the two main tags used. There are various others, such as weeks on chart, which are out of scope for this post.
<span class="chart-element__information">
<span class="chart-element__information__song text--truncate color--primary">No Guidance</span>
<span class="chart-element__information__artist text--truncate color--secondary">Chris Brown Featuring Drake</span>
....
</span>
<table class="has-fixed-layout"><thead><tr><th>Span class</th><th>Description</th></tr></thead><tbody><tr>
<td>chart-element__information__song text--truncate color--primary
</td>
<td>Track name
</td></tr><tr>
<td>chart-element__information__artist text--truncate color--secondary
</td>
<td>Artist name
</td></tr></tbody></table>
With a pattern for locating the relevant information in place, the next step was to extract this data using the Beautiful Soup library.
Here is an example of the code that uses the library.
HTML_FILE is the path to the saved copy of this webpage.
You can use the requests library in Python to do a GET for the page and save the text to disk. For testing, I cached the page to disk so that I didn’t request it on every run.
from bs4 import BeautifulSoup

# HTML_FILE (path to the saved page), ALL_TRACKS (a dict) and logger are
# assumed to be defined at module level.

# define a class to encapsulate a 'track'
class Track:
    def __init__(self, name, artist):
        self.name = name
        self.artist = artist

def getTop100Tracks():
    try:
        logger.info("Parsing Billboard Hot-100 tracks.")
        # pass the opened html file to the BeautifulSoup constructor; features
        # selects the parser, here html5lib (installed separately with pip)
        with open(HTML_FILE) as html:
            soup_parse = BeautifulSoup(html, features="html5lib")
        # find all span elements with the class 'chart-element__information'
        spans = soup_parse.find_all('span', {'class': 'chart-element__information'})
        # iterate over the span elements extracting the track and artist names
        for s in spans:
            song = s.find('span', {'class': 'chart-element__information__song'})
            artist = s.find('span', {'class': 'chart-element__information__artist'})
            if song is None or artist is None:
                continue  # skip entries that are missing either tag
            track = Track(song.text, artist.text)
            # save track info in a dictionary
            ALL_TRACKS[track.name] = track.artist
    except Exception:
        logger.error("Parsing of html page failed.")
        raise
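The caching step mentioned earlier (fetch the page once, then reuse the copy on disk) can be sketched as follows. This is a minimal sketch rather than the code from my script: the names fetch_page and load_cached_page and the fetch parameter are hypothetical, and it uses the standard library's urllib instead of requests so that the example has no third-party dependency.

```python
from pathlib import Path
from urllib.request import Request, urlopen


def fetch_page(url):
    """Download a page and return its text (assumes UTF-8 content)."""
    # Some sites reject requests that lack a browser-like User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def load_cached_page(url, cache_path, fetch=fetch_page):
    """Return the cached copy of a page, downloading it only if it is missing."""
    cache = Path(cache_path)
    if cache.exists():
        return cache.read_text(encoding="utf-8")
    html = fetch(url)
    cache.write_text(html, encoding="utf-8")
    return html
```

Calling load_cached_page twice with the same cache path only touches the network on the first call; deleting the file forces a fresh download.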
Now that the extraction of the data from the webpage is complete, let’s take a look at the Spotify Web API.
Spotify Web API
Before using the Web API, please ensure you have completed the following. For more information see: https://developer.spotify.com/documentation/web-api/quick-start/
Set Up Your Account
To use the Web API, start by creating a Spotify user account (Premium or Free). To do that, simply sign up at www.spotify.com.
When you have a user account, go to the Dashboard page at the Spotify Developer website and, if necessary, log in. Accept the latest Developer Terms of Service to complete your account setup.
Register Your Application
Any application can request data from Spotify Web API endpoints and many endpoints are open and will return data without requiring registration. However, if your application seeks access to a user’s personal data (profile, playlists, etc.) it must be registered. Registered applications also get other benefits, like higher rate limits at some endpoints.
You can register your application, even before you have created it.
Source: Spotify Web API documentation
Authentication
Spotify has two types of authorization, depending on the requirements.
App Authorization: Spotify authorizes your app to access the Spotify Platform (APIs, SDKs and Widgets).
User Authorization: Spotify, as well as the user, grant your app permission to access and/or modify the user’s own data. For information about User Authentication, see User Authentication with OAuth 2.0. Calls to the Spotify Web API require authorization by your application user. To get that authorization, your application generates a call to the Spotify Accounts Service /authorize endpoint, passing along a list of the scopes for which access permission is sought.
Source: Spotify Web API documentation
Further details can be found at https://developer.spotify.com/documentation/general/guides/authorization-guide/
User Authorization
I used the User Authorization route since I needed access to modify the user’s own data.
Note: When registering your application, make sure you note down the client id, client secret and call back URL provided in the registration form. You will need those details later on.
This authorization flow typically follows these steps.
1. Request Authorization code
2. Use Authorization code to request Access and Refresh token
3. Use Access token to access the Spotify Web API
4. Use Refresh token to update expired Access token
Step 1 - Request Authorization code
- Parameters:
- client id (required)
- response_type (required) - set to code
- call back URL (required)
- scope (optional) - a list of the authorization scopes to which you require access
- state (optional but recommended) - a random string; provides protection against attacks such as cross-site request forgery
- show_dialog (optional) - boolean; forces the user to approve the app again if they’ve already done so
Details of some of the authorization scopes are listed below. For a complete list, please see https://developer.spotify.com/documentation/general/guides/scopes/
playlist-read-collaborative
- Include collaborative playlists when requesting a user’s playlists.
playlist-modify-private
- Write access to a user’s private playlists.
playlist-modify-public
- Write access to a user’s public playlists.
playlist-read-private
- Read access to user’s private playlists.
user-read-private
- Read access to user’s subscription details (type of user account).
user-library-modify
- Write/delete access to a user’s “Your Music” library.
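The parameters and scopes above combine into a single authorization URL. As a rough illustration, it could be assembled in Python like this. The function name build_authorize_url is hypothetical and not part of my script; the endpoint and parameter names are the ones used in this post.

```python
import secrets
from urllib.parse import quote, urlencode

AUTHORIZE_URL = "https://accounts.spotify.com/authorize"


def build_authorize_url(client_id, redirect_uri, scopes, state=None, show_dialog=False):
    """Return (url, state) for the Step 1 authorization request."""
    if state is None:
        # Random, unguessable value used to match the callback to this request.
        state = secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        # Scopes are joined with spaces before URL encoding.
        "scope": " ".join(scopes),
        "state": state,
        "show_dialog": "true" if show_dialog else "false",
    }
    # quote_via=quote encodes the spaces as %20 rather than '+'.
    return AUTHORIZE_URL + "?" + urlencode(params, quote_via=quote), state
```

Keep the returned state value around: it is needed to check the callback once the user has approved the request.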
Using curl to get an authorization code:
curl -v 'https://accounts.spotify.com/authorize?client_id=<replace with client id>&response_type=code&redirect_uri=<replace with call back url>&scope=<replace with authorization scope>&state=<replace with random string>&show_dialog=false'
When providing a list of authorization scopes, ensure they are separated by a URL-encoded space (%20), e.g. user-read-private%20user-read-email%20playlist-modify-private
Running curl with the verbose (-v) flag exposes the location header, which is similar to the following.
< location: https://accounts.spotify.com/login?continue=https%3A%2F%2Faccounts.spotify.com%2Fauthorize%3Fscope%3Duser-read-private%2Buser-read-email%2Bplaylist-modify-private%26response_type%3Dcode%26redirect_uri%3Dhttps%253A%252F%252Fexample.com%252Fcallback%26state%3D34fFs29kd09%26client_id%3D<client_id>%26show_dialog%3Dfalse
Open the URL in a web browser.
The user is asked to authorize access within the scopes. The Spotify Accounts service presents details of the scopes for which access is being sought.
After the user accepts or denies your request, the Spotify Accounts service redirects the user back to the specified redirect_uri.
Once app access has been authorized, you will see a redirect URI containing the authorization code, like the one below.
https://example.com/callback?code=AQBcyrnRH8CnShgs...&state=<this is the state code used in the earlier curl cmd>
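Pulling the code out of that redirect URI, and checking that the state matches the one sent in Step 1, might look like the sketch below. The function name extract_authorization_code is hypothetical; the handling of an error parameter assumes the standard OAuth 2.0 behaviour of returning error=... when the user denies the request.

```python
import hmac
from urllib.parse import parse_qs, urlparse


def extract_authorization_code(callback_url, expected_state):
    """Return the authorization code from a redirect URI after validating state."""
    query = parse_qs(urlparse(callback_url).query)
    if "error" in query:
        raise ValueError("Authorization failed: " + query["error"][0])
    returned_state = query.get("state", [""])[0]
    # Compare as bytes in constant time; a mismatch suggests a forged callback.
    if not hmac.compare_digest(returned_state.encode("utf-8"), expected_state.encode("utf-8")):
        raise ValueError("State mismatch in callback")
    if "code" not in query:
        raise ValueError("No authorization code in callback")
    return query["code"][0]
```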
Step 2 - Request an Access token
Requesting an Access token requires the following.
- base64 encoded client id:client secret
- authorization code
- redirect uri
- state
Here is a curl POST example.
curl -H "Authorization: Basic <base64 client_id:client_secret>" -d grant_type=authorization_code -d 'code=AQAXSYw9unYk.....' -d 'redirect_uri=https://example.com/callback' -d 'state=<your state code>' https://accounts.spotify.com/api/token
A successful response looks like this.
{
    "access_token": "NgCXRK...MzYjw",
    "token_type": "Bearer",
    "scope": "user-read-private user-read-email",
    "expires_in": 3600,
    "refresh_token": "NgAagA...Um_SHo"
}
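The Basic authorization header and form fields used in the curl example can also be produced in Python. This is a small sketch with hypothetical helper names (basic_auth_header, token_request_body); it only builds the header and body and does not send the request.

```python
import base64


def basic_auth_header(client_id, client_secret):
    """Build the Authorization header from base64-encoded client_id:client_secret."""
    raw = "{}:{}".format(client_id, client_secret).encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def token_request_body(code, redirect_uri):
    """Form fields for exchanging an authorization code for tokens."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }
```

Both dictionaries can be passed to requests.post as headers= and data= respectively, with https://accounts.spotify.com/api/token as the URL.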
Step 3 - Use the Access token
Once you have the <access_token>, you can use it to make calls to the Spotify Web API.
Step 4 - Refresh an expired Access token
If the access token has expired, make the following request, which returns a new access token.
curl -H "Authorization: Basic <base 64 client_id:client_secret>" -d grant_type=refresh_token -d refresh_token=NgAagA...Um_SHo https://accounts.spotify.com/api/token
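Rather than waiting for a request to fail, the expires_in value from the token response can be used to decide when a refresh is due. The following is a sketch of that idea only; the TokenState class and the 60 second leeway are my own assumptions and are not used in the functions later in this post.

```python
import time
from dataclasses import dataclass


@dataclass
class TokenState:
    access_token: str
    expires_at: float  # absolute expiry time, in seconds on the time.time() clock

    @classmethod
    def from_response(cls, payload, now=None):
        """Build the state from a token response containing expires_in seconds."""
        now = time.time() if now is None else now
        return cls(payload["access_token"], now + payload["expires_in"])

    def is_expired(self, now=None, leeway=60):
        """True once the token is within `leeway` seconds of expiring."""
        now = time.time() if now is None else now
        return now >= self.expires_at - leeway
```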
Spotify Web API Endpoints
I’ve used several endpoints as part of the playlist creation. For a complete list see the Spotify Web API documentation.
Python Functions
Add tracks to playlist
API Endpoint used = SPOTIFY_API_PLAYLIST_TRACKS
HTTP method used = POST
Response = 201
import requests

def addTracksPlaylist(playlistId, trackUids, access_token):
    headers = {"Accept": "application/json", "Content-Type": "application/json", "Authorization": "Bearer {}".format(access_token)}
    # TLS certificate verification is left at the requests default (enabled)
    response = requests.post(SPOTIFY_API_PLAYLIST_TRACKS.format(playlistId, trackUids), headers=headers)
    if response.status_code == 201:
        return response.json()
    print("[ERROR] " + response.text)
    # a bare raise has no active exception here, so raise an explicit error
    raise RuntimeError("Adding tracks failed with status {}".format(response.status_code))
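The trackUids argument has to be prepared before calling this function. The sketch below assumes, since the endpoint template is not shown in this post, that it is a comma-separated list of Spotify track URIs passed in the uris query parameter, and that at most 100 items are sent per request. The helper name batchTrackUris is hypothetical.

```python
def batchTrackUris(trackIds, batchSize=100):
    """Turn track IDs into comma-separated URI strings of at most batchSize items."""
    uris = ["spotify:track:{}".format(trackId) for trackId in trackIds]
    return [
        ",".join(uris[start:start + batchSize])
        for start in range(0, len(uris), batchSize)
    ]
```

Each string in the returned list can then be passed as trackUids in a separate call.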
Get current user’s playlists
API Endpoint used = SPOTIFY_API_PLAYLIST_CURRENT
HTTP method used = GET
Response = JSON 200
def getPlaylists(access_token):
    headers = {"Accept": "application/json", "Content-Type": "application/json", "Authorization": "Bearer {}".format(access_token)}
    response = requests.get(SPOTIFY_API_PLAYLIST_CURRENT, headers=headers)
    if response.status_code == 200:
        return response.json()
    body = response.json()
    if 'error' in body and 'The access token expired' in body['error']['message']:
        print("[WARNING] The access token expired, requesting refresh.")
        access_token = getNewAccessToken()
        # cache the refreshed token; the with block closes the file
        with open('.accesstoken', mode='w') as w:
            w.write(access_token)
        # return the result of the retry so the caller receives it
        return getPlaylists(access_token)
    raise RuntimeError("Fetching playlists failed: {}".format(response.text))
Create new playlist
API Endpoint used = SPOTIFY_API_CREATE_PLAYLIST
HTTP method used = POST
Payload = { name, description, public }
Response = 201
import json

def createPlaylist(uid, name, description, access_token):
    headers = {"Accept": "application/json", "Content-Type": "application/json", "Authorization": "Bearer {}".format(access_token)}
    # public must be a JSON boolean, not the string 'false'
    data = {'name': name, 'description': description, 'public': False}
    response = requests.post(SPOTIFY_API_CREATE_PLAYLIST.format(uid), data=json.dumps(data), headers=headers)
    if response.status_code == 201:
        return response.json()['id']
    body = response.json()
    if 'error' in body and 'The access token expired' in body['error']['message']:
        print("[WARNING] The access token expired, requesting refresh.")
        access_token = getNewAccessToken()
        # cache the refreshed token; the with block closes the file
        with open('.accesstoken', mode='w') as w:
            w.write(access_token)
        # return the id from the retry so the caller receives it
        return createPlaylist(uid, name, description, access_token)
    raise RuntimeError("Creating playlist failed: {}".format(response.text))
Search for Artist ID
API Endpoint used = SPOTIFY_API_SEARCH_ARTIST
HTTP method used = GET
HTTP request query requires artist name to search
Response = JSON 200
from urllib.parse import quote

def searchArtistId(artist, access_token):
    headers = {"Accept": "application/json", "Content-Type": "application/json", "Authorization": "Bearer {}".format(access_token)}
    # URL-encode the artist name so spaces and '&' are safe in the query string
    response = requests.get(SPOTIFY_API_SEARCH_ARTIST.format(quote(artist)), headers=headers)
    body = response.json()
    if response.status_code == 200 and len(body['artists']['items']) > 0:
        # the first search result is taken as the match
        return body['artists']['items'][0]['id']
    if 'error' in body and 'The access token expired' in body['error']['message']:
        print("[WARNING] The access token expired, requesting refresh.")
        access_token = getNewAccessToken()
        # cache the refreshed token; the with block closes the file
        with open('.accesstoken', mode='w') as w:
            w.write(access_token)
        # return the id from the retry so the caller receives it
        return searchArtistId(artist, access_token)
    # no match (or another error): return None so the caller can skip this artist
    return None
Get Artist Top Tracks
API Endpoint used = SPOTIFY_API_SEARCH_ARTIST_TOP
HTTP method used = GET
Response = JSON 200
def getArtistTopTracks(artistId, access_token):
    headers = {"Accept": "application/json", "Content-Type": "application/json", "Authorization": "Bearer {}".format(access_token)}
    response = requests.get(SPOTIFY_API_SEARCH_ARTIST_TOP.format(artistId), headers=headers)
    if response.status_code == 200:
        return response.json()
    # a bare raise has no active exception here, so raise an explicit error
    raise RuntimeError("Fetching top tracks failed with status {}".format(response.status_code))
Refresh Expired Access token
Endpoint used = SPOTIFY_API_TOKEN
Payload = { grant_type, refresh_token }
HTTP method = POST
def getNewAccessToken():
    headers = {"Authorization": "Basic " + SPOTIFY_API_BASE64_CLIENT}
    data = {'grant_type': 'refresh_token', 'refresh_token': SPOTIFY_API_REFRESH_TOKEN}
    try:
        response = requests.post(SPOTIFY_API_TOKEN, data=data, headers=headers)
    except requests.RequestException:
        print("[ERROR] Failed to send request to get new access token.")
        raise
    if response.status_code == 200:
        return response.json()['access_token']
    print("[ERROR] Failed to get access token {}".format(response.text))
    # a bare raise has no active exception here, so raise an explicit error
    raise RuntimeError("Token refresh failed with status {}".format(response.status_code))
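Several of the functions above repeat the same "refresh the token and try again" logic. One way that pattern could be factored out is sketched below. This is not how my script is structured: TokenExpired and call_with_refresh are hypothetical names, and the request function is assumed to raise TokenExpired when the API reports an expired token.

```python
class TokenExpired(Exception):
    """Raised by a request function when the API reports an expired token."""


def call_with_refresh(request_fn, access_token, refresh_fn, max_retries=1):
    """Call request_fn(access_token), refreshing the token on expiry.

    Returns (result, access_token) so the caller can keep the newest token.
    """
    for attempt in range(max_retries + 1):
        try:
            return request_fn(access_token), access_token
        except TokenExpired:
            if attempt == max_retries:
                # Give up instead of retrying forever.
                raise
            access_token = refresh_fn()
```

Capping the number of retries avoids the unbounded recursion that can occur if a freshly issued token is also rejected.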