Shivan

My First Data Project: Exploring Daniel Bourke’s Youtube Channel (Part 1)

Well-groomed mullet, check.

Sick physique, check.

Familiar ‘Straya twang, check.

Daniel Bourke is an aspiring machine learning engineer and content creator. If you check out his about page, you’ll see that the born-and-bred Queenslander’s reach spans many publishing and online learning platforms related to machine learning.

I first came across this bloke on another one of his growing platforms, YouTube. In fact, that’s where the inspiration for this project came from.

While browsing his channel, I noticed two distinct categories of videos: machine learning and health/fitness.

A question then popped into my head: how popular are his machine learning videos compared to his health/fitness videos?

A simple mention on Twitter would be enough to answer the question. The mulleted man is very responsive. I know this because I tried it.

But… the hard way is always the better way, right…?

Turns out it is. We get the opportunity to write code and look at data.

And so it begins, the birth of my first data-related project. The primary aim is a learning experience. A learning experience for me is coding and writing this post; for you, it’s reading it.

In a multi-part post, we are going to learn how to parse Daniel Bourke’s YouTube channel, turning numbers on the internet into graphs we can visualise.

So, grab some bean juice or a cold one, depending on what time it is in the country you are in. Sit down and enjoy. I’ve got a good one for ya.

Preflight Inspection

Once again, I’m fresh to coding/data/machine learning. I just want to have some fun with this. There are probably better ways to do this, so please give me some feedback in the comments below.

If you want to follow along, all you need:

YouTube Data API

I could go into every single video made by Daniel Bourke and copy and paste the view count. But that is tedious and these values change regularly. We are going to access the YouTube Data API v3 instead.

  1. First, we need to sign up at https://console.developers.google.com/.
  2. Hit the ‘New Project’ button and create a new project.
  3. Search for and activate the YouTube Data API v3 app.
  4. Create credentials. All you need is the API key, since you will only be making requests. The OAuth client would be necessary if you wanted to make changes to your own channel.
  5. Save this API key.

Your API key

We will store our API key in a separate Python file, setAPIkey.py. The reason we do this is so that, when we share the project on GitHub, we can list this file in .gitignore. That way the key doesn’t get accidentally uploaded for the entire world to see and, sadly, exploit.

This short piece of code will create an environment variable, which will store our API key.

#!python3
# setAPIkey.py - Sets API key as an environment variable
# IMPORTANT: this must not be shared and is kept in .gitignore 
import os

os.environ['YT_API'] = 'xxxxxAPIKEYgoesHERExxxxxx'
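
As a small safeguard, the key can be read back through a helper that fails loudly when the variable is missing. This is only a minimal sketch and not part of the original notebook: get_api_key is a hypothetical helper name, and only the YT_API variable name comes from the code above.

```python
import os


def get_api_key(var_name='YT_API'):
    """Return the API key stored in an environment variable.

    get_api_key is a hypothetical helper for illustration; only the
    'YT_API' variable name comes from setAPIkey.py.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a readable message instead of a confusing API error later
        raise RuntimeError(var_name + ' is not set; run setAPIkey.py first')
    return key
```

The returned value could then be passed wherever os.getenv('YT_API') would otherwise be used.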

Importing Libraries

Let’s import all those libraries.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import random as r
import seaborn as sns

%matplotlib inline

import os
import requests
import math
import re
import isodate

from datetime import date, time, datetime

import googleapiclient.discovery
import googleapiclient.errors

Set up Google API Client

We are going to run the separate Python file that we created for our API key. We can use the magic function %run, which will run Python files for us, as long as the file is in the same directory.

%run setAPIkey.py
api_service_name = 'youtube'
api_version = 'v3'
youtube = googleapiclient.discovery.build(
    api_service_name, api_version, developerKey=os.getenv('YT_API')
)

Getting the Uploads playlist

The uploads playlist will contain all the uploaded videos by a specific channel. In order to access this, we must use the channels().list() method. More documentation here: https://developers.google.com/youtube/v3/getting-started.

We use the channel ID, which is located in the URL for Daniel Bourke’s channel: https://www.youtube.com/channel/UCr8O8l5cCX85Oem1d18EezQ.


channel_ID = 'UCr8O8l5cCX85Oem1d18EezQ' # Daniel Bourke's channel ID

channel_request = youtube.channels().list(
    part='snippet,contentDetails',
    id=channel_ID
)
channel_res = channel_request.execute()

Below is the output of channel_res. I’ve added a line decorated with hashtags to mark what we need: the upload playlist ID.

{'kind': 'youtube#channelListResponse',
 'etag': 'FAB8A6FOE8iWtZd1ZKzWj453hKo',
 'pageInfo': {'resultsPerPage': 1},
 'items': [{'kind': 'youtube#channel',
   'etag': 'hn79H0m-K3PIGztc7HX-w1XdZQo',
   'id': 'UCr8O8l5cCX85Oem1d18EezQ',
   'snippet': {'title': 'Daniel Bourke',
    'description': "I'm a machine learning engineer who plays at the intersection of technology and health.\n\nMy videos will help you learn better and live healthier.\n\nFeel free to introduce yourself, I'd love to hear from you.\n\nDaniel",
    'publishedAt': '2016-08-02T21:36:26Z',
    'thumbnails': {'default': {'url': 'https://yt3.ggpht.com/a/AATXAJwKBdy84WKVUIoOt37SBAeWjUhPdGDAx9WxNt1Rdz4=s88-c-k-c0xffffffff-no-rj-mo',
      'width': 88,
      'height': 88},
     'medium': {'url': 'https://yt3.ggpht.com/a/AATXAJwKBdy84WKVUIoOt37SBAeWjUhPdGDAx9WxNt1Rdz4=s240-c-k-c0xffffffff-no-rj-mo',
      'width': 240,
      'height': 240},
     'high': {'url': 'https://yt3.ggpht.com/a/AATXAJwKBdy84WKVUIoOt37SBAeWjUhPdGDAx9WxNt1Rdz4=s800-c-k-c0xffffffff-no-rj-mo',
      'width': 800,
      'height': 800}},
    'localized': {'title': 'Daniel Bourke',
     'description': "I'm a machine learning engineer who plays at the intersection of technology and health.\n\nMy videos will help you learn better and live healthier.\n\nFeel free to introduce yourself, I'd love to hear from you.\n\nDaniel"},
    'country': 'AU'},
   'contentDetails': {'relatedPlaylists': {'likes': '',
     'favorites': '',
     'uploads': 'UUr8O8l5cCX85Oem1d18EezQ',
 ### Upload Playlist ID ###
     'watchHistory': 'HL',
     'watchLater': 'WL'}}}]}

We can use indexing to access the playlist ID for Daniel Bourke’s channel.

uploadPlaylist_ID = channel_res['items'][0]['contentDetails']['relatedPlaylists']['uploads']
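
If the channel ID is mistyped, the response may come back without any items, and the indexing above would then fail with a less obvious error. The sketch below wraps the same lookup in a function with an explicit check. get_uploads_playlist_id is a hypothetical name, and sample_res is a trimmed-down copy of the output shown earlier, so this runs without calling the API.

```python
def get_uploads_playlist_id(channel_res):
    """Return the uploads playlist ID from a channels().list() response."""
    items = channel_res.get('items', [])
    if not items:
        raise ValueError('no channel found in response')
    return items[0]['contentDetails']['relatedPlaylists']['uploads']


# Trimmed-down copy of the response shown above (not a live API call)
sample_res = {
    'items': [{
        'id': 'UCr8O8l5cCX85Oem1d18EezQ',
        'contentDetails': {
            'relatedPlaylists': {'uploads': 'UUr8O8l5cCX85Oem1d18EezQ'}
        },
    }]
}

print(get_uploads_playlist_id(sample_res))  # UUr8O8l5cCX85Oem1d18EezQ
```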

Getting the videos

Using syntax similar to the channels().list() method, we are going to call playlistItems().list() and wrap it in a function.

The reason is that we can request at most 50 videos at a time, and Daniel Bourke has well over that: more than 250 at the time of writing.

We can run this function once to get the first 50 videos. The same response also gives us the total number of videos in the uploads playlist, as well as the nextPageToken, which we pass back as pageToken to pull the next lot of 50 videos.

def playlist_requester(pageToken=None,uploadPlaylist_ID=uploadPlaylist_ID):
    playlist_request = youtube.playlistItems().list(
        part='snippet,contentDetails',
        maxResults=50,
        pageToken=pageToken,
        playlistId=uploadPlaylist_ID
    )
    playlist_res = playlist_request.execute()
    
    return playlist_res

We are going to use a list comprehension, which Daniel Bourke explains very well in his blog post, to build a list of the first 50 video IDs.

playlist_res = playlist_requester()

TOTAL_UPLOADS = playlist_res['pageInfo']['totalResults'] # total videos in upload playlist
nextPageToken = playlist_res['nextPageToken'] # needed to access the next page

listOfVideo_IDs = [ video_ID['contentDetails']['videoId'] for video_ID in playlist_res['items'] ] # first 50

We can wrap this in a while loop to pull the rest of the videos and add them to the list. The reason I run the function once beforehand is so we can get the total uploads in the playlist, as well as the page token needed for the following requests.

while TOTAL_UPLOADS > len(listOfVideo_IDs):
    
    nextpage_playlist_res = playlist_requester(nextPageToken)
    listOfVideo_IDs.extend([ video_ID['contentDetails']['videoId'] for video_ID in nextpage_playlist_res['items'] 
                            if video_ID['contentDetails']['videoId'] not in listOfVideo_IDs ]) # using .extend() vs .append()
    print('Number of Uploaded Videos: ' + str(len(listOfVideo_IDs)))

    if 'nextPageToken' not in nextpage_playlist_res:
        break # last page: stop here so the loop cannot spin forever if the total is never reached
    nextPageToken = nextpage_playlist_res['nextPageToken'] # if there are no more pages this key does not exist

Something interesting to note is that I use extend() to concatenate lists (well, a list with a list comprehension). According to Python for Data Analysis by Wes McKinney, this is less computationally intensive than using the + operator, because + has to create a new list and copy the objects over, while extend() adds to the existing list.
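
To make the difference concrete, here is a small sketch with made-up values. It only shows that extend() changes the existing list while + builds a new one, and why append() is not what we want here; it does not measure speed.

```python
videos = ['a', 'b']
alias = videos               # a second name for the same list object

videos.extend(['c', 'd'])    # extend() mutates the existing list in place
print(alias is videos)       # True
print(alias)                 # ['a', 'b', 'c', 'd']

videos = videos + ['e']      # + builds a brand-new list and rebinds the name
print(alias is videos)       # False
print(len(alias), len(videos))  # 4 5

nested = ['a', 'b']
nested.append(['c', 'd'])    # append() adds the whole list as ONE element
print(nested)                # ['a', 'b', ['c', 'd']]
```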

This loop also nicely prints out how big the list gets with each loop.

Number of Uploaded Videos: 100
Number of Uploaded Videos: 150
Number of Uploaded Videos: 200
Number of Uploaded Videos: 250
Number of Uploaded Videos: 252
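
The same paging idea can be written without relying on the total count at all, by simply following nextPageToken until it is no longer present. The sketch below is one possible variation, not the code used in this project: collect_all_ids, FAKE_PAGES and fake_requester are hypothetical names, and the fake pages stand in for real API responses so the example runs offline.

```python
def collect_all_ids(fetch_page):
    """Collect video IDs by following nextPageToken until it disappears.

    fetch_page is any callable that takes a page token (or None) and
    returns a playlistItems-style response dictionary.
    """
    video_ids = []
    page_token = None
    while True:
        res = fetch_page(page_token)
        video_ids.extend(item['contentDetails']['videoId'] for item in res['items'])
        page_token = res.get('nextPageToken')
        if page_token is None:  # last page reached
            break
    return video_ids


# Stand-in for playlist_requester so the sketch runs without network access
FAKE_PAGES = {
    None: {'items': [{'contentDetails': {'videoId': 'v1'}},
                     {'contentDetails': {'videoId': 'v2'}}],
           'nextPageToken': 'page2'},
    'page2': {'items': [{'contentDetails': {'videoId': 'v3'}}]},
}


def fake_requester(page_token=None):
    return FAKE_PAGES[page_token]


print(collect_all_ids(fake_requester))  # ['v1', 'v2', 'v3']
```

Since playlist_requester takes the page token as its first argument, passing it in place of fake_requester should behave the same way against the real API.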

Getting Video Statistics

Now that we have a list of video IDs stored in listOfVideo_IDs, it’s time to get the information for each video. This involves a pretty complicated list comprehension, which requests the videos in batches of 50. The result is saved to an intermediate variable before being passed to pd.json_normalize(), which turns our list of response dictionaries into a dataframe.

video_response = [ youtube.videos().list(
        part='snippet,contentDetails,statistics',
        id=listOfVideo_IDs[50*i:50*(i+1)]
    ).execute() for i in range(math.ceil(len(listOfVideo_IDs)/50)) ]

df = pd.json_normalize(video_response, 'items')

df.head()

Finally, we have all the videos from this channel, along with each video’s statistics, in a single dataframe.
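
The slicing inside the list comprehension above is the part that is easiest to get wrong, so here it is on its own. This sketch uses made-up IDs and a hypothetical chunk helper; it makes no API calls and only shows how 252 IDs are split into batches of at most 50.

```python
import math


def chunk(ids, size=50):
    """Split a list into consecutive slices of at most `size` items."""
    return [ids[size * i:size * (i + 1)] for i in range(math.ceil(len(ids) / size))]


fake_ids = ['id' + str(n) for n in range(252)]  # made-up IDs, same count as above
batches = chunk(fake_ids)

print(len(batches))               # 6
print([len(b) for b in batches])  # [50, 50, 50, 50, 50, 2]
```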

Conclusion-ish

Now that we have our information in a single dataframe, I think it is a good time to conclude this post, as it’s getting pretty long.

I’m hoping you were able to set up the YouTube Data API v3 app.

Feedback is welcome. Please leave it in the comments below or send me an email! Thank you for reading, and please stay tuned for future posts.

Finally, credit has to go to Daniel Bourke for giving me full permission to go all out with his YouTube data. What a legend.

Here is a working GitHub: https://github.com/ShivanS93/MrBourkeYoutubeStats
