I’m embarking on a journey: learning Python, potentially for data science and machine learning. If you are too, I invite you to join me on my intended learning pathway.
At the time of writing, I’m still working through it.
And it involves reading through three books:
- Automate the Boring Stuff by Al Sweigart
- Python for Data Analysis by Wes McKinney
- Hands-On Machine Learning by Aurélien Géron
We will also run through a little bit of my learning philosophy before we dive into the books. Here we go.
Simple and elegant. Python is a general-purpose programming language. It is simple because it is easy to get started with. It is elegant because it can be taken all the way to professional use.
My first foray into programming involved creating a game on the Casio fx-9750G Plus calculator back in high school. Fast forward a decade, and I have that itch to learn programming.
After the first question of how to program comes the next one: what language to program in?
Instead of learning programming, I spent time researching what would be the best language to learn. I was being a donkey.
The question shouldn’t be what is the best language to start programming in. There is no best. You just have to choose one based on a long-term goal and go with it.
At the moment, I’m interested in data science and machine learning. Learning Python will suffice for this.
Another key is to just start. You will regret not starting sooner more than you will regret which language you picked. So pick one (it doesn’t have to be Python), and let’s go!
Why Three Books?
You might be wondering: where is the university course? The online subscription? Why only three books?
I’m a big fan of simplicity. Three is a good number. Not too many and not too few.
The reason I am going with books is that I have tried online resources and code-along videos.
Online resources are overwhelming.
If I want to learn, all I have to do is open up the book. If it’s online, I have to boot up my computer and find the online resource. I’m exposed to distraction thanks to the vastness of the internet. I end up checking my emails, social media and unreplied messages. No real learning gets done.
Finding my place and referring back is easy with a book. I know where I am in a chapter. Switching between answer and question, or from section to section, is smooth.
Code-along videos are good for small things you need extra explanation for. But relying on a code-along video to learn an entire language isn’t the best approach. I end up watching the person code instead of soaking in and applying what I have learnt.
This is why I prefer just three books. And the order is important as well.
Automate the Boring Stuff by Al Sweigart
This is a great introductory book to Python, aimed at those who have no real programming experience. That includes me, as I’m not going to count my calculator programming. You can read the book for free on Al Sweigart’s website.
The book is split into two parts. The first teaches you the basics of Python. This includes how to install Python onto your machine and the very basics of programming.
The second covers how to use Python in real-life situations.
Sweigart offers simple explanations of how the programs work. Then, there are code-along projects, which he runs through step-by-step. An effective strategy is to follow along and type out each line of code instead of copying and pasting or simply glancing over it.
In addition to this, at the end of each chapter are questions to answer. These test your knowledge of the content. Following most of these questions are some projects, which you will have to do by yourself.
This is the most important thing to do.
You can’t learn Python passively. You have to learn by doing. By doing these projects, you are applying your knowledge and working on actual problems.
Learning by doing, or project-based learning, is regarded as one of the best ways to learn coding. The major hurdle is that you don’t know what you can do. Your current skill limits which problems you can tackle. However, Sweigart has laid out some problems for you to solve at the appropriate skill level.
Once you have completed this book, you will have a good skill base to tackle your own projects. You can now branch out and do whatever you want with Python. It doesn’t have to be data science or machine learning.
But if you want to continue, then on to the next book.
Python for Data Analysis by Wes McKinney
The important Python libraries for data analysis include:
- NumPy
- pandas
- Matplotlib
McKinney, who is one of the core developers of pandas, covers these libraries in extensive detail. In addition to this, he introduces us to Jupyter, an environment to do your coding in.
This book is more of a code-along with detailed explanations of using Python for data analysis. There are not many projects in this book. But your curiosity must be piqued by now.
You can easily learn from this book and then apply it to some projects you have in mind. That is what I did for my YouTube analysis.
Let’s talk a little bit about the libraries.
NumPy is a powerful library that allows for quick mathematical operations, especially concerning arrays. Here is a very simple example:
We have used NumPy to create an array of numbers ranging from 0 to 99,999. We have done the same with a Python list.
We then multiply the NumPy array by 2. We perform the same operation on the Python list by using a list comprehension.
The NumPy version takes a fraction of the time of the pure Python version. This is one example where NumPy is faster, but it also offers far more functionality.
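The comparison described above can be sketched roughly as follows. This is a minimal reconstruction rather than the original notebook cell: the variable names and the use of the standard-library timeit module are my own assumptions, and the exact timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 100_000
numbers_array = np.arange(n)   # NumPy array holding 0 to 99,999
numbers_list = list(range(n))  # the same values in a plain Python list

# Double every value: one vectorised operation versus a list comprehension.
doubled_array = numbers_array * 2
doubled_list = [x * 2 for x in numbers_list]

# Time 20 repetitions of each approach (timings vary by machine).
numpy_seconds = timeit.timeit(lambda: numbers_array * 2, number=20)
list_seconds = timeit.timeit(lambda: [x * 2 for x in numbers_list], number=20)
print(f"NumPy: {numpy_seconds:.4f} s, list comprehension: {list_seconds:.4f} s")
```

Both approaches produce the same values. The difference is that NumPy performs the multiplication in compiled code over the whole array at once instead of looping in Python.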
Let’s take a look at pandas. This library allows quick wrangling of your data. Here is an example:
The first cell uses the %run magic command. This runs a separate Python script, getCovidData.py. Using web scraping with beautifulsoup4, a technique learnt from Automate the Boring Stuff, this script downloads the COVID cases from the Ministry of Health, New Zealand website as an Excel file.
pandas has a special function, read_excel(), which can open the Excel file and load it into an object called a DataFrame. With a DataFrame we can perform simple data transformations.
One example looks at counting all the cases per day.
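As a rough sketch of that per-day count, the snippet below builds a tiny DataFrame by hand instead of downloading the real spreadsheet. The column names (report_date, case_id) and the values are made up for illustration and are not the Ministry of Health's actual schema. With a real file, the DataFrame would come from pd.read_excel() instead.

```python
import pandas as pd

# Made-up stand-in for the downloaded spreadsheet: one row per case.
# With a real Excel file this DataFrame would come from pd.read_excel(...).
cases = pd.DataFrame(
    {
        "report_date": pd.to_datetime(
            [
                "2020-03-28",
                "2020-03-28",
                "2020-03-29",
                "2020-03-30",
                "2020-03-30",
                "2020-03-30",
            ]
        ),
        "case_id": [1, 2, 3, 4, 5, 6],  # hypothetical identifier column
    }
)

# Group the rows by date and count how many fall on each day.
# groupby sorts the dates, so the result is in chronological order.
cases_per_day = cases.groupby("report_date").size().rename("cases")
print(cases_per_day)
```

The result is a pandas Series indexed by date, which is a convenient shape for plotting as a time series.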
This moves us on to Matplotlib, which can be used to visualise data.
In the example where we have the confirmed COVID cases per day in New Zealand, we can plot this as a time-series graph.
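A minimal sketch of such a time-series plot is shown below. The daily counts are made-up placeholder values, not the real New Zealand figures, and the labels and figure size are simply my own choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily counts standing in for the real cases-per-day data.
dates = pd.date_range("2020-03-28", periods=7, freq="D")
cases_per_day = pd.Series([2, 1, 3, 5, 4, 6, 3], index=dates, name="cases")

# Plot the counts against their dates as a simple line chart.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(cases_per_day.index, cases_per_day.values, marker="o")
ax.set_title("Confirmed cases per day (made-up data)")
ax.set_xlabel("Date")
ax.set_ylabel("Cases")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap

# In a Jupyter notebook the figure renders inline; in a script, call plt.show().
```

Because the x-axis values are real dates, Matplotlib spaces the points by time instead of treating them as evenly spaced labels.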
These libraries are foundational for the next book on the list.
Hands-On Machine Learning by Aurélien Géron
The final book in the series involves using principles in machine learning to build models.
This book requires some Python proficiency as well as knowledge of the libraries NumPy, pandas and Matplotlib. This is why we worked through Automate the Boring Stuff first, followed by Python for Data Analysis.
Géron, in his book, runs through a variety of machine learning implementations in a project-based learning format. There is a good balance of explanations and practical use.
He discusses the main libraries to get you started with machine learning. These include:
- Scikit-Learn
- Keras
- TensorFlow
I do not have a strong grasp of these libraries yet. So please stay tuned as we learn together.
Programming in Python and getting into data science and machine learning is going to be one long journey. Thankfully, this has been simplified into three books to get you started on that journey.
The approach we want to take is to do projects. Projects make you actively apply what has been learnt rather than soaking it all in and forgetting.
Automate the Boring Stuff by Al Sweigart is a great introduction to Python. Covering the basics of Python as well as some interesting use cases, Sweigart tests you with some projects to get you really learning.
Python for Data Analysis by Wes McKinney provides the excellent explanations needed to understand the libraries underpinning machine learning and data science: NumPy, pandas and Matplotlib. This provides a good foundation for the next book.
If you decide to embark on this same journey, then I wish you the best of luck.
Once again, I hope you found this useful. If you think anyone else would, please share this with them.