Recommended Textbooks and Links

Andrew Ng’s material.

Andrew Ng is one of the giants in machine learning. He has written numerous very important papers, and is one of the founders of Coursera.

He has made many videos on machine learning in various formats. He is a far better lecturer than me, and his lectures will be of much higher quality than anything I can record. Some of his lectures are on a whiteboard and some of them are him talking over his slides. I prefer him talking over slides, and the playlist on YouTube can be found here.

Somebody has put the slides that he used here, and here you can find notes on the lectures. The following GitHub repository has his course notes, although they are a bit too advanced for this course.

Other Online Lectures

If you find some particularly good online lectures, then please email them to me.

So far:

  • There is a good channel, DataCamp, that has lots of machine learning videos with some real Python examples that illustrate many of the concepts in this course.

Machine Learning Terminology

There are a lot of different terms in machine learning that refer to the same thing. A good resource is

Text Books and other material.

All the material in this course is standard material. There are a lot of books on machine learning, some more theoretical and some more practical. It is difficult to recommend a book that fits the background for all students on this course. Some books are too mathematical and some books are too practical.

If you buy one book then I recommend

The Hundred-Page Machine Learning Book.

It can be found here. You can buy an electronic version of the book for as little as $20. Unfortunately the book is not available in the university library. If you follow the link above you can access individual chapters of the book. In the lecture notes I will give references to the book. The book, being a little over 100 pages (141), does not go into all the topics in detail, and omits some background material that is necessary for some students. My advice is to start there, and move on to other references for more background or more detail.

Other Books and resources

In the course we are going to use Python and scikit-learn for a lot of the assignments. The scikit-learn user guide not only contains a lot of examples, but also a lot of the background mathematics. I strongly recommend that you dive into scikit-learn’s website. You are also going to use pandas to parse input data, numpy for fast manipulation and processing of arrays, and matplotlib to plot data. All these libraries have extensive documentation with examples.
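As a rough illustration of how these libraries fit together, here is a minimal sketch. The toy data, the column names `hours` and `score`, and the helper `fit_line` are invented for this example and are not taken from the course material; it assumes that pandas, numpy and scikit-learn are already installed.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data; in practice you would read a real CSV file instead.
CSV_TEXT = """hours,score
1,52
2,54
3,56
4,58
"""


def fit_line(csv_text):
    """Parse CSV text with pandas and fit a straight line with scikit-learn."""
    frame = pd.read_csv(io.StringIO(csv_text))
    X = frame[["hours"]].to_numpy(dtype=float)  # 2-D feature matrix
    y = frame["score"].to_numpy(dtype=float)    # 1-D target vector
    model = LinearRegression().fit(X, y)
    return float(model.coef_[0]), float(model.intercept_)


slope, intercept = fit_line(CSV_TEXT)
print(f"score ~ {slope:.2f} * hours + {intercept:.2f}")
```

pandas does the parsing, numpy holds the arrays, and scikit-learn fits the model; matplotlib would typically be used afterwards to plot the data and the fitted line.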

Technical assistance

  • How to run Python notebooks locally.

There are a lot of books on machine learning. Some are very theoretical and go into great depth, and some contain a lot of hands-on material that you can use in your own projects. The following list contains some books that I have found useful, together with links to the online text at the university library. The university library has a lot of online resources on machine learning, as well as access to O’Reilly’s book catalogue.

I used to put up direct links to the books at the University Library, but these links are never stable. To find the books below, search for them. You can access the contents of books by logging in via CAS. The system is a bit flaky, and sometimes you might have to refresh the book’s page for the system to notice that you have logged on.

More theoretical books

More practical books

Subsections of Recommended Textbooks and Links

Creating and using Python Virtual Environments

Windows users

These instructions are mostly for Linux or macOS. If you are on Windows there are other options such as Anaconda. You may also have problems integrating with various Windows IDEs. If you are using Windows then maybe these instructions are useful.

In a lot of my courses I encourage students to use Python virtual environments. Virtual environments are a great way of making sure that you have the correct versions of packages installed. This is a very short cheat sheet on how to set them up. I will assume that we are using Python 3. Luckily, Python 3 has virtual environments built in. It is all in the documentation, but sometimes people are too lazy to google, or do not know what to google for.

To create a virtual environment in the current directory do the following

python3 -m venv env

If you look in the directory you’ll see a sub-directory env. This contains all the files that drive the virtual environment. In particular there is a sub-directory env/bin/

Justins-MacBook-Air-2:Example justin$ ls env/bin/ 
Activate.ps1		easy_install		pip3.9
activate		easy_install-3.9	python
activate.csh		pip			python3
activate.fish		pip3			python3.9

The most important file is the activate script. To start your virtual environment you simply do

 source ./env/bin/activate

This executes the activate script. Notice that your prompt has changed to (env)
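If you are unsure whether the environment is really active, one way to check from inside Python is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch; the helper name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter is running inside a virtual environment.

    Inside an environment created with venv, sys.prefix points at the
    environment directory while sys.base_prefix points at the Python
    installation that the environment was created from.
    """
    return sys.prefix != sys.base_prefix


print("Inside a virtual environment:", in_virtualenv())
print("Interpreter in use:", sys.executable)
```

Run it with the `python` that is on your path after activation; the interpreter path it prints should then lie inside the env directory.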

You are now free to install whatever packages you want. For example

pip3 install pandas
pip3 install numpy

Once you have done all your python goodness you should leave your virtual environment.

(env) Justins-MacBook-Air-2:Example justin$ deactivate 
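The creation step can also be driven from Python itself using the standard library's `venv` module, which is what `python3 -m venv` runs. The following is only a sketch: the helper name `create_env` is invented for this example, and it builds the environment in a throwaway temporary directory without pip so that it runs quickly and does not touch your project.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return the path of its config file."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"


# Demonstration in a temporary directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    config_file = create_env(Path(tmp) / "env", with_pip=False)
    print("Created:", config_file.exists())
```

For everyday use the command-line form is simpler; the Python form is mainly useful inside set-up scripts.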

Setting up Python notebooks locally for Machine Learning

While there are many options for hosting Python notebooks remotely for free, I prefer to do things locally. If you wish to do the same then please read on. Note that I did have some problems installing scikit-image on a new Apple machine with M1 silicon, but after upgrading to the latest version of the operating system and updating my Homebrew setup everything works. Note that these instructions are for using pip. If you are using an alternative package manager such as Anaconda then you are on your own. I know nothing about setting up Python on a Windows machine.

Setting up the Virtual Environment

First, it is better to set up a virtual environment (see Creating and using Python Virtual Environments). In a suitable place on your machine do:

python3 -m venv env
source ./env/bin/activate

Note that you might be prompted to upgrade pip. If you are, then follow the instructions. You are only upgrading the version in your local virtual environment.

Installing the relevant packages

When your prompt changes to (env) you are inside your virtual environment. You can now use pip to install packages. For reasons that I’m not going to explain, your life is made a little easier if you install wheel

pip install wheel

You can now install various useful packages for machine learning

pip install numpy scipy matplotlib scikit-learn pandas scikit-image treelib nltk

This might take a while. The package jupyter drives the notebooks, so install that

pip install jupyter
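Once the installs have finished, a quick way to confirm that everything is importable is to ask Python to locate each package without actually importing it. This is a sketch under a few assumptions: the helper `missing_packages` is invented for this example, and the list uses import names, which differ from the pip names in some cases (scikit-learn is imported as `sklearn`, scikit-image as `skimage`, and `jupyter_core` is used here as a stand-in for the jupyter installation).

```python
from importlib.util import find_spec

# Import names for the packages installed above (assumed mapping, see the text).
IMPORT_NAMES = [
    "numpy", "scipy", "matplotlib", "sklearn", "pandas",
    "skimage", "treelib", "nltk", "jupyter_core",
]


def missing_packages(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(IMPORT_NAMES)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it with the virtual environment activated; anything it lists as missing can be installed with pip as described above.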

Emacs key bindings

I have been using the editor Emacs for over 30 years. The key bindings are hardwired into my brain. If you are like me then installing the jupyter-emacskeys package will make your life a little less frustrating.

pip install jupyter-emacskeys

Launching notebooks

You talk to Python notebooks using the web browser. When you are running them locally you start your own little server

jupyter notebook

If you have things configured correctly then a web browser might open up. If not, then look at the output on the terminal and you will see something like:

[I 09:47:31.185 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 09:47:31.216 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///Users/justin/Library/Jupyter/runtime/nbserver-49899-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=83e4ad6deece18a89f8c244f28bb3268d115a4a047764f36
     or http://127.0.0.1:8888/?token=83e4ad6deece18a89f8c244f28bb3268d115a4a047764f36

So in this case you can point your browser at http://127.0.0.1:8888/?token=83e4ad6deece18a89f8c244f28bb3268d115a4a047764f36 and you are ready to go.
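To see what the URL is made of, it can be pulled apart with the standard library. This is purely illustrative; the helper name `describe_notebook_url` is made up, and the URL is the one from the example output above.

```python
from urllib.parse import parse_qs, urlsplit


def describe_notebook_url(url):
    """Split a notebook URL into the host, the port and the access token."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,                   # the local machine
        "port": parts.port,                       # the port the server listens on
        "token": query.get("token", [None])[0],   # secret that grants access
    }


info = describe_notebook_url(
    "http://127.0.0.1:8888/?token=83e4ad6deece18a89f8c244f28bb3268d115a4a047764f36"
)
print(info)
```

The host 127.0.0.1 (or localhost) is your own machine, and the token changes each time the server is started, so always copy the URL from your own terminal output.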

Using Anaconda with Windows.

Disclaimer

These instructions come with no warranty. I do not use Windows. If you have any questions about Windows then I cannot answer them. If you have any constructive comments then please contact me.

Anaconda is a distribution of Python and R that includes many popular data science packages, such as Jupyter Notebook, NumPy, pandas, and scikit-learn. It is suggested for working with Jupyter Notebook for several reasons:

  1. Ease of installation: Anaconda comes with a built-in package manager called conda, which makes it easy to install and manage packages, including Jupyter Notebook. Anaconda also automatically manages dependencies between packages, so you do not have to worry about conflicts between different versions of packages.

  2. Graphical interface: Anaconda includes Anaconda Navigator, a graphical user interface (rather than a full IDE) that provides a user-friendly way of managing your Python environment and launching Jupyter Notebook.

  3. Large collection of pre-installed packages: Anaconda comes with a large collection of pre-installed packages, which can save you a lot of time compared to installing each package individually.

  4. Virtual environment management: Anaconda comes with conda which is a powerful virtual environment manager that allows you to create and manage multiple isolated environments, each with its own set of packages. This can help you to avoid conflicts between packages and maintain reproducibility of your projects.

Now that we know the advantages of using Anaconda, we proceed with a detailed explanation of how to install Anaconda and subsequently open Jupyter notebooks through it. There are two options to do this.

First option

  1. First, download the Anaconda distribution for Windows from https://www.anaconda.com/products/distribution.

  2. Once the download is complete, double-click the installer file to begin the installation process.

  3. Follow the prompts in the installer to complete the installation. During this process, you will be prompted to select the installation location. By default, Anaconda will be installed in the C:\ProgramData\Anaconda3 folder, but you can choose a different location if you prefer. The next screen is the Advanced Installation Options. Here you have some additional options, such as whether to add Anaconda to your PATH variable and whether to register Anaconda as your default Python. Make sure you check the “Add Anaconda to my PATH environment variable” option; it will make your life easier.

  4. Once the installation is complete, open the Start menu and search for “Anaconda Prompt”. Click on the Anaconda Prompt shortcut to open a command prompt window.

  5. In the command prompt window, type the following command to verify that Anaconda is installed correctly: conda --version

  6. Next, we need to create a virtual environment in which to install Jupyter Notebook. In the Anaconda prompt, type the following command: conda create -n env_name, replacing env_name with the name you want to give to your virtual environment.

  7. After the environment is created, activate it by typing the following command: conda activate env_name

  8. Once the virtual environment is active, we can install Jupyter Notebook by typing the following command: conda install jupyter

  9. After Jupyter Notebook is installed, we can start the Jupyter Notebook server by typing the following command in the Anaconda prompt: jupyter notebook

  10. This will open a web browser window with the Jupyter Notebook interface. From here, you can create new notebooks, open existing notebooks, and run code cells.
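If one of the commands in the steps above is reported as not recognised, it can help to check whether the tools are actually on your PATH. The following sketch does this from Python with the standard library; the helper name `tools_on_path` is invented for this example, and I have not tested it on Windows.

```python
import shutil


def tools_on_path(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, location in tools_on_path(["conda", "jupyter"]).items():
    print(f"{name}: {location or 'not found on PATH'}")
```

Run it from the same prompt in which you type the conda commands, since the PATH can differ between the Anaconda Prompt and an ordinary command prompt.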

This is a step-by-step explanation of how to install Anaconda and how to open Jupyter Notebook through the Anaconda Prompt. However, the Anaconda installation also comes with Anaconda Navigator. This is a graphical user interface (GUI) that allows you to easily manage your Anaconda environments and packages, as well as launch Jupyter Notebook and other applications. Let us see how to open Jupyter Notebook with Anaconda Navigator, which is more intuitive than the first option.

Second option

  1. Open the Anaconda Navigator application. You can find it by searching for “Anaconda Navigator” in the Start menu (Windows) or Spotlight (Mac).

  2. Once the Anaconda Navigator window is open, you should see Jupyter Notebook in the list of applications. Click on the “Launch” button next to Jupyter Notebook.

  3. A new tab will open in your default web browser, and you will be taken to the Jupyter Notebook interface.

  4. You should see the list of files and directories in the current working directory. You can navigate to the desired folder and create a new notebook by clicking the “New” button in the top right corner and selecting “Python 3”.