Recommended Books and other study material
All the material in this course is standard material. There are a lot
of books on machine learning, some more theoretical and some more
practical. It is difficult to recommend a book that fits the
background of all students on this course. Some books are too
mathematical and some books are too practical. The course has two
recommended textbooks. One is more theoretical, and one is more
hands-on with Python. Both of them cover most of the material in the course.
Course Text Book: The Hundred-Page Machine Learning Book
It can be found here. You can buy an
electronic version of the book for as little as $20. Unfortunately the
book is not available in the university library. If you follow the link
above you can access individual chapters of the book. In the lecture
notes I will give references to the book. At a little over 100 pages
(141), the book does not go into all the topics in detail, and it omits
some background material that is necessary for some students. My
advice is to start there, and move on to other references for more
background or more detail.
Course Text Book: A Hands-On Introduction to Machine Learning
This is a much more practical book that has lots of Python code.
Other Books and resources
In the course we are going to use Python and scikit-learn for a lot of
the assignments. The user guide not only contains a lot of examples,
but also a lot of the background mathematics. I strongly recommend
that you dive into scikit-learn’s website. You are also going to use
pandas to parse input data, numpy for fast manipulation and processing
of arrays, and matplotlib to plot data. All these libraries have
extensive documentation with examples.
Andrew Ng’s material.
Andrew Ng is one of the giants in machine learning. He has written
numerous very important papers, and is one of the founders of Coursera.
He has made videos on machine learning in various formats. He is a
far better lecturer than me, and his lectures will be of much higher
quality than I can record. Some of his lectures are on a whiteboard
and some of them are him talking over his slides. I prefer him talking
over slides, and the playlist on YouTube can be found here.
Somebody has put the slides that he used here, and here you can find
notes on the lectures. The following GitHub repository has his course
notes, although they are a bit too advanced for this course.
Other Online Lectures
If you find some particularly good online lectures, then please email
them to me.
So far:
- There is a good channel, DataCamp, that has lots of machine learning
videos with some real Python examples that illustrate many of the
concepts in this course.
Machine Learning Terminology
There are a lot of different terms in machine learning that refer to
the same thing. A good resource is
Other recommended text books
There are a lot of books on machine learning. Some are very
theoretical and go into great depth, and some contain a lot of
hands-on material that you can use in your own projects. The following
list contains some books that I have found useful, together with links
to the online text at the university library. The university library
has a lot of online resources on machine learning, as well as access to
O’Reilly’s book catalogue.
I used to put up direct links to the books at the University Library,
but these links are never stable. To find the books below, search for
them. You can access the contents of books by logging in via
CAS. The system is a bit flaky, and sometimes you
might have to refresh the book’s page for the system to notice that
you have logged on.
More theoretical books
More practical books
Creating and using Python Virtual Environments
Windows users
These instructions are mostly for Linux or macOS. If you are on Windows
there are other options such as Anaconda. You may
also have problems integrating with various Windows IDEs. If you
are using Windows then maybe these instructions are useful.
In a lot of my courses I encourage students to use Python virtual
environments. Virtual environments are a great way of making sure
that you have the correct version of packages installed.
This is a very short cheat sheet on how to set them up. I
will assume that we are using Python 3. Luckily Python 3 has virtual
environments built in. It is all in the
documentation, but then
sometimes people are too lazy to google, or do not know what to google
for.
To create a virtual environment in the current directory do the
following
python3 -m venv env
If you look in the directory you’ll see a sub-directory env. This
contains all the files that drive the virtual environment. In particular
there is a sub-directory env/bin/
Justins-MacBook-Air-2:Example justin$ ls env/bin/
Activate.ps1 easy_install pip3.9
activate easy_install-3.9 python
activate.csh pip python3
activate.fish pip3 python3.9
The most important file is the activate script. To start your
virtual environment you simply do
source ./env/bin/activate
This executes the activate script. Notice that your prompt has changed
to (env)
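The changed prompt is only a visual hint. As a minimal sketch (the function name in_virtual_environment is a hypothetical helper, not something from these notes), you can also ask Python itself whether it is running from a virtual environment. It relies on the standard library values sys.prefix and sys.base_prefix, which differ only inside a virtual environment.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment
    # directory, while sys.base_prefix still points at the Python
    # installation the environment was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtual_environment())
```

If this prints False after you have sourced the activate script, the python on your PATH is probably not the one in env/bin/.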
You are now free to install whatever packages you want. For example
pip3 install pandas
pip3 install numpy
Once you have done all your python goodness you should leave your
virtual environment.
(env) Justins-MacBook-Air-2:Example justin$ deactivate
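The same kind of environment can also be created from Python, because the command line tool is a thin wrapper around the standard library module venv. The following is a sketch under stated assumptions: the helper names create_env and scripts_dir are hypothetical, the environment is built in a temporary directory so nothing is left behind, and with_pip=False is used to keep it quick (the command line version installs pip by default).

```python
import os
import tempfile
import venv
from pathlib import Path


def create_env(parent: Path, name: str = "env") -> Path:
    """Create a virtual environment called `name` inside `parent`."""
    env_dir = parent / name
    # with_pip=False skips bootstrapping pip, which keeps this sketch fast.
    venv.create(env_dir, with_pip=False)
    return env_dir


def scripts_dir(env_dir: Path) -> Path:
    """Return the directory that holds the activate scripts."""
    # POSIX systems use bin/, Windows uses Scripts/.
    return env_dir / ("Scripts" if os.name == "nt" else "bin")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_env(Path(tmp))
        # List the generated scripts, similar to `ls env/bin/`.
        for entry in sorted(scripts_dir(env_dir).iterdir()):
            print(entry.name)
```

For day to day use the shell commands in this section are simpler. The sketch is mainly useful for seeing that the env directory is just files on disk.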
Setting up Python notebooks locally for Machine Learning
While there are many options for hosting Python notebooks
remotely for free, I prefer to do things locally. If you wish to do
the same then please read on. Note that I did have some problems
installing scikit-image on a new Apple machine with M1 silicon, but
after upgrading to the latest version of the operating system and
updating my Homebrew setup everything works. Note that these
instructions are for using pip. If you are using an alternative
package manager such as Anaconda then you are on your own. I know
nothing about setting up Python on a Windows machine.
Setting up the Virtual Environment
First it is better to set up a virtual environment (see Creating and
Using Virtual Environments in Python). In a suitable place on your
machine do:
python3 -m venv env
source ./env/bin/activate
Note that you might be prompted to upgrade pip. If you are, then
follow the instructions. You are only upgrading the version in your
local virtual environment.
Installing the relevant packages
When your prompt changes to (env) you are inside your virtual
environment. You can now use pip to install packages. For reasons
that I’m not going to explain your life is made a little easier if you
install wheel
pip install wheel
You can now install various useful packages for machine learning
pip install numpy scipy matplotlib scikit-learn pandas scikit-image treelib nltk
This might take a while. The package jupyter drives the
notebooks, so install that
pip install jupyter
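Once the installs finish, it can be useful to confirm what actually landed in the environment. The following is a minimal sketch rather than part of the original instructions. It uses the standard library module importlib.metadata (Python 3.8 or later) to look up each distribution name mentioned in this section. The names PACKAGES and installed_versions are hypothetical and chosen for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they are passed to pip in this section.
PACKAGES = ["numpy", "scipy", "matplotlib", "scikit-learn", "pandas",
            "scikit-image", "treelib", "nltk", "jupyter"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:15} {ver if ver is not None else 'NOT INSTALLED'}")
```

Run it with the virtual environment activated. Otherwise it reports on whichever Python happens to be first on your PATH.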
Emacs key bindings
I have been using the editor Emacs for over 30 years. The
key bindings are hardwired into my brain. If you are like me then
installing the jupyter-emacskeys package will make your life a
little less frustrating.
pip install jupyter-emacskeys
Launching notebooks
You talk to Python notebooks using the web browser. When you are
running them locally you start your own little server
jupyter notebook
If you have things configured then a web browser might open up. If not
then look at the output on the terminal and you will see something
like:
[I 09:47:31.185 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 09:47:31.216 NotebookApp]
To access the notebook, open this file in a browser:
file:///Users/justin/Library/Jupyter/runtime/nbserver-49899-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=83e4ad6deece18a89f8c244f28bb3268d115a4a047764f36
or http://127.0.0.1:8888/?token=83e4ad6deece18a89f8c244f28bb3268d115a4a047764f36
So in this case you can point your browser at
http://127.0.0.1:8888/?token=83e4ad6deece18a89f8c244f28bb3268d115a4a047764f36
and you are ready to go.
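If it helps to see what that URL is made of, here is a small sketch that pulls the notebook URLs out of the terminal output and splits one into host, port and token. It assumes the output looks like the sample above. The names SAMPLE_OUTPUT, notebook_urls and describe are hypothetical, and only the standard library modules re and urllib.parse are used.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Two lines copied from the sample server output in this section.
SAMPLE_OUTPUT = """\
http://localhost:8888/?token=83e4ad6deece18a89f8c244f28bb3268d115a4a047764f36
or http://127.0.0.1:8888/?token=83e4ad6deece18a89f8c244f28bb3268d115a4a047764f36
"""


def notebook_urls(output):
    """Return every http(s) URL carrying a token parameter found in `output`."""
    return re.findall(r"https?://\S+\?token=\w+", output)


def describe(url):
    """Split a notebook URL into its host, port and login token."""
    parts = urlsplit(url)
    token = parse_qs(parts.query)["token"][0]
    return {"host": parts.hostname, "port": parts.port, "token": token}


if __name__ == "__main__":
    for url in notebook_urls(SAMPLE_OUTPUT):
        print(describe(url))
```

The token acts like a password for your local server, so treat the full URL as private.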
Using Anaconda with Windows.
Disclaimer
These instructions come with no warranty. I do not use Windows. If you
have any questions about Windows then I cannot answer them. If you
have any constructive comments then please contact me.
Anaconda is a distribution of Python and R that includes many popular data science packages, such as Jupyter Notebook, NumPy, pandas, and scikit-learn. It is suggested for working with Jupyter Notebook for several reasons:
- Ease of installation: Anaconda comes with a built-in package
manager called conda, which makes it easy to install and manage
packages, including Jupyter Notebook. Anaconda also automatically
manages dependencies between packages, so you do not have to worry
about conflicts between different versions of packages.
- Graphical interface: Anaconda includes Anaconda Navigator, which
provides a user-friendly interface for managing your Python
environment and launching Jupyter Notebook.
- Large collection of pre-installed packages: Anaconda comes with
a large collection of pre-installed packages, which can save you a
lot of time compared to installing each package individually.
- Virtual environment management: conda is also a powerful virtual
environment manager that allows you to create and manage multiple
isolated environments, each with its own set of packages. This can
help you to avoid conflicts between packages and maintain the
reproducibility of your projects.
Now that we know the advantages of using Anaconda, we proceed with a
detailed explanation of how to install Anaconda and then open
Jupyter notebooks through Anaconda. There are two options to do this.
First option
First, download the Anaconda distribution for Windows from
https://www.anaconda.com/products/distribution.
Once the download is complete, double-click the installer file to
begin the installation process.
Follow the prompts in the installer to complete the
installation. During this process, you will be prompted to select
the installation location. By default, Anaconda will be installed
in the C:\ProgramData\Anaconda3 folder, but you can choose a
different location if you prefer. The next screen is the Advanced
Installation Options. Here you have some additional options, such as
whether to add Anaconda to your PATH variable and whether to
register Anaconda as your default Python. Make sure you check the
“Add Anaconda to my PATH environment variable” option; it will make
your life easier.
Once the installation is complete, open the Start menu and search
for “Anaconda Prompt”. Click on the Anaconda Prompt shortcut to
open a command prompt window.
In the command prompt window, type the following command to verify
that Anaconda is installed correctly: conda --version
Next, we need to create a virtual environment in which to install
Jupyter Notebook. In the Anaconda prompt, type the following
command: conda create -n env_name , replacing env_name with the
name you want to give to your virtual environment.
After the environment is created, activate it by typing the
following command: conda activate env_name
Once the virtual environment is active, we can install Jupyter
Notebook by typing the following command: conda install jupyter
After Jupyter Notebook is installed, we can start the Jupyter
Notebook server by typing the following command in the Anaconda
prompt: jupyter notebook
This will open a web browser window with the Jupyter Notebook
interface. From here, you can create new notebooks, open existing
notebooks, and run code cells.
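To confirm that a notebook (or any Python session) is really running inside the environment you activated in the steps above, a sketch along these lines can be pasted into a code cell. It assumes that conda activate has exported the CONDA_DEFAULT_ENV environment variable, which holds the environment name (or its path for environments created by location). The function name active_conda_env is hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if there is none."""
    # Read the real process environment unless a mapping is passed in.
    if environ is None:
        environ = os.environ
    # `conda activate env_name` sets CONDA_DEFAULT_ENV for the session.
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("conda environment:", active_conda_env())
    print("interpreter:", sys.executable)
```

If the printed interpreter path is not inside your environment's folder, Jupyter was probably started from a different environment.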
This is a step by step explanation of how to install Anaconda and how
to open Jupyter Notebook through the Anaconda prompt. However, the
installation of Anaconda also comes with Anaconda Navigator. This is a
graphical user interface (GUI) that allows you to easily manage your
Anaconda environments and packages, as well as launch Jupyter Notebook
and other applications. Let us see how to open Jupyter Notebook with
Anaconda Navigator (more intuitive than the first option).
Second option
Open the Anaconda Navigator application. You can find it by
searching for “Anaconda Navigator” in the Start menu (Windows) or
Spotlight (Mac).
Once the Anaconda Navigator window is open, you should see Jupyter
Notebook in the list of applications. Click on the “Launch” button
next to Jupyter Notebook.
A new tab will open in your default web browser, and you will be
taken to the Jupyter Notebook interface.
You should see the list of files and directories in the current
working directory. You can navigate to the desired folder and
create a new notebook by clicking the “New” button in the right
corner and selecting “Python 3”.