Lecture 7: Clustering and Nearest Neighbours

Today’s Topics

In this segment you will look at several unsupervised algorithms, as well as one supervised learning method (K-nearest neighbours). The main unsupervised algorithms are hierarchical clustering, K-means clustering and DBSCAN. It is important not to confuse K-nearest neighbours with K-means clustering.
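To help keep the two apart, here is a minimal sketch of K-nearest neighbours as a classifier. It is only an illustration: the function name `knn_predict`, the default `k=3` and the use of squared Euclidean distance are my assumptions, not something taken from the slides. Note that it needs labelled training data and has no training loop at all, whereas K-means takes unlabelled data and iterates.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    training points (hypothetical helper, squared Euclidean distance)."""
    # Sort the labelled training examples by distance to the query point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    # Count the labels of the k closest examples and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy data: two labelled groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.2, 0.3)))
print(knn_predict(points, labels, (5.5, 5.5)))
```

The K here is the number of neighbours that vote, not a number of clusters.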

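The K-means algorithm works in a similar way to gradient descent: each iteration assigns every point to its nearest centroid and then moves each centroid to the mean of its points, and neither step can increase the cost function. Unlike many of the algorithms that we have been looking at, K-means often gets stuck in one of many local minima. In Andrew Ng’s lectures you will meet various ways of dealing with local minima.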
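The local-minimum problem can be seen on a tiny example. The sketch below is a plain-Python version of the usual K-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function names, the explicit `init_centroids` argument and the random-restart helper are assumptions made for illustration; random restarts are one common way of dealing with local minima, not necessarily the one used in the lectures.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, max_iters=100):
    """One run of K-means from the given starting centroids.
    Returns (centroids, labels, cost)."""
    centroids = list(init_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Cost: sum of squared distances from points to their assigned centroid.
    cost = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, cost


def kmeans_restarts(points, k, n_restarts=10, seed=0):
    """Run K-means from several random starts and keep the cheapest result."""
    rng = random.Random(seed)
    runs = (kmeans(points, rng.sample(points, k)) for _ in range(n_restarts))
    return min(runs, key=lambda run: run[2])


# Four corners of a long thin rectangle.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
bad = kmeans(points, [(0, 0), (0, 1)])    # both starting centroids on the left
good = kmeans(points, [(0, 0), (10, 0)])  # one starting centroid on each side
best = kmeans_restarts(points, k=2)
print(bad[2], good[2], best[2])
```

Starting with both centroids on the left side, the algorithm converges to a top/bottom split with cost 100.0, while starting with one centroid on each side gives the left/right split with cost 1.0. Both are fixed points of the iteration, but only the second is the global minimum.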

If you have taken Algorithms and Data Structures II (1DL231) or AD3 (1DL481), then you will have met the concept of NP-hardness. K-means clustering is NP-hard (see the reference below). This means that no efficient algorithm is known that always finds the optimal clustering, and none exists unless P = NP. If you could guarantee that the cost function had only a single minimum, then a descent method would be an efficient exact algorithm. Since that would contradict NP-hardness, you should expect local minima to occur in K-means clustering in general.
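For reference, the cost function in question is usually written as the within-cluster sum of squared distances. The notation below is an assumption on my part (the slides and the books may use different symbols): $x_1, \dots, x_n$ are the data points, $c_i$ is the cluster that point $i$ is assigned to, and $\mu_k$ is the centroid of cluster $k$.

```latex
J(c, \mu) = \sum_{i=1}^{n} \lVert x_i - \mu_{c_i} \rVert^2,
\qquad c_i \in \{1, \dots, K\}

% For fixed assignments c, the best centroid of cluster k is the mean of its points:
\mu_k = \frac{1}{\lvert \{ i : c_i = k \} \rvert} \sum_{i \,:\, c_i = k} x_i
```

The centroids are easy to optimise once the assignments are fixed. The difficulty lies in the assignments themselves, which are discrete: there are $K^n$ ways to assign $n$ points to $K$ clusters, so checking them all is not feasible for realistic data sets.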

Slides

I used these slides in the lecture.

Reading Guide

K-Means Clustering

The Hundred-Page Machine Learning Book: Chapter 3, Section 3.5, and Chapter 9, all of Section 9.2.
Chapter 7 of A Hands-On Introduction to Machine Learning.
Chapter 6 (6.1 and 6.2) of A First Course in Machine Learning. The link takes you to the electronic copy in the library.
This is only for reference: NP Hardness of K-means clustering. You don’t need to understand the proof, although you should be aware of its implications: unless P = NP, no efficient greedy or gradient-descent-style algorithm for K-means is going to be exact.

K-Nearest Neighbours

The Wikipedia page on K-Nearest Neighbours is a good starting point.

What should I know by the end of this lecture?

What are some of the applications of clustering?
What is hierarchical clustering and what algorithms are there?
How does the K-means algorithm work? What is the cost function?
What is a local optimum and why are local optima a problem for the K-means algorithm?
What are some approaches to choosing the number of clusters in K-means?
How does the K-nearest neighbour algorithm work and what are some of its applications?
What is DBSCAN and how does it work?