Lecture 4: Logistic Regression as Machine Learning
Today’s Topics
Today’s slides.
Logistic Regression
Logistic regression, overfitting and regularisation. Again, logistic regression is an algorithm that comes from statistics, but it can also be seen as a machine learning algorithm. The hypothesis is very similar to that of linear regression: it is a set of values that defines a linear function. The difference is that in logistic regression the output of the linear function is passed through a logistic function, which works as a smooth threshold function. Unlike linear regression, there is no closed-form solution for the parameters, so an iterative method such as gradient descent is necessary.
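As a rough illustration of the hypothesis and of gradient descent, here is a minimal sketch in plain Python. The function names (`sigmoid`, `predict_proba`, `fit`), the toy data, the learning rate and the number of epochs are all assumptions chosen for this example, and the loss is taken to be the average cross-entropy (log) loss.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Hypothesis: a linear function of x passed through the logistic function."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def fit(xs, ys, learning_rate=0.5, epochs=2000):
    """Batch gradient descent on the average cross-entropy (log) loss."""
    n_features = len(xs[0])
    weights, bias = [0.0] * n_features, 0.0
    m = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # For the log loss, the gradient with respect to z is (prediction - label).
            error = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j] / m
            grad_b += error / m
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

# Toy one-dimensional data (assumed): small values are class 0, large values are class 1.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 1, 1]
weights, bias = fit(xs, ys)

# Turn probabilities into 0/1 labels by thresholding at 0.5.
labels = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in xs]
```

The update rule has the same shape as the one for linear regression; only the hypothesis inside `predict_proba` changes.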
There are a lot of ways of thinking about how logistic regression works.
- As a modification of linear regression that outputs $0$ or $1$ values to divide the data set into two parts, or equivalently to find a separating hyperplane between the two classes.
- As an estimator of the probability that a point belongs to one class or the other.
- As a single neuron. You can see logistic regression as the beginning of neural networks.
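These views can be tied together in a short sketch. The weights, bias and input point below are hypothetical values picked for illustration. The linear score is the log-odds, the logistic function turns it into a probability, and thresholding the probability at $0.5$ is the same as asking which side of the hyperplane the point lies on.

```python
import math

def sigmoid(z):
    """Logistic function, also the activation of a single sigmoid neuron."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Logit: log of the odds p / (1 - p); the inverse of the logistic function."""
    return math.log(p / (1.0 - p))

# Hypothetical parameters for a model with two input features.
weights, bias = [2.0, -1.0], 0.5

def linear_score(x):
    """Weighted sum of the inputs plus a bias, as in linear regression."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

x = [1.0, 0.5]
z = linear_score(x)            # 2.0 * 1.0 - 1.0 * 0.5 + 0.5 = 2.0
p = sigmoid(z)                 # estimated probability of class 1
label = 1 if p >= 0.5 else 0   # equivalent to checking whether z >= 0
```

Applying `log_odds` to `p` recovers `z`, which is why the linear part of the model can be read as a log-odds estimate.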
Overfitting and Regularisation
Both linear and logistic regression can be improved with a regularisation term that avoids overfitting. You should try to begin to understand why overfitting is a problem and some strategies for avoiding it.
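One common choice of regularisation term is an L2 penalty on the weights. The sketch below assumes that form, with a hypothetical strength parameter `lam`; textbooks differ on scaling constants and on whether the bias is penalised, so treat the exact formula as an assumption rather than the definition used in the lecture.

```python
def l2_penalty(weights, lam):
    """Regularisation term: lam times the sum of squared weights (bias excluded)."""
    return lam * sum(w * w for w in weights)

def regularised_step(weights, grad_w, learning_rate, lam):
    """One gradient-descent update on (data loss + L2 penalty).

    grad_w is the gradient of the unregularised loss. The penalty adds
    2 * lam * w to each component, which pulls every weight towards zero.
    """
    return [w - learning_rate * (g + 2.0 * lam * w)
            for w, g in zip(weights, grad_w)]

# With a zero data gradient, the update does nothing except shrink the weights.
weights = [4.0, -2.0]
shrunk = regularised_step(weights, grad_w=[0.0, 0.0], learning_rate=0.1, lam=0.5)
```

Large weights let the model bend to fit noise in the training data, so penalising them is one way to discourage overfitting.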
Reading Guide
Logistic Regression
- Hundred-Page Machine Learning Book Chapter 3 section 3.2.
Overfitting and Regularisation
- Hundred-Page Machine Learning Book Chapter 3 section 3.1.2 and Chapter 5 sections 5.4 and 5.5.
Multiclass Classification
- One-vs-Rest and One-vs-One, an excellent article by Jason Brownlee.
Confusion Matrices
- Again the Hundred-Page Machine Learning Book Chapter 5 section 5.6 (but not 5.6.5 or 5.6.4).
What should I know by the end of this lecture?
- What is logistic regression and how does it differ from linear regression?
- What is the cost function? What does the logistic function do?
- How do I implement gradient descent for logistic regression?
- How does logistic regression relate to log-odds, and what is its relationship with probability?
- What is overfitting?
- How does the regularisation term work in linear and logistic regression, and how does it avoid overfitting?
- How do you use a binary classifier for multi-class classification? What is one-vs-all classification?
- What is a confusion matrix?
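For the last two questions, a small sketch may help fix the ideas. It assumes one binary classifier per class (one-vs-all), each returning a probability, and it uses hypothetical class names and classifier outputs. The predicted class is the one whose classifier is most confident, and the confusion matrix counts how often each actual class was predicted as each class.

```python
def one_vs_rest_predict(scores_by_class):
    """Pick the class whose binary classifier is most confident.

    scores_by_class maps each class label to the probability that its
    'this class vs. everything else' classifier assigns to the point.
    """
    return max(scores_by_class, key=scores_by_class.get)

def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical classifier outputs for three points.
outputs = [
    {"cat": 0.9, "dog": 0.2, "bird": 0.1},
    {"cat": 0.3, "dog": 0.6, "bird": 0.4},
    {"cat": 0.7, "dog": 0.1, "bird": 0.5},
]
actual = ["cat", "dog", "bird"]
predicted = [one_vs_rest_predict(s) for s in outputs]
matrix = confusion_matrix(actual, predicted, ["cat", "dog", "bird"])
```

Entries on the diagonal of `matrix` are correct predictions; everything off the diagonal is a mistake, and the row and column show which classes were confused.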