Supervised learning

3.2 What is supervised learning?

Computers are exact, and they can perform billions of well-defined operations accurately and without errors. Writing software consists of specifying precise rules on how to process information.

This, however, stands in stark contrast to the real world, where the concepts and rules are changing, noisy, and even inconsistent. For example, imagine you are trying to create a program for identifying a cat. Now, you might be able to create rules to look for certain patterns or body parts, such as ears, fur, and four legs. But even these sub-concepts are extremely vague. Cats vary in breed, colour, and under different angles, some parts of the body cannot be seen. There is no precise rule. The task seems extremely hard for a computer, yet to teach a child a few pictures would suffice. It is often easier to show what we mean than to describe it. Done properly, machine learning allows us to step away from precise rules, and just show what we want.

Supervised learning captures the idea of learning from examples. In supervised learning, the learning algorithm is given input-output pairs - examples - from which it tries to learn a function mapping inputs to outputs.

The classic example is identifying whether there is a cat or a dog in a given photo. This can be formulated as a supervised learning problem: the inputs would be the photos themselves, and the outputs would be labels (the right answer), cat or dog.

Let's formulate these concepts mathematically. Formally, the inputs to the algorithm are denoted by $x$ and the corresponding outputs as $y$ . Note, $x$ and $y$ are usually high-dimensional vectors or matrices. We also use $X$ and $Y$ as the sets of all possible inputs and outputs, respectively. Provided examples, $D$ , can now be expressed as a set:

D = {(x_{1}, y_{1}), (x_{2}, y_{2}), . . ., (x_{n}, y_{n})}

The main goal of learning is to learn a function $f : X \to Y$ , that maps inputs to the corresponding outputs. Obviously, we would want $f (x_{i}) \approx y_{i}$ for each $(x_{i}, y_{i}) \in D$ . But, it is crucial that for the inputs outside the ones provided as examples the function $f$ still predicts correctly. Otherwise, simple memorisation of $D$ would be considered as learning. We will consider the problem of generalisation again a bit later.

The set of possible outputs, $Y$ , can be finite or infinite. When $Y$ is finite and preferably small, we say that it is a classification problem. Predicting what animal appears in the picture is an example of classification.

Alternatively, we might want to predict a number, or a real vector, in which case it is a regression problem. For example, predicting stock prices, $y$ , given the time of day, $x$ .

The line between classification and regression can get a little blurry, with the same problem being represented in different ways. Algorithms for classification often predict the probability for each class present in $Y$ , and since the probability is a real number it could be considered a regression problem. Output probabilities can also be used as a measure of confidence, with high probability indicating that the algorithm is certain about its prediction.

Pause and reflect

In this section how the learning is done doesn't matter. We focus only on the problem itself, not the algorithm that solves it.

The learning can be done with decision trees, deep learning methods, etc. For now, the actual learning algorithm is best thought of as a black box.

Individual activity: Classification vs. regression

Question 1: We have seen how a classification problem can be solved as a regression problem. Can the reverse be true?

Question 2: Consider the problem of predicting rent prices, where the prices vary from £500-£5000. How would you formulate it as a classification problem?

Reveal

Answer: The prices can be discretised (i.e. the values made discrete as opposed to continuous) into non-overlapping buckets. For instance, £100 can be a separate bucket. The question to consider is how small should a bucket be, as we might want to avoid having too many categories, while at the same time we want to be precise.

Discussion activity: Supervised learning problems

Consider the problem of engineering a self-driving car.

How can this problem be addressed using supervised learning?

Hint: Dividing the problem into smaller chunks can help!

Post your thoughts in the Supervised learning forum to discuss with your classmates and tutor.