Description

Let's try to understand how least squares works in one dimension.

In this example, we are trying to predict $\vec{b}$ based off of $\vec{a}$. However, $\vec{b}$ has components that are not in $\text{span}\{\vec{a}\}$, making it impossible to estimate $\vec{b}$ exactly. Our best estimate is $\vec{b}_1$, the component of $\vec{b}$ that lies in $\text{span}\{\vec{a}\}$. The error between $\vec{b}$ and $\vec{b}_1$ is denoted by $\vec{b}_2$, which is perpendicular to $\vec{a}$, since a perpendicular error vector has the smallest possible norm.

We know that $\vec{b}_1$ must be a scalar multiple of $\vec{a}$, as it is in $\text{span}\{\vec{a}\}$. This means that $\vec{b}_1 = x\vec{a}$ for some scalar $x$. Let's solve for $x$. Since $\vec{b}_2$ is orthogonal to $\vec{a}$, their dot product is $0$. Hence,

$$\vec{b}_2 \cdot \vec{a} = (\vec{b} - \vec{b}_1) \cdot \vec{a} = (\vec{b} - x\vec{a}) \cdot \vec{a} = 0$$

Distributing the dot product gives $\vec{a}^T\vec{b} - x\,\vec{a}^T\vec{a} = 0$. Rearranging this equation to solve for $x$ gives us $x = (\vec{a}^T\vec{a})^{-1}\vec{a}^T\vec{b}$.
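
As a quick check, here is a minimal NumPy sketch of the one-dimensional case. The vectors `a` and `b` are made-up example data; any nonzero `a` works the same way.

```python
import numpy as np

# Made-up example vectors: we predict b from a, so b1 = x * a.
a = np.array([2.0, 0.0])
b = np.array([3.0, 4.0])

# One-dimensional least squares: x = (a^T a)^{-1} a^T b
x = (a @ b) / (a @ a)

b1 = x * a    # component of b in span{a}
b2 = b - b1   # error vector

print(x)       # 1.5
print(b2 @ a)  # 0.0 -- the error is orthogonal to a
```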

This same concept can be applied in higher dimensions as well by simply replacing $\vec{a}$ with a matrix $\textbf{A}$ (and the scalar $x$ with a vector $\vec{x}$), where

$$\textbf{A} = \begin{bmatrix}\vec{a}_1 & \cdots & \vec{a}_n\end{bmatrix}$$

The general form for least squares is therefore

$$\textbf{A}^T\textbf{A}\vec{x} = \textbf{A}^T\vec{b} \quad \text{or} \quad \vec{x} = (\textbf{A}^T\textbf{A})^{-1}\textbf{A}^T\vec{b},$$

where the inverse exists whenever the columns of $\textbf{A}$ are linearly independent.
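
The same computation in NumPy, again with made-up data: `A` is a tall matrix (more equations than unknowns), and we solve the normal equations directly. This is a sketch for illustration; in practice `np.linalg.lstsq` is the more numerically stable choice.

```python
import numpy as np

# Made-up overdetermined system A x ~= b (3 equations, 2 unknowns).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - A x is orthogonal to every column of A.
residual = b - A @ x
print(x)               # [0.667, 0.5]
print(A.T @ residual)  # ~[0, 0]
```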

Note: Refer to the lecture notes to see the general derivation.
