Description

Let's try to understand how least squares works in one dimension.

In this example, we are trying to predict $\vec{b}$ based off of $\vec{a}$. However, $\vec{b}$ has components that are not in $\text{span}\{\vec{a}\}$, making it impossible to estimate $\vec{b}$ exactly. Our best estimate is $\vec{b}_1$, the component of $\vec{b}$ that lies in $\text{span}\{\vec{a}\}$. The error between $\vec{b}$ and $\vec{b}_1$ is denoted by $\vec{b}_2$, which is perpendicular to $\vec{a}$, since a perpendicular error vector has the smallest possible norm.

We know that $\vec{b}_1$ must be a scalar multiple of $\vec{a}$, as it is in $\text{span}\{\vec{a}\}$. This means that $\vec{b}_1 = x\vec{a}$ for some scalar $x$. Let's solve for $x$. Since $\vec{b}_2$ is orthogonal to $\vec{a}$, their dot product is $0$. Hence,

$$\vec{b}_2 \cdot \vec{a} = (\vec{b} - \vec{b}_1) \cdot \vec{a} = (\vec{b} - x\vec{a}) \cdot \vec{a} = 0$$

Distributing the dot product gives $\vec{a}^T\vec{b} - x\,\vec{a}^T\vec{a} = 0$. Rearranging this equation to solve for $x$ gives us $x = (\vec{a}^T\vec{a})^{-1}\vec{a}^T\vec{b}$.
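
As a quick check, here is a minimal NumPy sketch of the one-dimensional case. The vectors `a` and `b` are made-up example data; any nonzero `a` works the same way.

```python
import numpy as np

# Made-up example vectors: we predict b from a, so b1 = x * a.
a = np.array([2.0, 0.0])
b = np.array([3.0, 4.0])

# One-dimensional least squares: x = (a^T a)^{-1} a^T b
x = (a @ b) / (a @ a)

b1 = x * a    # component of b in span{a}
b2 = b - b1   # error vector

print(x)       # 1.5
print(b2 @ a)  # 0.0 -- the error is orthogonal to a
```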

This same concept can be applied in higher dimensions as well by simply replacing $\vec{a}$ with a matrix $\textbf{A}$ (and the scalar $x$ with a vector $\vec{x}$), where

$$\textbf{A} = \begin{bmatrix}\vec{a}_1 & \cdots & \vec{a}_n\end{bmatrix}$$

The general form for least squares is therefore

$$\textbf{A}^T\textbf{A}\vec{x} = \textbf{A}^T\vec{b} \quad \text{or} \quad \vec{x} = (\textbf{A}^T\textbf{A})^{-1}\textbf{A}^T\vec{b},$$

where the inverse exists whenever the columns of $\textbf{A}$ are linearly independent.
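
The same computation in NumPy, again with made-up data: `A` is a tall matrix (more equations than unknowns), and we solve the normal equations directly. This is a sketch for illustration; in practice `np.linalg.lstsq` is the more numerically stable choice.

```python
import numpy as np

# Made-up overdetermined system A x ~= b (3 equations, 2 unknowns).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - A x is orthogonal to every column of A.
residual = b - A @ x
print(x)               # [0.667, 0.5]
print(A.T @ residual)  # ~[0, 0]
```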

Note: Refer to the lecture notes to see the general derivation.
