Eigen-everything

What and why

We will now deal with eigenvalues, eigenvectors, and eigenspaces of matrices. From the previous chapter, we know that matrices can be viewed as linear transformations. Eigenvalues, eigenvectors, and eigenspaces describe the special vectors whose direction is preserved when the transformation is applied.

Eigenvalues and eigenvectors are important because they let us quickly find the steady state of a system, i.e. how the system looks in the long run. Eigenvalues are also invariant under changes of basis.

Fun fact: Google’s PageRank algorithm is based on eigenvalues and eigenvectors!


Definitions

An eigenvector of a linear transformation is a non-trivial vector that is only scaled when the transformation is applied to it. This means that the direction of the eigenvector is preserved after the operation.

In mathematical notation: let $\vec{v}$ be an eigenvector of the matrix $\textbf{A}$. Then $\textbf{A}\vec{v} = \lambda\vec{v}$ for a unique $\lambda \in \mathbb{R}$.

This equation says that the transformed vector $\textbf{A}\vec{v}$ is just a scaled version of the original vector $\vec{v}$.

We also define the eigenvector to be non-trivial because $\textbf{A}\vec{0} = \vec{0}$ for all matrices.

The eigenvalue associated with this eigenvector is the scalar quantity by which the eigenvector is scaled. In the above equation, $\lambda$ would be the eigenvalue associated with the vector $\vec{v}$. The eigenvalue associated with an eigenvector can be 0.

The eigenspace of an eigenvalue is the span of all eigenvectors with this eigenvalue, because any linear combination of the eigenvectors is still an eigenvector with the same eigenvalue. In the above equation, the eigenspace $E_{\lambda} = \text{span}\{\vec{v}\}$ because there is one eigenvector $\vec{v}$ corresponding to $\lambda$.

Proof: Let $\vec{v}$ be an eigenvector of the matrix $\textbf{A}$. Then $\textbf{A}\vec{v} = \lambda \vec{v}$ for a unique $\lambda$. Now, let's scale $\vec{v}$ by a constant $c$:

$$\textbf{A}(c\vec{v}) = c(\textbf{A}\vec{v}) = c(\lambda \vec{v}) = \lambda (c\vec{v})$$

Therefore, $\lambda$ is still the associated eigenvalue of $c\vec{v}$.
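To make the definition concrete, here is a minimal NumPy sketch; the matrix and the eigenpair are borrowed from the worked example later in this section.

```python
import numpy as np

# Matrix from the worked example later in this section.
A = np.array([[1.0, -1.0, 2.0],
              [0.0,  0.0, 2.0],
              [0.0, -1.0, 3.0]])

v = np.array([1.0, 1.0, 1.0])  # an eigenvector of A (derived below)
lam = 2.0                      # its associated eigenvalue

# A v equals lambda v: the direction of v is preserved, only its
# length changes. Scaling v by any constant c preserves this.
print(np.allclose(A @ v, lam * v))              # True
print(np.allclose(A @ (3 * v), lam * (3 * v)))  # True
```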

Calculation

Let us derive a method for solving for the eigenvalues, eigenvectors, and eigenspaces of a given matrix.

We know that for a given eigenvector $\vec{v}$, $\textbf{A}\vec{v} = \lambda\vec{v}$.

However, we have 2 unknowns: $\vec{v}$ and $\lambda$, so we try to eliminate one of them.

We can re-write the equation as

$$\textbf{A}\vec{v} = \lambda\textbf{I}\vec{v}$$

where $\textbf{I}$ is the identity matrix. We then subtract $\textbf{A}\vec{v}$ from both sides:

$$\lambda\textbf{I}\vec{v} - \textbf{A}\vec{v} = \vec{0}$$

Factoring out the $\vec{v}$, we get

$$(\lambda\textbf{I} - \textbf{A})\vec{v} = \vec{0}$$

We observe that $\vec{v}$ is in the null space of the matrix $\lambda\textbf{I} - \textbf{A}$. Since we know that any eigenvector $\vec{v}$ is non-trivial, we have to make sure that the null space of $\lambda\textbf{I} - \textbf{A}$ is non-trivial as well.

Recall the Invertible Matrix Theorem, which states that if the null space of a matrix is non-trivial, its determinant must be 0. Therefore, to solve for the eigenvalues, we use the equation

$$\text{det}(\lambda\textbf{I} - \textbf{A}) = 0$$

$\text{det}(\lambda\textbf{I} - \textbf{A})$ is called the characteristic polynomial of $\textbf{A}$ because it is a polynomial in $\lambda$. To find the eigenvalues of a matrix, we find the roots of the characteristic polynomial, i.e. the solutions of $\text{det}(\lambda\textbf{I} - \textbf{A}) = 0$.

Sometimes, an eigenvalue $\lambda_i$ can appear multiple times as a root. We define the algebraic multiplicity of the eigenvalue $\lambda_i$ to be the number of times $\lambda_i$ appears as a root of the characteristic polynomial.
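As a sketch of what this looks like computationally, SymPy can build the characteristic polynomial symbolically and report each eigenvalue with its algebraic multiplicity; the matrix here is the one from the example below.

```python
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[1, -1, 2],
               [0,  0, 2],
               [0, -1, 3]])

# det(lambda*I - A), the characteristic polynomial in lambda
p = (lam * sp.eye(3) - A).det()
print(sp.factor(p))   # (lambda - 2)*(lambda - 1)**2

# eigenvals() maps each eigenvalue to its algebraic multiplicity
print(A.eigenvals())  # {1: 2, 2: 1} (order may vary)
```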

Notes:

  1. Observe what happens if an eigenvalue of $\textbf{A}$ is 0. Then $\text{det}(0\textbf{I} - \textbf{A}) = \text{det}(-\textbf{A}) = 0 \implies \text{det}(\textbf{A}) = 0$. Therefore, a matrix $\textbf{A}$ has an eigenvalue of 0 if and only if $\textbf{A}$ is non-invertible.

  2. If $\textbf{A}$ is a triangular matrix, we can easily find the eigenvalues of $\textbf{A}$. Assume that $d_1, d_2, \ldots, d_n$ are the entries along the diagonal in such a matrix. From cofactor expansion, we know that $\text{det}(\textbf{A}) = d_1 d_2 \cdots d_n$. Since $\lambda\textbf{I} - \textbf{A}$ is also triangular, with diagonal entries $\lambda - d_1, \lambda - d_2, \ldots, \lambda - d_n$, this means that $\text{det}(\lambda\textbf{I} - \textbf{A}) = (\lambda - d_1)(\lambda - d_2)\cdots(\lambda - d_n) = 0$, so $d_1, d_2, \ldots, d_n$ are the roots of the characteristic polynomial, i.e. the eigenvalues. The eigenvalues of a triangular matrix are its diagonal entries.

Once we have all of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $\textbf{A}$, we can find the eigenvectors associated with each eigenvalue by using the equation $(\lambda_i\textbf{I} - \textbf{A})\vec{v} = \vec{0}$. In fact, we are basically finding the basis vectors for the null space of $\lambda_i\textbf{I} - \textbf{A}$ for $i = 1, 2, \ldots, n$.

If the dimension of the null space of $\lambda_i\textbf{I} - \textbf{A}$ for an eigenvalue $\lambda_i$ is larger than 1, we will expect to have multiple eigenvectors associated with the same eigenvalue $\lambda_i$.

We therefore define the geometric multiplicity of an eigenvalue $\lambda_i$ to be the dimension of the associated null space of $\lambda_i\textbf{I} - \textbf{A}$.
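A hedged sketch of this in SymPy: the geometric multiplicity is the number of basis vectors of $\text{Null}(\lambda_i\textbf{I} - \textbf{A})$. Note that SymPy may return a different (but equivalent) basis than the one chosen by hand in the example below.

```python
import sympy as sp

A = sp.Matrix([[1, -1, 2],
               [0,  0, 2],
               [0, -1, 3]])

# Null(1*I - A) for the eigenvalue lambda = 1 of the example below
basis = (1 * sp.eye(3) - A).nullspace()
print(len(basis))  # 2, the geometric multiplicity of lambda = 1
for v in basis:
    print(v.T)     # e.g. [1, 0, 0] and [0, 2, 1]
```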

Example

Let us try to find the eigenvalues, eigenvectors, and eigenspaces of the matrix $\textbf{A} = \begin{bmatrix}1 & -1 & 2 \\ 0 & 0 & 2 \\ 0 & -1 & 3\end{bmatrix}$.

We first find the roots of the characteristic polynomial.

$$\text{det}(\lambda\textbf{I} - \textbf{A}) = \text{det}\left(\begin{bmatrix}\lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda\end{bmatrix} - \begin{bmatrix}1 & -1 & 2 \\ 0 & 0 & 2 \\ 0 & -1 & 3\end{bmatrix}\right) = \text{det}\left(\begin{bmatrix}\lambda - 1 & 1 & -2 \\ 0 & \lambda & -2 \\ 0 & 1 & \lambda - 3\end{bmatrix}\right) = 0$$

Using cofactor expansion along the first column, we get

$$(\lambda - 1)(\lambda(\lambda - 3) + 2) = 0$$

$$(\lambda - 1)(\lambda^2 - 3\lambda + 2) = 0$$

$$(\lambda - 1)(\lambda - 1)(\lambda - 2) = (\lambda - 1)^2(\lambda - 2) = 0$$

We see that $\textbf{A}$ has the eigenvalues $\lambda_1 = 1$ with algebraic multiplicity of 2 and $\lambda_2 = 2$ with algebraic multiplicity of 1.

We then find the basis vectors for the null space of $\lambda_i\textbf{I} - \textbf{A}$ for each eigenvalue.

$\lambda_1 = 1$:

$$\text{Null}(\textbf{I} - \textbf{A}) = \text{Null}\left(\begin{bmatrix}0 & 1 & -2 \\ 0 & 1 & -2 \\ 0 & 1 & -2\end{bmatrix}\right) = \text{span}\left\{\begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix}, \begin{bmatrix}1 \\ 2 \\ 1\end{bmatrix}\right\}$$

Therefore, the eigenvectors associated with $\lambda_1 = 1$ are $\begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix}$ and $\begin{bmatrix}1 \\ 2 \\ 1\end{bmatrix}$. Since there are 2 eigenvectors associated with $\lambda_1 = 1$, we say that the eigenvalue $\lambda_1 = 1$ has a geometric multiplicity of 2. The eigenspace corresponding to $\lambda_1 = 1$ is $E_1 = \text{span}\left\{\begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix}, \begin{bmatrix}1 \\ 2 \\ 1\end{bmatrix}\right\}$.

$\lambda_2 = 2$:

$$\text{Null}(2\textbf{I} - \textbf{A}) = \text{Null}\left(\begin{bmatrix}1 & 1 & -2 \\ 0 & 2 & -2 \\ 0 & 1 & -1\end{bmatrix}\right) = \text{Null}\left(\begin{bmatrix}1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0\end{bmatrix}\right) = \text{span}\left\{\begin{bmatrix}1 \\ 1 \\ 1\end{bmatrix}\right\}$$

$\begin{bmatrix}1 \\ 1 \\ 1\end{bmatrix}$ is the eigenvector associated with $\lambda_2 = 2$, which has a geometric multiplicity of 1. $E_2 = \text{span}\left\{\begin{bmatrix}1 \\ 1 \\ 1\end{bmatrix}\right\}$.
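One way to cross-check the example numerically is NumPy's `np.linalg.eig`. Keep in mind that it returns unit-length eigenvectors, so they may be scaled and ordered differently from the hand-computed basis above.

```python
import numpy as np

A = np.array([[1.0, -1.0, 2.0],
              [0.0,  0.0, 2.0],
              [0.0, -1.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(np.round(eigvals, 6))  # eigenvalues 2, 1, 1 (up to ordering)

# Each column of eigvecs is an eigenvector for the matching eigenvalue.
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))  # True
```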

Diagonalization

What and why

Imagine that a matrix $\textbf{A}$ represents a state transition matrix, and we want to find out the steady state. We would have to calculate $\lim_{k \to \infty} \textbf{A}^k$, but matrix multiplication is a very tedious process. If we could express $\textbf{A}$ as a product of 3 matrices, $\textbf{P}\textbf{D}\textbf{P}^{-1}$, where $\textbf{P}$ is an invertible matrix and $\textbf{D}$ is a diagonal matrix, then we could easily find the limit. Assume $\textbf{A} = \textbf{P}\textbf{D}\textbf{P}^{-1}$, that is, $\textbf{A}$ is diagonalizable.

$$\textbf{A}^k = (\textbf{P}\textbf{D}\textbf{P}^{-1})^k = \textbf{P}\textbf{D}\textbf{P}^{-1}\textbf{P}\textbf{D}\textbf{P}^{-1} \cdots \textbf{P}\textbf{D}\textbf{P}^{-1} = \textbf{P}\textbf{D}(\textbf{P}^{-1}\textbf{P})\textbf{D}(\textbf{P}^{-1}\textbf{P}) \cdots (\textbf{P}^{-1}\textbf{P})\textbf{D}\textbf{P}^{-1} = \textbf{P}\textbf{D}^k\textbf{P}^{-1}$$

Calculating $\textbf{D}^k$ is very easy; in fact, for any diagonal matrix $\textbf{D} = \begin{bmatrix}d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n\end{bmatrix}$, $\textbf{D}^k = \begin{bmatrix}d_1^k & 0 & \cdots & 0 \\ 0 & d_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n^k\end{bmatrix}$, so we just need to find the $k$-th power of each diagonal entry in $\textbf{D}$.
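A quick NumPy sketch of why this matters: the $k$-th power of a diagonal matrix can be computed entrywise, with no repeated matrix multiplication.

```python
import numpy as np

d = np.array([1.0, 1.0, 2.0])  # diagonal entries of D
k = 10

Dk = np.diag(d ** k)  # D^k, computed entrywise

# Same result as multiplying D by itself k times.
print(np.allclose(Dk, np.linalg.matrix_power(np.diag(d), k)))  # True
```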

Calculation

Let $\textbf{A} \in \mathbb{R}^{n \times n}$, $\textbf{D} = \begin{bmatrix}d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n\end{bmatrix}$, and $\textbf{P} = \begin{bmatrix}\vec{p}_1 & \vec{p}_2 & \cdots & \vec{p}_n\end{bmatrix}$.

$$\textbf{A} = \textbf{P}\textbf{D}\textbf{P}^{-1}$$

$$\textbf{A}\textbf{P} = \textbf{P}\textbf{D}$$

$$\textbf{A}\begin{bmatrix}\vec{p}_1 & \vec{p}_2 & \cdots & \vec{p}_n\end{bmatrix} = \begin{bmatrix}\vec{p}_1 & \vec{p}_2 & \cdots & \vec{p}_n\end{bmatrix}\begin{bmatrix}d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n\end{bmatrix}$$

We then get $n$ equations:

$$\textbf{A}\vec{p}_1 = d_1\vec{p}_1$$

$$\textbf{A}\vec{p}_2 = d_2\vec{p}_2$$

$$\vdots$$

$$\textbf{A}\vec{p}_n = d_n\vec{p}_n$$

We recognize these equations as the definitions of eigenvectors and eigenvalues. Therefore, the diagonal matrix $\textbf{D}$ is just a matrix with the eigenvalues of $\textbf{A}$ along its diagonal, and the invertible $\textbf{P}$ is just a matrix with the eigenvectors of $\textbf{A}$ as its column vectors.

If $(\lambda_1, \vec{v}_1), (\lambda_2, \vec{v}_2), \ldots, (\lambda_n, \vec{v}_n)$ are the eigenpairs of the matrix $\textbf{A}$, then $\textbf{D} = \begin{bmatrix}\lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n\end{bmatrix}$ and $\textbf{P} = \begin{bmatrix}\vec{v}_1 & \vec{v}_2 & \cdots & \vec{v}_n\end{bmatrix}$.

Note: Order matters! The eigenvector $\vec{v}_i$ in the $i$-th column of $\textbf{P}$ must correspond to the eigenvalue $\lambda_i$ in the $i$-th column of $\textbf{D}$.

We would then need to compute $\textbf{P}^{-1}$, the inverse of $\textbf{P}$.

Let's now see what happens if we want to calculate $\textbf{A}^k$. For this, we will use the row vectors of $\textbf{P}^{-1}$, which we will define as $\textbf{P}^{-1} = \begin{bmatrix}\vec{q}_1^T \\ \vec{q}_2^T \\ \vdots \\ \vec{q}_n^T\end{bmatrix}$. We know that

$$\textbf{A}^k = \begin{bmatrix}\vec{p}_1 & \vec{p}_2 & \cdots & \vec{p}_n\end{bmatrix}\begin{bmatrix}\lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k\end{bmatrix}\begin{bmatrix}\vec{q}_1^T \\ \vec{q}_2^T \\ \vdots \\ \vec{q}_n^T\end{bmatrix} = \begin{bmatrix}\vec{p}_1 & \vec{p}_2 & \cdots & \vec{p}_n\end{bmatrix}\begin{bmatrix}\lambda_1^k\vec{q}_1^T \\ \lambda_2^k\vec{q}_2^T \\ \vdots \\ \lambda_n^k\vec{q}_n^T\end{bmatrix}$$

$$\textbf{A}^k = \lambda_1^k\vec{p}_1\vec{q}_1^T + \lambda_2^k\vec{p}_2\vec{q}_2^T + \cdots + \lambda_n^k\vec{p}_n\vec{q}_n^T$$

We can thus express $\textbf{A}^k$ as a sum of matrices $\vec{p}_i\vec{q}_i^T$, each scaled by its eigenvalue raised to the $k$-th power, $\lambda_i^k$. If $|\lambda_i| < 1$, then $\lim_{k \to \infty} \lambda_i^k = 0$, so we can basically eliminate the matrix associated with $\lambda_i$ from the above equation because its scaling factor $\lambda_i^k$ approaches 0.
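Here is a minimal NumPy sketch of this decomposition, using the $\textbf{P}$ and $\textbf{D}$ derived for the example matrix in the next subsection: summing $\lambda_i^k \vec{p}_i \vec{q}_i^T$ reproduces $\textbf{A}^k$.

```python
import numpy as np

A = np.array([[1.0, -1.0, 2.0],
              [0.0,  0.0, 2.0],
              [0.0, -1.0, 3.0]])
P = np.array([[1.0, 1.0, 1.0],    # columns are eigenvectors of A
              [0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0]])
lams = np.array([1.0, 1.0, 2.0])  # matching eigenvalues (diagonal of D)
Q = np.linalg.inv(P)              # rows of Q are the q_i^T

k = 5
# Spectral sum: A^k = sum_i lambda_i^k * p_i q_i^T
Ak = sum(lams[i] ** k * np.outer(P[:, i], Q[i]) for i in range(3))
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
```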

Example

Taking our previous example, we have the following eigenvalues and eigenvectors.

$\lambda_1 = 1: \begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix}, \begin{bmatrix}1 \\ 2 \\ 1\end{bmatrix}$

$\lambda_2 = 2: \begin{bmatrix}1 \\ 1 \\ 1\end{bmatrix}$

We can therefore diagonalize $\begin{bmatrix}1 & -1 & 2 \\ 0 & 0 & 2 \\ 0 & -1 & 3\end{bmatrix}$ as $\textbf{P}\textbf{D}\textbf{P}^{-1}$, where $\textbf{D} = \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2\end{bmatrix}$ and $\textbf{P} = \begin{bmatrix}1 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 1 & 1\end{bmatrix}$. Finally, we have to calculate $\textbf{P}^{-1}$.

Also note that there are multiple ways to diagonalize the above matrix. We could have just as well diagonalized it where $\textbf{D} = \begin{bmatrix}1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1\end{bmatrix}$ and $\textbf{P} = \begin{bmatrix}1 & 1 & 1 \\ 2 & 1 & 0 \\ 1 & 1 & 0\end{bmatrix}$. Just make sure that the eigenvector in the first column of $\textbf{P}$ is associated with the eigenvalue in the first column of $\textbf{D}$, that the eigenvector in the second column of $\textbf{P}$ is associated with the eigenvalue in the second column of $\textbf{D}$, etc.
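As a sketch, SymPy's `diagonalize()` produces one valid $(\textbf{P}, \textbf{D})$ pair; as noted above, other orderings and scalings of the eigenvectors work just as well.

```python
import sympy as sp

A = sp.Matrix([[1, -1, 2],
               [0,  0, 2],
               [0, -1, 3]])

P, D = A.diagonalize()       # A = P * D * P^{-1}
print(D)                     # diagonal entries 1, 1, 2 (up to ordering)
print(P * D * P.inv() == A)  # True
```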

Diagonalizability

We note that a matrix $\textbf{A} \in \mathbb{R}^{n \times n}$ is only diagonalizable if it is square and has $n$ linearly independent eigenvectors. This leads to two conclusions about the diagonalizability of a matrix $\textbf{A}$:

  1. If $\textbf{A}$ has $n$ unique eigenvalues, then it must have $n$ linearly independent eigenvectors, so it is automatically diagonalizable.

  2. If the sum of the geometric multiplicities of all eigenvalues of $\textbf{A}$ equals $n$, i.e. $\textbf{A}$ has $n$ linearly independent eigenvectors, then $\textbf{A}$ is diagonalizable.

Note: Even if the sum of the algebraic multiplicities equals $n$, this does not mean that $\textbf{A}$ has $n$ linearly independent eigenvectors. Take $\textbf{A} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}$ for example. The characteristic equation is $(\lambda - 1)^2 = 0$, so $\lambda_1 = 1$ has an algebraic multiplicity of 2. However, $\text{Null}\left(\begin{bmatrix}0 & 1 \\ 0 & 0\end{bmatrix}\right) = \text{span}\left\{\begin{bmatrix}1 \\ 0\end{bmatrix}\right\}$, so the geometric multiplicity of $\lambda_1 = 1$ is only 1. In general,

$$\text{number of columns in } \textbf{A} \geq \text{sum of algebraic multiplicities} \geq \text{sum of geometric multiplicities}$$
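A short SymPy sketch of this example: the algebraic multiplicity of $\lambda = 1$ is 2, but the geometric multiplicity is 1, so the matrix is not diagonalizable.

```python
import sympy as sp

A = sp.Matrix([[1, 1],
               [0, 1]])

print(A.eigenvals())                     # {1: 2}, algebraic multiplicity 2
print(len((sp.eye(2) - A).nullspace()))  # 1, geometric multiplicity 1
print(A.is_diagonalizable())             # False
```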
