Foundations of Numerical Linear Algebra

Graduate Quantitative Economics and Datascience

Jesse Perla

University of British Columbia

Going Beyond “reg y x, robust”

  • Data science, econometrics, and macroeconomics are built on linear algebra.
  • Numerical linear algebra has all sorts of pitfalls, which become more critical as we scale up to larger problems.
  • Speed differences in choosing better algorithms can be orders of magnitude.
  • Crucial to know what goes on under-the-hood in Stata/R/python packages for applied work, even if you don’t implement it yourself.

Extra Materials

Packages

This section uses the following packages:

import numpy as np
import matplotlib.pyplot as plt
import scipy
from numpy.linalg import cond, matrix_rank, norm
from scipy.linalg import inv, solve, det, eig, lu, eigvals
from scipy.linalg import solve_triangular, eigvalsh, cholesky

Basic Computational Complexity

Big-O Notation

For functions \(f(N)\) and \(g(N)\), we say \(f(N)\) is \(O(g(N))\) if there exist positive constants \(C\) and \(N_0\) such that:

\[ 0 \leq f(N) \leq C \cdot g(N) \quad \text{for all } N \geq N_0 \]

  • Often crucial to know how problems scale asymptotically (as \(N\to\infty\))
  • Caution! This is only an asymptotic limit, and can be misleading for small \(N\)
    • \(f_1(N) = N^3 + N\) is \(O(N^3)\)
    • \(f_2(N) = 1000 N^2 + 3 N\) is \(O(N^2)\)
    • For roughly \(N>1000\) the \(f_2\) algorithm is faster; otherwise use \(f_1\)
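
A quick arithmetic check of that crossover (the cutoff of roughly 1000 is specific to these made-up coefficients):

def f_1(N): return N**3 + N
def f_2(N): return 1000 * N**2 + 3 * N
for N in [10, 100, 1000, 10_000]:
    print(f"N={N}: f_1 cheaper? {f_1(N) < f_2(N)}")  # the O(N^2) algorithm only wins for large N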

Examples of Computational Complexity

  • Simple examples:
    • \(x \cdot y = \sum_{n=1}^N x_n y_n\) is \(O(N)\) since it requires \(N\) multiplications and additions
    • \(A x\) for \(A\in\mathbb{R}^{N\times N},x\in\mathbb{R}^N\) is \(O(N^2)\) since it requires \(N\) dot products, each \(O(N)\)
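
A rough timing sketch of these two operations; the absolute timings are machine-dependent, and only the different growth rates matter:

import time

for N in [1_000, 2_000, 4_000]:
    x = np.random.rand(N)
    A = np.random.rand(N, N)
    reps = 50
    t0 = time.perf_counter()
    for _ in range(reps):
        x @ x  # O(N) dot product
    t1 = time.perf_counter()
    for _ in range(reps):
        A @ x  # O(N^2) matrix-vector product
    t2 = time.perf_counter()
    print(f"N={N}: dot {(t1 - t0) / reps:.2e}s, matvec {(t2 - t1) / reps:.2e}s")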

Numerical Precision

Machine Epsilon

For a given datatype, \(\epsilon\) is defined as \(\epsilon = \min_{\delta > 0} \left\{ \delta : 1 + \delta > 1 \right\}\)

  • Computers have finite precision: 64-bit floats are typical, but 32-bit is common on GPUs
print(f"machine epsilon for float64 = {np.finfo(float).eps}")
print(f"1 + eps/2 == 1? {1.0 + 1.1e-16 == 1.0}")
print(f"machine epsilon for float32 = {np.finfo(np.float32).eps}")
machine epsilon for float64 = 2.220446049250313e-16
1 + eps/2 == 1? True
machine epsilon for float32 = 1.1920928955078125e-07

Basic Linear Algebra

Norms

  • Common measure of size is the Euclidean norm, or \(L^2\) norm, for \(x \in \mathbb{R}^N\)
  • Complexity is \(O(N)\): square each of the \(N\) elements, then sum \[ ||x||_2 = \sqrt{\sum_{n=1}^N x_n^2} \]
x = np.array([1, 2, 3]) # Calculating different ways (in order of preference)
print(np.sqrt(sum(xval**2 for xval in x))) # manual with comprehensions
print(np.sqrt(np.sum(np.square(x)))) # broadcasts
print(norm(x)) # built-in to numpy norm(x, ord=2) alternatively
print(f"||x||_2^2 = {norm(x)**2} = {x.T @ x} = {np.dot(x, x)}")
3.7416573867739413
3.7416573867739413
3.7416573867739413
||x||_2^2 = 14.0 = 14 = 14

Solving Systems of Equations

  • Solving \(A x = b\) for \(x\) is equivalent to \(A^{-1} A x = A^{-1} b\)
  • Then since \(A^{-1}A = I\), and \(I x = x\), we have \(x = A^{-1} b\)
  • Careful since matrix algebra is not commutative!
A = np.array([[0, 2],
              [3, 4]]) # or ((0, 2), (3, 4))
b = np.array([2,1])  # Column vector
x = solve(A, b)  # Solve Ax = b for x
x
array([-1.,  1.])

Using the Inverse Directly

  • Can replace the solve with a calculation of an inverse
  • But it can be slower or less accurate than solving the system directly
A_inv = inv(A)
A_inv @ b # i.e., A^{-1} * b
array([-1.,  1.])

Linear Combinations

We can think of solving a system as finding the linear combination of columns of \(A\) that equal \(b\)

b_star = x[0] * A[:, 0] + x[1] * A[:, 1] # using x solution
print(f"b = {b}, b_star = {b_star}")
b = [2 1], b_star = [2. 1.]

Column Space and Rank

  • The column space of a matrix represents all possible linear combinations of its columns.
  • A system \(A x = b\) has a solution exactly when \(b\) lies in the column space of \(A\)
  • The rank of a matrix is the dimension of its column space
A = np.array([[0, 2],
              [3, 4]])
matrix_rank(A)
np.int64(2)

Hence, can solve \(A x = b\) for any \(b \in \mathbb{R}^2\) since the column space is the entire space \(\mathbb{R}^2\)

Singular Matrices

On the other hand, note

A = np.array([[1, 2],
              [2, 4]])
matrix_rank(A)
np.int64(1)

So we can only solve \(A x = b\) for \(b \propto \begin{bmatrix} 1 \\ 2 \end{bmatrix} \propto \begin{bmatrix} 2 \\ 4 \end{bmatrix}\)
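
One way to check which \(b\) are reachable is to compare the rank of \(A\) with the rank of the augmented matrix \([A \mid b]\); the particular \(b\) vectors below are just for illustration:

A = np.array([[1, 2],
              [2, 4]])
b_good = np.array([2, 4])  # proportional to the columns of A
b_bad = np.array([1, 0])   # not proportional, so A x = b_bad has no solution
print(matrix_rank(np.column_stack([A, b_good])))  # still 1 = rank(A): consistent
print(matrix_rank(np.column_stack([A, b_bad])))   # 2 > rank(A): inconsistent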

Checking Singularity

A = np.array([[1, 2],
              [2, 4]])
# An (expensive) way to check if A is singular is if det(A) = 0
print(det(A) == 0.0)
print(matrix_rank(A) != A.shape[0]) # or check rank
# Check before inverting or use exceptions
try:
    inv(A)
    print("Matrix is not singular (invertible).")
except np.linalg.LinAlgError:
    print("Matrix is singular (non-invertible).")
True
True
Matrix is singular (non-invertible).

Determinant is Not Scale Invariant

  • Reminder: numerical precision in calculations makes it hard to compare to zero
  • The determinant is useful but depends on the scale of the matrix
  • A more robust alternative is the condition number (more next lecture)
eps, K = 1e-8, 100000
A = np.array([[1, 2],
              [1 + eps, 2 + eps]]) # second row is close to, but not a multiple of, the first
print(f"det(A)={det(A):.5g}, det(K*A)={det(K*A):.5g}")
print(f"cond(A)={cond(A):.5g}, cond(K*A)={cond(K*A):.5g},")
print(f"det(inv(A))={det(inv(A)):.5g}, cond(inv(A))={cond(inv(A)):.5g}")
det(A)=-1e-08, det(K*A)=-100
cond(A)=1e+09, cond(K*A)=1e+09,
det(inv(A))=-1e+08, cond(inv(A))=1e+09

Interpreting Condition Numbers

  • The condition number of the matrix \(A\) is \(\kappa(A) = ||A|| \cdot ||A^{-1}||\), which can be expressed as a ratio of the largest and smallest eigenvalues (exactly so for symmetric matrices)
    • \(\kappa(A) = \left| \frac{\lambda_{\text{max}}}{\lambda_{\text{min}}} \right|\) for \(\lambda\) the eigenvalues of \(A\). More soon!
  • Crude intuition: for machine epsilon \(\epsilon_{\text{mach}}\) when calculating some \(x\)
    • The relative error \(||x - x_{\text{approx}}||/||x||\) is roughly \(\kappa(A) \cdot \epsilon_{\text{mach}}\)
    • Solving \(A x = b\) with \(\epsilon_{\text{mach}} = 10^{-16}\) amplifies any errors in \(b\), etc.
    • If \(\kappa(A) \approx 10^{16}\), errors are amplified to the scale of 100% relative error

Rules of Thumb

  • Rule of thumb for standard floating point where \(\epsilon_{\text{mach}} = 10^{-16}\):
    • \(\kappa(A) \approx 1\): well-conditioned
    • \(\kappa(A) < 100\): fairly well-conditioned
    • \(\kappa(A) < 10^{5}\): moderately ill-conditioned. Take care
    • \(\kappa(A) < 10^{8}\): ill-conditioned and might introduce significant errors, especially in algorithms which repeatedly reuse the same calculations
    • \(\kappa(A) > 10^{8}\): very ill-conditioned and likely to introduce significant errors
  • Choose solution algorithms based on “numerical stability” and “conditioning” when worried
  • Issues are much more extreme with 32-bit floats, such as when using GPUs.
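
A sketch of the error-amplification intuition using the notoriously ill-conditioned Hilbert matrix (scipy.linalg.hilbert, not among the imports above), constructed so the exact solution is known:

from scipy.linalg import hilbert

N = 12
A = hilbert(N)  # a classic ill-conditioned matrix
x_true = np.ones(N)
b = A @ x_true  # so the exact solution is known
x_approx = solve(A, b)
print(f"cond(A) = {cond(A):.3g}")
print(f"relative error = {norm(x_approx - x_true) / norm(x_true):.3g}")
print(f"cond(A) * eps = {cond(A) * np.finfo(float).eps:.3g}")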

Solving Linear Systems of Equations

Solving Systems with Multiple RHS

  • Inverse is nice because you can reuse the \(A^{-1}\) to solve \(A x = b\) for many \(b\)
  • However, you can do this with solve as well
  • Or can reuse LU factorizations (discussed next)
A = np.array([[0, 2],
              [3, 4]])
B = np.array([[2, 3],
              [1, 2]])  # [2,1] and [3,2] as columns
# or: B = np.column_stack([np.array([2, 1]),np.array([3,2])])
X = solve(A, B)  # Solve AX = B for X
print(X)
print(f"Checking: A*{X[:,0]} = {A@X[:, 0]} = {B[:,0]}, column of B")
[[-1.         -1.33333333]
 [ 1.          1.5       ]]
Checking: A*[-1.  1.] = [2. 1.] = [2 1], column of B

LU(P) Decompositions

  • We can “factor” any square \(A\) as \(P A = L U\) for lower-triangular \(L\) and upper-triangular \(U\); the “P” is for partial-pivoting. When no row swaps are needed, \(A = L U\) directly, which is called the LU decomposition
  • Singular matrices still factor, but the resulting \(U\) will not have full rank
A = np.array([[1, 2],
              [2, 4]])
P, L, U = lu(A) # scipy convention: A = P @ L @ U; here P equals its transpose, so P @ A = L @ U as well
print(f"L*U =\n{L @ U}")
print(f"P*A =\n{P @ A}")
L*U =
[[2. 4.]
 [1. 2.]]
P*A =
[[2. 4.]
 [1. 2.]]

P, U, and L

The \(P\) matrix is a permutation matrix of “pivots”; the others are triangular

print(f"P =\n{P}")
print(f"L =\n{L}")
print(f"U =\n{U}")
P =
[[0. 1.]
 [1. 0.]]
L =
[[1.  0. ]
 [0.5 1. ]]
U =
[[2. 4.]
 [0. 0.]]

LU Decompositions and Systems of Equations

  • Pivoting is typically implied when talking about “LU”
  • Used in the default solve algorithm (without more structure)
  • Solving systems of equations with triangular matrices: for \(A x = L U x = b\)
    1. Define \(y = U x\)
    2. Solve \(L y = b\) for \(y\) and \(U x = y\) for \(x\)
  • Since both are triangular, process is \(O(N^2)\) (but LU itself \(O(N^3)\))
  • Could be used to find inv
    • If \(A = L U\), then \(A A^{-1} = L U A^{-1} = I\)
    • Solve for \(Y\) in \(L Y = I\), then solve \(U A^{-1} = Y\)
  • Tight connection to textbook Gaussian elimination (including pivoting)
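
A sketch of the factor-once, solve-many pattern above using SciPy's lu_factor/lu_solve pair (not among the imports at the top), which stores the pivoted LU compactly and then only does the \(O(N^2)\) triangular solves per right-hand side:

from scipy.linalg import lu_factor, lu_solve

A = np.array([[0, 2],
              [3, 4]])
lu_piv = lu_factor(A)           # pivoted LU factorization, O(N^3), done once
for b in (np.array([2, 1]), np.array([3, 2])):
    print(lu_solve(lu_piv, b))  # only O(N^2) triangular solves per b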

LU for Non-Singular Matrices

A = np.array([[1, 2],
              [3, 4]])
P, L, U = lu(A)
print(f"L*U =\n{L @ U}")
print(f"P*A =\n{P @ A}")
L*U =
[[3. 4.]
 [1. 2.]]
P*A =
[[3. 4.]
 [1. 2.]]

L, U, P

print(f"P =\n{P}")
print(f"L =\n{L}")
print(f"U =\n{U}")
P =
[[0. 1.]
 [1. 0.]]
L =
[[1.         0.        ]
 [0.33333333 1.        ]]
U =
[[3.         4.        ]
 [0.         0.66666667]]

Backwards Substitution Example

\[ \begin{aligned} U x &= b\\ U &\equiv \begin{bmatrix} 3 & 1 \\ 0 & 2 \\ \end{bmatrix}, \quad b = \begin{bmatrix} 7 \\ 2 \\ \end{bmatrix} \end{aligned} \]

Solving bottom row for \(x_2\)

\[ 2 x_2 = 2,\quad x_2 = 1 \]

Move up a row, solving for \(x_1\), substituting for \(x_2\)

\[ 3 x_1 + 1 x_2 = 7,\quad 3 x_1 + 1 \times 1 = 7,\quad x_1 = 2 \]

Generalizes to many rows. For \(L\) it is “forward substitution”
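
A minimal hand-rolled back substitution, just to make the recursion explicit; solve_triangular, shown next, is what to use in practice:

def back_substitution(U, b):
    N = len(b)
    x = np.zeros(N)
    for i in range(N - 1, -1, -1):  # start from the last row and move up
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[3.0, 1.0],
              [0.0, 2.0]])
b = np.array([7.0, 2.0])
print(back_substitution(U, b))  # [2., 1.]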

Use Triangular Structure if Possible

  • Triangular matrices of size \(N\) can be solved with back substitution in \(O(N^2)\)
  • Is \(O(N^2)\) good or bad? It beats the \(O(N^3)\) typical of general methods
U = np.array([[3, 1],
              [0, 2]])
b = np.array([7,2])
solve(U,b) # works, but internally does an LU which is O(N^3)
solve_triangular(U, b, lower=False) # fast O(N^2)
array([2., 1.])

Symmetric and Positive Definite Matrices

Symmetric Matrix Structure

Another common matrix type are symmetric, \(A = A^{T}\)

A = np.array([[1, 2],
              [2, 5]]) # also posdef, not singular
b = np.array([1,4])
# With scipy 1.11.3 check with scipy.linalg.issymmetric(A)
solve(A, b, assume_a="sym") # could also use "pos" since positive definite
array([-3.,  2.])

Positive Definite Matrices

  • A symmetric matrix \(A\) is positive definite if \(x^T A x > 0\) for all \(x \neq 0\)
  • Useful in many areas, such as covariance matrices. Example
A = np.array([[1, 2],
              [2, 5]])
x = np.array([0, 1]) # can't really check for all x
print(f"x^T A x = {x.T @ A @ x}")
x^T A x = 5
  • Example of a symmetric matrix that is not positive definite
A = np.array([[1, 2],
              [2, 0]])
print(f"x^T A x = {x.T @ A @ x}")  # one counterexample is enough
x^T A x = 0
  • We can check these with eigenvalues

Cholesky Decomposition

  • For symmetric positive definite matrices, the triangular factors can be chosen so that \(U = L^T\)
  • Called a Cholesky decomposition: \(A = L L^T\) for a lower triangular matrix \(L\).
  • Equivalently, could find \(A = U^T U\) for an upper triangular matrix \(U\)
A = np.array([[1, 2],
              [2, 5]])
L = cholesky(A, lower=True) # lower=False (the default) returns the upper-triangular factor instead
print(L)
print(f"L*L^T =\n{L @ L.T}")
[[1. 0.]
 [2. 1.]]
L*L^T =
[[1. 2.]
 [2. 5.]]

Solving Positive Definite Systems

A = np.array([[1, 2],
              [2, 5]])
b = np.array([1, 4])
print(solve(A, b, assume_a="pos")) # uses cholesky internally

L = cholesky(A, lower=True)
y = solve_triangular(L, b, lower=True)
x = solve_triangular(L.T, y, lower=False)
print(x)
[-3.  2.]
[-3.  2.]

Cholesky for Covariance Matrices

  • Covariance matrices are positive-definite, semi-definite if degenerate
  • Key property of Gaussian random variables:
    • \(X \sim N(\mu, \Sigma)\) for \(\mu \in \mathbb{R}^N, \Sigma \in \mathbb{R}^{N \times N}\)
    • \(X = \mu + A Z\) for \(Z \sim N(0_N, I_N)\) where \(A A^T = \Sigma\)
  • That is, \(A\) is the Cholesky decomposition of the covariance matrix
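
A sketch of this sampling recipe; the particular \(\mu\) and \(\Sigma\) below are made up for illustration:

rng = np.random.default_rng(1234)
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])          # a positive definite covariance
A_chol = cholesky(Sigma, lower=True)    # A_chol @ A_chol.T = Sigma
Z = rng.standard_normal((2, 100_000))   # i.i.d. N(0, I) draws in columns
X = mu[:, None] + A_chol @ Z            # each column ~ N(mu, Sigma)
print(np.cov(X))                        # close to Sigma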

Matrices as Linear Transformations

  • Recall: for \(x \in \mathbb{R}^N\) we should think of \(f(x) = A x\) for \(A\in\mathbb{R}^{M \times N}\) as a linear transformation from \(\mathbb{R}^N\) to \(\mathbb{R}^M\)
    • Definition of Linear: \(f(a x_1 + b x_2) = a f(x_1) + b f(x_2)\) for scalars \(a,b\)
  • Similarly, if \(y = f(x) = A x\) for a square, invertible \(A\), then \(f^{-1}(y) = A^{-1} y\) maps back from \(\mathbb{R}^M\) to \(\mathbb{R}^N\)
    • In that case we can go back and forth without losing information (i.e., the map is bijective). Otherwise \(x\) may be projected onto a lower-dimensional “manifold”.

Norms and Linear Transformations

  • The vector norm \(||x||_2\) is an important feature in many applications

    • Hence \(||f(x)||_2 = ||A x||_2\) frequently comes up in Quantitative Economics and Datascience
    • e.g. linear regression is written as minimizing a vector norm

    \[ \min_{\beta} ||y - X \beta||_2 \]

  • Matrix structure or decompositions of \(A\) help us better understand the \(f(x)\) mapping
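
For instance, that least-squares problem can be handed to scipy.linalg.lstsq (not among the imports above); a sketch with simulated data:

from scipy.linalg import lstsq

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))                    # regressors
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(100)   # add a little noise
beta_hat, _, _, _ = lstsq(X, y)                      # minimizes ||y - X beta||_2
print(beta_hat)                                      # close to beta_true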

Orthogonal Matrices

Orthogonal Matrices — Definition

  • A square matrix \(Q\) is orthogonal if \(Q^\top Q = Q Q^\top = I\) (equivalently, \(Q^{-1} = Q^\top\)).
    • Columns are orthonormal: if \(Q = \begin{bmatrix} q_1 \; | \; \ldots \; | \; q_N \end{bmatrix}\) then
      • \(q_i \cdot q_j = 0\) for \(i \neq j\)
      • \(q_i \cdot q_i = 1\).
    • Geometric action: combination of rotation and/or reflection
    • Inverse action: if \(y = Q x\), then \(x = Q^\top y\).
  • Example (reflection): \(Q = \begin{bmatrix}1 & 0 \\ 0 & -1\end{bmatrix}\) flips the second component while preserving \(\|\cdot\|_2\).

Orthogonal Matrices — Preserving Norms

  • Rotation and reflection preserve lengths
  • For all \(x \in \mathbb{R}^N\), \(\|Q x\|_2 = \|x\|_2\).
  • Proof: \(\|Qx\|_2^2 = (Qx) \cdot (Qx) = (Q x)^\top Q x = x^\top Q^\top Q x = x^\top x = \|x\|_2^2\).

Orthogonal Matrices — Preserving Angles

  • For any \(x_1, x_2 \in \mathbb{R}^N\), \((Q x_1) \cdot (Q x_2) = x_1 \cdot x_2\).
  • Proof: \((Q x_1) \cdot (Q x_2) = (Q x_1)^\top (Q x_2) = x_1^\top Q^\top Q x_2 = x_1^\top x_2 = x_1 \cdot x_2\).
  • Orthogonal maps preserve inner products/“angles”
    • e.g., rotate both the \(x_1\) and \(x_2\) exactly the same amount.
    • Useful in many applications such as Principal Component Analysis (PCA) and cosine similarity (which is used in NLP embeddings).
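
A quick numeric check with a 2-D rotation matrix (the angle is arbitrary):

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])        # rotation by theta
x_1, x_2 = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
print(np.allclose(Q.T @ Q, np.eye(2)))                # orthogonal
print(np.allclose(norm(Q @ x_1), norm(x_1)))          # norms preserved
print(np.allclose((Q @ x_1) @ (Q @ x_2), x_1 @ x_2))  # inner products preserved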

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors

  • For a square \(A\), an eigenvector \(x\) and eigenvalue \(\lambda\) satisfy \[ A x = \lambda x \]

  • \(A\in\mathbb{R}^{N\times N}\) has \(N\) eigenvalue/eigenvector pairs, possible multiplicity of \(\lambda\)

  • Intuition: \(x\) is a direction in which \(A x \propto x\), and \(\lambda\) says how much it “stretches”

Properties of Eigenvalues and Eigenvectors

  • For any eigenvector \(x\) and nonzero scalar \(c\), \(c x\) is also an eigenvector with the same \(\lambda\)
  • Symmetric matrices have real eigenvalues and orthogonal eigenvectors, i.e. \(x_1 \cdot x_2 = 0\) for eigenvectors \(x_1 \neq x_2\) (of distinct eigenvalues). Both are complex in general
  • Singular if and only if it has an eigenvalue of zero
  • Positive (semi)definite if and only if all eigenvalues are strictly (weakly) positive
  • Diagonal matrix has eigenvalues as its diagonal
  • Triangular matrix has eigenvalues as its diagonal
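
Quick numeric checks of two of these properties, with arbitrary example matrices:

A_triangular = np.array([[2.0, 5.0],
                         [0.0, 3.0]])
print(eigvals(A_triangular))  # eigenvalues 2 and 3: the diagonal
A_singular = np.array([[1.0, 2.0],
                       [2.0, 4.0]])
print(eigvals(A_singular))    # one eigenvalue is (numerically) zero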

Positive Definite and Eigenvalues

You cannot check \(x^T A x > 0\) for all \(x\); instead, check whether the “stretching” (i.e., every eigenvalue) is positive

A = np.array([[3, 2],
              [2, 1]])
# A_eigs = np.real(eigvals(A)) # symmetric matrices have real eigenvalues
A_eigs = eigvalsh(A) # specialized for symmetric/hermitian matrices
print(A_eigs)
is_positive_definite = np.all(A_eigs > 0)
is_positive_semi_definite = np.all(A_eigs >= 0) # or eigvals(A) >= -eps
print(f"pos-def? {is_positive_definite}")
print(f"pos-semi-def? {is_positive_semi_definite}")
[-0.23606798  4.23606798]
pos-def? False
pos-semi-def? False

Positive Semi-Definite Matrices May Have a Zero Eigenvalue

The simplest positive-semi-definite (but not posdef) matrix is

A_eigs = eigvalsh(np.array([[1, 0],
                            [0, 0]]))
print(A_eigs)
is_positive_definite = np.all(A_eigs > 0)
is_positive_semi_definite = np.all(A_eigs >= 0) # or eigvals(A) >= -eps
print(f"pos-def? {is_positive_definite}")
print(f"pos-semi-def? {is_positive_semi_definite}")
[0. 1.]
pos-def? False
pos-semi-def? True

Eigen Decomposition (General)

  • For a real square matrix \(A\) we may be able to take the eigenvectors as the columns of \(Q\) and a diagonal matrix \(\Lambda\) of the eigenvalues such that \[ A = Q \, \Lambda \, Q^{-1}. \]
    • \(\Lambda = \operatorname{diag}(\lambda_1,\ldots,\lambda_n)\) collects the eigenvalues (zeros allowed) into a diagonal matrix.
    • The eigenvectors and eigenvalues are complex in general
  • These do not always exist. If so, you will hear the matrix is diagonalizable.
    • Essentially, this is a question of whether the eigenvectors form an \(N\) dimensional basis and hence a \(Q\) that is invertible.
  • Alternatively, every real matrix admits a Singular Value Decomposition (SVD)

Eigen Decomposition (Symmetric Case)

  • A special case of this comes from the spectral theorem, which will sometimes be referred to as a spectral decomposition
  • For any real symmetric matrix \(A\) (singular or not) there exist an orthogonal \(Q\) and a real diagonal \(\Lambda\) such that \[ A = Q \, \Lambda \, Q^{\top}. \]
    • Orthogonal \(Q^{\top} Q = Q Q^{\top} = I\) (so \(Q^{-1}=Q^{\top}\)); the columns of \(Q\) form an orthonormal basis of eigenvectors.
    • \(\Lambda = \operatorname{diag}(\lambda_1,\ldots,\lambda_n)\) collects the real eigenvalues (zeros allowed) into a diagonal matrix. Complex in general for non-symmetric matrices, but real for symmetric \(A\).
  • Geometric interpretation: apply \(Q^{\top}\) (rotation/reflection) to rotate into the basis, scale by \(\Lambda\), then apply \(Q\) to rotate/reflect back.

Eigendecompositions and Matrix Powers

  • Can be used to find \(A^t\) for large \(t\) (e.g. for Markov chains)
    • \(P^t\), i.e. \(P \cdot P \cdot \ldots \cdot P\) for \(t\) times
    • \(P = Q \Lambda Q^{-1}\) then \(P^t = Q \Lambda^t Q^{-1}\) where \(\Lambda^t\) is just the pointwise power
  • Related tools such as SVD can help with dimensionality reduction
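
A sketch with a small Markov transition matrix (the probabilities are made up), comparing the eigendecomposition route to direct matrix powers:

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])                        # rows sum to one
Lambda, Q = eig(P)
t = 50
P_t = Q @ np.diag(Lambda**t) @ np.linalg.inv(Q)   # P^t = Q Lambda^t Q^{-1}
print(np.real(P_t))                               # imaginary parts are numerically zero
print(np.allclose(P_t, np.linalg.matrix_power(P, t)))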

Spectral/Eigendecomposition of Symmetric Matrix Example

A = np.array([[2, 1],
              [1, 3]])
Lambda, Q = eig(A)
print(f"eigenvectors are column-by-column in Q =\n{Q}")
print(f"Q^T Q =\n{Q.T @ Q}") # should be I
print(f"Q^-1 == Q^T? {np.allclose(np.linalg.inv(Q), Q.T)}") # should be True
print(f"eigenvalues are in Lambda = {Lambda}")
print(f"Q Lambda Q^T =\n{Q @ np.diag(np.real(Lambda)) @ Q.T}")
eigenvectors are column-by-column in Q =
[[-0.85065081 -0.52573111]
 [ 0.52573111 -0.85065081]]
Q^T Q =
[[1.0000000e+00 1.2127222e-17]
 [1.2127222e-17 1.0000000e+00]]
Q^-1 == Q^T? True
eigenvalues are in Lambda = [1.38196601+0.j 3.61803399+0.j]
Q Lambda Q^T =
[[2. 1.]
 [1. 3.]]

Spectral/Eigendecomposition of an Asymmetric Matrix Example

A = np.array([[5, 1],
              [2, 3]])
Lambda, Q = eig(A)
print(f"eigenvectors are column-by-column in Q =\n{Q}")
print(f"Q^-1 == Q^T? {np.allclose(np.linalg.inv(Q), Q.T)}") # should be False
print(f"eigenvalues are in Lambda = {Lambda}")
print(f"Q Lambda inv(Q) =\n{Q @ np.diag(np.real(Lambda)) @ np.linalg.inv(Q)}")
eigenvectors are column-by-column in Q =
[[ 0.80689822 -0.34372377]
 [ 0.59069049  0.9390708 ]]
Q^-1 == Q^T? False
eigenvalues are in Lambda = [5.73205081+0.j 2.26794919+0.j]
Q Lambda inv(Q) =
[[5. 1.]
 [2. 3.]]

Spectral Radius is Maximum Absolute Eigenvalue

  • If any \(\lambda \in \Lambda\) has \(|\lambda| > 1\), repeated multiplication by \(A\) explodes
  • Useful for seeing if the iteration \(x_{t+1} = A x_t\) from an \(x_0\) explodes
  • The spectral radius of matrix \(A\) is

\[ \rho(A) = \max_{\lambda \in \Lambda}|\lambda| \]
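
A quick check with a matrix whose spectral radius is below one (entries are illustrative), so the iteration dies out:

A = np.array([[0.5, 0.2],
              [0.1, 0.4]])
rho = np.max(np.abs(eigvals(A)))
print(f"rho(A) = {rho:.3f}")  # < 1, so the iteration converges to zero
x = np.array([1.0, 1.0])
for _ in range(100):
    x = A @ x
print(x)                      # essentially zero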

Recall Condition Numbers

  • The condition number of the matrix \(A\) is the ratio of the largest and smallest eigenvalues
    • \(\kappa(A) = \left|\frac{\lambda_{\text{max}}}{\lambda_{\text{min}}}\right|\) for \(\lambda\) the eigenvalues of \(A\).
    • Sense of relative dispersion and consistent scaling
  • For example, using the fact that the triangular matrix has eigenvalues on the diagonal \[ A = \begin{bmatrix} 1 & 2 \\ 0 & 0.001 \end{bmatrix} \]
    • \(\kappa(A) = 1/0.001 = 1000\) and \(\rho(A) = 1\).
  • Or, the best we can do is \(\kappa(A) = 1\) for the identity (or triangular with 1’s on diagonal) \[ A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

(Optional) Matrix Conditioning and Stability

Matrix Conditioning

  • Poorly conditioned matrices can lead to inaccurate or wrong solutions
  • Can happen when the problem mixes radically different scales, which can sometimes be fixed by demeaning, rescaling, etc.
  • Also tends to happen when matrices are close to singular
eps = 1e-7
A = np.array([[1, 1],
              [1 + eps, 1]])
print(f"A =\n{A}")
print(f"A^{-1} =\n{inv(A)}")
A =
[[1.        1.       ]
 [1.0000001 1.       ]]
A^-1 =
[[-9999999.99336215  9999999.99336215]
 [10000000.99336215 -9999999.99336215]]

Reminder: Condition Numbers of Matrices

  • \(\text{cond}(A) = \left|\frac{\lambda_{max}}{\lambda_{min}}\right|\)
  • Scale-free measure of numerical issues for a variety of matrix operations
  • Intuition: if \(\text{cond}(A) = K\), a perturbation \(b \to b + \Delta b\) can amplify the relative error in \(x\) by up to a factor of \(K\) when solving \(A x = b\).
  • See Matlab Docs on inv for example, where inv is a bad idea due to poor conditioning
print(f"condition(I) = {cond(np.eye(2))}")
print(f"condition(A) = {cond(A)}, condition(A^(-1)) = {cond(inv(A))}")
condition(I) = 1.0
condition(A) = 40000001.939191714, condition(A^(-1)) = 40000002.00307444

Example with Interpolation

  • Consider fitting data \(x\in\mathbb{R}^{N+1}\) and \(y\in\mathbb{R}^{N+1}\) with an \(N\)-degree polynomial
  • That is, find \(c \in \mathbb{R}^{N+1}\) such that

\[ \begin{aligned} c_0 + c_1 x_0 + c_2 x_0^2 + \ldots + c_N x_0^N &= y_0\\ \vdots & \\ c_0 + c_1 x_N + c_2 x_N^2 + \ldots + c_N x_N^N &= y_N\\ \end{aligned} \]

  • Which we can then use as \(P(x) = \sum_{n=0}^N c_n x^n\) to interpolate between the points

Writing as a Linear System

  • Define a matrix of all of the powers of the \(x\) values

\[ A \equiv \begin{bmatrix} 1 & x_0 & x_0^2 & \ldots & x_0^N\\ \vdots & \vdots &\vdots & \vdots & \vdots\\ 1 & x_N & x_N^2 & \ldots & x_N^N \end{bmatrix} \]

  • Then solve for \(c\) as the solution (where \(A\) is invertible if \(x_n\) are unique)

\[ A c = y \]

Solving an Example

  • Solve and look at numerical error of interpolation using \(||x||_{\infty} = \max_n |x_n|\) (i.e., inf-norm)
  • We already see a difference in error between using solve and inv
N = 5
x = np.linspace(0.0, 10.0, N + 1)
y = np.exp(x)  # example function to interpolate
A = np.array([[x_i**n for n in range(N + 1)] for x_i in x])  # or np.vander
c = solve(A, y)
c_inv = inv(A) @ y
print(f"error = {norm(A @ c - y, np.inf)}, \
error using inv(A) = {norm(A @ c_inv - y, np.inf)}")
print(f"cond(A) = {cond(A)}")
error = 2.2737367544323206e-11, error using inv(A) = 1.367880031466484e-09
cond(A) = 564652.3214053963

Things Getting Poorly Conditioned Quickly

N = 10
x = np.linspace(0.0, 10.0, N + 1)
y = np.exp(x)  # example function to interpolate
A = np.array([[x_i**n for n in range(N + 1)] for x_i in x])  # or np.vander
c = solve(A, y)
c_inv = inv(A) @ y # Solving with inv(A) instead of solve(A, y)
print(f"error = {norm(A @ c - y, np.inf)}, \
error using inv(A) = {norm(A @ c_inv - y, np.inf)}")
print(f"cond(A) = {cond(A)}")
error = 5.802576197311282e-10, error using inv(A) = 4.1591702029109e-06
cond(A) = 4462833495403.007

Matrix Inverses Fail Completely for \(N = 20\)

  • You do not need massive matrices to get into trouble
N = 20
x = np.linspace(0.0, 10.0, N + 1)
y = np.exp(x)  # example function to interpolate
A = np.array([[x_i**n for n in range(N + 1)] for x_i in x])  # or np.vander
c = solve(A, y)
c_inv = inv(A) @ y # Solving with inv(A) instead of solve(A, y)
print(f"error = {norm(A @ c - y, np.inf)}, \
error using inv(A) = {norm(A @ c_inv - y, np.inf)}")
print(f"cond(A) = {cond(A):.4g}")
error = 1.1368683772161603e-10, error using inv(A) = 2511.4640634151983
cond(A) = 2.381e+24

Moral of this Story

  • Use solve, which is faster and can often solve ill-conditioned problems. Rarely use inv, and only when you know the problem is well-conditioned
  • Check conditioning of matrices when doing numerical work as an occasional diagnostic, as it is a good indicator of potential problems and collinearity
  • For approximation, never use a monomial basis for polynomials
    • Prefer polynomials like Chebyshev, which are as orthogonal as possible
N = 40
x = np.linspace(-1, 1, N+1)  # Or any other range of x values
A = np.array([[np.polynomial.Chebyshev.basis(n)(x_i) for n in range(N+1)] for x_i in x])
A_monomial = np.array([[x_i**n for n in range(N + 1)] for x_i in x])  # or np.vander
print(f"cond(A) = {cond(A):.4g}, cond(A_monomial) = {cond(A_monomial):.4g}")
cond(A) = 3.64e+09, cond(A_monomial) = 2.149e+18