Stochastic Dynamics, AR(1) Processes, and Ergodicity

Undergraduate Computational Macro

Jesse Perla

University of British Columbia

Overview

Motivation and Materials

  • In this lecture, we will introduce stochastic processes and review probability
  • Our first example of a stochastic process is the AR(1) process (i.e. auto-regressive of order one)
    • This is a simple, univariate process, but it is directly useful in many cases
  • We will also introduce the concept of ergodicity to help us understand long-run behavior
  • While this section is not directly introducing new economic models, it provides the backbone for our analysis of the wealth and income distribution

Deterministic Processes

  • We have seen deterministic processes in previous lectures, e.g. the linear process

    \[ X_{t+1} = a X_t + b \]

    • These are coupled with an initial condition \(X_0\), which enables us to see the evolution of a variable
    • The state variable, \(X_t\), could be a vector
    • The evolution could be non-linear \(X_{t+1} = h(X_t)\), etc.
  • But many states in the real world involve randomness

Materials

using LaTeXStrings, LinearAlgebra, Plots, Statistics
using Random, StatsPlots, Distributions, NLsolve
using Plots.PlotMeasures
default(;legendfontsize=16, linewidth=2, tickfontsize=12,
         bottom_margin=15mm)

Random Variables Review

Random Variables

  • Random variables are a collection of values with associated probabilities
  • For example, a random variable \(Y\) could be the outcome of a coin flip
    • Let \(Y=1\) if heads and \(Y=0\) if tails
    • Assign probabilities \(\mathbb{P}(Y=1) = \mathbb{P}(Y=0) = 0.5\)
  • Or a normal random variable with mean \(\mu\) and variance \(\sigma^2\), denoted \(Y \sim \mathcal{N}(\mu, \sigma^2)\), which has density \(p(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( - \frac{(y-\mu)^2}{2 \sigma^2} \right)\)

Discrete vs. Continuous Variables

  • If discrete (e.g., \(X \in \{x_1, \ldots, x_N\}\)), then
    • The probability mass function (pmf) is the probability of each value \(p \in \mathbb{R}^N\)
    • Such that \(\sum_{i=1}^N p_i = 1\), and \(p_i \geq 0\)
    • i.e. \(p_i = \mathbb{P}(X=x_i)\)
  • If continuous, then the probability density function (pdf) gives the relative likelihood of each value and can be represented by a function
    • \(p : \mathbb{R} \rightarrow \mathbb{R}\) if \(X\) is defined on \(\mathbb{R}\)
    • \(\int_{-\infty}^\infty p(x) dx = 1\), and \(p(x) \geq 0\)
    • \(\mathbb{P}(X = a) = 0\) in our examples, and \(\mathbb{P}(X \in [a,b]) = \int_a^b p(x) dx\)
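  • As a minimal check of this distinction using the packages loaded above (the specific distributions are just illustrative), note that a pmf sums to one while a pdf only integrates to one:
d_discrete = Binomial(10, 0.5)   # discrete: pdf.() returns pmf values
d_continuous = Normal(0.0, 1.0)  # continuous: pdf.() returns density values
@show sum(pdf.(d_discrete, support(d_discrete))) # sums to 1.0
@show pdf(d_continuous, 0.0) # a density value, not a probability
@show cdf(d_continuous, 1.0) - cdf(d_continuous, -1.0); # P(X in [-1, 1])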

Normal Random Variables

mu = 0.1
sigma = 0.5
d = Normal(mu, sigma) # parameterized by SD, not variance
x = range(mu - 2 * sigma,
          mu + 2 * sigma;
          length=100)
plot(x, pdf.(d, x); label="Normal PDF",
     xlabel="x", ylabel=L"p(x)",
     size=(600,400))

Comparing to a Histogram

n = 1000
x_draws = rand(d, n) # gets n samples
histogram(x_draws; label="Histogram",
          xlabel="x", ylabel=L"\hat{p}(x)",
          normalize=true, size=(600,400))
plot!(x, pdf.(d, x); label="Normal PDF",
      lw=3)          

Normal Random Variables

  • Normal random variables are special for many reasons (e.g., central limit theorems)
  • They are the only continuous distribution with finite variance that is closed under linear combinations
    • For independent \(X \sim \mathcal{N}(\mu_X, \sigma_X^2)\) and \(Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)\)
    • \(a X + b Y \sim \mathcal{N}(a \mu_X + b \mu_Y, a^2 \sigma_X^2 + b^2 \sigma_Y^2)\)
    • Also true with multivariate normal distributions
  • A common transformation standardizes by removing the mean and variance (see the sketch below)
    • Could draw \(Y \sim N(\mu, \sigma^2)\)
    • Or could draw \(X \sim N(0,1)\) and then \(Y = \mu + \sigma X\)
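  • A minimal sketch of this standardization (the sample size is an arbitrary illustrative choice):
mu, sigma = 0.1, 0.5
X_draws = randn(10_000)          # X ~ N(0, 1)
Y_draws = mu .+ sigma .* X_draws # Y = mu + sigma * X ~ N(mu, sigma^2)
@show mean(Y_draws), std(Y_draws); # approximately (0.1, 0.5)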

Expectations

  • For discrete-valued random variables

    \[ \mathbb{E}[f(X)] = \sum_{i=1}^N f(x_i) p_i \]

  • For continuous valued random variables \[ \mathbb{E}[f(X)] = \int_{-\infty}^\infty f(x) p(x) dx \]

Moments

  • The mean of a random variable is the first moment, \(\mathbb{E}[X]\)
  • The variance of a random variable is the second moment, \(\mathbb{E}[(X-\mathbb{E}[X])^2]\)
    • Note the recentering by the mean. Could also calculate as \(\mathbb{E}[X^2] - \mathbb{E}[X]^2\)
  • Normal random variables are characterized by their first 2 moments
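  • As a quick Monte Carlo check with an illustrative Normal distribution, the two ways of computing the variance agree:
d = Normal(2.0, 3.0)
x = rand(d, 100_000)
@show mean((x .- mean(x)) .^ 2)  # E[(X - E[X])^2]
@show mean(x .^ 2) - mean(x)^2   # E[X^2] - E[X]^2
@show var(d); # analytic variance = 9.0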

Law(s) of Large Numbers

  • Let \(X_1, X_2, \ldots\) be independent and identically distributed (iid) random variables with mean \(\mu \equiv \mathbb{E}(X) < \infty\), then let \[ \bar{X}_n \equiv \frac{1}{n} \sum_{i=1}^n X_i \]

  • One version is Kolmogorov’s Strong Law of Large Numbers \[ \mathbb{P} \left( \lim_{n \rightarrow \infty} \bar{X}_n = \mu \right) = 1 \]

    • i.e. the average of the random variables converges to the mean

Sampling and Plotting the Mean

function ksl(distribution, n = 100)
    title = nameof(typeof(distribution))
    observations = rand(distribution, n)
    sample_means = cumsum(observations) ./ (1:n)
    mu = mean(distribution)
    plot(repeat((1:n)', 2), [zeros(1, n); observations']; title,  xlabel="n",
         label = "", color = :grey, alpha = 0.5)
    plot!(1:n, observations; color = :grey, markershape = :circle,
          alpha = 0.5, label = "", linewidth = 0)
    if !isnan(mu)
        hline!([mu], color = :black, linewidth = 1.5, linestyle = :dash,
               grid = false, label = L"\mathbb{E}[X]")
    end
    return plot!(1:n, sample_means, linewidth = 3, alpha = 0.6, color = :green, label = L"\bar{X}_n")
end
ksl (generic function with 2 methods)

LLN with the Normal Distribution

dist = Normal(0.0, 1.0) # unit normal
ksl(dist)

LLN with the Exponential

  • \(f(x) = \frac{1}{\alpha} \exp(-x/\alpha)\) for \(x \geq 0\) with mean \(\alpha\)
dist = Exponential(0.2)
ksl(dist)

LLN with the Cauchy?

  • \(f(x) = 1/(\pi (1+x^2))\), with median \(= 0\) and \(\mathbb{E}(X)\) undefined
Random.seed!(0); # reproducible results
dist = Cauchy() # Doesn't have an expectation!
sample_mean = cumsum(rand(dist, n)) ./ (1:n)
plot(1:n, sample_mean, color = :red, alpha = 0.6, label =  L"\bar{X}_n",
      xlabel = "n", linewidth = 3)
hline!([0], color = :black, linestyle = :dash, label = "", grid = false)

Monte-Carlo Calculation of Expectations

  • One application of this is the numerical calculation of expectations

  • Let \(X\) be a random variable with density \(p(x)\), and hence \(\mathbb{E}[f(X)] = \int_{-\infty}^\infty f(x) p(x) dx\) (or \(\sum_{i=1}^N f(x_i) p_i\) if discrete)

  • These integrals are often difficult to calculate analytically, but if we can draw \(X \sim p\), then we can approximate the expectation by

    \[ \mathbb{E}[f(X)] \approx \frac{1}{n} \sum_{i=1}^n f(x_i) \]

  • Then by the LLN this converges to the true expectation as \(n \rightarrow \infty\)

Discrete Example

  • Let \(X\) be a discrete random variable with \(N\) states and probabilities \(p_i\)
  • Then \(\mathbb{E}[f(X)] = \sum_{i=1}^N f(x_i) p_i\)
  • For example, the Binomial distribution and \(f(x) = \log(x+1)\)
# number of trials and probability of success
dist = Binomial(10, 0.5)
plot(dist;label="Binomial PMF",
     size=(600,400))
vals = support(dist) # i.e. 0:10
p = pdf.(dist, vals)
# Calculate the expectation manually
@show mean(dist), dot(vals, p);
(mean(dist), dot(vals, p)) = (5.0, 5.000000000000008)

Using Monte-Carlo

N = 500
# expectation with PMF, then MC
f_expec = dot(log.(vals .+ 1), p)
x_draws = rand(dist, N)
f_x_draws = log.(x_draws .+ 1)
f_expec_mc = sum(f_x_draws) / N
@show f_expec, f_expec_mc
# Just calculate sums then divide by N
f_means = cumsum(f_x_draws)./(1:N)
plot(1:length(f_means), f_means;
     label=L"\frac{1}{n}\sum_{i=1}^n f(x_i)",
     xlabel="n", ylabel=L"\bar{f}(X)",
     size=(600,400))
hline!([f_expec];linestyle = :dash,
       label = L"\mathbb{E}[f(X)]")
(f_expec, f_expec_mc) = (1.7526393207741702, 1.7552834928857293)

Stochastic Processes

Stochastic Processes

  • A stochastic process is a sequence of random variables
    • We will focus on discrete time stochastic processes, where the sequence is indexed by \(t=0,1,2,\ldots\)
    • Could be discrete or continuous random variables
  • Skipping some formality, assume that the random variables share the same set of possible values, but the probabilities may change over time
  • Denote them as a sequence \(\{X_t\}_{t=0}^\infty\)

Joint, Marginal, and Conditional Distributions

  • Can ask questions on the probability distributions of the process
  • The joint distribution of \(\{X_t\}_{t=0}^{\infty}\) or a subset
    • In many cases values will be correlated over time; otherwise, there would be little need to model them as a process
  • The marginal distribution of \(X_t\) for any \(t\)
    • This is a proper PDF, marginalized from the joint distribution of all values
  • Conditional distributions, fixing some values
    • e.g. \(X_{t+1}\) given \(X_t, X_{t-1}\), etc. are known

Markov Process

  • Before we go further, let's discuss a broader class of these processes useful in economics
  • A Markov process is a stochastic process where the conditional distribution of \(X_{t+1}\) given \(X_t, X_{t-1}, \ldots\) is the same as the conditional distribution of \(X_{t+1}\) given \(X_t\)
    • i.e. the future is independent of the past given the present
  • Note that with the AR(1) model, if I know \(X_t\) then I can calculate the PDF of \(X_{t+1}\) directly without knowing the past
  • This is “first-order” since only one lag is required, but could be higher order
    • A finite number of lags can always be added to the state vector to make it first-order (see the sketch below)
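  • For instance, a hypothetical AR(2), \(X_{t+1} = a_1 X_t + a_2 X_{t-1} + c W_{t+1}\), becomes first-order once we stack one lag into the state (the parameter values here are illustrative):
a1, a2, c = 0.5, 0.3, 0.1
A = [a1 a2; 1.0 0.0] # stacked state Z_t = [X_t, X_{t-1}]
C = [c, 0.0]
Z = [0.0, 0.0] # Z_0
for t in 1:5
    Z = A * Z + C * randn() # Markov in the stacked state
end
@show Z;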

AR(1) Processes

A Simple Auto-Regressive Process with One Lag

\[ X_{t+1} = a X_t + b + c W_{t+1} \]

  • Just added randomness to the deterministic process from time \(t\) to \(t+1\)
  • \(W_{t+1} \sim \mathcal{N}(0, 1)\) is IID “shocks” or “noise”
  • Could have an initial condition for \(X_0\), or an initial distribution
    • \(X_t\) is a random variable, and so can \(X_0\)
    • “Degenerate random variable” if \(P(X_0 = x) = 1\) for some \(x\)
    • Assume \(X_0 \sim \mathcal{N}(\mu_0, v_0)\), and \(v_0 \to 0\) is the degenerate case

Evolution of the AR(1) Process

  • Both \(W_{t+1}\) and \(X_0\) are assumed to be normally distributed
  • As we discussed, linear combinations of normal random variables are normal
    • So \(X_t\) is normal for all \(t\) by induction
  • Furthermore, we have a formula for the recursion
    • If \(X_t \sim \mathcal{N}(\mu_t, v_t)\), then \(X_{t+1} \sim \mathcal{N}(a \mu_t + b, a^2 v_t + c^2)\)
    • Hence, the evolution of the mean and variance follow a simple difference equation \(\mu_{t+1} = a \mu_t + b\) and \(v_{t+1} = a^2 v_t + c^2\)
    • Let \(X_t \sim \psi_t \equiv \mathcal{N}(\mu_t, v_t)\)

Visualizing the AR(1) Process

a = 0.9
b = 0.1
c = 0.5

# initial conditions mu_0, v_0
mu = -3.0
v = 0.6
0.6

Visualizing the AR(1) Process

sim_length = 5
x_grid = range(-5, 7, length = 120)

plt = plot(;size = (600, 400))
for t in 1:sim_length
    mu = a * mu + b
    v = a^2 * v + c^2
    dist = Normal(mu, sqrt(v))
    plot!(plt, x_grid, pdf.(dist, x_grid),
    label = L"\psi_{%$t}", linealpha = 0.7)
end
plt

From a Degenerate Initial Condition

  • Cannot plot \(\psi_0\) since it is a point mass at \(\mu_0\)
mu = -3.0
v = 0.0
plt = plot(;size = (600, 400))
for t in 1:sim_length
    mu = a * mu + b
    v = a^2 * v + c^2
    dist = Normal(mu, sqrt(v))
    plot!(plt, x_grid, pdf.(dist, x_grid),
    label = L"\psi_{%$t}", linealpha = 0.7)
end
plt

Practice with Iteration

  • Let us practice creating a map and iterating it
  • We will need to modify our iterate_map function to work with vectors
  • Let \(x \equiv \begin{bmatrix}\mu & v\end{bmatrix}^{\top}\),
function iterate_map(f, x0, T)
    x = zeros(length(x0), T + 1)
    x[:, 1] = x0
    for t in 2:(T + 1)
        x[:, t] = f(x[:, t - 1])
    end
    return x
end
iterate_map (generic function with 1 method)

Implementation of the Recurrence for the AR(1)

function f(x;a, b, c)
  mu = x[1]
  v = x[2]
  return [a * mu + b, a^2 * v + c^2]
end
x_0 = [-3.0, 0.6]
T = 5
x = iterate_map(x -> f(x; a, b, c), x_0, T)
2×6 Matrix{Float64}:
 -3.0  -2.6    -2.24     -1.916    -1.6244   -1.36196
  0.6   0.736   0.84616   0.93539   1.00767   1.06621

Using Matrices

\[ x_{t+1} = \underbrace{\begin{bmatrix} a & 0 \\ 0 & a^2 \end{bmatrix}}_{\equiv A} x_t + \underbrace{\begin{bmatrix} b \\ c^2 \end{bmatrix}}_{\equiv B} \]

A = [a 0; 0 a^2]
B = [b; c^2]
x = iterate_map(x -> A * x + B, x_0, T)
2×6 Matrix{Float64}:
 -3.0  -2.6    -2.24     -1.916    -1.6244   -1.36196
  0.6   0.736   0.84616   0.93539   1.00767   1.06621

Fixed Point?

  • Whenever you have maps, you can ask whether a fixed point exists
  • This is especially easy to check here. Solve,
    • \(\mu = a \mu + b \implies \mu = \frac{b}{1-a}\)
    • \(v = a^2 v + c^2 \implies v = \frac{c^2}{1-a^2}\)
  • Let's check for a fixed point numerically
sol = fixedpoint(x -> A * x + B, x_0)
@show sol.zero
@show b/(1-a), c^2/(1-a^2);
sol.zero = [1.0000000000000266, 1.3157894736842035]
(b / (1 - a), c ^ 2 / (1 - a ^ 2)) = (1.0000000000000002, 1.3157894736842108)

Existence of a Fixed Point

  • The importance of \(a\) is also clear when we look at the \(A\) matrix
  • We know the eigenvalues of a diagonal matrix are the diagonal elements
    • i.e., \(\lambda_1 = a\) and \(\lambda_2 = a^2\)
  • If \(|a| < 1\), then \(a^2 < |a| < 1\) and hence the maximum absolute value of the eigenvalues is below 1 (verified numerically below)
  • As we saw in the univariate case, conditions of this sort were crucial for determining whether the system would converge
  • We will see more complicated versions of the \(A\) matrix as we move into richer “state space models”
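  • A quick numerical check of this eigenvalue condition, using the \(A\) matrix defined above:
@show eigvals(A) # diagonal, so eigenvalues are a and a^2
@show maximum(abs.(eigvals(A))); # spectral radius < 1 implies convergence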

Evolution of the Probability Distributions

x_0 = [-1.0, 0.1] # tight
T = 10
f(x) =  A * x + B
x = iterate_map(f, x_0, T)
x_star = fixedpoint(f, x_0).zero
plt = plot(Normal(x_star[1], sqrt(x_star[2]));
           label = L"\psi^*",
           style = :dash,
           size = (600, 400))
for t in 1:T
    dist = Normal(x[1, t], sqrt(x[2, t]))
    plot!(plt, dist, label = L"\psi_{%$(t-1)}",
         linealpha = 0.7)
end
plt

Stationary Distributions

Fixed Points and Steady States

  • Recall from the lecture on deterministic dynamics that we discussed fixed points and steady states: \(x_{t+1} = f(x_t)\) has a fixed point \(x^*\) if \(x^* = f(x^*)\)
    • e.g. \(x_{t+1} = a x_t + b\) has \(x^* = \frac{b}{1-a}\) if \(|a| < 1\)
  • We can also interpret as a steady state \(x^*\) as \(\lim_{t \rightarrow \infty} x_t = x^*\) for some \(x_0\)
    • Stability analysis told us which \(x^*\) the process would approach from points \(x_0\) near \(x^*\)
  • The key: if we apply the evolution equation \(f\) at \(x^*\), we remain at that point

Stationary Distributions

  • Analogously, with stochastic processes we can think about applying the evolution equation to random variables
    • Instead of a point, we have a distribution \(\psi^*\)
    • Then rather than checking \(x^* = f(x^*)\), we check \(\psi^* \sim f(\psi^*)\), where that notation is loosely taking into account the distribution of shocks
  • Similar to stability, we can consider whether repeatedly applying \(f(\cdot)\) to various \(\psi_0\) converges to \(\psi^*\)

AR(1) Example

  • Take \(X_{t+1} = a X_t + b + c W_{t+1}\) if \(|a| < 1\) for \(W_{t+1} \sim \mathcal{N}(0, 1)\)

  • Recall: if \(X_t \sim \mathcal{N}(\mu_t, v_t) \equiv \psi_t\), then using properties of Normals

    • \(X_{t+1} \sim \mathcal{N}(a \mu_t + b, a^2 v_t + c^2)\equiv \psi_{t+1}\)
    • We derived the fixed point of the mean and variance iteration as \(\psi^* \sim \mathcal{N}\left(\mu^*, v^*\right) = \mathcal{N}\left(\frac{b}{1-a}, \frac{c^2}{1-a^2}\right)\)
  • Applying the evolution equation to \(\psi^*\), we demonstrate that \(\psi^* \sim f(\psi^*)\)

    \[ \mathcal{N}\left(a \frac{b}{1-a} + b, a^2 \frac{c^2}{1-a^2} + c^2\right) = \mathcal{N}\left(\frac{b}{1-a}, \frac{c^2}{1-a^2}\right) \]

    • i.e., from any initial condition, the distribution of \(X_t\) converges to \(\psi^*\)
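  • As a numerical sanity check with the parameters from above, applying the mean/variance map to \((\mu^*, v^*)\) returns the same point:
mu_star = b / (1 - a)
v_star = c^2 / (1 - a^2)
@show [mu_star, v_star]
@show A * [mu_star, v_star] + B; # unchanged by the map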

What if \(a > 1\)?

a,b,c = 1.1, 0.2, 0.25
A = [a 0; 0 a^2]
B = [b; c^2]
f(x) =  A * x + B
T = 15
x = iterate_map(f, [0.0, 0.1], T)
plt = plot(Normal(x[1, end], sqrt(x[2, end]));
            label = L"\psi_{%$T}",
           size = (600, 400))
for t in 1:5
    dist = Normal(x[1, t], sqrt(x[2, t]))
    plot!(plt, dist, label=L"\psi_{%$(t-1)}",
         linealpha = 0.7)
end
plt

Analyzing the Failure of Convergence

  • If it exists, the stationary distribution would need to be \(\psi^* \equiv \mathcal{N}\left(\frac{b}{1-a}, \frac{c^2}{1-a^2}\right)\)
  • Note that if \(b > 0\) the process drifts upward
    • But, just as in the case of the deterministic process, this just acts as a force to move the distribution, not spread it out
  • In fact, with \(b=0\) the mean of \(\psi_t\) is always 0, but the variance grows without bound if \(c > 0\)
  • Let's plot the \(a=1, b=0\) case

What if \(a = 1, b=0\)?

a,b,c = 1.0, 0.0, 0.25
A = [a 0; 0 a^2]
B = [b; c^2]
f(x) =  A * x + B
T = 15
x = iterate_map(f, [0.0, 0.1], T)
plt = plot(Normal(x[1, end], sqrt(x[2, end]));
            label = L"\psi_{%$T}",
           size = (600, 400))
for t in 1:5
    dist = Normal(x[1, t], sqrt(x[2, t]))
    plot!(plt, dist, label=L"\psi_{%$(t-1)}",
         linealpha = 0.7)
end
plt

Ergodicity

  • There are many different variations and definitions of ergodicity
  • Among other things, this rules out cases where the process is “trapped” in a subset of the state space and can't switch out
  • Also ensures that the distribution doesn’t spread or drift asymptotically
  • Ergodicity lets us apply LLNs to the stochastic process, even though they are not independent

Ergodicity

  • We will consider a process \(\{X_t\}_{t=0}^\infty\) with a stationary distribution \(\psi^*\)
  • The process is ergodic if for any \(f : \mathbb{R} \rightarrow \mathbb{R}\) (with regularity conditions) \[ \lim_{T\to\infty}\frac{1}{T} \sum_{t=1}^T f(X_t) = \int f(x)\psi^*(x) dx \]
    • i.e. the time average of the function converges to the expectation under \(\psi^*\). The process is mean ergodic if we only require this for \(f(x) = x\)

Iteration with IID Noise

  • Adapt the scalar iteration to include IID noise
function iterate_map_iid(f, dist, x0, T)
    x = zeros(T + 1)
    x[1] = x0
    for t in 2:(T + 1)
        x[t] = f(x[t - 1], rand(dist))
    end
    return x
end
a,b,c = 0.9, 0.1, 0.05
x_0 = 0.5
T = 5
h(x, W) = a * x + b + c * W # iterate given random shock
x = iterate_map_iid(h, Normal(), x_0, T)
6-element Vector{Float64}:
 0.5
 0.5252717486805177
 0.5306225876900339
 0.46819901566492783
 0.532032538532688
 0.583020976850554

Demonstration of Ergodicity with Mean

T = 2000
x_0 = 0.5
x = iterate_map_iid(h, Normal(), x_0, T)
x_means = cumsum(x)./(1:(T+1))
plot(0:T, x_means;
  label=L"\frac{1}{t}\sum_{s=0}^{t-1} X_s",
  xlabel = "t", size = (600, 400))
hline!([b/(1-a)], color = :black,
  linestyle = :dash,
  label = L"\mathbb{E}[X]")

Starting at the Stationary Distribution

  • A reasonable place to start many simulations is a draw from the stationary distribution
Random.seed!(20)
x_0 = rand(Normal(b/(1-a), sqrt(c^2/(1-a^2))))
x = iterate_map_iid(h, Normal(), x_0, T)
x_means = cumsum(x)./(1:(T+1))
plot(0:T, x_means;
  label=L"\frac{1}{t}\sum_{s=0}^{t-1} X_s",
  xlabel = "t", size = (600, 400))
hline!([b/(1-a)], color = :black,
  linestyle = :dash,
  label = L"\mathbb{E}[X]")

The Speed of Convergence

  • The speed with which the process converges towards its stationary distribution is important
  • Key things which govern this transition will be
    • Autocorrelation: the closer \(a\) is to \(0\), the faster the process converges back towards the mean, as with deterministic processes (see the sketch below)
    • Variance: with large \(c\) the noise may dominate and \(\psi^*\) becomes broader
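  • A short sketch comparing sample-mean convergence for two illustrative values of \(a\), holding the long-run mean fixed by setting \(b = \mu(1-a)\) and reusing iterate_map_iid from above:
Random.seed!(42) # illustrative seed
mu_target, c = 1.0, 0.05
T = 2000
plt = plot(; xlabel = "t", size = (600, 400))
for a in (0.5, 0.95)
    x_a = iterate_map_iid((x, W) -> a * x + mu_target * (1 - a) + c * W,
                          Normal(), 0.0, T)
    plot!(plt, 0:T, cumsum(x_a) ./ (1:(T + 1));
          label = L"\bar{X}_t,\; a = %$a")
end
hline!(plt, [mu_target]; color = :black, linestyle = :dash,
       label = L"\mathbb{E}[X]")
plt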

Close to a Random Walk

Random.seed!(20)
a,b,c = 0.99, 0.01, 0.05
h(x, W) = a * x + b + c * W
T = 2000
x_0 = 0.5
x = iterate_map_iid(h, Normal(), x_0, T)
x_means = cumsum(x)./(1:(T+1))
plot(0:T, x_means;
  label=L"\frac{1}{t}\sum_{s=0}^{t-1} X_s",
  xlabel = "t", size = (600, 400))
hline!([b/(1-a)], color = :black,
  linestyle = :dash,
  label = L"\mathbb{E}[X]")

Dependence on Initial Condition

  • Intuition: under ergodicity, the initial conditions “wear off” over time
  • However, even if a process is ergodic and has a well-defined stationary distribution, it may take a long time to converge to it
  • This is very important in many quantitative models:
    • How much does your initial wealth matter for your long-run outcomes?
    • If your wages start low due to discrimination, migration, or just bad luck, how long does it take for them to converge?
    • If we provide subsidies to new firms, how long would it take for that to affect the distribution of firms?

Example of a Non-Ergodic Stochastic Process

  • Between \(t=0\) and \(t=1\) a coin is flipped (e.g., result of key exam)
    • If heads: income follows \(X_{t+1} = a X_t + b + c W_{t+1}\) with \(b=0.1\) for \(t \geq 1\)
    • If tails: income follows \(X_{t+1} = a X_t + b + c W_{t+1}\) with \(b=1.0\) for \(t \geq 1\)
  • The initial condition and early sequence cannot be forgotten, as the simulation below illustrates
  • If there is ANY probability of switching between careers, then it is ergodic because it “mixes”
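  • A minimal simulation sketch of this example, holding each branch fixed (the values of \(a\) and \(c\) are illustrative): the two time averages converge to different limits, so no single stationary mean summarizes the mixture
Random.seed!(1234)
a, c = 0.9, 0.05
T = 2000
x_heads = iterate_map_iid((x, W) -> a * x + 0.1 + c * W, Normal(), 0.0, T)
x_tails = iterate_map_iid((x, W) -> a * x + 1.0 + c * W, Normal(), 0.0, T)
plot(0:T, cumsum(x_heads) ./ (1:(T + 1)); label = "heads (b = 0.1)",
     xlabel = "t", size = (600, 400))
plot!(0:T, cumsum(x_tails) ./ (1:(T + 1)); label = "tails (b = 1.0)")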

Moving Average Representation, MA(\(\infty\)), for AR(1)

  • From \(X_t = a X_{t-1} + b + c W_t\), iterate backwards to \(X_0\) and \(W_1\) \[ \begin{aligned} X_t &= a \left(a X_{t-2} + b + c W_{t-1}\right) + b + c W_t \\ &= a^2 X_{t-2} + b (1 + a) + c (W_t + a W_{t-1})\\ &= a^2 \left(a X_{t-3} + b + c W_{t-2}\right)+ b (1 + a) + c (W_t + a W_{t-1})\\ &= a^t X_0 + b \sum_{j=0}^{t-1} a^j + c \sum_{j=0}^{t-1} a^j W_{t-j}\\ &= a^t X_0 + b \frac{1-a^t}{1-a} + c \sum_{j=0}^{t-1} a^j W_{t-j} \end{aligned} \]

Interpreting the Auto-Regressive Parameter

  • The distribution of \(X_t\) then depends on the distribution of \(X_0\) and on a weighted sum of \(t\) iid shocks

  • If \(X_0\) and \(W_t\) are normal, then \(X_t\) is normal since it is a linear combination \[ X_t = a^t X_0 + b \frac{1-a^t}{1-a} + c \sum_{j=0}^{t-1} a^j W_{t-j} \]

    • If \(a = 1\) then the initial condition is never “forgotten”
    • If \(a = 1\), all of the \(W_{t-j}\) shocks are equally important in determining the distribution of \(X_t\) because \(a^j\) doesn't “decay” over time

Simulation of Moving Average Representation

X_0 = 0.5 # degenerate prior
a, b, c = 0.9, 0.1, 0.05
A = [a 0; 0 a^2]
B = [b; c^2]
T = 20
num_samples = 10000
Xs = iterate_map(x -> A * x + B, [X_0, 0], T)
X_T = Normal(Xs[1, end], sqrt(Xs[2, end]))
W = randn(num_samples, T)
# Comprehensions and generators example, looks like math
X_T_samples = [a^T * X_0 + b * (1-a^T)/(1-a) + c * sum(a^j * W[i, T-j] for j in 0:T-1)
               for i in 1:num_samples]
histogram(X_T_samples; xlabel="x", normalize=true,
          label=L"Samples of $X_{%$T}$ using MA($\infty$)")
plot!(X_T; label=L"Analytic $X_{%$T}$", lw=3)      

Simulation of Moving Average Representation

Nonlinear Stochastic Processes

Nonlinearity with Additive Shocks

  • A useful class involves nonlinear functions for the drift and variance

    \[ X_{t+1} = \mu(X_t) + \sigma(X_t) W_{t+1} \]

    • IID \(W_{t+1}\) with \(\mathbb{E}[W_{t+1}] = 0\) and frequently \(\mathbb{E}[W_{t+1}^2] = 1\)
  • Nests our AR(1) process

    • \(\mu(x) = a x + b\) and \(\sigma(x) = c\)

Auto-Regressive Conditional Heteroskedasticity (ARCH)

  • For example, we may find that time-series data has time-varying volatility that depends on one lag \[ X_{t+1} = a X_t + \sigma_t W_{t+1} \]

    • And that the variance increases as we move away from the mean of the stationary distribution \(\sigma_t^2 = \beta + \gamma X_t^2\)
  • Hence the process becomes an ARCH(1)

    \[ X_{t+1} = a X_t + \left(\beta + \gamma X_t^2\right)^{1/2} W_{t+1} \]

Simulation of ARCH(1)

a = 0.7
beta, gamma = 5, 0.5
X_0 = 1.0
T = 200
h(x, W) = a * x + sqrt(beta + gamma * x^2) * W
x = iterate_map_iid(h, Normal(), X_0, T)
plot(x; label = L"X_t", size = (600, 400))

AR(1) with a Barrier

  • Nonlinearity in economics often comes in various forms of barriers, e.g. borrowing constraints

  • Consider our AR(1) except that the process can never go below \(0\)

    \[ X_{t+1} = \max\{a X_t + b + c W_{t+1}, 0.0\} \]

  • We could stop the process at this point, but instead we will continue to iterate

Simulation of AR(1) with a Barrier

a,b,c = 0.95, 0.00, 0.05
X_min = 0.0
h(x, W) = max(a * x + b + c * W, X_min)
T = 1000
x_0 = 0.5
x = iterate_map_iid(h, Normal(), x_0, T)
plot(x; label = L"X_t", size = (600, 400))

Histogram of the AR(1) with a Barrier

  • There isn’t a true density of \(\psi^*\) due to the point mass at \(0\)
T = 20000
x = iterate_map_iid(h, Normal(), x_0, T)
histogram(x; label = L"X_t", normalize = true,
          xlabel = "x", size = (600, 400))

Stochastic Growth Model

Simple Growth Model with Stochastic Productivity

  • Turning off population growth, for \(f(k) = k^\alpha\), and \(s, \delta\) constants \[ k_{t+1} = (1-\delta) k_t + s Z_t f(k_t),\quad \text{given } k_0 \]

  • Let log productivity, \(z_t\equiv \log Z_t\), follow an AR(1) process (why logs?)

    \[ \log Z_{t+1} = a \log Z_t + b + c W_{t+1} \]

Stationary Distribution of Productivity

  • Recall that the stationary distribution of \(\log Z_t\) is \(\mathcal{N}\left(\frac{b}{1-a}, \frac{c^2}{1-a^2}\right)\)
  • Since the stationary distribution of \(Z_t\) is lognormal, we can check ergodicity
a, b, c = 0.9, 0.1, 0.05
Z_0 = 1.0
T = 20000
h(z, W) = a * z + b + c * W
z = iterate_map_iid(h, Normal(), log(Z_0), T)
Z = exp.(z)
histogram(Z; label = L"Z_t", normalize = true,
          xlabel = "Z", size = (600, 400))
plot!(LogNormal(b/(1-a), sqrt(c^2/(1-a^2))),
      lw = 3, label = L"\psi^*")

Quantiles

  • Reminder: A quantile \(q\) is the \(x\) such that \(\mathbb{P}(X \leq x) = q\)
  • Or, given a density \(f\), the quantile is the \(x\) such that \(\int_{-\infty}^x f(u)\, du = q\)
  • With data, we can calculate an empirical quantile by sorting the observations and taking the value below which a proportion \(q\) of the observations lies (see the sketch below)
    • e.g. with 100 observations, the 5th percentile is the 5th smallest observation
  • The 0.5 quantile (i.e., the 50th percentile) is the median
  • For heavily skewed distributions, the median is often a better measure of central tendency than the mean
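  • A small sketch comparing a sorted-data quantile to the built-in quantile function (the distribution and sample size are illustrative):
draws = rand(LogNormal(0.0, 1.0), 1000) # heavily skewed
sorted_draws = sort(draws)
@show sorted_draws[500] # roughly the empirical median (0.5 quantile)
@show quantile(draws, 0.5) # built-in empirical quantile
@show median(draws), mean(draws); # median well below the mean here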

Practice with Iteration and Multivariate Functions

function iterate_map_iid_vec(h, dist, x0, T)
    x = zeros(length(x0), T + 1)
    x[:, 1] = x0
    for t in 2:(T + 1)
        # accepts whatever type rand(dist) returns
        x[:, t] = h(x[:, t - 1], rand(dist))
    end
    return x
end
iterate_map_iid_vec (generic function with 1 method)

Simulation of the Stochastic Growth Model

alpha, delta, s = 0.3, 0.1, 0.2
a, b, c = 0.9, 0.1, 0.1
function h(x, W)
    k = x[1]
    z = x[2]
    return [(1-delta) * k + s * exp(z) * k^alpha,
            a * z + b + c * W]
end
x_0 = [7.0, log(2.0)] # k_0, z_0
T = 150
x = iterate_map_iid_vec(h, Normal(), x_0, T)
plot(x[1, :]; label = L"k_t", xlabel = "t", size = (600, 400), legend=:topright)
plot!(exp.(x[2, :]), label = L"Z_t")
dist = LogNormal(b/(1-a), sqrt(c^2/(1-a^2)))
hline!([mean(dist)]; linestyle = :dash, label = L"\mathbb{E}[Z_t]")
hline!([quantile(dist, 0.05)]; lw=0, fillrange = [quantile(dist, 0.95)], fillalpha=0.2, label = "5/95")

Simulation of the Stochastic Growth Model

Ergodicity and Capital Accumulation

  • Evaluate the closed-form steady-state capital \(k^*\) for the deterministic model
# Remember nonstochastic steady-state
k_ss_det=  (s*mean(dist)/delta)^(1/(1-alpha))

T = 200000
x = iterate_map_iid_vec(h, Normal(), x_0, T)
histogram(x[1, :]; label = L"k_t",
          normalize = true, xlabel = "k",
          alpha=0.5, size = (600, 400))
vline!([k_ss_det]; linestyle = :dash, lw=3,
       label = L"k_{ss}(c = 0)")

Multiplicative Growth Processes

Proportional Growth

  • Many values grow or shrink proportional to their current size
    • e.g. population, firms, wealth, etc.
  • The growth rates are themselves often random
    • e.g. population growth rates, firm growth rates, returns on wealth
    • Random good or bad luck can compound, which changes the distribution
  • See here for more

Kesten Process

  • The simplest Kesten Process is a process of the form

    \[ X_{t+1} = a_{t+1} X_t + y_{t+1} \]

    • \(X_t\) is a state variable
    • \(a_{t+1}\) is an IID random growth rate
    • \(y_{t+1}\) is an IID random shock
  • Examples: if population is \(N_t\) and growth rate between \(t\) and \(t+1\) is \(g_{t+1}\)

    • Then \(N_{t+1}/N_t = 1 + g_{t+1}\)
    • If we had migration \(y_{t+1}\), then \(N_{t+1} = (1 + g_{t+1}) N_t + y_{t+1}\)
  • Key questions will be about whether stationary distributions exist, how they depend on parameters, and how fast they are approached

Conditions for a Stationary Distribution

  • A stationary distribution may not exist.
  • Important conditions for stationarity are that
    • \(\mathbb{E}[\log a_t] < 0\), intuition: \(a_t < 1\) most of the time
    • \(\mathbb{E}[y_t] < \infty\)
  • See Kesten Processes for more
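  • As a sketch with illustrative distributions satisfying these conditions, we can check \(\mathbb{E}[\log a_t] < 0\) numerically and simulate the resulting Kesten process:
a_dist = LogNormal(-0.1, 0.2) # E[log a_t] = -0.1 < 0
y_dist = Exponential(0.5)     # E[y_t] = 0.5, finite
@show mean(log.(rand(a_dist, 100_000))) # approximately -0.1
T = 10_000
X = zeros(T + 1)
X[1] = 1.0
for t in 2:(T + 1)
    X[t] = rand(a_dist) * X[t - 1] + rand(y_dist)
end
histogram(X; normalize = true, label = L"X_t", xlabel = "x",
          size = (600, 400))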

Example with Random Growth on an Asset

  • Let \(R_t\) be the gross return on an asset, and \(W_t\) be its value \[ W_{t+1} = R_{t+1} W_t \]

    • i.e. no additional savings or consumption
  • Let \(\log R_t \sim \mathcal{N}(\mu, \sigma^2)\), i.e. lognormally distributed

    • The support of \(R_t\) is \((0, \infty)\) and \(\mathbb{E}(R_t) = \exp(\mu + \sigma^2/2)\)

Simulation

mu = -0.01
sigma = 0.1
R_dist = LogNormal(mu, sigma)
T = 100
W_0 = 1.0
@show mean(R_dist)
@show exp(mu + sigma^2/2)
plot(iterate_map_iid((W, R) -> W * R, R_dist,
                     W_0, T);
     ylabel = L"W_t", xlabel = "t",
     size = (600, 400), legend=nothing,
     title = "Simulations of Value")
plot!(iterate_map_iid((W, R) -> W * R, R_dist,
      W_0, T))
plot!(iterate_map_iid((W, R) -> W * R, R_dist,
      W_0, T))
mean(R_dist) = 0.9950124791926823
exp(mu + sigma ^ 2 / 2) = 0.9950124791926823

Simulating an Ensemble

  • Frequently we will want to simulate a large number of paths
function iterate_map_iid_ensemble(f, dist, x0, T, num_samples)
    x = zeros(num_samples, T + 1)
    x[:, 1] .= x0
    for t in 2:(T + 1)
        # or could do a loop over samples
        x[:, t] .= f.(x[:, t - 1], rand(dist, num_samples))
    end
    return x
end
num_samples = 200
W = iterate_map_iid_ensemble((W, R) -> W * R, R_dist, W_0, T, num_samples)
plot(W'; ylabel = L"W_t", xlabel = "t", legend = nothing, alpha = 0.1,
     color=:blue, lw = 1)

Simulating an Ensemble

Distribution of Ensemble Values at t=10

t = 10
bin_edges = range(0.0, 5.0, length=50)
histogram(W[:,t+1]; normalize=:pdf,
         xlabel=L"W_t", ylabel="Density",
         title="Distribution of Values at t=$(t)",
         legend=nothing, size=(600,400),
         bins=bin_edges)
vline!([mean(W[:,t+1])]; color=:red, linestyle=:dash, label="Mean")

Distribution of Ensemble Values at t=10

Distribution of Ensemble Values at t=50

Distribution of Ensemble Values at t=100

Displaying the Distribution Dynamics

num_samples = 1000
T = 200
W = iterate_map_iid_ensemble((W, R) -> W * R, R_dist, W_0, T, num_samples)
q_50 = [quantile(W[:,i], 0.5) for i in 1:T+1]
q_05 = [quantile(W[:,i], 0.05) for i in 1:T+1]
q_95 = [quantile(W[:,i], 0.95) for i in 1:T+1]
mean_W = mean(W, dims=1)'
plot(mean_W; label = "Mean Value")
plot!(q_50; label = "Median Value", style = :dash, color = :lightblue)
plot!(q_05; label = "5/95 Percentile", lw=0, fillrange = q_95, fillalpha=0.4, color = :lightblue)

Displaying the Distribution Dynamics

Larger Returns

mu = -0.001
sigma = 0.1
R_dist = LogNormal(mu, sigma)
T = 100
W_0 = 1.0
@show mean(R_dist)
num_samples = 1000
W = iterate_map_iid_ensemble((W, R) -> W * R, R_dist, W_0, T, num_samples)
q_50 = [quantile(W[:,i], 0.5) for i in 1:T+1]
q_05 = [quantile(W[:,i], 0.05) for i in 1:T+1]
q_95 = [quantile(W[:,i], 0.95) for i in 1:T+1]
mean_W = mean(W, dims=1)'
plot(mean_W; label = "Mean Value")
plot!(q_50; label = "Median Value", style = :dash, color = :lightblue)
plot!(q_05; label = "5/95 Percentile", lw=0, fillrange = q_95, fillalpha=0.4, color = :lightblue)

Larger Returns

mean(R_dist) = 1.004008010677342

Distribution of Ensemble Values at t=10

t = 10
bin_edges = range(0.0, 5.0, length=50)
histogram(W[:,t+1]; normalize=:pdf,
         xlabel=L"W_t", ylabel="Density",
         title="Distribution of Values at t=$(t)",
         legend=nothing, size=(600,400),
         bins=bin_edges)
vline!([mean_W[t+1]]; color=:red, linestyle=:dash, label="Mean")

Distribution of Ensemble Values at t=10

Distribution of Ensemble Values at t=50

Distribution of Ensemble Values at t=100

Divergence and Tails of Distributions

  • These examples show that for multiplicative processes the distributions will often fan out, and potentially diverge
  • This is a common feature of many economic and financial time series
  • In particular, theory will show that for Kesten Processes, the tails of the distribution will be heavy even if it converges to a stationary distribution
    • i.e. the probability of large deviations from the mean will be higher than for a normal distribution
  • These will have what we call Power Law tails in the next section