About
The maximum likelihood method estimates parameters by maximizing the probability of observing the given data, assuming a specific probability distribution for the data-generating process.
Core Concept
The likelihood function is the joint probability density viewed as a function of the parameters $\theta$, with the data fixed:
$$L(\theta) = f(y_1, y_2, \dots, y_T; \theta)$$
The maximum likelihood estimator (MLE) is:
$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta)$$
Usually, we maximize the log-likelihood $\ell(\theta) = \ln L(\theta)$ for computational convenience: the logarithm turns the product into a sum and is maximized at the same $\theta$.
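As a minimal numerical sketch of this idea (the normal model, simulated data, seed, and starting values below are illustrative assumptions, not part of the method), we can recover the mean and standard deviation of a Gaussian sample by maximizing the log-likelihood with SciPy:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated sample with illustrative true values mu = 2.0, sigma = 1.5.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)

def neg_log_likelihood(params, y):
    # Optimize over log(sigma) so that sigma stays positive.
    mu, log_sigma = params
    return -np.sum(norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)))

# Minimizing the negative log-likelihood = maximizing the likelihood.
res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(data,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)  # should land near the true values 2.0 and 1.5
```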
Advantages
- Uses all information in the data (not just moments)
- Asymptotically efficient (attains the lowest possible asymptotic variance, the Cramér-Rao bound)
- Consistent and asymptotically normal
- Works well for large samples
Limitations
- Requires specification of full distributional form
- Computationally intensive for complex models
Example: AR(1) Model
Assume the AR(1) model $y_t = \delta + \phi y_{t-1} + \varepsilon_t$ with $|\phi| < 1$ and white noise $\varepsilon_t \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$.
Recall that for $X \sim \mathcal{N}(\mu, \sigma^2)$, the pdf is:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
Step 1: Joint Density of Errors
The AR(1) errors are white noise: $\varepsilon_t \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$ for $t = 2, \dots, T$. Since they are independent and identically distributed, the joint density equals the product of the individual densities.
Derivation:
Recall the pdf of a normal random variable $X \sim \mathcal{N}(\mu, \sigma^2)$:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
Since $\varepsilon_t$ has mean zero and variance $\sigma^2$:
$$f(\varepsilon_t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\varepsilon_t^2}{2\sigma^2}\right)$$
The joint density is the product of the independent densities:
$$f(\varepsilon_2, \dots, \varepsilon_T) = \prod_{t=2}^{T} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\varepsilon_t^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-\frac{T-1}{2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{t=2}^{T} \varepsilon_t^2\right)$$
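To make the factorization concrete, here is a quick numerical check (the value of $\sigma$ and the series length are assumptions of this sketch): the product of the individual densities matches the closed form above.

```python
import numpy as np
from scipy.stats import norm

# Draw white-noise errors eps_2, ..., eps_T (here T = 10, sigma = 0.8).
rng = np.random.default_rng(1)
sigma = 0.8
eps = rng.normal(0.0, sigma, size=9)

# Product of the individual N(0, sigma^2) densities...
product_form = np.prod(norm.pdf(eps, loc=0.0, scale=sigma))
# ...equals the closed-form joint density (exponent is -(T-1)/2).
closed_form = (2 * np.pi * sigma**2) ** (-len(eps) / 2) * np.exp(
    -np.sum(eps**2) / (2 * sigma**2)
)
print(np.isclose(product_form, closed_form))  # True
```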
Step 2: Marginal Density of $y_1$
For a stationary AR(1) process, the first observation follows the stationary (marginal) distribution. The AR(1) model is:
$$y_t = \delta + \phi y_{t-1} + \varepsilon_t, \qquad |\phi| < 1$$
The stationary distribution is derived by noting that in the long run the mean and variance no longer change:
$$E[y_t] = E[y_{t-1}] = \mu, \qquad \mathrm{Var}(y_t) = \mathrm{Var}(y_{t-1})$$
From the model equation:
$$\mu = \delta + \phi \mu \;\Rightarrow\; \mu = \frac{\delta}{1 - \phi}, \qquad \mathrm{Var}(y_t) = \phi^2 \,\mathrm{Var}(y_{t-1}) + \sigma^2 \;\Rightarrow\; \mathrm{Var}(y_t) = \frac{\sigma^2}{1 - \phi^2}$$
The stationary mean is $\mu = \frac{\delta}{1 - \phi}$ and the stationary variance is $\frac{\sigma^2}{1 - \phi^2}$. Thus:
$$y_1 \sim \mathcal{N}\left(\frac{\delta}{1 - \phi}, \frac{\sigma^2}{1 - \phi^2}\right)$$
Derivation:
Substituting into the normal pdf with mean $\frac{\delta}{1 - \phi}$ and variance $\frac{\sigma^2}{1 - \phi^2}$:
$$f(y_1) = \left(\frac{1 - \phi^2}{2\pi\sigma^2}\right)^{1/2} \exp\left(-\frac{(1 - \phi^2)\left(y_1 - \frac{\delta}{1 - \phi}\right)^2}{2\sigma^2}\right)$$
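A short simulation makes the stationary moments concrete (the parameter values and series length below are illustrative assumptions):

```python
import numpy as np

# Illustrative parameters; any |phi| < 1 gives a stationary process.
rng = np.random.default_rng(2)
delta, phi, sigma = 1.0, 0.6, 0.5
T = 200_000

y = np.empty(T)
y[0] = delta / (1 - phi)  # start at the stationary mean
for t in range(1, T):
    y[t] = delta + phi * y[t - 1] + rng.normal(0.0, sigma)

print(y.mean(), delta / (1 - phi))       # sample mean vs delta/(1-phi) = 2.5
print(y.var(), sigma**2 / (1 - phi**2))  # sample variance vs sigma^2/(1-phi^2) = 0.390625
```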
Step 3: Complete Likelihood
The complete likelihood is derived using the chain rule of probability (multiplication rule). This factorizes the joint density of all observations.
Derivation:
By the definition of conditional probability:
$$f(A, B) = f(A \mid B)\, f(B)$$
Applying this repeatedly to the joint density of the observations:
$$f(y_1, \dots, y_T) = f(y_1) \prod_{t=2}^{T} f(y_t \mid y_{t-1}, \dots, y_1)$$
Since $\varepsilon_t = y_t - \delta - \phi y_{t-1}$ for $t \geq 2$, all randomness in $y_t$ given $y_{t-1}$ comes from $\varepsilon_t$; observations before $y_{t-1}$ enter only through the deterministic model relationship, not through additional stochastic dependence. Therefore:
$$f(y_t \mid y_{t-1}, \dots, y_1) = f(y_t \mid y_{t-1}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_t - \delta - \phi y_{t-1})^2}{2\sigma^2}\right)$$
The complete likelihood is:
$$L(\delta, \phi, \sigma^2) = f(y_1) \prod_{t=2}^{T} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_t - \delta - \phi y_{t-1})^2}{2\sigma^2}\right)$$
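This factorization translates directly into code. The sketch below (the function name ar1_likelihood and its interface are my own choices, not from the text) evaluates the exact likelihood as $f(y_1)$ times the product of conditional densities; note that the raw product can underflow to 0.0 for long series, which is the computational motivation for moving to logs in Step 4.

```python
import numpy as np
from scipy.stats import norm

def ar1_likelihood(y, delta, phi, sigma2):
    # Marginal density of y_1 under the stationary distribution (Step 2).
    f_y1 = norm.pdf(y[0],
                    loc=delta / (1 - phi),
                    scale=np.sqrt(sigma2 / (1 - phi**2)))
    # Conditional densities f(y_t | y_{t-1}) for t = 2, ..., T (Step 3).
    cond = norm.pdf(y[1:], loc=delta + phi * y[:-1], scale=np.sqrt(sigma2))
    return f_y1 * np.prod(cond)
```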
Step 4: Log-Likelihood
The log-likelihood is obtained by combining Steps 1, 2, and 3, and taking the natural logarithm.
Derivation:
First, multiply the marginal density from Step 2 by the joint error density from Step 1 (with $\varepsilon_t = y_t - \delta - \phi y_{t-1}$ substituted in):
$$L(\delta, \phi, \sigma^2) = \left(\frac{1 - \phi^2}{2\pi\sigma^2}\right)^{1/2} \exp\left(-\frac{(1 - \phi^2)\left(y_1 - \frac{\delta}{1 - \phi}\right)^2}{2\sigma^2}\right) (2\pi\sigma^2)^{-\frac{T-1}{2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{t=2}^{T} (y_t - \delta - \phi y_{t-1})^2\right)$$
Taking the natural logarithm:
$$\ell(\delta, \phi, \sigma^2) = -\frac{T}{2} \ln(2\pi) - \frac{T}{2} \ln \sigma^2 + \frac{1}{2} \ln(1 - \phi^2) - \frac{S(\delta, \phi)}{2\sigma^2}$$
where the unconditional sum of squares is:
$$S(\delta, \phi) = (1 - \phi^2)\left(y_1 - \frac{\delta}{1 - \phi}\right)^2 + \sum_{t=2}^{T} (y_t - \delta - \phi y_{t-1})^2$$
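A direct transcription of this closed form into code (the function names S and ar1_loglik are illustrative, not a standard API):

```python
import numpy as np

def S(y, delta, phi):
    # Unconditional sum of squares from Step 4.
    mu = delta / (1 - phi)
    resid = y[1:] - delta - phi * y[:-1]
    return (1 - phi**2) * (y[0] - mu) ** 2 + np.sum(resid**2)

def ar1_loglik(y, delta, phi, sigma2):
    T = len(y)
    return (-T / 2 * np.log(2 * np.pi)
            - T / 2 * np.log(sigma2)
            + 0.5 * np.log(1 - phi**2)
            - S(y, delta, phi) / (2 * sigma2))
```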
Step 5: Estimate $\sigma^2$
The estimate of $\sigma^2$ is obtained by maximizing the log-likelihood with respect to $\sigma^2$, treating $\delta$ and $\phi$ as already estimated.
Derivation:
From Step 4, the log-likelihood is:
$$\ell(\delta, \phi, \sigma^2) = -\frac{T}{2} \ln(2\pi) - \frac{T}{2} \ln \sigma^2 + \frac{1}{2} \ln(1 - \phi^2) - \frac{S(\delta, \phi)}{2\sigma^2}$$
Taking the partial derivative with respect to $\sigma^2$:
$$\frac{\partial \ell}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{S(\delta, \phi)}{2\sigma^4}$$
Setting this equal to zero for maximization:
$$-\frac{T}{2\sigma^2} + \frac{S(\delta, \phi)}{2\sigma^4} = 0 \;\Rightarrow\; \sigma^2 = \frac{S(\delta, \phi)}{T}$$
After obtaining $\hat{\delta}$ and $\hat{\phi}$ (by maximizing the concentrated likelihood), the estimator is:
$$\hat{\sigma}^2 = \frac{S(\hat{\delta}, \hat{\phi})}{T}$$
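Putting the five steps together, here is one possible end-to-end sketch (simulated data with illustrative true values; the optimizer, seed, and starting values are assumptions of this sketch). It maximizes the concentrated log-likelihood over $(\delta, \phi)$, obtained by substituting $\sigma^2 = S/T$ back into $\ell$ and dropping constants, and then recovers $\hat{\sigma}^2 = S/T$:

```python
import numpy as np
from scipy.optimize import minimize

# Simulate an AR(1) path with illustrative true values (not real data).
rng = np.random.default_rng(3)
delta0, phi0, sigma0 = 1.0, 0.6, 0.5
n = 2_000
y = np.empty(n)
y[0] = delta0 / (1 - phi0)
for t in range(1, n):
    y[t] = delta0 + phi0 * y[t - 1] + rng.normal(0.0, sigma0)

def S(y, delta, phi):
    # Unconditional sum of squares from Step 4.
    mu = delta / (1 - phi)
    return (1 - phi**2) * (y[0] - mu) ** 2 + np.sum((y[1:] - delta - phi * y[:-1]) ** 2)

def neg_concentrated_loglik(params, y):
    # Profile sigma^2 out as S/T; additive constants are dropped
    # because they do not move the argmax. |phi| < 1 enforced by a barrier.
    delta, phi = params
    if abs(phi) >= 1:
        return np.inf
    T = len(y)
    return T / 2 * np.log(S(y, delta, phi) / T) - 0.5 * np.log(1 - phi**2)

res = minimize(neg_concentrated_loglik, x0=np.array([0.0, 0.0]),
               args=(y,), method="Nelder-Mead")
delta_hat, phi_hat = res.x
sigma2_hat = S(y, delta_hat, phi_hat) / len(y)
print(delta_hat, phi_hat, np.sqrt(sigma2_hat))  # should be near 1.0, 0.6, 0.5
```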