This section distinguishes between deterministic and stochastic trends and presents methods for modeling and estimating deterministic trends, including:

- Constant means
- Linear and quadratic trends
- Seasonal and cosine trends

It then covers the reliability and efficiency of regression estimates, interpretation of regression output, and residual analysis.
## Deterministic vs. stochastic trends
Time series mean functions range from completely arbitrary (the general case) to constant (the stationary case). Trends occupy a middle ground: simple, non-constant mean functions.

Stochastic trends, such as those exhibited by a random walk, arise from strong correlation and increasing variance rather than a true shift in the mean, and they vary from one simulated realization to the next. Deterministic trends have a fixed functional form, e.g., periodic ($\mu_t = \mu_{t-12}$, as in monthly temperatures) or linear ($\mu_t = \beta_0 + \beta_1 t$). The model $Y_t = \mu_t + X_t$, with $E(X_t) = 0$, assumes that $\mu_t$ holds for all time, an assumption that requires justification.
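A quick simulation makes the distinction concrete. The minimal sketch below (seed, sample size, and number of paths are arbitrary choices) draws several realizations of the same random walk model: each path shows a different apparent "trend" even though the true mean function is identically zero.

```python
import numpy as np
import matplotlib.pyplot as plt

# Three realizations of the same random walk Y_t = e_1 + ... + e_t.
# The true mean function is zero, yet each path drifts differently.
rng = np.random.default_rng(0)
for i in range(3):
    path = np.cumsum(rng.normal(0, 1, 200))
    plt.plot(path, label=f"Realization {i + 1}")
plt.xlabel("Time")
plt.ylabel("Y_t")
plt.legend()
plt.show()
```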
## Estimation of a constant mean
For $Y_t = \mu + X_t$ with $E(X_t) = 0$, the sample mean $\bar{Y} = \frac{1}{n} \sum_{t=1}^{n} Y_t$ is unbiased ($E(\bar{Y}) = \mu$). Its variance depends on the correlation structure of $X_t$:
- **Stationary $X_t$ with autocorrelation $\rho_k$**:
  $$\operatorname{Var}(\bar{Y}) = \frac{\gamma_0}{n} \left[1 + 2 \sum_{k=1}^{n-1} \left(1 - \frac{k}{n}\right) \rho_k\right]$$
  - White noise ($\rho_k = 0$ for $k > 0$): $\operatorname{Var}(\bar{Y}) = \gamma_0 / n$.
  - Moving average $Y_t = e_t - \frac{1}{2} e_{t-1}$ ($\rho_1 = -0.4$, $\rho_k = 0$ for $k > 1$): $\operatorname{Var}(\bar{Y}) \approx 0.2\, \gamma_0 / n$ for large $n$; the negative correlation improves precision (the sketch below checks this case by simulation).
  - If $\rho_k \geq 0$ for all $k$, the variance exceeds the white-noise value.
  - If $\sum_{k=0}^{\infty} |\rho_k| < \infty$, then for large $n$, $\operatorname{Var}(\bar{Y}) \approx \frac{\gamma_0}{n} \sum_{k=-\infty}^{\infty} \rho_k$; e.g., $\rho_k = \phi^{|k|}$ yields $\frac{\gamma_0}{n} \cdot \frac{1 + \phi}{1 - \phi}$.
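As a check on the moving-average case, the following minimal sketch (assumed parameters: $\sigma_e = 1$, $n = 100$) compares the empirical variance of $\bar{Y}$ across many replications with the large-$n$ approximation $0.2\,\gamma_0/n$.

```python
import numpy as np

# Y_t = e_t - 0.5 e_{t-1}: MA(1) with rho_1 = -0.4 and gamma_0 = 1.25 sigma_e^2.
rng = np.random.default_rng(1)
n, reps, sigma_e = 100, 20000, 1.0
means = np.empty(reps)
for r in range(reps):
    e = rng.normal(0, sigma_e, n + 1)
    y = e[1:] - 0.5 * e[:-1]
    means[r] = y.mean()

gamma_0 = 1.25 * sigma_e**2
print(f"Empirical Var(Y_bar):        {means.var(ddof=1):.5f}")
print(f"Approximation 0.2*gamma_0/n: {0.2 * gamma_0 / n:.5f}")
```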
- **Nonstationary $X_t$ (random walk)**:

| Property | Expression |
|------------------|-------------------------------------|
| Model | $Y_t = \sum_{j=1}^{t} e_j$ |
| Mean | $E(\bar{Y}) = 0$ |
| Variance | $\operatorname{Var}(\bar{Y}) = \sigma_e^2 \, \frac{(n+1)(2n+1)}{6n}$ |

Here the variance *increases* with $n$: under a random walk, collecting more data makes $\bar{Y}$ a worse estimate.
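The random-walk variance formula is easy to verify by simulation; the sketch below uses an arbitrary $n$ and replication count.

```python
import numpy as np

# For a random walk Y_t = e_1 + ... + e_t,
# Var(Y_bar) = sigma_e^2 * (n + 1) * (2n + 1) / (6n), which grows with n.
rng = np.random.default_rng(2)
n, reps, sigma_e = 50, 20000, 1.0
means = np.array([np.cumsum(rng.normal(0, sigma_e, n)).mean()
                  for _ in range(reps)])
theory = sigma_e**2 * (n + 1) * (2 * n + 1) / (6 * n)
print(f"Empirical Var(Y_bar): {means.var(ddof=1):.3f}")
print(f"Theoretical value:    {theory:.3f}")
```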
## Regression methods
Regression estimates deterministic trends via least squares.
- **Linear Trend**: $\mu_t = \beta_0 + \beta_1 t$, estimated by minimizing $Q(\beta_0, \beta_1) = \sum_{t=1}^{n} [Y_t - (\beta_0 + \beta_1 t)]^2$:
  $$\hat{\beta}_1 = \frac{\sum_{t=1}^{n} (Y_t - \bar{Y})(t - \bar{t})}{\sum_{t=1}^{n} (t - \bar{t})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{t}, \qquad \bar{t} = \frac{n+1}{2}$$
  Example: fitting a line to a simulated random walk yields $\hat{\beta}_0 = -1.008$, $\hat{\beta}_1 = 0.1341$ (the closed-form estimates are checked numerically in the sketch after this list).
- **Seasonal Means**: For monthly data, $\mu_t$ takes one of twelve values $\beta_1, \ldots, \beta_{12}$ (e.g., $\beta_1$ for January):
  $$\mu_t = \begin{cases} \beta_1, & t = 1, 13, 25, \ldots \\ \beta_2, & t = 2, 14, 26, \ldots \\ \vdots \\ \beta_{12}, & t = 12, 24, 36, \ldots \end{cases}$$
  Estimates are monthly averages; e.g., the temperature data fit gives $\hat{\beta}_1 = 16.608$ (January).
- **Cosine Trends**: $\mu_t = \beta_0 + \beta_1 \cos(2\pi f t) + \beta_2 \sin(2\pi f t)$, $f = 1/12$ for monthly data. Example: Temperature fit yields $\hat{\beta}_0 = 46.2660$, $\hat{\beta}_1 = -26.7079$, $\hat{\beta}_2 = -2.1697$.
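The closed-form slope and intercept can be checked against a library fit. A minimal sketch (simulated data with assumed parameters; `np.polyfit` serves as the reference fit):

```python
import numpy as np

# Closed-form least squares for mu_t = beta_0 + beta_1 * t.
rng = np.random.default_rng(3)
n = 100
t = np.arange(1, n + 1)
y = 2 + 0.1 * t + rng.normal(0, 1, n)

t_bar, y_bar = t.mean(), y.mean()
beta1_hat = np.sum((y - y_bar) * (t - t_bar)) / np.sum((t - t_bar) ** 2)
beta0_hat = y_bar - beta1_hat * t_bar
print(f"Closed form: beta0 = {beta0_hat:.4f}, beta1 = {beta1_hat:.4f}")

# Same fit via numpy's polynomial least squares, for comparison.
b1, b0 = np.polyfit(t, y, 1)
print(f"np.polyfit:  beta0 = {b0:.4f}, beta1 = {b1:.4f}")
```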
## Reliability and efficiency of regression estimates
For $Y_t = \mu_t + X_t$ with $E(X_t) = 0$ and $X_t$ stationary with autocovariance $\gamma_k$ and autocorrelation $\rho_k$:
- **Seasonal Means**:
| Property | Expression |
|------------------|-------------------------------------|
| Estimate | $\hat{\beta}_j = \frac{1}{N} \sum_{i=0}^{N-1} Y_{j + 12i}$ |
| Variance | $\operatorname{Var}(\hat{\beta}_j) = \frac{\gamma_0}{N} \left[1 + 2 \sum_{k=1}^{N-1} \left(1 - \frac{k}{N}\right) \rho_{12k}\right]$ |
  White noise: $\operatorname{Var}(\hat{\beta}_j) = \gamma_0 / N$, where $N$ is the number of years of monthly data.
- **Cosine Trends**:
| Property | Expression |
|------------------|-------------------------------------|
| Estimate | $\hat{\beta}_1 = \frac{2}{n} \sum_{t=1}^n \cos\left(\frac{2\pi m t}{n}\right) Y_t$ |
| Variance | $\operatorname{Var}(\hat{\beta}_1) = \frac{2 \gamma_0}{n} \left[1 + \frac{4}{n} \sum_{s=2}^n \sum_{t=1}^{s-1} \cos\left(\frac{2\pi m t}{n}\right) \cos\left(\frac{2\pi m s}{n}\right) \rho_{s-t}\right]$ |
  White noise: $\operatorname{Var}(\hat{\beta}_1) = 2 \gamma_0 / n$. For $\rho_1 = -0.4$ and large $n$, the variance is roughly 30% of the white-noise value, a reduction of about 70% (checked numerically in the sketch after this list).
- **Linear Trend**:
| Property | Expression |
|------------------|-------------------------------------|
| Estimate | $\hat{\beta}_1 = \frac{\sum_{t=1}^n (t - \bar{t}) Y_t}{\sum_{t=1}^n (t - \bar{t})^2}$ |
| Variance | $\operatorname{Var}(\hat{\beta}_1) = \frac{12 \gamma_0}{n (n^2 - 1)} \left[1 + \frac{24}{n (n^2 - 1)} \sum_{s=2}^n \sum_{t=1}^{s-1} (t - \bar{t})(s - \bar{t}) \rho_{s-t}\right]$ |
  For $\rho_1 \neq 0$ and $\rho_k = 0$ for $k > 1$, large $n$ gives approximately $\frac{12 \gamma_0 (1 + 2 \rho_1)}{n (n^2 - 1)}$.
For a broad class of stationary error processes, least squares is asymptotically as efficient as the best linear unbiased estimator (BLUE) when $n$ is large; however, the standard errors reported by regression routines assume white-noise errors.
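The variance formulas in the tables above can be evaluated numerically. The sketch below (an illustrative check assuming $\rho_1 = -0.4$ and $\rho_k = 0$ for $k > 1$, with $n = 120$ months) computes the exact cosine-trend variance multiplier relative to white noise, and the large-sample linear-trend factor $1 + 2\rho_1$.

```python
import numpy as np

# Exact variance multiplier for the cosine-trend estimate beta_1_hat
# relative to the white-noise case, for MA(1)-type correlation
# (rho_1 = -0.4, rho_k = 0 for k > 1). Assumed: n = 120, f = m/n = 1/12.
n, rho1 = 120, -0.4
m = n // 12
t = np.arange(1, n + 1)
c = np.cos(2 * np.pi * m * t / n)

# Only adjacent pairs (s = t + 1) contribute when rho_k = 0 for k > 1.
cross = np.sum(c[:-1] * c[1:])
multiplier = 1 + (4 / n) * rho1 * cross
print(f"Cosine trend: Var is {multiplier:.3f} x the white-noise value")

# Large-sample factor for the linear-trend slope under the same correlation.
print(f"Linear trend: Var factor 1 + 2*rho_1 = {1 + 2 * rho1:.2f}")
```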
## Interpretation of regression output
Regression output (e.g., random walk fit) includes $\hat{\beta}_0$, $\hat{\beta}_1$, standard errors, $t$-values, $R^2$ (e.g., 0.812), and residual standard error $s = \sqrt{\frac{1}{n-p} \sum_{t=1}^n (Y_t - \hat{\mu}_t)^2}$.
Standard errors and $t$-values assume white-noise errors and normality, assumptions that are often violated by time series data.
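With statsmodels, these quantities are available directly on the fitted results object. A minimal sketch (reusing a simulated linear-trend fit, so the printed numbers are illustrative only):

```python
import numpy as np
import statsmodels.api as sm

# Fit a linear trend to simulated data and extract the summary quantities.
rng = np.random.default_rng(4)
t = np.arange(1, 101)
y = 2 + 0.1 * t + rng.normal(0, 1, 100)
res = sm.OLS(y, sm.add_constant(t)).fit()

print("Coefficients:      ", res.params)    # beta_0_hat, beta_1_hat
print("Standard errors:   ", res.bse)       # computed under white noise
print("t-values:          ", res.tvalues)
print("R-squared:         ", res.rsquared)
print("Residual std err s:", np.sqrt(res.mse_resid))  # s = sqrt(SSE / (n - p))
```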
## Residual analysis
Residuals $\hat{X}_t = Y_t - \hat{\mu}_t$ assess model fit. For temperature seasonal means:
- Plots (time series, residuals vs. fitted values, histogram, QQ plot) show no patterns and approximate normality (Shapiro-Wilk $W = 0.9929$, $p = 0.6954$).
- Runs test ($p = 0.216$) and sample autocorrelation $r_k = \frac{\sum_{t=k+1}^n (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^n (Y_t - \bar{Y})^2}$ suggest independence.
For the linear fit to the random walk, the residuals show clear dependence (large $r_1$ and $r_2$).
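These checks can be automated. The following minimal sketch assumes the `runstest_1samp` function in `statsmodels.sandbox.stats.runs` together with `statsmodels.tsa.stattools.acf`; it fits a line to a simulated random walk and tests the residuals for independence.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import acf
from statsmodels.sandbox.stats.runs import runstest_1samp

# Residuals from a linear fit to a random walk should show strong dependence.
rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(0, 1, 100))
t = np.arange(1, 101)
resid = sm.OLS(y, sm.add_constant(t)).fit().resid

z, p = runstest_1samp(resid, cutoff="mean")
print(f"Runs test: z = {z:.3f}, p = {p:.4f}")   # small p suggests dependence
print("r_1, r_2 =", np.round(acf(resid, nlags=2)[1:], 3))
```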
## Python examples
### Estimating a Constant Mean and Variance
```python
import numpy as np
# Simulate a stationary series with constant mean
np.random.seed(42)
n = 100
mu = 5
e_t = np.random.normal(0, 1, n) # White noise
Y_t = mu + e_t
# Sample mean
Y_bar = np.mean(Y_t)
print(f"Sample Mean: {Y_bar:.3f}")
# Variance of sample mean (white noise case)
gamma_0 = np.var(e_t, ddof=1)
var_Y_bar = gamma_0 / n
print(f"Variance of Sample Mean (White Noise): {var_Y_bar:.3f}")
```
This code simulates a stationary time series $Y_t = \mu + e_t$ with $\mu = 5$ and white noise $e_t$, then computes the sample mean and its variance under the white noise assumption.
### Fitting a Linear Trend with Least Squares
```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
# Simulate a series with a linear trend
np.random.seed(42)
t = np.arange(1, 101)
beta_0, beta_1 = 2, 0.1
e_t = np.random.normal(0, 1, 100)
Y_t = beta_0 + beta_1 * t + e_t
# Fit linear trend
X = sm.add_constant(t) # Add intercept term
model = sm.OLS(Y_t, X).fit()
print(model.summary())
# Plot
plt.plot(t, Y_t, 'o', label='Data')
plt.plot(t, model.fittedvalues, '-', label='Linear Fit')
plt.xlabel('Time')
plt.ylabel('Y_t')
plt.legend()
plt.show()
```
This example generates a series with a linear trend $Y_t = 2 + 0.1t + e_t$, fits it using OLS regression, and plots the data with the fitted line.
### Seasonal Means Model for Monthly Data
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
# Simulate monthly data with seasonal means
np.random.seed(42)
n_years = 5
t = np.arange(1, 12 * n_years + 1)
season = np.tile(np.arange(1, 13), n_years)
beta = np.array([10, 12, 15, 18, 20, 22, 21, 19, 16, 13, 11, 10]) # Seasonal means
e_t = np.random.normal(0, 1, len(t))
Y_t = beta[season - 1] + e_t
# Fit seasonal means model (no intercept)
df = pd.DataFrame({'Y': Y_t, 'Month': season})
X = pd.get_dummies(df['Month'], drop_first=False, dtype=float)  # Indicator variables, cast to float for OLS
model = sm.OLS(df['Y'], X).fit()
print(model.summary())
# Plot residuals
plt.plot(t, model.resid, 'o')
plt.xlabel('Time')
plt.ylabel('Residuals')
plt.show()
```
This simulates monthly data with distinct seasonal means, fits a seasonal means model using dummy variables, and plots the residuals.
### Cosine Trend Fit
```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
# Simulate data with cosine trend
np.random.seed(42)
t = np.arange(1, 61)
beta_0, beta_1, beta_2 = 10, 5, 2
f = 1 / 12 # Monthly frequency
Y_t = beta_0 + beta_1 * np.cos(2 * np.pi * f * t) + beta_2 * np.sin(2 * np.pi * f * t) + np.random.normal(0, 1, 60)
# Fit cosine trend
X = sm.add_constant(np.column_stack((np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t))))
model = sm.OLS(Y_t, X).fit()
print(model.summary())
# Plot
plt.plot(t, Y_t, 'o', label='Data')
plt.plot(t, model.fittedvalues, '-', label='Cosine Fit')
plt.xlabel('Time')
plt.ylabel('Y_t')
plt.legend()
plt.show()
```
This generates a series with a cosine trend $Y_t = 10 + 5 \cos(2\pi t / 12) + 2 \sin(2\pi t / 12) + e_t$, fits it, and visualizes the fit.
### Residual Analysis with Autocorrelation
```python
import numpy as np
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt
# Simulate a random walk (nonstationary)
np.random.seed(42)
n = 100
e_t = np.random.normal(0, 1, n)
Y_t = np.cumsum(e_t)
# Fit linear trend
t = np.arange(1, n + 1)
X = sm.add_constant(t)
model = sm.OLS(Y_t, X).fit()
# Standardized residuals
resid = model.resid / np.std(model.resid, ddof=1)
# Plot residuals
plt.plot(t, resid, 'o')
plt.xlabel('Time')
plt.ylabel('Standardized Residuals')
plt.show()
# Autocorrelation function
plot_acf(resid, lags=20)
plt.show()
```
This simulates a random walk, fits a linear trend, computes standardized residuals, and plots both the residuals and their sample autocorrelation function to check for dependence.