General Kalman Setup
The system is defined by the evolution of a state:
- the state $x$ at time $k-1$ evolves through the state transition matrix $F_k$
- a control vector $u_k$ enters through the control input matrix $B_k$
- $w_k \sim N(0,Q_k)$, a noise process on the evolution
with observations:
- $z_k$ is the measurement of the state $x_k$,
- made through the observation matrix $H_k$ applied to $x_k$
- with noise $v_k \sim N(0,R_k)$
\begin{eqnarray} x_k &=& F_k x_{k-1} + B_k u_k + w_k \\ z_k &=& H_k x_k + v_k \end{eqnarray}
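As a concrete illustration, here is a minimal NumPy sketch that simulates this state-space model; the dimensions, matrices, and noise levels below are arbitrary illustrative choices, not values from the text (and the control term is dropped, i.e. $B=0$).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-dimensional state with a scalar observation (arbitrary choices).
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])       # state transition matrix F_k
H = np.array([[1.0, 0.0]])       # observation matrix H_k
Q = 0.01 * np.eye(2)             # process noise covariance Q_k
R = np.array([[0.25]])           # measurement noise covariance R_k

x = np.zeros(2)
states, observations = [], []
for k in range(200):
    w = rng.multivariate_normal(np.zeros(2), Q)    # w_k ~ N(0, Q_k)
    v = rng.multivariate_normal(np.zeros(1), R)    # v_k ~ N(0, R_k)
    x = F @ x + w                                  # x_k = F_k x_{k-1} + w_k  (B = 0)
    z = H @ x + v                                  # z_k = H_k x_k + v_k
    states.append(x)
    observations.append(z)
```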
The Kalman prediction of the state and its covariance is given by:
\begin{eqnarray} x_{k|k-1} &=& F_k x_{k-1|k-1} + B_k u_k \\ P_{k|k-1} &=& F_k P_{k-1|k-1} F_k^t + Q_k \end{eqnarray}
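A possible sketch of the prediction step, assuming the state, covariance, and matrices are plain NumPy arrays (the helper name `kalman_predict` is just for illustration):

```python
def kalman_predict(x, P, F, Q, B=None, u=None):
    """x_{k|k-1} = F x_{k-1|k-1} + B u_k ;  P_{k|k-1} = F P_{k-1|k-1} F^t + Q."""
    x_pred = F @ x if B is None else F @ x + B @ u
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred
```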
The update rules are as follows (a code sketch is given after the list):
- innovation residual: $y_k = z_k - H_k x_{k|k-1}$
- residual covariance $S_k = H_k P_{k|k-1} H^t_k + R_k$
- optimal Kalman gain $K_k = P_{k|k-1} H^t_k S^{-1}_k$
- updated state estimate $x_{k|k} = x_{k|k-1} + K_k y_k$
- updated estimated covariance $P_{k|k} = (I- K_k H_k)P_{k|k-1}$
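And a matching sketch of the update step, again with a hypothetical helper name:

```python
import numpy as np

def kalman_update(x_pred, P_pred, z, H, R):
    """One measurement update following the equations above."""
    y = z - H @ x_pred                                   # innovation residual y_k
    S = H @ P_pred @ H.T + R                             # residual covariance S_k
    K = P_pred @ H.T @ np.linalg.inv(S)                  # optimal Kalman gain K_k
    x_new = x_pred + K @ y                               # updated state estimate
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred       # updated covariance
    return x_new, P_new
```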
Exponential Moving Average as a 1D Kalman filter
The state $x$ is the hidden, filtered price with process noise variance $Q$; the observation $z$ is the market price with measurement noise variance $R$; $H=F=1, B=0$. The update rules become:
\begin{eqnarray} x_k &=& x_{k-1} + \frac{P_{k-1}+Q}{P_{k-1}+Q+R} (z_k-x_{k-1}) \\ P_k &=& \left(1 - \frac{P_{k-1}+Q}{P_{k-1}+Q+R}\right)(P_{k-1}+Q) \end{eqnarray}
$P_k$ quickly reaches a stable fixed point, the solution of $P=\left(1 - \frac{P+Q}{P+Q+R}\right)(P+Q)$, which, after multiplying through by $P+Q+R$, is the positive root of the quadratic $x^2 + Qx -QR=0$.
$K = \frac{P+Q}{P+Q+R}$ therefore tends to a constant, and the filter update looks like an EWMA whose weight tends to $K$: $$x_k = K z_k + (1-K)x_{k-1}$$
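A small numerical sketch of this equivalence: it computes the positive root of the quadratic, iterates the covariance recursion, and checks that the gain settles to the same constant EWMA weight (the noise values $Q$ and $R$ below are arbitrary illustrative choices):

```python
import numpy as np

# Arbitrary illustrative noise variances.
Q, R = 0.1, 5.0

# Positive root of P^2 + Q P - Q R = 0 and the corresponding steady-state gain.
P_star = (-Q + np.sqrt(Q**2 + 4 * Q * R)) / 2
K_star = (P_star + Q) / (P_star + Q + R)

# Iterating the covariance recursion converges to the same gain.
P = 1.0
for _ in range(100):
    K = (P + Q) / (P + Q + R)
    P = (1 - K) * (P + Q)

print(K_star, K)   # the two gains agree to numerical precision

# With a constant gain, the filter is just an EWMA: x_k = K z_k + (1 - K) x_{k-1}.
```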
$p$-Factor Regression as a $p$-dimensional Kalman filter
Regression model definition: \begin{eqnarray} \beta_i &=& \beta_{i-1} + w_i \\ Y_i &=& X_i \beta_i + \varepsilon_i \end{eqnarray}
To map onto the previous notation: $F=Id(p), B = 0, x_k=\beta_k, H_k = X_k, z_k = Y_k$ (with $X_k$ a row vector, see the dimensions below).
Kalman updating rule: \begin{eqnarray} K_i &=& \frac{1}{R+X_i (P_{i-1}+Q) X_i^t} (P_{i-1}+Q) X_i^t \\ \beta_i &=& \beta_{i-1} + K_i (Y_i - X_i \beta_{i-1}) \\ P_i &=& (Id(p)-K_i X_i)(P_{i-1} + Q) \end{eqnarray}
with dimensionalities: $X_i \in M(1,p), \beta_i \in M(p,1), K_i \in M(p,1), Q \in M(p,p), P_i \in M(p,p)$
In theory $Q_i = \mathrm{cov}(w_i)$ and $R_i=\mathrm{var}(\varepsilon_i)$; in practice a more stable setup uses fixed values such as $Q=Id(p), P_0=10^7\, Id(p), R=5$.
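A possible NumPy sketch of this recursive regression using the practical defaults above; the row-vector convention for $X_i$ and the function name `recursive_regression` are illustrative assumptions:

```python
import numpy as np

def recursive_regression(X, Y, q=1.0, r=5.0, p0=1e7):
    """Time-varying betas via the Kalman recursion above.

    X is an (n, p) array whose rows are the factor vectors X_i,
    Y an (n,) array of responses; q, r, p0 mirror Q = Id(p), R = 5, P_0 = 1e7 Id(p).
    """
    n, p = X.shape
    beta = np.zeros(p)
    P = p0 * np.eye(p)
    Q = q * np.eye(p)
    betas = np.empty((n, p))
    for i in range(n):
        x = X[i]
        Pq = P + Q
        S = r + x @ Pq @ x                    # residual variance S_i (scalar)
        K = Pq @ x / S                        # gain K_i, shape (p,)
        beta = beta + K * (Y[i] - x @ beta)   # innovation Y_i - X_i beta_{i-1}
        P = (np.eye(p) - np.outer(K, x)) @ Pq
        betas[i] = beta
    return betas
```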
Non-Kalman view: Smoothed 1D regression as a Penalized $L^2$ Problem
We keep the model setup: \begin{eqnarray} \beta_i &=& \beta_{i-1} + w_i \\ Y_i &=& X_i \beta_i + \varepsilon_i \end{eqnarray}
Given the whole history, this can be solved as a quadratic program with loss function $$ \sum_i (Y_i - X_i \beta_i)^2 + \mu \sum_i (\beta_i - \beta_{i-1})^2 $$ For $\mu = 0$, the regression fits each observation exactly, whereas for $\mu$ sufficiently large, $\beta$ barely changes with time.
One of the problems is that the algorithm is not adaptive and requires the whole history to optimize. Besides, the choice of the optimal $\mu$ is a compromise between the precision of the estimate and its stability, and it is best estimated using only information available up to time $i-1$.
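For the single-factor case, the batch nature of this approach is easy to see in code: the whole history enters one linear system. A minimal sketch of the normal equations of the penalized loss, with an illustrative helper name:

```python
import numpy as np

def smoothed_betas_1d(X, Y, mu):
    """Batch solution of  min sum_i (Y_i - X_i b_i)^2 + mu * sum_i (b_i - b_{i-1})^2.

    X, Y are 1-D arrays of the same length n; returns the n smoothed betas.
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n = len(Y)
    D = np.diff(np.eye(n), axis=0)       # first-difference operator, shape (n-1, n)
    A = np.diag(X**2) + mu * D.T @ D     # normal equations of the penalized loss
    return np.linalg.solve(A, X * Y)

# mu = 0 recovers the exact fit beta_i = Y_i / X_i (for X_i != 0);
# a large mu drives the betas toward a nearly constant value.
```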