Derivation of Unscented Kalman Filter (UKF)

Linearization of the non-linear dynamics/observation models in the EKF is a clever trick that lets us reuse the KF equations: we approximate the non-linear function by a linear one around a single point, namely the mean of the Gaussian. But there is a better way to approximate the transformation: use multiple points. This is the intuition behind the powerful method called the Unscented Kalman Filter (UKF). We take some points on the source Gaussian, pass them through the non-linear function, and then compute the mean and covariance of the transformed points. This special set of points is called the "sigma points". Transforming the whole state distribution through a non-linear function is computationally very difficult, but transforming a few individual points is easy; the sigma points act as representatives of the whole distribution.


Unscented Transform

Consider a non-linear function $z = f(x)$ of a random variable $\mathbb{R}^n \ni x \sim N(\mu, \Sigma)$. The sigma points are computed as, \begin{align*} x^{(i)} &= \mu + \left(\sqrt{n\Sigma}\right)_i^T \\ x^{(n+i)} &= \mu - \left(\sqrt{n\Sigma}\right)_i^T \\ \textrm{for } &i = 1, ..., n \tag{1} \end{align*} where $\left(\sqrt{n\Sigma}\right)_i$ is the $i^{th}$ row of the matrix $\sqrt{n\Sigma}$. The matrix square root is computed using the Cholesky decomposition: if $n\Sigma = S S^T$ with $S$ lower-triangular, we take $\sqrt{n\Sigma} = S^T$, so the transposed rows in Equation 1 are simply the columns of $S$. This choice ensures that the empirical covariance of the sigma points reproduces $\Sigma$ exactly, since $\frac{1}{n}\sqrt{n\Sigma}^T\sqrt{n\Sigma} = \frac{1}{n} S S^T = \Sigma$. There are $2n$ sigma points for an $n$-dimensional Gaussian. Each sigma point is assigned an equal weight, \begin{equation*} w^{(i)} = \frac{1}{2n} \tag{2} \end{equation*} Each sigma point is then transformed with the non-linear function $f$ to get the transformed sigma points $z^{(i)} = f(x^{(i)})$. The mean and covariance of the transformed points can be computed as, \begin{align*} \mu_z &= \sum_{i=1}^{2n} w^{(i)} z^{(i)} = \sum_{i=1}^{2n} w^{(i)} f(x^{(i)}) \tag{3}\\ \Sigma_z &= \sum_{i=1}^{2n} w^{(i)} (z^{(i)} - \mu_z) (z^{(i)} - \mu_z)^T = \sum_{i=1}^{2n} w^{(i)} (f(x^{(i)}) - \mu_z) (f(x^{(i)}) - \mu_z)^T \tag{4} \end{align*}
The Unscented Transform (UT) gives us a way to accurately estimate the mean and covariance of a distribution passed through a non-linearity. We can use the UT to modify the EKF into a more accurate state estimator; the resulting algorithm is called the Unscented Kalman Filter (UKF). The UT itself can also be modified in various ways, e.g. by including the mean as one of the sigma points and by introducing a scaling factor.
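To make Equations 1-4 concrete, here is a minimal NumPy sketch of the Unscented Transform. The function names (`sigma_points`, `unscented_transform`) are my own, and it assumes $n\Sigma$ is positive-definite so that the Cholesky factorization succeeds; treat it as an illustration of the equations above, not a production implementation.

```python
import numpy as np

def sigma_points(mu, Sigma):
    """Equations 1-2: 2n sigma points with equal weights 1/(2n)."""
    n = mu.shape[0]
    # Lower-triangular Cholesky factor S of n*Sigma; its columns are the
    # transposed rows of sqrt(n*Sigma) in the notation of Equation 1.
    S = np.linalg.cholesky(n * Sigma)
    pts = np.vstack([mu + S[:, i] for i in range(n)] +
                    [mu - S[:, i] for i in range(n)])
    w = np.full(2 * n, 1.0 / (2 * n))
    return pts, w

def unscented_transform(pts, w, f):
    """Equations 3-4: mean and covariance of the transformed sigma points."""
    Z = np.array([f(x) for x in pts])   # z_i = f(x_i)
    mu_z = w @ Z                        # Equation 3
    D = Z - mu_z
    Sigma_z = (w[:, None] * D).T @ D    # Equation 4
    return mu_z, Sigma_z
```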

Consider a non-linear dynamical system with non-linear observations, \begin{align*} x_{k+1} &= f(x_k, u_k) + \epsilon \\ y_k &= g(x_k) + \upsilon \end{align*} where the noise vectors $\epsilon \sim N(0,R)$ and $\upsilon \sim N(0,Q)$ are defined similarly to Part 3.
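As a running example (my own, not part of the original derivation), take a unicycle robot with state $x = [p_x, p_y, \theta]$, control $u = [v, \omega]$, and a range-bearing measurement of the origin; the timestep and noise covariances below are arbitrary illustrative values.

```python
dt = 0.1  # illustrative timestep

def f(x, u):
    """Non-linear dynamics: unicycle model."""
    px, py, th = x
    v, om = u
    return np.array([px + v * np.cos(th) * dt,
                     py + v * np.sin(th) * dt,
                     th + om * dt])

def g(x):
    """Non-linear observation: range and bearing to the origin."""
    px, py, th = x
    return np.array([np.hypot(px, py), np.arctan2(py, px) - th])

R = 0.01 * np.eye(3)  # process noise covariance, epsilon ~ N(0, R)
Q = 0.05 * np.eye(2)  # observation noise covariance, upsilon ~ N(0, Q)
```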


Step 1: Propagating the dynamics by one timestep

Given our current state estimate $\mu_{k|k}$ and $\Sigma_{k|k}$, we use the UT to obtain the predicted estimates $\mu_{k+1|k}$ and $\Sigma_{k+1|k}$. If $x^{(i)}$ are the sigma points of the Gaussian $N(\mu_{k|k}, \Sigma_{k|k})$ and $w^{(i)}$ are the corresponding weights, we transform them using the non-linear dynamics and obtain the new mean and covariance (using Equations 3 and 4) as, \begin{align*} \mu_{k+1|k} &= E[\hat{x}_{k+1|k}] = E[f(\hat{x}_{k|k}, u_k) + \epsilon] = \sum_{i=1}^{2n} w^{(i)} f(x^{(i)}, u_k) \tag{5}\\ \Sigma_{k+1|k} &= \textrm{cov}(\hat{x}_{k+1|k}) = \textrm{cov}(f(\hat{x}_{k|k}, u_k) + \epsilon) \\ &= \sum_{i=1}^{2n} w^{(i)} (f(x^{(i)}, u_k) - \mu_{k+1|k}) (f(x^{(i)}, u_k) - \mu_{k+1|k})^T + R \tag{6} \end{align*}
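In code, Step 1 is a single call to the Unscented Transform sketched earlier, plus the additive process noise. `ukf_predict` is my own name; it reuses `sigma_points`, `unscented_transform`, `f` and `R` from the sketches above.

```python
def ukf_predict(mu, Sigma, u):
    """Equations 5-6: propagate the estimate through the dynamics."""
    pts, w = sigma_points(mu, Sigma)
    mu_pred, Sigma_pred = unscented_transform(pts, w, lambda x: f(x, u))
    return mu_pred, Sigma_pred + R  # Equation 6 adds the process noise R
```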


Computing the general form of the Kalman gain

The observation step is also modified using the UT. We need a way to compute the Kalman gain in terms of the sigma points in the UT. Until now we have written the Kalman gain using the measurement matrix $C$. We will now discuss a more abstract formulation that gives the same expression.

Let's get back to our standard Kalman Filter equations (Step 2 of Part 2) to obtain a general form of the Kalman gain and update equations, i.e. one without the measurement matrix $C$. Recall that the linear observation model was given as ${y}_{k+1} = C x_{k+1|k} + \upsilon$ where $\upsilon \sim N(0,Q)$. An estimate of this observation (or prediction of the measurement) is given as, \begin{equation*} \hat{y}_{k+1} = C \hat{x}_{k+1|k} + \upsilon \tag{7} \end{equation*} From Step 1, we obtain $\mu_{k+1|k}, \Sigma_{k+1|k}$ and we want to incorporate a new observation $y_{k+1}$. We define the joint distribution of the predicted state estimate $\hat{x}_{k+1|k}$ and the observation estimate $\hat{y}_{k+1}$ as, \begin{equation*} \begin{bmatrix} x \\ y \end{bmatrix} \sim N \left( \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy}\\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix} \right) \end{equation*} where for simpler notation we write $\hat{x}_{k+1|k} = x$ and $\hat{y}_{k+1} = y$. We already have \begin{align*} \mu_x &= E[\hat{x}_{k+1|k}] = \mu_{k+1|k} \\ \Sigma_{xx} &= \textrm{cov}(\hat{x}_{k+1|k}) = \Sigma_{k+1|k} \end{align*} The mean of the observation estimate is $\mu_y = E[\hat{y}_{k+1}]$. For the linear system, the covariances can be obtained as, \begin{align*} \Sigma_{yy} &= \textrm{cov}(\hat{y}_{k+1}) = \textrm{cov}(C \hat{x}_{k+1|k} + \upsilon) = C \Sigma_{k+1|k} C^T + Q \tag{8.1}\\ \Sigma_{xy} &= E[(\hat{x}_{k+1|k} - E[\hat{x}_{k+1|k}])(\hat{y}_{k+1} - E[\hat{y}_{k+1}])^T] \\ &= E[(\hat{x}_{k+1|k} - \mu_{k+1|k})(C \hat{x}_{k+1|k} + \upsilon - E[C \hat{x}_{k+1|k} + \upsilon])^T] \\ &= E[(\hat{x}_{k+1|k} - \mu_{k+1|k})(C \hat{x}_{k+1|k} + \upsilon - C\mu_{k+1|k})^T]\\ &= E[(\hat{x}_{k+1|k} - \mu_{k+1|k})(C (\hat{x}_{k+1|k} - \mu_{k+1|k}) + \upsilon)^T] \\ &= E[(\hat{x}_{k+1|k} - \mu_{k+1|k})((\hat{x}_{k+1|k} - \mu_{k+1|k})^T C^T + \upsilon^T)] \\ &= E[(\hat{x}_{k+1|k} - \mu_{k+1|k}) (\hat{x}_{k+1|k} - \mu_{k+1|k})^T] C^T + E[(\hat{x}_{k+1|k} - \mu_{k+1|k})] E[\upsilon^T] \\ &= \Sigma_{k+1|k} C^T \tag{8.2}\\ \Sigma_{yx} &= \Sigma_{xy}^T = (\Sigma_{k+1|k} C^T)^T = C \Sigma_{k+1|k} \tag{8.3} \end{align*} where the noise $\upsilon$ is independent of the state estimate, so the cross term vanishes. We can now write the Kalman gain equation (Equation 4 of Part 2) in terms of these covariances (this is how we get rid of the matrix $C$) using Equations 8.1, 8.2 as, \begin{align*} K_{k+1} &= \Sigma_{k+1|k} C^T ( C \Sigma_{k+1|k} C^T + Q)^{-1} \\ K^*_{k+1} &= \Sigma_{xy} ( \Sigma_{yy} )^{-1} \tag{9} \end{align*} $K^*$ is the Kalman gain in its general form, applicable to any system since it makes no reference to the matrix $C$.
The updated estimate (Equation 5 of Part 2) can be written through rearrangement of Equation 7 as, \begin{align*} \hat{x}_{k+1|k+1} &= \hat{x}_{k+1|k} + K_{k+1}(y_{k+1} - C\hat{x}_{k+1|k}) \\ &= \hat{x}_{k+1|k} + K^*_{k+1}(y_{k+1} - \hat{y}_{k+1} + \upsilon) \end{align*} The mean of this new estimator can be computed as, \begin{align*} \mu_{k+1|k+1} &= E[\hat{x}_{k+1|k+1}] = E [\hat{x}_{k+1|k} + K^*_{k+1}(y_{k+1} - \hat{y}_{k+1} + \upsilon)] \\ &= \mu_{k+1|k} + K^*_{k+1}(y_{k+1} - \mu_{y}) \tag{10} \end{align*} And the covariance of this estimator (Equation 6.2 of Part 2) can be written in a similar fashion using Equations 8.1,8.2,8.3 as, \begin{align*} \Sigma_{k+1|k+1} &= (I - K_{k+1}C) \Sigma_{k+1|k} = \Sigma_{k+1|k} - K_{k+1}C\Sigma_{k+1|k} \\ &= \Sigma_{k+1|k} - \Sigma_{k+1|k} C^T ( C \Sigma_{k+1|k} C^T + Q)^{-1} C \Sigma_{k+1|k}\\ &= \Sigma_{k+1|k} - \Sigma_{xy} (\Sigma_{yy})^{-1} \Sigma_{yx} \tag{11.1} \\ &= \Sigma_{k+1|k} - K^* \Sigma_{yy} K^{*T} \tag{11.2} \end{align*}
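Once $\mu_y$, $\Sigma_{yy}$ and $\Sigma_{xy}$ are available (Step 2 below shows how to obtain them from sigma points), Equations 9-11 are only a few lines of code. This generic helper is my own packaging of those equations; a real implementation would solve the linear system rather than form the inverse explicitly.

```python
def kalman_update(mu_pred, Sigma_pred, y, mu_y, S_yy, S_xy):
    """Equations 9-11: general-form Kalman gain and update."""
    K = S_xy @ np.linalg.inv(S_yy)           # Equation 9
    mu_new = mu_pred + K @ (y - mu_y)        # Equation 10
    Sigma_new = Sigma_pred - K @ S_yy @ K.T  # Equation 11.2
    return mu_new, Sigma_new
```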

Step 2: Incorporating the observation

It is clear from the above derivation that we need an estimate of the new observation to proceed further. We generate fresh sigma points $x^{(i)}$ with weights $w^{(i)}$ from the predicted distribution $N(\mu_{k+1|k}, \Sigma_{k+1|k})$ obtained in Step 1, and transform them using the non-linear observation model $g$. The transformed points give us the estimate $\hat{y}_{k+1}$ of the new observation. To apply Equations 9, 10 and 11, we need $\mu_y, \Sigma_{yy}, \Sigma_{xy}$, which we compute as follows.

The mean of the transformed sigma points is nothing but the mean of the observation estimate $\mu_y$. We use Equation 3 to obtain, \begin{equation*} \mu_{y} = E[\hat{y}_{k+1}] = E[g(\hat{x}_{k+1|k}) + \upsilon] = \sum_{i=1}^{2n} w^{(i)} g(x^{(i)}) \tag{12} \end{equation*} The covariances are computed using Equation 4. Note that the sigma points $x^{(i)}$ themselves represent $\hat{x}_{k+1|k}$ here, so they appear directly in the cross-covariance, \begin{align*} \Sigma_{yy} &= \textrm{cov}(\hat{y}_{k+1}) = \textrm{cov}(g(\hat{x}_{k+1|k}) + \upsilon) \\ &= \sum_{i=1}^{2n} w^{(i)} (g(x^{(i)}) - \mu_y)(g(x^{(i)}) - \mu_y)^T + Q \tag{13.1} \\ \Sigma_{xy} &= \textrm{cov}(\hat{x}_{k+1|k}, \hat{y}_{k+1}) = \sum_{i=1}^{2n} w^{(i)} (x^{(i)} - \mu_{k+1|k})(g(x^{(i)}) - \mu_y)^T \tag{13.2} \\ \end{align*}

Now we can apply Equations 9, 10 and 11 to obtain the updated mean $\mu_{k+1|k+1}$ and covariance $\Sigma_{k+1|k+1}$.
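Putting Step 2 together in code (again with my own function names, reusing `g`, `Q` and the helpers from the earlier sketches):

```python
def ukf_update(mu_pred, Sigma_pred, y):
    """Equations 12-13 for the observation statistics, then Equations 9-11."""
    pts, w = sigma_points(mu_pred, Sigma_pred)
    Y = np.array([g(x) for x in pts])
    mu_y = w @ Y                         # Equation 12
    Dy = Y - mu_y
    Dx = pts - mu_pred
    S_yy = (w[:, None] * Dy).T @ Dy + Q  # Equation 13.1
    S_xy = (w[:, None] * Dx).T @ Dy      # Equation 13.2
    return kalman_update(mu_pred, Sigma_pred, y, mu_y, S_yy, S_xy)
```

A filtering loop then simply alternates `mu, Sigma = ukf_predict(mu, Sigma, u)` and `mu, Sigma = ukf_update(mu, Sigma, y)` at every timestep.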

Compared to the Extended Kalman Filter, the UKF is a better approximation for nonlinear systems. Of course, if the system is linear, both the EKF and the UKF reduce to the Kalman Filter. In practice, we typically use the UKF with some tuning parameters in the Unscented Transform. An important point to remember about both the UKF and the EKF is that even though they can handle nonlinear systems, they still approximate the filtering distribution $P(x_k|y_1, ..., y_k)$ as a Gaussian.

Still wondering why it is called the Unscented Kalman Filter?
The inventors of the UKF thought the EKF "stinks" because linearizing a non-linear function around a single point, the mean, is a very poor idea. They called their filter Unscented on purpose, to tell the world that, unlike the EKF, it doesn't stink!
