# Block 3 — Optimal Fusion

By the end of this block you should be able to:

1. Explain why combining sensors improves a navigation estimate.
2. Compute a fused estimate using **weighted averaging**.
3. Show that the optimal weights depend on the **measurement uncertainty** of each sensor.
4. Compute the **variance of a fused estimate** and recognize that it is always smaller than either input variance.
5. See that the Kalman filter is just **recursive optimal fusion**.

## Why Fuse Sensors?

Navigation systems often measure the same physical quantity with more than one sensor. A canonical example: aircraft altitude. A radar altimeter gives you a precise reading at low altitude, with a 1-sigma noise of perhaps 2 m. A barometric altimeter is noisier, with a 1-sigma noise of perhaps 10 m, but it works at any altitude and through any weather.

If both sensors are looking at the same true altitude, what altitude estimate should the navigation algorithm use? Pick the radar reading and ignore the baro? Average them? Something smarter?

:::{admonition} Key Concept
:class: key-concept

Two independent measurements of the same quantity can always be combined to produce an estimate with **lower variance** than either one alone. The trick is choosing the right weights.
:::

## Simple Averaging

Suppose two sensors return measurements $z_1$ and $z_2$. The simplest combination is the unweighted average

$$
\hat{x} = \frac{z_1 + z_2}{2}.
$$

This works when the sensors have similar uncertainty. It is a poor choice when one sensor is much better than the other. If $\sigma_1 = 1$ m and $\sigma_2 = 10$ m, why would you let the noisy sensor have equal influence on your estimate?
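To put a number on that question, here is a minimal Python sketch. It assumes the two sensor errors are independent and zero-mean (the assumption stated formally in the optimal-weights section below), in which case the variance of the unweighted average is $(\sigma_1^2 + \sigma_2^2)/4$. The function name is illustrative only.

```python
def simple_average_variance(sigma1: float, sigma2: float) -> float:
    """Variance of (z1 + z2) / 2, assuming independent sensor errors."""
    return (sigma1**2 + sigma2**2) / 4.0


# The example from the text: a 1 m sensor averaged with a 10 m sensor.
var_avg = simple_average_variance(1.0, 10.0)
sigma_avg = var_avg ** 0.5

print(var_avg)    # 25.25 (m^2)
print(sigma_avg)  # about 5.02 m, worse than sensor 1 alone (1 m)
```

Under these assumptions, blindly averaging makes the estimate about five times noisier than simply trusting the good sensor.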

## Weighted Averaging

Generalize to a weighted estimate:

$$
\hat{x} = w_1 z_1 + w_2 z_2, \qquad w_1 + w_2 = 1.
$$

Now the question is: what weights minimize the uncertainty of $\hat{x}$?

## The Optimal Weights

Assume the two sensor errors are independent and have variances $\sigma_1^2$ and $\sigma_2^2$. The variance of the weighted estimate is

$$
\mathrm{Var}(\hat{x}) = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2.
$$
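Before minimizing, it can help to evaluate this variance for a few candidate weights. The sketch below is a direct transcription of the formula with $w_2 = 1 - w_1$; the function name and the trial weights are hypothetical, and the sigmas are the radar and baro values from the altitude example.

```python
def weighted_variance(w1: float, sigma1: float, sigma2: float) -> float:
    """Var(x_hat) = w1^2 * sigma1^2 + w2^2 * sigma2^2 with w2 = 1 - w1."""
    w2 = 1.0 - w1
    return w1**2 * sigma1**2 + w2**2 * sigma2**2


sigma_radar, sigma_baro = 2.0, 10.0  # 1-sigma values from the altitude example
trial_weights = (0.0, 0.5, 0.95, 1.0)  # hypothetical weights on the radar
variances = {w: weighted_variance(w, sigma_radar, sigma_baro) for w in trial_weights}

for w, var in variances.items():
    print(f"w1 = {w:4.2f}  ->  variance = {var:7.2f} m^2")
```

With these inputs the variance is 100 m² when only the baro is used, 26 m² for the plain average, 4 m² when only the radar is used, and about 3.86 m² at $w_1 = 0.95$. Some blend therefore beats either sensor alone, which is what the minimization makes precise.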

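Substitute the constraint $w_2 = 1 - w_1$ to get $\mathrm{Var}(\hat{x}) = w_1^2 \sigma_1^2 + (1 - w_1)^2 \sigma_2^2$, take the derivative with respect to $w_1$, and set it to zero: $2 w_1 \sigma_1^2 - 2 (1 - w_1) \sigma_2^2 = 0$, so $w_1 = \sigma_2^2 / (\sigma_1^2 + \sigma_2^2)$. Dividing numerator and denominator by $\sigma_1^2 \sigma_2^2$ gives the **inverse-variance** weighting: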

$$
w_1 = \frac{1/\sigma_1^2}{1/\sigma_1^2 + 1/\sigma_2^2}, \qquad
w_2 = \frac{1/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2}.
$$

Lower-variance sensors get larger weights. The optimal fused estimate becomes

$$
\hat{x}_{\text{opt}}
= \frac{\dfrac{z_1}{\sigma_1^2} + \dfrac{z_2}{\sigma_2^2}}
       {\dfrac{1}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}}.
$$

Substituting the optimal weights back into the variance gives the **fused variance**:

$$
\sigma_{\text{opt}}^2
= \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} \right)^{-1}.
$$

That formula is so important it deserves its own callout.

:::{admonition} Key Concept
:class: key-concept

Optimal fusion adds **inverse variances**. The fused variance is always smaller than the smaller of $\sigma_1^2$ and $\sigma_2^2$. You cannot lose information by adding an independent sensor; you can only refuse to use it.
:::
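The inverse-variance formulas above can be sketched in a few lines of Python. This is an illustration under the same independence assumption, not a reference implementation; the function name and the two altitude readings are hypothetical.

```python
from typing import Tuple


def fuse_two(z1: float, sigma1: float, z2: float, sigma2: float) -> Tuple[float, float, float]:
    """Inverse-variance fusion of two independent measurements.

    Returns (fused estimate, fused variance, weight on z1).
    """
    inv_var1 = 1.0 / sigma1**2
    inv_var2 = 1.0 / sigma2**2
    w1 = inv_var1 / (inv_var1 + inv_var2)
    w2 = inv_var2 / (inv_var1 + inv_var2)
    x_opt = w1 * z1 + w2 * z2
    var_opt = 1.0 / (inv_var1 + inv_var2)
    return x_opt, var_opt, w1


# Hypothetical readings: radar says 152 m (sigma 2 m), baro says 160 m (sigma 10 m).
x_opt, var_opt, w_radar = fuse_two(152.0, 2.0, 160.0, 10.0)
print(x_opt, var_opt, w_radar)
```

For these made-up readings the radar receives a weight of about 0.96, the fused altitude is about 152.3 m, and the fused variance is about 3.85 m², just below the radar's own 4 m².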

![Three Gaussians: a wide green sensor 1, a slightly narrower rust sensor 2, and a navy fused estimate that is narrower than both.](_media/fusion_concept.png)

*Fig. 1.1. The optimal fused estimate lies between the two sensor means, pulled toward the lower-variance sensor by the inverse-variance weights. Its variance is smaller than either input variance.*

## The Information Interpretation

There is a clean way to remember this. Define the **information** carried by a measurement as the inverse of its variance:

$$
I = \frac{1}{\sigma^2}.
$$

Then optimal fusion just adds information:

$$
I_{\text{total}} = I_1 + I_2.
$$

Each independent sensor contributes information; total information is the sum. This generalizes immediately to $n$ sensors, and it is the conceptual foundation of every Kalman-filter measurement update you will see in the rest of the course.
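As a sketch of the $n$-sensor generalization, the Python function below sums the information of each measurement and weights each reading by its share of the total. It assumes all sensor errors are mutually independent; the function name and the three sample readings are hypothetical.

```python
from typing import Sequence, Tuple


def fuse_many(measurements: Sequence[float], sigmas: Sequence[float]) -> Tuple[float, float]:
    """Fuse n independent measurements by adding information (1 / sigma^2).

    Returns (fused estimate, fused variance).
    """
    if len(measurements) == 0 or len(measurements) != len(sigmas):
        raise ValueError("need one sigma per measurement, and at least one measurement")
    infos = [1.0 / s**2 for s in sigmas]
    total_info = sum(infos)  # I_total = I_1 + I_2 + ... + I_n
    x_fused = sum(i * z for i, z in zip(infos, measurements)) / total_info
    return x_fused, 1.0 / total_info


# Three hypothetical sensors: one good (sigma 1) and two poorer ones (sigma 2).
x_fused, var_fused = fuse_many([10.0, 12.0, 11.0], [1.0, 2.0, 2.0])
print(x_fused, var_fused)
```

Here the informations are 1, 0.25, and 0.25, so the total is 1.5. The fused estimate is 10.5 with variance 1/1.5, about 0.67, which is smaller than the best single-sensor variance of 1.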

::::{admonition} Quick Exercise
:class: quick-exercise

Two sensors measure the same quantity:

$$
z_1 = 100, \quad \sigma_1 = 2, \qquad z_2 = 110, \quad \sigma_2 = 6.
$$

Use inverse-variance weighting to compute the optimal fused estimate $\hat{x}_{\text{opt}}$ and the fused standard deviation $\sigma_{\text{opt}}$.

:::{admonition} Solution
:class: dropdown

Information from each sensor:

$$
I_1 = \frac{1}{2^2} = 0.25, \qquad I_2 = \frac{1}{6^2} = 0.0278.
$$

Total information $I_{\text{total}} = 0.2778$. So $\sigma_{\text{opt}}^2 = 1/0.2778 \approx 3.6$, giving $\sigma_{\text{opt}} \approx 1.897$.

The fused estimate is

$$
\hat{x}_{\text{opt}} = \frac{0.25 \cdot 100 + 0.0278 \cdot 110}{0.2778}
\approx \frac{25 + 3.06}{0.2778} \approx 101.0.
$$

Sensor 1 dominates: it carries nine times as much information as sensor 2 ($0.25 / 0.0278 \approx 9$), so the weights are $0.9$ and $0.1$ and the fused estimate is pulled strongly toward $z_1 = 100$.
:::

::::

## Optimal Fusion Demo

The **Optimal Fusion demo** in this block lets you fuse a precise radar altimeter ($\sigma_R = 2$ m by default) with a noisier barometric altimeter ($\sigma_B = 10$ m) and watch the fused PDF tighten as you slide the fusion weight $w_R$ toward the inverse-variance optimum. A variance-vs-weight curve underneath traces $\sigma_{\text{fused}}(w_R)$ across all possible weight choices; its minimum falls exactly at the optimal weight, and moving away from that minimum visibly inflates the fused $\sigma$. Sliders for $\sigma_R$, $\sigma_B$, and per-sensor biases let you explore how the optimal weight shifts and how systematic offsets propagate through fusion. They do propagate: variance reduction does not protect against biased sensors. A MATLAB implementation is also included at `code/OptimalFusionDemo.m`.
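If you want to reproduce the shape of that curve outside the demo, the following Python sketch sweeps $w_R$ over a grid and compares the grid minimum with the inverse-variance weight. It is not the demo's code and not a port of the MATLAB file; the function names, the grid resolution, and the 5 m baro bias are assumptions made for illustration.

```python
import math


def fused_sigma(w_r: float, sigma_r: float, sigma_b: float) -> float:
    """Standard deviation of w_r * z_R + (1 - w_r) * z_B for independent errors."""
    return math.sqrt(w_r**2 * sigma_r**2 + (1.0 - w_r) ** 2 * sigma_b**2)


def fused_bias(w_r: float, bias_r: float, bias_b: float) -> float:
    """Sensor biases pass through the same weights as the measurements."""
    return w_r * bias_r + (1.0 - w_r) * bias_b


sigma_r, sigma_b = 2.0, 10.0  # default sigmas quoted in the text

# Sweep the radar weight from 0 to 1 in steps of 0.001 (arbitrary grid).
weights = [i / 1000 for i in range(1001)]
best_w = min(weights, key=lambda w: fused_sigma(w, sigma_r, sigma_b))

# Analytic inverse-variance weight for comparison.
w_opt = (1.0 / sigma_r**2) / (1.0 / sigma_r**2 + 1.0 / sigma_b**2)

print(best_w, w_opt, fused_sigma(w_opt, sigma_r, sigma_b))

# Hypothetical 5 m baro bias, unbiased radar: the fused estimate stays biased.
residual_bias = fused_bias(w_opt, 0.0, 5.0)
print(residual_bias)
```

With the default sigmas the grid minimum agrees with the analytic weight of about 0.96 to within the grid spacing, and the fused sigma there is about 1.96 m. The hypothetical 5 m baro bias still leaves roughly 0.19 m of bias in the fused estimate, however small the fused variance is.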

## From Fusion to Kalman Filtering

Static fusion assumes all measurements arrive at once. Real navigation systems process measurements **sequentially over time**. At each step you have

- A prediction $\hat{x}_k^-$ with uncertainty $P_k^-$ (your prior, propagated forward from the last update).
- A new measurement $z_k$ with uncertainty $R_k$.

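Treat the prediction and the new measurement like the two sensors above, with variances $P_k^-$ and $R_k$. The optimally weighted average $(1 - K_k)\,\hat{x}_k^- + K_k z_k$ then rearranges to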

$$
\hat{x}_k^+ = \hat{x}_k^- + K_k \left(z_k - \hat{x}_k^-\right),
$$

where the **Kalman gain** $K_k$ plays the role of the optimal weight on the measurement:

$$
K_k = \frac{P_k^-}{P_k^- + R_k}.
$$

When $P_k^- \gg R_k$ (you trust the measurement much more than your prediction), $K_k \to 1$ and you snap the estimate onto the measurement. When $R_k \gg P_k^-$ (your prediction is much better than the measurement), $K_k \to 0$ and you ignore it. Everywhere in between you blend the two with the optimal trust factor.
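A quick numerical check makes the connection concrete. The Python sketch below computes the scalar update with the gain form and again with the inverse-variance form from earlier in this block, treating $P_k^-$ and $R_k$ as the two variances. The function names are hypothetical, and the numbers reuse the Quick Exercise with sensor 1 playing the role of the prediction.

```python
from typing import Tuple


def gain_form_update(x_pred: float, p_pred: float, z: float, r: float) -> Tuple[float, float]:
    """Scalar update x+ = x- + K (z - x-), with K = P- / (P- + R)."""
    k = p_pred / (p_pred + r)
    x_post = x_pred + k * (z - x_pred)
    return x_post, k


def inverse_variance_form(x_pred: float, p_pred: float, z: float, r: float) -> float:
    """The same fusion written as an inverse-variance weighted average."""
    return (x_pred / p_pred + z / r) / (1.0 / p_pred + 1.0 / r)


# Prediction 100 with variance 2^2 = 4, measurement 110 with variance 6^2 = 36.
x_gain, k = gain_form_update(100.0, 4.0, 110.0, 36.0)
x_ivw = inverse_variance_form(100.0, 4.0, 110.0, 36.0)
print(k, x_gain, x_ivw)
```

The gain comes out as 0.1, which is exactly the inverse-variance weight on the measurement, and both forms give 101 (up to floating-point rounding), matching the exercise solution.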

We will derive this rigorously and add the covariance update in Block 4. For now the take-away is that the Kalman filter is not a separate piece of math; it is the optimal-fusion result, applied recursively as time progresses.
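To see the "applied recursively" part in action, here is a minimal Python sketch for a quantity that does not change between measurements. It makes several simplifying assumptions that are not the full filter: there is no propagation step, the first measurement is used as the initial prior, and the variance after each update is taken from this block's fused-variance formula rather than the covariance update derived later. All names and sample values are hypothetical.

```python
from typing import Sequence, Tuple


def fuse_recursively(zs: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Process measurements one at a time with the scalar gain update."""
    x, p = zs[0], variances[0]  # assumption: first measurement acts as the prior
    for z, r in zip(zs[1:], variances[1:]):
        k = p / (p + r)                # weight on the new measurement
        x = x + k * (z - x)            # blend prior and measurement
        p = 1.0 / (1.0 / p + 1.0 / r)  # fused variance: inverse of summed information
    return x, p


def fuse_batch(zs: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Fuse all measurements at once by adding information."""
    total_info = sum(1.0 / v for v in variances)
    x = sum(z / v for z, v in zip(zs, variances)) / total_info
    return x, 1.0 / total_info


zs = [10.0, 12.0, 11.0]     # hypothetical readings of one constant quantity
variances = [1.0, 4.0, 4.0]

x_rec, p_rec = fuse_recursively(zs, variances)
x_bat, p_bat = fuse_batch(zs, variances)
print(x_rec, p_rec)
print(x_bat, p_bat)
```

Both routes arrive at an estimate of 10.5 with variance of about 0.67. Processing the measurements one at a time loses nothing relative to fusing them all at once, which is why the recursion is attractive when data arrive over time.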

## Wrap-Up

Multiple independent measurements of the same quantity can always be fused to produce an estimate with lower variance than either input. The optimal weights are proportional to the inverse variances. The fused variance is the inverse of the sum of inverse variances; equivalently, the fused information is the sum of the individual sensors' information. That formula plus a recursion in time is the core of the Kalman filter, which is the topic of the next block.