# Block 9 — Statistical Test Planning for the F-47 ANS

By the end of this block you should be able to:

1. Explain why correlated time-series data require **effective sample size** $N_{\text{eff}}$, not the raw sample count $N$.
2. Estimate $T_{\text{corr}}$ from an autocorrelation function and apply the **10% rule of thumb**.
3. Derive $N_{\text{eff}}$ from first principles and plan a statistically defensible test duration.
4. Apply the **95th-percentile ECDF** for accuracy requirements (rather than a CI on the mean).
5. Estimate inertial **drift rate** with a 95% confidence interval on the slope.
6. Infer **fault detection time** from navigation-error behavior when no internal fault flag is available.

This block is the **methods** half of the F-47 ANS capstone. The Project section in the sidebar contains the operational handout (datasets, deliverables, rubric); this reading is the math and statistics that justify the analysis approach.

## The Problem with Correlated Data

A 15-minute AllSource run sampled at 10 Hz yields $N = 9{,}000$ samples. The tempting conclusion is that any 95% confidence interval on the mean error scales as

$$
\text{CI width} \propto \frac{\sigma}{\sqrt{9{,}000}} \approx 0.011\,\sigma,
$$

i.e. you have an essentially perfect estimate of the mean. In practice this is wrong, and dangerously so, because consecutive error samples are not independent. Sample $e_1$ at $t = 0.1$ s and sample $e_2$ at $t = 0.2$ s are essentially the same number: both are drawn from the same realization of a navigation filter whose memory is measured in seconds, not tenths of a second. Treating them as independent overstates the independent sample count by orders of magnitude (a factor of 300 for the 15 s correlation time used later in this block), which makes the confidence interval roughly 17 times too narrow. A test plan built on that interval can fail an audit immediately.

:::{admonition} Key Concept
:class: key-concept

The number of samples a time series gives you is not the number of *independent* samples it gives you. Correlated data require a discount: the **effective sample size** $N_{\text{eff}}$ is what controls the width of any honest confidence interval.
:::

## The Autocorrelation Function

The right way to quantify the discount is the autocorrelation function (ACF). For an error signal $e(t)$ with any constant bias removed, so that it is zero-mean, the ACF is

$$
R(\tau) = \mathbb{E}\big[e(t)\,e(t + \tau)\big].
$$

For navigation errors driven by an AR(1)-like process — which is a good approximation for a tightly-coupled GPS/INS filter in steady state — the ACF decays exponentially:

$$
R(\tau) = \sigma^2 \, e^{-|\tau| / T_{\text{corr}}}.
$$

The parameter $T_{\text{corr}}$ is the **correlation time**: the timescale of the navigation error's "memory". At $\tau = 0$ the ACF is the variance $\sigma^2$. At $\tau = T_{\text{corr}}$ it has fallen to $\sigma^2 / e \approx 0.37\sigma^2$, and by $\tau = 3\,T_{\text{corr}}$ it is below 5% of $\sigma^2$.

![Exponential decay of the ACF with markers at the 1/e point (T_corr) and the 10% point.](_media/acf_decay.png)

*Fig. 1.1. Exponential ACF of an AR(1) navigation error. The orange marker at $T_{\text{corr}}$ is where the ACF falls to $1/e$. The red marker is the 10%-rule decimation point at $\tau = T_{\text{corr}} \ln 10 \approx 2.30\,T_{\text{corr}}$, which is the heuristic some test reports use to declare independence between samples.*

The HITL campaign on the F-47 ANS reports a steady-state horizontal-error correlation time of approximately **15 s**. That number is the foundation of every test-duration calculation in the project.
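As a rough illustration of where a number like this comes from, the sketch below estimates $T_{\text{corr}}$ from the $1/e$ crossing of a sample ACF. The AR(1) series is synthetic and stands in for a real error log; `T_true`, `maxLag`, and the other variable names are illustrative assumptions, not part of the project scripts.

```matlab
% Sketch: estimate T_corr from the 1/e crossing of a sample ACF.
% The series is synthetic; T_true is used only to generate it.
dt = 0.1; T_true = 15; N = 90000;        % 10 Hz, 150-minute record
r  = exp(-dt / T_true);                  % AR(1) coefficient for this T_corr
e  = filter(sqrt(1 - r^2), [1, -r], randn(N, 1));   % unit-variance AR(1) noise

e0 = e - mean(e);                        % remove the mean before correlating
maxLag = round(60 / dt);                 % search lags out to 60 s (assumed long enough)
rho = zeros(maxLag + 1, 1);              % rho(k+1) holds the ACF at lag k samples
for k = 0:maxLag
    rho(k + 1) = (e0(1:N-k)' * e0(1+k:N)) / (e0' * e0);
end

k1   = find(rho < exp(-1), 1);           % first index below 1/e
frac = (rho(k1 - 1) - exp(-1)) / (rho(k1 - 1) - rho(k1));   % linear interpolation
T_corr_hat = ((k1 - 2) + frac) * dt;     % index k1-1 corresponds to lag k1-2
fprintf('Estimated T_corr = %.1f s\n', T_corr_hat);
```

Because the record holds only a few hundred independent looks, the estimate scatters around the true value from one realization to the next; expect agreement to within tens of percent, not decimal places.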

## The 10% Rule of Thumb

A common quick estimate of "when have samples become effectively independent?" is the **10% rule**: find the lag at which the ACF has decayed to 10% of its peak and use that lag as a decimation interval. For an exponential ACF sampled every $dt$ seconds, the decimation lag of $L$ samples satisfies

$$
L \cdot dt = T_{\text{corr}} \cdot \ln 10 \approx 2.30\, T_{\text{corr}}.
$$

The standard variance-of-the-mean derivation (which we'll do next) gives a closely related result with a slightly different threshold:

| Method | Implicit ACF threshold | Difference vs. standard |
| --- | --- | --- |
| Standard (sum of ACF terms) | $\rho = e^{-2} \approx 13.5\%$ | (reference) |
| 10% decimation rule | $\rho = 10.0\%$ | $-13\%$ on $N_{\text{eff}}$ |

The 10% rule gives an $N_{\text{eff}}$ about 13% smaller than the rigorous formula. That is a conservative discount, which is the right way to err in test planning.
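The comparison can be reproduced numerically. The sketch below applies the 10% rule to the 15 s correlation time and compares the decimated sample count with the standard result $T_{\text{total}} / (2\,T_{\text{corr}})$ derived in the next section. The 9,000 s run length and all variable names are assumptions chosen for illustration.

```matlab
T_corr  = 15;      % s, steady-state correlation time
dt      = 0.1;     % s, 10 Hz sampling
T_total = 9000;    % s, example run length (assumed)

lag_10pct = T_corr * log(10);          % lag where exp(-lag/T_corr) = 0.10
L         = ceil(lag_10pct / dt);      % decimation factor in samples, rounded up
N         = round(T_total / dt);       % raw sample count
N_eff_10  = floor(N / L);              % samples kept after decimation
N_eff_std = T_total / (2 * T_corr);    % standard ACF-sum result
ratio     = N_eff_10 / N_eff_std;
fprintf('L = %d samples, N_eff (10%% rule) = %d, standard = %g, ratio = %.2f\n', ...
        L, N_eff_10, N_eff_std, ratio);
```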

## The Effective Sample Size $N_{\text{eff}}$

Start from the variance of a sample mean over correlated data. For a stationary error with variance $\sigma^2$ and normalized ACF $\rho(\tau) = R(\tau)/\sigma^2$, the mean $\bar{e}$ of $N$ samples spaced $dt$ apart has variance

$$
\mathrm{Var}\big(\bar{e}\big)
= \frac{\sigma^2}{N}
\left(1 + 2 \sum_{k=1}^{N-1} \big(1 - \tfrac{k}{N}\big)\,\rho(k\,dt)\right).
$$

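For an exponential $\rho(\tau) = e^{-|\tau|/T_{\text{corr}}}$, a record much longer than the correlation time ($N \gg T_{\text{corr}} / dt$), and a sample interval much shorter than it ($dt \ll T_{\text{corr}}$), the bracketed factor collapses to approximately $2\,T_{\text{corr}}/dt$. Writing $\mathrm{Var}(\bar{e}) = \sigma^2 / N_{\text{eff}}$ and using $T_{\text{total}} = N\,dt$, you get the standard result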

$$
\boxed{\,N_{\text{eff}} = \frac{T_{\text{total}}}{2\, T_{\text{corr}}}.\,}
$$

The factor of 2 has a clean physical interpretation: each sample is correlated with neighbors both forward and backward in time, so each "independent look" occupies a window of $2\,T_{\text{corr}}$ on the timeline. You can fit $T_{\text{total}} / (2 T_{\text{corr}})$ such windows into a run of duration $T_{\text{total}}$.
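The collapse of the sum can be sketched in two approximations. Write $r = e^{-dt/T_{\text{corr}}}$. When the record is much longer than $T_{\text{corr}}$, the taper $(1 - k/N)$ is close to 1 over every lag where $r^k$ is non-negligible, so the sum can be extended to infinity as a geometric series. When $dt \ll T_{\text{corr}}$, $1 - r \approx dt / T_{\text{corr}}$ and $1 + r \approx 2$:

$$
1 + 2 \sum_{k=1}^{N-1} \big(1 - \tfrac{k}{N}\big)\, r^{k}
\;\approx\; 1 + 2 \sum_{k=1}^{\infty} r^{k}
= \frac{1 + r}{1 - r}
\;\approx\; \frac{2\, T_{\text{corr}}}{dt}.
$$

Substituting back,

$$
\mathrm{Var}\big(\bar{e}\big) \approx \frac{\sigma^2}{N} \cdot \frac{2\, T_{\text{corr}}}{dt}
= \frac{\sigma^2}{T_{\text{total}} / (2\, T_{\text{corr}})},
\qquad T_{\text{total}} = N\, dt,
$$

which is $\sigma^2 / N_{\text{eff}}$ with the boxed value of $N_{\text{eff}}$.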

:::{admonition} Key Concept
:class: key-concept

For an exponentially-correlated signal with correlation time $T_{\text{corr}}$, the effective independent sample count is $N_{\text{eff}} = T_{\text{total}} / (2 T_{\text{corr}})$. The raw sample rate cancels out — sampling faster does not give you more independent information.
:::
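A quick numerical check of the claim that the sample rate cancels: the sketch below evaluates the exact finite-sum expression for $\mathrm{Var}(\bar{e})$ at three sample intervals and compares the implied $N_{\text{eff}}$ with $T_{\text{total}} / (2\,T_{\text{corr}})$. The run length and the list of sample intervals are assumptions chosen for illustration.

```matlab
T_corr  = 15;                 % s
T_total = 9000;               % s, example 150-minute run (assumed)
dt_list = [1.0, 0.1, 0.01];   % s, three sample intervals to compare (assumed)

N_eff_exact = zeros(size(dt_list));
for j = 1:numel(dt_list)
    dt  = dt_list(j);
    N   = round(T_total / dt);
    k   = (1:N-1)';
    rho = exp(-k * dt / T_corr);                  % exponential ACF at lag k*dt
    factor = 1 + 2 * sum((1 - k / N) .* rho);     % bracketed factor in Var(mean)
    N_eff_exact(j) = N / factor;                  % Var(mean) = sigma^2 / N_eff
end
N_eff_approx = T_total / (2 * T_corr);
fprintf('dt = %5.2f s  ->  N_eff = %.1f\n', [dt_list; N_eff_exact]);
fprintf('Closed form: N_eff = %.1f\n', N_eff_approx);
```

Raising the sample rate by a factor of 100 multiplies the raw $N$ by 100 but leaves $N_{\text{eff}}$ within a fraction of a percent of the closed-form value.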

## Test Duration Planning

Requirements 1 and 2 of the F-47 ANS capstone demand $N_{\text{eff}} \ge 300$. With $T_{\text{corr}} = 15$ s the minimum run duration is

$$
T_{\text{total}} \ge 2 \times 15 \text{ s} \times 300 = 9{,}000 \text{ s} = 150 \text{ min}.
$$

That is the floor. Anything shorter and your 95th-percentile ECDF estimate has a wider sampling distribution than the requirement is willing to accept.

| Run duration | $N$ at 10 Hz | $N_{\text{eff}}$ | Meets $N_{\text{eff}} \ge 300$? |
| --- | --- | --- | --- |
| 15 min | 9,000 | 30 | No |
| 60 min | 36,000 | 120 | No |
| 150 min | 90,000 | 300 | Yes |

Notice how the raw $N$ column is misleading. A 15-minute run looks like 9,000 "samples", but it carries the information of only about 30 independent ones. The test plan must request the longer run, and justify why, to produce a defensible accuracy assessment.
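The table can be regenerated with a few lines, which is convenient when $T_{\text{corr}}$ or the $N_{\text{eff}}$ floor changes. This is a minimal planning sketch; the variable names are illustrative, and `ci_ratio` is an added diagnostic showing how many times too narrow a CI built on raw $N$ would be.

```matlab
T_corr    = 15;            % s, correlation time
fs        = 10;            % Hz, sample rate
N_eff_req = 300;           % required effective sample size

T_min_s = 2 * T_corr * N_eff_req;          % minimum run duration, s
run_min = [15, 60, 150];                   % candidate run durations, min
N_raw   = run_min * 60 * fs;               % raw sample counts
N_eff   = run_min * 60 / (2 * T_corr);     % effective sample sizes
meets   = N_eff >= N_eff_req;              % compliance with the floor
ci_ratio = sqrt(N_raw ./ N_eff);           % naive CI is this many times too narrow

fprintf('Minimum duration: %d s (%g min)\n', T_min_s, T_min_s / 60);
for j = 1:numel(run_min)
    fprintf('%4d min  N = %6d  N_eff = %4g  meets = %d\n', ...
            run_min(j), N_raw(j), N_eff(j), meets(j));
end
```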

![Plot of N_eff versus T_total in minutes. The curve is linear; horizontal red line marks the N_eff = 300 requirement; vertical green line marks T_total = 150 min where they intersect.](_media/neff_planning.png)

*Fig. 2.1. $N_{\text{eff}}$ as a function of total run duration with $T_{\text{corr}} = 15$ s. The 300-sample floor projects onto a 150-minute minimum run. Any test plan below that line cannot defensibly claim $N_{\text{eff}} \ge 300$, no matter how high the sample rate.*

## 95th-Percentile ECDF for Accuracy

Requirements 1 and 2 read: "horizontal position error shall be $\le 1.0$ m (95% confidence)". There are two natural ways to interpret that, and they are not the same thing.

- **95th-percentile ECDF.** Find the value below which 95% of error samples fall. In MATLAB: `H95 = prctile(errH, 95)`. This is a tail property: it says "95% of the time the error is at most this large". That matches the operational reading of the requirement.
- **95% CI on the mean.** Tests whether the **average** error is below a bound. The requirement does not constrain the mean; a system with zero mean error and a fat tail can still violate the operational requirement. Using a CI on the mean would be fundamentally wrong.

The right tool is the empirical CDF. Compute the horizontal errors against truth, build the empirical CDF, read off the value where $F(e_H) = 0.95$, and compare to the requirement.
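Those steps can be sketched without any toolbox function. The north and east error vectors below are placeholder data chosen so the answer is easy to check by hand; `dN`, `dE`, and `req_m` are hypothetical names, not the variables used in the project datasets.

```matlab
% Placeholder north/east position errors against truth, in meters
dN = 0.6 * (1:100)' / 100;
dE = 0.8 * (1:100)' / 100;
req_m = 1.0;                         % requirement bound, m

errH = hypot(dN, dE);                % horizontal error magnitude
errS = sort(errH);                   % sorted errors are the ECDF x-values
N    = numel(errS);
F    = (1:N)' / N;                   % ECDF height at each sorted error
i95  = find(F >= 0.95, 1);           % first point where the ECDF reaches 0.95
H95  = errS(i95);                    % 95th-percentile horizontal error
pass = (H95 <= req_m);
fprintf('H95 = %.3f m, requirement = %.1f m, pass = %d\n', H95, req_m, pass);
```

`prctile(errH, 95)` interpolates between order statistics, so it can differ slightly from this direct ECDF readout; the gap shrinks as the sample count grows.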

![Empirical CDF of horizontal error rising from 0 to 1, with a horizontal red dashed line at 0.95 and a vertical line dropping to e_{H,95} on the x-axis. A green dotted line shows the requirement bound; the e_{H,95} sits below it, labeled PASS.](_media/ecdf_compliance.png)

*Fig. 3.1. Empirical CDF of horizontal error with 95th-percentile readout. Compliance is the comparison $e_{H,95} \le \text{requirement}$. The plot also makes the sample-size dependence visible: with too few independent samples the ECDF curve is jagged near the 95% mark, which inflates the uncertainty on the readout.*

:::{admonition} Key Concept
:class: key-concept

Accuracy is a **tail** property. Use the 95th-percentile of the empirical CDF, not a CI on the mean. The right number of samples is what makes the percentile readout statistically credible — that is where $N_{\text{eff}}$ comes back in.
:::
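A rough way to see the dependence on $N_{\text{eff}}$, assuming the record behaves like $N_{\text{eff}}$ independent draws: the ECDF height at the true 95th percentile is then a binomial proportion with standard error

$$
\mathrm{SE}\big[\hat{F}(e_{H,95})\big] \approx \sqrt{\frac{0.95 \times 0.05}{N_{\text{eff}}}}.
$$

With $N_{\text{eff}} = 300$ this is about $0.013$, so the curve near the readout is pinned down to roughly 1.3 percentage points. With $N_{\text{eff}} = 30$ it is about $0.040$, which leaves the 95% mark uncertain by roughly four percentage points. This is an approximation for intuition, not a substitute for the requirement's own acceptance criteria.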

## Inertial Drift with Confidence Interval

Requirement 3 is different. In inertial-only mode the velocity-bias random walk integrates into a position error that grows approximately linearly in time:

$$
e_H(t) = a + b\,t + \varepsilon(t).
$$

The drift rate is the slope $b$, and the requirement is that the drift rate be $\le 1.0$ NM/hr at the 95% upper confidence bound. The estimator is straightforward — ordinary least squares — and the trick is that the **upper bound** of the CI is what you compare to the requirement, not the point estimate.

In MATLAB,

```matlab
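% Inputs: t (s) and e_H (m) as N-by-1 column vectors.
% T_corr (s) is the correlation time of the fit residuals; it is an assumed
% input here and should be estimated from the residual ACF.
N    = numel(t);
X    = [ones(N,1), t];
beta = X \ e_H;                          % beta = [a_hat; b_hat]
res  = e_H - X * beta;                   % fit residuals
sig2_hat = (res' * res) / (N - 2);       % residual variance
cov_beta = sig2_hat * ((X' * X) \ eye(2));   % backslash instead of inv()
se_b_raw = sqrt(cov_beta(2,2));          % slope SE if all N samples were independent

% Autocorrelation discount (first-order approximation): raw N overstates the
% evidence, so inflate the SE and use N_eff for the degrees of freedom.
N_eff = (t(end) - t(1)) / (2 * T_corr);
se_b  = se_b_raw * sqrt(N / N_eff);
tcrit = tinv(0.975, N_eff - 2);          % two-sided 95% interval
b_lo  = beta(2) - tcrit * se_b;          % lower 95% bound on slope
b_hi  = beta(2) + tcrit * se_b;          % upper 95% bound on slope

b_hi_NMhr = b_hi * (3600 / 1852);        % convert m/s to NM/hr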
```

Compliance: `b_hi_NMhr <= 1.0`. If the upper bound exceeds the requirement, the system fails — even if the point estimate is well below 1.0 NM/hr — because the test does not yet have enough evidence to rule out a drift rate above the bound.

![A simulated inertial drift trace with an OLS best-fit line and a shaded 95% CI band on the slope. Annotation: PASS or FAIL based on whether the upper CI on the slope is below 1 NM/hr.](_media/drift_ci.png)

*Fig. 4.1. OLS drift fit on a simulated 90-minute inertial run. The best-fit line (rust) gives the point estimate of the slope; the shaded band is the 95% CI on the slope, extrapolated linearly. Compliance compares the upper edge of the band to the 1.0 NM/hr requirement.*

The width of that CI band is what the autocorrelation discount controls. If you used the raw $N$ instead of $N_{\text{eff}}$ in the OLS standard error formula, the band would be unreasonably tight and the slope CI would not honestly reflect the available evidence.

## Inferring Detection Time Without an FTI Flag

Requirement 4 measures fault detection time and hazardously misleading information (HMI) exposure on 30 spoof events. The catch: the System Program Office (SPO) has directed that developmental test (DT) use only production-representative sensors and avionics-bus data. There is no flight test instrumentation (FTI) tap into the integrity-monitoring algorithm, so **the detection time $t_D$ is not directly observable**. You have to infer it from the time history of the navigation error itself.

Recall what happens during a fault: error rises from baseline as the bias accumulates, peaks while the integrity monitor triggers an exclusion or downweighting, then "snaps back" toward the baseline once the corrective action takes effect. The peak-and-snapback algorithm formalizes that pattern:

1. Compute the **pre-fault baseline** $e_{\text{base}}$ from a window of duration $T_{\text{pre}}$ ending just before the known fault start time $t_0$. Use the median or a robust statistic to immunize against single-point outliers.
2. Find the **peak error** $e_{\text{peak}}$ in the post-fault search window.
3. Set the **snapback threshold** $e_{\text{thr}} = e_{\text{base}} + \alpha\,(e_{\text{peak}} - e_{\text{base}})$ with $\alpha = 0.25$. The system is considered "back to normal" once the error has fallen 75% of the way from the peak back down to the baseline, so that only 25% of the excursion remains.
4. **Detection time** $t_D$ is the first time after the peak at which $e_H \le e_{\text{thr}}$ for at least 0.5 seconds (to immunize against measurement noise that briefly dips below threshold).

Once you have $t_D$, the time-to-detect is $T_D = t_D - t_0$ and the HMI exposure is the total time during $[t_0, t_D]$ for which the true error exceeds the protection level.
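The four steps and the two derived quantities can be sketched as follows. The spoof event is synthetic, and `T_pre`, `T_search`, `PL`, and `hold_s` are hypothetical names with assumed values; only $\alpha = 0.25$ and the 0.5 s hold come from the text above. The sketch also assumes `e_H` is the true error against the truth source.

```matlab
% Synthetic spoof event (placeholder data, not from the project datasets)
dt  = 0.1;                       % s, 10 Hz
t   = (0:600)' * dt;             % 60 s record
t0  = 20;                        % s, known fault start
e_H = ones(size(t));             % 1 m baseline error
ramp = (t >= t0) & (t < 23);
e_H(ramp) = 1 + 4 * (t(ramp) - t0);               % error grows at 4 m/s for 3 s
rec = (t >= 23);
e_H(rec)  = 1 + 12 * exp(-(t(rec) - 23) / 0.5);   % snapback after corrective action

% Assumed tuning values (only alpha and hold_s are given in the text)
T_pre = 10; T_search = 30; alpha = 0.25; hold_s = 0.5; PL = 10;

% Step 1: robust pre-fault baseline
e_base = median(e_H(t >= t0 - T_pre & t < t0));
% Step 2: peak error in the post-fault search window
idx = find(t >= t0 & t <= t0 + T_search);
[e_peak, j] = max(e_H(idx));
i_peak = idx(j);
% Step 3: snapback threshold
e_thr = e_base + alpha * (e_peak - e_base);
% Step 4: first time after the peak that the error stays at or below
% the threshold for hold_s seconds
n_hold = round(hold_s / dt);
below  = (e_H <= e_thr);
t_D = NaN;                       % stays NaN if no snapback is found
for i = i_peak : numel(t) - n_hold
    if all(below(i : i + n_hold))
        t_D = t(i);
        break
    end
end

T_D = t_D - t0;                                   % time-to-detect, s
HMI = sum(e_H(t >= t0 & t <= t_D) > PL) * dt;     % exposure, approximated by sample count
fprintf('e_thr = %.2f m, T_D = %.1f s, HMI = %.1f s\n', e_thr, T_D, HMI);
```

Leaving `t_D` as `NaN` when no sustained snapback is found makes a failed inference visible downstream instead of silently reporting a detection time.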

![Time history of e_H starting near baseline, rising linearly past a peak, then falling back to baseline. Markers indicate t0 (fault start), the peak, the snapback threshold, and t_D (detection time). A double-headed arrow labeled T_D spans from t0 to t_D.](_media/peak_snapback.png)

*Fig. 5.1. Peak-and-snapback inference of $t_D$. The threshold is at 25% of the way from baseline to peak. $t_D$ is the first time after the peak that the error stays below the threshold for at least 0.5 s. Time-to-detect is $T_D = t_D - t_0$.*

This algorithm is robust to a wide range of fault scenarios, but it is not magic. If the fault is small enough that the error never rises clearly above baseline, the "peak" is just noise, and peak-and-snapback will report a very small $T_D$ even though nothing meaningful happened. The accompanying check is that the HMI exposure must be non-zero; otherwise the event is excluded from the 30 used for the 27/30 statistic.

## The 27-of-30 Compliance Rule

Requirement 4 specifies a probabilistic compliance rule: at least 27 of 30 independent spoof events must satisfy $T_D \le 5$ s, $\text{HMI}_H \le 1$ s, and $\text{HMI}_V \le 1$ s. Three failures out of thirty are tolerated.

The 27/30 number isn't arbitrary. It sets a 90% observed pass fraction ($27/30 = 0.90$), with each event treated as an independent Bernoulli trial. If the underlying per-event success probability is $p$, the probability of $\ge 27$ successes out of 30 follows from the binomial distribution: $P(\text{pass}) = \sum_{k=27}^{30} \binom{30}{k}\, p^k (1-p)^{30-k}$. With $p = 0.92$ that probability is roughly 0.78; with $p = 0.95$ it is about 0.94. A system whose true per-event reliability sits right at 90% passes only about 65% of the time, so passing the 27/30 rule with high probability effectively requires per-event reliability in the mid-90s percent.

The probabilistic structure also means the test is not "all 30 must pass". You can lose a few events to bad data, marginal HPL inflation, or edge cases, and still meet the requirement, as long as the system performs reliably on the bulk of events.
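The pass probability for any assumed per-event reliability can be computed directly from the binomial sum. The sketch below uses only base MATLAB; the list of $p$ values is an assumption chosen for illustration.

```matlab
n      = 30;                     % spoof events
k_min  = 27;                     % minimum successes to pass
p_list = [0.90, 0.92, 0.95];     % assumed per-event success probabilities

P_pass = zeros(size(p_list));
for j = 1:numel(p_list)
    p = p_list(j);
    for k = k_min:n
        % Binomial probability of exactly k successes in n trials
        P_pass(j) = P_pass(j) + nchoosek(n, k) * p^k * (1 - p)^(n - k);
    end
end
fprintf('p = %.2f  ->  P(at least %d of %d) = %.3f\n', ...
        [p_list; k_min * ones(size(p_list)); n * ones(size(p_list)); P_pass]);
```

With the Statistics and Machine Learning Toolbox the inner loop is equivalent to `1 - binocdf(k_min - 1, n, p)`.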

## Wrap-Up

The mathematical machinery for the project is contained in five ideas:

1. **Autocorrelation and $N_{\text{eff}}$.** Correlated data require a discount. With $T_{\text{corr}} = 15$ s, an honest 300-sample test demands a 150-minute run.
2. **The 10% rule** is a quick estimate of the spacing between effectively independent samples, $T_{\text{corr}} \ln 10$; the resulting $N_{\text{eff}}$ runs about 13% conservative compared to the standard ACF-sum derivation.
3. **95th-percentile ECDF** is the right tool for accuracy; CIs on the mean are not.
4. **OLS drift slope with a 95% CI** is the right tool for inertial drift; the upper bound of the interval, not the point estimate, is what gets compared to the requirement.
5. **Peak-and-snapback inference** is how you measure $t_D$ without an FTI tap; the 27/30 rule is how you turn the resulting per-event statistics into a single pass/fail compliance decision.

The Project section in the sidebar walks through the operational deliverables: datasets, analysis scripts, briefing rubric, and the project handout PDF. The math in this block is what justifies the analysis choices in those scripts.