Benchmark dashboard

Reliability-aware benchmark for routing assumptions.

This page compares schedule-only, realtime-snapshot, and robust assumptions on a deterministic candidate slice. It is meant to sit beside the atlas and research review as the operational comparison layer of the public site.

Deterministic candidate slice · Schedule vs snapshot vs robust · Decision-facing metrics
Rows evaluated: 20 (current deterministic benchmark slice)
Scheduled access: 20 (reachable within threshold on schedule)
Robust access: 20 (reachable after reliability penalty is applied)
Access loss: 0 (cases pushed outside the threshold by uncertainty)
Snapshot miss rate: 0.0417 (average missed-transfer exposure under snapshot assumptions)
Robust miss rate: 0.0833 (average missed-transfer exposure under robust assumptions)
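The access counts above can be sketched as simple threshold checks. This is a minimal illustration, not the benchmark's own code: the `Row` type, the `access_summary` helper, and the 45-minute threshold are all assumptions, since the page does not state the threshold it uses. The three sample rows are taken from the comparison table on this page.

```python
from dataclasses import dataclass

# Hypothetical threshold in minutes; the page does not state the real value.
THRESHOLD_MIN = 45.0


@dataclass
class Row:
    od: str
    scheduled: float  # travel time on schedule, minutes
    robust: float     # travel time after the reliability penalty, minutes


def access_summary(rows, threshold=THRESHOLD_MIN):
    """Count reachable cases under each assumption and the cases lost between them."""
    scheduled_access = sum(r.scheduled <= threshold for r in rows)
    robust_access = sum(r.robust <= threshold for r in rows)
    # A case is lost when it is reachable on schedule but not after the penalty.
    access_loss = sum(r.scheduled <= threshold < r.robust for r in rows)
    return {
        "rows": len(rows),
        "scheduled_access": scheduled_access,
        "robust_access": robust_access,
        "access_loss": access_loss,
    }


rows = [
    Row("C_20260302T10_direct", 21.0, 22.0),
    Row("C_20260302T12_transfer", 31.0, 32.0),
    Row("F_20260302T11_direct", 21.0, 22.0),
]
summary = access_summary(rows)
print(summary)
```

With a generous threshold every sample row stays reachable and the loss is zero, which matches the pattern on this slice. Tightening the assumed threshold to 31.5 minutes would push `C_20260302T12_transfer` outside it under the robust assumption only, giving an access loss of 1.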

Trade-off summary

This slice is still scaffold-scale, but it already shows how a single candidate set can support multiple decision assumptions on one page.

Snapshot regret: 1.00 min (average increase over schedule under realtime snapshot)
Robust regret: 1.00 min (average increase over schedule under robust routing assumptions)
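Regret, as described above, is the average increase in travel time over the scheduled value. The sketch below recomputes it from the rows listed in the comparison table on this page. The `average_regret` helper is hypothetical, and it averages over the 12 listed rows only; whether the published figure is averaged over all 20 evaluated rows is not stated.

```python
from statistics import mean

# (scheduled, snapshot, robust) travel times in minutes,
# copied from the comparison table on this page.
TIMES = [
    (21.0, 22.0, 22.0),
    (25.0, 26.0, 26.0),
    (24.0, 25.0, 25.0),
    (28.0, 29.0, 29.0),
    (27.0, 28.0, 28.0),
    (31.0, 32.0, 32.0),
    (18.0, 19.0, 19.0),
    (22.0, 23.0, 23.0),
    (21.0, 22.0, 22.0),
    (25.0, 26.0, 26.0),
    (24.0, 25.0, 25.0),
    (28.0, 29.0, 29.0),
]


def average_regret(times, column):
    """Mean increase over the scheduled time for column 1 (snapshot) or 2 (robust)."""
    return mean(t[column] - t[0] for t in times)


snapshot_regret = average_regret(TIMES, 1)
robust_regret = average_regret(TIMES, 2)
print(f"{snapshot_regret:.2f} min, {robust_regret:.2f} min")  # prints "1.00 min, 1.00 min"
```

Every listed row adds exactly one minute under both assumptions, so the two averages coincide on this slice.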
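Current takeaway: the benchmark is already useful as a communication layer. On this slice, the snapshot and robust assumptions each add 1.00 min over schedule on average and no case loses access, so the two differ only in miss rate (0.0417 vs 0.0833). The next step is scale: a larger held-out evaluation window, more routes, and sharper schedule-versus-robust accessibility-loss evidence.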
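| OD | Line | Mode | Scheduled (min) | Snapshot (min) | Robust (min) | Snapshot miss | Robust miss | Loss flag |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C_20260302T10_direct | C | ST | 21.0 | 22.0 | 22.0 | 0.041667 | 0.083333 | 0 |
| C_20260302T10_transfer | C | ST | 25.0 | 26.0 | 26.0 | 0.041667 | 0.083333 | 0 |
| C_20260302T11_direct | C | ST | 24.0 | 25.0 | 25.0 | 0.041667 | 0.083333 | 0 |
| C_20260302T11_transfer | C | ST | 28.0 | 29.0 | 29.0 | 0.041667 | 0.083333 | 0 |
| C_20260302T12_direct | C | ST | 27.0 | 28.0 | 28.0 | 0.041667 | 0.083333 | 0 |
| C_20260302T12_transfer | C | ST | 31.0 | 32.0 | 32.0 | 0.041667 | 0.083333 | 0 |
| C_20260302T13_direct | C | ST | 18.0 | 19.0 | 19.0 | 0.041667 | 0.083333 | 0 |
| C_20260302T13_transfer | C | ST | 22.0 | 23.0 | 23.0 | 0.041667 | 0.083333 | 0 |
| F_20260302T11_direct | F | ST | 21.0 | 22.0 | 22.0 | 0.041667 | 0.083333 | 0 |
| F_20260302T11_transfer | F | ST | 25.0 | 26.0 | 26.0 | 0.041667 | 0.083333 | 0 |
| F_20260302T12_direct | F | ST | 24.0 | 25.0 | 25.0 | 0.041667 | 0.083333 | 0 |
| F_20260302T12_transfer | F | ST | 28.0 | 29.0 | 29.0 | 0.041667 | 0.083333 | 0 |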

Artifacts: results/benchmark/latest/candidates.csv, results/benchmark/latest/comparison.csv, results/benchmark/latest/summary.md.
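For readers who want to work with the comparison artifact directly, a loader might look like the sketch below. The column names are assumptions modelled on the table headings above; the real header of `comparison.csv` is not shown on this page and should be checked first. The inline sample stands in for the file so the example runs on its own, and `load_comparison` is a hypothetical helper.

```python
import csv
import io

# Hypothetical column names; the real header of comparison.csv is not shown here.
SAMPLE = """od,line,mode,scheduled,snapshot,robust,snapshot_miss,robust_miss,loss_flag
C_20260302T10_direct,C,ST,21.0,22.0,22.0,0.041667,0.083333,0
F_20260302T11_transfer,F,ST,25.0,26.0,26.0,0.041667,0.083333,0
"""

NUMERIC = ("scheduled", "snapshot", "robust", "snapshot_miss", "robust_miss")


def load_comparison(handle):
    """Parse comparison rows from an open text handle, converting numeric fields."""
    rows = []
    for record in csv.DictReader(handle):
        row = dict(record)
        for key in NUMERIC:
            row[key] = float(row[key])
        row["loss_flag"] = int(row["loss_flag"])
        rows.append(row)
    return rows


rows = load_comparison(io.StringIO(SAMPLE))
print(len(rows), rows[0]["scheduled"], rows[0]["loss_flag"])
```

To read the real artifact, the same function could be given a handle from `open("results/benchmark/latest/comparison.csv", newline="")`, once the `NUMERIC` names have been adjusted to the file's actual header.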