Reconstructing the Higgs on open data — identical to ROOT
2026-06-21 — the flagship: a real, multi-channel analysis (ROOT's df103 Higgs→ZZ→4ℓ) ported to nano.rust, run on CMS Open Data over HTTPS, reconstructing the 125 GeV Higgs peak — and matching ROOT's own result bit-for-bit.
The earlier demos showed one cut or one spectrum. This is the real thing: a
complete analysis with three decay channels, lepton selection with track-quality
cuts, Z-candidate combinatorics, mass windows, and a four-lepton invariant mass —
faithfully ported from ROOT's df103_NanoAODHiggsAnalysis tutorial and run on the
public CMS Open Data 2012 samples.

The four-lepton invariant mass on the simulated signal — the Higgs at 125 GeV,
plotted with kuva straight from the Rust
analysis (--plot).
This analysis is now spec-driven: the framework's promise, demonstrated. The selection + reconstruction below is generated from a spec, not hand-written: a physicist writes higgs4l*.toml (objects, cuts, Z-candidate reconstruction, masses); nano-spec validates it and codegens a nano-analysis typestate kernel; the Rust compiler is the gate. The generated kernel is bit-identical to the hand-written df103 reference on synthetic events for all three channels (interpret == codegen too), and reproduces the exact open-data counts == ROOT (4μ=9115, 4e=5528, 2e2μ=12065, total 26708, peak 23370). The hand-written higgs4l_opendata.rs is now just the golden reference, not the shipped analysis. Honest caveat: it currently takes one spec per channel (the region model is conjunctive, so a single multi-channel spec needs a union producer, the next enhancement). The muon channel shows the same chain on a simpler analysis.
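To make "typestate kernel" and "the Rust compiler is the gate" concrete, here is a minimal sketch of the pattern in plain Rust. Every name in it (`Event`, `Raw`, `LeptonsSelected`, `ZReconstructed`, and the methods) is hypothetical and is not the nano-analysis API; the only point is that a stage cannot be called before the stages it depends on.

```rust
use std::marker::PhantomData;

// Hypothetical stage markers; the generated kernel may look quite different.
struct Raw;
struct LeptonsSelected;
struct ZReconstructed;

/// An event tagged at the type level with the last completed stage.
struct Event<Stage> {
    n_good_muons: usize,
    z_masses: Option<(f64, f64)>,
    _stage: PhantomData<Stage>,
}

impl Event<Raw> {
    fn new(n_good_muons: usize) -> Self {
        Event { n_good_muons, z_masses: None, _stage: PhantomData }
    }

    /// Stage 1: keep only events with exactly four good muons.
    fn select_leptons(self) -> Option<Event<LeptonsSelected>> {
        if self.n_good_muons == 4 {
            Some(Event { n_good_muons: 4, z_masses: None, _stage: PhantomData })
        } else {
            None
        }
    }
}

impl Event<LeptonsSelected> {
    /// Stage 2: attach the Z candidate masses (passed in to keep the sketch short).
    fn reconstruct_z(self, z1: f64, z2: f64) -> Event<ZReconstructed> {
        Event {
            n_good_muons: self.n_good_muons,
            z_masses: Some((z1, z2)),
            _stage: PhantomData,
        }
    }
}

impl Event<ZReconstructed> {
    /// Stage 3: only exists on events that went through stages 1 and 2.
    fn passes_z_windows(&self) -> bool {
        let (z1, z2) = self.z_masses.expect("set by reconstruct_z");
        (40.0..=120.0).contains(&z1) && (12.0..=120.0).contains(&z2)
    }
}

fn main() {
    let event = Event::<Raw>::new(4).select_leptons().expect("four muons");
    let event = event.reconstruct_z(91.0, 30.0);
    println!("passes Z windows: {}", event.passes_z_windows());
    // Event::<Raw>::new(4).passes_z_windows(); // does not compile: wrong stage
}
```

Because `passes_z_windows` is defined only on `Event<ZReconstructed>`, a kernel that skips lepton selection or Z reconstruction fails to type-check instead of producing a wrong histogram.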
The analysis
For each of the 4μ, 4e, and 2e2μ channels (examples/higgs4l_opendata.rs):
- Select leptons: nLepton requirements, η/pt kinematics, isolation, and track quality (the 3-D impact-parameter significance sip3d < 4, |dxy|, |dz|), plus opposite-charge balance.
- Reconstruct two Z bosons: form the opposite-charge same-flavor pair whose mass is closest to the Z (reco_zz_to_4l); the remaining pair is the second Z. Apply ΔR separation and the mass windows (Z₁ ∈ [40,120], Z₂ ∈ [12,120] GeV).
- Reconstruct the Higgs: the invariant mass of the four selected leptons.
Each step maps directly to a df103 stage — and the physicist writes them as a spec, reviewed as physics, not as Rust.
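The last two steps can be sketched in a few lines of standalone Rust. This is an illustration under assumptions, not the code in higgs4l_opendata.rs: the `Lepton` struct and both function names are made up here, the four leptons are taken to be same-flavor and already selected with two of each charge, and the ΔR and mass-window cuts are left out.

```rust
/// A lepton in collider coordinates (pt and mass in GeV). Hypothetical type.
#[derive(Clone, Copy)]
struct Lepton {
    pt: f64,
    eta: f64,
    phi: f64,
    mass: f64,
    charge: i32,
}

const Z_MASS: f64 = 91.2; // GeV, the reference mass used in the post

/// Invariant mass of a set of leptons from their summed four-momenta.
fn invariant_mass(leptons: &[Lepton]) -> f64 {
    let (mut e, mut px, mut py, mut pz) = (0.0, 0.0, 0.0, 0.0);
    for l in leptons {
        let (lx, ly, lz) = (l.pt * l.phi.cos(), l.pt * l.phi.sin(), l.pt * l.eta.sinh());
        px += lx;
        py += ly;
        pz += lz;
        e += (lx * lx + ly * ly + lz * lz + l.mass * l.mass).sqrt();
    }
    // Clamp at zero so rounding can never put a negative number under the root.
    (e * e - px * px - py * py - pz * pz).max(0.0).sqrt()
}

/// Indices of the opposite-charge pair closest to the Z mass (Z1)
/// and of the two leptons left over (Z2).
fn pair_z_candidates(l: &[Lepton; 4]) -> Option<([usize; 2], [usize; 2])> {
    let mut best: Option<(f64, [usize; 2])> = None;
    for i in 0..4 {
        for j in (i + 1)..4 {
            if l[i].charge == l[j].charge {
                continue; // same-sign pairs cannot form a Z candidate
            }
            let dist = (invariant_mass(&[l[i], l[j]]) - Z_MASS).abs();
            if best.map_or(true, |(d, _)| dist < d) {
                best = Some((dist, [i, j]));
            }
        }
    }
    let (_, z1) = best?;
    let rest: Vec<usize> = (0..4).filter(|k| !z1.contains(k)).collect();
    Some((z1, [rest[0], rest[1]]))
}

fn main() {
    use std::f64::consts::PI;
    let mu = |pt, phi, charge| Lepton { pt, eta: 0.0, phi, mass: 0.105_66, charge };
    let leptons = [
        mu(45.6, 0.0, 1),
        mu(45.6, PI, -1),
        mu(15.0, PI / 2.0, 1),
        mu(15.0, -PI / 2.0, -1),
    ];
    if let Some((z1, z2)) = pair_z_candidates(&leptons) {
        println!("Z1 = {z1:?}, Z2 = {z2:?}, m4l = {:.2} GeV", invariant_mass(&leptons));
    }
}
```

The toy event in `main` puts two back-to-back 45.6 GeV muons next to two back-to-back 15 GeV muons, so the first pair is picked as Z1 and the second becomes Z2.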
The spec that generates the analysis
This is the whole 4μ analysis — crates/nano-spec/examples/higgs4l.toml. There is
no hand-written event loop behind it: nano-spec validates this and codegens
the typed kernel, which is bit-identical to the df103 reference and to ROOT.
```toml
[objects.good_muon]
source = "Muon"

[derived.z1]                      # Z1 = opposite-charge pair nearest m_Z
kind = "pair"
object = "good_muon"
constraints = ["opposite_charge"]
selection = "nearest_mass_truncated"
target = "91.2 GeV"

[derived.z2]                      # Z2 = best remaining pair
kind = "pair"
object = "good_muon"
constraints = ["opposite_charge"]
selection = "leading_pt"
exclude = ["z1"]

[derived.h]                       # H = Z1 + Z2
kind = "combine"
items = ["z1", "z2"]

[regions.signal]
require = [
  "count(good_muon) == 4",
  "all(good_muon, pt > 5 GeV)",
  "all(good_muon, abs(eta) < 2.4)",
  "all(good_muon, abs(pfRelIso04_all) < 0.40)",
  "all(good_muon, sqrt(dxy*dxy + dz*dz) / sqrt(dxyErr*dxyErr + dzErr*dzErr) < 4.0)",
  "count(good_muon, charge == 1) == 2",
  "count(good_muon, charge == -1) == 2",
  "closest_mass(z1, z2, 91.2 GeV) > 40 GeV",   # Z1 window
  "other_mass(z1, z2, 91.2 GeV) > 12 GeV",     # Z2 window
]

[[outputs]]
name = "h_mass"
expr = "h.mass"
```
A physicist reviews that — sip3d, the Z windows, the pairing rule — and the
compiler guarantees the generated kernel implements it. (4e and 2e2μ are
sibling specs; a single multi-channel spec is the next enhancement.)
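As a reading aid for the `require` list, here is one way the `count(...)` and `all(...)` expressions and the conjunctive region could be interpreted. It is a hand-written sketch, not output of nano-spec: the `Muon` struct, the helper functions, and `signal_region` are hypothetical, and only the kinematic and charge requirements are included.

```rust
/// Minimal muon record for the sketch (pt in GeV). Hypothetical type.
struct Muon {
    pt: f64,
    eta: f64,
    charge: i32,
}

/// `count(collection, predicate)`: how many objects satisfy the predicate.
fn count(muons: &[Muon], pred: impl Fn(&Muon) -> bool) -> usize {
    muons.iter().filter(|m| pred(*m)).count()
}

/// `all(collection, predicate)`: true if every object satisfies the predicate.
fn all(muons: &[Muon], pred: impl Fn(&Muon) -> bool) -> bool {
    muons.iter().all(|m| pred(m))
}

/// A conjunctive region: the event is kept only if every requirement holds.
fn signal_region(muons: &[Muon]) -> bool {
    let require = [
        muons.len() == 4,
        all(muons, |m| m.pt > 5.0),
        all(muons, |m| m.eta.abs() < 2.4),
        count(muons, |m| m.charge == 1) == 2,
        count(muons, |m| m.charge == -1) == 2,
    ];
    require.iter().all(|&ok| ok)
}

fn main() {
    let mk = |charge| Muon { pt: 20.0, eta: 0.5, charge };
    println!("2+ 2-: {}", signal_region(&[mk(1), mk(1), mk(-1), mk(-1)]));
    println!("3+ 1-: {}", signal_region(&[mk(1), mk(1), mk(1), mk(-1)]));
}
```

Because the region is a plain AND of its requirements, a 4μ-or-4e-or-2e2μ selection cannot be written as one list, which is the reason each channel currently gets its own spec.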
The config that steers it
The physics knobs live in one TOML — configs/higgs4l.toml — not buried in the
code. A physicist reviews and edits this; the combinatoric kernel just runs it
(numbers stay bit-identical):
```toml
luminosity = 11580.0 # pb^-1 (11.6 fb^-1)

[selection.muon] # per-flavour lepton selection
min_pt = 5.0
max_abs_eta = 2.4
max_pf_rel_iso04_all = 0.40
max_sip3d = 4.0 # 3-D impact-parameter significance
max_abs_dxy = 0.5
max_abs_dz = 1.0

[zcandidates] # Z reconstruction windows
z_reference_mass = 91.2
z1_mass_min = 40.0
z1_mass_max = 120.0
z2_mass_min = 12.0
z2_mass_max = 120.0

[histogram]
bins = 36
range = [70.0, 180.0]

[[sample]] # luminosity-weighted samples for the stack
name = "SMHiggsToZZTo4L"
role = "signal"
channels = ["4mu", "4e", "2e2mu"]
xsec = 0.0065
nevt = 299973.0
scale = 1.0

[[sample]]
name = "ZZTo4mu"
role = "background"
channels = ["4mu"]
xsec = 0.077
nevt = 1499064.0
scale = 1.386 # ZZ k-factor
# … ZZTo4e / ZZTo2e2mu, and the Run2012 DoubleMu/DoubleElectron data samples …
```
Run it against any config with --config; the default is the file above. The
electron block, the data samples, and the per-channel mixed-flavour pt cuts are
in the same file — that's the entire physics surface a reviewer needs, separate
from the implementation.
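A rough sketch of what "the kernel just runs it" can look like on the Rust side: the cut values live in a struct whose fields mirror the `[selection.muon]` keys, and the selection code only compares against them. The struct, the `Muon` type, and `passes` are hypothetical; TOML parsing is skipped (the values are filled in by hand), the sip3d cut is omitted for brevity, and the strict inequalities are an assumption based on the spec expressions.

```rust
/// Mirrors the [selection.muon] keys of the config. Hypothetical type.
struct MuonSelection {
    min_pt: f32,
    max_abs_eta: f32,
    max_pf_rel_iso04_all: f32,
    max_abs_dxy: f32,
    max_abs_dz: f32,
}

/// The muon quantities the cuts look at. Hypothetical type.
struct Muon {
    pt: f32,
    eta: f32,
    pf_rel_iso04_all: f32,
    dxy: f32,
    dz: f32,
}

impl MuonSelection {
    /// Values copied from the config shown in the post.
    fn from_post_config() -> Self {
        MuonSelection {
            min_pt: 5.0,
            max_abs_eta: 2.4,
            max_pf_rel_iso04_all: 0.40,
            max_abs_dxy: 0.5,
            max_abs_dz: 1.0,
        }
    }

    /// True if the muon passes every configured cut.
    fn passes(&self, m: &Muon) -> bool {
        m.pt > self.min_pt
            && m.eta.abs() < self.max_abs_eta
            && m.pf_rel_iso04_all.abs() < self.max_pf_rel_iso04_all
            && m.dxy.abs() < self.max_abs_dxy
            && m.dz.abs() < self.max_abs_dz
    }
}

fn main() {
    let cuts = MuonSelection::from_post_config();
    let good = Muon { pt: 20.0, eta: 1.1, pf_rel_iso04_all: 0.05, dxy: 0.01, dz: 0.02 };
    let soft = Muon { pt: 4.0, ..good_copy(&good) };
    println!("good passes: {}", cuts.passes(&good));
    println!("soft passes: {}", cuts.passes(&soft));
}

/// Field-by-field copy, so the sketch does not need derive macros.
fn good_copy(m: &Muon) -> Muon {
    Muon { pt: m.pt, eta: m.eta, pf_rel_iso04_all: m.pf_rel_iso04_all, dxy: m.dxy, dz: m.dz }
}
```

Changing a threshold then means editing one number in the config, with no edits to the selection logic itself.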
Identical to ROOT — bit for bit
The point of porting ROOT's tutorial: we can check against ROOT itself. When
ROOT's df103 (scripts/higgs4l_root_crosscheck.sh) runs on the same skimmed
signal, every number matches:
| quantity (selected events) | nano.rust (HTTPS) | ROOT (xrootd) |
|---|---|---|
| total selected 4ℓ | 26,708 | 26,708 |
| 4μ / 4e / 2e2μ | 9,115 / 5,528 / 12,065 | 9,115 / 5,528 / 12,065 |
| m(4ℓ) 120–130 GeV (Higgs peak) | 23,370 | 23,370 |
| m(4ℓ) 110–120 GeV | 2,080 | 2,080 |
| m(4ℓ) 130–140 GeV | 647 | 647 |
Getting from "agrees to 0.01%" to identical meant matching ROOT's arithmetic
precisely: the impact-parameter significance (ip3d and sip3d) is computed in
single precision (f32), the precision ROOT's RVecF uses, so the sip3d < 4 cut
flips identically on the handful of boundary events; the invariant-mass and
Z-pairing arithmetic match ROOT's. A golden test now asserts these exact counts,
so any future drift fails CI.
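To see why the precision of the sip3d arithmetic can change an event count, here is a small self-contained demonstration. The two functions are illustrative, not the nano.rust implementation, and the input values are contrived to land on the cut boundary rather than taken from the data.

```rust
/// sip3d computed entirely in single precision (f32).
fn sip3d_f32(dxy: f32, dz: f32, dxy_err: f32, dz_err: f32) -> f32 {
    (dxy * dxy + dz * dz).sqrt() / (dxy_err * dxy_err + dz_err * dz_err).sqrt()
}

/// Same formula, but with the f32 inputs widened to f64 before any arithmetic.
fn sip3d_f64(dxy: f32, dz: f32, dxy_err: f32, dz_err: f32) -> f64 {
    let (dxy, dz) = (f64::from(dxy), f64::from(dz));
    let (dxy_err, dz_err) = (f64::from(dxy_err), f64::from(dz_err));
    (dxy * dxy + dz * dz).sqrt() / (dxy_err * dxy_err + dz_err * dz_err).sqrt()
}

fn main() {
    // Contrived boundary case: dz_err = 2^-13, so dz_err^2 = 2^-26 is too small
    // to change 1.0 in f32 but large enough to change it in f64.
    let (dxy, dz, dxy_err, dz_err) = (4.0_f32, 0.0_f32, 1.0_f32, 1.0_f32 / 8192.0);

    let narrow = sip3d_f32(dxy, dz, dxy_err, dz_err);
    let wide = sip3d_f64(dxy, dz, dxy_err, dz_err);

    println!("f32: sip3d = {narrow}, passes sip3d < 4: {}", narrow < 4.0);
    println!("f64: sip3d = {wide}, passes sip3d < 4: {}", wide < 4.0);
}
```

In f32 the denominator rounds to exactly 1, so sip3d is exactly 4 and the event fails the cut; in f64 the denominator is slightly above 1, sip3d is slightly below 4, and the same event passes. A kernel that wants ROOT's counts has to make the same choice ROOT makes.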
The full picture: signal + background + data
The plot above is the simulated signal alone. The real df103 result stacks the
luminosity-weighted signal and ZZ background and overlays the 2012 data, which
is the actual discovery plot. examples/higgs4l_stack_opendata reads all eight
skimmed open-data samples over HTTPS, weights each simulated sample by lumi·σ/N
(lumi = 11.6 fb⁻¹), and fills 36 bins over m(4ℓ) from 70 to 180 GeV:

The ZZ continuum and Z peak sit at low mass, the m_H = 125 GeV signal bump rises above the background, and the data points track them — the four-lepton Higgs excess, from public data, in pure Rust. Totals: signal 6.70, background 62.0, data 82. Against ROOT's df103 the agreement is to f64 precision (~12 significant figures; data exact, signal per-bin identical, the background sum differing only at ~1e-12 from summation order).
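The weighting and binning behind the stack can be sketched as follows. This is an illustration, not the code in higgs4l_stack_opendata: `mc_weight` and `Hist` are hypothetical names, the cross-sections are assumed to be in pb so that they cancel against the luminosity in pb⁻¹, and the config's `scale` key is assumed to multiply the weight (as its "ZZ k-factor" comment suggests). Data events would be filled with weight 1.

```rust
/// Per-event weight for a simulated sample: lumi * xsec / nevt * scale.
fn mc_weight(lumi: f64, xsec: f64, nevt: f64, scale: f64) -> f64 {
    lumi * xsec / nevt * scale
}

/// Fixed-width weighted histogram; values outside [lo, hi) are dropped.
struct Hist {
    lo: f64,
    hi: f64,
    bins: Vec<f64>,
}

impl Hist {
    fn new(nbins: usize, lo: f64, hi: f64) -> Self {
        Hist { lo, hi, bins: vec![0.0; nbins] }
    }

    fn fill(&mut self, x: f64, weight: f64) {
        if x >= self.lo && x < self.hi {
            let frac = (x - self.lo) / (self.hi - self.lo);
            // Guard against rounding pushing the index one past the last bin.
            let i = ((frac * self.bins.len() as f64) as usize).min(self.bins.len() - 1);
            self.bins[i] += weight;
        }
    }
}

fn main() {
    let lumi = 11580.0; // pb^-1, from the config
    let w_signal = mc_weight(lumi, 0.0065, 299973.0, 1.0);
    let w_zz4mu = mc_weight(lumi, 0.077, 1499064.0, 1.386);
    println!("signal weight per event:  {w_signal:.3e}");
    println!("ZZTo4mu weight per event: {w_zz4mu:.3e}");

    // 36 bins over 70..180 GeV, as in the [histogram] block.
    let mut h = Hist::new(36, 70.0, 180.0);
    for m4l in [124.0, 126.5, 91.0, 200.0] {
        h.fill(m4l, w_signal); // 200 GeV is out of range and dropped
    }
    println!("weighted entries in range: {:.3e}", h.bins.iter().sum::<f64>());
}
```

Each simulated event contributes well under one entry, which is how tens of thousands of selected signal events end up as an expected yield of a few events at 11.6 fb⁻¹.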
Why this matters
This is the whole thesis, end to end, on a real analysis:
- a complicated analysis (three channels, track quality, Z combinatorics, mass windows, m4ℓ) — not a toy cut;
- on public, reproducible data, read remotely on demand in pure Rust (no ROOT, no download — ~6 MB fetched for the skimmed signal);
- validated against the reference implementation (ROOT) on its own tutorial, bit-for-bit;
- with a publication-style plot generated in-process (kuva).
The same typed I/O and event model that power this also power the spec → kernel → workflow pipeline — so the framework's foundation is now proven against ROOT on an analysis that reconstructs the Higgs boson.
Reproduce it
```console
$ cargo run -p nano-io --example higgs4l_opendata --features full -- \
    "https://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod_skimmed/SMHiggsToZZTo4L.root" \
    --insecure --plot higgs.svg
```