Reconstructing the Higgs on open data — identical to ROOT

2026-06-21 — the flagship: a real, multi-channel analysis (ROOT's df103 Higgs→ZZ→4ℓ) ported to nano.rust, run on CMS Open Data over HTTPS, reconstructing the 125 GeV Higgs peak — and matching ROOT's own result bit-for-bit.

The earlier demos showed one cut or one spectrum. This is the real thing: a complete analysis with three decay channels, lepton selection with track-quality cuts, Z-candidate combinatorics, mass windows, and a four-lepton invariant mass — faithfully ported from ROOT's df103_NanoAODHiggsAnalysis tutorial and run on the public CMS Open Data 2012 samples.

[Figure: Higgs → ZZ → 4ℓ four-lepton mass, reconstructed by nano.rust from CMS Open Data, with the peak at 125 GeV]

The four-lepton invariant mass on the simulated signal — the Higgs at 125 GeV, plotted with kuva straight from the Rust analysis (--plot).

This analysis is now spec-driven, which is the framework's promise demonstrated. The selection and reconstruction below are generated from a spec, not hand-written: a physicist writes higgs4l*.toml (objects, cuts, Z-candidate reconstruction, masses); nano-spec validates it and generates a nano-analysis typestate kernel; the Rust compiler is the gate. On synthetic events the generated kernel is bit-identical to the hand-written df103 reference for all three channels (the interpreter and the generated code agree as well), and on the open data it reproduces ROOT's exact counts (4μ = 9115, 4e = 5528, 2e2μ = 12065, total 26708, of which 23370 in the peak). The hand-written higgs4l_opendata.rs is now just the golden reference, not the shipped analysis.

Honest caveat: it currently takes one spec per channel. The region model is conjunctive, so a single multi-channel spec needs a union producer, which is the next enhancement. The muon channel shows the same chain on a simpler analysis.

The analysis

For each of the 4μ, 4e, and 2e2μ channels (examples/higgs4l_opendata.rs):

1. Select leptons: nLepton requirements, η/pt kinematics, isolation, and track quality (the 3-D impact-parameter significance sip3d < 4, |dxy|, |dz|), plus opposite-charge balance.
2. Reconstruct two Z bosons: form the opposite-charge, same-flavor pair whose mass is closest to the Z (reco_zz_to_4l); the remaining pair is the second Z. Apply ΔR separation and the mass windows (Z₁ ∈ [40, 120] GeV, Z₂ ∈ [12, 120] GeV).
3. Reconstruct the Higgs: the invariant mass of the four selected leptons.

Each step maps directly to a df103 stage — and the physicist writes them as a spec, reviewed as physics, not as Rust.
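
The pairing and mass steps above can be sketched in a few lines of plain Rust. This is a minimal illustration for the same-flavor channels, not the nano.rust kernel and not ROOT's reco_zz_to_4l: the Lepton struct, the function names, and the first-wins tie-breaking are assumptions made for the example. The ΔR cut is left out.

```rust
// Hypothetical lepton record; the field names are illustrative only.
#[derive(Clone, Copy)]
struct Lepton {
    pt: f64,
    eta: f64,
    phi: f64,
    mass: f64,
    charge: i32,
}

const Z_MASS: f64 = 91.2; // GeV, the reference mass used in the text

/// Invariant mass of a group of leptons given as (pt, eta, phi, m).
fn invariant_mass(leptons: &[Lepton]) -> f64 {
    let (mut e, mut px, mut py, mut pz) = (0.0_f64, 0.0_f64, 0.0_f64, 0.0_f64);
    for l in leptons {
        let lpx = l.pt * l.phi.cos();
        let lpy = l.pt * l.phi.sin();
        let lpz = l.pt * l.eta.sinh();
        e += (lpx * lpx + lpy * lpy + lpz * lpz + l.mass * l.mass).sqrt();
        px += lpx;
        py += lpy;
        pz += lpz;
    }
    // Clamp tiny negative values caused by rounding before the square root.
    (e * e - px * px - py * py - pz * pz).max(0.0).sqrt()
}

/// Z1 is the opposite-charge pair closest in mass to the Z; Z2 is the remaining pair.
/// Returns None when no opposite-charge pair exists.
fn pair_z_candidates(l: &[Lepton; 4]) -> Option<([usize; 2], [usize; 2])> {
    let mut best: Option<(f64, [usize; 2])> = None;
    for i in 0..4 {
        for j in (i + 1)..4 {
            if l[i].charge + l[j].charge != 0 {
                continue;
            }
            let d = (invariant_mass(&[l[i], l[j]]) - Z_MASS).abs();
            if best.map_or(true, |(best_d, _)| d < best_d) {
                best = Some((d, [i, j]));
            }
        }
    }
    let (_, z1) = best?;
    let rest: Vec<usize> = (0..4).filter(|k| !z1.contains(k)).collect();
    Some((z1, [rest[0], rest[1]]))
}

/// Mass windows quoted in the text: Z1 in [40, 120] GeV, Z2 in [12, 120] GeV.
fn in_mass_windows(m_z1: f64, m_z2: f64) -> bool {
    (40.0..=120.0).contains(&m_z1) && (12.0..=120.0).contains(&m_z2)
}
```

With two back-to-back muons of pt = 45.6 GeV and two back-to-back muons of pt = 15 GeV, the sketch picks the first pair as Z₁ (mass close to 91.2 GeV) and leaves the second pair as Z₂.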

The spec that generates the analysis

This is the whole 4μ analysis — crates/nano-spec/examples/higgs4l.toml. There is no hand-written event loop behind it: nano-spec validates this and codegens the typed kernel, which is bit-identical to the df103 reference and to ROOT.

[objects.good_muon]
source = "Muon"

[derived.z1]                              # Z1 = opposite-charge pair nearest m_Z
kind = "pair"
object = "good_muon"
constraints = ["opposite_charge"]
selection = "nearest_mass_truncated"
target = "91.2 GeV"

[derived.z2]                              # Z2 = best remaining pair
kind = "pair"
object = "good_muon"
constraints = ["opposite_charge"]
selection = "leading_pt"
exclude = ["z1"]

[derived.h]                               # H = Z1 + Z2
kind = "combine"
items = ["z1", "z2"]

[regions.signal]
require = [
  "count(good_muon) == 4",
  "all(good_muon, pt > 5 GeV)",
  "all(good_muon, abs(eta) < 2.4)",
  "all(good_muon, abs(pfRelIso04_all) < 0.40)",
  "all(good_muon, sqrt(dxy*dxy + dz*dz) / sqrt(dxyErr*dxyErr + dzErr*dzErr) < 4.0)",
  "count(good_muon, charge == 1) == 2",
  "count(good_muon, charge == -1) == 2",
  "closest_mass(z1, z2, 91.2 GeV) > 40 GeV",   # Z1 window
  "other_mass(z1, z2, 91.2 GeV)  > 12 GeV",    # Z2 window
]

[[outputs]]
name = "h_mass"
expr = "h.mass"

A physicist reviews that — sip3d, the Z windows, the pairing rule — and the compiler guarantees the generated kernel implements it. (4e and 2e2μ are sibling specs; a single multi-channel spec is the next enhancement.)
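
To make "the compiler guarantees" concrete, here is a minimal typestate sketch. It only illustrates the pattern: the Raw and Selected markers, the Muons type, and the scalar_pt_sum method are hypothetical, and the kernel that nano-spec actually generates may be shaped quite differently. Only three of the region requirements are encoded.

```rust
use std::marker::PhantomData;

// Hypothetical stage markers: zero-sized types that exist only at compile time.
struct Raw;
struct Selected;

struct Muons<Stage> {
    pt: Vec<f64>,
    charge: Vec<i32>,
    _stage: PhantomData<Stage>,
}

impl Muons<Raw> {
    fn new(pt: Vec<f64>, charge: Vec<i32>) -> Self {
        Muons { pt, charge, _stage: PhantomData }
    }

    /// Applies a subset of the spec's requirements: exactly four muons,
    /// pt > 5 GeV for all of them, and two of each charge.
    fn select(self) -> Option<Muons<Selected>> {
        let n_pos = self.charge.iter().filter(|&&q| q == 1).count();
        let n_neg = self.charge.iter().filter(|&&q| q == -1).count();
        let ok = self.pt.len() == 4
            && self.charge.len() == 4
            && self.pt.iter().all(|&pt| pt > 5.0)
            && n_pos == 2
            && n_neg == 2;
        if ok {
            Some(Muons { pt: self.pt, charge: self.charge, _stage: PhantomData })
        } else {
            None
        }
    }
}

impl Muons<Selected> {
    /// Defined only for the Selected stage: calling it on Muons<Raw>
    /// is a compile error, so the selection cannot be skipped.
    fn scalar_pt_sum(&self) -> f64 {
        self.pt.iter().sum()
    }
}
```

The design point is that later stages are methods on a type that only the earlier stage can produce, so skipping a step fails at compile time rather than at run time.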

The config that steers it

The physics knobs live in one TOML — configs/higgs4l.toml — not buried in the code. A physicist reviews and edits this; the combinatoric kernel just runs it (numbers stay bit-identical):

luminosity = 11580.0          # pb^-1 (11.6 fb^-1)

[selection.muon]              # per-flavour lepton selection
min_pt = 5.0
max_abs_eta = 2.4
max_pf_rel_iso04_all = 0.40
max_sip3d = 4.0               # 3-D impact-parameter significance
max_abs_dxy = 0.5
max_abs_dz = 1.0

[zcandidates]                 # Z reconstruction windows
z_reference_mass = 91.2
z1_mass_min = 40.0
z1_mass_max = 120.0
z2_mass_min = 12.0
z2_mass_max = 120.0

[histogram]
bins = 36
range = [70.0, 180.0]

[[sample]]                    # luminosity-weighted samples for the stack
name = "SMHiggsToZZTo4L"
role = "signal"
channels = ["4mu", "4e", "2e2mu"]
xsec = 0.0065
nevt = 299973.0
scale = 1.0

[[sample]]
name = "ZZTo4mu"
role = "background"
channels = ["4mu"]
xsec = 0.077
nevt = 1499064.0
scale = 1.386                 # ZZ k-factor
# … ZZTo4e / ZZTo2e2mu, and the Run2012 DoubleMu/DoubleElectron data samples …

Run it against any config with --config; the default is the file above. The electron block, the data samples, and the per-channel mixed-flavour pt cuts are in the same file — that's the entire physics surface a reviewer needs, separate from the implementation.
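
As a rough picture of how the [selection.muon] block could steer a kernel, the sketch below mirrors its six keys in a plain struct. The MuonSelection and Muon types are hypothetical, the defaults are copied from the config above rather than parsed from the file, and the strict inequalities follow the cuts shown in the spec; how nano.rust really loads and applies the config is not shown here.

```rust
// Hypothetical mirror of the [selection.muon] block.
struct MuonSelection {
    min_pt: f64,
    max_abs_eta: f64,
    max_pf_rel_iso04_all: f64,
    max_sip3d: f64,
    max_abs_dxy: f64,
    max_abs_dz: f64,
}

impl Default for MuonSelection {
    // Values copied by hand from the config shown above.
    fn default() -> Self {
        MuonSelection {
            min_pt: 5.0,
            max_abs_eta: 2.4,
            max_pf_rel_iso04_all: 0.40,
            max_sip3d: 4.0,
            max_abs_dxy: 0.5,
            max_abs_dz: 1.0,
        }
    }
}

// Hypothetical per-muon record with the quantities the cuts need.
#[derive(Clone, Copy)]
struct Muon {
    pt: f64,
    eta: f64,
    pf_rel_iso04_all: f64,
    sip3d: f64,
    dxy: f64,
    dz: f64,
}

impl MuonSelection {
    /// True when the muon passes every cut; all comparisons are strict.
    fn passes(&self, m: &Muon) -> bool {
        m.pt > self.min_pt
            && m.eta.abs() < self.max_abs_eta
            && m.pf_rel_iso04_all.abs() < self.max_pf_rel_iso04_all
            && m.sip3d < self.max_sip3d
            && m.dxy.abs() < self.max_abs_dxy
            && m.dz.abs() < self.max_abs_dz
    }
}
```

Keeping the thresholds in one value that is passed to the kernel is what lets a reviewer change a cut without touching the combinatorics.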

Identical to ROOT — bit for bit

The point of porting ROOT's tutorial is that we can check against ROOT itself. When ROOT's df103 (scripts/higgs4l_root_crosscheck.sh) is run on the same skimmed signal, every number matches:

quantity	nano.rust (HTTPS)	ROOT (xrootd)
total selected 4ℓ	26,708	26,708
4μ / 4e / 2e2μ	9115 / 5528 / 12065	9115 / 5528 / 12065
120–130 GeV (Higgs peak)	23,370	23,370
110–120 GeV	2080	2080
130–140 GeV	647	647

Getting from "agrees to 0.01%" to identical meant matching ROOT's arithmetic precisely: the impact-parameter significance (ip3d/sip3d) is computed in the same float precision ROOT's RVecF uses, so the sip3d < 4 cut flips identically on the handful of boundary events; the invariant-mass and Z-pairing arithmetic match ROOT's. A golden test now asserts these exact counts, so any future drift fails CI.
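
The precision point can be illustrated with two versions of the significance, using the formula from the spec's sip3d cut. The function names are hypothetical, and the premise that ROOT evaluates the intermediates in single precision is taken from the paragraph above, not verified here. The two results differ only in the last few bits of an f32, which matters only for events sitting almost exactly on the cut.

```rust
/// sip3d with every intermediate kept in f32.
fn sip3d_f32(dxy: f32, dz: f32, dxy_err: f32, dz_err: f32) -> f32 {
    let ip3d = (dxy * dxy + dz * dz).sqrt();
    ip3d / (dxy_err * dxy_err + dz_err * dz_err).sqrt()
}

/// The same formula with the f32 inputs promoted to f64 first.
fn sip3d_f64(dxy: f32, dz: f32, dxy_err: f32, dz_err: f32) -> f64 {
    let (dxy, dz) = (dxy as f64, dz as f64);
    let (dxy_err, dz_err) = (dxy_err as f64, dz_err as f64);
    let ip3d = (dxy * dxy + dz * dz).sqrt();
    ip3d / (dxy_err * dxy_err + dz_err * dz_err).sqrt()
}

/// True when the two precisions land on opposite sides of the sip3d < 4 cut.
fn cut_disagrees(dxy: f32, dz: f32, dxy_err: f32, dz_err: f32) -> bool {
    let pass_f32 = sip3d_f32(dxy, dz, dxy_err, dz_err) < 4.0;
    let pass_f64 = sip3d_f64(dxy, dz, dxy_err, dz_err) < 4.0;
    pass_f32 != pass_f64
}
```

A helper like cut_disagrees could be run over the selected leptons to list the boundary events where the choice of precision changes the outcome.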

The full picture: signal + background + data

The plot above is the simulated signal alone. The real df103 result stacks the luminosity-weighted signal and ZZ background and overlays the 2012 data — the actual discovery plot. examples/higgs4l_stack_opendata reads all eight skimmed open-data samples over HTTPS, weights each by lumi·σ/N (lumi = 11.6 fb⁻¹), and fills 36 bins over m(4ℓ):

[Figure: CMS Open Data H→ZZ→4ℓ with the ZZ background, the stacked m_H = 125 GeV signal, and the 2012 data; the Higgs discovery plot, reconstructed by nano.rust]

The ZZ continuum and Z peak sit at low mass, the m_H = 125 GeV signal bump rises above the background, and the data points track them — the four-lepton Higgs excess, from public data, in pure Rust. Totals: signal 6.70, background 62.0, data 82. Against ROOT's df103 the agreement is to f64 precision (~12 significant figures; data exact, signal per-bin identical, the background sum differing only at ~1e-12 from summation order).
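
The weighting and binning can be checked by hand with a short sketch. The Sample struct and function names are hypothetical. The cross-sections are assumed to be in pb (consistent with the luminosity in pb⁻¹), and the per-sample scale from the config is assumed to multiply the weight. With those assumptions the per-event weight is lumi·σ·scale/N.

```rust
// Hypothetical mirror of one [[sample]] entry; xsec is assumed to be in pb.
struct Sample {
    xsec_pb: f64,
    nevt: f64,
    scale: f64,
}

/// Per-event weight: integrated luminosity times cross-section times scale,
/// divided by the number of generated events.
fn event_weight(lumi_pb_inv: f64, s: &Sample) -> f64 {
    lumi_pb_inv * s.xsec_pb * s.scale / s.nevt
}

/// Index of the histogram bin for a mass value, or None outside [lo, hi).
fn bin_index(mass: f64, bins: usize, lo: f64, hi: f64) -> Option<usize> {
    if !(mass >= lo && mass < hi) {
        return None;
    }
    let i = ((mass - lo) / (hi - lo) * bins as f64) as usize;
    // Guard against rounding pushing a value just below `hi` past the last bin.
    Some(i.min(bins - 1))
}
```

For the signal sample, event_weight(11580.0, ..) with xsec 0.0065, nevt 299973 and scale 1.0 gives about 2.51e-4 per event; multiplied by the 26708 selected signal events that is about 6.70, in line with the signal total quoted above (assuming every selected event falls inside the histogram range).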

Why this matters

This is the whole thesis, end to end, on a real analysis:

a complicated analysis (three channels, track quality, Z combinatorics, mass windows, m4ℓ) — not a toy cut;
on public, reproducible data, read remotely on demand in pure Rust (no ROOT and no full-file download: only ~6 MB is fetched for the skimmed signal);
validated against the reference implementation (ROOT) on its own tutorial, bit-for-bit;
with a publication-style plot generated in-process (kuva).

The same typed I/O and event model that power this also power the spec → kernel → workflow pipeline — so the framework's foundation is now proven against ROOT on an analysis that reconstructs the Higgs boson.

Reproduce it

$ cargo run -p nano-io --example higgs4l_opendata --features full -- \
    "https://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod_skimmed/SMHiggsToZZTo4L.root" \
    --insecure --plot higgs.svg