Particle Identification: b-tagging
Why b-tagging
What are jets?
Quarks make up protons and neutrons, and gluons hold them together. However, we can never isolate a single quark: when we produce them at the Large Hadron Collider (LHC), they instead create collimated sprays of particles, or ``jets'', as illustrated on the right.
Identifying the type of quark that initiated a jet is crucial for understanding which physics processes are occurring in our collisions. The most common way for a Higgs boson to decay is into a pair of b-quarks, making b-jet identification a key ingredient for understanding the Higgs boson.
Source: QM Diaries.
What's special about b-jets?
b-quarks have a ``long'' lifetime by particle physics standards ($10^{-12}$ seconds), which means they travel a few mm before decaying into other particles (see right). The charged particles from this decay leave trajectories (tracks) that point back to the displaced decay point rather than to the collision point, so the innermost detector layers, closest to the collision point, are the most important for b-jet identification. Our workhorse b-tagging model takes these charged-particle trajectories as the set of inputs to a transformer encoder architecture that classifies the jet's flavor.
Also, b-jets have broader showers than jets from other types of quarks, and often include leptons (electrons and muons) from the b-hadron decay. Recently, we have extended the transformer inputs to leverage this information as well.
Below, I highlight my work in $b$-jet identification over the years, in reverse chronological order.
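To see where the "few mm" comes from, here is a minimal sketch of the mean flight distance, $L = (p/m)\,c\,\tau$. The default mass (5.28 GeV) and lifetime (1.5 ps) are assumed, typical B-meson values used only for illustration; the function name is hypothetical.

```python
# Mean flight distance of a b-hadron before it decays: L = (p / m) * c * tau.
# The mass and lifetime defaults are illustrative assumptions (typical B-meson values).

C_MM_PER_S = 2.99792458e11  # speed of light in mm/s


def mean_flight_distance_mm(p_gev, mass_gev=5.28, lifetime_s=1.5e-12):
    """Return the mean decay length in mm for a hadron with momentum p_gev."""
    beta_gamma = p_gev / mass_gev  # relativistic boost factor, p / m
    return beta_gamma * C_MM_PER_S * lifetime_s


if __name__ == "__main__":
    for p in (20.0, 50.0, 200.0):
        print(f"p = {p:5.0f} GeV -> L = {mean_flight_distance_mm(p):.1f} mm")
```

Under these assumptions a 50 GeV b-hadron flies roughly 4 mm on average, and the distance grows linearly with momentum.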
Carpe Datum: scaling up our models and datasets
ATLAS Collaboration. Carpe Datum: Scaling behavior of transformers for heavy hadron flavor identification ATL-SOFT-PUB-2026-002 (2026).
As a native of Dallas, TX, I grew up with ``bigger is better'' as something of a cultural motto. It turns out this fits right in with today's ML mantra as well: the recent wave of success of large language models is primarily driven by the size of the models and datasets. In this work, we show that the same holds true for $b$-jet classification. We use neural scaling laws to model these performance gains, allowing us to predict the performance of future, larger models without having to train them. Additionally, we can extrapolate to infinite data to estimate the performance limit for a fixed feature set.
This work doesn't only predict the performance of larger models, but also verifies these predictions in practice. We produced the largest dataset ever for jet classification in particle physics, and verified that the measured performance agreed with the neural scaling law prediction. This paves the way toward building a massive dataset for a foundation model for particle physics.
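The idea of fitting a scaling law and extrapolating it can be sketched as follows. This is only an illustration under stated assumptions: it assumes a saturating power law, loss(D) = l_inf + a * D^(-alpha), which is a common functional form but not necessarily the one used in the publication, and the data points are synthetic, made-up numbers. The names `fit_power_law` and `predict` are hypothetical.

```python
import math


def fit_power_law(sizes, losses, n_grid=2000):
    """Fit loss(D) = l_inf + a * D**(-alpha) by scanning l_inf and
    solving a straight-line fit in log-log space for each candidate."""
    xs = [math.log(d) for d in sizes]
    x_mean = sum(xs) / len(xs)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    best = None
    lowest = min(losses)
    for i in range(n_grid):
        l_inf = lowest * i / n_grid  # candidates in [0, min(losses))
        ys = [math.log(loss - l_inf) for loss in losses]
        y_mean = sum(ys) / len(ys)
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
        intercept = y_mean - slope * x_mean
        # Squared error measured on the losses themselves, not on their logs.
        sse = sum(
            (l_inf + math.exp(intercept + slope * x) - loss) ** 2
            for x, loss in zip(xs, losses)
        )
        if best is None or sse < best[0]:
            best = (sse, l_inf, math.exp(intercept), -slope)
    _, l_inf, a, alpha = best
    return l_inf, a, alpha


def predict(size, l_inf, a, alpha):
    """Predicted loss at a dataset size that was never trained on."""
    return l_inf + a * size ** (-alpha)


# Synthetic numbers purely for illustration (not ATLAS results).
sizes = [1e4, 1e5, 1e6, 1e7, 1e8]
losses = [0.2 + 5.0 * d ** -0.3 for d in sizes]
l_inf, a, alpha = fit_power_law(sizes, losses)
```

Here `l_inf` plays the role of the infinite-data limit for a fixed feature set, and `predict(1e9, l_inf, a, alpha)` extrapolates to a dataset ten times larger than any used in the fit.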
M. Vigl, NH, Michael Kagan, L. Heinrich Neural Scaling Laws for Boosted Jet Tagging 2602.15781, submitted to Foundation Models for Science ICLR workshop (2026).
Towards Multi-modal Inputs: GN3
ATLAS Collaboration. GN3: Multi-task, Multi-modal Transformers for Jet Flavour Tagging in ATLAS ATL-PHYS-PUB-2026-001 (2026).
In my PhD and early post-doc, the group used to develop custom deep learning models for each type of
particle ID task. For example, b-jet classification focused on the tracks in the innermost detector,
while quark vs. gluon classification focused on the energy deposits in the calorimeters.
In this work, we introduced the GN3 architecture, which significantly expands the inputs
to include the calorimeter energy deposits and the lepton information, in addition to the tracks.
The improvement in performance was an impressive two-fold decrease in the mis-id rate, and the flexible input representation constitutes an important step toward a foundation model for jet physics.
Transformers for Particle identification
ATLAS Collaboration. Transforming Jet Flavour Tagging at ATLAS, Nature Commun. 17 (2026) 541 10.1038/s41467-025-65059-6.
First end-to-end track-based tagger (GN2, transformer) recommended for physics analyses. Contribution: I led the development of track-based taggers in my PhD with RNNs and Deep Sets. All the innovations I introduced for the Deep Sets tagger (new variables, optimized track selection) propagated to this SOTA transformer. I also led the team as we finalized GN2.
Deep Sets
ATLAS Collaboration. Deep Sets based Neural Networks for Impact Parameter Flavour Tagging in ATLAS ATL-PHYS-PUB-2020-014 (2020).
Contribution: Developed a new Deep Sets-based tagger. Optimized selection resulted in 2x improvement in background rejection compared to the RNN.
Model the jet as a set
A Deep Sets architecture is natively designed to handle unordered sets of inputs. In this work, we specialized this architecture for b-tagging, following the work of Komiske et al. Since this network can parallelize the computation across the tracks in the jet, an added benefit was that the Deep Sets tagger was 4x faster to train than the RNN. My additional optimizations included more input variables and a looser track quality selection, which improved the background rejection by a factor of two compared to the RNN tagger.
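The core of the set-based approach can be sketched in a few lines: embed each track independently, sum the embeddings, and pass the sum to a jet-level network. This is a toy sketch with random weights standing in for trained parameters; the layer sizes, the three track features, and all function names are assumptions for illustration, not the ATLAS implementation.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights for a dense layer (stand-in for trained parameters)."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights, x):
    """Apply a dense layer followed by a tanh activation."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def deep_set(tracks, phi_w, rho_w):
    """Compute rho(sum_i phi(track_i)) for one jet."""
    embedded = [dense(phi_w, t) for t in tracks]   # phi acts on each track independently
    pooled = [sum(col) for col in zip(*embedded)]  # order-independent sum over tracks
    return dense(rho_w, pooled)                    # rho acts on the pooled jet summary


rng = random.Random(0)
phi_w = make_layer(3, 8, rng)  # 3 hypothetical track features -> 8 latent features
rho_w = make_layer(8, 2, rng)  # 8 latent features -> 2 jet-level outputs
tracks = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]

out_original = deep_set(tracks, phi_w, rho_w)
out_reversed = deep_set(tracks[::-1], phi_w, rho_w)  # same tracks, different order
```

Because the per-track embeddings do not depend on each other, they can be computed in parallel, and because the pooling is a sum, `out_original` and `out_reversed` agree up to floating-point rounding.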
Recurrent Neural Networks (RNN)
ATLAS Collaboration. "ATLAS b-tagging algorithms for the LHC Run 2 dataset." Eur. Phys. J. C 83 (2023) 681 2211.16345 700 citations.
Contribution: Optimized Recurrent Neural Network (RNN) tagger: first time RNN was recommended for physics analyses. This new tagger resulted in a 10% improvement in the non-resonant HH→4b analysis.
Traditionally, b-tagging was done with hand-engineered features explicitly reconstructing the displaced vertex and the associated kinematics.
Model the jet as a sequence
The RNN is an ML-based approach that can learn discriminating features automatically from the tracks. In this model, we treat the tracks in the jet like the words in a sentence: the RNN natively handles the variable number of tracks in the jet and leverages the correlations between tracks to identify b-jets.
I trained the first RNN-based b-tagger ready for physics analyses. This new RNN-based model had a 10% sensitivity improvement for the non-resonant HH→4b analysis.
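The sequence view described above can be sketched with a plain Elman-style recurrence, chosen here only for brevity; the cell type, track ordering, and feature set of the actual tagger are not specified in this text, so everything below (weights, sizes, names) is an assumption for illustration.

```python
import math
import random


def rnn_final_state(tracks, w_in, w_rec):
    """Run a simple Elman RNN over the tracks and return the last hidden state."""
    hidden = [0.0] * len(w_rec)
    for track in tracks:  # one step per track, so any number of tracks works
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[j], track))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    return hidden


rng = random.Random(1)
n_features, n_hidden = 3, 4  # hypothetical sizes
w_in = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
w_rec = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_hidden)]

# Two toy jets with different numbers of tracks.
short_jet = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(2)]
long_jet = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(6)]

summary_short = rnn_final_state(short_jet, w_in, w_rec)
summary_long = rnn_final_state(long_jet, w_in, w_rec)
```

Both jets are reduced to a hidden state of the same size regardless of how many tracks they contain, and each step sees the state left by the previous tracks, which is how correlations between tracks enter. Unlike a set-based model, the result depends on the order in which the tracks are fed in.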
Optimizations for Variable Radius Track Jets
ATLAS Collaboration. "Performance of 2019 recommendations of ATLAS Flavor Tagging algorithms with Variable Radius track jets" FTAG-2019-006 (2019).
For very energetic Higgs bosons, the decay products of the two b-quarks become so collimated that it is challenging to resolve them into two separate jets. A solution introduced by Krohn et al., ``variable radius'' track jets, lets the size of the jet adapt based on its estimated momentum: the higher the momentum, the smaller the jet. (Sketch on the right.)
I optimized the RNN-based b-tagger for this new jet collection, which resulted in a significant improvement for boosted Higgs analyses. The specific improvement for HH→4b is shown in the plot below: up to a 50% improvement in the analysis sensitivity.
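The momentum-dependent jet size can be sketched as an effective radius $R = \rho / p_T$, clipped to a minimum and maximum value. The default parameters below ($\rho$ = 30 GeV, $R$ between 0.02 and 0.4) are values commonly quoted for ATLAS variable-radius track jets, but they should be treated as assumptions here, and the function name is hypothetical.

```python
def variable_radius(pt_gev, rho_gev=30.0, r_min=0.02, r_max=0.4):
    """Effective jet radius R = rho / pT, clipped to [r_min, r_max].

    The default parameter values are assumptions for illustration.
    """
    if pt_gev <= 0:
        raise ValueError("pt_gev must be positive")
    return max(r_min, min(r_max, rho_gev / pt_gev))


if __name__ == "__main__":
    for pt in (10.0, 100.0, 500.0, 3000.0):
        print(f"pT = {pt:6.0f} GeV -> R = {variable_radius(pt):.3f}")
```

With these parameters a low-momentum jet keeps the maximum radius, while a jet at a few hundred GeV shrinks enough that the two b-jets from a boosted Higgs decay are more likely to be reconstructed separately.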