Teaching CMS to trace particles

By CMS Collaboration

Display of a CMS candidate event reconstructed using machine learning.

An interactive version of the above event display can be found at this page.

For the first time, the CMS experiment has demonstrated that machine learning can be used to reconstruct collisions at the LHC. This new approach improves the precision compared to traditional methods—whilst also running much faster—helping physicists better understand the data collected at the LHC.

Each proton–proton collision at the LHC is like a microscopic explosion, spraying out a complex pattern of particles that must be carefully reconstructed before physicists can study what really happened. Doing this accurately—and fast—is a significant challenge.

At the heart of this task in the CMS experiment is the particle-flow (PF) algorithm. For more than a decade, PF has acted as the experiment’s “translator”, combining information from different particle detectors—such as the tracking detector, electromagnetic and hadron calorimeters, and muon detectors—to identify each particle produced in a collision. The method works remarkably well, but it relies on a long chain of hand-crafted rules designed by physicists.
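To give a feel for what a "hand-crafted rule" means here, the toy sketch below links tracks to calorimeter clusters by angular distance and labels each cluster accordingly. It is an illustration only, not the CMS PF algorithm: the function names, the dictionary layout, and the 0.1 matching threshold are all assumptions made for this example, and the real algorithm chains together many more rules of this kind.

```python
import math


def delta_r(a, b):
    """Angular distance in (eta, phi), with the phi difference wrapped into [-pi, pi)."""
    d_eta = a["eta"] - b["eta"]
    d_phi = (a["phi"] - b["phi"] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(d_eta, d_phi)


def toy_particle_flow(tracks, clusters, max_dr=0.1):
    """Label each cluster with one hand-written rule (max_dr is a hypothetical threshold)."""
    candidates = []
    for cluster in clusters:
        # Rule: a cluster with a nearby track was left by a charged particle.
        matched = any(delta_r(track, cluster) < max_dr for track in tracks)
        label = "charged hadron" if matched else "neutral hadron"
        candidates.append({**cluster, "label": label})
    return candidates


# Toy inputs: one track and two calorimeter clusters.
tracks = [{"eta": 0.50, "phi": 1.00}]
clusters = [
    {"eta": 0.52, "phi": 1.01, "energy": 20.0},   # close to the track
    {"eta": -1.20, "phi": -2.00, "energy": 8.0},  # no track nearby
]
labels = [c["label"] for c in toy_particle_flow(tracks, clusters)]
```

Every threshold and branch in such a chain has to be chosen and tuned by hand, which is the part a learned model aims to replace.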

Now, for the first time, CMS has shown that this task can be done in a fundamentally different way, using machine learning.

From hand-written rules to learned intuition

Schematic for MLPF.

Above: From detector signals to particles—a simplified view of how MLPF works.

The new machine-learning-based particle-flow (MLPF) algorithm replaces much of the rigid hand-crafted logic with a single machine learning model trained directly on simulated collisions. Instead of being told how to reconstruct particles, the algorithm learns how particles look in the detector—much like how humans learn to recognize faces without memorizing explicit rules.

Once trained, MLPF reconstructs the entire collision in one go.

Despite the different approach, MLPF performs just as well as the traditional PF algorithm—and in some cases, even better.

In simulated top quark events, MLPF improves the precision with which jets are reconstructed by 10–20% in key momentum ranges. This can directly benefit many CMS measurements, from precision Standard Model tests to searches for new particles.

The most striking feature is that MLPF reconstructs a full collision much faster than the traditional PF algorithm. This is possible because MLPF can run efficiently on modern GPUs, whereas the traditional algorithm is typically limited to CPUs.

Result of MLPF.

Above: Jet energy resolution for two reconstruction methods. Lower values indicate better agreement with the true jet energy. Animation: Haeun Kim.
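As a rough illustration of how a resolution figure of this kind can be computed, the sketch below takes the ratio of reconstructed to true jet energy for each jet and measures how widely those ratios are spread. The exact definition used in the CMS result is not given here, so the spread-over-mean convention, the function names, and the toy numbers are all assumptions made for this example.

```python
import statistics


def jet_response(reco_energies, true_energies):
    """Per-jet response: reconstructed energy divided by true energy."""
    if len(reco_energies) != len(true_energies):
        raise ValueError("need one reconstructed energy per true energy")
    return [reco / true for reco, true in zip(reco_energies, true_energies)]


def relative_resolution(reco_energies, true_energies):
    """Spread of the response divided by its mean (one possible convention)."""
    response = jet_response(reco_energies, true_energies)
    return statistics.pstdev(response) / statistics.fmean(response)


# Toy numbers, not CMS data: four jets with a true energy of 100 each.
true_e = [100.0, 100.0, 100.0, 100.0]
method_a = [90.0, 110.0, 95.0, 105.0]   # reconstructed energies, method A
method_b = [80.0, 120.0, 90.0, 110.0]   # reconstructed energies, method B

res_a = relative_resolution(method_a, true_e)
res_b = relative_resolution(method_b, true_e)
```

In this toy case method A scatters less around the true energy than method B, so it yields the lower (better) resolution value.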

Why does this matter?

“Machine learning helps disentangle the complex correlations of physics signals”, says Ka Wa Ho, an MLPF developer working on benchmarking the performance. “By showing that MLPF can match traditional methods in performance, it lays the foundation for more advanced models to reconstruct previously elusive signals and fully exploit the detector’s capabilities.”

“By learning directly from simulated collisions, MLPF reduces human bias in reconstruction and lets the data speak for itself”, says Eric Wulff, an MLPF developer working on model training and optimization. “This makes it possible to quickly adapt to new detector conditions or geometries, enables faster iteration, and ultimately delivers a clearer picture of the underlying physics.”

“Ultimately, our goal is to be able to get more information with less cost out of the experimental data”, says Joosep Pata, lead developer of MLPF. “ML-based data reconstruction has the potential to improve the accuracy while allowing the use of high-performance computing directly for some of the most complex parts of data processing.”

Looking ahead

As LHC data rates grow, machine learning offers a new way to meet the challenges of the High-Luminosity LHC and future colliders such as the Future Circular Collider.

By teaching detectors to learn from data directly, physicists are not just improving performance—they are redefining what’s possible in experimental particle physics.

Want to learn more? Read our paper linked below and watch out for the upcoming seminar: https://indico.cern.ch/event/1632943/

Written by: Farouk Mokhtar, for the CMS Collaboration
Edited by: Muhammad Ansar Iqbal

Read more about these results: