By CMS Collaboration

A new CMS paper describes how the CMS experiment can squeeze more physics out of the LHC data by using smarter data selection and reduction techniques.

The Large Hadron Collider produces an avalanche of data: the beams collide 40 million times per second, and it is unfortunately impossible to save every collision. With such a massive data volume, physicists have to give priority to collisions containing certain particles, such as the Higgs particle. Selecting these interesting collisions means that some signatures are much more likely to be saved than others, even when the others are produced far more frequently but are considered less attractive. Many factors determine the decision to keep or reject a collision. Still, in the end, the experiment can only save around 1000 collisions every second using the selection method that CMS was designed with, called the trigger. This means that only about one in every 40,000 collisions is saved; the rest are thrown away and never investigated. But what if the next LHC discovery is hiding in those other 39,999 events? Or what if physicists want to measure the properties of standard model particles that are mainly present in the rejected data? This is where the techniques described in a new CMS paper come in.
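The arithmetic behind these numbers can be checked with a short sketch. It uses only the two rates quoted above; the function names are made up for this illustration and are not part of any CMS software.

```python
# Rates quoted in the text above.
COLLISION_RATE_HZ = 40_000_000  # beam collisions per second
TRIGGER_RATE_HZ = 1_000         # collisions saved per second by the standard trigger


def one_in(saved_rate_hz, collision_rate_hz=COLLISION_RATE_HZ):
    """Return N such that one collision in every N is saved."""
    return collision_rate_hz / saved_rate_hz


def rejected_per_saved(saved_rate_hz, collision_rate_hz=COLLISION_RATE_HZ):
    """Return how many collisions are thrown away for each one that is saved."""
    return one_in(saved_rate_hz, collision_rate_hz) - 1


if __name__ == "__main__":
    print(f"one collision in every {one_in(TRIGGER_RATE_HZ):,.0f} is saved")
    print(f"{rejected_per_saved(TRIGGER_RATE_HZ):,.0f} are rejected for each saved one")
```

Passing a higher saved rate to the same functions shows how any technique that records more collisions per second shrinks the rejected share.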

Graphic representing the scouting and parking techniques

Figure: Collision events are recorded by CMS using a multi-step selection in the trigger system that filters events with properties that are interesting for analysis. The scouting and parking data streams can select more and different events, making it possible to save more data while also accessing physics signatures that were previously rejected by the trigger.

The solution for saving more collisions comes from thinking creatively about why so few collisions are saved. The two most important limiting factors are processing and storage: every collision contains so much information that we cannot convert everything from raw detector data to analysis-quality data fast enough in the first two days, and we also cannot afford to buy the storage capacity needed to save many more collisions. These two challenges were addressed with different solutions, creatively called parking and scouting.

The parking method does exactly what it promises: potentially interesting data is parked at a safe location until there is time to run the conversion from raw data to analysis-ready data. That time is typically when the LHC is not colliding, as the computers are less busy then. In the last few years, the parking method has allowed the CMS experiment to save a few thousand more events per second. These extra collisions were mainly selected on signatures that are already well known but that we would like to study in more detail, for example, signatures coming from bound particles containing b quarks. This parked data allows CMS to look at the same types of particles that the LHCb experiment does, opening a whole new world of particles to study and providing competitive results. We also save other signatures, for example to search for particles that are not visible at the collision point and only create a signal further along the detector.

Scouting tackles the problem in a different way. Since a rough first draft of the conversion to analysis-ready data is already present during the online selection, scouting saves only the necessary particle properties, like their energy, velocity, direction, and a few extra identifying details. Keeping only this reduced information instead of all detector information considerably reduces the number of gigabytes per second that such data would typically fill up. Fortunately, the online selection in the CMS experiment is already excellent, so for many analyses the differences between the traditional analysis and what can be done using only the trigger-level information are negligible. In recent years, scouting has allowed the recording of ten thousand or even more extra collisions every second, meaning that entirely unexplored collision types can be examined for many different signatures. In particular, searches for undiscovered particles in signatures typically rejected by the tight trigger requirements experienced massive gains from the scouting method even when it was only partially available, and some physics results have already been released. With the new addition of all other commonly used particles, the scouting program is expected to produce many new results that were previously thought impossible because of trigger constraints. Also, the online selection for scouting was made substantially faster by adding GPU processors (the same kind of chips used in gaming consoles), making it possible to save even more collisions.
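The idea of keeping only the trigger-level particle properties can be pictured with a small sketch. Everything in it is a hypothetical illustration: the class names, the fields, and the example values are assumptions chosen to mirror the description above, not the actual CMS data format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OnlineParticle:
    """Particle as reconstructed during the online selection (hypothetical layout)."""
    energy: float
    velocity: float
    direction: Tuple[float, float]  # two angles, an assumed representation
    kind: str                       # stands in for "a few extra identifying details"


@dataclass
class FullEvent:
    """A traditionally recorded collision: all detector data plus the online draft."""
    raw_detector_data: bytes
    online_particles: List[OnlineParticle]


@dataclass
class ScoutingEvent:
    """A scouting record: only the particle-level summary is kept."""
    particles: List[OnlineParticle]


def to_scouting(event: FullEvent) -> ScoutingEvent:
    """Drop the raw detector data and keep a copy of the online particle list."""
    return ScoutingEvent(particles=list(event.online_particles))


if __name__ == "__main__":
    # Illustrative values only; they are not real measurements.
    muon = OnlineParticle(energy=25.0, velocity=0.99, direction=(0.4, 1.2), kind="muon")
    full = FullEvent(raw_detector_data=bytes(1000), online_particles=[muon])
    print(to_scouting(full))
```

The scouting record no longer carries the raw detector payload, which is where the saving in gigabytes per second comes from; the trade-off is that the collision cannot be reprocessed later from the full detector information.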



The parking and scouting methods are complementary and make the CMS detector’s data collection ready for the next increase in LHC intensity. When the high-luminosity LHC upgrade is complete in 2028, the LHC will produce ten times or more as many interesting collisions every second, while the fraction of collisions that can be saved with conventional methods will stay almost the same. With scouting and parking, CMS is ready to continue exploring as much LHC data as is technically possible, for many years to come.

