Should you get excited by your data? Let the Look-Elsewhere Effect decide

Is there a significant bump in this mass spectrum? Experimental physicists may fall prey to occasional random fluctuations.

A dangerous beast is hiding in today’s searches for the rare signs of new physics (or even “old” physics, such as the Higgs boson) at the Large Hadron Collider. It is called the “Look-Elsewhere Effect”, or LEE for insiders. What is it, and why should you care? Imagine you are looking for a heavy particle decaying into a pair of hadronic jets: a commonplace test case in high-energy physics. You have your background model, which predicts the shape of the observed di-jet mass distribution, and you know what kind of bump a new particle would produce on top of that shape. So you search for such a bump in the data but, not knowing where it might appear, you search everywhere. You have worked all day, and night is falling; you prepare yourself a martini and run your analysis program. To your amazement, the program finds a significant bump at some particular mass value: is it a real signal?

To claim a new signal, the effect must reach or exceed the “five-sigma” significance level, that is, lie five standard deviations away from the background expectation: a rather silly but well-established rule. But if your bump reaches only 3.5 or four sigma, are you allowed to get excited and wake up your boss, or should you sit back and sip your martini with an “I know better” grin on your face? I claim the latter is the better option. You have fallen prey to the LEE: you looked in many places for a possible signal and found a significant effect somewhere. The probability that a background fluctuation this large shows up somewhere in the entire region you probed is greater than the probability that it shows up at a single mass you had specified beforehand; looking in many places gives chance a “probability boost”. A good rule of thumb is the following: if your signal has a width W and you examined a spectrum spanning a mass range from M1 to M2, then the “boost factor” due to the LEE is roughly (M2-M1)/W. This can easily amount to a factor of 10 or 100, depending on the details of your search.
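The arithmetic behind this rule of thumb can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions that are ours, not the author's: the (M2-M1)/W places to look are treated as independent, local significances are one-sided Gaussian, and the scan range (1000 to 2000 GeV) and bump width (10 GeV) are hypothetical numbers chosen only to give a boost factor of 100. A small toy Monte Carlo of background-only pseudo-experiments cross-checks the formula.

```python
import math
import random
from statistics import NormalDist


def z_to_p(z: float) -> float:
    """One-sided tail probability of a standard normal beyond z sigma."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_to_z(p: float) -> float:
    """Significance (in sigma) corresponding to a one-sided p-value."""
    return -NormalDist().inv_cdf(p)


def trial_factor(m1: float, m2: float, width: float) -> float:
    """Rule-of-thumb number of independent places to look: (M2 - M1) / W."""
    return (m2 - m1) / width


def global_p(p_local: float, n_trials: float) -> float:
    """Chance that at least one of n_trials independent places fluctuates
    at least as much as p_local; close to n_trials * p_local when small."""
    return 1.0 - (1.0 - p_local) ** n_trials


def toy_fraction_above(n_windows: int, n_toys: int, threshold: float, seed: int = 1) -> float:
    """Background-only toy check: fraction of pseudo-experiments in which the
    largest of n_windows independent Gaussian excesses reaches `threshold` sigma."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_toys):
        if max(rng.gauss(0.0, 1.0) for _ in range(n_windows)) >= threshold:
            hits += 1
    return hits / n_toys


if __name__ == "__main__":
    # Hypothetical search: a 1000-2000 GeV scan for a bump about 10 GeV wide.
    n = trial_factor(1000.0, 2000.0, 10.0)  # 100 places to look
    for z in (3.5, 4.0, 5.0):
        p_loc = z_to_p(z)
        p_glob = global_p(p_loc, n)
        print(f"local {z} sigma: p_local={p_loc:.2e}  p_global={p_glob:.2e}  "
              f"global Z={p_to_z(p_glob):.2f}")
    # Analytic expectation vs. toys for a 3-sigma excess anywhere in 100 windows.
    print(f"analytic {global_p(z_to_p(3.0), 100):.3f}, "
          f"toys {toy_fraction_above(100, 10000, 3.0):.3f}")
```

With these hypothetical numbers, a 3.5-sigma local bump corresponds to a global significance of only about two sigma, and a one-in-ten-thousand local fluctuation becomes roughly a one-in-a-hundred global one, while a five-sigma local effect still survives at about four sigma. Because neighbouring mass hypotheses in a real spectrum are correlated, counting independent windows is only an approximation, which is why (M2-M1)/W is a rule of thumb rather than an exact correction.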
An effect that occurs by chance once in ten thousand cases at a given place in your spectrum may actually be just an unexciting one-in-a-hundred fluctuation! In fact, the “five-sigma” rule I mentioned above was conceived with exactly this effect in mind. Five sigma is a really, really rare occurrence (about three in ten million), so even after accounting for the LEE it is still something to take quite seriously. Nowadays, however, we also publish three-sigma results that may or may not go away with more data, and our scientific integrity requires us to account for the LEE in them. Doing so properly is harder than just multiplying a probability by (M2-M1)/W as in the example above, and in complex searches such as that for the Higgs boson, which combine bump hunts in many channels, it is far from trivial. A recent paper by E. Gross and O. Vitells (Eur. Phys. J. C70:525-530, 2010) has clarified some of the technical issues. The CMS and ATLAS searches for the Higgs boson size up the LEE by studying the p-value of the background-only hypothesis as a function of the Higgs mass: the more the observed p-value curve wiggles up and down as the signal mass hypothesis changes, the larger the required Look-Elsewhere-Effect correction. Stuff for experts, for sure. But outsiders had better be aware that a three-sigma effect should not be blindly dubbed “evidence” for something new in the data. Like a traveller in a foreign country whose tax habits are unknown, you had better ask before you buy: “LEE included or not?”

Submitted by Tommaso Dorigo
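The link between a wiggly p-value curve and a larger correction can be illustrated with the Gross-Vitells result for the simplest case of a single scanned parameter (the signal mass). Assuming the test statistic q(m) = Z_local(m)² follows a χ² distribution with one degree of freedom at each fixed mass, their estimate reads p_global ≈ P(χ²₁ > u) + ⟨N(u0)⟩·exp(−(u − u0)/2), where u is the largest q found in the scan and ⟨N(u0)⟩ is the average number of times the background-only q(m) curve crosses a low reference level u0 from below. The idea is to measure ⟨N(u0)⟩ at a low level, where crossings are frequent and a handful of pseudo-experiments suffice, and extrapolate to the high level u. In the sketch below, the toy scans (smoothed white noise) and all numbers are hypothetical stand-ins for real background-only pseudo-experiments; this is an illustration, not the procedure used by either experiment.

```python
import math
import random


def count_upcrossings(q, u0):
    """Number of times the scanned curve q(m) rises from below u0 to at or above u0."""
    return sum(1 for a, b in zip(q, q[1:]) if a < u0 <= b)


def toy_scan(n_points, smoothing, rng):
    """Hypothetical background-only scan over n_points mass hypotheses.
    Z(m) is a moving average of white noise rescaled to unit variance, so
    neighbouring masses are correlated; q(m) = Z(m)**2 is then chi2(1) at each m."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_points + smoothing - 1)]
    scale = math.sqrt(smoothing)
    return [(sum(noise[i:i + smoothing]) / scale) ** 2 for i in range(n_points)]


def global_p_gross_vitells(u, u0, mean_upcrossings_u0):
    """Gross-Vitells estimate for one scanned parameter:
    p_global ~ P(chi2_1 > u) + <N(u0)> * exp(-(u - u0) / 2)."""
    p_local = math.erfc(math.sqrt(u / 2.0))  # P(chi2 with 1 dof > u)
    return p_local + mean_upcrossings_u0 * math.exp(-(u - u0) / 2.0)


if __name__ == "__main__":
    rng = random.Random(7)
    u0 = 1.0  # low reference level: upcrossings are frequent, so few toys suffice
    toys = [toy_scan(200, 10, rng) for _ in range(100)]
    mean_n = sum(count_upcrossings(q, u0) for q in toys) / len(toys)
    u = 16.0  # largest q in the "observed" scan: a 4-sigma local excess
    print(f"<N(u0)>  = {mean_n:.2f}")
    print(f"p_local  = {math.erfc(math.sqrt(u / 2.0)):.2e}")
    print(f"p_global = {global_p_gross_vitells(u, u0, mean_n):.2e}")
```

Making the toy scan wigglier (a smaller `smoothing` window) raises ⟨N(u0)⟩ and therefore the correction, which is the qualitative behaviour described for the Higgs searches.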
