Poster in Workshop: Machine Learning and the Physical Sciences
Stronger symbolic summary statistics for the LHC
Nathalie Soybelman · Anja Butter · Tilman Plehn · Johann Brehmer
Analyzing the high-dimensional data collected at the Large Hadron Collider experiments often requires a balance between maximizing sensitivity and maintaining interpretability by domain experts. We propose a new algorithm to construct powerful summary statistics for LHC processes in the form of simple symbolic expressions. First, we extract latent information from a chain of simulators; through symbolic regression on this data we then learn approximately sufficient statistics. Observables constructed in this way can be used as plug-in replacements for established summary statistics, potentially improving the precision of scientific results without adding any overhead. In Higgs production in weak boson fusion, our algorithm rediscovers well-known heuristics and proposes new, moderately complex formulas that rival the new physics reach of neural networks.
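The two steps described above (extract latent information from a simulator, then fit a symbolic expression to it) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the authors' implementation: the one-parameter Gaussian simulator, the names `simulate`, `CANDIDATES`, and `fit_best_expression`, and the use of a brute-force search over a small hand-written library in place of a real symbolic regression tool. The latent quantity used here is the joint score, the derivative of the joint log-likelihood of observable and latent variable with respect to the parameter. Regressing it on the observable alone approximates the score, which is a sufficient statistic near the reference parameter point.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, theta=0.0):
    """Hypothetical toy simulator (not an LHC simulation chain).

    Latent z ~ N(theta, 1); observable x = z + unit Gaussian noise.
    Returns x and the joint score d/dtheta log p(x, z | theta) = z - theta,
    which is available only because we can look inside the simulator.
    """
    z = rng.normal(theta, 1.0, size=n)
    x = z + rng.normal(0.0, 1.0, size=n)
    joint_score = z - theta
    return x, joint_score

# Tiny hand-written library of candidate expressions in the observable x.
# A real symbolic regression tool would search a far larger space.
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x**2,
    "x**3": lambda x: x**3,
    "sin(x)": np.sin,
    "tanh(x)": np.tanh,
}

def fit_best_expression(x, target):
    """Fit target ~ coeff * f(x) for each candidate; return the lowest-MSE fit."""
    best = None
    for name, fn in CANDIDATES.items():
        f = fn(x)
        coeff = float(f @ target / (f @ f))  # one-parameter least squares
        mse = float(np.mean((target - coeff * f) ** 2))
        if best is None or mse < best[2]:
            best = (name, coeff, mse)
    return best

x, t_joint = simulate(200_000)
name, coeff, mse = fit_best_expression(x, t_joint)
print(f"t(x) ~ {coeff:.3f} * {name}  (MSE {mse:.3f})")
```

In this toy model the exact score at theta = 0 is x/2, so the search should select the linear candidate with a coefficient close to 0.5. The resulting formula is a function of the observable only, which is what allows such an expression to be used as a plug-in summary statistic once it has been learned.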