Disentangling quarks and gluons in CMS open data

We study quark and gluon jets separately using public collider data from the CMS experiment. Our analysis is based on 2.3 fb⁻¹ of proton-proton collisions at √s = 7 TeV, collected at the Large Hadron Collider in 2011. We define two non-overlapping samples via a pseudorapidity cut (central jets with |η| ≤ 0.65 and forward jets with |η| > 0.65) and employ jet topic modeling to extract individual distributions for the maximally separable categories. Under certain assumptions, such as sample independence and mutual irreducibility, these categories correspond to “quark” and “gluon” jets, as given by a recently proposed operational definition. We consider a number of different methods for extracting reducibility factors from the central and forward datasets, from which the fractions of quark jets in each sample can be determined. The greatest stability and robustness to statistical uncertainties are achieved by a novel method based on parametrizing the endpoints of a receiver operating characteristic (ROC) curve. To mitigate detector effects, which would otherwise induce unphysical differences between central and forward jets, we use the OmniFold method to perform central value unfolding. As a demonstration of the power of this method, we extract the intrinsic dimensionality of the quark and gluon jet samples, which exhibit Casimir scaling, as expected from the strongly ordered limit. To our knowledge, this work is the first application of full phase space unfolding to real collider data (albeit without a full systematics analysis), and the first application of topic modeling using a machine-learned classifier to extract separate quark and gluon distributions at the LHC.
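To make the role of the reducibility factors concrete, the following is a minimal sketch of the standard two-sample jet topics construction on binned distributions. It is not the ROC-endpoint method described above: it uses the naive bin-by-bin minimum of the histogram ratio, which is the estimator most sensitive to statistical fluctuations. The function names, the toy four-bin distributions, and the input fractions 0.7 and 0.4 are all illustrative assumptions, not values from the analysis.

```python
import numpy as np


def reducibility_factor(p_num, p_den):
    """Naive estimate of kappa = min over bins of p_num / p_den.

    Only bins where the denominator is populated are considered.
    (Hypothetical helper; real data would need a noise-robust estimator.)
    """
    p_num = np.asarray(p_num, dtype=float)
    p_den = np.asarray(p_den, dtype=float)
    mask = p_den > 0
    return float(np.min(p_num[mask] / p_den[mask]))


def extract_topics(p_a, p_b):
    """Return (topic_1, topic_2, frac_a, frac_b) for two mixed samples.

    Assumes p_a and p_b are normalized histograms of the same observable,
    that sample A is richer in topic 1 than sample B, and that the two
    underlying categories are mutually irreducible.
    """
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    kappa_ab = reducibility_factor(p_a, p_b)
    kappa_ba = reducibility_factor(p_b, p_a)

    # Subtract the maximal amount of one sample from the other and renormalize.
    topic_1 = (p_a - kappa_ab * p_b) / (1.0 - kappa_ab)
    topic_2 = (p_b - kappa_ba * p_a) / (1.0 - kappa_ba)

    # Fractions of topic 1 in each sample follow from the two factors.
    denom = 1.0 - kappa_ab * kappa_ba
    frac_a = (1.0 - kappa_ab) / denom
    frac_b = kappa_ba * (1.0 - kappa_ab) / denom
    return topic_1, topic_2, frac_a, frac_b


# Toy, mutually irreducible "quark" and "gluon" distributions (illustrative only).
p_quark = np.array([0.5, 0.3, 0.2, 0.0])
p_gluon = np.array([0.0, 0.2, 0.3, 0.5])

# Two mixed samples with assumed quark fractions 0.7 and 0.4.
p_a = 0.7 * p_quark + 0.3 * p_gluon
p_b = 0.4 * p_quark + 0.6 * p_gluon

topic_1, topic_2, frac_a, frac_b = extract_topics(p_a, p_b)
print(frac_a, frac_b)
```

On this noise-free toy input the extracted topics coincide with the input distributions and the fractions match the assumed mixing up to floating-point rounding. With finite statistics the bin-by-bin minimum is dominated by sparsely populated bins, which is the instability that motivates a more robust estimator of the reducibility factors.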
