Data Science at CMS

While working on my bachelor’s and master’s theses at RWTH Aachen University, I was part of the CMS collaboration at CERN.
The CMS collaboration consists of thousands of scientists from around the globe who jointly run the CMS experiment, recording and analyzing collisions of high-energy protons accelerated by the Large Hadron Collider (LHC).

Specifically, I worked on an analysis called “MUSiC” (Model Unspecific Search in CMS), which aims to provide an automated and model-independent way of searching for new physics in collision data recorded by CMS. Because particle physics analyses usually boil several terabytes of data down to a few pretty plots, performance and automation are indispensable. For this purpose, I used a wide range of “big-data” libraries, mostly in Python and C++.

Below are the abstracts of both theses, along with links to the full text.

Discovery Potential of a Model Independent Search for New Physics at the LHC (Master’s Thesis)

In 2015 and 2016, the CMS detector recorded proton-proton collisions at an unprecedented center-of-mass energy of 13 TeV. The Model Unspecific Search in CMS (MUSiC) provides an automated search for various possible signatures of new physics in these data.

In a three-step process, MUSiC first classifies events according to the physics content of the final state, then searches a set of kinematic distributions for the most significant deviations between Standard Model Monte Carlo simulations and observed data, and finally applies a statistical hypothesis test to draw conclusions about indications of new physics in the observed dataset.
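To make the three steps more concrete, here is a minimal Python sketch. It is not the MUSiC implementation: all function names and the event format are hypothetical, the p-value is a plain Poisson tail probability that ignores systematic uncertainties, and the final test is a simple pseudo-experiment count. It only illustrates the general idea of classifying, scanning and testing.

```python
import math
import random
from collections import Counter


def classify(events):
    """Step 1: group events by final-state content, e.g. '1e+2mu'."""
    classes = {}
    for event in events:
        counts = Counter(event["objects"])
        label = "+".join(f"{n}{name}" for name, n in sorted(counts.items()))
        classes.setdefault(label, []).append(event)
    return classes


def poisson_p_value(observed, expected):
    """One-sided Poisson tail probability in the direction of the deviation."""
    def pmf(k):
        return math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
    if observed >= expected:
        # Excess: P(N >= observed) = 1 - P(N < observed)
        return max(0.0, 1.0 - sum(pmf(k) for k in range(observed)))
    # Deficit: P(N <= observed)
    return sum(pmf(k) for k in range(observed + 1))


def most_significant_region(data, simulation):
    """Step 2: scan all contiguous bin ranges; return (p_value, first, last)."""
    best = (1.0, 0, 0)
    for first in range(len(data)):
        n_obs, n_exp = 0, 0.0
        for last in range(first, len(data)):
            n_obs += data[last]
            n_exp += simulation[last]
            if n_exp <= 0:
                continue  # no expectation, p-value undefined in this sketch
            p = poisson_p_value(n_obs, n_exp)
            if p < best[0]:
                best = (p, first, last)
    return best


def sample_poisson(mean, rng):
    """Draw a Poisson number (Knuth's method, fine for small means)."""
    limit, k, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def global_p_value(data, simulation, n_pseudo=200, seed=1):
    """Step 3: fraction of simulation-only pseudo-experiments whose most
    significant region is at least as significant as the observed one."""
    observed_p = most_significant_region(data, simulation)[0]
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_pseudo):
        pseudo = [sample_poisson(mu, rng) for mu in simulation]
        if most_significant_region(pseudo, simulation)[0] <= observed_p:
            as_extreme += 1
    return as_extreme / n_pseudo


# Toy example: one event class with a five-bin distribution.
data = [10, 9, 11, 25, 10]
simulation = [10.0, 10.0, 10.0, 10.0, 10.0]
p_local, first_bin, last_bin = most_significant_region(data, simulation)
p_global = global_p_value(data, simulation)
```

In this toy example, the scan singles out the fourth bin, where 25 events are observed against an expectation of 10. The pseudo-experiments then account for the fact that many regions were tested, which is why the global p-value is the number to interpret rather than the local one.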

In this thesis, the discovery potential towards new physics is assessed. For this purpose, a quantification of the test power is defined. Subsequently, the framework is applied to simulated events of four benchmark models for new physics. Alongside the discovery potential towards these theories, the influence of several existing and newly introduced features and parameters on the sensitivity is measured.

GitHub, Full Text (PDF, 3.38 MB, English)

Development of a Fast Search Algorithm for the MUSiC Framework (Bachelor’s Thesis)

The CMS experiment at the LHC produces a vast amount of data: each second, about 20 TB of information is generated by the detector hardware. Accordingly, analysis of the data also requires a lot of computing power. Using a chain of several algorithms, the detector signal is interpreted as physically meaningful data, on which state-of-the-art analyses are performed.

The Model Unspecific Search in CMS (MUSiC) is an analysis carried out on a wide spectrum of final states. Kinematic distributions of these final states are aggregated and compared to the expectation from Standard Model Monte Carlo simulations.

By searching for deviations, MUSiC is sensitive to indications of physics beyond the Standard Model. This thesis proposes, implements, and validates an additional step in the MUSiC analysis, which drastically reduces the runtime.

GitHub, Full Text (PDF, 3.05 MB, English)