Automated feature selection procedure for particle jet classification

Andrea Di Luca; Marco Cristoforetti; Francesco Maria Follega; Daniela Mascione
2023-01-01

Abstract

The high dimensionality of the data produced in high-energy physics experiments makes machine learning algorithms, such as neural networks, necessary to improve the reconstruction and classification of the analyzed events. Interpretability, i.e. the capability to explain the dynamics that lead the network to a certain outcome, has emerged as a major need as architectures grow in complexity. In the analysis of pp collisions at the LHC, explainability primarily concerns the assessment of the relative importance of the high-level observables used to classify events. In this context, we have developed a method to select the most important features associated with a particle jet whose origin we want to establish. Features are sorted by importance with a decision tree algorithm. A k-fold cross-validation is applied to raise the confidence in the extracted ranking. We tested the method on the case of highly boosted di-jet resonances decaying to two b-quarks, to be selected against an overwhelming QCD background with a deep neural network. We show that noisy and irrelevant features are rejected while relevant features occupy the top-ranking positions.
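
The ranking idea described in the abstract (decision-tree importance, repeated over k folds) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the feature names (`feat_signal`, `feat_weak`, `feat_noise`), the tree depth, the Gini-based importance and the averaging of importances over folds are all assumptions made for illustration, and the tree is a small pure-Python stand-in for a library decision tree.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_split(X, y, idx):
    """Best (impurity decrease, feature, threshold) over the rows in idx."""
    parent = gini([y[i] for i in idx])
    n = len(idx)
    total_pos = sum(y[i] for i in idx)
    best = (0.0, None, None)
    for f in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][f])
        left_pos = 0
        for k in range(1, n):
            left_pos += y[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # cannot split between identical values
            p_l = left_pos / k
            p_r = (total_pos - left_pos) / (n - k)
            child = (k * 2 * p_l * (1 - p_l)
                     + (n - k) * 2 * p_r * (1 - p_r)) / n
            if parent - child > best[0]:
                best = (parent - child, f, lo)
    return best

def tree_importance(X, y, idx, depth, importance):
    """Grow a depth-limited tree, adding each split's weighted gain to its feature."""
    if depth == 0 or len(idx) < 2:
        return
    gain, f, thr = best_split(X, y, idx)
    if f is None:
        return  # no split reduces the impurity
    importance[f] += gain * len(idx)
    left = [i for i in idx if X[i][f] <= thr]
    right = [i for i in idx if X[i][f] > thr]
    tree_importance(X, y, left, depth - 1, importance)
    tree_importance(X, y, right, depth - 1, importance)

def kfold_ranking(X, y, k=5, depth=3, seed=0):
    """Average normalized importances over k training splits and rank features."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    n_feat = len(X[0])
    mean_imp = [0.0] * n_feat
    for held_out in range(k):
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        imp = [0.0] * n_feat
        tree_importance(X, y, train, depth, imp)
        total = sum(imp) or 1.0
        for f in range(n_feat):
            mean_imp[f] += imp[f] / total / k
    ranking = sorted(range(n_feat), key=lambda f: -mean_imp[f])
    return ranking, mean_imp

# Toy data (hypothetical): one strongly discriminating feature,
# one weakly discriminating feature, and one pure-noise feature.
rng = random.Random(1)
names = ["feat_signal", "feat_weak", "feat_noise"]
X, y = [], []
for _ in range(400):
    label = rng.randint(0, 1)
    X.append([rng.gauss(3.0 * label, 1.0),
              rng.gauss(0.5 * label, 1.0),
              rng.gauss(0.0, 1.0)])
    y.append(label)

ranking, mean_imp = kfold_ranking(X, y, k=5, depth=3)
for f in ranking:
    print(f"{names[f]}: {mean_imp[f]:.3f}")
```

Repeating the fit on k different training subsets shows whether a feature's position in the ranking is stable or an artifact of one particular sample, which is the role the abstract assigns to the cross-validation step.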

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/339758