Backdoor malware remains a persistent and elusive threat that successfully evades conventional detection methods through intricate techniques, such as registry key concealment and API call manipulation. In this study, we introduce an approach to detect backdoor malware, drawing upon the diverse domains of cybersecurity. Our method combines static and dynamic analysis techniques with machine learning methodologies, particularly emphasizing classification and feature engineering. Through static analysis, we extract valuable raw features from malware binaries. Discerning the most significant attributes, we delve into the calling frequencies embedded within these raw features. Subsequently, these selected attributes undergo a meticulous refinement process facilitated by feature engineering techniques, culminating in a streamlined set of distinctive features. To accurately detect malware exploiting heap-based overflow vulnerabilities, we employ three distinct yet potent classifiers: J48, Naïve Bayes, and Simple Logistic. These classifiers are trained and tested using carefully curated feature sets. Our approach combines machine learning and data mining principles to develop a comprehensive malware detection methodology. We demonstrate the efficacy of our approach through rigorous validation using two distinct settings: a dedicated training/testing set and a comprehensive 10-fold validation. Our approach simultaneously achieves 90.29% and 84.46% accuracy in train/ test split and cross-validation strategies.

Behavioral Analysis of Backdoor Malware Exploiting Heap Overflow Vulnerabilities Using Data Mining and Machine Learning

Ahmad, Tahir
Membro del Collaboration Group
;
2023-01-01

Abstract

Backdoor malware remains a persistent and elusive threat that successfully evades conventional detection methods through intricate techniques, such as registry key concealment and API call manipulation. In this study, we introduce an approach to detect backdoor malware, drawing upon the diverse domains of cybersecurity. Our method combines static and dynamic analysis techniques with machine learning methodologies, particularly emphasizing classification and feature engineering. Through static analysis, we extract valuable raw features from malware binaries. Discerning the most significant attributes, we delve into the calling frequencies embedded within these raw features. Subsequently, these selected attributes undergo a meticulous refinement process facilitated by feature engineering techniques, culminating in a streamlined set of distinctive features. To accurately detect malware exploiting heap-based overflow vulnerabilities, we employ three distinct yet potent classifiers: J48, Naïve Bayes, and Simple Logistic. These classifiers are trained and tested using carefully curated feature sets. Our approach combines machine learning and data mining principles to develop a comprehensive malware detection methodology. We demonstrate the efficacy of our approach through rigorous validation using two distinct settings: a dedicated training/testing set and a comprehensive 10-fold validation. Our approach simultaneously achieves 90.29% and 84.46% accuracy in train/ test split and cross-validation strategies.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11582/349227
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
social impact