Behavioral Analysis of Backdoor Malware Exploiting Heap Overflow Vulnerabilities Using Data Mining and Machine Learning

Khaliq, Ali Raza; Ullah, Subhan; Ahmad, Tahir; Yadav, Ashish; Majid, M Imran

doi:10.22555/pjets.v11i1.984

Backdoor malware remains a persistent and elusive threat that successfully evades conventional detection methods through intricate techniques, such as registry key concealment and API call manipulation. In this study, we introduce an approach to detect backdoor malware, drawing upon the diverse domains of cybersecurity. Our method combines static and dynamic analysis techniques with machine learning methodologies, particularly emphasizing classification and feature engineering. Through static analysis, we extract valuable raw features from malware binaries. Discerning the most significant attributes, we delve into the calling frequencies embedded within these raw features. Subsequently, these selected attributes undergo a meticulous refinement process facilitated by feature engineering techniques, culminating in a streamlined set of distinctive features. To accurately detect malware exploiting heap-based overflow vulnerabilities, we employ three distinct yet potent classifiers: J48, Naïve Bayes, and Simple Logistic. These classifiers are trained and tested using carefully curated feature sets. Our approach combines machine learning and data mining principles to develop a comprehensive malware detection methodology. We demonstrate the efficacy of our approach through rigorous validation using two distinct settings: a dedicated training/testing set and a comprehensive 10-fold validation. Our approach simultaneously achieves 90.29% and 84.46% accuracy in train/ test split and cross-validation strategies.

IRIS Institutional Research Information System