In recent years, massive development in the malware industry changed the entire landscape for malware development. Therefore, cybercriminals became more sophisticated by advancing their development techniques from file-based to fileless malware. As file-based malware depends on files to spread itself, on the other hand, fileless malware does not require a traditional file system and uses benign processes to carry out its malicious intent. Therefore, it evades conventional detection techniques and remains stealthy. This paper briefly explains fileless malware, its life cycle, and its infection chain. Moreover, it proposes a detection technique based on feature analysis using machine learning for fileless malware detection. The virtual machine acquired the memory dumps upon executing the malicious and non-malicious samples. Then the necessary features are extracted using the Volatility memory forensics tool, which is then analyzed using machine learning classification algorithms. After that, the best algorithm is selected based on the k-fold cross-validation score. Experimental evaluation has shown that Random Forest outperforms other machine learning classifiers (Decision Tree, Support Vector Machine, Logistic Regression, K-Nearest Neighbor, XGBoost, and Gradient Boosting). It achieved an overall accuracy of 93.33% with a True Positive Rate (TPR) of 87.5% at zeroFalse Positive Rate (FPR) for fileless malware collected from five widely used datasets (VirusShare, AnyRun, PolySwarm, HatchingTriage, and JoESadbox).

An Insight into the Machine-Learning-Based Fileless Malware Detection

Tahir Ahmad
Membro del Collaboration Group
;
2023-01-01

Abstract

In recent years, massive development in the malware industry changed the entire landscape for malware development. Therefore, cybercriminals became more sophisticated by advancing their development techniques from file-based to fileless malware. As file-based malware depends on files to spread itself, on the other hand, fileless malware does not require a traditional file system and uses benign processes to carry out its malicious intent. Therefore, it evades conventional detection techniques and remains stealthy. This paper briefly explains fileless malware, its life cycle, and its infection chain. Moreover, it proposes a detection technique based on feature analysis using machine learning for fileless malware detection. The virtual machine acquired the memory dumps upon executing the malicious and non-malicious samples. Then the necessary features are extracted using the Volatility memory forensics tool, which is then analyzed using machine learning classification algorithms. After that, the best algorithm is selected based on the k-fold cross-validation score. Experimental evaluation has shown that Random Forest outperforms other machine learning classifiers (Decision Tree, Support Vector Machine, Logistic Regression, K-Nearest Neighbor, XGBoost, and Gradient Boosting). It achieved an overall accuracy of 93.33% with a True Positive Rate (TPR) of 87.5% at zeroFalse Positive Rate (FPR) for fileless malware collected from five widely used datasets (VirusShare, AnyRun, PolySwarm, HatchingTriage, and JoESadbox).
File in questo prodotto:
File Dimensione Formato  
sensors-23-00612 (1).pdf

accesso aperto

Licenza: Creative commons
Dimensione 1.46 MB
Formato Adobe PDF
1.46 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11582/344207
Citazioni
  • ???jsp.display-item.citation.pmc??? 2
social impact