IRIS Institutional Research Information System

With the ever-increasing continuous adoption of Industrial Internet of Things (IoT) technologies, security concerns have grown exponentially, especially regarding securing critical infrastructures. This is primarily due to the potential for backdoors to provide unauthorized access, disrupt operations, and compromise sensitive data. Backdoors pose a significant threat to the integrity and security of Industrial IoT setups by exploiting vulnerabilities and bypassing standard authentication processes. Hence its detection becomes of paramount importance. This paper not only investigates the capabilities of Machine Learning (ML) models in identifying backdoor malware but also evaluates the impact of balancing the dataset via resampling techniques, including Synthetic Minority Oversampling Technique (SMOTE), Synthetic Data Vault (SDV), and Conditional Tabular Generative Adversarial Network (CTGAN), and feature reduction such as Pearson correlation coefficient, on the performance of the ML models. Experimental evaluation on the CCCS-CIC-AndMal-2020 dataset demonstrates that the Random Forest (RF) classifier generated an optimal model with 99.98% accuracy when using a balanced dataset created by SMOTE. Additionally, the training and testing time was reduced by approximately 50% when switching from the full feature set to a reduced feature set, without significant performance loss.

Backdoor Malware Detection in Industrial IoT Using Machine Learning

Khan, Maryam Mahsal;Buriro, Attaullah;Ahmad, Tahir^{Membro del Collaboration Group};Ullah, Subhan

2024-01-01

Abstract

With the ever-increasing continuous adoption of Industrial Internet of Things (IoT) technologies, security concerns have grown exponentially, especially regarding securing critical infrastructures. This is primarily due to the potential for backdoors to provide unauthorized access, disrupt operations, and compromise sensitive data. Backdoors pose a significant threat to the integrity and security of Industrial IoT setups by exploiting vulnerabilities and bypassing standard authentication processes. Hence its detection becomes of paramount importance. This paper not only investigates the capabilities of Machine Learning (ML) models in identifying backdoor malware but also evaluates the impact of balancing the dataset via resampling techniques, including Synthetic Minority Oversampling Technique (SMOTE), Synthetic Data Vault (SDV), and Conditional Tabular Generative Adversarial Network (CTGAN), and feature reduction such as Pearson correlation coefficient, on the performance of the ML models. Experimental evaluation on the CCCS-CIC-AndMal-2020 dataset demonstrates that the Random Forest (RF) classifier generated an optimal model with 99.98% accuracy when using a balanced dataset created by SMOTE. Additionally, the training and testing time was reduced by approximately 50% when switching from the full feature set to a reduced feature set, without significant performance loss.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2024

Appare nelle tipologie:

1.1 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
TSP_CMC_57648 (5).pdf accesso aperto Licenza: Dominio pubblico Dimensione 657.57 kB Formato Adobe PDF Visualizza/Apri	657.57 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11582/356627

Citazioni

ND

social impact