Malware Analysis and Detection using Machine Learning

Mayur Rasal; Monika Rokade; Sunil Khatal

Authors

Mayur Rasal Dept of Computer Engg. Sharadchandra Pawar College of Engineering, Otur, India
Monika Rokade Dept of Computer Engg. Sharadchandra Pawar College of Engineering, Otur, India
Sunil Khatal Dept of Computer Engg. Sharadchandra Pawar College of Engineering, Otur, India

Keywords:

Android, APK, Malware Detection, Machine Learning, Random Forests, Software Security

Abstract

The growth in Android apps is accompanied by a higher risk of malware infection, which threatens user privacy and compromises device integrity. This study proposes a framework that uses machine learning to identify and classify Android malware. Static behavior is defined here as attributes, such as permissions and API calls, that are collected from APK files. Dynamic behavior covers the actions performed by a running application. Both are captured, converted to descriptive feature vectors, and used in a multidimensional space to classify malware into families. The results demonstrate how machine-learned models can extend the boundaries of Android malware detection, and they shed light on effective means of feature selection and model parametrization for the challenges in this space. Open-source datasets from Kaggle form the basis of the study. Android permissions and intents are identified as among the most important features and are therefore selected as the focal point. Several preprocessing methods, among them normalization and feature extraction, are used to prune the dataset and reduce the computational burden on the algorithms, which also improves their performance. The study illustrates the value of machine learning and data-driven methods for modern malware detection. The dataset was divided with a 70%/30% split into 2800 training samples and 1200 testing samples. The three main classifiers tested are Random Forest (RF), J48, and Naïve Bayes (NB). NB reached an accuracy of 93.5%, while J48 and RF achieved mean accuracies of 94.35% and 96.25%, respectively.
With an accuracy of 98.33%, 96.41%, 99.66%, and 98.01% (F-measure), this model demonstrates a competitive edge against the HML models (decision stump, Random Forest, and Vote), improving on them by 0.33%, 0.33%, and 5.41%, respectively. Overall, hybrid models improve detection accuracy and model robustness compared with single classifiers. The results show that combining feature selection with ensemble learning can efficiently and effectively process high-dimensional network data pertaining to practical Android malware detection.
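The feature encoding, the 70/30 partition, and the idea of combining classifier outputs can be illustrated with a minimal sketch. It is not the authors' implementation. The vocabulary `VOCAB`, the helper names `to_feature_vector`, `split_70_30`, and `majority_vote`, and the simple majority rule are all assumptions made for illustration, since the abstract does not specify how the features are encoded or how votes are combined.

```python
import random

# Hypothetical vocabulary of permissions and intents (illustrative only;
# the abstract does not list the features actually used).
VOCAB = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.intent.action.BOOT_COMPLETED",
]


def to_feature_vector(declared, vocab=VOCAB):
    """Encode the permissions/intents declared by one APK as a 0/1 vector."""
    declared = set(declared)
    return [1 if name in declared else 0 for name in vocab]


def split_70_30(samples, seed=0):
    """Shuffle the samples and split them into 70% training, 30% testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def majority_vote(predictions):
    """Combine per-classifier labels (0 = benign, 1 = malware) by majority.

    Assumed combination rule; ties are resolved as benign.
    """
    return 1 if sum(predictions) * 2 > len(predictions) else 0


# One app declaring two of the four vocabulary entries.
vec = to_feature_vector(
    ["android.permission.INTERNET", "android.permission.SEND_SMS"]
)

# 4000 sample indices reproduce the 2800/1200 partition sizes.
train, test = split_70_30(range(4000))
```

With 4000 samples, the split yields the 2800 training and 1200 testing samples reported in the abstract. In practice the three labels passed to `majority_vote` would come from trained classifiers such as RF, J48, and NB.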
