Malware Analysis and Detection using Machine Learning


Mayur Rasal
Monika Rokade
Sunil Khatal

Abstract

The growth in Android apps brings a higher risk of malware infection, which threatens user privacy and device integrity. This study proposes a framework that uses machine learning to identify and classify Android malware. Static behavior is defined here as attributes collected from APK files, such as permissions and API calls; dynamic behavior covers the actions an application performs while running. Both are captured, converted to descriptive feature vectors, and used in a multidimensional space to classify malware into families. The study is based on open-source datasets from Kaggle and focuses on Android permissions and intents, which were identified as among the most important features. Preprocessing, including normalization and feature extraction, prunes the dataset, which reduces the computational burden on the algorithms and improves their performance. The dataset was split 70%/30% into 2800 training samples and 1200 testing samples. Three classifiers were tested: Random Forest (RF), J48, and Naïve Bayes (NB). NB reached an accuracy of 93.5%, while J48 and RF achieved mean accuracies of 94.35% and 96.25%, respectively. These results show that machine-learned models can improve Android malware detection, and they highlight the role of feature selection and model parameterization in addressing the challenges of this problem.
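The feature-vector step and the 70/30 partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the permission vocabulary, the helper names `to_feature_vector` and `train_test_split`, and the fixed shuffle seed are hypothetical and are not taken from the study, which does not list its features or tooling.

```python
import random

# Hypothetical permission vocabulary; the study's actual feature list is not given.
PERMISSIONS = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.CAMERA",
]


def to_feature_vector(requested, vocabulary=PERMISSIONS):
    """Return a 0/1 vector marking which vocabulary permissions an app requests."""
    requested = set(requested)
    return [1 if permission in requested else 0 for permission in vocabulary]


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and cut it into training and testing parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# One app that requests two of the four permissions in the vocabulary.
vector = to_feature_vector(
    ["android.permission.INTERNET", "android.permission.SEND_SMS"]
)

# 4000 placeholder sample IDs, matching the 2800 + 1200 total in the abstract.
train, test = train_test_split(range(4000))
print(vector, len(train), len(test))
```

A plain shuffle does not preserve the malware-to-benign ratio in each partition; a stratified split would, but the abstract does not say which was used.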
With reported scores of 98.33%, 96.41%, 99.66%, and 98.01% (F-measure), the proposed model outperforms the HML models (decision stump, Random Forest, and Vote) by 0.33%, 0.33%, and 5.41%, respectively. Overall, hybrid models improve detection accuracy and robustness compared with single classifiers. The results show that combining feature selection with ensemble learning is an efficient and effective way to process high-dimensional network data in practical Android malware detection.
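As a rough illustration of how a voting ensemble and the F-measure relate, the sketch below combines three base classifiers by majority vote and scores the result. It is an assumption-laden example, not the authors' hybrid model: the labels and predictions are made-up toy data, and the helper names `majority_vote` and `binary_scores` are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the base classifiers for one sample."""
    return Counter(predictions).most_common(1)[0][0]


def binary_scores(y_true, y_pred, positive="malware"):
    """Compute accuracy, precision, recall, and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return accuracy, precision, recall, f_measure


# Toy ground truth and per-classifier predictions (fabricated for illustration only).
y_true = ["malware", "malware", "benign", "benign", "malware", "benign"]
rf = ["malware", "malware", "benign", "benign", "malware", "benign"]
j48 = ["malware", "benign", "benign", "malware", "malware", "benign"]
nb = ["benign", "malware", "benign", "malware", "benign", "benign"]

# Vote sample by sample across the three base classifiers.
voted = [majority_vote(votes) for votes in zip(rf, j48, nb)]
accuracy, precision, recall, f_measure = binary_scores(y_true, voted)
print(voted, accuracy, precision, recall, f_measure)
```

With an odd number of base classifiers and two classes, the vote can never tie; other configurations would need an explicit tie-breaking rule.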


 

Article Details

How to Cite
Rasal, M., Rokade, M., & Khatal, S. (2026). Malware Analysis and Detection using Machine Learning. International Journal of Advanced Scientific Research and Engineering Trends, 10(5), 63–67. Retrieved from https://journals.mriindia.com/index.php/ijasret/article/view/3235
Section
Articles
