IntegriScan: A Graph-Aided Model for Detecting Corrupted and Anomalous Data Patterns

M. Asha Aruna Sheela; Mailavarapu Tejaswi; Nallani Bhanu Prakash; Manam Dhana Sri Tulasi; Kongara Anitha

Published: Apr 14, 2025

Keywords:

Random Forest, Machine Learning, Adamic-Adar, PAACDA, Outlier Detection, Data Corruption

M. Asha Aruna Sheela

Assistant Professor & HOD, Department of Computer Science & Engineering, Chalapathi Institute of Engineering and Technology, LAM, Guntur, AP, India

Mailavarapu Tejaswi

Department of Computer Science and Engineering, Chalapathi Institute of Engineering and Technology, LAM, Guntur, AP, India

Nallani Bhanu Prakash

Department of Computer Science and Engineering, Chalapathi Institute of Engineering and Technology, LAM, Guntur, AP, India

Manam Dhana Sri Tulasi

Department of Computer Science and Engineering, Chalapathi Institute of Engineering and Technology, LAM, Guntur, AP, India

Kongara Anitha

Department of Computer Science and Engineering, Chalapathi Institute of Engineering and Technology, LAM, Guntur, AP, India

Abstract

In today's data-driven environments, ensuring data integrity is critical for accurate decision-making. Data corruption—caused by system errors, transmission faults, or malicious attacks—can lead to misleading analytical results. Existing machine learning models like Local Outlier Factor (LOF), Isolation Forest, and One-Class SVM offer partial solutions but often lack the precision required in complex datasets. This paper introduces a novel algorithm, PAACDA (Proximity-based Adamic-Adar Corruption Detection Algorithm), that leverages graph-based Adamic-Adar similarity to identify outlier and corrupted values. The algorithm uses local proximity measurements to determine abnormal data points by comparing feature similarity scores and thresholds derived from mean-based scaling. Additionally, we propose a hybrid model—Hybrid PAACDA—that extracts features from PAACDA and trains a Random Forest classifier to predict corrupted data in future datasets. The system is implemented using a Django-based web interface, providing modules for training and evaluation across multiple algorithms. Experimental results show that PAACDA outperforms traditional methods, achieving 94% accuracy, while the Hybrid PAACDA extension delivers 100% accuracy, confirming its effectiveness in real-time corruption detection.
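
As a rough illustration of the idea described in the abstract, the following Python sketch scores each value by its Adamic-Adar similarity to the other nodes of a neighbourhood graph and flags values whose score falls below a fraction of the mean score. It is not the authors' PAACDA implementation: the mutual k-nearest-neighbour graph, the function names (`mutual_knn_graph`, `adamic_adar`, `proximity_scores`, `flag_corrupted`), and the parameters `k` and `scale` are assumptions made for this example, and the Random Forest stage of Hybrid PAACDA is omitted.

```python
import math


def mutual_knn_graph(values, k):
    """Build an undirected graph over 1-D values (assumed construction).

    Two points are linked only if each is among the other's k nearest
    neighbours, so a far-away value ends up with no edges.
    """
    n = len(values)
    nearest = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: abs(values[i] - values[j]),
        )
        nearest.append(set(others[:k]))
    return {i: {j for j in nearest[i] if i in nearest[j]} for i in range(n)}


def adamic_adar(adj, u, v):
    """Adamic-Adar index: sum of 1 / log(degree) over common neighbours."""
    return sum(
        1.0 / math.log(len(adj[z]))
        for z in adj[u] & adj[v]
        if len(adj[z]) > 1  # guard against log(1) == 0
    )


def proximity_scores(values, k=3):
    """Score each point by its total Adamic-Adar similarity to all others."""
    adj = mutual_knn_graph(values, k)
    return [
        sum(adamic_adar(adj, u, v) for v in adj if v != u)
        for u in range(len(values))
    ]


def flag_corrupted(values, k=3, scale=0.5):
    """Flag points whose score is below scale * mean score (assumed rule)."""
    scores = proximity_scores(values, k)
    threshold = scale * sum(scores) / len(scores)
    return [s < threshold for s in scores]


if __name__ == "__main__":
    readings = [100, 102, 99, 101, 103, 98, 550]  # last value is far off
    for value, flagged in zip(readings, flag_corrupted(readings)):
        print(value, "suspect" if flagged else "ok")
```

The mean-scaled threshold adapts to the overall connectivity of the data, so points at the edge of a dense cluster keep a non-zero score while an isolated value scores zero. The choice of `k` and `scale` here is arbitrary and would need tuning on real data.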

How to Cite

Sheela, M. A. A., Tejaswi, M., Prakash, N. B., Tulasi, M. D. S., & Anitha, K. (2025). IntegriScan: A Graph-Aided Model for Detecting Corrupted and Anomalous Data Patterns. International Journal of Recent Advances in Engineering and Technology, 14(1), 71–78. Retrieved from https://journals.mriindia.com/index.php/ijraet/article/view/181