IntegriScan: A Graph-Aided Model for Detecting Corrupted and Anomalous Data Patterns
Abstract
In today's data-driven environments, ensuring data integrity is critical for accurate decision-making. Data corruption, whether caused by system errors, transmission faults, or malicious attacks, can lead to misleading analytical results. Existing machine learning models such as Local Outlier Factor (LOF), Isolation Forest, and One-Class SVM offer partial solutions but often lack the precision required for complex datasets. This paper introduces a novel algorithm, PAACDA (Proximity-based Adamic-Adar Corruption Detection Algorithm), which leverages graph-based Adamic-Adar similarity to identify outliers and corrupted values. The algorithm uses local proximity measurements to identify abnormal data points by comparing feature similarity scores against thresholds derived from mean-based scaling. We also propose a hybrid model, Hybrid PAACDA, which uses features extracted by PAACDA to train a Random Forest classifier that predicts corrupted data in future datasets. The system is implemented with a Django-based web interface that provides modules for training and evaluating multiple algorithms. Experimental results show that PAACDA outperforms the traditional methods, achieving 94% accuracy, while the Hybrid PAACDA extension achieves 100% accuracy, confirming its effectiveness for real-time corruption detection.
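The abstract does not specify how PAACDA builds its graph or scales its threshold, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a radius-based proximity graph (points within Euclidean distance `radius` are connected), scores each point by summing its Adamic-Adar index with every other point, and flags points whose score falls below `scale` times the mean score. The names `build_proximity_graph`, `adamic_adar`, `node_scores`, `flag_anomalies`, and the parameters `radius` and `scale` are hypothetical illustrations, not the authors' published procedure.

```python
import math
from itertools import combinations


def build_proximity_graph(points, radius):
    """Hypothetical graph construction: link every pair of points whose
    Euclidean distance is at most `radius`. Returns an adjacency dict."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def adamic_adar(adj, u, v):
    """Adamic-Adar index of nodes u and v: the sum of 1 / log(degree(w))
    over their common neighbours w. Any common neighbour of two distinct
    nodes has degree >= 2, so log(degree) is always positive here."""
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def node_scores(adj):
    """Assumed local proximity score: each node's summed Adamic-Adar index
    with every other node. Isolated or weakly connected nodes score low."""
    n = len(adj)
    scores = [0.0] * n
    for u, v in combinations(range(n), 2):
        s = adamic_adar(adj, u, v)
        scores[u] += s
        scores[v] += s
    return scores


def flag_anomalies(points, radius=1.0, scale=0.5):
    """Flag points whose score is below `scale` * mean(score), an assumed
    form of the 'mean-based scaling' threshold named in the abstract."""
    adj = build_proximity_graph(points, radius)
    scores = node_scores(adj)
    threshold = scale * (sum(scores) / len(scores))
    flagged = [i for i, s in enumerate(scores) if s < threshold]
    return flagged, scores, threshold


# Toy data: a tight cluster of five points plus one distant (corrupted) point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (0.25, 0.25), (5.0, 5.0)]
flagged, scores, threshold = flag_anomalies(points, radius=1.0, scale=0.5)
print(flagged)  # [5]
```

Here the five clustered points form a fully connected group, so each shares three common neighbours with every other cluster member and scores well above the threshold, while the isolated point has no common neighbours and scores 0. For larger graphs, NetworkX's `adamic_adar_index` function computes the same pairwise index and could replace the hand-written `adamic_adar` helper.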

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.