Unsupervised Learning Framework for Anomaly Detection in High-Dimensional Data Streams Using Clustering and Autoencoders
Abstract
The rapid growth of high-dimensional data streams from IoT systems, financial networks, cybersecurity infrastructure, healthcare monitoring platforms, and industrial sensor systems has made real-time anomaly detection increasingly important. Traditional supervised anomaly detection techniques often require large volumes of labeled data, which are difficult and expensive to obtain in dynamic environments. High-dimensional streaming data also poses challenges of feature complexity, noise, scalability, and evolving data distributions. Unsupervised learning approaches have therefore emerged as effective ways to detect anomalous patterns without labeled training data. This research proposes an unsupervised framework for anomaly detection in high-dimensional data streams that combines clustering with autoencoder-based deep learning. The framework integrates feature extraction, dimensionality reduction, distributed clustering, and deep autoencoder reconstruction to identify abnormal patterns and rare events in continuously evolving data streams. Clustering algorithms group normal behavioral patterns, while autoencoders learn compressed latent representations and flag anomalies through reconstruction error analysis: inputs that the autoencoder reconstructs poorly are treated as anomalous. The framework supports real-time analytical processing, adaptive learning, and scalable anomaly detection in heterogeneous streaming environments. Experimental evaluation demonstrates that the proposed framework significantly improves anomaly detection accuracy, scalability, and computational efficiency, and reduces false positives, compared with traditional statistical and distance-based anomaly detection approaches. Furthermore, integrating clustering with deep autoencoder architectures enhances the framework’s ability to identify subtle and previously unseen anomalies in high-dimensional feature spaces.
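To make the two scoring mechanisms named in the abstract concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It rests on several assumptions that do not come from the abstract: a toy 4-dimensional batch stands in for a data stream, a tied-weight linear autoencoder with one latent unit stands in for the deep autoencoder, plain k-means stands in for distributed clustering, and the two scores are combined with a simple OR rule using thresholds set at 1.5 times the largest score seen on normal data. All function and variable names are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=10):
    """Lloyd's k-means with farthest-first initialisation; returns centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group (keep it if empty).
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids

def train_linear_autoencoder(points, lr=0.05, epochs=300, seed=0):
    """Tied-weight linear autoencoder, one latent unit: z = w.x, x_hat = z*w.
    Trained by full-batch gradient descent on mean squared reconstruction error."""
    rng = random.Random(seed)
    d = len(points[0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(d)]
    for _ in range(epochs):
        grad = [0.0] * d
        for x in points:
            z = sum(wi * xi for wi, xi in zip(w, x))      # encode
            r = [z * wi - xi for wi, xi in zip(w, x)]     # residual x_hat - x
            wr = sum(wi * ri for wi, ri in zip(w, r))
            for i in range(d):
                grad[i] += 2 * (z * r[i] + wr * x[i])
        w = [wi - lr * g / len(points) for wi, g in zip(w, grad)]
    return w

def reconstruction_error(w, x):
    """Euclidean distance between x and its reconstruction."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return math.sqrt(sum((z * wi - xi) ** 2 for wi, xi in zip(w, x)))

def cluster_distance(centroids, x):
    """Distance from x to the nearest centroid of normal behaviour."""
    return min(math.dist(x, c) for c in centroids)

# Toy "normal" data (assumption): two tight blobs at -u and +u.
rng = random.Random(42)
u = [0.5, 0.5, 0.5, 0.5]
normal = [[s * ui + rng.gauss(0, 0.05) for ui in u]
          for s in (-1, 1) for _ in range(100)]

centroids = kmeans(normal, k=2)
w = train_linear_autoencoder(normal)

# Thresholds (assumed rule): 1.5x the largest score observed on normal data.
MARGIN = 1.5
dist_threshold = MARGIN * max(cluster_distance(centroids, x) for x in normal)
recon_threshold = MARGIN * max(reconstruction_error(w, x) for x in normal)

def is_anomaly(x):
    """Flag x if either score exceeds its threshold (assumed OR combination)."""
    return (cluster_distance(centroids, x) > dist_threshold
            or reconstruction_error(w, x) > recon_threshold)

for name, x in [("off-manifold", [0.5, -0.5, 0.5, -0.5]),
                ("between clusters", [0.0, 0.0, 0.0, 0.0]),
                ("typical", [0.5, 0.5, 0.5, 0.5])]:
    print(name, is_anomaly(x))
```

The "between clusters" point shows why the two scores can complement each other in this toy setting: a bias-free linear autoencoder reconstructs the origin with zero error, so only the cluster-distance score can flag it. A streaming deployment would additionally need incremental updates to the centroids, the autoencoder weights, and the thresholds, which this batch sketch does not attempt.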