A Tri-Modal Deepfake Forensics and Web Interception Architecture

Aryan Pardeshi; Harsh Rathod; Prajwal Pansare; Apurva Shinde; Ashvini Kheole

doi:10.65521/ijeecs.v15i1S.3069

Authors

Aryan Pardeshi Department of Computer Engineering, Genba Sopanrao Moze College of Engineering.
Harsh Rathod Department of Computer Engineering, Genba Sopanrao Moze College of Engineering.
Prajwal Pansare Department of Computer Engineering, Genba Sopanrao Moze College of Engineering.
Apurva Shinde Department of Computer Engineering, Genba Sopanrao Moze College of Engineering.
Ashvini Kheole Department of Computer Engineering, Genba Sopanrao Moze College of Engineering.


Keywords:

Deepfake Detection; Digital Forensics; Computer Vision; Machine Learning; Natural Language Processing; Speech Processing; Cybersecurity; Multimodal Learning; Web Security; Artificial Intelligence

Abstract

The rapid proliferation of highly realistic synthetic media, commonly known as deepfakes, poses a severe threat to digital identity verification and media authenticity. Current deepfake detection methods rely predominantly on single-modality neural networks or on computationally expensive feature-level fusion, which makes them inefficient for real-time web deployment. This paper surveys existing unimodal and multimodal deepfake detection frameworks and proposes a novel, highly scalable alternative: a decoupled, Tri-Modal Late-Fusion architecture. The proposed system evaluates media through three parallel, asynchronous pipelines. A Spatial engine pairs Error Level Analysis (ELA) with a Convolutional Neural Network (CNN) to detect compression artifacts. A Biometric engine combines a ResNeXt-50 network with a Long Short-Term Memory (LSTM) network for temporal facial tracking. An Auditory engine converts 1D waveforms into 2D Mel-Spectrograms to classify synthetic frequency patterns. Because the architecture intercepts live WebRTC streams through a zero-dependency DOM injection protocol, it avoids the bottleneck of downloading files before analysis. The system uses a Weighted Confidence Algorithm for decision-level fusion. It achieves 97.8% ensemble accuracy, degrades gracefully when individual modality streams are absent, and analyzes 5-second media buffers with a maximum latency of 2.1 seconds. This survey demonstrates that decoupled, parallel modality processing offers a superior, fault-tolerant framework for commercial deepfake interception compared with traditional synchronous models.
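The abstract does not specify how the Weighted Confidence Algorithm combines the three engines. This is a minimal sketch under one plausible reading.

Assumptions:
- Each engine returns a probability that the media is fake.
- The verdict is a weighted average of those probabilities.
- A modality whose stream is missing reports `None`. Its weight is dropped and the remaining weights are renormalized, which is one way to obtain the graceful degradation the abstract describes.

The function name `fuse_scores`, the modality keys, the weights (0.3/0.4/0.3) and the 0.5 threshold are all hypothetical. They are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical per-engine weights; the paper does not publish its values.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "spatial": 0.3,    # ELA + CNN engine
    "biometric": 0.4,  # ResNeXt-50 + LSTM engine
    "auditory": 0.3,   # Mel-spectrogram engine
}


@dataclass
class FusionResult:
    fake_probability: float           # fused score in [0, 1]
    is_fake: bool                     # fused score compared against the threshold
    modalities_used: Tuple[str, ...]  # engines that actually contributed


def fuse_scores(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float] = DEFAULT_WEIGHTS,
    threshold: float = 0.5,
) -> FusionResult:
    """Decision-level (late) fusion of per-modality fake probabilities.

    A score of None marks an absent stream (e.g. a video with no audio
    track). Its weight is excluded and the remaining weights are
    renormalized, so the verdict degrades gracefully instead of failing.
    """
    # Keep only modalities that produced a score and have a known weight.
    present = {m: p for m, p in scores.items() if p is not None and m in weights}
    if not present:
        raise ValueError("no modality produced a score")
    for m, p in present.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{m} score {p} is not a probability")

    # Renormalize over the modalities that are actually available.
    total_weight = sum(weights[m] for m in present)
    fused = sum(weights[m] * p for m, p in present.items()) / total_weight
    return FusionResult(fused, fused >= threshold, tuple(sorted(present)))
```

Two usage cases show how the renormalization behaves:
- **All three engines report.** `fuse_scores({"spatial": 0.9, "biometric": 0.8, "auditory": 0.2})` fuses to about 0.65 and flags the clip as fake.
- **The audio track is missing.** Passing `"auditory": None` drops that weight, so the spatial and biometric weights become 3/7 and 4/7 of the total.

Because each engine only contributes a scalar, the pipelines can run asynchronously and independently. This is the property that separates late fusion from feature-level fusion.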
