Enhanced AI-Based Image and Video Retrieval System Using CLIP and Hybrid Semantic Indexing

Anuja Bele; Ryan Lawrence; Darshan Butle; Kapil Gupta; Himanshu Hiwanj

doi:10.65521/ijacect.v14i3s.1631

Authors

Anuja Bele, Computer Engineering, St. Vincent Pallotti College of Engineering and Technology, Nagpur. Email: anujabele.22@stvincentngp.edu.in
Ryan Lawrence, Computer Engineering, St. Vincent Pallotti College of Engineering and Technology, Nagpur. Email: ryanlawrence.22@stvincentngp.edu.in
Darshan Butle, Computer Engineering, St. Vincent Pallotti College of Engineering and Technology, Nagpur
Kapil Gupta, Computer Engineering, St. Vincent Pallotti College of Engineering and Technology, Nagpur
Himanshu Hiwanj, Computer Engineering, St. Vincent Pallotti College of Engineering and Technology, Nagpur

DOI:

https://doi.org/10.65521/ijacect.v14i3s.1631

Keywords:

CLIP; FAISS; multimodal retrieval; video frame sampling; deep learning; semantic search

Abstract

The explosive growth of multimedia content has made fast, precise retrieval of images and videos increasingly important. To improve the efficiency and semantic relevance of retrieval, this research presents an AI-powered retrieval system. The system supports both text-based and image-based searches by combining Facebook AI Similarity Search (FAISS) [10][11] with Contrastive Language–Image Pre-training (CLIP). It improves result accuracy by supporting configurable query weighting and by filtering out irrelevant or negatively associated results.
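The weighted, negative-aware query described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the weighting formula, so the linear combination, the function names (`combine_query`, `search`), the weight parameters, and the 3-dimensional toy vectors are all assumptions. In a real system the vectors would be CLIP embeddings and the brute-force inner-product search would be replaced by a FAISS index.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def combine_query(text_vec, image_vec, negative_vec,
                  w_text=0.5, w_image=0.5, w_negative=0.3):
    """Blend text and image embeddings and push away from a negative concept.

    The linear form and the default weights are assumptions for illustration.
    """
    t = normalize(text_vec)
    i = normalize(image_vec)
    n = normalize(negative_vec)
    mixed = [w_text * a + w_image * b - w_negative * c
             for a, b, c in zip(t, i, n)]
    return normalize(mixed)


def search(query_vec, index, k=5):
    """Exhaustive inner-product search over a dict of name -> unit vector."""
    scored = [(sum(q * d for q, d in zip(query_vec, vec)), name)
              for name, vec in index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Toy "embeddings": axis 0 ~ beach, axis 1 ~ crowd, axis 2 ~ city (hypothetical).
toy_index = {
    "beach_day.jpg": normalize([1.0, 0.0, 0.0]),
    "beach_crowd.jpg": normalize([1.0, 1.0, 0.0]),
    "city_night.jpg": normalize([0.0, 0.0, 1.0]),
}

# Text asks for a beach, the example image is mostly beach, "crowd" is unwanted.
query = combine_query([1.0, 0.0, 0.0], [1.0, 0.2, 0.0], [0.0, 1.0, 0.0])
results = search(query, toy_index, k=3)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Raising `w_negative` lowers the score of items that resemble the unwanted concept, while `w_text` and `w_image` control how much each query modality contributes.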

For video retrieval, the system extracts individual frames using FFmpeg and indexes them with FAISS for frame-level similarity matching. The method achieves a Precision@5 of 92.8%, a Recall@10 of 89.1%, and an average query time of 0.45 seconds. Recent developments in multimodal video processing [25], [26] and CLIP optimization [24] are incorporated to further improve efficiency. Overall, the approach offers a scalable, practical foundation for real-time multimedia retrieval with strong semantic understanding.
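For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of Precision@k and Recall@k for a single query. The abstract does not state how the authors computed or averaged these values, so the function names and the frame identifiers are hypothetical and the code only illustrates the conventional formulas.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k returned items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    hits = sum(1 for item in ranked_ids[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant items that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # convention chosen here: no relevant items means recall 0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical ranked frame IDs returned for one query, best match first.
ranked = ["f3", "f7", "f1", "f9", "f2", "f8", "f4", "f5", "f6", "f0"]
# Hypothetical ground-truth relevant frames for that query.
relevant = {"f3", "f1", "f2", "f4"}

p5 = precision_at_k(ranked, relevant, 5)
r10 = recall_at_k(ranked, relevant, 10)
print(f"Precision@5 = {p5:.2f}, Recall@10 = {r10:.2f}")
```

A system-level figure such as those quoted above would typically be the mean of these per-query values over a set of test queries.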
