Enhanced AI-Based Image and Video Retrieval System Using CLIP and Hybrid Semantic Indexing
Abstract
Fast and accurate retrieval of images and videos has become increasingly important as multimedia content grows explosively. To make this process both more efficient and more semantically meaningful, this research presents an AI-powered retrieval system that supports text-based and image-based queries by combining Contrastive Language–Image Pre-training (CLIP) with Facebook AI Similarity Search (FAISS) [10][11]. Configurable query weighting and the removal of irrelevant or negatively associated suggestions make the results more accurate.
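As an illustration of configurable weighting and negative filtering, here is a minimal sketch in plain Python. It assumes CLIP has already mapped text and image queries into a shared embedding space (toy 3-dimensional vectors stand in for real CLIP embeddings) and reproduces, without dependencies, the exact inner-product search that FAISS's `IndexFlatIP` performs on L2-normalized vectors. Subtracting negative embeddings from the query, the `neg_weight` penalty, and the `neg_threshold` cut-off are hypothetical mechanisms chosen for illustration; the abstract does not specify how the system weights or removes results.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def compose_query(positives, negatives=(), neg_weight=0.5):
    """Combine weighted positive embeddings and subtract negative ones.

    Hypothetical scheme, not taken from the paper.
    positives: (embedding, weight) pairs, e.g. a CLIP text embedding and a
    CLIP image embedding with user-chosen weights.
    negatives: embeddings of concepts the user wants to steer away from.
    neg_weight: assumed penalty applied to each negative embedding.
    """
    q = [0.0] * len(positives[0][0])
    for emb, weight in positives:
        q = [qi + weight * ei for qi, ei in zip(q, normalize(emb))]
    for emb in negatives:
        q = [qi - neg_weight * ei for qi, ei in zip(q, normalize(emb))]
    return normalize(q)


def search(query, index, k=5, negatives=(), neg_threshold=0.8):
    """Exact top-k inner-product search over normalized vectors (the search
    faiss.IndexFlatIP performs), skipping any item whose cosine similarity
    to a negative concept reaches neg_threshold (hypothetical rule)."""
    negs = [normalize(n) for n in negatives]
    scored = []
    for item_id, emb in index.items():
        e = normalize(emb)
        if any(dot(e, n) >= neg_threshold for n in negs):
            continue  # drop negatively associated items outright
        scored.append((item_id, dot(query, e)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-d vectors standing in for CLIP embeddings (illustrative only).
index = {
    "beach": [1.0, 0.1, 0.0],
    "beach_crowd": [0.9, 0.0, 0.6],
    "forest": [0.0, 1.0, 0.0],
}
text_query = [1.0, 0.0, 0.0]  # e.g. "a beach"
crowd = [0.0, 0.0, 1.0]       # negative concept, e.g. "crowd"
q = compose_query([(text_query, 1.0)], negatives=[crowd])
print(search(q, index, k=3))  # weighting alone only demotes beach_crowd
print(search(q, index, k=3, negatives=[crowd], neg_threshold=0.5))  # filter removes it
```

The two calls show the difference between the mechanisms: down-weighting a negative concept only lowers the rank of related items, while threshold filtering removes them from the result list entirely.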
For video retrieval, the system extracts individual frames with FFmpeg and indexes them with FAISS for frame-level similarity matching. The method achieves a Precision@5 of 92.8%, a Recall@10 of 89.1%, and an average query time of 0.45 seconds. It further incorporates recent developments in multimodal video processing [25], [26] and CLIP optimization [24] to improve efficiency. Overall, the approach offers a scalable, practical foundation for real-time multimedia retrieval with strong semantic understanding.
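The frame-level video pipeline and the reported metrics can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions: the sampling rate, the output file pattern, the helper names (`ffmpeg_frame_command`, `frame_timestamp`, `aggregate_by_video`), and the max-pooling rule that turns frame scores into a video score are hypothetical, not details given in the abstract. The FFmpeg arguments (`-i`, `-vf fps=N`, a `%05d` image-sequence pattern) are standard usage; the command is only built here, not executed. Precision@k and Recall@k follow their usual definitions.

```python
import os


def ffmpeg_frame_command(video_path, out_dir, fps=1.0):
    """Argument list that samples frames from a video at a fixed rate.

    The 1 frame/s default is an assumption, not a setting from the paper.
    Execute with subprocess.run(cmd, check=True) once FFmpeg is installed.
    """
    pattern = os.path.join(out_dir, "frame_%05d.jpg")
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", pattern]


def frame_timestamp(frame_number, fps=1.0):
    """Approximate time in seconds of the n-th output frame (FFmpeg numbers from 1)."""
    return (frame_number - 1) / fps


def aggregate_by_video(frame_hits, k=5):
    """Collapse frame-level hits into a ranked video list.

    frame_hits: (video_id, timestamp_s, score) tuples returned by a frame index.
    Each video is scored by its best-matching frame (max pooling, an assumed
    choice) and keeps that frame's timestamp so playback can jump to it.
    """
    best = {}
    for video_id, ts, score in frame_hits:
        if video_id not in best or score > best[video_id][1]:
            best[video_id] = (ts, score)
    ranked = sorted(best.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(vid, ts, score) for vid, (ts, score) in ranked[:k]]


def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k positions filled by relevant items."""
    return sum(1 for r in ranked_ids[:k] if r in relevant) / k


def recall_at_k(ranked_ids, relevant, k):
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for r in ranked_ids[:k] if r in relevant) / len(relevant)


# Hypothetical frame hits: (video_id, timestamp in seconds, similarity score).
hits = [("a", 0.0, 0.4), ("b", 3.0, 0.9), ("a", 7.0, 0.7), ("c", 1.0, 0.2)]
ranked = aggregate_by_video(hits, k=3)
ids = [vid for vid, _, _ in ranked]
print(ffmpeg_frame_command("clip.mp4", "frames"))
print(ranked)
print(precision_at_k(ids, {"a", "b"}, 5), recall_at_k(ids, {"a", "b"}, 10))
```

Max pooling rewards a video for a single strongly matching frame; mean or top-n pooling are equally plausible alternatives, and the abstract does not say which rule the system uses.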

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.