Emotionally Intelligent AI Companion for Enhancing Human-AI Interaction through Text and Voice Based on Sentiment Analysis
Abstract
This research presents a multimodal AI companion designed to support adolescent mental health by enabling empathetic interaction through both text and voice.¹ ² Voice input is transcribed with OpenAI’s Whisper API, which offers low word-error rates and robust performance across diverse speech conditions.³ ⁴ The transcribed or typed text is then processed by a DistilBERT model fine-tuned on the GoEmotions dataset for real-time detection of 28 emotion labels, capturing both polarity and nuanced affective states.⁵ ⁶ Meta’s Llama 3.0 generates context-aware responses, adapting its tone to the detected emotions and to user history that is maintained with LangChain and persisted in MongoDB for personalization.⁷ ⁸ A FastAPI-based implementation supports secure deployment and includes a dashboard for tracking emotional trends over time.⁹ The prototype achieves high accuracy in both transcription and emotion recognition, outperforms unimodal baselines, and demonstrates the benefit of integrating voice and text for affective computing.¹⁰ Future work includes expanding multilingual support to improve accessibility.¹¹ ³
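The pipeline the abstract describes (speech → transcription → emotion detection → emotion-conditioned response, with conversation history retained for personalization) can be sketched as below. This is a minimal illustrative stub, not the paper's implementation: the component functions stand in for the Whisper, DistilBERT/GoEmotions, and Llama 3.0 calls, and every name, return value, and label here is an assumption for demonstration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stub components standing in for the services named in the abstract.
# All outputs are hard-coded placeholders, not real model predictions.

def transcribe(audio_bytes: bytes) -> str:
    # Placeholder for an OpenAI Whisper API transcription call.
    return "i had a really rough day at school"

def detect_emotions(text: str) -> List[str]:
    # Placeholder for a DistilBERT classifier fine-tuned on GoEmotions
    # (28 labels, e.g. sadness, joy, anger, neutral).
    return ["sadness", "disappointment"]

def generate_reply(text: str, emotions: List[str], history: List[str]) -> str:
    # Placeholder for an emotion-conditioned Llama 3.0 prompt; the real
    # system would retrieve user history via LangChain/MongoDB memory.
    tone = emotions[0] if emotions else "neutral"
    return f"[tone={tone}] I'm here for you. Tell me more about your day."

@dataclass
class Companion:
    """Orchestrates one turn of the text-or-voice interaction loop."""
    history: List[str] = field(default_factory=list)

    def handle(self, audio: Optional[bytes] = None,
               text: Optional[str] = None) -> dict:
        # Voice input is transcribed first; typed text is used directly.
        message = transcribe(audio) if audio is not None else (text or "")
        emotions = detect_emotions(message)
        reply = generate_reply(message, emotions, self.history)
        self.history.append(message)  # persisted to MongoDB in the real system
        return {"transcript": message, "emotions": emotions, "reply": reply}

bot = Companion()
result = bot.handle(audio=b"\x00fake-audio-bytes")
```

A real deployment would expose `Companion.handle` behind a FastAPI endpoint and log the per-turn `emotions` list to drive the emotional-trends dashboard mentioned above.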
Article Details

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.