Multimodal Depression Detection Using Textual and Visual Cues: A Machine Learning Approach with the DAIC-WOZ Dataset
Abstract
Depression is a prevalent mental health disorder affecting millions worldwide, yet remains underdiagnosed due to various barriers in clinical settings. This research proposes a novel multimodal machine learning framework that leverages both textual and visual behavioral indicators to detect depression. Utilizing the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) dataset, our approach combines natural language processing techniques to analyze linguistic patterns with computer vision methods to capture non-verbal cues. The multimodal model achieved significantly higher performance (F1-score: 0.89) compared to unimodal approaches (text-only: 0.76, visual-only: 0.72), demonstrating the effectiveness of integrating multiple data modalities. Key depression indicators identified include specific linguistic patterns (increased negative emotion words, first-person singular pronouns) and visual markers (reduced facial expressivity, decreased eye contact). This research contributes to the emerging field of automated depression screening tools that could supplement clinical diagnostics, particularly in telehealth settings where in-person assessment is limited. Ethical considerations regarding privacy, bias, and appropriate implementation contexts are discussed.
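To make the fusion idea concrete, the sketch below shows one common way to combine textual and visual features for a binary depression label: a late-fusion pipeline that concatenates per-session text and visual feature vectors before a linear classifier. This is a minimal illustration only; the synthetic data, the 16-dimensional visual summary, the TF-IDF text features, and the logistic-regression classifier are all assumptions for demonstration, not the architecture reported in the paper.

```python
# Minimal late-fusion sketch for multimodal depression classification.
# All data here is synthetic; real inputs would be DAIC-WOZ interview
# transcripts and per-session visual descriptors (e.g., facial
# action-unit intensities). This is NOT the authors' published pipeline.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for transcripts and a hypothetical 16-dim visual summary per session.
transcripts = ["i feel tired and alone"] * 50 + ["we had a great week"] * 50
visual_feats = rng.normal(size=(100, 16))
labels = np.array([1] * 50 + [0] * 50)  # 1 = depressed, 0 = not depressed

# Text branch: TF-IDF features over the transcripts.
text_feats = TfidfVectorizer().fit_transform(transcripts).toarray()

# Late fusion: concatenate the two modalities into one feature vector.
fused = np.hstack([text_feats, visual_feats])

X_tr, X_te, y_tr, y_te = train_test_split(
    fused, labels, test_size=0.3, stratify=labels, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("fused F1:", f1_score(y_te, clf.predict(X_te)))
```

Feature concatenation is only the simplest fusion strategy; comparing this fused model against text-only and visual-only baselines trained on the same splits mirrors the unimodal-versus-multimodal comparison the abstract reports.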

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.