DocuMind: A Two-Stage Retrieval-Augmented Generation System for Academic Research Paper Question Answering

Shivani Vyas
Archi Singhal
Prabhakar Sharma
R. P. S. Chauhan
Anjali Chandra

Abstract

The volume of unstructured academic data has grown massively in recent years, making information extraction extremely challenging. Current question answering applications for PDFs achieve high accuracy, but they rely on closed-source cloud services, which makes them inappropriate for research papers. This work introduces DocuMind, an open-source, privately deployable retrieval-augmented generation framework for question answering on research papers. It features a novel two-stage retrieval scheme that combines deterministic page-one pinning with maximal marginal relevance to address false answers drawn from the reference sections of academic documents. An experimental evaluation on 200 question-answer pairs from 20 research papers shows an accuracy of 81.5 percent with full immunity against hallucinations. The method improves accuracy on identity questions from 44.4 percent to 82.7 percent. All components of DocuMind are built with open-source software and require no cloud services.
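
The two-stage retrieval idea described above can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything concrete here is an assumption made for illustration: the chunk layout (a `page` number and an embedding `vec`), cosine similarity as the scoring function, the hypothetical function name `retrieve`, and the values of `k` and `lam`. Stage one always keeps chunks from page one, where title and author information usually sits; stage two fills the remaining slots with maximal marginal relevance (MMR), which trades query relevance against redundancy with chunks already selected.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunks, k=4, lam=0.5):
    """Return indices of up to k chunks: page-one chunks first, then MMR picks.

    chunks is a list of dicts with keys 'page' (1-based) and 'vec'.
    This layout is a hypothetical one chosen for the sketch.
    """
    # Stage 1 (assumed form of "page-one pinning"): always include page-one chunks.
    selected = [i for i, c in enumerate(chunks) if c["page"] == 1][:k]
    candidates = [i for i in range(len(chunks)) if i not in selected]

    # Stage 2: greedy MMR over the remaining chunks.
    while len(selected) < k and candidates:
        def mmr_score(i):
            relevance = cosine(query_vec, chunks[i]["vec"])
            redundancy = max(
                (cosine(chunks[i]["vec"], chunks[j]["vec"]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: one page-one chunk, two identical body chunks, one distinct chunk.
query = [1.0, 0.0, 0.0]
chunks = [
    {"page": 1, "vec": [0.0, 0.0, 1.0]},  # title/author block
    {"page": 3, "vec": [1.0, 0.0, 0.0]},  # highly relevant
    {"page": 3, "vec": [1.0, 0.0, 0.0]},  # duplicate of the previous chunk
    {"page": 5, "vec": [0.8, 0.6, 0.0]},  # relevant but different
]
print(retrieve(query, chunks, k=3, lam=0.4))
```

In this toy setup the page-one chunk is kept even though its similarity to the query is zero, and the duplicate body chunk loses its slot to the distinct one. A real system would use learned embeddings and a vector store; neither is specified in the abstract.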

Article Details

How to Cite
Vyas, S., Singhal, A., Sharma, P., Chauhan, R. P. S., & Chandra, A. (2026). DocuMind: A Two-Stage Retrieval-Augmented Generation System for Academic Research Paper Question Answering. International Journal on Advanced Computer Engineering and Communication Technology, 15(1), 160–166. Retrieved from https://journals.mriindia.com/index.php/ijacect/article/view/2347
Section
Articles
