AI-Based Handwritten Text Recognition and Smart Document Digitization System
Abstract
Handwritten text recognition remains a challenging problem in document intelligence because writing styles vary in stroke, spacing, and alignment. This paper presents an AI-based handwritten text recognition and smart document digitization system that addresses these challenges with an end-to-end OCR pipeline. The system combines image preprocessing, CNN-based feature extraction, Bi-LSTM sequence modeling, and CTC-based decoding to recognize handwritten text without explicit segmentation. The model is trained and evaluated on the IAM dataset using two standard metrics, Character Error Rate (CER) and Word Error Rate (WER). Unlike conventional OCR systems, the proposed approach extends beyond recognition to a complete application pipeline comprising a Tkinter-based GUI, a Flask API, and export to Word and PDF formats. In experiments, the system achieves a CER of 9.94% and a WER of 29.83%, outperforming standard CRNN baselines while remaining computationally efficient. The study also shows that preprocessing has a significant impact on recognition performance and that statistical evaluation is important for validating model effectiveness. The system offers a practical, deployable solution for real-world handwritten document digitization.
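To make the decoding step and the two reported metrics concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes greedy (best-path) CTC decoding, which the abstract does not specify, and the usual edit-distance definitions of CER and WER. The function names, the example alphabet, and the choice of index 0 as the blank label are all hypothetical.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_labels: Sequence[int], alphabet: Sequence[str],
                      blank: int = 0) -> str:
    """Collapse a per-frame label sequence into text (best-path CTC decoding).

    Repeated labels are merged first, then blank labels are dropped.
    `alphabet[i]` is the character for label i; the entry at `blank` is unused.
    """
    chars: List[str] = []
    prev = blank
    for label in frame_labels:
        # Emit only when the label is not blank and differs from the previous frame.
        if label != blank and label != prev:
            chars.append(alphabet[label])
        prev = label
    return "".join(chars)


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance with unit cost for insert, delete, and substitute."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution or match
        prev_row = row
    return prev_row[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits divided by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


if __name__ == "__main__":
    # Hypothetical alphabet: index 0 is the CTC blank.
    alphabet = "-helo"
    frames = [1, 1, 0, 2, 3, 3, 0, 3, 4]
    print(ctc_greedy_decode(frames, alphabet))
    print(cer("hello", "hallo"), wer("the cat sat", "the cat sit"))
```

The blank between the two runs of label 3 is what lets the decoder keep a doubled letter, which is why recognition works without segmenting characters in advance. Over a whole test set, the error rates are commonly aggregated as total edits divided by total reference length rather than as an average of per-line rates; which convention the reported 9.94% and 29.83% follow is not stated here.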