Multi Group classification based on Arabic

Suliman Al-Shossi
Akram Alsubari

Abstract

Hierarchical text classification is a significant challenge in Arabic natural language processing, complicated further by the language's morphological richness and the scarcity of large-scale, structured datasets. This paper presents a comparative study of two modern fine-tuning approaches to this task: a generative text-to-text approach using the AraT5 model, and a direct classification approach using the AraGPT-2 model. Both models were evaluated on a large, purpose-built dataset of more than 75,000 articles distributed across 600 subcategories, and on a smaller benchmark dataset used for comparison with previous literature. Experimental results show that the generative AraT5 model achieved superior performance and hierarchical consistency on the large, complex dataset. Furthermore, our improved AraGPT-2 model, enhanced with advanced regularization techniques, significantly outperformed the current literature benchmark on the smaller benchmark dataset, achieving a hierarchical F1 score of 93.54%. The results indicate that, while both approaches are effective, the generative approach has a clear advantage when the classification space is fuzzy. This work establishes new performance benchmarks and provides insights into how fine-tuning strategy and data complexity affect the hierarchical classification of Arabic text.
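The abstract reports a hierarchical F1 score without defining it. As a rough illustration only, the sketch below assumes the commonly used set-based definition, in which each true and predicted label is expanded with its ancestors before precision and recall are computed; the paper's exact formulation may differ. The function names and the toy category tree are hypothetical and are not taken from the authors' dataset.

```python
from typing import Dict, List, Optional, Set


def with_ancestors(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the label together with all of its ancestors in the hierarchy."""
    nodes: Set[str] = set()
    current: Optional[str] = label
    # Walk up the tree; the membership check guards against accidental cycles.
    while current is not None and current not in nodes:
        nodes.add(current)
        current = parent.get(current)
    return nodes


def hierarchical_f1(
    y_true: List[str], y_pred: List[str], parent: Dict[str, Optional[str]]
) -> float:
    """Micro-averaged hierarchical F1 over ancestor-augmented label sets.

    Assumed definition (not confirmed by the source):
    hP = sum |T ∩ P| / sum |P|, hR = sum |T ∩ P| / sum |T|.
    """
    overlap = pred_total = true_total = 0
    for t, p in zip(y_true, y_pred):
        t_set = with_ancestors(t, parent)
        p_set = with_ancestors(p, parent)
        overlap += len(t_set & p_set)
        pred_total += len(p_set)
        true_total += len(t_set)
    if overlap == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / true_total
    return 2 * precision * recall / (precision + recall)


# Hypothetical two-level hierarchy: child -> parent (None marks a top-level category).
toy_parent: Dict[str, Optional[str]] = {
    "sports": None,
    "football": "sports",
    "tennis": "sports",
    "economy": None,
    "markets": "economy",
}

toy_true = ["football", "markets"]
toy_pred = ["tennis", "markets"]  # first prediction: wrong leaf, right parent
score = hierarchical_f1(toy_true, toy_pred, toy_parent)
print(f"hierarchical F1 = {score:.2f}")
```

Under this definition, a prediction that misses the subcategory but lands in the correct parent category still earns partial credit, which is why the metric is a natural fit for a label space with hundreds of subcategories.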


 

Article Details

How to Cite
Al-Shossi, S., & Alsubari, A. (2026). Multi Group classification based on Arabic. International Journal on Advanced Electrical and Computer Engineering, 15(1S), 8–23. Retrieved from https://journals.mriindia.com/index.php/ijaece/article/view/1340
Section
Articles
