Using NLP for Arabic business email classification

Dweikat, Omar
Nooraldeen, Zreaq
Ever since the creation of the first computer, humans have had trouble interacting with machines. Natural Language Processing (NLP) is the medium through which humans and machines communicate in a way both understand: it applies concepts from Artificial Intelligence (AI) to transform complex human language into numbers that machines can process. Arabic NLP (ANLP) is one field of NLP that has yet to flourish fully and to reach the maturity of NLP for other languages such as English. For this reason, this paper focuses on ANLP, using an original, Arabic-only dataset to create and train different text analysis models. The investigation covers three types of text analysis: Urgency Detection, Sentiment Analysis, and Topic Classification. Urgency Detection proved to be the simplest of the three and produced the best model, with an accuracy of 87%. Sentiment Analysis in Arabic proved challenging: the Sentiment model reached a final accuracy of 78% after more than 50 different models and alterations were tried. Finally, Topic Classification in Arabic appears to be a problem of dataset size and complexity: as the dataset grows larger and less complex, accuracy increases. Accuracy was notably higher for topics with larger datasets, and significantly higher for less complex topics, that is, those with a less diverse set of subtopics.