Detection Type Font

No Thumbnail Available
Date
2021
Authors
Hamadneh, Donya
Shtaiwi, Ansam
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Deep learning, one of the methods of machine learning, appeared in 2006 and has spread in the last 10 years greatly, as it has been used in many recognition and classification processes. Classification can be used in many areas such as the classification of plants, clothing, and even Fonts. Classification processes have been used extensively in many languages to classify the type of script, unlike the Arabic language, which has very little research to classify the font. There are so many types of Arabic fonts that are difficult for the unfamiliar reader to distinguish between them, and they are usually handwritten. Also, most of the applications are either for marking types of fonts for non-Arabic languages, or for general approaches to pattern recognition. This research aims to classify the 9 most common Arabic fonts using CNN convolutional neural networks. CNN neural networks were trained to distinguish Arabic fonts using a pre-coded set of images. High-resolution images were used but at a low cost due to the development of technology. The Python language was used and the functions of the TensorFlow and Keras library were used to build and train the CNN model. Accuracy results were obtained higher than 99.8%, which is the highest percentage obtained in the classification of Arabic calligraphy using CNN. A simple Android application has been created in the Java language that performs the testing process on the image to be classified. The image classification program in Python was linked to the Android application using the PHP server. Fonts differ in different languages, as there are many fonts around the world, and the line can be defined as a combination of shapes that have a specific word and problem and differ from one language to another, one of the most important types of fonts is Arabic calligraphy, Arabic calligraphy contains many patterns that have appeared over the ages, Such as Kufi, Naskh, Granada ... etc. The studies have been applied to many languages and most of them were in the English language, but there are very few studies that deal specifically with the Arabic language. Whereas the Arabic language is characterized by distinctive characteristics, one of the most important of these characteristics is word- formation by appending prefixes, suffixes, and word roots. It also has the feature of accent marks above and below letters, unlike other languages. In addition, writing in it starts from right to left. And there is a letter (ض (that is unique to it from others, so it was called the language of dahd. The Arabic language developed over time, and a different way of writing and drawing appeared (as it was considered an art of the formal arts) for many people. Consequently, there are many types of Arabic calligraphy that reflect the era, history and society that came from it. So it became necessary to distinguish the lines. Distinguishing lines was previously restricted to experts in knowing the lineage of manuscripts in museums and libraries. The rapid development of information on the Internet has led researchers to show an interest in helping users retrieve and search their information. Thus, distinguishing Arabic fonts has become a very arduous task due to the huge number of different documents and fonts, and as only experts can distinguish types of fonts, it has become necessary to use machine learning and modern technologies for the purpose of assistance. This project aims to classify the types of Arabic fonts using deep learning technology, and one of the most important technologies that have been used in classification, which has recently appeared, is the technology of deep learning using CNN networks. CNNs have shown that they do well in classification networks because they infer and analyze the outputs and thus it can be demonstrated whether the CNN is suitable for the set of documents trained. )Kufic script, Andalusian script, Naskh script, Granada line, Thuluth script, Roman script, Diwani script, free writing, and Qayrawani script). This work helps to facilitate the users to know the types of fonts, whether for the purpose of learning or education for people or institutions such as academies for teaching fonts, teachers of Arabic in particular, and speakers of non-Arabic. In this research, the following was concluded: • Classification of 9 Arabic fonts using CNN, the accuracy results were higher than 99.8%, which is the highest percentage reached so far. • Preserving a sample resulting from the training process for use in testing operations. • Simple Android application linking to help users perform the test process on any image. Conclusions: • Knowledge of deep learning using CNN networks, where we learned how to create a model for training data and how to use it for training and validation, and finally how to store the final weights and use them in the testing process. • The ability to create a simple Android application, where we learned to create a suitable GUI, how to add images and icons to it, how to access the device’s camera and take a picture of it, send an image to the server and receive a text from the server, interact with a phone simulator using the Android Studio program. • Knowledge of the TensorFlow platform and the ability to use its functions in the training process and stain. • Delve into the OpenCV library and the ability to employ its functions in creating datasets. • Learn the basics of the Python language and the ability to use its libraries, such as Numby, TensorFlow, Keras, CV2, and others.
Description
Keywords
Citation