Detection Type Font
No Thumbnail Available
Date
2021
Authors
Hamadneh, Donya
Shtaiwi, Ansam
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Deep learning, one of the methods of machine learning, appeared in 2006 and has
spread in the last 10 years greatly, as it has been used in many recognition and
classification processes. Classification can be used in many areas such as the
classification of plants, clothing, and even Fonts. Classification processes have been
used extensively in many languages to classify the type of script, unlike the Arabic
language, which has very little research to classify the font. There are so many
types of Arabic fonts that are difficult for the unfamiliar reader to distinguish
between them, and they are usually handwritten. Also, most of the applications
are either for marking types of fonts for non-Arabic languages, or for general
approaches to pattern recognition. This research aims to classify the 9 most
common Arabic fonts using CNN convolutional neural networks. CNN neural
networks were trained to distinguish Arabic fonts using a pre-coded set of images.
High-resolution images were used but at a low cost due to the development of
technology. The Python language was used and the functions of the TensorFlow
and Keras library were used to build and train the CNN model. Accuracy results
were obtained higher than 99.8%, which is the highest percentage obtained in the
classification of Arabic calligraphy using CNN. A simple Android application has
been created in the Java language that performs the testing process on the image
to be classified. The image classification program in Python was linked to the
Android application using the PHP server.
Fonts differ in different languages, as there are many fonts around the world, and
the line can be defined as a combination of shapes that have a specific word and
problem and differ from one language to another, one of the most important types
of fonts is Arabic calligraphy, Arabic calligraphy contains many patterns that have
appeared over the ages, Such as Kufi, Naskh, Granada ... etc.
The studies have been applied to many languages and most of them were in the
English language, but there are very few studies that deal specifically with the
Arabic language. Whereas the Arabic language is characterized by distinctive
characteristics, one of the most important of these characteristics is word-
formation by appending prefixes, suffixes, and word roots. It also has the feature
of accent marks above and below letters, unlike other languages. In addition,
writing in it starts from right to left. And there is a letter (ض (that is unique to it
from others, so it was called the language of dahd.
The Arabic language developed over time, and a different way of writing and
drawing appeared (as it was considered an art of the formal arts) for many people.
Consequently, there are many types of Arabic calligraphy that reflect the era,
history and society that came from it. So it became necessary to distinguish the
lines.
Distinguishing lines was previously restricted to experts in knowing the lineage of
manuscripts in museums and libraries. The rapid development of information on
the Internet has led researchers to show an interest in helping users retrieve and
search their information. Thus, distinguishing Arabic fonts has become a very
arduous task due to the huge number of different documents and fonts, and as
only experts can distinguish types of fonts, it has become necessary to use
machine learning and modern technologies for the purpose of assistance.
This project aims to classify the types of Arabic fonts using deep learning
technology, and one of the most important technologies that have been used in
classification, which has recently appeared, is the technology of deep learning
using CNN networks. CNNs have shown that they do well in classification networks
because they infer and analyze the outputs and thus it can be demonstrated
whether the CNN is suitable for the set of documents trained. )Kufic script,
Andalusian script, Naskh script, Granada line, Thuluth script, Roman script, Diwani
script, free writing, and Qayrawani script).
This work helps to facilitate the users to know the types of fonts, whether for the
purpose of learning or education for people or institutions such as academies for
teaching fonts, teachers of Arabic in particular, and speakers of non-Arabic.
In this research, the following was concluded:
• Classification of 9 Arabic fonts using CNN, the accuracy results were higher
than 99.8%, which is the highest percentage reached so far.
• Preserving a sample resulting from the training process for use in testing
operations.
• Simple Android application linking to help users perform the test process on
any image.
Conclusions:
• Knowledge of deep learning using CNN networks, where we learned how to
create a model for training data and how to use it for training and validation,
and finally how to store the final weights and use them in the testing
process.
• The ability to create a simple Android application, where we learned to
create a suitable GUI, how to add images and icons to it, how to access the
device’s camera and take a picture of it, send an image to the server and
receive a text from the server, interact with a phone simulator using the
Android Studio program.
• Knowledge of the TensorFlow platform and the ability to use its functions in
the training process and stain.
• Delve into the OpenCV library and the ability to employ its functions in
creating datasets.
• Learn the basics of the Python language and the ability to use its libraries,
such as Numby, TensorFlow, Keras, CV2, and others.