Era Identification of Arabic Text Using Stylometric Features

dc.contributor.advisor: Arandi, Samer
dc.contributor.author: Shanti, maz
dc.contributor.author: Saleh, Dia
dc.date.accessioned: 2019-07-03T10:38:03Z
dc.date.available: 2019-07-03T10:38:03Z
dc.date.issued: 2017
dc.description.abstract: In Arabic studies there is a theory that ancient authors of Arabic books were more eloquent than modern authors, and that this is why ancient books are harder to read. This project tries to test that theory by analyzing both ancient and modern books using stylometric features, a set of methods for analyzing texts and extracting metadata about them. Many features were examined: some show significant changes (p-value of 0.026), others did not change across the decades of Arabic writing, and some kept increasing until the last decade, when they decreased. An example is the letter الهاء (h) at the end of words, i.e. the pronoun suffix 'ه', as in 'له' (lh), which means "for him". These features must be applied to very large texts (a corpus) to test their significance, but Arabic corpora are rare, and the published ones do not contain any old texts; all of them are modern and drawn from blogs or websites. It was therefore necessary to collect and clean a new Arabic corpus, with texts from the year 100 Hijri (718) to 1439 Hijri (2017), and to publish it on the Internet. The stylometric analysis applied to the new corpus was fed to a Naive Bayes classifier; given a new document, the classifier predicts the period in which it was written with a small error (for example, that the document was written between 500 and 600 Hijri). The results were very good, with a precision of 83.7%. [en_US]
dc.identifier.uri: https://hdl.handle.net/20.500.11888/14406
dc.language.iso: en [en_US]
dc.title: Era Identification of Arabic Text Using Stylometric Features [en_US]
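
The pipeline described in the abstract (compute stylometric features per text, then feed them to a Naive Bayes classifier that predicts an era) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which Naive Bayes variant or which full feature set was used, so the Gaussian variant, the `mean_word_length` feature, the era labels and the toy feature values below are all assumptions made for illustration. Only the word-final ه feature comes from the abstract.

```python
import math
import re
from collections import defaultdict

HEH = "\u0647"  # Arabic letter heh (ه), the pronoun suffix named in the abstract
# Assumption: undiacritized text; runs of basic Arabic letters count as words.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def final_heh_ratio(text):
    """Fraction of Arabic words that end in the letter heh."""
    words = ARABIC_WORD.findall(text)
    if not words:
        return 0.0
    return sum(w.endswith(HEH) for w in words) / len(words)


def mean_word_length(text):
    """Illustrative second feature (hypothetical, not named in the abstract)."""
    words = ARABIC_WORD.findall(text)
    return sum(map(len, words)) / len(words) if words else 0.0


def features(text):
    return [final_heh_ratio(text), mean_word_length(text)]


class GaussianNB:
    """Tiny Gaussian Naive Bayes: one normal distribution per class and feature."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.params = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Small constant keeps the variance strictly positive.
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                for c, m in zip(cols, means)
            ]
            log_prior = math.log(len(rows) / len(X))
            self.params[label] = (log_prior, means, variances)
        return self

    def predict(self, row):
        def log_score(label):
            log_prior, means, variances = self.params[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for x, m, v in zip(row, means, variances)
            )
        return max(self.params, key=log_score)


# Toy feature vectors and era labels, invented purely to exercise the code.
X_toy = [[0.10, 4.0], [0.12, 4.1], [0.02, 5.0], [0.03, 5.2]]
y_toy = ["era_A", "era_A", "era_B", "era_B"]
model = GaussianNB().fit(X_toy, y_toy)
```

In a real run, `features(text)` would be computed for every document in the corpus, and the labels would be era bins such as Hijri centuries. The bin width is a design choice that the abstract only hints at with its "500 - 600 Hijri" example.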
Files
Original bundle
Name: Era Identification of Arabic Text Using Stylometric Features.pptx
Size: 4.89 MB
Format: Microsoft Powerpoint XML
Name: Abstract - Era identification.pdf
Size: 41.98 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission