Num
Hoor Qadri, Nisreen Al-Saleh

Idea

Solved Problems
  Find the number quickly.
  Heuristic.
  Real-time.

Our Aim
  Find the numbers quickly.

Target Market
  Students
  Journalists
  Researchers
  Business companies

Project Timeline
  [Timeline chart. Months: Sep to May. Tasks: Software Engineering, NLP, Analysis & documenting, Backend process, Frontend process, Interface, Indexing, Crawling]

Project Flowchart
  [Flowchart figure]

Crawling
  [Figure from "Study of Web Crawler and its Different Types" [1]]

Preprocessing
  Text data -> Preprocessing -> Clean text

Named Entity Recognition (GATE)
  GATE tool [2]
  Advantages:
    Supports the Arabic language.
  Disadvantages:
    Lack of resources.
    Not easy to learn.

Named Entity Recognition (Stanford)
  Stanford [3]
  Advantages:
    Extracts entities (Location, Person, Organization, and numbers).
  Disadvantages:
    Hard to recognize the chunks of the data.
  Example:
    Text: An-Najah National University
    Results: An-Najah: Org, National: Org, University: Org

Named Entity Recognition (AYLIEN)
  Advantages:
    Extracts entities (Location, Person, Organization, Phone, Emails, URLs, Percentages, Product, Money, and Keywords).
    Can recognize the chunks of the data.
  Disadvantages:
    Can't make more than 60 requests per minute.
    Can't make more than 1000 calls per day.

Named Entity Recognition Tools Combination
  Aylien: LOC, ORG, PER, Keywords
  Stanford NER: NUM

IRS
  [Figure: Lucene flow [4]]

Challenges
  Page structures
  Choosing an appropriate crawling library
  Choosing an appropriate NLP tool

Acknowledgment
  Dr. Abed Razzaq Natsheh
  Dr. Hamed Abdelhaq
  Mr. Omar Rayyan

References
  [1] Trupti V. Udapure, Ravindra D. Kale, Rajesh C. Dharmik, "Study of Web Crawler and its Different Types", IOSR Journal of Computer Engineering (IOSR-JCE), Volume 16, Issue 1, Ver. VI (Feb. 2014).
  [2] GATE: General Architecture for Text Engineering, https://gate.ac.uk/
  [3] The Stanford Natural Language Processing Group, https://nlp.stanford.edu/software/
  [4] Lucene flow, http://gopaldas.org/wp-content/uploads/2015/10/lucence-flow.png

Any questions?
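
Backup: merging token-level tags into chunks

The example in the NER slides (An-Najah: Org, National: Org, University: Org) shows a tagger labelling each token separately. One possible workaround, sketched here under our own assumptions and not taken from the project code, is to join adjacent tokens that share a tag into a single chunk. The function name `merge_chunks`, the `"O"` tag for non-entity tokens, and the extra tokens after "University" are hypothetical and used only for illustration.

```python
from itertools import groupby


def merge_chunks(tagged_tokens, outside_tag="O"):
    """Join runs of adjacent tokens that share an entity tag into one chunk.

    tagged_tokens: list of (token, tag) pairs, in sentence order.
    outside_tag:   tag assumed to mark tokens that are not entities.
    """
    chunks = []
    # groupby yields one group per run of consecutive pairs with the same tag.
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag == outside_tag:
            continue  # skip non-entity tokens
        text = " ".join(token for token, _ in group)
        chunks.append((text, tag))
    return chunks


# Token-level output in the style of the slide example (extra tokens are illustrative).
tagged = [
    ("An-Najah", "Org"), ("National", "Org"), ("University", "Org"),
    ("is", "O"), ("in", "O"), ("Nablus", "Loc"),
]
print(merge_chunks(tagged))
# [('An-Najah National University', 'Org'), ('Nablus', 'Loc')]
```

This simple rule would also merge two different entities of the same type that happen to sit next to each other, so it is a starting point and not a full chunker.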