Documents Classification System Using RapidMiner Tool
The exponential growth of the Internet has generated considerable interest, especially among companies, in developing useful and efficient tools and software that help employees do their jobs and help users search the web. However, the complexity of natural language and the extremely high dimensionality of the document feature space make document classification a difficult problem. We investigate four methods for document classification: the Naive Bayes classifier, the Nearest Neighbor classifier, Decision Trees, and a Support Vector Machine. Each was applied individually, using RapidMiner as the tool, to five classes of BBC and Reuters news groups (Business, Entertainment, Politics, Sports, and Technology). Our experimental results indicate that the Naive Bayes classifier outperforms the other classifiers on our data sets, with a best accuracy of 85%. We therefore recommend that companies use RapidMiner as a tool to classify their documents and Naive Bayes as the classification algorithm.
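To illustrate the kind of classifier the abstract recommends, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. It is not the RapidMiner process used in the study. The five toy training documents, the whitespace tokenizer, and the names `tokenize`, `train_naive_bayes`, and `classify` are assumptions made up for illustration, and the sketch says nothing about the reported accuracy.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: one made-up document per class named in the abstract.
# This is NOT the BBC/Reuters data used in the study.
TOY_TRAINING_DATA = [
    ("shares profit market bank", "business"),
    ("film actor award music", "entertainment"),
    ("election parliament minister vote", "politics"),
    ("match goal team player", "sports"),
    ("software mobile internet computer", "technology"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple assumption)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior for the given text."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_doc_counts.items():
        # Log prior: fraction of training documents in this class.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen class/word pairs from zeroing the score.
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TOY_TRAINING_DATA)
print(classify("the team scored a late goal", model))
print(classify("bank profit rises", model))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, which matters for the high-dimensional feature spaces the abstract mentions.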