National Research Council of Canada. NRC Institute for Information Technology
This paper investigates the application of text categorization (TC) in an eBusiness setting with a large number of target categories and relatively few training cases, using a real-life online tendering system as the test case. It is an experimental paper reporting our experience in applying conventional machine learning approaches to TC in a real-life application, namely the Rocchio method, TF-IDF (term frequency-inverse document frequency), WIDF (weighted inverse document frequency), and naïve Bayes. To make the categorization results acceptable for industrial use, we exploited the hierarchical structure of the target categories and investigated semi-automated ranking categorization.
Journal of Business and Technology 1 (1 October 2005).