The occurrence of a word, one or more times, in a document is taken as an attribute of that document. Using a simple formula from Bayesian probability, a probability that the document belongs to a given category is derived from that word. The procedure is applied to every word in a document, and the words are then ordered by probability to form a list. The same procedure is used to form category lists from existing categories, although original categories could also be formed. Document lists are compared with category lists, and probability sums are formed for indexing. Two sample category lists, derived from abstracts, are given. Simple modifications show how easily list characteristics can be changed: two occurrences of a word, or occurrence in two documents, can be substituted for a single occurrence.
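The procedure summarized in the abstract might be sketched in Python as follows. This is only an illustrative reading, not the report's own formulation: the function names, the toy training documents, the whitespace tokenization, and the choice to score a document by summing P(category | word) over its distinct words are all assumptions made for the example.

```python
from collections import defaultdict


def word_category_probabilities(docs):
    """docs: list of (category, text) pairs.
    Returns {word: {category: P(category | word)}}, treating the presence
    of a word in a document (one or more times) as a single attribute."""
    docs_in_cat = defaultdict(int)
    docs_with_word = defaultdict(lambda: defaultdict(int))
    for category, text in docs:
        docs_in_cat[category] += 1
        for word in set(text.lower().split()):  # presence only, not counts
            docs_with_word[word][category] += 1

    total = len(docs)
    probs = {}
    for word, per_cat in docs_with_word.items():
        # Bayes: P(c | w) = P(w | c) P(c) / sum over k of P(w | k) P(k)
        joint = {
            c: (per_cat[c] / docs_in_cat[c]) * (docs_in_cat[c] / total)
            for c in per_cat
        }
        evidence = sum(joint.values())
        probs[word] = {c: j / evidence for c, j in joint.items()}
    return probs


def category_list(probs, category):
    """Words ordered by decreasing P(category | word), ties broken by word."""
    pairs = [(w, p[category]) for w, p in probs.items() if category in p]
    return sorted(pairs, key=lambda wp: (-wp[1], wp[0]))


def score_document(probs, text):
    """Assumed comparison step: sum P(category | word) over the distinct
    words of the document, giving one probability sum per category."""
    sums = defaultdict(float)
    for word in set(text.lower().split()):
        for category, p in probs.get(word, {}).items():
            sums[category] += p
    return dict(sums)


# Hypothetical training data, invented for illustration only.
training = [
    ("radar", "antenna pulse signal"),
    ("radar", "pulse echo range"),
    ("computing", "memory program signal"),
    ("computing", "program compiler memory"),
]
probs = word_category_probabilities(training)
print(category_list(probs, "radar"))
print(score_document(probs, "pulse echo signal"))
```

Requiring two occurrences of a word, or occurrence in two documents, would amount to changing the counting condition in `word_category_probabilities`; the rest of the sketch would stay the same.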
National Research Council of Canada. Radio and Electrical Engineering Division
Report (National Research Council of Canada. Radio and Electrical Engineering Division : ERB), no. ERB-793.