Friday, September 5, 2014

Automatic keyword discovery

A given word may occur N1 times in an entire library of documents and N2 times in one particular document.  The words with the highest values of the ratio N2/N1 for a given document are the best keyword candidates for that document.  (A stoplist may be used as a preprocessor to discard very common words first.)  Such a scheme might also be useful for routing input to a collection of specialist AIs (see the 22 Aug. 2014 blog post).  Patterns could be counted as well as traditional words.
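
The ratio test can be sketched in a few lines of Python. This is only a minimal illustration under assumptions of my own: the tiny stoplist, the regex tokenizer, the tie-breaking rule (higher N2 first, then alphabetical), and the names `tokenize` and `keywords` are all hypothetical and not part of the scheme described above.

```python
import re
from collections import Counter

# Illustrative stoplist only; a real one would be much longer.
STOPLIST = {"the", "a", "an", "of", "and", "in", "to", "is"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop stoplisted words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPLIST]

def keywords(library, doc_index, top_k=3):
    """Rank the words of one document by N2/N1, highest ratio first."""
    doc_counts = [Counter(tokenize(doc)) for doc in library]
    n1 = Counter()                 # N1: occurrences across the whole library
    for counts in doc_counts:
        n1.update(counts)
    n2 = doc_counts[doc_index]     # N2: occurrences in the chosen document
    # Assumed tie-break: higher N2 first, then alphabetical order.
    ranked = sorted(n2, key=lambda w: (-n2[w] / n1[w], -n2[w], w))
    return ranked[:top_k]

library = [
    "the cat sat on the mat and the cat purred",
    "the dog chased the cat",
    "the dog barked at the mailman",
]
print(keywords(library, 1))
```

Note that any word found only in the chosen document gets the maximum ratio of 1, no matter how rarely it occurs, so in practice a minimum-count threshold might be worth adding.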

