Monday, October 1, 2012

The miscellaneous category (in a casebase)

If a case is "far enough" away from all existing clusters (categories), and fewer than N other cases have been found that are "near" to it, then add this case to a "miscellaneous" category.
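The rule above can be sketched roughly as follows. This is only an illustration under assumptions the post does not state: cases are numeric vectors, distance is Euclidean, each cluster is represented by its mean, and `far_threshold` and `near_radius` are hypothetical parameters standing in for "far enough" and "near".

```python
import math

MISC = "miscellaneous"

def classify_case(case, cluster_means, other_cases, n, far_threshold, near_radius):
    """Return a cluster index, MISC, or "new_cluster" for a single case.

    far_threshold and near_radius are hypothetical tuning parameters.
    """
    # Distance from the case to every existing cluster mean.
    dists = [math.dist(case, mean) for mean in cluster_means]
    if dists:
        nearest = min(range(len(dists)), key=dists.__getitem__)
        if dists[nearest] <= far_threshold:
            return nearest  # close enough to an existing cluster

    # The case is far from every cluster: count the cases found near it.
    neighbours = sum(1 for c in other_cases if math.dist(case, c) <= near_radius)
    if neighbours < n:
        return MISC  # too few similar cases to justify a cluster of its own
    return "new_cluster"
```

For example, with cluster means at (0, 0) and (10, 10), a case at (5, 5) with only one nearby case and N = 3 would land in "miscellaneous".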

Search and employ the miscellaneous category as if it were a cluster of its own. (But do not define a mean or standard deviation for it. Its members have nothing in common, so it is not compressible into summary statistics.)
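One way to picture searching "miscellaneous" alongside the real clusters: ordinary clusters are compared through their means, while miscellaneous cases are compared one by one, since there is no summary to compare against. A minimal sketch, assuming numeric case vectors and Euclidean distance (the function name `best_match` is made up for illustration):

```python
import math

def best_match(query, cluster_means, misc_cases):
    """Return ("cluster", i) or ("miscellaneous", i) for the closest candidate."""
    # Ordinary clusters are summarized, so compare against each mean.
    candidates = [(math.dist(query, m), "cluster", i)
                  for i, m in enumerate(cluster_means)]
    # "Miscellaneous" has no mean, so compare against each stored case.
    candidates += [(math.dist(query, c), "miscellaneous", i)
                   for i, c in enumerate(misc_cases)]
    if not candidates:
        return None
    _, kind, index = min(candidates)
    return kind, index
```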

Remove a case from "miscellaneous" and form a new cluster once at least N cases near it have been found (i.e., when enough cases like this one are discovered). (A reasonable value for N might be estimated by looking at the number of cases in each of the other categories, and taking the mean and standard deviation of those counts. N should be set a few standard deviations below the mean. The size of the "miscellaneous" category should also be kept similar to the size of the other categories.)
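Here is a rough sketch of both ideas: estimating N from the sizes of the existing categories, and pulling a group out of "miscellaneous" once it is big enough. The multiplier `k`, the floor of 1, the use of the population standard deviation, and `near_radius` are all assumptions made for illustration, as are the function names.

```python
import math
import statistics

def estimate_n(cluster_sizes, k=2.0):
    """N = mean cluster size minus k standard deviations, never below 1."""
    mean = statistics.mean(cluster_sizes)
    std = statistics.pstdev(cluster_sizes)  # population standard deviation
    return max(1, round(mean - k * std))

def promote(misc, n, near_radius):
    """Split off the first misc case that has at least n near neighbours.

    Returns (new_cluster, remaining_misc), or (None, misc) if nothing qualifies.
    """
    for i, case in enumerate(misc):
        near = [j for j, other in enumerate(misc)
                if j != i and math.dist(case, other) <= near_radius]
        if len(near) >= n:
            members = {i, *near}
            new_cluster = [misc[j] for j in sorted(members)]
            remaining = [c for j, c in enumerate(misc) if j not in members]
            return new_cluster, remaining
    return None, misc
```

With category sizes of 10, 12, 8 and 10 and k = 2, the mean is 10 and the standard deviation is about 1.41, so `estimate_n` gives N = 7. Keeping "miscellaneous" itself at a size similar to the other categories is not handled here.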
