If a case is "far enough" away from all existing clusters (categories), and fewer than N other cases have been found that are "near" to it, then add this case to a "miscellaneous" category.
Search and use the miscellaneous category as if it were a cluster of its own. (But do not define a mean and standard deviations for it. It is not compressible, so its cases have to be kept individually.)
Remove a case from "miscellaneous" and form a new cluster if and when N is exceeded (i.e., when enough cases like this one are discovered). (A reasonable value for N might be estimated by counting the cases in each of the other categories and taking the mean and standard deviation of those counts. N should be set a few standard deviations below the mean. The size of the "miscellaneous" category should also be kept similar to the size of the other categories.)
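The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive implementation: the names `estimate_n`, `centroid`, and `MiscClusterer` are hypothetical, cases are assumed to be numeric tuples compared by Euclidean distance, "far enough" and "near" are assumed to be fixed thresholds (`far`, `near`), and "a few standard deviations" is the parameter `k`. The suggestion to keep "miscellaneous" about the same size as the other categories is not enforced here.

```python
import math
import statistics


def estimate_n(cluster_sizes, k=2.0):
    """Set N a few (k) standard deviations below the mean cluster size."""
    mean = statistics.mean(cluster_sizes)
    spread = statistics.pstdev(cluster_sizes)
    return max(1, math.floor(mean - k * spread))


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))


class MiscClusterer:
    def __init__(self, far, near, n, clusters=None):
        self.far = far    # assumed "far enough" distance threshold
        self.near = near  # assumed "near" distance threshold
        self.n = n
        self.clusters = [list(c) for c in (clusters or [])]
        self.misc = []    # kept case by case: no mean or std dev is stored

    def add(self, case):
        # Join the closest existing cluster unless the case is far from all.
        if self.clusters:
            best = min(self.clusters, key=lambda c: math.dist(case, centroid(c)))
            if math.dist(case, centroid(best)) < self.far:
                best.append(case)
                return "cluster"
        # Count the other miscellaneous cases that are near this one.
        neighbours = [m for m in self.misc if math.dist(case, m) <= self.near]
        if len(neighbours) < self.n:
            self.misc.append(case)
            return "miscellaneous"
        # Enough similar cases: pull them out and form a new cluster.
        self.misc = [m for m in self.misc if m not in neighbours]
        self.clusters.append(neighbours + [case])
        return "new cluster"
```

With `far=5.0`, `near=1.0`, and `n=2`, the first two outlying cases that sit close together land in "miscellaneous", and a third one nearby promotes all three to a new cluster. Treating "N is exceeded" as "N or more near neighbours, plus the new case itself" is one reading of the rule; a stricter boundary would work equally well.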