National Research Council of Canada. NRC Institute for Information Technology
Workshop on Data Mining Methods for Anomaly Detection held in conjunction with the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 21-24, 2005, Chicago, IL, USA
Anomalies are rare events. For anomaly detection, severe class imbalance is the norm. Although there has been much research into imbalanced classes, there are surprisingly few examples of dealing with severe imbalance. Alternative performance measures have superseded error rate, or accuracy, for algorithm comparison. But whatever their other merits, they tend to obscure the severe imbalance problem. We instead use the relative cost reduction of a classifier over a trivial classifier that always chooses the less costly class. We show that for applications that are inherently noisy there is a limit to the cost reduction achievable. Even a Bayes optimal classifier has a vanishingly small reduction in costs as imbalance increases. If events are rare and not too costly, the unpalatable conclusion is that our learning algorithms can do little. If the events have a higher cost, then a large number of false alarms must be tolerated, even if the end user finds that undesirable.
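To make the measure concrete, here is a minimal sketch of the relative cost reduction of a Bayes optimal classifier over the trivial classifier. It assumes a toy noisy problem that is not taken from the paper. Negatives are drawn from N(0, 1) and anomalies (prior p) from N(d, 1). The costs c_fn (a missed anomaly) and c_fp (a false alarm) are also chosen for illustration. The names `bayes_analysis`, `upper_tail`, `d`, `c_fn` and `c_fp` are hypothetical. With heavy class overlap (d = 1) and equal costs, the reduction should shrink toward zero as p falls. Raising c_fn restores a useful reduction, but only at the price of many false alarms per real detection.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, using erfc to stay accurate in the far tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bayes_analysis(p: float, d: float = 1.0, c_fn: float = 1.0, c_fp: float = 1.0):
    """Return (relative_cost_reduction, false_alarms_per_detection).

    Hypothetical toy model (an assumption, not the paper's setup):
    negatives ~ N(0, 1), anomalies ~ N(d, 1), anomaly prior p.
    """
    # Bayes rule: flag x when p * c_fn * f_pos(x) > (1 - p) * c_fp * f_neg(x).
    # For equal-variance Gaussians this reduces to the threshold x > t.
    t = d / 2.0 + math.log((1.0 - p) * c_fp / (p * c_fn)) / d
    miss_rate = 1.0 - upper_tail(t - d)   # P(x <= t | anomaly)
    false_alarm_rate = upper_tail(t)      # P(x > t | normal)
    bayes_cost = p * c_fn * miss_rate + (1.0 - p) * c_fp * false_alarm_rate

    # Trivial classifier: always predict whichever class is cheaper overall.
    trivial_cost = min(p * c_fn, (1.0 - p) * c_fp)
    reduction = (trivial_cost - bayes_cost) / trivial_cost

    detections = p * (1.0 - miss_rate)    # expected true alarms per instance
    false_alarms = (1.0 - p) * false_alarm_rate
    ratio = float("inf") if detections == 0.0 else false_alarms / detections
    return reduction, ratio


if __name__ == "__main__":
    # Equal costs: the achievable reduction collapses as anomalies get rarer.
    for p in (0.5, 0.1, 0.01, 0.001):
        r, _ = bayes_analysis(p)
        print(f"p={p}: relative cost reduction {r:.2e}")
    # Costly anomalies: a real reduction, paid for with many false alarms.
    r, fa = bayes_analysis(0.01, c_fn=100.0)
    print(f"c_fn=100, p=0.01: reduction {r:.2f}, {fa:.0f} false alarms per detection")
```

The closed-form threshold is used only because equal-variance Gaussians make the Bayes rule exact. With real data the class-conditional densities are unknown, and a learned classifier can only do worse than this bound.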
KDD-2005: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 21-24, 2005, Chicago, Illinois, USA: 21–24.