Evaluation Methods for Machine Learning Workshop of the Twenty-First National Conference on Artificial Intelligence, July 16-20, 2006, Boston, Massachusetts, USA
In 1988, Langley wrote an influential editorial in the journal Machine Learning titled “Machine Learning as an Experimental Science”, arguing persuasively for a greater focus on performance testing. Since then, that emphasis has grown progressively stronger. Nowadays, to be accepted to one of our major conferences or journals, a paper must typically contain a large experimental section with many tables of results, concluding with a statistical test. In revisiting Langley's editorial, I claim that we have ignored most of its advice. We have focused largely on only one aspect, hypothesis testing, and a narrow version of it at that. This version provides us with evidence that is much more impoverished than many people realize. I argue that such tests are of limited utility either for comparing algorithms or for promoting progress in our field. They should therefore not play so prominent a role in our work and publications.