Warning: statistical benchmarking is addictive, kicking the habit in machine learning

DOI	https://doi.org/10.1080/09528130903010295
Author	Drummond, Chris¹; Japkowicz, Nathalie
Affiliation	National Research Council of Canada. NRC Institute for Information Technology
Format	Text, Article
Subject	machine learning; algorithm evaluation; benchmarking; null hypothesis tests
Abstract	Algorithm performance evaluation is so entrenched in the machine learning community that one could call it an addiction. Like most addictions, it is harmful and very difficult to give up. It is harmful because it has serious limitations. Yet, we have great faith in practicing it in a ritualistic manner: we follow a fixed set of rules telling us the measure, the data sets and the statistical test to use. When we read a paper, even as reviewers, we are not sufficiently critical of results that follow these rules. Here, we will debate what are the limitations and how to best address them. This article may not cure the addiction but hopefully it will be a good first step along that road.
Publication date	2009-12-18
Publisher	Taylor & Francis
In	Journal of Experimental and Theoretical Artificial Intelligence 22, no. 1: 67–80.
Language	English
Peer reviewed	Yes
NPARC number	23002090
Record identifier	a579582d-6412-4e35-b434-e89e008e276a
Record created	2017-08-10
Record modified	2020-04-16