National Research Council of Canada. Information and Communication Technologies
Text, Book Chapter
19th Iberoamerican Congress, CIARP 2014, November 2-5, 2014, Puerto Vallarta, Mexico
knowledge discovery; imbalanced data; gene expression data
The prime motivation for pattern discovery and machine learning research has been the collection and warehousing of large amounts of data in many domains, such as the life sciences and industrial processes. Among the unique problems that have arisen are situations where the data is imbalanced. The class imbalance problem corresponds to situations where the majority of cases belong to one class and a small minority belong to the other, which in many cases is equally or even more important. To deal with this problem, a number of approaches have been studied in the past. In this talk, we provide an overview of some existing methods and present novel applications that are based on identifying the inherent characteristics of one class versus the other. We present the results of a number of studies focusing on real data from life science applications.
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (2014): 159–166.