DOI | https://doi.org/10.1109/ICDE55515.2023.00114 |
---|
Author | Dablain, Damien A.; Bellinger, Colin (1); Krawczyk, Bartosz; Chawla, Nitesh V. |
---|
Affiliation | (1) National Research Council of Canada. Digital Technologies |
---|
Format | Text, Article |
---|
Conference | 2023 IEEE 39th International Conference on Data Engineering (ICDE), April 3-7, 2023, Anaheim, CA, USA |
---|
Subject | machine learning; deep learning; class imbalance; over-sampling; training; source coding; earth observing system; neural networks; training data; data augmentation |
---|
Abstract | Deep learning models may not effectively generalize across under-represented or minority classes. We empirically study a convolutional neural network’s (CNN) internal representation of imbalanced image data and measure the generalization gap between a model’s feature embeddings in the training and test sets, showing that the gap is wider for minority classes. This insight enables us to design an efficient three-phase CNN training framework for imbalanced data. The framework involves training the network end-to-end on imbalanced data to learn feature embeddings, performing data augmentation in the learned embedding space to balance the training data distribution, and fine-tuning the classifier head on the embedded balanced training data. We develop Expansive Over-Sampling (EOS) as a data augmentation technique to utilize in the training framework. EOS forms synthetic training instances as convex combinations between the minority class samples and their nearest adversaries in the embedding space to reduce the generalization gap. The proposed framework improves the accuracy over leading cost-sensitive and resampling methods commonly used in imbalanced learning. Moreover, it is more computationally efficient than standard data pre-processing methods, such as SMOTE and GAN-based over-sampling, as it requires fewer parameters and less training time. The source code for the proposed framework is available at: https://github.com/dd1github/EOS. |
---|
Publication date | 2023-04-03 |
---|
Publisher | IEEE |
---|
Language | English |
---|
Peer reviewed | Yes |
---|
Record identifier | e36d8c80-1502-47d5-bfee-59c7849838b9 |
---|
Record created | 2023-08-09 |
---|
Record modified | 2023-08-10 |
---|
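
The abstract describes EOS as forming synthetic minority instances as convex combinations between minority class samples and their nearest adversaries in the embedding space. The following is a minimal sketch of that idea only, not the authors' implementation (which is available at the repository linked in the abstract). The function name `eos_oversample`, the parameter `k`, the use of Euclidean distance, and the uniform sampling of the mixing weight are all assumptions made for illustration.

```python
import numpy as np


def eos_oversample(emb, labels, minority, n_new, k=5, seed=0):
    """Hypothetical sketch: synthesize n_new minority points in embedding space.

    Each synthetic point lies on the segment between a minority embedding and
    one of its k nearest "adversaries" (embeddings from any other class).
    """
    rng = np.random.default_rng(seed)
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)

    minority_pts = emb[labels == minority]
    adversaries = emb[labels != minority]
    if len(minority_pts) == 0 or len(adversaries) == 0:
        raise ValueError("need at least one minority and one adversary sample")
    k = min(k, len(adversaries))

    # Euclidean distance from every minority point to every adversary
    # (assumption: the source does not state the metric here).
    dists = np.linalg.norm(
        minority_pts[:, None, :] - adversaries[None, :, :], axis=2
    )
    nearest = np.argsort(dists, axis=1)[:, :k]  # shape (n_minority, k)

    # Pick a random minority base point and one of its k nearest adversaries.
    base_idx = rng.integers(0, len(minority_pts), size=n_new)
    adv_idx = nearest[base_idx, rng.integers(0, k, size=n_new)]

    # Convex combination: x_new = x + lam * (x_adv - x), with lam in [0, 1).
    lam = rng.random((n_new, 1))
    base = minority_pts[base_idx]
    synthetic = base + lam * (adversaries[adv_idx] - base)

    # Synthetic points keep the minority label.
    return synthetic, np.full(n_new, minority, dtype=labels.dtype)
```

In the three-phase framework the abstract outlines, a step like this would sit between learning the embeddings and fine-tuning the classifier head: the synthetic embeddings are appended to the real ones so that the head is trained on a balanced set.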