Domain Generalization via Balancing Training Difficulty and Model Capability

S-Lab, Nanyang Technological University, Singapore


ICCV 2023

Illustration of the proposed MoDify framework. Training domain-generalizable models often suffers from clear under-fitting (or over-fitting) when over-difficult (or over-easy) training samples keep being fed, especially at the early (or late) training stage; both lead to degraded generalization of the trained models (illustrated by the yellow/blue lines). Inspired by Flow Theory, which holds that a learner usually achieves better learning outcomes when the learner's skill and the task difficulty are well aligned (i.e., lie within the Flow Channel), the proposed MoDify schedules the training samples adaptively according to the alignment between sample difficulty and the capability of the contemporarily trained model (illustrated by the red line).

Abstract

Domain generalization (DG) aims to learn, from one or multiple source domains, models that perform well in unseen target domains. Despite recent progress, most existing work suffers from a misalignment between the difficulty of training samples and the capability of the contemporarily trained model, leading to over-fitting or under-fitting of the trained model. We design MoDify, a Momentum Difficulty framework that tackles this misalignment by balancing the seesaw between the model's capability and the samples' difficulty throughout training. MoDify consists of two novel designs that collaborate to fight the misalignment while learning domain-generalizable models. The first, MoDify-based Data Augmentation, exploits an RGB Shuffle technique to generate difficulty-aware training samples on the fly. The second, MoDify-based Network Optimization, dynamically schedules the training samples for balanced and smooth learning at appropriate difficulty. Without bells and whistles, a simple implementation of MoDify achieves superior performance across multiple benchmarks. In addition, MoDify can complement existing methods as a plug-in, and it is generic enough to work for different visual recognition tasks.
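To make the RGB Shuffle idea concrete, the sketch below blends an image with a random permutation of its own color channels, where a strength parameter controls how difficult the augmented sample becomes. The function name, the blending formulation, and the strength parameterization are illustrative assumptions; the paper's actual implementation may differ.

```python
import numpy as np

def rgb_shuffle(image, strength, rng=None):
    """Blend an image with a randomly channel-permuted copy of itself.

    `strength` in [0, 1] acts as the augmentation degree: 0 returns the
    original image, 1 a fully channel-shuffled one. (Illustrative sketch;
    not the paper's exact formulation.)
    """
    rng = rng or np.random.default_rng()
    perm = rng.permutation(3)          # random reordering of R, G, B
    shuffled = image[..., perm]        # apply permutation on channel axis
    return (1.0 - strength) * image + strength * shuffled
```

Under this sketch, MoDify-DA would raise `strength` as the model's capability grows, so harder color-shuffled samples only appear once the model can benefit from them.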

Method Overview

Overall architecture of the proposed Momentum Difficulty (MoDify) framework. In the MoDify-DA flow (highlighted by blue arrows), the network takes the original image as input, computes its loss, and uses the loss together with the Loss Bank to estimate the sample's difficulty level; MoDify-DA then dynamically adjusts the strength of data augmentation accordingly. In the MoDify-NO flow (highlighted by red arrows), the network takes the augmented image as input, and the difficulty level of the augmented image is computed in the same way; MoDify-NO then decides whether to postpone, drop, or learn from the sample. Note that a sample is fed for training only if its difficulty level is aligned with the model's capability. Additionally, MoDify-DA introduces little computational overhead as it involves no backpropagation.
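The flow above can be sketched as follows: a sliding window of recent losses serves as the Loss Bank, a new sample's difficulty is its rank within that window, and a simple band test yields the postpone/drop/learn decision. Class and function names, the rank-based difficulty estimate, and the thresholds are all illustrative assumptions rather than the paper's exact design.

```python
from collections import deque

class LossBank:
    """Sliding window of recent sample losses (illustrative sketch).

    A new loss is mapped to a difficulty level in [0, 1] by its rank
    among the stored losses: higher loss => higher difficulty.
    """

    def __init__(self, capacity=1024):
        self.losses = deque(maxlen=capacity)

    def difficulty(self, loss):
        if not self.losses:
            return 0.5  # no information yet: assume medium difficulty
        rank = sum(l < loss for l in self.losses)
        return rank / len(self.losses)

    def update(self, loss):
        self.losses.append(loss)

def schedule(difficulty, low=0.3, high=0.7):
    """MoDify-NO-style decision (thresholds are assumptions): learn from
    samples whose difficulty matches the model's current capability band,
    postpone over-hard ones, and drop over-easy ones."""
    if difficulty > high:
        return "postpone"
    if difficulty < low:
        return "drop"
    return "learn"
```

Because the difficulty estimate reuses losses already produced by the forward pass, this bookkeeping adds essentially no backpropagation cost, consistent with the low-overhead claim above.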

Visualization

Visualization of the model's capability versus the augmentation degree for new training samples (indicating the difficulty level of the augmented training samples) along the training iterations. Colors indicate different training iterations, ranging from red to blue as the number of iterations increases. The illustration shows that a low (or high) data augmentation degree is automatically adopted to generate training samples of low (or high) difficulty at the early (or late) training stage, when the capability of the contemporarily trained model is low (or high).
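The smooth growth of the augmentation degree with model capability suggests a momentum-style running estimate, in line with the framework's name (Momentum Difficulty). The one-liner below is a hedged sketch of such an update; the function name and the momentum coefficient `m` are assumptions, not the paper's reported values.

```python
def update_difficulty(running_diff, batch_diff, m=0.9):
    """Momentum update of a running difficulty estimate that could drive
    the augmentation degree (illustrative; `m` is an assumed coefficient)."""
    return m * running_diff + (1.0 - m) * batch_diff
```

With a high `m`, the running estimate reacts slowly to noisy per-batch difficulties, which would produce the gradual early-to-late increase seen in the visualization.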

Qualitative Illustration

Qualitative illustration of domain-generalizable semantic segmentation for GTAV to Cityscapes, BDD, and Mapillary. White boxes highlight regions with clear differences across the compared methods. Compared with other methods, MoDify predicts better building shapes in Row 1, better sidewalks in Row 2, and more accurate fence structures in Row 3.

BibTeX


          @inproceedings{jiang2023domain,
            title={Domain generalization via balancing training difficulty and model capability},
            author={Jiang, Xueying and Huang, Jiaxing and Jin, Sheng and Lu, Shijian},
            booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
            pages={18993--19003},
            year={2023}
          }