[2111.03056] Bootstrap Your Object Detector via Mixed Training

Significance

Augmentation and pseudo-labeling enhance object detection

Keypoints

  • Propose a training framework using augmentation and pseudo-labeling for improving object detection model performance
  • Demonstrate performance improvement in object detection with the proposed method compared to simple training frameworks

Review

Background

Object detection models are often trained in a supervised way on pairs of an input image (with augmentation) and its corresponding human-annotated bounding box coordinates. This simple training framework is straightforward, but it is easily affected by input-label mismatch, caused either by strong augmentation discarding relevant input features or by human annotation errors. The authors address this input-label mismatch with a form of bootstrapping during training: they provide separate augmentation pathways (normal / strong) and revise the human annotations with pretrained detection models.

Keypoints

Propose a training framework using augmentation and pseudo-labeling for improving object detection model performance

211105-1 Schematic illustration of the proposed method

As mentioned in the background section, the authors propose a training framework that separates input image augmentation into normal and strong augmentation paths. For augmented images (middle and bottom rows of the figure above), probability maps of the objects are obtained from the detector model being trained. While normally augmented images are simply trained with human-annotated labels, strongly augmented images are trained with refined labels.
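The two supervision paths can be sketched as a single training step. This is a minimal illustration under assumed interfaces, not the paper's implementation: `StubDetector`, `normal_aug`, `strong_aug`, and `refine` are hypothetical placeholder names.

```python
class StubDetector:
    """Stand-in detector exposing only the interface assumed below."""
    def loss(self, image, labels):
        # Dummy loss: just counts supervision boxes (placeholder for a
        # real detection loss on the augmented image).
        return len(labels)

    def predict(self, image):
        # Dummy predictions: (box, score) pairs from the EMA teacher.
        return [((0, 0, 10, 10), 0.95)]


def mix_training_step(image, human_labels, student, teacher,
                      normal_aug, strong_aug, refine):
    """One MixTraining-style step, sketched under assumed interfaces."""
    # Path 1: normal augmentation, supervised by human annotations.
    loss_normal = student.loss(normal_aug(image), human_labels)
    # Path 2: strong augmentation, supervised by refined labels obtained
    # by correcting the human boxes with the teacher's predictions.
    refined = refine(human_labels, teacher.predict(image))
    loss_strong = student.loss(strong_aug(image), refined)
    return loss_normal + loss_strong
```

In this sketch the strong path never sees the raw human labels directly, which is the point of the design: supervision for heavily distorted inputs comes from the refined targets instead.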

To obtain the refined labels, an exponential moving average (EMA) of the detector is used to predict bounding box annotations. The final training targets are obtained by combining the human labels with these predictions: predictions with IoU under 0.5 against the annotations add missing boxes (missing label correction), while those with IoU over 0.5 replace noisy annotations (noisy label correction). The denoised labels with IoU over 0.5 are further filtered so that only predictions with score over 0.9 are used, because low-score predictions may not correspond to features that survive strong augmentation of the input.
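A minimal sketch of this hybrid correction, assuming axis-aligned `(x1, y1, x2, y2)` boxes and the thresholds quoted above (0.5 IoU, 0.9 score); the one-pass matching here is a simplification of the paper's scheme:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)


def refine_labels(annotations, predictions, iou_thr=0.5, score_thr=0.9):
    """Hybrid label correction (simplified); predictions is [(box, score)]."""
    refined = list(annotations)
    for box, score in predictions:
        ious = [iou(box, gt) for gt in annotations]
        if not ious or max(ious) < iou_thr:
            # Missing label correction: the prediction covers an object
            # absent from the human annotations, so add it.
            refined.append(box)
        elif score > score_thr:
            # Noisy label correction: a high-confidence prediction
            # replaces the overlapping (possibly inaccurate) annotation.
            refined[max(range(len(ious)), key=ious.__getitem__)] = box
    return refined


def ema_update(teacher, student, momentum=0.999):
    """EMA of student weights into the teacher (flat parameter lists)."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher, student)]
```

For example, a high-score prediction overlapping an annotation replaces it, while an isolated prediction is appended as a recovered missing box.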

211105-2 Pseudo-labeling scheme. (a) Blue and (b) green boxes are human annotations and model predictions, respectively. Application of (c) missing label correction, (d) noisy label correction, and (e) hybrid (both).

Demonstrate performance improvement in object detection with the proposed method compared to simple training frameworks

Performance of the proposed method (MixTraining) is validated on the COCO2017 dataset by comparing it with a simple training (SiTraining) framework that uses a single augmentation (normal or strong).

211105-3 Definition of normal and strong augmentation

211105-4 Quantitative performance of MixTraining compared to SiTraining

Experiments demonstrate that MixTraining is especially beneficial under longer training schedules, suggesting a stabilizing effect of MixTraining compared to SiTraining.

211105-5 MixTraining is beneficial for longer training

For further ablation study results, refer to the original paper.
