[2106.08062] SSMix: Saliency-Based Span Mixup for Text Classification

Significance

Intuitive mixup augmentation for text data

Keypoints

  • Propose a token-level mixup augmentation method for text classification
  • Demonstrate improvement of text classification performance using the proposed augmentation

Review

Background

Mixup is a widely adopted data augmentation method for computer vision tasks. Applying mixup to natural language processing is more challenging due to the discrete nature of text data and variable sequence lengths. For this reason, most previous attempts to bring mixup to natural language operate at the hidden-feature level. The authors argue that mixup augmentation at the input token level, as in the case of mixup for image data, would yield better performance, and propose a saliency-guided span mixup method.

Keypoints

Propose a token-level mixup augmentation method for text classification

The authors propose SSMix, an intuitive saliency-based mixup method that operates on the input token level.

210616-1 Illustration of the proposed SSMix

Given a pair of inputs $(x^{A},x^{B})$ with corresponding labels $(y^{A},y^{B})$, the augmented input $\tilde{x}$ is obtained by replacing a span from $x^{A}$ with a span of the same length from $x^{B}$. The spans are selected by the gradient-based saliency $s=||\partial \mathcal{L} / \partial e||_{2}$, where $e$ denotes a token embedding: the span with the lowest saliency in $x^{A}$ is replaced by the span with the highest saliency in $x^{B}$. The mixup ratio for computing the augmented label $\tilde{y}$ is the length of the replaced span relative to the total length of $\tilde{x}$.
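The span selection and label-ratio computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of plain token-id lists, and precomputed per-token saliency scores (e.g. the L2 norm of the loss gradient with respect to each token embedding) are all assumptions for the example.

```python
import numpy as np

def ssmix(tokens_a, tokens_b, sal_a, sal_b, span_len):
    """Sketch of SSMix span replacement (hypothetical helper).

    tokens_*: lists of token ids; sal_*: per-token saliency scores,
    assumed precomputed as ||dL/de||_2 for each token embedding e.
    Returns the mixed token list and the mixup ratio for the label.
    """
    # start index of the LEAST-salient span in A (the span to be replaced)
    scores_a = [sum(sal_a[i:i + span_len]) for i in range(len(sal_a) - span_len + 1)]
    start_a = int(np.argmin(scores_a))
    # start index of the MOST-salient span in B (the span to insert)
    scores_b = [sum(sal_b[i:i + span_len]) for i in range(len(sal_b) - span_len + 1)]
    start_b = int(np.argmax(scores_b))
    mixed = (tokens_a[:start_a]
             + tokens_b[start_b:start_b + span_len]
             + tokens_a[start_a + span_len:])
    # label mixup ratio = replaced-span length / total length of the mixed input
    ratio = span_len / len(mixed)
    return mixed, ratio
```

For example, with `span_len=2`, the two lowest-saliency tokens of A are swapped for the two highest-saliency tokens of B, and the label becomes a 2/len(mixed) blend of $y^{B}$ into $y^{A}$.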

Theoretically, input token-level mixup can be thought of as a nonlinear combination of the input pair $(x^{A},x^{B})$, covering a higher-dimensional subspace of the input space, while the linear interpolation of previous hidden-level mixup methods stays within the one-dimensional subspace between $x^{A}$ and $x^{B}$.

210616-2 Subspace coverage of SSMix (pink) and hidden-level mixup (black)
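For contrast, the hidden-level baselines (e.g. TMix) interpolate hidden states linearly, so every mixed representation lies on the segment between the two originals. A minimal numeric illustration, with toy hidden states assumed for the example:

```python
import numpy as np

# hidden-level mixup: linear interpolation of hidden states (toy values)
h_a = np.array([1.0, 0.0])  # hidden state of example A
h_b = np.array([0.0, 1.0])  # hidden state of example B
lam = 0.3                   # interpolation coefficient

# the mixed state is always on the 1-D segment between h_a and h_b
h_mix = lam * h_a + (1 - lam) * h_b
# the label is mixed with the same coefficient: lam on y_a, (1 - lam) on y_b
```

No choice of `lam` can leave this segment, which is the one-dimensional subspace the review refers to; SSMix's span replacement is not expressible as such an interpolation.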

Demonstrate improvement of text classification performance using the proposed augmentation

The effectiveness of SSMix is evaluated on eight datasets: SST-2, MNLI, QNLI, RTE, MRPC, and QQP (from the GLUE benchmark), plus TREC and ANLI. These cover both single-sentence and sentence-pair classification tasks. Average accuracy with SSMix is significantly higher than that of the baseline hidden-level mixup methods (EmbedMix, TMix). Ablation results further suggest that both the saliency-based selection and the span-level replacement of SSMix are crucial to its performance.

210616-3 Comparative and ablation study results of the proposed SSMix

The experimental results suggest that this simple and intuitive method can improve text classification performance.
