[2103.15061] Invertible Image Signal Processing

Significance

Keep the JPEG, then RAW is in your hand

Keypoints

  • Apply an invertible neural network to solve the RAW-to-RGB image signal processing problem in an end-to-end manner
  • Show qualitative and quantitative strengths over a few baseline models

Review

Background

The invertible neural network (INN) is a class of generative models, alongside the generative adversarial network (GAN) and the variational auto-encoder (VAE). Although INNs have a solid theoretical foundation and an elegant implementation, their practical use for image generation has been limited by their under-performance relative to GANs and VAEs. However, recent works have started to recognize the potential of INNs for image-processing problems that benefit from explicit invertibility. This work likewise exploits the invertibility of the INN to train an end-to-end model that converts RAW image data to compressed JPEG data, and vice versa.

Keypoints

Apply an invertible neural network to solve the RAW-to-RGB image signal processing problem in an end-to-end manner

Figure: Overall pipeline of the InvISP framework

The building block of the network is the affine coupling layer of RealNVP. Following IRN, the output of an arbitrary function $r$ that takes the second split as input is added to the first split, which increases the expressivity of the model. Also, a 1$\times$1 convolution layer is used as a learnable permutation function, following Glow. (The theoretical background of the affine coupling layer is not discussed in this post, but it is really worth knowing. NICE and RealNVP are some of the earliest papers that build layers which are invertible by construction and whose Jacobian is a triangular matrix, so its determinant is cheap to compute!)
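To make the coupling structure concrete, here is a minimal sketch in plain Python, assuming the enhanced form described above: the first split is updated additively with $r$ of the second split, and the second split is then scaled and shifted by functions of the updated first split. The functions `r`, `s`, and `t` below are hypothetical stand-ins (in the actual model they would be learned sub-networks), and the 1$\times$1 convolution is omitted.

```python
import math

# Hypothetical stand-ins for the learned sub-networks. They can be arbitrary
# functions and do NOT need to be invertible themselves.
def r(x2):
    return [0.5 * v for v in x2]

def s(y1):
    return [math.tanh(v) for v in y1]

def t(y1):
    return [v * v for v in y1]

def coupling_forward(x):
    """Forward pass of one coupling layer on a flat, even-length vector."""
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    # Additive update of the first split from the second split.
    y1 = [a + b for a, b in zip(x1, r(x2))]
    # Affine update of the second split, conditioned on the updated first split.
    y2 = [b * math.exp(sc) + sh for b, sc, sh in zip(x2, s(y1), t(y1))]
    return y1 + y2

def coupling_inverse(y):
    """Exact inverse: undo the two updates in reverse order."""
    d = len(y) // 2
    y1, y2 = y[:d], y[d:]
    x2 = [(b - sh) * math.exp(-sc) for b, sc, sh in zip(y2, s(y1), t(y1))]
    x1 = [a - b for a, b in zip(y1, r(x2))]
    return x1 + x2

x = [0.3, -1.2, 2.0, 0.7]
x_rec = coupling_inverse(coupling_forward(x))
print(max(abs(a - b) for a, b in zip(x, x_rec)))  # reconstruction error
```

The inverse only ever evaluates `r`, `s`, and `t` in the forward direction, which is why the layer stays invertible no matter how complex those functions are.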

Although the basic building block is merely a combination of known modules, the novelty of InvISP lies in its differentiable JPEG simulator. The authors note that the quantization step makes JPEG compression lossy, and that its rounding function has zero gradient almost everywhere. They therefore replace the rounding function with a differentiable approximation based on the Fourier series:
\begin{equation} Q(I) = I - \frac{1}{\pi} \sum\nolimits^{K}_{k=1}\frac{(-1)^{k+1}}{k}\sin(2 \pi k I), \end{equation}
where $I$ is the input map after splitting, and $K$ is the tradeoff hyperparameter.

Figure: Differentiable approximate rounding function

Now that the JPEG compression is differentiable, it can be incorporated into the backpropagation of the INN in an end-to-end manner.
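The Fourier-series rounding can be checked numerically with a few lines of plain Python. This is only a sketch of the formula for $Q(I)$ applied to a scalar; the name `soft_round` and the default `K=10` are assumptions for illustration, not values taken from the paper.

```python
import math

def soft_round(x, K=10):
    """Truncated Fourier-series approximation of round(x).

    x - round(x) is a sawtooth wave with period 1, and the sum below is its
    Fourier series truncated after K terms.
    """
    series = sum(
        ((-1) ** (k + 1)) / k * math.sin(2 * math.pi * k * x)
        for k in range(1, K + 1)
    )
    return x - series / math.pi

# Compare against hard rounding for a few inputs and truncation levels.
for value in (0.2, 1.4, 2.2, 3.8):
    print(value, round(value), soft_round(value, K=5), soft_round(value, K=50))
```

A larger `K` tracks hard rounding more closely away from the half-integers, at the cost of a steeper, more oscillatory curve near them, which is the tradeoff that $K$ controls.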

Show qualitative and quantitative strengths over a few baseline models

The first experiment demonstrates the quantitative performance (PSNR/SSIM) of the proposed method against the other baselines.

Figure: Quantitative performance of the proposed method
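For readers unfamiliar with the metric, PSNR is a log-scaled mean squared error between a reference image and its reconstruction. The sketch below uses the standard definition on flat lists of pixel values; the function name `psnr` and the assumption that pixels are normalized to a peak value of 1.0 are mine, not details from the paper.

```python
import math

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

reference = [0.0, 0.25, 0.5, 1.0]
estimate = [0.01, 0.24, 0.52, 0.98]
print(psnr(reference, estimate))
```

Higher is better, and because the scale is logarithmic, a gain of a few dB corresponds to a substantial reduction in mean squared error.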

Given the invertibility of the proposed model, the qualitative improvement in JPEG-to-RAW reconstruction can be a main strength of the proposed method. The authors provide examples of qualitative improvement over UPI, CycleISP, Invertible Grayscale, and the U-net.

Figure: Qualitative improvement over SOTA methods, demonstrated by difference image

Figure: Qualitative improvement over baseline methods, demonstrated by difference image

For the proposed model to hold practical significance, the compression ratio should be at least comparable to that of the conventional RAW to RGB image processing followed by JPEG compression. The authors show that the file size is remarkably reduced, even when compared with lossy DNG.

Figure: Compression ratio of the proposed model
