Significance
Keypoints
- Propose a framework for fast forward/inverse sampling in trained diffusion models
- Demonstrate performance of the proposed method in image and audio generation
Review
Background
Diffusion models have recently gained attention as generative models that can sample high-quality image or audio data starting from a known, simple distribution.
The Denoising Diffusion Probabilistic Model (DDPM) is a
Keypoints
Propose a framework for fast forward/inverse sampling in trained diffusion models
The authors propose FastDPM, a method that accelerates sampling from diffusion models by approximating an already trained DDPM with a smaller number of steps, without retraining the model.
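To make the idea of approximating a trained DDPM with fewer steps concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear variance schedule with `T=1000`, `beta_1=1e-4`, and `beta_T=0.02` (common DDPM defaults, not taken from this review), picks `S` evenly spaced steps, and derives the variances of the shortened chain so that its cumulative noise level matches the original chain at the selected steps. The function names `linear_betas`, `alpha_bars`, and `shortened_schedule` are hypothetical, and FastDPM's own continuous-step construction is not reproduced here.

```python
def linear_betas(T=1000, beta_1=1e-4, beta_T=0.02):
    """Linear variance schedule beta_1..beta_T (assumed values, see lead-in)."""
    return [beta_1 + (beta_T - beta_1) * i / (T - 1) for i in range(T)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{i<=t} (1 - beta_i)."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def shortened_schedule(betas, S):
    """Pick S evenly spaced steps and derive variances for the short chain.

    The short-chain variance eta_s is chosen so that the product of
    (1 - eta) up to s equals alpha_bar at the selected step tau_s.
    """
    T = len(betas)
    abar = alpha_bars(betas)
    # 1-indexed steps tau_1 < ... < tau_S, with tau_S = T.
    taus = [((s + 1) * T) // S for s in range(S)]
    etas, prev = [], 1.0
    for tau in taus:
        cur = abar[tau - 1]
        etas.append(1.0 - cur / prev)
        prev = cur
    return taus, etas


betas = linear_betas()
taus, etas = shortened_schedule(betas, S=10)
for tau, eta in zip(taus, etas):
    print(f"step {tau:4d}  variance {eta:.4f}")
```

Each variance in the shortened chain is much larger than the original per-step variances, because one short step has to cover the noise added by many original steps.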
The noise level at step
Now, the bijective functions
Demonstrate performance of the proposed method in image and audio generation
The performance of the proposed method with respect to the number of steps
CIFAR-10 image generation result (standard DDPM FID=3.03)
CelebA image generation result (standard DDPM FID=7.00)
LSUN-bedroom image generation result
The quantitative evaluation of image generation indicates that the FastDPM methods can generate images of quality comparable to the baselines while using fewer steps. In addition, DDIM-rev generally produces better images than DDPM-rev.
For audio generation, pretrained DiffWave models for SC09 and LJSpeech are used, where
SC09 unconditional audio generation result (standard DiffWave FID=1.29, IS=5.30)
LJSpeech audio generation conditioned on mel spectrogram MOS result
For audio generation, in contrast, DDPM-rev generally produces better audio than DDIM-rev.
Another point that can be seen from the data synthesis experiments is that the variance schedule (VAR) tends to perform better for a smaller number of steps