Frame Interpolation with Consecutive Brownian Bridge Diffusion

1University of Utah, 2Center for Research in Computer Vision, University of Central Florida, 3University of Birmingham

Abstract

Recent work in Video Frame Interpolation (VFI) formulates VFI as a diffusion-based conditional image generation problem, synthesizing the intermediate frame given random noise and the neighboring frames. Due to the relatively high resolution of videos, Latent Diffusion Models (LDMs) are employed as the conditional generation model: an autoencoder compresses images into latent representations for diffusion and then reconstructs images from these latents. This formulation poses a crucial challenge: VFI expects the output to be deterministically equal to the ground truth intermediate frame, but LDMs generate a diverse set of different images when the model is run multiple times. The reason for this diversity is that the cumulative variance (the variance accumulated at each generation step) of the latent representations generated by LDMs is large, which makes the sampling trajectory random and yields diverse rather than deterministic generations. To address this problem, we propose our unique solution: Frame Interpolation with Consecutive Brownian Bridge Diffusion. Specifically, we propose a consecutive Brownian Bridge diffusion that takes a deterministic initial value as input, resulting in a much smaller cumulative variance of the generated latent representations. Our experiments suggest that our method improves together with improvements to the autoencoder and achieves state-of-the-art performance in VFI, leaving strong potential for further enhancement.

Overview

Recent diffusion-based methods in Video Frame Interpolation (VFI) use conditional generation, but the variance at each sampling step accumulates. VFI requires low-variance generation because the ground truth is deterministic. Our consecutive Brownian Bridge transitions among three points: the previous frame I0, the current frame In, and the next frame I1, and achieves a much lower accumulated variance than conditional generation (about 2 for ours versus more than 11 for conditional generation). For efficiency, images are encoded into a latent space, and the encoder features of the neighboring frames are passed to the decoder. We also take advantage of optical flow estimation to warp the encoder features of the neighboring frames and fuse them with the decoder features of the intermediate frame.
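The key property behind the low accumulated variance is that a Brownian bridge is pinned to deterministic values at both endpoints, so its variance shrinks to zero as sampling approaches either end. The following is a minimal numerical sketch of this property (not our full diffusion model; the scalar values, `max_var` scale, and sample counts are illustrative assumptions):

```python
import numpy as np

def brownian_bridge_sample(x0, x1, t, rng, max_var=1.0):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    The mean interpolates linearly between the endpoints, and the
    variance max_var * t * (1 - t) vanishes at t=0 and t=1, so the
    process is deterministic exactly where it is pinned.
    """
    mean = (1.0 - t) * x0 + t * x1
    std = np.sqrt(max_var * t * (1.0 - t))
    return mean + std * rng.standard_normal(np.shape(x0))

rng = np.random.default_rng(0)
x0, x1 = 0.0, 2.0

# Variance peaks at the midpoint (t(1-t) = 0.25 for t = 0.5) ...
mid = np.array([brownian_bridge_sample(x0, x1, 0.5, rng) for _ in range(20000)])
# ... and is exactly zero at the pinned endpoint t = 1.
end = np.array([brownian_bridge_sample(x0, x1, 1.0, rng) for _ in range(100)])

print(mid.var())   # close to 0.25
print(end.var())   # exactly 0.0
```

In contrast, a standard conditional diffusion starts from pure noise, so its variance never collapses at the target end, which is why its accumulated variance is much larger.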


Overview of our method. (a) Autoencoder. The encoder features of the neighboring frames are passed to the decoder to provide detailed information. (b) Ground truth estimation with diffusion. Images are encoded into latent representations to efficiently implement diffusion models. (c) Inference. During inference, the sampled latent representations are decoded with the assistance of features from the neighboring frames.

Architecture of the Autoencoder

Our autoencoder takes advantage of optical flow estimation to warp the encoder features of the neighboring frames and fuses them with the decoder features via cross-attention.
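The warp-then-fuse step above can be sketched as follows. This is a simplified stand-in for the actual architecture: the nearest-neighbor warp (real models use differentiable bilinear sampling), the single-head attention, the feature shapes, and the additive fusion are all illustrative assumptions:

```python
import numpy as np

def warp(feat, flow):
    """Backward-warp a feature map (C, H, W) by a flow field (2, H, W)
    using nearest-neighbor lookup with border clamping."""
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.round(ys + flow[1]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[0]).astype(int), 0, W - 1)
    return feat[:, src_y, src_x]

def cross_attention(query, context):
    """Fuse decoder features (query, N x C) with warped encoder
    features (context, M x C) via scaled dot-product attention."""
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ context

rng = np.random.default_rng(0)
enc_feat = rng.standard_normal((4, 8, 8))   # encoder features of a neighboring frame
flow = np.zeros((2, 8, 8))                  # estimated flow toward that frame
warped = warp(enc_feat, flow)

dec_feat = rng.standard_normal((64, 4))     # decoder features, flattened to N x C
fused = dec_feat + cross_attention(dec_feat, warped.reshape(4, -1).T)
```

The decoder features act as queries and the warped encoder features as keys/values, so each decoder location pulls in detail from the aligned neighboring-frame features.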


Architecture of the autoencoder. The encoder is in the green dashed boxes, and the decoder comprises all remaining parts. The output of the consecutive Brownian Bridge diffusion is fed to the VQ layer. The features of I0 and I1 at different down-sampling rates are sent to the cross-attention modules in the Up Sample Blocks of the decoder.

Results

Multi-frame Interpolation

We interpolate 7 frames between two frames and visualize the generated video for our method and LDMVFI. Multi-frame interpolation is achieved via a bisection-like method.
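The bisection-like scheme can be sketched as follows: interpolate the midpoint of a frame pair, then recurse on each half until the desired density is reached; `depth=3` yields the 7-frame setting. The averaging lambda is a toy stand-in for the actual interpolation model:

```python
def interpolate_bisection(frame_a, frame_b, interpolate_mid, depth):
    """Recursively fill 2**depth - 1 intermediate frames between
    frame_a and frame_b, in temporal order, by repeated bisection."""
    if depth == 0:
        return []
    mid = interpolate_mid(frame_a, frame_b)
    left = interpolate_bisection(frame_a, mid, interpolate_mid, depth - 1)
    right = interpolate_bisection(mid, frame_b, interpolate_mid, depth - 1)
    return left + [mid] + right

# Toy "model" that averages its inputs; frames at t = 1/8, 2/8, ..., 7/8.
frames = interpolate_bisection(0.0, 1.0, lambda a, b: (a + b) / 2, 3)
print(frames)  # [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875]
```

Each recursion level only ever interpolates the midpoint of two existing frames, so a single-midpoint model suffices to generate arbitrarily dense in-between frames.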

More multi-frame interpolation visualizations of our method. We interpolate 7 frames between two frames.