What is: TrIVD-GAN?

Source: Transformation-based Adversarial Video Prediction on Large-Scale Data
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

TrIVD-GAN, or Transformation-based & TrIple Video Discriminator GAN, is a type of generative adversarial network for video generation that builds upon DVD-GAN. Improvements include a novel transformation-based recurrent unit (the TSRU) that makes the generator more expressive, and an improved discriminator architecture.

In contrast with DVD-GAN, TrIVD-GAN has an alternative split for the roles of the discriminators, with $\mathcal{D}_S$ judging per-frame global structure, while $\mathcal{D}_T$ critiques local spatiotemporal structure. This is achieved by downsampling the $k$ randomly sampled frames fed to $\mathcal{D}_S$ by a factor $s$, and cropping $T \times H/s \times W/s$ clips inside the high-resolution video fed to $\mathcal{D}_T$, where $T, H, W, C$ correspond to the time, height, width and channel dimensions of the input. This further reduces the number of pixels to process per video, from $k \times H \times W + T \times H/s \times W/s$ to $\left(k + T\right) \times H/s \times W/s$.
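To make the input split concrete, here is a minimal Python sketch of how the two discriminator inputs could be prepared and how the pixel counts compare. It is an illustration only, not the authors' implementation: the function names are hypothetical, frames are assumed to be drawn without replacement, downsampling is assumed to be plain strided subsampling, a single crop window is assumed to be shared by all frames, and H and W are assumed to be divisible by s.

```python
import random


def sample_for_spatial_d(video, k, s, rng):
    """Input for D_S: k randomly chosen frames, each downsampled by a factor s.

    Assumptions (not from the source): frames are drawn without replacement,
    and downsampling keeps every s-th row and column.
    """
    chosen = sorted(rng.sample(range(len(video)), k))
    return [[row[::s] for row in video[t][::s]] for t in chosen]


def crop_for_temporal_d(video, s, rng):
    """Input for D_T: all T frames, cropped to one random (H/s) x (W/s) window.

    Assumption (not from the source): the same window is used for every frame.
    """
    height, width = len(video[0]), len(video[0][0])
    h, w = height // s, width // s
    top = rng.randrange(height - h + 1)
    left = rng.randrange(width - w + 1)
    return [[row[left:left + w] for row in frame[top:top + h]] for frame in video]


def pixel_counts(k, T, H, W, s):
    """Pixels per video before and after the change, per the two formulas above."""
    before = k * H * W + T * (H // s) * (W // s)
    after = (k + T) * (H // s) * (W // s)
    return before, after


rng = random.Random(0)
T, H, W, k, s = 12, 64, 64, 4, 2
# Toy single-channel video indexed as video[t][y][x].
video = [[[0.0] * W for _ in range(H)] for _ in range(T)]

d_s_input = sample_for_spatial_d(video, k, s, rng)
d_t_input = crop_for_temporal_d(video, s, rng)

print(len(d_s_input), len(d_s_input[0]), len(d_s_input[0][0]))  # 4 32 32
print(len(d_t_input), len(d_t_input[0]), len(d_t_input[0][0]))  # 12 32 32
print(pixel_counts(k, T, H, W, s))  # (28672, 16384)
```

With these toy values (k = 4, T = 12, H = W = 64, s = 2), the count drops from 28672 to 16384 pixels per video, because the k frames seen by the spatial discriminator shrink by a factor of s in each spatial dimension.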