What is: TrIVD-GAN?

TrIVD-GAN, or Transformation-based & TrIple Video Discriminator GAN, is a type of generative adversarial network for video generation that builds upon DVD-GAN. Improvements include a novel recurrent unit, the Transformation-based Spatial Recurrent Unit (TSRU), that makes the generator more expressive, and an improved discriminator architecture.

In contrast with DVD-GAN, TrIVD-GAN has an alternative split for the roles of the discriminators, with $\mathcal{D}_{S}$ judging per-frame global structure, while $\mathcal{D}_{T}$ critiques local spatiotemporal structure. This is achieved by downsampling the $k$ randomly sampled frames fed to $\mathcal{D}_{S}$ by a factor $s$, and cropping $T \times H/s \times W/s$ clips inside the high resolution video fed to $\mathcal{D}_{T}$, where $T, H, W, C$ correspond to time, height, width and channel dimension of the input. This further reduces the number of pixels to process per video, from $k \times H \times W + T \times H/s \times W/s$ to $\left(k + T\right) \times H/s \times W/s$.
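The pixel-count reduction can be checked with a few lines of arithmetic. The sketch below is only an illustration of the two formulas above: the function names and the values chosen for $k$, $T$, $H$, $W$ and $s$ are assumptions for the example, not settings taken from the paper.

```python
def dvd_gan_pixels(k, t, h, w, s):
    """Pixels per video under the DVD-GAN split: k*H*W + T*(H/s)*(W/s)."""
    spatial = k * h * w                  # D_S: k frames at full resolution
    temporal = t * (h // s) * (w // s)   # D_T: T frames at H/s x W/s
    return spatial + temporal


def trivd_gan_pixels(k, t, h, w, s):
    """Pixels per video under the TrIVD-GAN split: (k + T)*(H/s)*(W/s)."""
    spatial = k * (h // s) * (w // s)    # D_S: k frames downsampled by s
    temporal = t * (h // s) * (w // s)   # D_T: T x H/s x W/s crop of the video
    return spatial + temporal


# Illustrative values only (assumed, not from the paper).
k, t, h, w, s = 8, 16, 128, 128, 2

before = dvd_gan_pixels(k, t, h, w, s)
after = trivd_gan_pixels(k, t, h, w, s)
print(before, after, before / after)
```

With these assumed values the count drops from 196608 to 98304 pixels per video. The saving comes entirely from the $\mathcal{D}_{S}$ term, which shrinks by a factor of $s^2$, while the $\mathcal{D}_{T}$ term keeps the same size.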

Source	Transformation-based Adversarial Video Prediction on Large-Scale Data
Year	2020
Data Source	CC BY-SA - https://paperswithcode.com