
What is: Introspective Adversarial Network?

Source: Neural Photo Editing with Introspective Adversarial Networks
Year: 2016
Data Source: CC BY-SA - https://paperswithcode.com

The Introspective Adversarial Network (IAN) is a hybridization of GANs and VAEs that leverages the power of the adversarial objective while maintaining the VAE's efficient inference mechanism. It uses the discriminator of the GAN, $D$, as a feature extractor for an inference subnetwork, $E$, which is implemented as a fully-connected layer on top of the final convolutional layer of the discriminator. We infer latent values $Z \sim E(X) = q(Z \mid X)$ for reconstruction and sample random values $Z \sim p(Z)$ from a standard normal for random image generation using the generator network, $G$.
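The wiring of $E$ on top of $D$'s final convolutional features can be sketched in a few lines. The PyTorch snippet below is only an illustrative sketch, not the authors' implementation: the 64x64 input resolution, layer widths, latent dimensionality, and the module names (`Discriminator`, `Encoder`, `Generator`) are all assumptions.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Convolutional discriminator D; its final conv features are shared with the encoder."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(  # assumed 64x64 RGB input
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.classifier = nn.Linear(256 * 8 * 8, 3)  # ternary output: real / generated / reconstructed

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h), h  # class logits plus the shared feature vector


class Encoder(nn.Module):
    """Inference subnetwork E: a fully-connected layer on top of D's final conv features."""

    def __init__(self, feature_dim=256 * 8 * 8, latent_dim=100):
        super().__init__()
        self.fc = nn.Linear(feature_dim, 2 * latent_dim)  # predicts mean and log-variance of q(Z|X)

    def forward(self, features):
        mu, logvar = self.fc(features).chunk(2, dim=1)
        return mu, logvar


class Generator(nn.Module):
    """Generator G: maps latent codes Z back to images (placeholder architecture)."""

    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (256, 8, 8)),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)
```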

Three distinct loss functions are used:

  • $\mathcal{L}_{img}$, the L1 pixel-wise reconstruction loss, which is preferred to the L2 reconstruction loss for its higher average gradient.
  • $\mathcal{L}_{feature}$, the feature-wise reconstruction loss, evaluated as the L2 difference between the original and reconstruction in the space of the hidden layers of the discriminator.
  • $\mathcal{L}_{adv}$, the ternary adversarial loss, a modification of the adversarial loss that forces the discriminator to label a sample as real, generated, or reconstructed (as opposed to a binary real vs. generated label); all three terms are sketched in code after this list.
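
In code, the three terms might look roughly as follows. This is a hedged sketch: the class indices for the ternary discriminator output and the generator-side objective (targeting the "real" class for reconstructions and samples) are assumed conventions, and the exact set of discriminator hidden layers used for $\mathcal{L}_{feature}$ is not specified here.

```python
import torch
import torch.nn.functional as F

# Assumed class indices for the ternary discriminator output.
REAL, GENERATED, RECONSTRUCTED = 0, 1, 2


def pixel_loss(x, x_rec):
    """L_img: L1 pixel-wise reconstruction loss."""
    return F.l1_loss(x_rec, x)


def feature_loss(feat_x, feat_rec):
    """L_feature: L2 difference between original and reconstruction in D's hidden-layer space."""
    return F.mse_loss(feat_rec, feat_x)


def ternary_d_loss(logits_real, logits_rec, logits_fake):
    """Discriminator side of L_adv: label samples as real, generated, or reconstructed."""
    def labels(logits, cls):
        return torch.full((logits.size(0),), cls, dtype=torch.long, device=logits.device)
    return (F.cross_entropy(logits_real, labels(logits_real, REAL))
            + F.cross_entropy(logits_rec, labels(logits_rec, RECONSTRUCTED))
            + F.cross_entropy(logits_fake, labels(logits_fake, GENERATED)))


def adversarial_g_loss(logits_rec, logits_fake):
    """Generator/encoder side (L_G_adv): push reconstructions and samples toward the 'real' class."""
    def real_labels(logits):
        return torch.full((logits.size(0),), REAL, dtype=torch.long, device=logits.device)
    return (F.cross_entropy(logits_rec, real_labels(logits_rec))
            + F.cross_entropy(logits_fake, real_labels(logits_fake)))
```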

Including the VAE's KL divergence between the inferred latents $E(X)$ and the prior $p(Z)$, the loss function for the generator and encoder network is thus:

$$\mathcal{L}_{E, G} = \lambda_{adv}\mathcal{L}_{G_{adv}} + \lambda_{img}\mathcal{L}_{img} + \lambda_{feature}\mathcal{L}_{feature} + D_{KL}\left(E(X) \,\|\, p(Z)\right)$$
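
For the usual VAE choice of a diagonal Gaussian posterior $q(Z \mid X) = \mathcal{N}(\mu(X), \sigma^2(X))$ and a standard normal prior $p(Z)$, the KL term takes the standard closed form (a textbook VAE identity, not restated in the source):

$$D_{KL}\left(E(X) \,\|\, p(Z)\right) = \frac{1}{2}\sum_{j=1}^{d}\left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right)$$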

where the $\lambda$ terms weight the relative importance of each loss. We set $\lambda_{img}$ to 3 and leave the other terms at 1. The discriminator is updated solely using the ternary adversarial loss. During each training step, the generator produces reconstructions $G(E(X))$ (using the standard VAE reparameterization trick) from data $X$ and random samples $G(Z)$, while the discriminator observes $X$ as well as the reconstructions and random samples, and both networks are simultaneously updated.
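
Putting the pieces together, one training step could look roughly like the sketch below, which reuses the hypothetical modules and loss helpers above. The optimizer choice (Adam), learning rates, and the split into two sequential optimizer steps are assumptions made for readability; the text above describes the generator/encoder and discriminator as being updated simultaneously.

```python
import torch

latent_dim = 100
lambda_adv, lambda_img, lambda_feature = 1.0, 3.0, 1.0  # weights from the text (lambda_img = 3)

D, E, G = Discriminator(), Encoder(), Generator()
opt_eg = torch.optim.Adam(list(E.parameters()) + list(G.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)


def train_step(x):
    # --- Encoder/generator update: weighted sum of all loss terms plus the KL. ---
    _, feat_x = D(x)
    mu, logvar = E(feat_x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
    x_rec = G(z)                                          # reconstructions G(E(X))
    x_fake = G(torch.randn(x.size(0), latent_dim, device=x.device))  # samples G(Z), Z ~ p(Z)

    logits_rec, feat_rec = D(x_rec)
    logits_fake, _ = D(x_fake)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1).mean()
    eg_loss = (lambda_adv * adversarial_g_loss(logits_rec, logits_fake)
               + lambda_img * pixel_loss(x, x_rec)
               + lambda_feature * feature_loss(feat_x.detach(), feat_rec)
               + kl)
    opt_eg.zero_grad()
    eg_loss.backward()
    opt_eg.step()

    # --- Discriminator update: ternary adversarial loss only. ---
    logits_real, _ = D(x)
    logits_rec, _ = D(x_rec.detach())
    logits_fake, _ = D(x_fake.detach())
    d_loss = ternary_d_loss(logits_real, logits_rec, logits_fake)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    return eg_loss.item(), d_loss.item()
```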