
What is: Introspective Adversarial Network?

Source: Neural Photo Editing with Introspective Adversarial Networks
Year: 2016
Data Source: CC BY-SA - https://paperswithcode.com

The Introspective Adversarial Network (IAN) is a hybridization of GANs and VAEs that leverages the power of the adversarial objective while maintaining the VAE's efficient inference mechanism. It uses the discriminator of the GAN, $D$, as a feature extractor for an inference subnetwork, $E$, which is implemented as a fully-connected layer on top of the final convolutional layer of the discriminator. We infer latent values $Z \sim E(X) = q(Z \mid X)$ for reconstruction and sample random values $Z \sim p(Z)$ from a standard normal for random image generation using the generator network, $G$.
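The wiring of $E$ on top of $D$'s final convolutional features can be sketched in a few lines. The PyTorch snippet below is only an illustrative sketch, not the authors' implementation: the 64x64 input resolution, layer widths, latent dimensionality, and the module names (`Discriminator`, `Encoder`, `Generator`) are all assumptions.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Convolutional discriminator D; its final conv features are shared with the encoder."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(  # assumed 64x64 RGB input
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.classifier = nn.Linear(256 * 8 * 8, 3)  # ternary output: real / generated / reconstructed

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h), h  # class logits plus the shared feature vector


class Encoder(nn.Module):
    """Inference subnetwork E: a fully-connected layer on top of D's final conv features."""

    def __init__(self, feature_dim=256 * 8 * 8, latent_dim=100):
        super().__init__()
        self.fc = nn.Linear(feature_dim, 2 * latent_dim)  # predicts mean and log-variance of q(Z|X)

    def forward(self, features):
        mu, logvar = self.fc(features).chunk(2, dim=1)
        return mu, logvar


class Generator(nn.Module):
    """Generator G: maps latent codes Z back to images (placeholder architecture)."""

    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (256, 8, 8)),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)
```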

Three distinct loss functions are used:

  • $\mathcal{L}_{img}$, the L1 pixel-wise reconstruction loss, which is preferred to the L2 reconstruction loss for its higher average gradient.
  • $\mathcal{L}_{feature}$, the feature-wise reconstruction loss, evaluated as the L2 difference between the original and reconstruction in the space of the hidden layers of the discriminator.
  • $\mathcal{L}_{adv}$, the ternary adversarial loss, a modification of the adversarial loss that forces the discriminator to label a sample as real, generated, or reconstructed (as opposed to a binary real vs. generated label); all three terms are sketched in code after this list.
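
In code, the three terms might look roughly as follows. This is a hedged sketch: the class indices for the ternary discriminator output and the generator-side objective (targeting the "real" class for reconstructions and samples) are assumed conventions, and the exact set of discriminator hidden layers used for $\mathcal{L}_{feature}$ is not specified here.

```python
import torch
import torch.nn.functional as F

# Assumed class indices for the ternary discriminator output.
REAL, GENERATED, RECONSTRUCTED = 0, 1, 2


def pixel_loss(x, x_rec):
    """L_img: L1 pixel-wise reconstruction loss."""
    return F.l1_loss(x_rec, x)


def feature_loss(feat_x, feat_rec):
    """L_feature: L2 difference between original and reconstruction in D's hidden-layer space."""
    return F.mse_loss(feat_rec, feat_x)


def ternary_d_loss(logits_real, logits_rec, logits_fake):
    """Discriminator side of L_adv: label samples as real, generated, or reconstructed."""
    def labels(logits, cls):
        return torch.full((logits.size(0),), cls, dtype=torch.long, device=logits.device)
    return (F.cross_entropy(logits_real, labels(logits_real, REAL))
            + F.cross_entropy(logits_rec, labels(logits_rec, RECONSTRUCTED))
            + F.cross_entropy(logits_fake, labels(logits_fake, GENERATED)))


def adversarial_g_loss(logits_rec, logits_fake):
    """Generator/encoder side (L_G_adv): push reconstructions and samples toward the 'real' class."""
    def real_labels(logits):
        return torch.full((logits.size(0),), REAL, dtype=torch.long, device=logits.device)
    return (F.cross_entropy(logits_rec, real_labels(logits_rec))
            + F.cross_entropy(logits_fake, real_labels(logits_fake)))
```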

Including the VAE's KL divergence between the inferred latents $E(X)$ and the prior $p(Z)$, the loss function for the generator and encoder network is thus:

$$\mathcal{L}_{E, G} = \lambda_{adv}\mathcal{L}_{G_{adv}} + \lambda_{img}\mathcal{L}_{img} + \lambda_{feature}\mathcal{L}_{feature} + D_{KL}\left(E(X) \,\|\, p(Z)\right)$$
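
For the usual VAE choice of a diagonal Gaussian posterior $q(Z \mid X) = \mathcal{N}(\mu(X), \sigma^2(X))$ and a standard normal prior $p(Z)$, the KL term takes the standard closed form (a textbook VAE identity, not restated in the source):

$$D_{KL}\left(E(X) \,\|\, p(Z)\right) = \frac{1}{2}\sum_{j=1}^{d}\left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right)$$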

where the $\lambda$ terms weight the relative importance of each loss. We set $\lambda_{img}$ to 3 and leave the other terms at 1. The discriminator is updated solely using the ternary adversarial loss. During each training step, the generator produces reconstructions $G(E(X))$ (using the standard VAE reparameterization trick) from data $X$ and random samples $G(Z)$, while the discriminator observes $X$ as well as the reconstructions and random samples, and both networks are simultaneously updated.
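
Putting the pieces together, one training step could look roughly like the sketch below, which reuses the hypothetical modules and loss helpers above. The optimizer choice (Adam), learning rates, and the split into two sequential optimizer steps are assumptions made for readability; the text above describes the generator/encoder and discriminator as being updated simultaneously.

```python
import torch

latent_dim = 100
lambda_adv, lambda_img, lambda_feature = 1.0, 3.0, 1.0  # weights from the text (lambda_img = 3)

D, E, G = Discriminator(), Encoder(), Generator()
opt_eg = torch.optim.Adam(list(E.parameters()) + list(G.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)


def train_step(x):
    # --- Encoder/generator update: weighted sum of all loss terms plus the KL. ---
    _, feat_x = D(x)
    mu, logvar = E(feat_x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
    x_rec = G(z)                                          # reconstructions G(E(X))
    x_fake = G(torch.randn(x.size(0), latent_dim, device=x.device))  # samples G(Z), Z ~ p(Z)

    logits_rec, feat_rec = D(x_rec)
    logits_fake, _ = D(x_fake)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1).mean()
    eg_loss = (lambda_adv * adversarial_g_loss(logits_rec, logits_fake)
               + lambda_img * pixel_loss(x, x_rec)
               + lambda_feature * feature_loss(feat_x.detach(), feat_rec)
               + kl)
    opt_eg.zero_grad()
    eg_loss.backward()
    opt_eg.step()

    # --- Discriminator update: ternary adversarial loss only. ---
    logits_real, _ = D(x)
    logits_rec, _ = D(x_rec.detach())
    logits_fake, _ = D(x_fake.detach())
    d_loss = ternary_d_loss(logits_real, logits_rec, logits_fake)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    return eg_loss.item(), d_loss.item()
```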