**Fisher-BRC** is an actor-critic algorithm for offline reinforcement learning that encourages the learned policy to stay close to the data. It does so by parameterizing the critic as the $\log$ of the behavior policy that generated the offline dataset, plus a state-action value offset term that can be learned with a neural network. Behavior regularization then corresponds to an appropriate regularizer on the offset term. The regularizer used is a gradient penalty on the offset term, which is equivalent to Fisher divergence regularization and suggests connections to the score matching and generative energy-based model literature.
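
The critic parameterization and its regularizer can be written out as follows. This is a sketch in notation assumed here rather than taken from the text above: $\mu$ denotes the behavior policy, $O\_\theta$ the learned offset, $J$ a standard critic (temporal-difference) loss, $\lambda$ a penalty weight, $\mathcal{D}$ the offline dataset, and $\pi$ the learned policy from which penalty actions are assumed to be sampled.

$$Q\_\theta(s, a) = \log \mu(a \mid s) + O\_\theta(s, a)$$

$$\min\_\theta \; J(O\_\theta) + \lambda \, \mathbb{E}\_{s \sim \mathcal{D},\, a \sim \pi(\cdot \mid s)} \left[ \left\| \nabla\_a O\_\theta(s, a) \right\|^2 \right]$$

The second term is the gradient penalty on the offset: it penalizes how quickly $O\_\theta$ changes with the action, so the shape of the critic in action space is pulled toward that of $\log \mu$.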

The **Introspective Adversarial Network (IAN)** is a hybridization of [GANs](https://paperswithcode.com/method/gan) and [VAEs](https://paperswithcode.com/method/vae) that leverages the power of the adversarial objective while maintaining the VAE’s efficient inference mechanism. It uses the discriminator of the GAN, $D$, as a feature extractor for an inference subnetwork, $E$, which is implemented as a fully-connected layer on top of the final convolutional layer of the discriminator. We infer latent values $Z \sim E\left(X\right) = q\left(Z\mid{X}\right)$ for reconstruction and sample random values $Z \sim p\left(Z\right)$ from a standard normal for random image generation using the generator network, $G$.
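
A minimal NumPy sketch of the two ways latents are obtained is given here. It assumes, as in a standard VAE, that the fully-connected encoder head outputs a mean and a log-variance of a diagonal Gaussian; the function names, layer sizes, and the random stand-in for the discriminator features are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder_head(features, w_mu, b_mu, w_logvar, b_logvar):
    """Fully-connected layer on top of discriminator features.

    Assumption: the head outputs the mean and log-variance of q(Z|X).
    """
    mu = features @ w_mu + b_mu
    logvar = features @ w_logvar + b_logvar
    return mu, logvar

def sample_latent(mu, logvar, rng):
    """Reparameterized sample: Z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Hypothetical sizes; `features` stands in for the flattened output of the
# discriminator's final convolutional layer.
batch, feat_dim, latent_dim = 4, 16, 8
features = rng.standard_normal((batch, feat_dim))
w_mu = 0.1 * rng.standard_normal((feat_dim, latent_dim))
w_logvar = 0.1 * rng.standard_normal((feat_dim, latent_dim))
b_mu = np.zeros(latent_dim)
b_logvar = np.zeros(latent_dim)

mu, logvar = encoder_head(features, w_mu, b_mu, w_logvar, b_logvar)
z_recon = sample_latent(mu, logvar, rng)              # Z ~ q(Z|X), for reconstruction
z_random = rng.standard_normal((batch, latent_dim))   # Z ~ p(Z), for random generation
```

Either latent batch would then be passed to the generator $G$, which is omitted from this sketch.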

Three distinct loss functions are used:

- $\mathcal{L}\_{img}$, the L1 pixel-wise reconstruction loss, which is preferred to the L2 reconstruction loss for its higher average gradient.
- $\mathcal{L}\_{feature}$, the feature-wise reconstruction loss, evaluated as the L2 difference between the original and reconstruction in the space of the hidden layers of the discriminator.
- $\mathcal{L}\_{adv}$, the ternary adversarial loss, a modification of the adversarial loss that forces the discriminator to label a sample as real, generated, or reconstructed (as opposed to a binary real vs. generated label).
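
The three losses listed above can be sketched in NumPy as follows. Several details are assumptions made for illustration: the mean reductions, the sum over discriminator layers, the class index assignment, and the use of a three-way softmax cross-entropy as the concrete form of the ternary loss.

```python
import numpy as np

def l1_image_loss(x, x_rec):
    """Pixel-wise L1 reconstruction loss (mean reduction is an assumption)."""
    return np.mean(np.abs(x - x_rec))

def l2_feature_loss(feats, feats_rec):
    """L2 difference in discriminator feature space.

    `feats` and `feats_rec` are lists of hidden-layer activations for the
    original and the reconstruction; summing over layers is an assumption.
    """
    return sum(np.mean((f - fr) ** 2) for f, fr in zip(feats, feats_rec))

# Hypothetical class indices for the three-way discriminator label.
REAL, GENERATED, RECONSTRUCTED = 0, 1, 2

def ternary_cross_entropy(logits, labels):
    """Softmax cross-entropy over the three classes, averaged over the batch."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

# With uninformative (all-zero) logits the loss equals log(3).
example_logits = np.zeros((3, 3))
example_labels = np.array([REAL, GENERATED, RECONSTRUCTED])
example_loss = ternary_cross_entropy(example_logits, example_labels)
```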

Including the VAE’s KL divergence between the inferred latents $E\left(X\right)$ and the prior $p\left(Z\right)$, the loss function for the generator and encoder network is thus:

$$\mathcal{L}\_{E, G} = \lambda\_{adv}\mathcal{L}\_{G\_{adv}} + \lambda\_{img}\mathcal{L}\_{img}  + \lambda\_{feature}\mathcal{L}\_{feature}  + D\_{KL}\left(E\left(X\right) || p\left(Z\right)\right) $$

Here the $\lambda$ terms weight the relative importance of each loss. We set $\lambda\_{img}$ to 3 and leave the other terms at 1. The discriminator is updated solely using the ternary adversarial loss. During each training step, the generator produces reconstructions $G\left(E\left(X\right)\right)$ from data $X$ (using the standard VAE reparameterization trick) and random samples $G\left(Z\right)$. The discriminator observes $X$ as well as the reconstructions and random samples, and both networks are updated simultaneously.
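
As a rough illustration of how the terms combine, the sketch below computes the KL term in closed form and forms the weighted sum with the stated weights. It assumes a diagonal Gaussian posterior and a batch-mean reduction; the function and argument names are hypothetical, and the individual loss values are passed in as plain numbers.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch.

    Assumption: the inferred posterior is a diagonal Gaussian.
    """
    per_sample = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)
    return np.mean(per_sample)

def generator_encoder_loss(adv, img, feature, kl,
                           lambda_adv=1.0, lambda_img=3.0, lambda_feature=1.0):
    """Weighted sum of the generator/encoder loss terms.

    Defaults follow the text: lambda_img = 3, the other weights = 1.
    """
    return lambda_adv * adv + lambda_img * img + lambda_feature * feature + kl

# Toy values, chosen only to show the arithmetic.
kl = kl_to_standard_normal(np.ones((1, 2)), np.zeros((1, 2)))  # 0.5 * (1 + 1) = 1.0
total = generator_encoder_loss(adv=1.0, img=2.0, feature=3.0, kl=kl)
```

With these toy values the total is $1 \cdot 1 + 3 \cdot 2 + 1 \cdot 3 + 1 = 11$.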

Neural Photo Editing with Introspective Adversarial Networks

Fisher-BRC

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

**DeepDrug** is a deep learning framework that uses graph convolutional networks to learn graph representations of drugs and proteins, such as molecular fingerprints and residue structures, in order to boost prediction accuracy.
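
To make the idea of a graph convolution concrete, here is a minimal NumPy sketch of a single layer applied to a toy molecular graph. The description above says only that graph convolutional networks are used, so the specific propagation rule (symmetric normalization with self-loops followed by a ReLU) is an assumption and not necessarily the layer DeepDrug employs; the toy graph, features, and weights are hypothetical.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)              # node degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)

# Toy graph: three atoms in a chain (0 - 1 - 2) with one-hot node features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
feats = np.eye(3)
weight = np.ones((3, 2))

out = gcn_layer(adj, feats, weight)  # shape (3, 2): one embedding per atom
```

Each output row mixes a node's own features with those of its neighbors, which is how such a layer builds a representation that reflects molecular structure.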

Source	Offline Reinforcement Learning with Fisher Divergence Critic Regularization
Year	2021
Data Source	CC BY-SA - https://paperswithcode.com