What is: Fisher-BRC?

Source: Offline Reinforcement Learning with Fisher Divergence Critic Regularization
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

Fisher-BRC is an actor-critic algorithm for offline reinforcement learning that encourages the learned policy to stay close to the data. It does so by parameterizing the critic as the log of the behavior policy (the policy that generated the offline dataset) plus a state-action value offset term, which can be learned using a neural network. Behavior regularization then corresponds to an appropriate regularizer on the offset term. Fisher-BRC uses a gradient penalty regularizer on the offset term, which is equivalent to Fisher divergence regularization and suggests connections to the score matching and generative energy-based model literature.
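
The parameterization described above can be written as Q(s, a) = log mu(a | s) + O(s, a), where mu is the behavior policy and O is the offset term, with a penalty on the squared gradient of O with respect to the action. The following is a minimal sketch of that structure, not the authors' implementation. Everything named here is an assumption made for illustration: the 1-D Gaussian behavior policy, the hand-written `offset` function standing in for a neural network, the finite-difference gradient standing in for automatic differentiation, and the choice to average the penalty over a given batch of state-action pairs.

```python
import math


def log_behavior_density(state, action, std=0.5):
    """Log-density of a hypothetical Gaussian behavior policy mu(a | s).

    Assumption: the mean is 0.8 * state. In practice mu would be fitted
    to the offline dataset.
    """
    mean = 0.8 * state
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def offset(state, action, weights):
    """Hypothetical offset term O(s, a); a neural network in practice."""
    w_s, w_a, w_sa = weights
    return w_s * state + w_a * action + w_sa * state * action


def critic(state, action, weights):
    """Critic parameterized as log mu(a | s) + O(s, a)."""
    return log_behavior_density(state, action) + offset(state, action, weights)


def offset_action_gradient(state, action, weights, eps=1e-5):
    """Central finite-difference estimate of dO/da (stand-in for autodiff)."""
    upper = offset(state, action + eps, weights)
    lower = offset(state, action - eps, weights)
    return (upper - lower) / (2.0 * eps)


def gradient_penalty(states, actions, weights):
    """Mean squared action-gradient of the offset over a batch."""
    grads = [
        offset_action_gradient(s, a, weights)
        for s, a in zip(states, actions)
    ]
    return sum(g * g for g in grads) / len(grads)


if __name__ == "__main__":
    weights = (0.1, 0.3, 0.2)
    print("Q(1.0, 0.5) =", critic(1.0, 0.5, weights))
    print("penalty =", gradient_penalty([1.0, 2.0], [0.5, -0.2], weights))
```

In a training loop this penalty would be added, scaled by a coefficient, to the usual critic loss. Because the penalty touches only the offset, the log-behavior-policy term is left intact and keeps the critic anchored to the data.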