**Bayesian Reward Extrapolation** (Bayesian REX) is a Bayesian reward learning algorithm that scales to high-dimensional imitation learning problems. It first pre-trains a low-dimensional feature encoding using self-supervised tasks. It then uses preferences over demonstrations to perform fast Bayesian inference over the reward function.
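
The following is a minimal sketch of why inference can be fast once features are pre-trained. It assumes a linear reward $R(s) = w^\top \phi(s)$ over the encoded features and a Bradley-Terry likelihood for pairwise preferences with a confidence parameter `beta`. It also assumes Metropolis-Hastings sampling over unit-norm weights. Each demonstration's summed features are computed once, so every likelihood evaluation is only a handful of dot products. All function names and the toy data are hypothetical, for illustration only.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    # Assumed constraint: reward weights live on the unit L2 sphere.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def log_likelihood(w: Vector, feature_sums: Sequence[Vector],
                   prefs: Sequence[Tuple[int, int]], beta: float) -> float:
    """Bradley-Terry log-likelihood; (i, j) means demo j is preferred over demo i.

    feature_sums[k] holds the pre-computed summed features of demo k, so its
    predicted return under weights w is just dot(w, feature_sums[k]).
    """
    total = 0.0
    for i, j in prefs:
        ri = beta * dot(w, feature_sums[i])
        rj = beta * dot(w, feature_sums[j])
        m = max(ri, rj)  # shift for a numerically stable log-sum-exp
        total += rj - (m + math.log(math.exp(ri - m) + math.exp(rj - m)))
    return total


def bayesian_rex_mcmc(feature_sums: Sequence[Vector], prefs: Sequence[Tuple[int, int]],
                      n_samples: int = 5000, step: float = 0.2,
                      beta: float = 1.0, seed: int = 0) -> List[Vector]:
    """Metropolis-Hastings over unit-norm weights with a uniform prior."""
    rng = random.Random(seed)
    dim = len(feature_sums[0])
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    ll = log_likelihood(w, feature_sums, prefs, beta)
    samples = []
    for _ in range(n_samples):
        # Gaussian perturbation followed by renormalization (symmetric proposal).
        proposal = normalize([x + rng.gauss(0.0, step) for x in w])
        ll_prop = log_likelihood(proposal, feature_sums, prefs, beta)
        # Accept with probability min(1, L(proposal) / L(w)).
        if rng.random() < math.exp(min(0.0, ll_prop - ll)):
            w, ll = proposal, ll_prop
        samples.append(w)
    return samples


# Toy data: 2-D summed features for four demos ranked 0 < 1 < 2 < 3.
phi = [[0.0, 0.0], [1.0, 0.5], [2.0, -0.5], [3.0, 0.2]]
prefs = [(0, 1), (1, 2), (2, 3), (0, 2), (0, 3), (1, 3)]
samples = bayesian_rex_mcmc(phi, prefs, beta=5.0)
kept = samples[1000:]  # discard burn-in
mean_w0 = sum(s[0] for s in kept) / len(kept)
```

In this toy data the preferred demos have a larger first feature. The resulting posterior samples concentrate on weights with a positive first component.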

**Dynamic Keypoint Head** is an output head for pose estimation whose filters are conditioned on each instance (person). The instance concept is encoded in the dynamically generated weights of those filters. It is used in the [FCPose](https://paperswithcode.com/method/fcpose) architecture.

The figure shows the core idea. $F$ denotes a level of feature maps. "Rel. Coord." denotes the relative coordinates, i.e., the offsets from each location of $F$ to the location where the filters are generated. Refer to the FCPose paper for details. $f_{\theta_{i}}$ is the dynamically generated keypoint head for the $i$-th person instance. Note that each person instance has its own keypoint head.

Dynamic Keypoint Head (paper: FCPose: Fully Convolutional Multi-Person Pose Estimation with Dynamic Instance-Aware Convolutions)

Bayesian REX (paper: Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences)

**ERNIE-GEN** is a multi-flow sequence-to-sequence pre-training and fine-tuning framework. It bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method. To make generation closer to human writing patterns, the framework introduces a span-by-span generation flow. This flow trains the model to predict semantically complete spans consecutively rather than word by word. Unlike existing pre-training methods, ERNIE-GEN uses multi-granularity target sampling to construct pre-training data, which strengthens the correlation between the encoder and the decoder.
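
Two of the ideas above can be illustrated with small data-preparation helpers. The first is noise-aware generation: tokens of the teacher-forced decoder input are randomly replaced with vocabulary words, so the model learns to continue correctly from imperfect context. The second is grouping a target into spans for span-by-span prediction. This is a hedged sketch only: the helper names, the `noise_rate` value and the hand-picked span boundaries are hypothetical. ERNIE-GEN's own procedure for finding semantically complete spans is not reproduced here.

```python
import random
from typing import List, Sequence, Tuple


def noise_aware_decoder_input(target: Sequence[str], vocab: Sequence[str],
                              noise_rate: float, rng: random.Random) -> List[str]:
    """Replace each decoder-input token with a random vocabulary word with
    probability noise_rate; the loss would still be computed on the clean target."""
    return [rng.choice(vocab) if rng.random() < noise_rate else tok for tok in target]


def group_into_spans(target: Sequence[str], span_ends: Sequence[int]) -> List[Tuple[str, ...]]:
    """Split a target into consecutive spans; span_ends are exclusive end indices.
    Choosing the boundaries (e.g. semantically complete n-grams) is left to the caller."""
    spans, start = [], 0
    for end in span_ends:
        if end <= start or end > len(target):
            raise ValueError("span ends must be strictly increasing and in range")
        spans.append(tuple(target[start:end]))
        start = end
    if start != len(target):
        raise ValueError("spans must cover the whole target")
    return spans


rng = random.Random(0)
target = ["new", "york", "is", "a", "big", "city"]
vocab = ["the", "of", "and", "new", "city", "river"]
noisy = noise_aware_decoder_input(target, vocab, noise_rate=0.3, rng=rng)
spans = group_into_spans(target, [2, 3, 4, 6])
# spans == [("new", "york"), ("is",), ("a",), ("big", "city")]
```

Predicting `("new", "york")` as one unit, instead of emitting "new" and then conditioning on it, is what the span-by-span flow aims for.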

Source	Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences
Year	2020
Data Source	CC BY-SA - https://paperswithcode.com