
What is: Vision-and-Language BERT?

Source: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Vision-and-Language BERT (ViLBERT) is a BERT-based model for learning task-agnostic joint representations of image content and natural language. ViLBERT extends the popular BERT architecture to a multi-modal two-stream model: visual and textual inputs are processed in separate streams that interact through co-attentional transformer layers, allowing each modality to incorporate information from the other.
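To make the co-attention idea concrete, here is a minimal PyTorch sketch of one co-attentional block: queries come from one stream while keys and values come from the other, so image regions attend over text tokens and vice versa. The class name, layer sizes, and single-block structure are illustrative assumptions, not ViLBERT's exact implementation.

```python
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    """A simplified co-attentional transformer block (sketch, not ViLBERT's code)."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Cross-attention: visual queries over text keys/values, and vice versa.
        self.vis_attends_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_attends_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, v: torch.Tensor, t: torch.Tensor):
        # v: (batch, num_regions, dim) image-region features
        # t: (batch, num_tokens,  dim) text-token features
        v_out, _ = self.vis_attends_txt(query=v, key=t, value=t)
        t_out, _ = self.txt_attends_vis(query=t, key=v, value=v)
        # Residual connection + layer norm, as in a standard transformer block.
        return self.norm_v(v + v_out), self.norm_t(t + t_out)


# Example: 36 image regions and 20 text tokens exchange information.
v = torch.randn(2, 36, 768)
t = torch.randn(2, 20, 768)
v2, t2 = CoAttentionBlock()(v, t)
print(v2.shape, t2.shape)  # torch.Size([2, 36, 768]) torch.Size([2, 20, 768])
```

Swapping the query source between streams is what distinguishes a co-attentional layer from ordinary self-attention, where queries, keys, and values all come from the same sequence.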