**Context Enhancement Module (CEM)** is a feature extraction module used in object detection (specifically, [ThunderNet](https://paperswithcode.com/method/thundernet)) that aims to enlarge the receptive field. The key idea of CEM is to aggregate multi-scale local context information and global context information to generate more discriminative features. In CEM, feature maps from three scales are merged: $C\_{4}$, $C\_{5}$ and $C\_{glb}$. $C\_{glb}$ is the global context feature vector obtained by applying [global average pooling](https://paperswithcode.com/method/global-average-pooling) to $C\_{5}$. A 1 × 1 [convolution](https://paperswithcode.com/method/convolution) is then applied to each feature map to squeeze the number of channels to $\alpha \times p \times p = 245$.

Afterwards, $C\_{5}$ is upsampled by 2× and $C\_{glb}$ is broadcast so that the spatial dimensions of the three feature maps are equal. Finally, the three generated feature maps are aggregated. By leveraging both local and global context, CEM effectively enlarges the receptive field and refines the representation ability of the thin feature map. Compared with prior [FPN](https://paperswithcode.com/method/fpn) structures, CEM involves only two 1×1 convolutions and an fc layer.
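The data flow described above can be sketched in NumPy. This is a minimal illustration, not the ThunderNet implementation: the input channel counts (120 and 512), the spatial sizes, the random weights, nearest-neighbour upsampling, and aggregation by element-wise sum are all assumptions made for the example. Only the 245 output channels, the 2× upsampling, the global average pooling, and the broadcast come from the description.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x has shape (C_in, H, W), w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map (assumed mode)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def cem(c4, c5, w4, w5, w_glb):
    """Merge C4, upsampled C5 and broadcast C_glb into one thin feature map."""
    f4 = conv1x1(c4, w4)                    # (245, H, W)
    f5 = upsample2x(conv1x1(c5, w5))        # (245, H/2, W/2) -> (245, H, W)
    c_glb = c5.mean(axis=(1, 2))            # global average pooling -> (C5,)
    f_glb = (w_glb @ c_glb)[:, None, None]  # fc layer, shaped for broadcasting
    return f4 + f5 + f_glb                  # aggregation by sum (assumption)

rng = np.random.default_rng(0)
out_channels = 245                          # alpha * p * p from the text
c4 = rng.standard_normal((120, 20, 20))     # hypothetical C4 shape
c5 = rng.standard_normal((512, 10, 10))     # hypothetical C5 shape
w4 = rng.standard_normal((out_channels, 120)) * 0.01
w5 = rng.standard_normal((out_channels, 512)) * 0.01
w_glb = rng.standard_normal((out_channels, 512)) * 0.01

out = cem(c4, c5, w4, w5, w_glb)
print(out.shape)
```

The global branch needs no spatial convolution at all: applied to a pooled vector, a 1×1 convolution reduces to a fully connected layer, which matches the count of two 1×1 convolutions plus one fc layer.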

**AlphaStar** is a reinforcement learning agent for playing StarCraft II. It learns a policy $\pi\_{\theta}\left(a\_{t}\mid{s\_{t}}, z\right) = P\left[a\_{t}\mid{s\_{t}}, z\right]$, represented by a neural network with parameters $\theta$, that receives observations $s\_{t} = \left(o\_{1:t}, a\_{1:t-1}\right)$ as inputs and chooses actions as outputs. Additionally, the policy conditions on a statistic $z$ that summarizes a strategy sampled from human data, such as a build order [1].

AlphaStar combines several architectural components to incorporate different types of features. Observations of player and enemy units are processed with a [Transformer](https://paperswithcode.com/method/transformer). Scatter connections are used to integrate spatial and non-spatial information. The temporal sequence of observations is processed by a core [LSTM](https://paperswithcode.com/method/lstm). Minimap features are extracted with a Residual Network. To manage the combinatorial action space, the agent uses an autoregressive policy and a recurrent [pointer network](https://paperswithcode.com/method/pointer-net).
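To illustrate why an autoregressive policy helps with a combinatorial action space, the following sketch samples an action in two steps: first an action type, then a target unit chosen by a pointer-style attention over a variable-length set of unit embeddings. It is a toy under stated assumptions, not AlphaStar's architecture: the function name `sample_action`, the weight shapes, and the single non-recurrent pointing step are all hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()

def sample_action(state, unit_embeddings, w_type, w_query, rng):
    """Sample (action_type, target_unit) one component at a time."""
    # Step 1: choose the action type from the state alone.
    type_probs = softmax(w_type @ state)
    action_type = int(rng.choice(len(type_probs), p=type_probs))
    # Step 2: build a query that depends on the state AND the sampled type,
    # then score every unit embedding against it (pointer-style attention).
    query = w_query[action_type] @ state
    unit_probs = softmax(unit_embeddings @ query)
    target = int(rng.choice(len(unit_probs), p=unit_probs))
    return action_type, target, type_probs, unit_probs

rng = np.random.default_rng(0)
state = rng.standard_normal(8)                # hypothetical core output
unit_embeddings = rng.standard_normal((5, 4)) # 5 units, embedding size 4
w_type = rng.standard_normal((3, 8))          # 3 hypothetical action types
w_query = rng.standard_normal((3, 4, 8))      # one query projection per type

action_type, target, type_probs, unit_probs = sample_action(
    state, unit_embeddings, w_type, w_query, rng
)
print(action_type, target)
```

Factoring the joint action this way means the network outputs 3 + 5 probabilities here rather than one distribution over all 15 (type, unit) pairs, and the pointer step works for any number of units.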

The agent is first trained with supervised learning from human replays. Its parameters are subsequently trained using reinforcement learning that maximizes the win rate against opponents. The RL algorithm is a policy-gradient method similar to actor-critic. Updates are performed asynchronously and off-policy. To deal with the resulting off-policy data, a combination of $TD\left(\lambda\right)$ and [V-trace](https://paperswithcode.com/method/v-trace) is used, as well as a new self-imitation algorithm (UPGO).
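As a reference point for the value targets mentioned above, here is a minimal sketch of the standard $TD\left(\lambda\right)$ return, computed by the backward recursion $G\_{t} = r\_{t} + \gamma\left[(1-\lambda)V(s\_{t+1}) + \lambda G\_{t+1}\right]$. It deliberately omits the off-policy corrections of V-trace and the UPGO term, and the function name, default discount, and default $\lambda$ are assumptions for illustration rather than AlphaStar's settings.

```python
def td_lambda_returns(rewards, values, bootstrap_value, gamma=1.0, lam=0.8):
    """Compute TD(lambda) returns for one trajectory.

    rewards[t] is r_t, values[t] is V(s_t), and bootstrap_value is the
    value of the state after the last step (0.0 if the episode ended).
    """
    # V(s_{t+1}) for every step t, ending with the bootstrap value.
    next_values = list(values[1:]) + [bootstrap_value]
    returns = [0.0] * len(rewards)
    g = bootstrap_value
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

# Toy episode: a single reward of 1 at the final step, then termination.
rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.5, 0.5]
monte_carlo = td_lambda_returns(rewards, values, 0.0, gamma=1.0, lam=1.0)
one_step = td_lambda_returns(rewards, values, 0.0, gamma=1.0, lam=0.0)
print(monte_carlo, one_step)
```

With $\lambda = 1$ the target is the full Monte Carlo return, and with $\lambda = 0$ it is the one-step bootstrapped target, so $\lambda$ trades variance against bias.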

Lastly, to address game-theoretic challenges, AlphaStar is trained with league training, which approximates a fictitious self-play (FSP) setting. FSP avoids cycles by computing a best response against a uniform mixture of all previous policies. The league of potential opponents includes a diverse range of agents, including policies from current and previous agents.
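The cycle-avoiding effect of best-responding to a uniform mixture can be seen in a toy game. The sketch below runs fictitious self-play on rock-paper-scissors, where each new policy is the best pure response to the empirical mixture of all earlier ones. The game, the payoff table, and the function names are illustrative assumptions and have nothing to do with AlphaStar's actual league.

```python
# Row player's payoff: PAYOFF[a][b] for action a against action b
# (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def best_response(counts):
    """Best pure action against a uniform mixture of all previous policies."""
    total = sum(counts)
    expected = [
        sum(PAYOFF[a][b] * counts[b] for b in range(3)) / total
        for a in range(3)
    ]
    return max(range(3), key=lambda a: expected[a])

def fictitious_self_play(steps, first_action=0):
    """Return how often each action was added to the pool of past policies."""
    counts = [0, 0, 0]
    counts[first_action] += 1
    for _ in range(steps):
        counts[best_response(counts)] += 1
    return counts

counts = fictitious_self_play(3000)
freqs = [c / sum(counts) for c in counts]
print(freqs)
```

Best-responding only to the most recent policy would chase rock, paper, scissors in an endless loop, whereas here the empirical mixture drifts toward playing each action about a third of the time.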

Image Credit: [Yekun Chai](https://ychai.uk/notes/2019/07/21/RL/DRL/Decipher-AlphaStar-on-StarCraft-II/)

#### References
1. Chai, Yekun. "AlphaStar: Grandmaster level in StarCraft II Explained." (2019). [https://ychai.uk/notes/2019/07/21/RL/DRL/Decipher-AlphaStar-on-StarCraft-II/](https://ychai.uk/notes/2019/07/21/RL/DRL/Decipher-AlphaStar-on-StarCraft-II/)

#### Code Implementation
1. https://github.com/opendilab/DI-star

This method introduces several regularization schemes that can be applied to an autoencoder. To make the model generative, *ex-post* density estimation is proposed: a mixture of Gaussians is fitted to the embeddings of the training data after the model has been trained.

Source	ThunderNet: Towards Real-time Generic Object Detection
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com