**Sarsa($\lambda$)** extends eligibility traces to action-value methods. It uses the same update rule as **TD($\lambda$)**, but with the action-value form of the TD error:

$$ \delta\_{t} = R\_{t+1} + \gamma\hat{q}\left(S\_{t+1}, A\_{t+1}, \mathbb{w}\_{t}\right) - \hat{q}\left(S\_{t}, A\_{t}, \mathbb{w}\_{t}\right) $$

and the action-value form of the [eligibility trace](https://paperswithcode.com/method/eligibility-trace):

$$ \mathbb{z}\_{-1} = \mathbb{0} $$

$$ \mathbb{z}\_{t} = \gamma\lambda\mathbb{z}\_{t-1} + \nabla\hat{q}\left(S\_{t}, A\_{t}, \mathbb{w}\_{t} \right), \quad 0 \leq t \leq T $$

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
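
As a rough illustration of how the TD error and the eligibility trace combine, here is a minimal sketch of a single Sarsa($\lambda$) step. It assumes a linear approximator $\hat{q}(s, a, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s, a)$, so the gradient is simply the feature vector, and it applies the TD($\lambda$) weight update $\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha\delta_t\mathbf{z}_t$ with a step size $\alpha$. The function names, the feature vectors, and the parameter values are hypothetical and chosen only for the example.

```python
def q_hat(w, x):
    """Linear action-value estimate: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sarsa_lambda_step(w, z, x, r, x_next, alpha, gamma, lam):
    """One Sarsa(lambda) update, assuming a linear q_hat.

    x       features of (S_t, A_t)
    x_next  features of (S_{t+1}, A_{t+1}), or None if S_{t+1} is terminal
    Returns the new weights, the new trace, and the TD error.
    """
    # Trace update: decay the old trace and add the gradient of q_hat,
    # which is just x for a linear approximator.
    z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
    # Action-value form of the TD error (the next value is 0 at termination).
    q_next = 0.0 if x_next is None else q_hat(w, x_next)
    delta = r + gamma * q_next - q_hat(w, x)
    # TD(lambda) weight update: w <- w + alpha * delta * z.
    w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w, z, delta


# Two-step episode with one-hot features (assumed values, for illustration).
w = [0.0, 0.0]
z = [0.0, 0.0]  # the trace starts at zero, matching z_{-1} = 0
w, z, d1 = sarsa_lambda_step(w, z, [1.0, 0.0], 1.0, [0.0, 1.0], 0.5, 0.9, 0.8)
w, z, d2 = sarsa_lambda_step(w, z, [0.0, 1.0], 2.0, None, 0.5, 0.9, 0.8)
print(w, z)
```

In the second step the reward also updates the weight of the first state-action pair, because its trace has decayed by $\gamma\lambda$ but is still nonzero. This is the credit-assignment effect the trace is meant to provide.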

A **Variational Autoencoder** is a type of likelihood-based generative model. It consists of an encoder that takes data $x$ as input and transforms it into a latent representation $z$, and a decoder that takes a latent representation $z$ and returns a reconstruction $\hat{x}$. Inference is performed via variational inference to approximate the posterior of the model.

Auto-Encoding Variational Bayes
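
The encoder and decoder roles described above can be sketched for scalar data as follows. This is only an illustration under stated assumptions: a one-dimensional latent, a Gaussian approximate posterior with a standard normal prior, a unit-variance Gaussian likelihood, and hypothetical linear encoder and decoder parameters stored in plain dictionaries. None of these names come from the source.

```python
import math
import random


def encode(x, enc):
    """Hypothetical linear encoder: mean and log-variance of q(z | x)."""
    mu = enc["w_mu"] * x + enc["b_mu"]
    log_var = enc["w_lv"] * x + enc["b_lv"]
    return mu, log_var


def reparameterize(mu, log_var, eps):
    """Draw z = mu + sigma * eps, where eps is a standard normal sample."""
    return mu + math.exp(0.5 * log_var) * eps


def decode(z, dec):
    """Hypothetical linear decoder: reconstruction x_hat from latent z."""
    return dec["w"] * z + dec["b"]


def negative_elbo(x, x_hat, mu, log_var):
    """Squared-error reconstruction term plus KL(q(z | x) || N(0, 1))."""
    recon = 0.5 * (x - x_hat) ** 2  # unit-variance Gaussian, up to a constant
    kl = 0.5 * (mu ** 2 + math.exp(log_var) - 1.0 - log_var)
    return recon + kl


# Assumed parameter values, for illustration only.
enc = {"w_mu": 1.0, "b_mu": 0.0, "w_lv": 0.0, "b_lv": 0.0}
dec = {"w": 1.0, "b": 0.0}
x = 2.0
mu, log_var = encode(x, enc)
z = reparameterize(mu, log_var, random.gauss(0.0, 1.0))
x_hat = decode(z, dec)
print(negative_elbo(x, x_hat, mu, log_var))
```

Training would minimise this quantity over the encoder and decoder parameters. The sketch only evaluates it for a single data point.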

Sarsa Lambda

RelDiff generates entity-relation-entity embeddings in a single embedding space. RelDiff adopts two fundamental vector algebraic operators to transform entity and relation embeddings from knowledge graphs into entity-relation-entity embeddings. In particular, RelDiff can encode finer-grained information about the relations than is captured when separate embeddings are learned for the entities and the relations.

Year	2000
Data Source	CC BY-SA - https://paperswithcode.com