**Q-Learning** is an off-policy temporal difference control algorithm:

$$Q\left(S_{t}, A_{t}\right) \leftarrow Q\left(S_{t}, A_{t}\right) + \alpha\left[R_{t+1} + \gamma\max_{a}Q\left(S_{t+1}, a\right) - Q\left(S_{t}, A_{t}\right)\right]$$

The learned action-value function $Q$ directly approximates $q_{*}$, the optimal action-value function, independent of the policy being followed.
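
As a rough illustration of the update rule, the following is a minimal tabular sketch, not a reference implementation. The function name `q_learning_update`, the list-of-lists table `Q`, and the `terminal` flag (which zeroes the bootstrap term at episode end) are assumptions made for this example and do not come from the source.

```python
def q_learning_update(Q, s, a, r, s_next, alpha, gamma, terminal=False):
    """Apply one tabular Q-learning update in place and return the TD error.

    Q is a hypothetical table indexed as Q[state][action].
    """
    # Off-policy target: greedy value of the next state, regardless of
    # which action the behavior policy will actually take there.
    # Assumption: the value of a terminal state is treated as 0.
    best_next = 0.0 if terminal else max(Q[s_next])
    td_error = r + gamma * best_next - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error


# Toy table with 3 states and 2 actions (values chosen arbitrarily).
Q = [[0.0, 0.0], [0.5, 2.0], [0.0, 0.0]]
td = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

In this toy call the target is $1.0 + 0.9 \cdot 2.0 = 2.8$, so `Q[0][1]` moves halfway from 0 toward it. The `max` over next-state actions is what makes the update off-policy.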

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition

Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause is changed.
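
To make the distinction between association and causation concrete, here is a small simulation sketch under invented assumptions: a hidden common cause `z` drives both `x` and `y`, while `x` has no effect on `y` at all. The function names, probabilities, and sample size are hypothetical and chosen only for illustration.

```python
import random


def simulate(n, intervene_x=None, seed=0):
    """Draw n (x, y) pairs from a toy model in which x never influences y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5  # hidden common cause
        if intervene_x is None:
            # Observational regime: x depends on z.
            x = rng.random() < (0.9 if z else 0.1)
        else:
            # Interventional regime: x is set by hand, ignoring z.
            x = intervene_x
        y = rng.random() < (0.8 if z else 0.2)  # y depends only on z
        rows.append((x, y))
    return rows


def mean_y(rows, x_value):
    """Fraction of rows with y == True among rows where x == x_value."""
    ys = [y for x, y in rows if x == x_value]
    return sum(ys) / len(ys)


n = 20000
observed = simulate(n, seed=1)
association = mean_y(observed, True) - mean_y(observed, False)

do_x1 = simulate(n, intervene_x=True, seed=2)
do_x0 = simulate(n, intervene_x=False, seed=3)
causal_effect = mean_y(do_x1, True) - mean_y(do_x0, False)
```

In this toy model `association` should come out large while `causal_effect` stays close to zero. Conditioning on the observed value of `x` picks up the influence of `z`, whereas changing `x` directly does not.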

**3DSSD** is a point-based, single-stage 3D object detector. It abandons the upsampling layers and the refinement stage, which are indispensable in existing point-based methods, to reduce their large computation cost. The authors propose a fusion sampling strategy in the downsampling process to make detection on less representative points feasible. A delicate box prediction network, comprising a candidate generation layer and an anchor-free regression head with a 3D center-ness assignment strategy, is designed to meet the demands of both accuracy and speed.

Year	1984
Data Source	CC BY-SA - https://paperswithcode.com