
What is: PyTorch DDP?

Source: PyTorch Distributed: Experiences on Accelerating Data Parallel Training
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

PyTorch DDP (Distributed Data Parallel) is the distributed data-parallel training implementation in PyTorch. To guarantee mathematical equivalence with local training, all replicas start from the same initial model parameter values and synchronize gradients so that parameters stay consistent across training iterations. To minimize intrusiveness, the implementation exposes the same forward API as the user model, so applications can replace occurrences of the user model with the distributed data parallel model object with no additional code changes. Several techniques are integrated into the design to deliver high-performance training, including bucketing gradients, overlapping communication with computation, and skipping gradient synchronization.
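A minimal sketch of how this typically looks in practice is shown below. It assumes a single-node, one-GPU-per-process setup with the NCCL backend and a rendezvous configured via environment variables (MASTER_ADDR/MASTER_PORT); the toy Linear model, tensor shapes, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank: int, world_size: int):
    # Each process joins the default process group; NCCL is the usual
    # backend for GPU training (assumes MASTER_ADDR/MASTER_PORT are set).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Toy model for illustration; each process owns one GPU (device = rank).
    model = torch.nn.Linear(10, 10).to(rank)

    # Wrapping the model broadcasts parameters from rank 0 so every replica
    # starts from the same initial values. The wrapper exposes the same
    # forward API as the original model.
    ddp_model = DDP(model, device_ids=[rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    inputs = torch.randn(20, 10).to(rank)
    targets = torch.randn(20, 10).to(rank)

    # backward() triggers the bucketed gradient all-reduce, overlapping
    # communication with the remaining backward computation.
    optimizer.zero_grad()
    loss_fn(ddp_model(inputs), targets).backward()
    optimizer.step()

    # Gradient synchronization can be skipped on some iterations
    # (e.g. gradient accumulation) with the no_sync() context manager;
    # the first backward() outside the context synchronizes again.
    optimizer.zero_grad()
    with ddp_model.no_sync():
        loss_fn(ddp_model(inputs), targets).backward()
    loss_fn(ddp_model(inputs), targets).backward()
    optimizer.step()

    dist.destroy_process_group()
```

In a real job, one such process would be launched per GPU (for example with torchrun or torch.multiprocessing.spawn), and training code that previously called the plain model simply calls the DDP-wrapped model instead.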