
What is: PyTorch DDP?

Source: PyTorch Distributed: Experiences on Accelerating Data Parallel Training
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

PyTorch DDP (Distributed Data Parallel) is the distributed data-parallel training implementation in PyTorch. To guarantee mathematical equivalence with local training, all replicas start from the same initial model parameter values and synchronize gradients so that parameters stay consistent across training iterations. To minimize intrusiveness, the implementation exposes the same forward API as the user model, so applications can replace occurrences of the user model with the distributed data parallel model object with no additional code changes. Several techniques are integrated into the design to deliver high-performance training, including bucketing gradients, overlapping communication with computation, and skipping gradient synchronization.
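A minimal sketch of how this typically looks in practice is shown below. It assumes a single-node, one-GPU-per-process setup with the NCCL backend and a rendezvous configured via environment variables (MASTER_ADDR/MASTER_PORT); the toy Linear model, tensor shapes, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank: int, world_size: int):
    # Each process joins the default process group; NCCL is the usual
    # backend for GPU training (assumes MASTER_ADDR/MASTER_PORT are set).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Toy model for illustration; each process owns one GPU (device = rank).
    model = torch.nn.Linear(10, 10).to(rank)

    # Wrapping the model broadcasts parameters from rank 0 so every replica
    # starts from the same initial values. The wrapper exposes the same
    # forward API as the original model.
    ddp_model = DDP(model, device_ids=[rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    inputs = torch.randn(20, 10).to(rank)
    targets = torch.randn(20, 10).to(rank)

    # backward() triggers the bucketed gradient all-reduce, overlapping
    # communication with the remaining backward computation.
    optimizer.zero_grad()
    loss_fn(ddp_model(inputs), targets).backward()
    optimizer.step()

    # Gradient synchronization can be skipped on some iterations
    # (e.g. gradient accumulation) with the no_sync() context manager;
    # the first backward() outside the context synchronizes again.
    optimizer.zero_grad()
    with ddp_model.no_sync():
        loss_fn(ddp_model(inputs), targets).backward()
    loss_fn(ddp_model(inputs), targets).backward()
    optimizer.step()

    dist.destroy_process_group()
```

In a real job, one such process would be launched per GPU (for example with torchrun or torch.multiprocessing.spawn), and training code that previously called the plain model simply calls the DDP-wrapped model instead.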