What is: ZeRO?

Source: ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Zero Redundancy Optimizer (ZeRO) is a sharded data-parallel method for distributed training. ZeRO-DP removes the redundant memory state across data-parallel processes by partitioning the model states (optimizer states, gradients, and parameters) instead of replicating them. It keeps the compute and communication efficiency of standard data parallelism (DP) by preserving DP's computational granularity and communication volume through a dynamic communication schedule during training.
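
To make the memory saving from partitioning concrete, here is a minimal sketch that estimates per-device model-state memory. It assumes the accounting used in the ZeRO paper for mixed-precision Adam training, which the description above does not spell out: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for optimizer state (fp32 weight copy, momentum, and variance). The function name `zero_model_state_bytes` and the numeric `stage` argument are hypothetical, chosen only for illustration, and the estimate ignores activations, temporary buffers, and fragmentation.

```python
def zero_model_state_bytes(num_params, dp_degree, stage, optimizer_bytes_per_param=12):
    """Estimate per-device model-state memory in bytes.

    Assumptions (not from the text above): mixed-precision Adam with
    2-byte fp16 parameters, 2-byte fp16 gradients, and 12 bytes of
    optimizer state per parameter. Activations and buffers are excluded.

    stage 0: plain data parallelism, everything replicated
    stage 1: optimizer states partitioned across devices
    stage 2: optimizer states and gradients partitioned
    stage 3: optimizer states, gradients, and parameters partitioned
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if dp_degree < 1:
        raise ValueError("dp_degree must be at least 1")

    param_bytes = 2.0 * num_params
    grad_bytes = 2.0 * num_params
    optim_bytes = float(optimizer_bytes_per_param) * num_params

    # Each partitioned component is split evenly over the data-parallel group.
    if stage >= 1:
        optim_bytes /= dp_degree
    if stage >= 2:
        grad_bytes /= dp_degree
    if stage >= 3:
        param_bytes /= dp_degree

    return param_bytes + grad_bytes + optim_bytes


# Example: a 7.5B-parameter model trained on 64 data-parallel devices.
for stage in range(4):
    gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
    print(f"stage {stage}: {gb:.1f} GB per device")
```

Under these assumptions, the replicated baseline needs about 120 GB of model state per device, while partitioning all three components across 64 devices brings that down to roughly 1.9 GB. Real frameworks may report different numbers depending on optimizer, precision settings, and buffer sizes.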