What is: ZeRO?
Source | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models |
Year | 2019 |
Data Source | CC BY-SA - https://paperswithcode.com |
Zero Redundancy Optimizer (ZeRO) is a sharded data-parallel method for distributed training. ZeRO-DP eliminates the redundant copies of model states (optimizer states, gradients, and parameters) held by each data-parallel process by partitioning those states across processes instead of replicating them. It keeps the compute and communication efficiency of standard data parallelism (DP) by preserving DP's computational granularity and communication volume, using a dynamic communication schedule during training.
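The idea of partitioning state instead of replicating it can be illustrated with a minimal single-process sketch. This is not DeepSpeed's implementation: it simulates the data-parallel ranks with Python lists, uses SGD with momentum as a stand-in for heavier optimizer states such as Adam's, and the function names (`shard_bounds`, `replicated_dp_step`, `sharded_dp_step`) and gradient values are hypothetical. It checks that when each rank keeps and updates only its own slice of the optimizer state, the resulting parameters match replicated data parallelism.

```python
from typing import List, Tuple

Vector = List[float]


def shard_bounds(n_params: int, world_size: int, rank: int) -> Tuple[int, int]:
    """Return the contiguous [start, end) slice of the flat parameter vector owned by `rank`."""
    per_rank = -(-n_params // world_size)  # ceiling division
    start = min(rank * per_rank, n_params)
    end = min(start + per_rank, n_params)
    return start, end


def replicated_dp_step(params: Vector, grads_per_rank: List[Vector], momentum: Vector,
                       lr: float, beta: float) -> Tuple[Vector, Vector]:
    """Baseline DP: gradients are averaged (all-reduce) and every rank keeps the full momentum."""
    world_size, n = len(grads_per_rank), len(params)
    avg = [sum(g[i] for g in grads_per_rank) / world_size for i in range(n)]
    new_m = [beta * momentum[i] + avg[i] for i in range(n)]
    new_p = [params[i] - lr * new_m[i] for i in range(n)]
    return new_p, new_m


def sharded_dp_step(params: Vector, grads_per_rank: List[Vector], momentum_shards: List[Vector],
                    lr: float, beta: float) -> Tuple[Vector, List[Vector]]:
    """ZeRO-style step (sketch): rank r stores and updates only its own slice of the optimizer state."""
    world_size, n = len(grads_per_rank), len(params)
    new_slices: List[Vector] = []
    new_shards: List[Vector] = []
    for rank in range(world_size):
        start, end = shard_bounds(n, world_size, rank)
        # Reduce-scatter: this rank receives the averaged gradient for its slice only.
        g = [sum(gr[i] for gr in grads_per_rank) / world_size for i in range(start, end)]
        # Local optimizer update touches only this rank's momentum shard.
        m = [beta * momentum_shards[rank][j] + g[j] for j in range(end - start)]
        new_slices.append([params[start + j] - lr * m[j] for j in range(end - start)])
        new_shards.append(m)
    # All-gather: concatenating the updated slices gives every rank the full parameters.
    return [x for s in new_slices for x in s], new_shards


if __name__ == "__main__":
    world_size, lr, beta = 3, 0.1, 0.9
    params = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 3.0]  # 7 values: uneven split over 3 ranks
    full_m = [0.0] * len(params)
    shards = [[0.0] * (e - s) for s, e in
              (shard_bounds(len(params), world_size, r) for r in range(world_size))]
    p_rep, p_zero = list(params), list(params)
    for step in range(2):
        # Each simulated rank contributes a different local gradient (made-up values).
        grads = [[0.1 * (r + 1) * (i + step + 1) for i in range(len(params))]
                 for r in range(world_size)]
        p_rep, full_m = replicated_dp_step(p_rep, grads, full_m, lr, beta)
        p_zero, shards = sharded_dp_step(p_zero, grads, shards, lr, beta)
    print("max |difference|:", max(abs(a - b) for a, b in zip(p_rep, p_zero)))
    print("momentum values stored per rank:", [len(s) for s in shards])
```

In this sketch each rank holds roughly 1/N of the momentum buffer rather than all of it. In a real multi-GPU setting, the reduce-scatter plus all-gather pair moves the same data volume as the all-reduce used by standard DP, because a ring all-reduce is typically implemented as exactly those two phases; what changes is only where the optimizer update runs.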