deepspeed
https://github.com/microsoft/deepspeed
Python
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
9 Subscribers
Help out
- Issues
- [BUG] process exits with return code=-6 during training with bf16 optimizer
- [BUG] FLOPS compute **FAILS** for `F.interpolate` when using `scale_factor`
- [BUG] Deepspeed inference time distribution and max tokens
- [BUG] ZeRO Stage 2 seems to train MoE models incorrectly
- Fixed bug with hybrid engine generation when inference_tp_size > 1
- [BUG] The code for deepspeed.comm.comm.monitored_barrier()
- [BUG] High memory usage on first GPU, despite perfectly-balanced stages in pipeline
- How to get average loss across all ranks using custom loss function
- [BUG] container dose
- [BUG] Activation Offloading with Residual-Type Connections
- Docs
- Python not yet supported