DeepSpeed
https://github.com/microsoft/deepspeed
Python
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Python not yet supported · 8 subscribers
- Issues
- [BUG] training bug
- [BUG] `assert param.ds_status == ZeroParamStatus.AVAILABLE, param.ds_summary()` when training deepspeed-chat step3 with ZeRO3 and a larger `generation_batches`
- When using ZeRO stage 3 for model training, loading custom parameters fails and the model parameter size is 0
- [REQUEST] During inference, support passing `past_key_values` even if `input_ids.shape[-1] >= 2`
- [BUG] StarCoder inference not working with AutoTP
- [BUG] process exits with return code=-6 during training with bf16 optimizer
- [BUG] FLOPS compute **FAILS** for `F.interpolate` when using `scale_factor`
- [BUG] Deepspeed inference time distribution and max tokens
- [BUG] ZeRO Stage 2 seems to train MoE models incorrectly
- amd-mi200 CI test failure