deepspeed
https://github.com/microsoft/deepspeed
Python
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
9 Subscribers
Help out
- Issues
- When using zero stage 3 for model training, loading custom parameters failed and the model parameter size was 0.
- [REQUEST] During inference, support passing `past_key_values` even if `input_ids.shape[-1] >= 2`
- [BUG] StarCoder inference not working with AutoTP
- [BUG] process exits with return code=-6 during training with bf16 optimizer
- [BUG] FLOPS compute **FAILS** for `F.interpolate` when using `scale_factor`
- [BUG] Deepspeed inference time distribution and max tokens
- [BUG] ZeRO Stage 2 seems to train MoE models incorrectly
- Fixed bug with hybrid engine generation when inference_tp_size > 1
- [BUG] The code for deepspeed.comm.comm.monitored_barrier()
- [BUG] High memory usage on first GPU, despite perfectly-balanced stages in pipeline
- Docs
- Python not yet supported