deepspeed
https://github.com/microsoft/deepspeed
Python
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
9 Subscribers
Help out
- Issues
- [REQUEST] can we optimize the logic for checkpoint saving
- [REQUEST] GAN: Different learning rates and schedulers in one config for multiple .initialize() calls possible?
- [BUG] load_checkpoint should load directly to gpu
- [BUG] DeBERTa has bad performance when using ZERO Stage-3 with continuous warnings "A module has unknown inputs or outputs type"
- [BUG] ZeRO Stage2 and 3, error while loss backward
- [BUG] GPT-J InferenceEngine Initialization Failure: `RuntimeError`
- [BUG] Autotuner is not launching experiments with correct hostfile setting
- [BUG] Zero3 Checkpointing doesn't include HF T5's token embeddings
- [REQUEST] cpu offload needs a max cpu memory config + pointers to cgroups/cpu oom handlers
- [REQUEST] removing the requirement for all layers to always execute in sync
- Docs
- Python not yet supported