lightning
https://github.com/lightning-ai/lightning
Python
Deep learning framework to train, deploy, and ship AI products Lightning fast.
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Python not yet supported
2 Subscribers
Help out
- Issues
- trainer.fit from checkpoint without performance improvement will break 'last' link to checkpoint on Windows 11
- Unable to extract confusion matrix as a metric from trainer
- TensorBoardLogger reports epoch numbers far higher than the actual count
- How to incorporate vLLM in Lightning for LLM inference?
- WandbLogger `save_dir` and `dir` parameters do not work as expected.
- Loading large models with fabric, FSDP and empty_init=True does not work
- AWS Trainium fails device-count validation when using more than one accelerator on an instance
- OnExceptionCheckpoint: training resumes if ckpt found, even if no ckpt_path provided
- Checkpoint every_n_steps reruns epoch on restore
- Multi-node Training with DDP stuck at "Initialize distributed..." on SLURM cluster
- Docs
- Python not yet supported