lorax
https://github.com/predibase/lorax
Python
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
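LoRAX serves a single base model and attaches fine-tuned LoRA adapters per request, so thousands of adapters can share one deployment. A minimal usage sketch, assuming a LoRAX server already running on localhost:8080 and the lorax-client package; the URL and adapter ID are placeholders, not values from this page:

```python
# pip install lorax-client
from lorax import Client

# Point the client at a running LoRAX deployment (placeholder URL).
client = Client("http://127.0.0.1:8080")

prompt = "[INST] Summarize what LoRA fine-tuning does. [/INST]"

# Each request can name its own LoRA adapter; the server loads it on demand
# and batches requests for different adapters against the shared base model.
response = client.generate(
    prompt,
    max_new_tokens=64,
    adapter_id="some-org/some-lora-adapter",  # placeholder adapter ID
)
print(response.generated_text)
```

Omitting adapter_id should fall back to the base model, which is what keeps per-request adapter routing optional.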
Help out
- Issues
- Fix PyTorch CUDA version in Docker
- Batch inference endpoint (OpenAI compatible; see the sketch after this list)
- Improve async adapter loading to avoid main-thread lockups in the server
- Expose all launcher args as optional values in the Helm charts
- Improve warmup checking for max new tokens when using speculative decoding
- Support inference on AWS Inferentia2 (Inf2) instances
- Reject unknown fields from API requests
- AssertionError when using model "google/gemma-2b" with multiple GPUs
- Add support for additional custom models
- Use special tokens specific to the fine-tuned adapter during decoding
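On the OpenAI-compatible batch endpoint above: LoRAX already exposes OpenAI-style chat completions for single requests, with the adapter selected through the `model` field, so that issue is about extending this surface to batches. A sketch of the existing single-request flow, assuming the server's /v1 routes; the URL and adapter ID are again placeholders:

```python
# pip install openai
from openai import OpenAI

# LoRAX's OpenAI-compatible endpoint; the API key is unused (placeholder URL).
client = OpenAI(api_key="EMPTY", base_url="http://127.0.0.1:8080/v1")

resp = client.chat.completions.create(
    model="some-org/some-lora-adapter",  # adapter ID stands in for the model name
    messages=[{"role": "user", "content": "What does multi-LoRA serving mean?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```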