lorax
https://github.com/predibase/lorax
Python
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
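For context, here is what multi-LoRA inference looks like from the client side. This is a minimal sketch, assuming `pip install lorax-client`, a LoRAX server already running on localhost:8080, and a placeholder adapter id:

```python
# Minimal sketch (assumption: a LoRAX server is already running locally).
from lorax import Client

client = Client("http://127.0.0.1:8080")

# The same base model serves many fine-tunes: each request selects the
# LoRA adapter it wants via `adapter_id`.
response = client.generate(
    "What is the capital of France?",
    adapter_id="some-user/some-lora-adapter",  # hypothetical adapter id
    max_new_tokens=64,
)
print(response.generated_text)
```

Requests naming different adapters can share the same base model in one server, which is the property behind "scales to 1000s of fine-tuned LLMs" above.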
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're a real pro, receive undocumented methods or classes instead and supercharge your commit history.
Python not yet supported (1 subscriber)
Help out
- Issues
- S3 download issues
- Mixtral nf4 performance 2x slower than expected
- Improve OpenAI API compatibility (see the OpenAI client sketch after this list)
- Supporting LmHead and Embedding Layers for Adapters
- Can the base model be loaded directly from the local filesystem without connecting to Hugging Face? Launch currently fails when there is no connection. (See the offline-loading sketch after this list.)
- Do you support miqu?
- Can I deploy the LoRAX service without using lorax-launcher to start it?
- FP6 quantization from DeepSpeed
- Improve error handling in SGMV kernels
- Error: Warmup(Generation("Not enough memory to handle 1024 prefill tokens. You need to decrease `--max-batch-prefill-tokens`")) (see the relaunch sketch after this list)
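On the OpenAI API compatibility item: LoRAX exposes an OpenAI-compatible endpoint, so the stock `openai` Python client can be pointed at it. A minimal sketch, assuming the endpoint is mounted at `/v1` on localhost:8080 and that the adapter is selected via the `model` field (both are assumptions here):

```python
# Sketch only: assumes an OpenAI-compatible /v1 endpoint and that the
# target LoRA adapter is passed as the `model` name.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://127.0.0.1:8080/v1")

completion = client.chat.completions.create(
    model="some-user/some-lora-adapter",  # hypothetical adapter id
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=32,
)
print(completion.choices[0].message.content)
```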
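On the offline-loading question: one common workaround is to download the base model once while a connection is available, then point the launcher at the local directory so startup never touches the Hub. A sketch using `huggingface_hub`; the model id is a placeholder and the launcher invocation is an assumption:

```python
# Sketch: cache the base model while online, then launch against the
# local copy so no Hugging Face connection is needed at startup.
from huggingface_hub import snapshot_download

# Downloads (or reuses) the full model snapshot and returns its local path.
local_dir = snapshot_download("mistralai/Mistral-7B-v0.1")  # placeholder model id
print(local_dir)

# Afterwards (assumption): start the server offline, e.g.
#   HF_HUB_OFFLINE=1 lorax-launcher --model-id <local_dir>
```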
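On the warmup error: the message itself names the fix, lowering `--max-batch-prefill-tokens` below the failing 1024 so the warmup pass fits in GPU memory. A sketch of relaunching with a smaller value; the flag comes straight from the error text, the rest is assumed:

```python
# Sketch: relaunch with a prefill budget below the 1024 that ran out of memory.
import subprocess

subprocess.run(
    [
        "lorax-launcher",
        "--model-id", "mistralai/Mistral-7B-v0.1",  # placeholder model id
        "--max-batch-prefill-tokens", "512",
    ],
    check=True,
)
```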
- Docs
- Python not yet supported