lorax
https://github.com/predibase/lorax
Python
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're feeling ambitious, receive undocumented methods or classes instead and supercharge your commit history by documenting them.
Python not yet supported
1 Subscriber
Help out
- Issues
  - Add a small GUI to interact with models easier
  - Streamlit frontend to query models
  - Add ChatGLM as base model and support adapters
  - Include total time to generate tokens in final payload details
  - Add support for control vector adapters per request
  - Add in "--adapter-memory-fraction" to docs
  - Support constrained generation of valid Python types
  - Add support for AQLM quantization
  - Misleading/wrong openapi schema in REST API docs for structured output
  - Supporting inference with EETQ quantized model
- Docs
  - Python not yet supported