text-generation-inference
https://github.com/huggingface/text-generation-inference
Python
Large Language Model Text Generation Inference
Issues
- Host/CPU memory usage for prefix cache
- Support for returning a `CompletionUsage` object when `streaming=True`
- xpu/cpu: docker images referenced in documentation do not exist
- Add `response_format` input parameter to `v1/chat/completions` endpoint
- * HTTP 1.0, assume close after body < HTTP/1.0 503 Service Unavailable
- RuntimeError: weight model.embed_tokens.weight does not exist
- Multi-LORA feature question
- Multi-LORA feature question-2
- Can't install on Ubuntu 22.04 with CUDA 11.8
- Response prefill logprobs seems to become incorrect when using `AsyncInferenceClient` in some circumstances
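Several of the issues above concern TGI's OpenAI-compatible `v1/chat/completions` endpoint (for example, adding a `response_format` parameter and returning a `CompletionUsage` object when streaming). As context, here is a minimal sketch of querying that endpoint; it assumes a TGI server is already running on localhost:8080, and the `"model": "tgi"` field is a placeholder since a TGI instance serves a single model.

```python
# Minimal sketch: call a running text-generation-inference server through its
# OpenAI-compatible /v1/chat/completions endpoint.
# Assumptions: server at localhost:8080; "tgi" is a placeholder model name.
import requests

response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "tgi",
        "messages": [{"role": "user", "content": "What is text generation inference?"}],
        "max_tokens": 128,
        "stream": False,  # streaming responses are served as server-sent events
    },
    timeout=60,
)
response.raise_for_status()
# Standard OpenAI-style response shape: choices[0].message.content
print(response.json()["choices"][0]["message"]["content"])
```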