vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
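As a quick orientation before triaging, here is a minimal sketch of vLLM's documented offline-inference entry points (`LLM` and `SamplingParams`); the model name and prompt are placeholders, not anything prescribed by this page.

```python
# Minimal offline-inference sketch using vLLM's public Python API.
# "facebook/opt-125m" is only a small placeholder model; substitute your own.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=32)

# LLM() loads the weights and sets up vLLM's memory-efficient serving engine.
llm = LLM(model="facebook/opt-125m")

# generate() batches prompts for high-throughput decoding.
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```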
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're a real pro, receive undocumented methods or classes instead and supercharge your commit history.
Python not yet supported
3 Subscribers
Add a CodeTriage badge to vllm
Help out
- Issues
- [Feature]: Slurm run_cluster.sh launcher instead of just Ray
- [Bug]: deploy multi lora by vllm mode error
- [WIP][Spec Decode] Add multi-proposer support for variable and flexible speculative decoding
- [Feature]: Does VLLM only support MistralModel Architecture for embedding?
- [Bug]: Is vllm compatible with torchrun?
- [Bug]: RuntimeError: operator torchvision::nms does not exist
- [Performance]: vLLM version issue.
- [Installation]: Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback): libcudart.so.12: cannot open shared object file: No such file or directory
- [Usage]: how do I pass in the JSON content-type for ASYNC Mistral 7B offline inference
- [Usage]: Confirm tool calling is not supported and this is the closest thing can be done
- Docs
- Python not yet supported