vllm
https://github.com/vllm-project/vllm
Python
A high-throughput and memory-efficient inference and serving engine for LLMs
- Issues
- [Doc]: Is Qwen2-VL-72B supported?
- [New Model][Format]: Support the HF-version of Pixtral
- [Core] Rename input data types
- [Bugfix] Enable __ldcv in custom allreduce and remove memory fence
- [MISC] Support multi node inference with Neuron
- [Bug]: TypeError: 'NoneType' object is not subscriptable
- [Core] Enable Memory Tiering for vLLM
- [Not to be Submitted] [WIP] Force Unit tests to run with BlockManager V2
- [Doc]: How to Specify System CUTLASS/CUTE Path?
- [Bug]: RuntimeError on A800 using vllm0.6.1.post2