vLLM
I have been thinking about picking up a new open source project besides LangChain and LlamaIndex. vLLM seems like a good choice, and hopefully there will be more vLLM blog posts coming.
1 Model hosting
This is essentially how NIM started the service in early versions; it has now changed to the nim_llm path.
export HF_HOME=/raid/models/huggingface   # cache models on local storage
# expose GPUs 0 and 1 and start the OpenAI-compatible server
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server --model=meta-llama/Llama-3.2-1B
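Once the server is up, a quick sanity check is to list the served models. This is a minimal sketch: port 8000 is vLLM's default, and the placeholder api_key is only there because the OpenAI client requires a non-empty value when no --api-key was set on the server.

from openai import OpenAI

# vLLM exposes an OpenAI-compatible API on port 8000 by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
print([m.id for m in client.models.list().data])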
Or you can use the vllm binary. The chat template is needed for VLM hosting and can be found here:
vllm serve llava-hf/llava-1.5-7b-hf --dtype auto --chat_template ./template_llava.jinja --api-key token-abc123
2 Inference
It follows the OpenAI format, and for VLMs you can use the traditional chat format and supply an image URL, as below:
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
}
}
]
}
]
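For reference, here is a minimal end-to-end sketch of that request with the openai Python client, assuming the LLaVA server started above is running on the default port 8000 (the api_key matches the --api-key flag from the serve command):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)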
For local images, you need to encode the file as a base64 string:
import base64

# Function to encode a local image file as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Getting the base64 string (image_path points at your local image file)
base64_image = encode_image(image_path)

# ... and the message is like below
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is in this image?",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}"
                },
            },
        ],
    }
]
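Sending the request is then the same chat.completions call as in the URL example above; a sketch, assuming the same LLaVA server is still running on the default port:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")
response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=messages,  # the base64 payload built above
)
print(response.choices[0].message.content)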