vLLM
I have been thinking about picking up a new open source project besides LangChain and LlamaIndex. vLLM seems like a good choice, and hopefully there will be more vLLM blog posts coming.
1 Model hosting
This is essentially how NIM started the service in early versions; it has now changed to the nim_llm path.
export HF_HOME=/raid/models/huggingface   # cache models on local storage
# expose GPUs 0 and 1 and start the OpenAI-compatible server
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server --model=meta-llama/Llama-3.2-1B
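Once the server is up, a quick sanity check is to list the served models. This is a minimal sketch: port 8000 is vLLM's default, and the placeholder api_key is only there because the OpenAI client requires a non-empty value when no --api-key was set on the server.

from openai import OpenAI

# vLLM exposes an OpenAI-compatible API on port 8000 by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
print([m.id for m in client.models.list().data])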
Or you can use the vllm binary. The chat template is needed for VLM hosting and can be found here:
vllm serve llava-hf/llava-1.5-7b-hf --dtype auto --chat_template ./template_llava.jinja --api-key token-abc123
2 Inference
It follows the OpenAI format, and for VLMs you can use the traditional chat format and supply an image URL, as below:
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
}
}
]
}
]
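For reference, here is a minimal end-to-end sketch of that request with the openai Python client, assuming the LLaVA server started above is running on the default port 8000 (the api_key matches the --api-key flag from the serve command):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)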
For local images, you need to encode the file as a base64 string:
import base64

# Function to encode a local image file as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Getting the base64 string (image_path points at your local image file)
base64_image = encode_image(image_path)

# ... and the message is like below
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is in this image?",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}"
                },
            },
        ],
    }
]
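Sending the request is then the same chat.completions call as in the URL example above; a sketch, assuming the same LLaVA server is still running on the default port:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")
response = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    messages=messages,  # the base64 payload built above
)
print(response.choices[0].message.content)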