# Video support in Nemotron Nano VL
The PR for Nemotron Nano VL is still ongoing, with a few more modifications listed below. (It was finally merged just after I finished this blog.)
## 1 InternVL's video extension
Nemotron's video support is questionable, and its benchmark is "calculated with 1 tile per image". So I removed all the video-related code. Here are some notes:
- The `BaseInternVLxxx` classes are image-only:
  - `BaseInternVLProcessingInfo`
  - `BaseInternVLDummyInputsBuilder`
  - `BaseInternVLMultiModalProcessor`
- The `InternVLxxx(BaseInternVLxxx)` classes extend them with video support:
  - `InternVLProcessingInfo(BaseInternVLProcessingInfo)`
  - `InternVLDummyInputsBuilder(BaseInternVLDummyInputsBuilder)`
  - `InternVLMultiModalProcessor(BaseInternVLMultiModalProcessor)`
- Instead of inheriting from `InternVLxxx` and removing video support, the code should inherit directly from `BaseInternVLxxx`
- The base classes are now used directly in the decorator:

```python
@MULTIMODAL_REGISTRY.register_processor(
    BaseInternVLMultiModalProcessor[NemotronVLProcessingInfo],
    info=NemotronVLProcessingInfo,
    dummy_inputs=BaseInternVLDummyInputsBuilder[NemotronVLProcessingInfo])
class LlamaNemotronVLChatModel(nn.Module, SupportsMultiModal,
                               SupportsPP, SupportsLoRA):
    ...
```
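The benefit of inheriting directly from the image-only base classes can be shown with a toy hierarchy (class names here are illustrative, not vLLM's actual ones): the subclass simply never picks up video support, so there is nothing to strip out.

```python
# Toy hierarchy; class names are illustrative, not vLLM's actual ones.
class BaseProcessingInfo:
    """Image-only, like BaseInternVLProcessingInfo."""
    def get_supported_mm_limits(self):
        return {"image": None}

class VideoProcessingInfo(BaseProcessingInfo):
    """Adds video on top, like InternVLProcessingInfo."""
    def get_supported_mm_limits(self):
        return {"image": None, "video": None}

# Inheriting the image-only base: no video support to remove or stub out.
class NemotronProcessingInfo(BaseProcessingInfo):
    pass

print(NemotronProcessingInfo().get_supported_mm_limits())  # {'image': None}
```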
## 2 Adding test dependency

A CI test failed due to a missing dependency. To fix it:

- add the dependency to `requirements/test.in`
- run `pre-commit` and call `pip-compile` to generate `requirements/test.txt`
- copy over `test.txt` from CI if `pre-commit` still fails
- I also fixed a bug in `docker/Dockerfile.cpu`

## 3 Config attributes mapping
We actually do NOT need to copy `configuration.py` from HF just because some attribute names mismatch:
- No `Llama_Nemotron_Nano_VL_Config` defined under `vllm/transformers_utils/configs/nemotron_vl_config.py`
- No `Llama_Nemotron_Nano_VL_Config` referred to under `vllm/transformers_utils/configs/__init__.py`
- No `Llama_Nemotron_Nano_VL_Config` registered in the `_CONFIG_REGISTRY` under `vllm/transformers_utils/config.py`
- Add the following mapping code under `vllm/transformers_utils/config.py` to make the HF config work:

```python
_CONFIG_ATTRS_MAPPING: dict[str, str] = {
    "llm_config": "text_config",
}
```
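A minimal sketch of what such an attribute mapping accomplishes (my own illustration, not vLLM's actual remapping code): the HF config's `llm_config` attribute gets exposed under the name vLLM expects, `text_config`.

```python
from types import SimpleNamespace

# Illustrative mapping and helper; vLLM's real implementation differs.
_CONFIG_ATTRS_MAPPING: dict[str, str] = {
    "llm_config": "text_config",
}

def remap_config_attrs(config, mapping=_CONFIG_ATTRS_MAPPING):
    # Rename HF attribute names to the names vLLM expects.
    for hf_name, vllm_name in mapping.items():
        if hasattr(config, hf_name):
            setattr(config, vllm_name, getattr(config, hf_name))
            delattr(config, hf_name)
    return config

hf_config = SimpleNamespace(llm_config={"hidden_size": 4096})
cfg = remap_config_attrs(hf_config)
print(cfg.text_config)  # {'hidden_size': 4096}
```

With this in place, downstream code can always read `text_config`, regardless of which name the upstream HF config used.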
## 4 Class inheritance

Try to inherit as much as possible to avoid writing duplicate code. The processor class can inherit directly from `InternVLProcessor`, overriding methods only as needed:

```python
class NemotronVLProcessor(InternVLProcessor):

    def __init__(...):
        # Combines InternVLProcessor.__init__ and BaseInternVLProcessor.__init__

    def _preprocess_image(...):
        # Nemotron uses <image> as IMG_CONTEXT, which conflicts with
        # vLLM's image placeholder

    @property
    def image_token_id(self) -> int:
        # Different IMG_CONTEXT from InternVL

    def get_image_repl(...):
        # Different IMG_CONTEXT from InternVL
```
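The override-only-what-differs pattern can be reduced to a toy example (class names and method body are illustrative of the pattern only): shared logic stays in the parent, and the child replaces just the conflicting `IMG_CONTEXT` token.

```python
# Toy classes; names are illustrative, not vLLM's actual processors.
class InternVLLikeProcessor:
    IMG_CONTEXT = "<IMG_CONTEXT>"

    def get_image_repl(self, num_tokens: int) -> str:
        # Shared logic lives in the parent ...
        return self.IMG_CONTEXT * num_tokens

class NemotronLikeProcessor(InternVLLikeProcessor):
    # ... while the child overrides only what differs: Nemotron's
    # IMG_CONTEXT is "<image>", which collides with vLLM's placeholder.
    IMG_CONTEXT = "<image>"

print(NemotronLikeProcessor().get_image_repl(2))  # <image><image>
```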
## 5 LoRA support

The LoRA support does NOT actually need to be tested with LoRA adapters. It just needs a test that initializes the engine, e.g. in `api_server.py`:

```python
async def run_server(args, **uvicorn_kwargs) -> None:
    """Run a single-worker API server."""
    args.model = "nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1"
    args.enable_lora = True
    args.trust_remote_code = True
    logger.info("args: %s", args)
    ...
```