Dynamo Disagg Skeleton

I created a PR that adds a disagg skeleton example for Dynamo. This completes the Backend/Worker guide.

1 Processor

The Processor is similar to the one in the hello world multinode example, with two changes:

  1. Adding depends(Router)
  2. Adding a KV router mode, which connects to a dummy KV router for worker selection and uses self.worker_client.direct(..., worker_id) to target a specific worker
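The two-step flow above (ask the router for a worker, then send the request directly to it) can be sketched without Dynamo installed. `StubRouter`, `StubWorkerClient`, and `pick_worker` are hypothetical stand-ins invented for this illustration; only the `direct(..., worker_id)` call shape comes from the example itself.

```python
import asyncio


class StubRouter:
    """Hypothetical stand-in for the dummy KV router (not a Dynamo class)."""

    def __init__(self, worker_ids):
        self.worker_ids = list(worker_ids)

    async def pick_worker(self, prompt: str) -> int:
        # Placeholder policy: derive an index from the prompt length.
        # The real router scores workers with its cost function instead.
        return self.worker_ids[len(prompt) % len(self.worker_ids)]


class StubWorkerClient:
    """Hypothetical stand-in for self.worker_client; only mimics direct()."""

    async def direct(self, request: str, worker_id: int) -> str:
        return f"worker {worker_id} handled {request!r}"


async def process(prompt: str, router: StubRouter, worker_client: StubWorkerClient) -> str:
    # Step 1: the router chooses a worker id for this prompt.
    worker_id = await router.pick_worker(prompt)
    # Step 2: the request goes to exactly that worker.
    return await worker_client.direct(prompt, worker_id)


if __name__ == "__main__":
    router = StubRouter([101, 102, 103])
    print(asyncio.run(process("hello", router, StubWorkerClient())))
```

Keeping the selection in the router and the dispatch in the Processor means the routing policy can change without touching the Processor code.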

2 KV Router

The router depends on the workers, so it uses the same logic as the Processor to get them.

  1. _cost_function decides which worker is selected, and any customization belongs in that function. In this dummy example, the string matching score is used as the cost.
    # requires: from difflib import SequenceMatcher
    hit_rate = SequenceMatcher(
      None,  # isjunk: no characters are treated as junk
      self.kv_cache[curr_id],  # target string
      request_prompt,  # string to compare
    ).ratio()
    
  2. Adding a KV metrics aggregator example
      self.runtime = dynamo_context["runtime"]
      kv_listener = self.runtime.namespace("dynamo-demo").component("DummyWorker")
      await kv_listener.create_service()
      self.metrics_aggregator = KvMetricsAggregator(kv_listener)
    
  3. Use of this aggregator is integrated into the vLLM backend. In this dummy example, we call the aggregator explicitly to get KV metrics.
    metrics = await self.metrics_aggregator.get_metrics()
    for endpoint in metrics.endpoints:
      logger.info(f"KV metrics:{endpoint.worker_id}, {endpoint.num_requests_waiting}")
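As a minimal, standalone sketch of the string-matching cost from step 1, the snippet below picks the worker whose cached text is most similar to the incoming prompt. It assumes `kv_cache` is a plain dict mapping worker id to the last prompt that worker saw; `hit_rate` and `select_worker` are illustrative helper names, not part of Dynamo.

```python
from difflib import SequenceMatcher


def hit_rate(cached_text: str, request_prompt: str) -> float:
    """Similarity in [0, 1] between a worker's cached text and the prompt."""
    return SequenceMatcher(None, cached_text, request_prompt).ratio()


def select_worker(kv_cache: dict, request_prompt: str):
    """Return the worker id whose cached text best matches the prompt."""
    if not kv_cache:
        raise ValueError("no workers available")
    # max() keeps the first worker on ties, so selection is deterministic.
    return max(kv_cache, key=lambda wid: hit_rate(kv_cache[wid], request_prompt))


if __name__ == "__main__":
    # Assumed cache layout: worker id -> text of the last prompt it served.
    kv_cache = {
        0: "translate this sentence to French",
        1: "summarize the following article",
    }
    print(select_worker(kv_cache, "summarize the following report"))
```

A real cost function would likely combine a cache-hit estimate like this with load metrics such as the number of waiting requests.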
    

3 Worker

The KV metrics publisher is initialized in the worker and can publish KV metrics.

self.component = dynamo_context["component"]
self.metrics_publisher = KvMetricsPublisher()  
# Register an endpoint for consumers of the KV Metrics
# (KvMetricsAggregator in kv_router) to listen/gather on.
self.metrics_publisher.create_endpoint(self.component)
self.metrics_publisher.publish(...)
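The publish/gather relationship between worker and router can be illustrated with an in-memory stand-in. `InMemoryMetricsStore` and `EndpointMetrics` are hypothetical names invented for this sketch and do not reflect how Dynamo transports metrics; only the `worker_id` and `num_requests_waiting` fields are taken from the router snippet.

```python
from dataclasses import dataclass


@dataclass
class EndpointMetrics:
    worker_id: int
    num_requests_waiting: int


class InMemoryMetricsStore:
    """Hypothetical stand-in for the channel between publisher and aggregator."""

    def __init__(self):
        self._latest = {}

    def publish(self, metrics: EndpointMetrics) -> None:
        # Worker side: each publish overwrites that worker's previous snapshot.
        self._latest[metrics.worker_id] = metrics

    def get_metrics(self) -> list:
        # Router side: gather the latest snapshot from every worker.
        return sorted(self._latest.values(), key=lambda m: m.worker_id)


if __name__ == "__main__":
    store = InMemoryMetricsStore()
    store.publish(EndpointMetrics(worker_id=2, num_requests_waiting=5))
    store.publish(EndpointMetrics(worker_id=1, num_requests_waiting=0))
    store.publish(EndpointMetrics(worker_id=2, num_requests_waiting=3))
    for endpoint in store.get_metrics():
        print(f"KV metrics:{endpoint.worker_id}, {endpoint.num_requests_waiting}")
```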
