Configuration Reference

A complete reference for configuring the LlamaStack Kubernetes Operator, based on its actual API.

LlamaStackDistribution Specification

Basic Structure

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: string
  namespace: string
spec:
  replicas: integer  # Default: 1
  server:
    distribution:
      # Either name OR image (mutually exclusive)
      name: string     # Distribution name from supported distributions
      image: string    # Direct container image reference
    containerSpec:
      name: string     # Default: "llama-stack"
      port: integer    # Default: 8321
      resources:
        requests:
          cpu: string
          memory: string
        limits:
          cpu: string
          memory: string
      env:
      - name: string
        value: string
    podOverrides:      # Optional pod-level customization
      volumes:
      - name: string
        # ... volume spec
      volumeMounts:
      - name: string
        mountPath: string
    storage:           # Optional persistent storage
      size: string     # Default: "10Gi"
      mountPath: string # Default: "/.llama"

Core Configuration

Distribution Configuration

Specify either a distribution name or a direct image reference, but not both:

# Option 1: Use a named distribution
spec:
  server:
    distribution:
      name: "ollama"  # Maps to supported distributions

---
# Option 2: Use a direct image
spec:
  server:
    distribution:
      image: "llamastack/llamastack:latest"

Supported Distribution Names

The operator supports the following pre-configured distributions:

| Distribution Name | Image | Description |
| --- | --- | --- |
| `ollama` | `docker.io/llamastack/distribution-ollama:latest` | Ollama-based distribution for local inference |
| `hf-endpoint` | `docker.io/llamastack/distribution-hf-endpoint:latest` | Hugging Face Endpoint distribution |
| `hf-serverless` | `docker.io/llamastack/distribution-hf-serverless:latest` | Hugging Face Serverless distribution |
| `bedrock` | `docker.io/llamastack/distribution-bedrock:latest` | AWS Bedrock distribution |
| `cerebras` | `docker.io/llamastack/distribution-cerebras:latest` | Cerebras distribution |
| `nvidia` | `docker.io/llamastack/distribution-nvidia:latest` | NVIDIA distribution |
| `open-benchmark` | `docker.io/llamastack/distribution-open-benchmark:latest` | Open benchmark distribution |
| `passthrough` | `docker.io/llamastack/distribution-passthrough:latest` | Passthrough distribution |
| `remote-vllm` | `docker.io/llamastack/distribution-remote-vllm:latest` | Remote vLLM distribution |
| `sambanova` | `docker.io/llamastack/distribution-sambanova:latest` | SambaNova distribution |
| `tgi` | `docker.io/llamastack/distribution-tgi:latest` | Text Generation Inference distribution |
| `together` | `docker.io/llamastack/distribution-together:latest` | Together AI distribution |
| `vllm-gpu` | `docker.io/llamastack/distribution-vllm-gpu:latest` | vLLM GPU distribution |
| `watsonx` | `docker.io/llamastack/distribution-watsonx:latest` | IBM watsonx distribution |
| `fireworks` | `docker.io/llamastack/distribution-fireworks:latest` | Fireworks AI distribution |

Examples:

# Ollama distribution
spec:
  server:
    distribution:
      name: "ollama"

---
# Hugging Face Endpoint
spec:
  server:
    distribution:
      name: "hf-endpoint"

---
# NVIDIA distribution
spec:
  server:
    distribution:
      name: "nvidia"

---
# vLLM GPU distribution
spec:
  server:
    distribution:
      name: "vllm-gpu"

Replica Configuration

spec:
  replicas: 3  # Default: 1

Container Configuration

spec:
  server:
    containerSpec:
      name: "llama-stack"  # Default container name
      port: 8321           # Default port
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
      env:
      - name: "INFERENCE_MODEL"
        value: "llama2-7b"
      - name: "LOG_LEVEL"
        value: "INFO"

Storage Configuration

Basic Storage

spec:
  server:
    storage:
      size: "50Gi"              # Default: "10Gi"
      mountPath: "/.llama"      # Default mount path

Custom Mount Path

spec:
  server:
    storage:
      size: "100Gi"
      mountPath: "/custom/path"

Advanced Pod Customization

Additional Volumes

spec:
  server:
    podOverrides:
      volumes:
      - name: "model-cache"
        emptyDir:
          sizeLimit: "20Gi"
      - name: "config"
        configMap:
          name: "llamastack-config"
      volumeMounts:
      - name: "model-cache"
        mountPath: "/cache"
      - name: "config"
        mountPath: "/config"
        readOnly: true

ConfigMap Integration

spec:
  server:
    podOverrides:
      volumes:
      - name: "llamastack-config"
        configMap:
          name: "my-llamastack-config"
      volumeMounts:
      - name: "llamastack-config"
        mountPath: "/app/config"

Configuration Examples

Minimal Configuration

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: simple-llamastack
spec:
  server:
    distribution:
      name: "ollama"

Development Configuration

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-dev
spec:
  replicas: 1
  server:
    distribution:
      image: "llamastack/llamastack:latest"
    containerSpec:
      port: 8321
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      env:
      - name: "LOG_LEVEL"
        value: "DEBUG"
    storage:
      size: "20Gi"

Production Configuration

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-prod
spec:
  replicas: 3
  server:
    distribution:
      image: "llamastack/llamastack:v1.0.0"
    containerSpec:
      name: "llama-stack"
      port: 8321
      resources:
        requests:
          cpu: "2"
          memory: "4Gi"
        limits:
          cpu: "4"
          memory: "8Gi"
      env:
      - name: "INFERENCE_MODEL"
        value: "llama2-70b"
      - name: "MAX_WORKERS"
        value: "4"
    storage:
      size: "500Gi"
      mountPath: "/.llama"
    podOverrides:
      volumes:
      - name: "model-cache"
        emptyDir:
          sizeLimit: "100Gi"
      volumeMounts:
      - name: "model-cache"
        mountPath: "/cache"

Custom Image with Configuration

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: custom-llamastack
spec:
  replicas: 2
  server:
    distribution:
      image: "myregistry.com/custom-llamastack:v1.0"
    containerSpec:
      port: 8321
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
      env:
      - name: "CUSTOM_CONFIG"
        value: "/config/custom.yaml"
    storage:
      size: "100Gi"
    podOverrides:
      volumes:
      - name: "custom-config"
        configMap:
          name: "llamastack-custom-config"
      volumeMounts:
      - name: "custom-config"
        mountPath: "/config"
        readOnly: true
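
For the path in `CUSTOM_CONFIG` to resolve, the `llamastack-custom-config` ConfigMap needs a key named `custom.yaml`, because Kubernetes projects each ConfigMap key as a file under the mount path. A minimal sketch follows. The contents of `custom.yaml` are placeholders, since this reference does not define that file's schema, and the ConfigMap is assumed to live in the same namespace as the pods that mount it.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llamastack-custom-config   # Must match configMap.name under podOverrides.volumes
data:
  # The key becomes the file name: mounted at /config, it appears as /config/custom.yaml
  custom.yaml: |
    # Placeholder content (hypothetical); replace with your actual configuration
    example_setting: "value"
```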

Distribution-Specific Examples

Ollama Distribution

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: ollama-llamastack
spec:
  replicas: 1
  server:
    distribution:
      name: "ollama"
    containerSpec:
      port: 8321
      env:
      - name: OLLAMA_URL
        value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"
    storage:
      size: "20Gi"

Hugging Face Endpoint

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: hf-endpoint-llamastack
spec:
  server:
    distribution:
      name: "hf-endpoint"
    containerSpec:
      env:
      - name: HF_TOKEN
        valueFrom:
          secretKeyRef:
            name: hf-credentials
            key: token
      - name: HF_MODEL_ID
        value: "meta-llama/Llama-2-7b-chat-hf"
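
The `secretKeyRef` above expects a Secret named `hf-credentials` with a key `token` to exist before the pod starts. A minimal sketch of such a Secret is shown here; the token value is a placeholder, and the Secret is assumed to be created in the same namespace as the distribution. The NVIDIA and Together AI examples follow the same pattern with `nvidia-credentials` and `together-credentials`, each using the key `api-key`.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hf-credentials                  # Must match secretKeyRef.name
type: Opaque
stringData:
  # Key must match secretKeyRef.key; the value below is a placeholder
  token: "<your-hugging-face-token>"
```

Using `stringData` lets you write the value in plain text; Kubernetes stores it base64-encoded under `data`.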

NVIDIA Distribution

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: nvidia-llamastack
spec:
  server:
    distribution:
      name: "nvidia"
    containerSpec:
      resources:
        requests:
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"
      env:
      - name: NVIDIA_API_KEY
        valueFrom:
          secretKeyRef:
            name: nvidia-credentials
            key: api-key

vLLM GPU Distribution

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: vllm-gpu-llamastack
spec:
  server:
    distribution:
      name: "vllm-gpu"
    containerSpec:
      resources:
        requests:
          nvidia.com/gpu: "1"
          memory: "8Gi"
        limits:
          nvidia.com/gpu: "1"
          memory: "16Gi"
      env:
      - name: MODEL_NAME
        value: "meta-llama/Llama-2-7b-chat-hf"
    storage:
      size: "50Gi"

AWS Bedrock Distribution

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: bedrock-llamastack
spec:
  server:
    distribution:
      name: "bedrock"
    containerSpec:
      env:
      - name: AWS_REGION
        value: "us-east-1"
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: access-key-id
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: secret-access-key
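
Both AWS environment variables above read from a single Secret, `aws-credentials`, using two different keys. A sketch of that Secret might look like the following; the values are placeholders, and the Secret is assumed to be in the same namespace as the distribution.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials                 # Must match secretKeyRef.name in both env entries
type: Opaque
stringData:
  # Keys must match the secretKeyRef.key values; the values below are placeholders
  access-key-id: "<your-aws-access-key-id>"
  secret-access-key: "<your-aws-secret-access-key>"
```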

Together AI Distribution

apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: together-llamastack
spec:
  server:
    distribution:
      name: "together"
    containerSpec:
      env:
      - name: TOGETHER_API_KEY
        valueFrom:
          secretKeyRef:
            name: together-credentials
            key: api-key
      - name: MODEL_NAME
        value: "meta-llama/Llama-2-7b-chat-hf"

Status Information

The operator reports the state of the distribution in the resource's status field:

status:
  version: "1.0.0"
  ready: true
  distributionConfig:
    activeDistribution: "meta-reference"
    providers:
    - api: "inference"
      provider_id: "meta-reference"
      provider_type: "inference"
    availableDistributions:
      "meta-reference": "llamastack/llamastack:latest"

Constants and Defaults

The API defines the following constants and default values:

Default Container Name: llama-stack
Default Server Port: 8321
Default Service Port Name: http
Default Mount Path: /.llama
Default Storage Size: 10Gi
Default Label Key: app
Default Label Value: llama-stack
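
To make these defaults concrete, the sketch below writes them out explicitly in a spec. The resource name is hypothetical, and the `storage` block is included only to show its default values; this reference does not state whether omitting `storage` entirely still provisions a persistent volume, so treat that part as an assumption to verify.

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: explicit-defaults          # Hypothetical name
spec:
  replicas: 1                      # Default replica count
  server:
    distribution:
      name: "ollama"
    containerSpec:
      name: "llama-stack"          # Default container name
      port: 8321                   # Default server port
    storage:
      size: "10Gi"                 # Default storage size
      mountPath: "/.llama"         # Default mount path
```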

Validation Rules

The API enforces the following validation rules:

Distribution: Only one of name or image can be specified
Port: Must be a valid port number
Resources: Follow Kubernetes resource requirements format
Storage Size: Must be a valid Kubernetes quantity
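
As an illustration of the first rule, a spec like the one below sets both `name` and `image` and would therefore be expected to fail validation. The resource name is hypothetical, and how the rejection is reported (for example, as an error when the resource is applied) is not specified in this reference.

```yaml
# Expected to be rejected: name and image are mutually exclusive
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: invalid-distribution       # Hypothetical name
spec:
  server:
    distribution:
      name: "ollama"
      image: "llamastack/llamastack:latest"
```

Removing either field makes the `distribution` block consistent with the rule.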