Configuration Reference¶
Complete reference for configuring the LlamaStack Kubernetes Operator, based on the operator's actual API.
LlamaStackDistribution Specification¶
Basic Structure¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: string
  namespace: string
spec:
  replicas: integer  # Default: 1
  server:
    distribution:
      # Either name OR image (mutually exclusive)
      name: string     # Distribution name from supported distributions
      image: string    # Direct container image reference
    containerSpec:
      name: string     # Default: "llama-stack"
      port: integer    # Default: 8321
      resources:
        requests:
          cpu: string
          memory: string
        limits:
          cpu: string
          memory: string
      env:
      - name: string
        value: string
    podOverrides:      # Optional pod-level customization
      volumes:
      - name: string
        # ... volume spec
      volumeMounts:
      - name: string
        mountPath: string
    storage:           # Optional persistent storage
      size: string     # Default: "10Gi"
      mountPath: string # Default: "/.llama"
Core Configuration¶
Distribution Configuration¶
You can specify either a distribution name OR a direct image reference:
# Option 1: Use a named distribution
spec:
  server:
    distribution:
      name: "ollama"  # Maps to supported distributions
# Option 2: Use a direct image
spec:
  server:
    distribution:
      image: "llamastack/llamastack:latest"
Supported Distribution Names¶
The operator supports the following pre-configured distributions:
| Distribution Name | Image | Description | 
|---|---|---|
| ollama | docker.io/llamastack/distribution-ollama:latest | Ollama-based distribution for local inference | 
| hf-endpoint | docker.io/llamastack/distribution-hf-endpoint:latest | Hugging Face Endpoint distribution | 
| hf-serverless | docker.io/llamastack/distribution-hf-serverless:latest | Hugging Face Serverless distribution | 
| bedrock | docker.io/llamastack/distribution-bedrock:latest | AWS Bedrock distribution | 
| cerebras | docker.io/llamastack/distribution-cerebras:latest | Cerebras distribution | 
| nvidia | docker.io/llamastack/distribution-nvidia:latest | NVIDIA distribution | 
| open-benchmark | docker.io/llamastack/distribution-open-benchmark:latest | Open benchmark distribution | 
| passthrough | docker.io/llamastack/distribution-passthrough:latest | Passthrough distribution | 
| remote-vllm | docker.io/llamastack/distribution-remote-vllm:latest | Remote vLLM distribution | 
| sambanova | docker.io/llamastack/distribution-sambanova:latest | SambaNova distribution | 
| tgi | docker.io/llamastack/distribution-tgi:latest | Text Generation Inference distribution | 
| together | docker.io/llamastack/distribution-together:latest | Together AI distribution | 
| vllm-gpu | docker.io/llamastack/distribution-vllm-gpu:latest | vLLM GPU distribution | 
| watsonx | docker.io/llamastack/distribution-watsonx:latest | IBM watsonx distribution | 
| fireworks | docker.io/llamastack/distribution-fireworks:latest | Fireworks AI distribution | 
Examples:
# Ollama distribution
spec:
  server:
    distribution:
      name: "ollama"
# Hugging Face Endpoint
spec:
  server:
    distribution:
      name: "hf-endpoint"
# NVIDIA distribution
spec:
  server:
    distribution:
      name: "nvidia"
# vLLM GPU distribution
spec:
  server:
    distribution:
      name: "vllm-gpu"
Replica Configuration¶
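The `replicas` field sets how many server pods the operator runs; it defaults to `1` (see the basic structure above). A minimal sketch scaling a distribution to three replicas:
spec:
  replicas: 3  # Default: 1
  server:
    distribution:
      name: "ollama"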
Container Configuration¶
spec:
  server:
    containerSpec:
      name: "llama-stack"  # Default container name
      port: 8321           # Default port
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
      env:
      - name: "INFERENCE_MODEL"
        value: "llama2-7b"
      - name: "LOG_LEVEL"
        value: "INFO"
Storage Configuration¶
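Persistent storage is optional. When the `storage` block is present, the operator mounts a volume of the requested size (default `10Gi`) at the configured path (default `/.llama`).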
Basic Storage¶
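A minimal sketch that requests persistent storage and keeps the default mount path:
spec:
  server:
    distribution:
      name: "ollama"
    storage:
      size: "50Gi"  # Default: "10Gi"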
Custom Mount Path¶
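To mount the volume somewhere other than `/.llama`, set `mountPath` explicitly (the path below is illustrative):
spec:
  server:
    distribution:
      name: "ollama"
    storage:
      size: "50Gi"
      mountPath: "/data/models"  # Default: "/.llama"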
Advanced Pod Customization¶
Additional Volumes¶
spec:
  server:
    podOverrides:
      volumes:
      - name: "model-cache"
        emptyDir:
          sizeLimit: "20Gi"
      - name: "config"
        configMap:
          name: "llamastack-config"
      volumeMounts:
      - name: "model-cache"
        mountPath: "/cache"
      - name: "config"
        mountPath: "/config"
        readOnly: true
ConfigMap Integration¶
spec:
  server:
    podOverrides:
      volumes:
      - name: "llamastack-config"
        configMap:
          name: "my-llamastack-config"
      volumeMounts:
      - name: "llamastack-config"
        mountPath: "/app/config"
Configuration Examples¶
Minimal Configuration¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: simple-llamastack
spec:
  server:
    distribution:
      name: "ollama"
Development Configuration¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-dev
spec:
  replicas: 1
  server:
    distribution:
      image: "llamastack/llamastack:latest"
    containerSpec:
      port: 8321
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      env:
      - name: "LOG_LEVEL"
        value: "DEBUG"
    storage:
      size: "20Gi"
Production Configuration¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-prod
spec:
  replicas: 3
  server:
    distribution:
      image: "llamastack/llamastack:v1.0.0"
    containerSpec:
      name: "llama-stack"
      port: 8321
      resources:
        requests:
          cpu: "2"
          memory: "4Gi"
        limits:
          cpu: "4"
          memory: "8Gi"
      env:
      - name: "INFERENCE_MODEL"
        value: "llama2-70b"
      - name: "MAX_WORKERS"
        value: "4"
    storage:
      size: "500Gi"
      mountPath: "/.llama"
    podOverrides:
      volumes:
      - name: "model-cache"
        emptyDir:
          sizeLimit: "100Gi"
      volumeMounts:
      - name: "model-cache"
        mountPath: "/cache"
Custom Image with Configuration¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: custom-llamastack
spec:
  replicas: 2
  server:
    distribution:
      image: "myregistry.com/custom-llamastack:v1.0"
    containerSpec:
      port: 8321
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
      env:
      - name: "CUSTOM_CONFIG"
        value: "/config/custom.yaml"
    storage:
      size: "100Gi"
    podOverrides:
      volumes:
      - name: "custom-config"
        configMap:
          name: "llamastack-custom-config"
      volumeMounts:
      - name: "custom-config"
        mountPath: "/config"
        readOnly: true
Distribution-Specific Examples¶
Ollama Distribution¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: ollama-llamastack
spec:
  replicas: 1
  server:
    distribution:
      name: "ollama"
    containerSpec:
      port: 8321
      env:
      - name: OLLAMA_URL
        value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"
    storage:
      size: "20Gi"
Hugging Face Endpoint¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: hf-endpoint-llamastack
spec:
  server:
    distribution:
      name: "hf-endpoint"
    containerSpec:
      env:
      - name: HF_TOKEN
        valueFrom:
          secretKeyRef:
            name: hf-credentials
            key: token
      - name: HF_MODEL_ID
        value: "meta-llama/Llama-2-7b-chat-hf"
NVIDIA Distribution¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: nvidia-llamastack
spec:
  server:
    distribution:
      name: "nvidia"
    containerSpec:
      resources:
        requests:
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"
      env:
      - name: NVIDIA_API_KEY
        valueFrom:
          secretKeyRef:
            name: nvidia-credentials
            key: api-key
vLLM GPU Distribution¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: vllm-gpu-llamastack
spec:
  server:
    distribution:
      name: "vllm-gpu"
    containerSpec:
      resources:
        requests:
          nvidia.com/gpu: "1"
          memory: "8Gi"
        limits:
          nvidia.com/gpu: "1"
          memory: "16Gi"
      env:
      - name: MODEL_NAME
        value: "meta-llama/Llama-2-7b-chat-hf"
    storage:
      size: "50Gi"
AWS Bedrock Distribution¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: bedrock-llamastack
spec:
  server:
    distribution:
      name: "bedrock"
    containerSpec:
      env:
      - name: AWS_REGION
        value: "us-east-1"
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: access-key-id
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: secret-access-key
Together AI Distribution¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: together-llamastack
spec:
  server:
    distribution:
      name: "together"
    containerSpec:
      env:
      - name: TOGETHER_API_KEY
        valueFrom:
          secretKeyRef:
            name: together-credentials
            key: api-key
      - name: MODEL_NAME
        value: "meta-llama/Llama-2-7b-chat-hf"
Status Information¶
The operator provides status information about the distribution:
status:
  version: "1.0.0"
  ready: true
  distributionConfig:
    activeDistribution: "meta-reference"
    providers:
    - api: "inference"
      provider_id: "meta-reference"
      provider_type: "inference"
    availableDistributions:
      "meta-reference": "llamastack/llamastack:latest"
Constants and Defaults¶
The API defines several constants (illustrated in the sketch after this list):
- Default Container Name: `llama-stack`
- Default Server Port: `8321`
- Default Service Port Name: `http`
- Default Mount Path: `/.llama`
- Default Storage Size: `10Gi`
- Default Label Key: `app`
- Default Label Value: `llama-stack`
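Where fields are omitted, these constants fill in. A sketch with the defaults written out (every commented value can be left off):
spec:
  replicas: 1               # Default: 1
  server:
    distribution:
      name: "ollama"
    containerSpec:
      name: "llama-stack"   # Default container name
      port: 8321            # Default server port
    storage:                # Optional; once present, size and mountPath default as below
      size: "10Gi"          # Default storage size
      mountPath: "/.llama"  # Default mount path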
Validation Rules¶
The API enforces the following validation rules:
- Distribution: Only one of `name` or `image` may be specified (see the invalid sketch below)
- Port: Must be a valid port number
- Resources: Must follow the Kubernetes resource requirements format
- Storage Size: Must be a valid Kubernetes quantity
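For example, a spec that sets both fields fails validation (a hypothetical manifest, shown only to illustrate the rule):
# INVALID: name and image are mutually exclusive
spec:
  server:
    distribution:
      name: "ollama"
      image: "llamastack/llamastack:latest"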