# Configuration Reference

Complete reference for configuring the LlamaStack Kubernetes Operator, based on the actual API.
## LlamaStackDistribution Specification

### Basic Structure
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: string
  namespace: string
spec:
  replicas: integer            # Default: 1
  server:
    distribution:
      # Either name OR image (mutually exclusive)
      name: string             # Distribution name from supported distributions
      image: string            # Direct container image reference
    containerSpec:
      name: string             # Default: "llama-stack"
      port: integer            # Default: 8321
      resources:
        requests:
          cpu: string
          memory: string
        limits:
          cpu: string
          memory: string
      env:
        - name: string
          value: string
    podOverrides:              # Optional pod-level customization
      volumes:
        - name: string
          # ... volume spec
      volumeMounts:
        - name: string
          mountPath: string
    storage:                   # Optional persistent storage
      size: string             # Default: "10Gi"
      mountPath: string        # Default: "/.llama"
```
## Core Configuration

### Distribution Configuration

You can specify either a distribution name OR a direct image reference:
```yaml
# Option 1: Use a named distribution
spec:
  server:
    distribution:
      name: "ollama"  # Maps to supported distributions
```

```yaml
# Option 2: Use a direct image
spec:
  server:
    distribution:
      image: "llamastack/llamastack:latest"
```
#### Supported Distribution Names

The operator supports the following pre-configured distributions:
| Distribution Name | Image | Description |
|---|---|---|
| `ollama` | `docker.io/llamastack/distribution-ollama:latest` | Ollama-based distribution for local inference |
| `hf-endpoint` | `docker.io/llamastack/distribution-hf-endpoint:latest` | Hugging Face Endpoint distribution |
| `hf-serverless` | `docker.io/llamastack/distribution-hf-serverless:latest` | Hugging Face Serverless distribution |
| `bedrock` | `docker.io/llamastack/distribution-bedrock:latest` | AWS Bedrock distribution |
| `cerebras` | `docker.io/llamastack/distribution-cerebras:latest` | Cerebras distribution |
| `nvidia` | `docker.io/llamastack/distribution-nvidia:latest` | NVIDIA distribution |
| `open-benchmark` | `docker.io/llamastack/distribution-open-benchmark:latest` | Open benchmark distribution |
| `passthrough` | `docker.io/llamastack/distribution-passthrough:latest` | Passthrough distribution |
| `remote-vllm` | `docker.io/llamastack/distribution-remote-vllm:latest` | Remote vLLM distribution |
| `sambanova` | `docker.io/llamastack/distribution-sambanova:latest` | SambaNova distribution |
| `tgi` | `docker.io/llamastack/distribution-tgi:latest` | Text Generation Inference distribution |
| `together` | `docker.io/llamastack/distribution-together:latest` | Together AI distribution |
| `vllm-gpu` | `docker.io/llamastack/distribution-vllm-gpu:latest` | vLLM GPU distribution |
| `watsonx` | `docker.io/llamastack/distribution-watsonx:latest` | IBM watsonx distribution |
| `fireworks` | `docker.io/llamastack/distribution-fireworks:latest` | Fireworks AI distribution |
Examples:

```yaml
# Ollama distribution
spec:
  server:
    distribution:
      name: "ollama"
```

```yaml
# Hugging Face Endpoint
spec:
  server:
    distribution:
      name: "hf-endpoint"
```

```yaml
# NVIDIA distribution
spec:
  server:
    distribution:
      name: "nvidia"
```

```yaml
# vLLM GPU distribution
spec:
  server:
    distribution:
      name: "vllm-gpu"
```
### Replica Configuration
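The top-level `replicas` field controls how many server pods the operator runs; per the schema above, it defaults to `1`. A minimal sketch:

```yaml
spec:
  replicas: 3   # Default: 1
  server:
    distribution:
      name: "ollama"
```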
### Container Configuration
```yaml
spec:
  server:
    containerSpec:
      name: "llama-stack"   # Default container name
      port: 8321            # Default port
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
      env:
        - name: "INFERENCE_MODEL"
          value: "llama2-7b"
        - name: "LOG_LEVEL"
          value: "INFO"
```
### Storage Configuration

#### Basic Storage
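The `storage` block provisions a persistent volume for the server. Per the schema above, `size` defaults to `"10Gi"` and the volume is mounted at `/.llama`. A minimal sketch that overrides only the size:

```yaml
spec:
  server:
    distribution:
      name: "ollama"
    storage:
      size: "50Gi"   # Default: "10Gi"; mountPath stays at the "/.llama" default
```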
#### Custom Mount Path
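To mount the volume somewhere other than `/.llama`, set `mountPath`. A sketch with an illustrative path:

```yaml
spec:
  server:
    distribution:
      name: "ollama"
    storage:
      size: "20Gi"
      mountPath: "/data/models"   # illustrative path; default is "/.llama"
```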
### Advanced Pod Customization

#### Additional Volumes
```yaml
spec:
  server:
    podOverrides:
      volumes:
        - name: "model-cache"
          emptyDir:
            sizeLimit: "20Gi"
        - name: "config"
          configMap:
            name: "llamastack-config"
      volumeMounts:
        - name: "model-cache"
          mountPath: "/cache"
        - name: "config"
          mountPath: "/config"
          readOnly: true
```
#### ConfigMap Integration
```yaml
spec:
  server:
    podOverrides:
      volumes:
        - name: "llamastack-config"
          configMap:
            name: "my-llamastack-config"
      volumeMounts:
        - name: "llamastack-config"
          mountPath: "/app/config"
```
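The referenced ConfigMap must exist in the same namespace. A minimal sketch of what it might look like; the `run.yaml` key and its contents are illustrative, not prescribed by the operator:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-llamastack-config
data:
  run.yaml: |
    # illustrative configuration file content
    log_level: info
```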
## Configuration Examples

### Minimal Configuration
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: simple-llamastack
spec:
  server:
    distribution:
      name: "ollama"
```
### Development Configuration
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-dev
spec:
  replicas: 1
  server:
    distribution:
      image: "llamastack/llamastack:latest"
    containerSpec:
      port: 8321
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
      env:
        - name: "LOG_LEVEL"
          value: "DEBUG"
    storage:
      size: "20Gi"
```
### Production Configuration
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-prod
spec:
  replicas: 3
  server:
    distribution:
      image: "llamastack/llamastack:v1.0.0"
    containerSpec:
      name: "llama-stack"
      port: 8321
      resources:
        requests:
          cpu: "2"
          memory: "4Gi"
        limits:
          cpu: "4"
          memory: "8Gi"
      env:
        - name: "INFERENCE_MODEL"
          value: "llama2-70b"
        - name: "MAX_WORKERS"
          value: "4"
    storage:
      size: "500Gi"
      mountPath: "/.llama"
    podOverrides:
      volumes:
        - name: "model-cache"
          emptyDir:
            sizeLimit: "100Gi"
      volumeMounts:
        - name: "model-cache"
          mountPath: "/cache"
```
### Custom Image with Configuration
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: custom-llamastack
spec:
  replicas: 2
  server:
    distribution:
      image: "myregistry.com/custom-llamastack:v1.0"
    containerSpec:
      port: 8321
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
      env:
        - name: "CUSTOM_CONFIG"
          value: "/config/custom.yaml"
    storage:
      size: "100Gi"
    podOverrides:
      volumes:
        - name: "custom-config"
          configMap:
            name: "llamastack-custom-config"
      volumeMounts:
        - name: "custom-config"
          mountPath: "/config"
          readOnly: true
```
## Distribution-Specific Examples

### Ollama Distribution
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: ollama-llamastack
spec:
  replicas: 1
  server:
    distribution:
      name: "ollama"
    containerSpec:
      port: 8321
      env:
        - name: OLLAMA_URL
          value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"
    storage:
      size: "20Gi"
```
### Hugging Face Endpoint
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: hf-endpoint-llamastack
spec:
  server:
    distribution:
      name: "hf-endpoint"
    containerSpec:
      env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-credentials
              key: token
        - name: HF_MODEL_ID
          value: "meta-llama/Llama-2-7b-chat-hf"
```
### NVIDIA Distribution
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: nvidia-llamastack
spec:
  server:
    distribution:
      name: "nvidia"
    containerSpec:
      resources:
        requests:
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"
      env:
        - name: NVIDIA_API_KEY
          valueFrom:
            secretKeyRef:
              name: nvidia-credentials
              key: api-key
```
### vLLM GPU Distribution
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: vllm-gpu-llamastack
spec:
  server:
    distribution:
      name: "vllm-gpu"
    containerSpec:
      resources:
        requests:
          nvidia.com/gpu: "1"
          memory: "8Gi"
        limits:
          nvidia.com/gpu: "1"
          memory: "16Gi"
      env:
        - name: MODEL_NAME
          value: "meta-llama/Llama-2-7b-chat-hf"
    storage:
      size: "50Gi"
```
### AWS Bedrock Distribution
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: bedrock-llamastack
spec:
  server:
    distribution:
      name: "bedrock"
    containerSpec:
      env:
        - name: AWS_REGION
          value: "us-east-1"
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-credentials
              key: access-key-id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-credentials
              key: secret-access-key
```
### Together AI Distribution
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: together-llamastack
spec:
  server:
    distribution:
      name: "together"
    containerSpec:
      env:
        - name: TOGETHER_API_KEY
          valueFrom:
            secretKeyRef:
              name: together-credentials
              key: api-key
        - name: MODEL_NAME
          value: "meta-llama/Llama-2-7b-chat-hf"
```
## Status Information

The operator provides status information about the distribution:
```yaml
status:
  version: "1.0.0"
  ready: true
  distributionConfig:
    activeDistribution: "meta-reference"
    providers:
      - api: "inference"
        provider_id: "meta-reference"
        provider_type: "inference"
    availableDistributions:
      "meta-reference": "llamastack/llamastack:latest"
```
## Constants and Defaults

The API defines several constants:

- **Default Container Name:** `llama-stack`
- **Default Server Port:** `8321`
- **Default Service Port Name:** `http`
- **Default Mount Path:** `/.llama`
- **Default Storage Size:** `10Gi`
- **Default Label Key:** `app`
- **Default Label Value:** `llama-stack`
## Validation Rules

The API enforces the following validation rules:

- **Distribution:** Only one of `name` or `image` can be specified
- **Port:** Must be a valid port number
- **Resources:** Must follow the Kubernetes resource requirements format
- **Storage Size:** Must be a valid Kubernetes quantity
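For example, a manifest that sets both `name` and `image` under `distribution` violates the first rule and should be rejected by validation (a sketch of an invalid spec, not something to apply):

```yaml
# INVALID: distribution.name and distribution.image are mutually exclusive
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: invalid-llamastack
spec:
  server:
    distribution:
      name: "ollama"                          # only one of these two fields...
      image: "llamastack/llamastack:latest"   # ...may be set at a time
```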