Bring Your Own (BYO) Distributions¶
The LlamaStack Kubernetes operator supports both pre-built distributions and custom "Bring Your Own" (BYO) distributions. This guide shows you how to build, customize, and deploy your own LlamaStack distributions.
Overview¶
Supported vs BYO Distributions¶
| Type | Description | Use Case | Configuration |
|---|---|---|---|
| Supported | Pre-built distributions maintained by the LlamaStack team | Quick deployment, standard configurations | Use the `distribution.name` field |
| BYO | Custom distributions you build and maintain | Custom providers, specialized configurations | Use the `distribution.image` field |
Why Build Custom Distributions?¶
- Custom Providers: Integrate with proprietary or specialized inference engines
- Specific Configurations: Tailor the stack for your exact requirements
- External Dependencies: Include additional libraries or tools
- Security Requirements: Control the entire build process and dependencies
- Performance Optimization: Optimize for your specific hardware or use case
Building LlamaStack Distributions¶
Prerequisites¶
- LlamaStack CLI: `pip install llama-stack`
- Docker or Podman, for container builds
- Conda, for conda builds
Quick Start: Building from Templates¶
1. List Available Templates¶
Run `llama stack build --list-templates` to see the available templates, for example:
- `ollama`: Ollama-based inference
- `vllm-gpu`: vLLM with GPU support
- `meta-reference-gpu`: Meta's reference implementation
- `bedrock`: AWS Bedrock integration
- `fireworks`: Fireworks AI integration
2. Build from Template¶
```shell
# Build a container image from the Ollama template
llama stack build --template ollama --image-type container

# Build a conda environment from the vLLM template
llama stack build --template vllm-gpu --image-type conda

# Build with a custom image name
llama stack build --template ollama --image-type container --image-name my-custom-ollama
```
3. Interactive Build¶
Running `llama stack build` with no arguments launches an interactive wizard:
```
> Enter a name for your Llama Stack (e.g. my-local-stack): my-custom-stack
> Enter the image type you want your Llama Stack to be built as (container or conda or venv): container

Llama Stack is composed of several APIs working together. Let's select
the provider types (implementations) you want to use for these APIs.

> Enter provider for API inference: inline::meta-reference
> Enter provider for API safety: inline::llama-guard
> Enter provider for API agents: inline::meta-reference
> Enter provider for API memory: inline::faiss
> Enter provider for API datasetio: inline::meta-reference
> Enter provider for API scoring: inline::meta-reference
> Enter provider for API eval: inline::meta-reference
> Enter provider for API telemetry: inline::meta-reference

> (Optional) Enter a short description for your Llama Stack: My custom distribution
```
Advanced: Custom Configuration Files¶
1. Create a Custom Build Configuration¶
Create `my-custom-build.yaml`:

```yaml
name: my-custom-stack
distribution_spec:
  description: Custom distribution with external Ollama
  providers:
    inference: remote::ollama
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
    datasetio: inline::meta-reference
    scoring: inline::meta-reference
    eval: inline::meta-reference
image_name: my-custom-stack
image_type: container
# Optional: directory for external provider specs
external_providers_dir: ~/.llama/providers.d
```
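Before running a build, it can be useful to sanity-check a configuration like the one above. The following is a minimal sketch, not part of the LlamaStack CLI: the dict mirrors `my-custom-build.yaml` (in practice you would load it with `yaml.safe_load`), and the required-API set is taken from the example configs in this guide, not from an exhaustive schema.

```python
# Sanity-check a build configuration dict before running `llama stack build`.
# REQUIRED_APIS follows the examples in this guide, not an official schema.
REQUIRED_APIS = {"inference", "memory", "safety", "agents", "telemetry"}

config = {
    "name": "my-custom-stack",
    "distribution_spec": {
        "description": "Custom distribution with external Ollama",
        "providers": {
            "inference": "remote::ollama",
            "memory": "inline::faiss",
            "safety": "inline::llama-guard",
            "agents": "inline::meta-reference",
            "telemetry": "inline::meta-reference",
        },
    },
    "image_name": "my-custom-stack",
    "image_type": "container",
}

def validate(cfg):
    """Return a list of problems found in a build configuration dict."""
    problems = []
    providers = cfg.get("distribution_spec", {}).get("providers", {})
    missing = REQUIRED_APIS - providers.keys()
    if missing:
        problems.append(f"missing providers for APIs: {sorted(missing)}")
    if cfg.get("image_type") not in {"container", "conda", "venv"}:
        problems.append(f"unknown image_type: {cfg.get('image_type')!r}")
    # Provider IDs in this guide are either inline::<name> or remote::<name>
    for api, spec in providers.items():
        specs = spec if isinstance(spec, list) else [spec]
        for s in specs:
            if not (s.startswith("inline::") or s.startswith("remote::")):
                problems.append(f"{api}: malformed provider id {s!r}")
    return problems

print(validate(config))  # [] for a well-formed config
```

A check like this catches typos (for example `inline:faiss` with a single colon) before a long container build fails.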
2. Build from Custom Configuration¶
Build from your configuration file with `llama stack build --config my-custom-build.yaml`.
Image Types¶
Container Images¶
Best for production deployments and Kubernetes; build with `--image-type container`.
Advantages:
- Consistent across environments
- Easy to deploy in Kubernetes
- Isolated dependencies
- Reproducible builds
Conda Environments¶
Good for development and local testing; build with `--image-type conda`.
Advantages:
- Fast iteration during development
- Easy dependency management
- Good for experimentation
Virtual Environments¶
A lightweight option for Python-only setups; build with `--image-type venv`.
Custom Providers¶
Adding External Providers¶
1. Create Provider Configuration¶
Create `~/.llama/providers.d/custom-ollama.yaml`:

```yaml
adapter:
  adapter_type: custom_ollama
  pip_packages:
    - ollama
    - aiohttp
    - llama-stack-provider-ollama
  config_class: llama_stack_ollama_provider.config.OllamaImplConfig
  module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
```
2. Reference in Build Configuration¶
```yaml
name: custom-external-stack
distribution_spec:
  description: Custom distro with external providers
  providers:
    inference: remote::custom_ollama
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_type: container
image_name: custom-external-stack
external_providers_dir: ~/.llama/providers.d
```
Using Custom Distributions with Kubernetes¶
1. Build and Push Container Image¶
```shell
# Build the distribution
llama stack build --template ollama --image-type container --image-name my-ollama-dist

# Tag for your registry
docker tag distribution-my-ollama-dist:dev my-registry.com/my-ollama-dist:v1.0.0

# Push to registry
docker push my-registry.com/my-ollama-dist:v1.0.0
```
2. Deploy with Kubernetes Operator¶
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: my-custom-distribution
  namespace: default
spec:
  replicas: 1
  server:
    distribution:
      image: "my-registry.com/my-ollama-dist:v1.0.0"  # Custom image
    containerSpec:
      port: 8321
      resources:
        requests:
          memory: "8Gi"
          cpu: "4"
        limits:
          memory: "16Gi"
          cpu: "8"
      env:
        - name: INFERENCE_MODEL
          value: "llama3.2:1b"
        - name: OLLAMA_URL
          value: "http://ollama-server:11434"
    storage:
      size: "20Gi"
```
3. Verify Deployment¶
```shell
kubectl get llamastackdistribution my-custom-distribution
kubectl get pods -l app=llama-stack
kubectl logs -l app=llama-stack
```
Examples¶
Example 1: Custom Ollama Distribution¶
Build Configuration (custom-ollama-build.yaml)¶

```yaml
name: custom-ollama
distribution_spec:
  description: Custom Ollama distribution with additional tools
  providers:
    inference: remote::ollama
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_name: custom-ollama
image_type: container
```
Build and Deploy¶
```shell
# Build the distribution
llama stack build --config custom-ollama-build.yaml

# Tag and push
docker tag distribution-custom-ollama:dev my-registry.com/custom-ollama:latest
docker push my-registry.com/custom-ollama:latest
```
Kubernetes Deployment¶
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: custom-ollama-dist
spec:
  replicas: 2
  server:
    distribution:
      image: "my-registry.com/custom-ollama:latest"
    containerSpec:
      resources:
        requests:
          memory: "8Gi"
          cpu: "4"
        limits:
          memory: "16Gi"
          cpu: "8"
      env:
        - name: INFERENCE_MODEL
          value: "llama3.2:3b"
        - name: OLLAMA_URL
          value: "http://ollama-service:11434"
```
Example 2: Custom vLLM Distribution¶
Build Configuration (custom-vllm-build.yaml)¶

```yaml
name: custom-vllm
distribution_spec:
  description: Custom vLLM distribution with GPU optimization
  providers:
    inference: inline::vllm
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_name: custom-vllm
image_type: container
```
Enhanced Dockerfile¶
Create a custom Dockerfile to extend the base distribution:
```dockerfile
FROM distribution-custom-vllm:dev

# Install additional dependencies
RUN pip install custom-optimization-library

# Add custom configuration
COPY custom-vllm-config.json /app/config.json

# Set environment variables
ENV VLLM_OPTIMIZATION_LEVEL=high
ENV CUSTOM_GPU_SETTINGS=enabled

# Expose port
EXPOSE 8321
```
Build and Deploy¶
```shell
# Build the LlamaStack distribution
llama stack build --config custom-vllm-build.yaml

# Build enhanced Docker image
docker build -t my-registry.com/enhanced-vllm:latest .

# Push to registry
docker push my-registry.com/enhanced-vllm:latest
```
Kubernetes Deployment¶
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: enhanced-vllm-dist
spec:
  replicas: 1
  server:
    distribution:
      image: "my-registry.com/enhanced-vllm:latest"
    containerSpec:
      resources:
        requests:
          nvidia.com/gpu: "2"
          memory: "32Gi"
          cpu: "8"
        limits:
          nvidia.com/gpu: "2"
          memory: "64Gi"
          cpu: "16"
      env:
        - name: INFERENCE_MODEL
          value: "meta-llama/Llama-2-13b-chat-hf"
        - name: VLLM_GPU_MEMORY_UTILIZATION
          value: "0.9"
        - name: VLLM_TENSOR_PARALLEL_SIZE
          value: "2"
```
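A quick sanity check helps when sizing GPU requests like those above. The sketch below is a back-of-envelope estimate only: the 24 GiB GPU size is an assumed example (adjust to your hardware), and it accounts only for fp16 weights, not KV cache or activations.

```python
# Back-of-envelope: do the weights of a 13B fp16 model fit on 2 GPUs
# with tensor parallelism, given the settings in the deployment above?
params = 13e9                # Llama-2-13b parameter count (approximate)
bytes_per_param = 2          # fp16 weights
tensor_parallel = 2          # VLLM_TENSOR_PARALLEL_SIZE
gpu_mem_gib = 24             # assumed example GPU; adjust to your hardware
utilization = 0.9            # VLLM_GPU_MEMORY_UTILIZATION

weights_per_gpu_gib = params * bytes_per_param / tensor_parallel / 2**30
budget_gib = gpu_mem_gib * utilization
print(f"weights per GPU: {weights_per_gpu_gib:.1f} GiB, budget: {budget_gib:.1f} GiB")
# The difference is what remains for KV cache and activations.
```

If the weights alone exceed the budget, raise the tensor parallel size or move to larger GPUs before tuning `VLLM_GPU_MEMORY_UTILIZATION`.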
Example 3: Multi-Provider Distribution¶
Build Configuration (multi-provider-build.yaml)¶

```yaml
name: multi-provider
distribution_spec:
  description: Distribution with multiple inference providers
  providers:
    inference:
      - remote::ollama
      - remote::vllm
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_name: multi-provider
image_type: container
```
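A deployment for this image needs endpoints for both remote providers. The fragment below is a sketch: `OLLAMA_URL` follows the earlier examples, while `VLLM_URL`, the service names, and the image tag are hypothetical; check each provider's config class for the exact variable names it expects.

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: multi-provider-dist
spec:
  replicas: 1
  server:
    distribution:
      image: "my-registry.com/multi-provider:v1.0.0"
    containerSpec:
      port: 8321
      env:
        - name: OLLAMA_URL
          value: "http://ollama-service:11434"
        - name: VLLM_URL   # hypothetical name; depends on the provider's config
          value: "http://vllm-service:8000"
```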
Testing Custom Distributions¶
Local Testing¶
1. Run Locally with Docker¶
```shell
# Set environment variables
export LLAMA_STACK_PORT=8321
export INFERENCE_MODEL="llama3.2:1b"

# Run the custom distribution
docker run -d \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  distribution-custom-ollama:dev \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.docker.internal:11434
```
2. Test API Endpoints¶
```shell
# Health check
curl http://localhost:8321/v1/health

# List providers
curl http://localhost:8321/v1/providers

# Test inference
curl -X POST http://localhost:8321/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "prompt": "Hello, world!",
    "max_tokens": 50
  }'
```
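The same completion request can be scripted from Python's standard library. This is a minimal sketch: the endpoint path and payload are taken from the curl example above, and the commented-out call assumes a server is running on localhost.

```python
import json
from urllib import request

def complete(base_url: str, payload: dict) -> dict:
    """POST a completion request to a running LlamaStack server."""
    req = request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = {"model": "llama3.2:1b", "prompt": "Hello, world!", "max_tokens": 50}
# complete("http://localhost:8321", payload)  # requires a running server
print(json.dumps(payload))
```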
Kubernetes Testing¶
1. Deploy to Test Namespace¶
Apply your manifest to an isolated namespace, for example `kubectl create namespace llama-test` followed by `kubectl apply -n llama-test -f <your-manifest>.yaml`.
2. Port Forward for Testing¶
Forward the service port locally, for example `kubectl port-forward -n llama-test svc/<your-service> 8321:8321`, then run the curl checks above against `http://localhost:8321`.
3. Run Tests¶
```shell
# Test from within cluster
kubectl run test-pod --image=curlimages/curl --rm -it -- \
  curl http://my-custom-distribution-service:8321/v1/health
```
Best Practices¶
Security¶
- Use Private Registries: Store custom images in private container registries
- Scan Images: Use container scanning tools to check for vulnerabilities
- Minimal Base Images: Use slim or distroless base images when possible
- Secrets Management: Use Kubernetes secrets for API keys and credentials
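For the secrets point, standard Kubernetes env-var references keep credentials out of manifests. The fragment below is a sketch assuming `containerSpec.env` follows the standard Kubernetes `EnvVar` schema; the variable, Secret name, and key are hypothetical placeholders.

```yaml
spec:
  server:
    containerSpec:
      env:
        - name: INFERENCE_API_KEY        # hypothetical variable name
          valueFrom:
            secretKeyRef:
              name: inference-credentials  # kubectl create secret generic inference-credentials ...
              key: api-key
```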
Performance¶
- Multi-stage Builds: Use multi-stage Dockerfiles to reduce image size
- Layer Caching: Optimize Dockerfile layer ordering for better caching
- Resource Limits: Set appropriate CPU and memory limits
- GPU Optimization: Configure GPU settings for inference workloads
Maintenance¶
- Version Tags: Use semantic versioning for your custom images
- Documentation: Document your custom configurations and dependencies
- Testing: Implement automated testing for custom distributions
- Monitoring: Set up monitoring and logging for custom deployments
Development Workflow¶
- Local Development: Use conda/venv builds for rapid iteration
- CI/CD Integration: Automate building and testing of custom distributions
- Staging Environment: Test in staging before production deployment
- Rollback Strategy: Maintain previous versions for quick rollbacks
Troubleshooting¶
Common Issues¶
Build Failures¶
```shell
# Check build logs
llama stack build --template ollama --image-type container --verbose

# Verify dependencies
llama stack build --config my-build.yaml --print-deps-only
```
Runtime Issues¶
```shell
# Check container logs
docker logs <container-id>

# Debug with interactive shell
docker run -it --entrypoint /bin/bash distribution-custom:dev
```
Kubernetes Issues¶
```shell
# Check pod status
kubectl describe pod <pod-name>

# View logs
kubectl logs <pod-name> -f

# Check events
kubectl get events --sort-by=.metadata.creationTimestamp
```
Getting Help¶
- LlamaStack Documentation: Official docs
- GitHub Issues: Report bugs and ask questions
- Community Forums: Join the LlamaStack community discussions
- Operator Documentation: Check the Kubernetes operator guides
Next Steps¶
- vLLM Distribution - Learn about vLLM-specific configurations
- Ollama Distribution - Explore Ollama distribution options
- Configuration Reference - Complete API reference
- Scaling Guide - Scale your custom distributions