Bring Your Own (BYO) Distributions¶
The LlamaStack Kubernetes operator supports both pre-built distributions and custom "Bring Your Own" (BYO) distributions. This guide shows you how to build, customize, and deploy your own LlamaStack distributions.
Overview¶
Supported vs BYO Distributions¶
| Type | Description | Use Case | Configuration | 
|---|---|---|---|
| Supported | Pre-built distributions maintained by the LlamaStack team | Quick deployment, standard configurations | Use the distribution.name field | 
| BYO | Custom distributions you build and maintain | Custom providers, specialized configurations | Use the distribution.image field | 
Why Build Custom Distributions?¶
- Custom Providers: Integrate with proprietary or specialized inference engines
- Specific Configurations: Tailor the stack for your exact requirements
- External Dependencies: Include additional libraries or tools
- Security Requirements: Control the entire build process and dependencies
- Performance Optimization: Optimize for your specific hardware or use case
Building LlamaStack Distributions¶
Prerequisites¶
- Install LlamaStack CLI
- Docker or Podman (for container builds)
- Conda (for conda builds)
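The CLI is published on PyPI as llama-stack; the other tools only need to be on your PATH. A quick way to confirm everything is in place:
# Install the LlamaStack CLI (provides the `llama` command)
pip install llama-stack
# Confirm container tooling and conda are available
docker --version   # or: podman --version
conda --version    # only needed for conda builds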
Quick Start: Building from Templates¶
1. List Available Templates¶
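Recent CLI versions can print the bundled templates (check llama stack build --help if the flag differs in your version):
llama stack build --list-templates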
This shows available templates like:
- ollama - Ollama-based inference
- vllm-gpu - vLLM with GPU support
- meta-reference-gpu - Meta's reference implementation
- bedrock - AWS Bedrock integration
- fireworks - Fireworks AI integration
2. Build from Template¶
# Build a container image from Ollama template
llama stack build --template ollama --image-type container
# Build a conda environment from vLLM template
llama stack build --template vllm-gpu --image-type conda
# Build with custom name
llama stack build --template ollama --image-type container --image-name my-custom-ollama
3. Interactive Build¶
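Run the build command with no template or config argument:
llama stack build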
This launches an interactive wizard:
> Enter a name for your Llama Stack (e.g. my-local-stack): my-custom-stack
> Enter the image type you want your Llama Stack to be built as (container or conda or venv): container
Llama Stack is composed of several APIs working together. Let's select
the provider types (implementations) you want to use for these APIs.
> Enter provider for API inference: inline::meta-reference
> Enter provider for API safety: inline::llama-guard
> Enter provider for API agents: inline::meta-reference
> Enter provider for API memory: inline::faiss
> Enter provider for API datasetio: inline::meta-reference
> Enter provider for API scoring: inline::meta-reference
> Enter provider for API eval: inline::meta-reference
> Enter provider for API telemetry: inline::meta-reference
> (Optional) Enter a short description for your Llama Stack: My custom distribution
Advanced: Custom Configuration Files¶
1. Create a Custom Build Configuration¶
Create my-custom-build.yaml:
name: my-custom-stack
distribution_spec:
  description: Custom distribution with external Ollama
  providers:
    inference: remote::ollama
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
    datasetio: inline::meta-reference
    scoring: inline::meta-reference
    eval: inline::meta-reference
image_name: my-custom-stack
image_type: container
# Optional: External providers directory
external_providers_dir: ~/.llama/providers.d
2. Build from Custom Configuration¶
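Point the build command at the file with the --config flag:
llama stack build --config my-custom-build.yaml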
Image Types¶
Container Images¶
Best for production deployments and Kubernetes.
Advantages:
- Consistent across environments
- Easy to deploy in Kubernetes
- Isolated dependencies
- Reproducible builds
Conda Environments¶
Good for development and local testing.
Advantages:
- Fast iteration during development
- Easy dependency management
- Good for experimentation
Virtual Environments¶
Lightweight option for Python-only setups.
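All three image types come from the same build command by changing --image-type; for example, with the ollama template:
# Container image (production / Kubernetes)
llama stack build --template ollama --image-type container
# Conda environment (local development)
llama stack build --template ollama --image-type conda
# Python virtual environment (lightweight, Python-only)
llama stack build --template ollama --image-type venv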
Custom Providers¶
Adding External Providers¶
1. Create Provider Configuration¶
Create ~/.llama/providers.d/custom-ollama.yaml:
adapter:
  adapter_type: custom_ollama
  pip_packages:
    - ollama
    - aiohttp
    - llama-stack-provider-ollama
  config_class: llama_stack_ollama_provider.config.OllamaImplConfig
  module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
2. Reference in Build Configuration¶
name: custom-external-stack
distribution_spec:
  description: Custom distro with external providers
  providers:
    inference: remote::custom_ollama
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_type: container
image_name: custom-external-stack
external_providers_dir: ~/.llama/providers.d
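With the provider spec in ~/.llama/providers.d, save the build file (the name custom-external-build.yaml below is only illustrative) and build as usual:
llama stack build --config custom-external-build.yaml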
Using Custom Distributions with Kubernetes¶
1. Build and Push Container Image¶
# Build the distribution
llama stack build --template ollama --image-type container --image-name my-ollama-dist
# Tag for your registry
docker tag distribution-my-ollama-dist:dev my-registry.com/my-ollama-dist:v1.0.0
# Push to registry
docker push my-registry.com/my-ollama-dist:v1.0.0
2. Deploy with Kubernetes Operator¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: my-custom-distribution
  namespace: default
spec:
  replicas: 1
  server:
    distribution:
      image: "my-registry.com/my-ollama-dist:v1.0.0"  # Custom image
    containerSpec:
      port: 8321
      resources:
        requests:
          memory: "8Gi"
          cpu: "4"
        limits:
          memory: "16Gi"
          cpu: "8"
      env:
        - name: INFERENCE_MODEL
          value: "llama3.2:1b"
        - name: OLLAMA_URL
          value: "http://ollama-server:11434"
    storage:
      size: "20Gi"
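Save the manifest (the filename below is illustrative) and apply it so the operator can reconcile the resource:
kubectl apply -f my-custom-distribution.yaml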
3. Verify Deployment¶
kubectl get llamastackdistribution my-custom-distribution
kubectl get pods -l app=llama-stack
kubectl logs -l app=llama-stack
Examples¶
Example 1: Custom Ollama Distribution¶
Build Configuration (custom-ollama-build.yaml)¶
name: custom-ollama
distribution_spec:
  description: Custom Ollama distribution with additional tools
  providers:
    inference: remote::ollama
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_name: custom-ollama
image_type: container
Build and Deploy¶
# Build the distribution
llama stack build --config custom-ollama-build.yaml
# Tag and push
docker tag distribution-custom-ollama:dev my-registry.com/custom-ollama:latest
docker push my-registry.com/custom-ollama:latest
Kubernetes Deployment¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: custom-ollama-dist
spec:
  replicas: 2
  server:
    distribution:
      image: "my-registry.com/custom-ollama:latest"
    containerSpec:
      resources:
        requests:
          memory: "8Gi"
          cpu: "4"
        limits:
          memory: "16Gi"
          cpu: "8"
      env:
        - name: INFERENCE_MODEL
          value: "llama3.2:3b"
        - name: OLLAMA_URL
          value: "http://ollama-service:11434"
Example 2: Custom vLLM Distribution¶
Build Configuration (custom-vllm-build.yaml)¶
name: custom-vllm
distribution_spec:
  description: Custom vLLM distribution with GPU optimization
  providers:
    inference: inline::vllm
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_name: custom-vllm
image_type: container
Enhanced Dockerfile¶
Create a custom Dockerfile to extend the base distribution:
FROM distribution-custom-vllm:dev
# Install additional dependencies
RUN pip install custom-optimization-library
# Add custom configuration
COPY custom-vllm-config.json /app/config.json
# Set environment variables
ENV VLLM_OPTIMIZATION_LEVEL=high
ENV CUSTOM_GPU_SETTINGS=enabled
# Expose port
EXPOSE 8321
Build and Deploy¶
# Build the LlamaStack distribution
llama stack build --config custom-vllm-build.yaml
# Build enhanced Docker image
docker build -t my-registry.com/enhanced-vllm:latest .
# Push to registry
docker push my-registry.com/enhanced-vllm:latest
Kubernetes Deployment¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: enhanced-vllm-dist
spec:
  replicas: 1
  server:
    distribution:
      image: "my-registry.com/enhanced-vllm:latest"
    containerSpec:
      resources:
        requests:
          nvidia.com/gpu: "2"
          memory: "32Gi"
          cpu: "8"
        limits:
          nvidia.com/gpu: "2"
          memory: "64Gi"
          cpu: "16"
      env:
        - name: INFERENCE_MODEL
          value: "meta-llama/Llama-2-13b-chat-hf"
        - name: VLLM_GPU_MEMORY_UTILIZATION
          value: "0.9"
        - name: VLLM_TENSOR_PARALLEL_SIZE
          value: "2"
Example 3: Multi-Provider Distribution¶
Build Configuration (multi-provider-build.yaml)¶
name: multi-provider
distribution_spec:
  description: Distribution with multiple inference providers
  providers:
    inference: 
      - remote::ollama
      - remote::vllm
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_name: multi-provider
image_type: container
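As in the earlier examples, build this configuration with the --config flag, then tag and push the resulting image to your registry:
llama stack build --config multi-provider-build.yaml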
Testing Custom Distributions¶
Local Testing¶
1. Run Locally with Docker¶
# Set environment variables
export LLAMA_STACK_PORT=8321
export INFERENCE_MODEL="llama3.2:1b"
# Run the custom distribution
docker run -d \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  distribution-custom-ollama:dev \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.docker.internal:11434
2. Test API Endpoints¶
# Health check
curl http://localhost:8321/v1/health
# List providers
curl http://localhost:8321/v1/providers
# Test inference
curl -X POST http://localhost:8321/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "prompt": "Hello, world!",
    "max_tokens": 50
  }'
Kubernetes Testing¶
1. Deploy to Test Namespace¶
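A minimal sketch: create a test namespace and apply the manifest from earlier there (the llama-stack-test name is just an example; update metadata.namespace in the manifest to match, or remove it so -n takes effect):
# Create an isolated namespace for testing
kubectl create namespace llama-stack-test
# Apply the custom distribution manifest into the test namespace
kubectl apply -f my-custom-distribution.yaml -n llama-stack-test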
2. Port Forward for Testing¶
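Assuming the operator exposes the distribution through a Service named my-custom-distribution-service (the same name used in the in-cluster test below), forward it to your machine and hit the health endpoint:
kubectl port-forward -n llama-stack-test svc/my-custom-distribution-service 8321:8321
# in a second terminal:
curl http://localhost:8321/v1/health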
3. Run Tests¶
# Test from within cluster
kubectl run test-pod --image=curlimages/curl --rm -it -- \
  curl http://my-custom-distribution-service:8321/v1/health
Best Practices¶
Security¶
- Use Private Registries: Store custom images in private container registries
- Scan Images: Use container scanning tools to check for vulnerabilities
- Minimal Base Images: Use slim or distroless base images when possible
- Secrets Management: Use Kubernetes secrets for API keys and credentials
Performance¶
- Multi-stage Builds: Use multi-stage Dockerfiles to reduce image size (see the sketch after this list)
- Layer Caching: Optimize Dockerfile layer ordering for better caching
- Resource Limits: Set appropriate CPU and memory limits
- GPU Optimization: Configure GPU settings for inference workloads
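As a rough sketch of the multi-stage point above, heavy dependencies can be built as wheels in a throwaway stage and only the wheels copied into the final image. The builder base image is a placeholder, custom-optimization-library is the same illustrative package used in Example 2, and the sketch assumes the distribution image has pip available:
# Stage 1: build wheels for extra dependencies in a disposable builder image
FROM python:3.11-slim AS builder
RUN pip wheel --wheel-dir /wheels custom-optimization-library   # placeholder package
# Stage 2: extend the distribution image with only the prebuilt wheels
FROM distribution-custom-vllm:dev
COPY --from=builder /wheels /tmp/wheels
RUN pip install --no-index --find-links=/tmp/wheels custom-optimization-library \
    && rm -rf /tmp/wheels
EXPOSE 8321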
Maintenance¶
- Version Tags: Use semantic versioning for your custom images
- Documentation: Document your custom configurations and dependencies
- Testing: Implement automated testing for custom distributions
- Monitoring: Set up monitoring and logging for custom deployments
Development Workflow¶
- Local Development: Use conda/venv builds for rapid iteration
- CI/CD Integration: Automate building and testing of custom distributions
- Staging Environment: Test in staging before production deployment
- Rollback Strategy: Maintain previous versions for quick rollbacks
Troubleshooting¶
Common Issues¶
Build Failures¶
# Check build logs
llama stack build --template ollama --image-type container --verbose
# Verify dependencies
llama stack build --config my-build.yaml --print-deps-only
Runtime Issues¶
# Check container logs
docker logs <container-id>
# Debug with interactive shell
docker run -it --entrypoint /bin/bash distribution-custom:dev
Kubernetes Issues¶
# Check pod status
kubectl describe pod <pod-name>
# View logs
kubectl logs <pod-name> -f
# Check events
kubectl get events --sort-by=.metadata.creationTimestamp
Getting Help¶
- LlamaStack Documentation: Official docs
- GitHub Issues: Report bugs and ask questions
- Community Forums: Join the LlamaStack community discussions
- Operator Documentation: Check the Kubernetes operator guides
Next Steps¶
- vLLM Distribution - Learn about vLLM-specific configurations
- Ollama Distribution - Explore Ollama distribution options
- Configuration Reference - Complete API reference
- Scaling Guide - Scale your custom distributions