Bring Your Own (BYO) Distributions¶
The LlamaStack Kubernetes operator supports both pre-built distributions and custom "Bring Your Own" (BYO) distributions. This guide shows you how to build, customize, and deploy your own LlamaStack distributions.
Overview¶
Supported vs BYO Distributions¶
| Type | Description | Use Case | Configuration |
|---|---|---|---|
| Supported | Pre-built distributions maintained by the LlamaStack team | Quick deployment, standard configurations | Use the `distribution.name` field |
| BYO | Custom distributions you build and maintain | Custom providers, specialized configurations | Use the `distribution.image` field |
Why Build Custom Distributions?¶
- Custom Providers: Integrate with proprietary or specialized inference engines
- Specific Configurations: Tailor the stack for your exact requirements
- External Dependencies: Include additional libraries or tools
- Security Requirements: Control the entire build process and dependencies
- Performance Optimization: Optimize for your specific hardware or use case
Building LlamaStack Distributions¶
Prerequisites¶
- LlamaStack CLI: `pip install llama-stack`
- Docker or Podman, for container builds
- Conda, for conda builds
Quick Start: Building from Templates¶
1. List Available Templates¶
Run `llama stack build --list-templates` to see the available templates, for example:
- `ollama`: Ollama-based inference
- `vllm-gpu`: vLLM with GPU support
- `meta-reference-gpu`: Meta's reference implementation
- `bedrock`: AWS Bedrock integration
- `fireworks`: Fireworks AI integration
2. Build from Template¶
```shell
# Build a container image from the Ollama template
llama stack build --template ollama --image-type container

# Build a conda environment from the vLLM template
llama stack build --template vllm-gpu --image-type conda

# Build with a custom image name
llama stack build --template ollama --image-type container --image-name my-custom-ollama
```
3. Interactive Build¶
Running `llama stack build` with no arguments launches an interactive wizard:
```
> Enter a name for your Llama Stack (e.g. my-local-stack): my-custom-stack
> Enter the image type you want your Llama Stack to be built as (container or conda or venv): container

Llama Stack is composed of several APIs working together. Let's select
the provider types (implementations) you want to use for these APIs.

> Enter provider for API inference: inline::meta-reference
> Enter provider for API safety: inline::llama-guard
> Enter provider for API agents: inline::meta-reference
> Enter provider for API memory: inline::faiss
> Enter provider for API datasetio: inline::meta-reference
> Enter provider for API scoring: inline::meta-reference
> Enter provider for API eval: inline::meta-reference
> Enter provider for API telemetry: inline::meta-reference

> (Optional) Enter a short description for your Llama Stack: My custom distribution
```
Advanced: Custom Configuration Files¶
1. Create a Custom Build Configuration¶
Create `my-custom-build.yaml`:

```yaml
name: my-custom-stack
distribution_spec:
  description: Custom distribution with external Ollama
  providers:
    inference: remote::ollama
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
    datasetio: inline::meta-reference
    scoring: inline::meta-reference
    eval: inline::meta-reference
image_name: my-custom-stack
image_type: container
# Optional: directory for external provider specs
external_providers_dir: ~/.llama/providers.d
```
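Before running a build, it can be useful to sanity-check a configuration like the one above. The following is a minimal sketch, not part of the LlamaStack CLI: the dict mirrors `my-custom-build.yaml` (in practice you would load it with `yaml.safe_load`), and the required-API set is taken from the example configs in this guide, not from an exhaustive schema.

```python
# Sanity-check a build configuration dict before running `llama stack build`.
# REQUIRED_APIS follows the examples in this guide, not an official schema.
REQUIRED_APIS = {"inference", "memory", "safety", "agents", "telemetry"}

config = {
    "name": "my-custom-stack",
    "distribution_spec": {
        "description": "Custom distribution with external Ollama",
        "providers": {
            "inference": "remote::ollama",
            "memory": "inline::faiss",
            "safety": "inline::llama-guard",
            "agents": "inline::meta-reference",
            "telemetry": "inline::meta-reference",
        },
    },
    "image_name": "my-custom-stack",
    "image_type": "container",
}

def validate(cfg):
    """Return a list of problems found in a build configuration dict."""
    problems = []
    providers = cfg.get("distribution_spec", {}).get("providers", {})
    missing = REQUIRED_APIS - providers.keys()
    if missing:
        problems.append(f"missing providers for APIs: {sorted(missing)}")
    if cfg.get("image_type") not in {"container", "conda", "venv"}:
        problems.append(f"unknown image_type: {cfg.get('image_type')!r}")
    # Provider IDs in this guide are either inline::<name> or remote::<name>
    for api, spec in providers.items():
        specs = spec if isinstance(spec, list) else [spec]
        for s in specs:
            if not (s.startswith("inline::") or s.startswith("remote::")):
                problems.append(f"{api}: malformed provider id {s!r}")
    return problems

print(validate(config))  # [] for a well-formed config
```

A check like this catches typos (for example `inline:faiss` with a single colon) before a long container build fails.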
2. Build from Custom Configuration¶
Build from your configuration file with `llama stack build --config my-custom-build.yaml`.
Image Types¶
Container Images¶
Best for production deployments and Kubernetes; build with `--image-type container`.
Advantages:
- Consistent across environments
- Easy to deploy in Kubernetes
- Isolated dependencies
- Reproducible builds
Conda Environments¶
Good for development and local testing; build with `--image-type conda`.
Advantages:
- Fast iteration during development
- Easy dependency management
- Good for experimentation
Virtual Environments¶
A lightweight option for Python-only setups; build with `--image-type venv`.
Custom Providers¶
Adding External Providers¶
1. Create Provider Configuration¶
Create `~/.llama/providers.d/custom-ollama.yaml`:

```yaml
adapter:
  adapter_type: custom_ollama
  pip_packages:
    - ollama
    - aiohttp
    - llama-stack-provider-ollama
  config_class: llama_stack_ollama_provider.config.OllamaImplConfig
  module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
```
2. Reference in Build Configuration¶
```yaml
name: custom-external-stack
distribution_spec:
  description: Custom distro with external providers
  providers:
    inference: remote::custom_ollama
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_type: container
image_name: custom-external-stack
external_providers_dir: ~/.llama/providers.d
```
Using Custom Distributions with Kubernetes¶
1. Build and Push Container Image¶
```shell
# Build the distribution
llama stack build --template ollama --image-type container --image-name my-ollama-dist

# Tag for your registry
docker tag distribution-my-ollama-dist:dev my-registry.com/my-ollama-dist:v1.0.0

# Push to registry
docker push my-registry.com/my-ollama-dist:v1.0.0
```
2. Deploy with Kubernetes Operator¶
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: my-custom-distribution
  namespace: default
spec:
  replicas: 1
  server:
    distribution:
      image: "my-registry.com/my-ollama-dist:v1.0.0"  # Custom image
    containerSpec:
      port: 8321
      resources:
        requests:
          memory: "8Gi"
          cpu: "4"
        limits:
          memory: "16Gi"
          cpu: "8"
      env:
        - name: INFERENCE_MODEL
          value: "llama3.2:1b"
        - name: OLLAMA_URL
          value: "http://ollama-server:11434"
    storage:
      size: "20Gi"
```
3. Verify Deployment¶
```shell
kubectl get llamastackdistribution my-custom-distribution
kubectl get pods -l app=llama-stack
kubectl logs -l app=llama-stack
```
Examples¶
Example 1: Custom Ollama Distribution¶
Build Configuration (custom-ollama-build.yaml)¶

```yaml
name: custom-ollama
distribution_spec:
  description: Custom Ollama distribution with additional tools
  providers:
    inference: remote::ollama
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_name: custom-ollama
image_type: container
```
Build and Deploy¶
```shell
# Build the distribution
llama stack build --config custom-ollama-build.yaml

# Tag and push
docker tag distribution-custom-ollama:dev my-registry.com/custom-ollama:latest
docker push my-registry.com/custom-ollama:latest
```
Kubernetes Deployment¶
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: custom-ollama-dist
spec:
  replicas: 2
  server:
    distribution:
      image: "my-registry.com/custom-ollama:latest"
    containerSpec:
      resources:
        requests:
          memory: "8Gi"
          cpu: "4"
        limits:
          memory: "16Gi"
          cpu: "8"
      env:
        - name: INFERENCE_MODEL
          value: "llama3.2:3b"
        - name: OLLAMA_URL
          value: "http://ollama-service:11434"
```
Example 2: Custom vLLM Distribution¶
Build Configuration (custom-vllm-build.yaml)¶

```yaml
name: custom-vllm
distribution_spec:
  description: Custom vLLM distribution with GPU optimization
  providers:
    inference: inline::vllm
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_name: custom-vllm
image_type: container
```
Enhanced Dockerfile¶
Create a custom Dockerfile to extend the base distribution:
```dockerfile
FROM distribution-custom-vllm:dev

# Install additional dependencies
RUN pip install custom-optimization-library

# Add custom configuration
COPY custom-vllm-config.json /app/config.json

# Set environment variables
ENV VLLM_OPTIMIZATION_LEVEL=high
ENV CUSTOM_GPU_SETTINGS=enabled

# Expose port
EXPOSE 8321
```
Build and Deploy¶
```shell
# Build the LlamaStack distribution
llama stack build --config custom-vllm-build.yaml

# Build enhanced Docker image
docker build -t my-registry.com/enhanced-vllm:latest .

# Push to registry
docker push my-registry.com/enhanced-vllm:latest
```
Kubernetes Deployment¶
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: enhanced-vllm-dist
spec:
  replicas: 1
  server:
    distribution:
      image: "my-registry.com/enhanced-vllm:latest"
    containerSpec:
      resources:
        requests:
          nvidia.com/gpu: "2"
          memory: "32Gi"
          cpu: "8"
        limits:
          nvidia.com/gpu: "2"
          memory: "64Gi"
          cpu: "16"
      env:
        - name: INFERENCE_MODEL
          value: "meta-llama/Llama-2-13b-chat-hf"
        - name: VLLM_GPU_MEMORY_UTILIZATION
          value: "0.9"
        - name: VLLM_TENSOR_PARALLEL_SIZE
          value: "2"
```
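A quick sanity check helps when sizing GPU requests like those above. The sketch below is a back-of-envelope estimate only: the 24 GiB GPU size is an assumed example (adjust to your hardware), and it accounts only for fp16 weights, not KV cache or activations.

```python
# Back-of-envelope: do the weights of a 13B fp16 model fit on 2 GPUs
# with tensor parallelism, given the settings in the deployment above?
params = 13e9                # Llama-2-13b parameter count (approximate)
bytes_per_param = 2          # fp16 weights
tensor_parallel = 2          # VLLM_TENSOR_PARALLEL_SIZE
gpu_mem_gib = 24             # assumed example GPU; adjust to your hardware
utilization = 0.9            # VLLM_GPU_MEMORY_UTILIZATION

weights_per_gpu_gib = params * bytes_per_param / tensor_parallel / 2**30
budget_gib = gpu_mem_gib * utilization
print(f"weights per GPU: {weights_per_gpu_gib:.1f} GiB, budget: {budget_gib:.1f} GiB")
# The difference is what remains for KV cache and activations.
```

If the weights alone exceed the budget, raise the tensor parallel size or move to larger GPUs before tuning `VLLM_GPU_MEMORY_UTILIZATION`.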
Example 3: Multi-Provider Distribution¶
Build Configuration (multi-provider-build.yaml)¶

```yaml
name: multi-provider
distribution_spec:
  description: Distribution with multiple inference providers
  providers:
    inference:
      - remote::ollama
      - remote::vllm
    memory: inline::faiss
    safety: inline::llama-guard
    agents: inline::meta-reference
    telemetry: inline::meta-reference
image_name: multi-provider
image_type: container
```
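A deployment for this image needs endpoints for both remote providers. The fragment below is a sketch: `OLLAMA_URL` follows the earlier examples, while `VLLM_URL`, the service names, and the image tag are hypothetical; check each provider's config class for the exact variable names it expects.

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: multi-provider-dist
spec:
  replicas: 1
  server:
    distribution:
      image: "my-registry.com/multi-provider:v1.0.0"
    containerSpec:
      port: 8321
      env:
        - name: OLLAMA_URL
          value: "http://ollama-service:11434"
        - name: VLLM_URL   # hypothetical name; depends on the provider's config
          value: "http://vllm-service:8000"
```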
Testing Custom Distributions¶
Local Testing¶
1. Run Locally with Docker¶
```shell
# Set environment variables
export LLAMA_STACK_PORT=8321
export INFERENCE_MODEL="llama3.2:1b"

# Run the custom distribution
docker run -d \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  distribution-custom-ollama:dev \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.docker.internal:11434
```
2. Test API Endpoints¶
```shell
# Health check
curl http://localhost:8321/v1/health

# List providers
curl http://localhost:8321/v1/providers

# Test inference
curl -X POST http://localhost:8321/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "prompt": "Hello, world!",
    "max_tokens": 50
  }'
```
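The same completion request can be scripted from Python's standard library. This is a minimal sketch: the endpoint path and payload are taken from the curl example above, and the commented-out call assumes a server is running on localhost.

```python
import json
from urllib import request

def complete(base_url: str, payload: dict) -> dict:
    """POST a completion request to a running LlamaStack server."""
    req = request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = {"model": "llama3.2:1b", "prompt": "Hello, world!", "max_tokens": 50}
# complete("http://localhost:8321", payload)  # requires a running server
print(json.dumps(payload))
```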
Kubernetes Testing¶
1. Deploy to Test Namespace¶
Apply your manifest to an isolated namespace, for example `kubectl create namespace llama-test` followed by `kubectl apply -n llama-test -f <your-manifest>.yaml`.
2. Port Forward for Testing¶
Forward the service port locally, for example `kubectl port-forward -n llama-test svc/<your-service> 8321:8321`, then run the curl checks above against `http://localhost:8321`.
3. Run Tests¶
```shell
# Test from within cluster
kubectl run test-pod --image=curlimages/curl --rm -it -- \
  curl http://my-custom-distribution-service:8321/v1/health
```
Best Practices¶
Security¶
- Use Private Registries: Store custom images in private container registries
- Scan Images: Use container scanning tools to check for vulnerabilities
- Minimal Base Images: Use slim or distroless base images when possible
- Secrets Management: Use Kubernetes secrets for API keys and credentials
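For the secrets point, standard Kubernetes env-var references keep credentials out of manifests. The fragment below is a sketch assuming `containerSpec.env` follows the standard Kubernetes `EnvVar` schema; the variable, Secret name, and key are hypothetical placeholders.

```yaml
spec:
  server:
    containerSpec:
      env:
        - name: INFERENCE_API_KEY        # hypothetical variable name
          valueFrom:
            secretKeyRef:
              name: inference-credentials  # kubectl create secret generic inference-credentials ...
              key: api-key
```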
Performance¶
- Multi-stage Builds: Use multi-stage Dockerfiles to reduce image size
- Layer Caching: Optimize Dockerfile layer ordering for better caching
- Resource Limits: Set appropriate CPU and memory limits
- GPU Optimization: Configure GPU settings for inference workloads
Maintenance¶
- Version Tags: Use semantic versioning for your custom images
- Documentation: Document your custom configurations and dependencies
- Testing: Implement automated testing for custom distributions
- Monitoring: Set up monitoring and logging for custom deployments
Development Workflow¶
- Local Development: Use conda/venv builds for rapid iteration
- CI/CD Integration: Automate building and testing of custom distributions
- Staging Environment: Test in staging before production deployment
- Rollback Strategy: Maintain previous versions for quick rollbacks
Troubleshooting¶
Common Issues¶
Build Failures¶
```shell
# Check build logs
llama stack build --template ollama --image-type container --verbose

# Verify dependencies
llama stack build --config my-build.yaml --print-deps-only
```
Runtime Issues¶
```shell
# Check container logs
docker logs <container-id>

# Debug with interactive shell
docker run -it --entrypoint /bin/bash distribution-custom:dev
```
Kubernetes Issues¶
```shell
# Check pod status
kubectl describe pod <pod-name>

# View logs
kubectl logs <pod-name> -f

# Check events
kubectl get events --sort-by=.metadata.creationTimestamp
```
Getting Help¶
- LlamaStack Documentation: Official docs
- GitHub Issues: Report bugs and ask questions
- Community Forums: Join the LlamaStack community discussions
- Operator Documentation: Check the Kubernetes operator guides
Next Steps¶
- vLLM Distribution - Learn about vLLM-specific configurations
- Ollama Distribution - Explore Ollama distribution options
- Configuration Reference - Complete API reference
- Scaling Guide - Scale your custom distributions