# Understanding LlamaStack Distributions

This guide explains the different ways to deploy LlamaStack using the Kubernetes operator, focusing on the distinction between Supported Distributions and Bring-Your-Own (BYO) Distributions.
## Distribution Types Overview

The LlamaStack Kubernetes Operator supports two main approaches for deploying LlamaStack:

### Supported Distributions (Recommended)

Pre-configured, tested distributions maintained by the LlamaStack team, with specific provider integrations.

### Bring-Your-Own (BYO) Distributions

Custom container images that you build and maintain yourself.
## Supported Distributions

### What are Supported Distributions?

Supported distributions are pre-built, tested container images that include:

- Specific provider integrations (Ollama, vLLM, NVIDIA, etc.)
- Optimized configurations for each provider
- Tested compatibility with the operator
- Regular updates and security patches
- Documentation and examples

### Available Pre-Built Distributions

#### Self-Hosted Distributions

These require you to provide the underlying infrastructure:
| Distribution | Provider | Use Case | Infrastructure Required | 
|---|---|---|---|
| ollama | Ollama | Local inference with Ollama server | Ollama server | 
| vllm-gpu | vLLM | High-performance GPU inference | GPU infrastructure | 
| tgi | Text Generation Inference | Hugging Face TGI | TGI server setup | 
| nvidia | NVIDIA | NVIDIA AI services | NVIDIA infrastructure | 
| remote-vllm | Remote vLLM | Remote vLLM server | External vLLM server | 
| open-benchmark | Benchmarking | Performance testing | Testing infrastructure | 
#### External API Distributions

These only require API keys to external services (a credentials example follows the table):
| Distribution | Provider | Use Case | Requirements | 
|---|---|---|---|
| hf-endpoint | Hugging Face | Hugging Face Inference Endpoints | HF API key | 
| hf-serverless | Hugging Face | Hugging Face Serverless | HF API key | 
| bedrock | AWS Bedrock | AWS Bedrock models | AWS credentials | 
| together | Together AI | Together AI API | Together API key | 
| fireworks | Fireworks AI | Fireworks AI API | Fireworks API key | 
| cerebras | Cerebras | Cerebras inference | Cerebras API key | 
| sambanova | SambaNova | SambaNova inference | SambaNova API key | 
| watsonx | IBM watsonx | IBM watsonx models | IBM API key | 
| passthrough | Generic | API passthrough | Target API access | 
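Most external API distributions read their credentials from environment variables on the server container, typically sourced from a Kubernetes Secret rather than hard-coded in the manifest. A minimal sketch for the `together` distribution, assuming a hypothetical Secret named `together-credentials`; the exact environment variable name the provider expects should be verified against its documentation:

```yaml
# Hypothetical Secret holding the Together AI API key
apiVersion: v1
kind: Secret
metadata:
  name: together-credentials
type: Opaque
stringData:
  api-key: "<your-together-api-key>"
---
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: together-llamastack
spec:
  replicas: 1
  server:
    distribution:
      name: "together"
    containerSpec:
      port: 8321
      env:
      - name: TOGETHER_API_KEY   # assumed variable name; verify for your provider
        valueFrom:
          secretKeyRef:
            name: together-credentials
            key: api-key
```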
### Using Supported Distributions

#### Basic Syntax

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: my-distribution
spec:
  server:
    distribution:
      name: "distribution-name"  # Use a distribution name from the tables above
    # ... other configuration
```

#### Example: Ollama Distribution
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: ollama-llamastack
spec:
  replicas: 1
  server:
    distribution:
      name: "ollama"
    containerSpec:
      port: 8321
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
      env:
      - name: OLLAMA_URL
        value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"
    storage:
      size: "20Gi"
```
#### Example: vLLM GPU Distribution

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: vllm-gpu-llamastack
spec:
  replicas: 1
  server:
    distribution:
      name: "vllm-gpu"
    containerSpec:
      port: 8321
      resources:
        requests:
          cpu: "2"
          memory: "8Gi"
          nvidia.com/gpu: "1"
        limits:
          cpu: "4"
          memory: "16Gi"
          nvidia.com/gpu: "1"
      env:
      - name: MODEL_NAME
        value: "meta-llama/Llama-2-7b-chat-hf"
      - name: TENSOR_PARALLEL_SIZE
        value: "1"
    storage:
      size: "50Gi"
```
### Benefits of Supported Distributions

- **Quick Setup**: No need to build custom images
- **Security**: Regular security updates from the LlamaStack team
- **Documentation**: Comprehensive guides and examples
- **Tested**: Thoroughly tested with the operator
- **Optimized**: Pre-configured for optimal performance
- **Support**: Community and official support available

## Bring-Your-Own (BYO) Distributions

### What are BYO Distributions?

BYO distributions allow you to use custom container images that you build and maintain:

- Custom integrations not available in supported distributions
- Specialized configurations for your use case
- Custom dependencies and libraries
- Private or proprietary model integrations

### Using BYO Distributions

#### Basic Syntax
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: my-custom-distribution
spec:
  server:
    distribution:
      image: "your-registry.com/custom-llamastack:tag"  # Use custom image
    # ... other configuration
```
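If the custom image lives in a private registry, the operator-created pods also need pull credentials. One portable approach is to attach a standard image pull secret to the namespace's default service account, which works regardless of which fields the CRD itself exposes:

```bash
# Create a registry pull secret (standard Kubernetes)
kubectl create secret docker-registry regcred \
  --docker-server=your-registry.com \
  --docker-username=<user> \
  --docker-password=<password>

# Attach it to the default service account so pods in this namespace
# can pull from the private registry
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'
```

If the operator runs its pods under a dedicated service account rather than `default`, patch that one instead.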
#### Example: Custom Image

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: custom-llamastack
spec:
  replicas: 1
  server:
    distribution:
      image: "myregistry.com/custom-llamastack:v1.0.0"
    containerSpec:
      port: 8321
      resources:
        requests:
          cpu: "2"
          memory: "4Gi"
        limits:
          cpu: "4"
          memory: "8Gi"
      env:
      - name: CUSTOM_CONFIG_PATH
        value: "/app/config/custom.yaml"
      - name: API_KEY
        valueFrom:
          secretKeyRef:
            name: custom-credentials
            key: api-key
    storage:
      size: "100Gi"
    podOverrides:
      volumes:
      - name: custom-config
        configMap:
          name: custom-llamastack-config
      volumeMounts:
      - name: custom-config
        mountPath: "/app/config"
        readOnly: true
```
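The `podOverrides` section above mounts a ConfigMap named `custom-llamastack-config`, which the operator does not create for you. One way to create it, assuming the configuration lives in a local `custom-config/custom.yaml` file matching the `CUSTOM_CONFIG_PATH` above:

```bash
# Create the ConfigMap that podOverrides mounts at /app/config;
# the key name (custom.yaml) becomes the file name inside the mount
kubectl create configmap custom-llamastack-config \
  --from-file=custom.yaml=custom-config/custom.yaml
```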
### Building Custom Images

#### Example Dockerfile

```dockerfile
# Start from a supported distribution or base image
FROM llamastack/llamastack:latest

# Add your custom dependencies
RUN pip install custom-package-1 custom-package-2

# Copy custom configuration
COPY custom-config/ /app/config/

# Copy custom code
COPY src/ /app/src/

# Set custom environment variables
ENV CUSTOM_SETTING=value

# Override the entrypoint if needed
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
```
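If you do override the entrypoint, it should end by handing control to the LlamaStack server process. A minimal sketch of `entrypoint.sh`; the exact server command and config path depend on the base image, so treat `llama stack run` and the path below as assumptions to verify against your base image:

```bash
#!/bin/sh
set -e

# Custom initialization goes here, e.g. templating runtime configuration
echo "Using custom configuration at ${CUSTOM_CONFIG_PATH:-/app/config/custom.yaml}"

# exec replaces this shell with the server so the container keeps
# PID 1 signal handling (clean shutdown on SIGTERM)
exec llama stack run "${CUSTOM_CONFIG_PATH:-/app/config/custom.yaml}"
```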
#### Building and Pushing

```bash
# Build your custom image
docker build -t myregistry.com/custom-llamastack:v1.0.0 .

# Push to your registry
docker push myregistry.com/custom-llamastack:v1.0.0
```
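Before pointing the operator at the new image, it can be worth a quick local smoke test. A sketch, assuming the server listens on port 8321 and exposes a health endpoint (the exact path varies by LlamaStack version):

```bash
# Run the image locally and give the server a moment to start
docker run --rm -d -p 8321:8321 --name lls-smoke \
  myregistry.com/custom-llamastack:v1.0.0
sleep 10

# Probe the assumed health endpoint, then clean up
curl -f http://localhost:8321/v1/health || echo "health check failed"
docker rm -f lls-smoke
```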
### BYO Distribution Considerations

#### Advantages

- **Full Control**: Complete customization of the stack
- **Custom Integrations**: Add proprietary or specialized providers
- **Private Models**: Include private or fine-tuned models
- **Optimizations**: Custom performance optimizations

#### Responsibilities

- **Security**: You maintain security updates
- **Testing**: You test compatibility with the operator
- **Documentation**: You document your custom setup
- **Support**: Limited community support for custom images
- **Updates**: You manage updates and compatibility
## Key Differences Summary

| Aspect | Supported Distributions | BYO Distributions |
|---|---|---|
| Setup Complexity | Simple (just specify a name) | Complex (build and maintain an image) |
| Maintenance | Handled by the LlamaStack team | Your responsibility |
| Security Updates | Automatic | Manual |
| Documentation | Comprehensive | You create it |
| Support | Community + official | Limited |
| Customization | Limited to configuration | Full control |
| Testing | Pre-tested | You test |
| Time to Deploy | Minutes | Hours to days |
## Choosing the Right Approach

### Use Supported Distributions When:

- Your use case matches available providers (Ollama, vLLM, etc.)
- You want quick setup and deployment
- You prefer maintained and tested solutions
- You need community support
- Security and updates are important

### Use BYO Distributions When:

- You need custom provider integrations
- You have specialized requirements
- You use proprietary or private models
- You need specific performance optimizations
- You have the expertise to maintain custom images
## Migration Between Approaches

### From Supported to BYO

```yaml
# Before (supported)
spec:
  server:
    distribution:
      name: "ollama"

# After (BYO)
spec:
  server:
    distribution:
      image: "myregistry.com/custom-ollama:v1.0.0"
```

### From BYO to Supported

```yaml
# Before (BYO)
spec:
  server:
    distribution:
      image: "myregistry.com/custom-vllm:v1.0.0"

# After (supported)
spec:
  server:
    distribution:
      name: "vllm-gpu"
```
## Best Practices

### For Supported Distributions

- **Start Simple**: Begin with a basic configuration
- **Use Environment Variables**: Configure via the `env` section
- **Monitor Resources**: Set appropriate resource requests and limits
- **Check Documentation**: Review the provider-specific guides

### For BYO Distributions

- **Base on Supported Images**: Start from `llamastack/llamastack:latest`
- **Document Everything**: Maintain clear documentation
- **Test Thoroughly**: Test with the operator before production
- **Version Control**: Tag and version your custom images
- **Security Scanning**: Regularly scan for vulnerabilities (see the sketch below)
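A sketch of the last two practices, assuming the image names from the earlier examples and a vulnerability scanner such as Trivy installed locally:

```bash
# Pin an explicit version tag instead of relying on :latest
docker tag myregistry.com/custom-llamastack:latest \
  myregistry.com/custom-llamastack:v1.0.1

# Scan the tagged image for known vulnerabilities before pushing
trivy image myregistry.com/custom-llamastack:v1.0.1
```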
## Next Steps

- **Configuration Reference** - Detailed configuration options
- **Basic Deployment** - Simple deployment examples
- **Production Setup** - Production-ready configurations
- **Custom Images Guide** - Building custom images
- **Troubleshooting** - Common issues and solutions