Basic Deployment Example¶
This example demonstrates a simple LlamaStack deployment suitable for development and testing environments.
Overview¶
This configuration creates a single-replica LlamaStack instance using the ollama distribution with basic resource allocation.
Configuration¶
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: basic-llamastack
  namespace: default
  labels:
    app: llamastack
    environment: development
spec:
  replicas: 1
  server:
    distribution:
      name: "ollama"
    containerSpec:
      name: "llama-stack"
      port: 8321
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "4Gi"
          cpu: "1"
      env:
      - name: LOG_LEVEL
        value: "info"
      - name: LLAMASTACK_PORT
        value: "8321"
Deployment Steps¶
- Save the configuration to a file named basic-deployment.yaml
- Apply the configuration (see the commands below)
- Verify the deployment
- Check the status
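Assuming the file name above, the remaining steps map to standard kubectl commands (label selectors match those used throughout this example):

```bash
# Apply the configuration
kubectl apply -f basic-deployment.yaml

# Verify the deployment: the pod should reach Running
kubectl get pods -l app=llama-stack

# Check the status of the custom resource
kubectl get llamastackdistribution basic-llamastack
```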
Expected Resources¶
This deployment will create:
- Deployment: basic-llamastack with 1 replica
- Service: basic-llamastack exposing port 8321
- ConfigMap: Configuration for the LlamaStack instance
- Pod: Single pod running the LlamaStack container
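To confirm these objects exist, a quick check like the following should list them (the label selector is assumed to match the one used elsewhere in this example):

```bash
# List the objects created for this instance
kubectl get deployment,service,configmap,pods -l app=llama-stack -n default
```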
Accessing the Service¶
Port Forward (Development)¶
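For local development, a standard port-forward against the created Service is enough, for example:

```bash
# Forward local port 8321 to the LlamaStack service
kubectl port-forward service/basic-llamastack 8321:8321
```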
Access at: http://localhost:8321
Service Exposure (Testing)¶
Create a NodePort service for external access:
apiVersion: v1
kind: Service
metadata:
  name: basic-llamastack-nodeport
spec:
  type: NodePort
  selector:
    app: llama-stack
    llamastack.io/instance: basic-llamastack
  ports:
  - port: 8321
    targetPort: 8321
    nodePort: 30321
    protocol: TCP
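Apply the Service manifest and then reach the API on any node's IP at the NodePort. The file name below is a placeholder, and the /models path matches the endpoints listed in the next section:

```bash
# Apply the NodePort service (file name is illustrative)
kubectl apply -f nodeport-service.yaml

# Access the API from outside the cluster; <node-ip> is any cluster node address
curl http://<node-ip>:30321/models
```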
Testing the Deployment¶
Health Check¶
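A minimal check, assuming the server exposes a health endpoint on its API port (the exact path may vary by Llama Stack version):

```bash
# Query the health endpoint through the port-forward
curl http://localhost:8321/health
```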
Expected response: a small JSON status payload indicating the server is healthy (the exact fields depend on the Llama Stack version).
API Endpoints¶
# List providers
curl http://localhost:8321/providers
# Get distribution info
curl http://localhost:8321/distribution/info
# List available models
curl http://localhost:8321/models
Resource Usage¶
This basic deployment typically uses:
- CPU: 0.5-1 core
- Memory: 2-4 GB
- Storage: Ephemeral (no persistent storage)
- Network: Single service port (8321)
Monitoring¶
Pod Status¶
# Check pod status
kubectl get pods -l app=llama-stack
# View pod details
kubectl describe pod -l app=llama-stack
# Check resource usage
kubectl top pod -l app=llama-stack
Logs¶
# View recent logs
kubectl logs deployment/basic-llamastack
# Follow logs in real-time
kubectl logs -f deployment/basic-llamastack
# View logs with timestamps
kubectl logs deployment/basic-llamastack --timestamps
Scaling¶
Manual Scaling¶
Scale the deployment to multiple replicas:
# Scale to 3 replicas
kubectl scale llamastackdistribution basic-llamastack --replicas=3
# Verify scaling
kubectl get pods -l app=llama-stack
Resource Updates¶
Update resource allocations:
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: basic-llamastack
spec:
  replicas: 1
  server:
    distribution:
      name: "meta-reference"
    containerSpec:
      port: 8321
      resources:
        requests:
          memory: "4Gi"  # Increased from 2Gi
          cpu: "1"       # Increased from 500m
        limits:
          memory: "8Gi"  # Increased from 4Gi
          cpu: "2"       # Increased from 1
Apply the update:
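For example, assuming the updated spec was saved back to the same file:

```bash
# Re-apply the manifest with the new resource values
kubectl apply -f basic-deployment.yaml
```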
Troubleshooting¶
Common Issues¶
Pod not starting:
# Check pod events
kubectl describe pod -l app=llama-stack
# Check resource constraints
kubectl describe node
Service not accessible:
# Check service endpoints
kubectl get endpoints basic-llamastack
# Verify service configuration
kubectl describe service basic-llamastack
Application errors:
# Check application logs
kubectl logs deployment/basic-llamastack --tail=50
# Check for configuration issues
kubectl get configmap -l app=llama-stack
Debug Commands¶
# Get detailed resource information
kubectl get llamastackdistribution basic-llamastack -o yaml
# Check events in the namespace
kubectl get events --sort-by=.metadata.creationTimestamp
# Exec into the pod for debugging
kubectl exec -it deployment/basic-llamastack -- /bin/bash
Cleanup¶
Remove the deployment:
# Delete the LlamaStack instance
kubectl delete llamastackdistribution basic-llamastack
# Verify cleanup
kubectl get pods -l app=llama-stack
kubectl get services -l app=llama-stack
Next Steps¶
After successfully deploying this basic example:
- Try the production setup - Learn about production-ready configurations
- Add persistent storage - Configure persistent volumes
- Set up monitoring - Add observability
- Configure scaling - Learn about auto-scaling
Variations¶
Different Distribution¶
The example above already uses the ollama distribution; to switch, change spec.server.distribution.name. The snippet below keeps ollama and sets its host binding explicitly:
spec:
  server:
    distribution:
      name: "ollama"
    containerSpec:
      port: 8321
      env:
      - name: OLLAMA_HOST
        value: "0.0.0.0"
Custom Environment Variables¶
Add custom configuration:
spec:
  server:
    containerSpec:
      env:
      - name: LLAMASTACK_CONFIG_PATH
        value: "/config/llamastack.yaml"
      - name: MODEL_CACHE_DIR
        value: "/tmp/models"
      - name: MAX_CONCURRENT_REQUESTS
        value: "10"
Resource Constraints¶
For resource-constrained environments:
```yaml
spec:
  server:
    containerSpec:
      resources:
        requests:
          memory: "1Gi"
          cpu: "250m"
        limits:
          memory: "2Gi"
          cpu: "500m"
```