# Basic Deployment Example
This example demonstrates a simple LlamaStack deployment suitable for development and testing environments.
## Overview
This configuration creates a single-replica LlamaStack instance using the ollama distribution with basic resource allocation.
## Configuration

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: basic-llamastack
  namespace: default
  labels:
    app: llamastack
    environment: development
spec:
  replicas: 1
  server:
    distribution:
      name: "ollama"
    containerSpec:
      name: "llama-stack"
      port: 8321
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "4Gi"
          cpu: "1"
      env:
        - name: LOG_LEVEL
          value: "info"
        - name: LLAMASTACK_PORT
          value: "8321"
```
## Deployment Steps

1. Save the configuration to a file named `basic-deployment.yaml`.
2. Apply the configuration.
3. Verify the deployment.
4. Check the status.
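A sketch of the commands for steps 2-4; the resource name `llamastackdistribution` and the label selector `app=llama-stack` match those used in the monitoring commands later on this page:

```shell
# Step 2: apply the configuration
kubectl apply -f basic-deployment.yaml

# Step 3: verify the LlamaStackDistribution resource was created
kubectl get llamastackdistribution basic-llamastack

# Step 4: check the status of the resulting pods
kubectl get pods -l app=llama-stack
```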
## Expected Resources

This deployment will create:

- Deployment: `basic-llamastack` with 1 replica
- Service: `basic-llamastack` exposing port 8321
- ConfigMap: Configuration for the LlamaStack instance
- Pod: Single pod running the LlamaStack container
## Accessing the Service

### Port Forward (Development)

Forward the service port to your local machine, then access the API at http://localhost:8321.
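A typical port-forward invocation, assuming the service is named `basic-llamastack` as listed under Expected Resources:

```shell
# Forward local port 8321 to the service's port 8321
kubectl port-forward svc/basic-llamastack 8321:8321
```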
### Service Exposure (Testing)

Create a NodePort service for external access:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: basic-llamastack-nodeport
spec:
  type: NodePort
  selector:
    app: llama-stack
    llamastack.io/instance: basic-llamastack
  ports:
    - port: 8321
      targetPort: 8321
      nodePort: 30321
      protocol: TCP
```
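Once the manifest is applied, the API is reachable on any node's IP at port 30321. The filename and `NODE_IP` below are placeholders:

```shell
kubectl apply -f nodeport-service.yaml

# Reach the API through any cluster node's address
curl http://$NODE_IP:30321/providers
```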
## Testing the Deployment

### Health Check

Query the health endpoint and confirm the server responds.
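A hedged example of the health check; the exact path (`/health` vs `/v1/health`) depends on the Llama Stack version:

```shell
curl http://localhost:8321/health
```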
### API Endpoints

```shell
# List providers
curl http://localhost:8321/providers

# Get distribution info
curl http://localhost:8321/distribution/info

# List available models
curl http://localhost:8321/models
```
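The same endpoints can also be exercised from Python with the `llama-stack-client` package (an assumption; install with `pip install llama-stack-client`). A sketch against the port-forwarded service:

```python
from llama_stack_client import LlamaStackClient

# Point the client at the port-forwarded service
client = LlamaStackClient(base_url="http://localhost:8321")

# List registered providers
for provider in client.providers.list():
    print(provider)

# List available models
for model in client.models.list():
    print(model.identifier)
```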
## Resource Usage
This basic deployment typically uses:
- CPU: 0.5-1 core
- Memory: 2-4 GB
- Storage: Ephemeral (no persistent storage)
- Network: Single service port (8321)
## Monitoring

### Pod Status

```shell
# Check pod status
kubectl get pods -l app=llama-stack

# View pod details
kubectl describe pod -l app=llama-stack

# Check resource usage (requires metrics-server)
kubectl top pod -l app=llama-stack
```
### Logs

```shell
# View recent logs
kubectl logs deployment/basic-llamastack

# Follow logs in real time
kubectl logs -f deployment/basic-llamastack

# View logs with timestamps
kubectl logs deployment/basic-llamastack --timestamps
```
## Scaling

### Manual Scaling

Scale the deployment to multiple replicas:

```shell
# Scale to 3 replicas
kubectl scale llamastackdistribution basic-llamastack --replicas=3

# Verify scaling
kubectl get pods -l app=llama-stack
```
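`kubectl scale` only works if the CRD exposes the scale subresource; if it does not, the same change can be made with a merge patch on `spec.replicas`:

```shell
# Set replicas directly on the custom resource
kubectl patch llamastackdistribution basic-llamastack \
  --type merge -p '{"spec": {"replicas": 3}}'
```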
### Resource Updates

Update resource allocations:

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: basic-llamastack
spec:
  replicas: 1
  server:
    distribution:
      name: "ollama"
    containerSpec:
      port: 8321
      resources:
        requests:
          memory: "4Gi"  # Increased from 2Gi
          cpu: "1"       # Increased from 500m
        limits:
          memory: "8Gi"  # Increased from 4Gi
          cpu: "2"       # Increased from 1
```

Apply the update with `kubectl apply -f basic-deployment.yaml`.
## Troubleshooting

### Common Issues

**Pod not starting:**

```shell
# Check pod events
kubectl describe pod -l app=llama-stack

# Check resource constraints
kubectl describe node
```

**Service not accessible:**

```shell
# Check service endpoints
kubectl get endpoints basic-llamastack

# Verify service configuration
kubectl describe service basic-llamastack
```

**Application errors:**

```shell
# Check application logs
kubectl logs deployment/basic-llamastack --tail=50

# Check for configuration issues
kubectl get configmap -l app=llama-stack
```
### Debug Commands

```shell
# Get detailed resource information
kubectl get llamastackdistribution basic-llamastack -o yaml

# Check events in the namespace
kubectl get events --sort-by=.metadata.creationTimestamp

# Exec into the pod for debugging
kubectl exec -it deployment/basic-llamastack -- /bin/bash
```
## Cleanup

Remove the deployment:

```shell
# Delete the LlamaStack instance
kubectl delete llamastackdistribution basic-llamastack

# Verify cleanup
kubectl get pods -l app=llama-stack
kubectl get services -l app=llama-stack
```
## Next Steps

After successfully deploying this basic example:

- Try the production setup: learn about production-ready configurations
- Add persistent storage: configure persistent volumes
- Set up monitoring: add observability
- Configure scaling: learn about auto-scaling
## Variations

### Distribution Configuration

The distribution is selected via `spec.server.distribution.name`. The example below keeps the `ollama` distribution and binds the Ollama host explicitly:

```yaml
spec:
  server:
    distribution:
      name: "ollama"
    containerSpec:
      port: 8321
      env:
        - name: OLLAMA_HOST
          value: "0.0.0.0"
```
### Custom Environment Variables

Add custom configuration:

```yaml
spec:
  server:
    containerSpec:
      env:
        - name: LLAMASTACK_CONFIG_PATH
          value: "/config/llamastack.yaml"
        - name: MODEL_CACHE_DIR
          value: "/tmp/models"
        - name: MAX_CONCURRENT_REQUESTS
          value: "10"
```
### Resource Constraints

For resource-constrained environments:

```yaml
spec:
  server:
    containerSpec:
      resources:
        requests:
          memory: "1Gi"
          cpu: "250m"
        limits:
          memory: "2Gi"
          cpu: "500m"
```