# Basic Deployment Example
This example demonstrates a simple LlamaStack deployment suitable for development and testing environments.
## Overview
This configuration creates a single-replica LlamaStack instance using the ollama distribution with basic resource allocation.
## Configuration

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: basic-llamastack
  namespace: default
  labels:
    app: llamastack
    environment: development
spec:
  replicas: 1
  server:
    distribution:
      name: "ollama"
    containerSpec:
      name: "llama-stack"
      port: 8321
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "4Gi"
          cpu: "1"
      env:
        - name: LOG_LEVEL
          value: "info"
        - name: LLAMASTACK_PORT
          value: "8321"
```
## Deployment Steps

1. Save the configuration to a file named `basic-deployment.yaml`.
2. Apply the configuration.
3. Verify the deployment.
4. Check the status.
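A sketch of the commands for steps 2-4; the resource name `llamastackdistribution` and the label selector `app=llama-stack` match those used in the monitoring commands later on this page:

```shell
# Step 2: apply the configuration
kubectl apply -f basic-deployment.yaml

# Step 3: verify the LlamaStackDistribution resource was created
kubectl get llamastackdistribution basic-llamastack

# Step 4: check the status of the resulting pods
kubectl get pods -l app=llama-stack
```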
## Expected Resources

This deployment will create:

- Deployment: `basic-llamastack` with 1 replica
- Service: `basic-llamastack` exposing port 8321
- ConfigMap: Configuration for the LlamaStack instance
- Pod: Single pod running the LlamaStack container
## Accessing the Service

### Port Forward (Development)

Forward the service port to your local machine, then access the API at http://localhost:8321.
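A typical port-forward invocation, assuming the service is named `basic-llamastack` as listed under Expected Resources:

```shell
# Forward local port 8321 to the service's port 8321
kubectl port-forward svc/basic-llamastack 8321:8321
```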
### Service Exposure (Testing)

Create a NodePort service for external access:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: basic-llamastack-nodeport
spec:
  type: NodePort
  selector:
    app: llama-stack
    llamastack.io/instance: basic-llamastack
  ports:
    - port: 8321
      targetPort: 8321
      nodePort: 30321
      protocol: TCP
```
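Once the manifest is applied, the API is reachable on any node's IP at port 30321. The filename and `NODE_IP` below are placeholders:

```shell
kubectl apply -f nodeport-service.yaml

# Reach the API through any cluster node's address
curl http://$NODE_IP:30321/providers
```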
## Testing the Deployment

### Health Check

Query the health endpoint and confirm the server responds.
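A hedged example of the health check; the exact path (`/health` vs `/v1/health`) depends on the Llama Stack version:

```shell
curl http://localhost:8321/health
```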
### API Endpoints

```shell
# List providers
curl http://localhost:8321/providers

# Get distribution info
curl http://localhost:8321/distribution/info

# List available models
curl http://localhost:8321/models
```
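The same endpoints can also be exercised from Python with the `llama-stack-client` package (an assumption; install with `pip install llama-stack-client`). A sketch against the port-forwarded service:

```python
from llama_stack_client import LlamaStackClient

# Point the client at the port-forwarded service
client = LlamaStackClient(base_url="http://localhost:8321")

# List registered providers
for provider in client.providers.list():
    print(provider)

# List available models
for model in client.models.list():
    print(model.identifier)
```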
## Resource Usage
This basic deployment typically uses:
- CPU: 0.5-1 core
- Memory: 2-4 GB
- Storage: Ephemeral (no persistent storage)
- Network: Single service port (8321)
## Monitoring

### Pod Status

```shell
# Check pod status
kubectl get pods -l app=llama-stack

# View pod details
kubectl describe pod -l app=llama-stack

# Check resource usage (requires metrics-server)
kubectl top pod -l app=llama-stack
```
### Logs

```shell
# View recent logs
kubectl logs deployment/basic-llamastack

# Follow logs in real time
kubectl logs -f deployment/basic-llamastack

# View logs with timestamps
kubectl logs deployment/basic-llamastack --timestamps
```
## Scaling

### Manual Scaling

Scale the deployment to multiple replicas:

```shell
# Scale to 3 replicas
kubectl scale llamastackdistribution basic-llamastack --replicas=3

# Verify scaling
kubectl get pods -l app=llama-stack
```
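`kubectl scale` only works if the CRD exposes the scale subresource; if it does not, the same change can be made with a merge patch on `spec.replicas`:

```shell
# Set replicas directly on the custom resource
kubectl patch llamastackdistribution basic-llamastack \
  --type merge -p '{"spec": {"replicas": 3}}'
```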
### Resource Updates

Update resource allocations:

```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: basic-llamastack
spec:
  replicas: 1
  server:
    distribution:
      name: "ollama"
    containerSpec:
      port: 8321
      resources:
        requests:
          memory: "4Gi"  # Increased from 2Gi
          cpu: "1"       # Increased from 500m
        limits:
          memory: "8Gi"  # Increased from 4Gi
          cpu: "2"       # Increased from 1
```

Apply the update with `kubectl apply -f basic-deployment.yaml`.
## Troubleshooting

### Common Issues

**Pod not starting:**

```shell
# Check pod events
kubectl describe pod -l app=llama-stack

# Check resource constraints
kubectl describe node
```

**Service not accessible:**

```shell
# Check service endpoints
kubectl get endpoints basic-llamastack

# Verify service configuration
kubectl describe service basic-llamastack
```

**Application errors:**

```shell
# Check application logs
kubectl logs deployment/basic-llamastack --tail=50

# Check for configuration issues
kubectl get configmap -l app=llama-stack
```
### Debug Commands

```shell
# Get detailed resource information
kubectl get llamastackdistribution basic-llamastack -o yaml

# Check events in the namespace
kubectl get events --sort-by=.metadata.creationTimestamp

# Exec into the pod for debugging
kubectl exec -it deployment/basic-llamastack -- /bin/bash
```
## Cleanup

Remove the deployment:

```shell
# Delete the LlamaStack instance
kubectl delete llamastackdistribution basic-llamastack

# Verify cleanup
kubectl get pods -l app=llama-stack
kubectl get services -l app=llama-stack
```
## Next Steps

After successfully deploying this basic example:

- Try the production setup: learn about production-ready configurations
- Add persistent storage: configure persistent volumes
- Set up monitoring: add observability
- Configure scaling: learn about auto-scaling
## Variations

### Distribution Configuration

The distribution is selected via `spec.server.distribution.name`. The example below keeps the `ollama` distribution and binds the Ollama host explicitly:

```yaml
spec:
  server:
    distribution:
      name: "ollama"
    containerSpec:
      port: 8321
      env:
        - name: OLLAMA_HOST
          value: "0.0.0.0"
```
### Custom Environment Variables

Add custom configuration:

```yaml
spec:
  server:
    containerSpec:
      env:
        - name: LLAMASTACK_CONFIG_PATH
          value: "/config/llamastack.yaml"
        - name: MODEL_CACHE_DIR
          value: "/tmp/models"
        - name: MAX_CONCURRENT_REQUESTS
          value: "10"
```
### Resource Constraints

For resource-constrained environments:

```yaml
spec:
  server:
    containerSpec:
      resources:
        requests:
          memory: "1Gi"
          cpu: "250m"
        limits:
          memory: "2Gi"
          cpu: "500m"
```