Quick Start Guide¶
This guide will help you deploy your first LlamaStack instance using the Kubernetes operator in just a few minutes.
Prerequisites¶
- LlamaStack Operator installed (Installation Guide)
- kubectl configured and connected to your cluster
- At least 4GB of available memory in your cluster
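Before you start, you can sanity-check these prerequisites. A quick verification (the CRD name below is inferred from the llamastack.io/v1alpha1 API group used throughout this guide):

# Confirm kubectl is connected to a cluster
kubectl cluster-info
# Confirm the operator's CRD is installed
kubectl get crd llamastackdistributions.llamastack.io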
Step 1: Create a Basic LlamaStack Instance¶
Create a file named basic-llamastack.yaml:
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: my-first-llamastack
  namespace: default
spec:
  replicas: 1
  server:
    distribution:
      name: "meta-reference"
    containerSpec:
      port: 8321
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "4Gi"
          cpu: "1"
Step 2: Deploy the Instance¶
Apply the configuration to your cluster:
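kubectl apply -f basic-llamastack.yaml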
Step 3: Monitor the Deployment¶
Watch the deployment progress:
# Check the LlamaStackDistribution status
kubectl get llamastackdistribution my-first-llamastack
# Watch the pods being created
kubectl get pods -l app=llama-stack -w
# Check deployment status
kubectl get deployment my-first-llamastack
Expected output (illustrative; exact columns vary by operator version):
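NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
my-first-llamastack   1/1     1            1           2m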
Step 4: Access Your LlamaStack Instance¶
Port Forward (Development)¶
For development and testing, use port forwarding:
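kubectl port-forward service/my-first-llamastack 8321:8321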
Now you can access LlamaStack at http://localhost:8321.
Service Exposure (Production)¶
For production access, expose the service:
apiVersion: v1
kind: Service
metadata:
  name: my-first-llamastack-external
spec:
  type: LoadBalancer
  selector:
    app: llama-stack
    llamastack.io/instance: my-first-llamastack
  ports:
    - port: 80
      targetPort: 8321
      protocol: TCP
Apply the service:
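Assuming you saved the manifest above as llamastack-service.yaml (the filename is up to you):

kubectl apply -f llamastack-service.yaml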
Step 5: Test the API¶
Test that your LlamaStack instance is working:
# Health check
curl http://localhost:8321/health
# List available providers
curl http://localhost:8321/providers
# Get distribution info
curl http://localhost:8321/distribution/info
Expected response for the health check (illustrative; the exact payload depends on your LlamaStack version):
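{"status": "OK"}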
Step 6: Explore the API¶
LlamaStack provides a comprehensive API for AI applications. Here are some key endpoints:
Models API¶
# List available models
curl http://localhost:8321/models
# Get model info
curl http://localhost:8321/models/{model_id}
Inference API¶
# Text completion
curl -X POST http://localhost:8321/inference/completion \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "prompt": "Hello, how are you?",
    "max_tokens": 100
  }'
Memory API¶
# Create memory bank
curl -X POST http://localhost:8321/memory/create \
  -H "Content-Type: application/json" \
  -d '{
    "bank_id": "my-memory",
    "config": {
      "type": "vector",
      "embedding_model": "all-MiniLM-L6-v2"
    }
  }'
Configuration Examples¶
Custom Distribution¶
Use a different LlamaStack distribution:
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: ollama-llamastack
spec:
  replicas: 1
  server:
    distribution:
      name: "ollama"
    containerSpec:
      port: 8321
      env:
        - name: OLLAMA_HOST
          value: "0.0.0.0"
Custom Container Image¶
Use your own LlamaStack image:
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: custom-llamastack
spec:
  replicas: 1
  server:
    distribution:
      image: "my-registry.com/llamastack:custom"
    containerSpec:
      port: 8321
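To confirm the pod is running your image, you can inspect the generated Deployment (assuming the operator names it after the resource, as it does for my-first-llamastack above):

kubectl get deployment custom-llamastack \
  -o jsonpath='{.spec.template.spec.containers[0].image}'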
With Persistent Storage¶
Add persistent storage for models and data:
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: persistent-llamastack
spec:
  replicas: 1
  server:
    distribution:
      name: "meta-reference"
    containerSpec:
      port: 8321
    storage:
      size: "50Gi"
      mountPath: "/.llama"
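The operator should back this storage with a PersistentVolumeClaim; you can check that it was created and bound (claim naming may vary by operator version):

kubectl get pvc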
High Availability Setup¶
Deploy multiple replicas with load balancing:
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: ha-llamastack
spec:
  replicas: 3
  server:
    distribution:
      name: "meta-reference"
    containerSpec:
      port: 8321
      resources:
        requests:
          memory: "4Gi"
          cpu: "1"
        limits:
          memory: "8Gi"
          cpu: "2"
    storage:
      size: "100Gi"
      mountPath: "/.llama"
Monitoring and Observability¶
Check Resource Usage¶
Monitor resource consumption:
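# Per-pod CPU and memory usage (requires metrics-server in the cluster)
kubectl top pods -l app=llama-stack
# Node-level capacity and usage
kubectl top nodes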
View Logs¶
Access application logs:
# View recent logs
kubectl logs deployment/my-first-llamastack
# Follow logs in real-time
kubectl logs -f deployment/my-first-llamastack
# View logs from all replicas
kubectl logs -l app=llama-stack --tail=100
Health Checks¶
The operator automatically configures health checks:
# Readiness probe
readinessProbe:
  httpGet:
    path: /health
    port: 8321
  initialDelaySeconds: 30
  periodSeconds: 10

# Liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8321
  initialDelaySeconds: 60
  periodSeconds: 30
Scaling¶
Manual Scaling¶
Scale your deployment manually:
# Scale to 3 replicas
kubectl patch llamastackdistribution my-first-llamastack --type merge -p '{"spec":{"replicas":3}}'
# Verify scaling
kubectl get pods -l app=llama-stack
Resource Updates¶
Update resource allocations:
kubectl patch llamastackdistribution my-first-llamastack --type merge -p '{
  "spec": {
    "server": {
      "containerSpec": {
        "resources": {
          "requests": {
            "memory": "4Gi",
            "cpu": "1"
          },
          "limits": {
            "memory": "8Gi",
            "cpu": "2"
          }
        }
      }
    }
  }
}'
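For larger changes, you can also edit the resource interactively instead of patching it:

kubectl edit llamastackdistribution my-first-llamastack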
Cleanup¶
When you're done experimenting, clean up the resources:
# Delete the LlamaStack instance
kubectl delete llamastackdistribution my-first-llamastack
# Delete any additional services
kubectl delete service my-first-llamastack-external
# Verify cleanup
kubectl get pods -l app=llama-stack
Troubleshooting¶
Common Issues¶
Pod not starting:
# Check pod events
kubectl describe pod <pod-name>
# Check resource constraints
kubectl describe node <node-name>
Service not accessible:
# Check service endpoints
kubectl get endpoints my-first-llamastack
# Verify port forwarding
kubectl port-forward service/my-first-llamastack 8321:8321 --address 0.0.0.0
API errors:
# Check application logs
kubectl logs deployment/my-first-llamastack
# Verify configuration
kubectl get llamastackdistribution my-first-llamastack -o yaml
Next Steps¶
Now that you have a working LlamaStack instance:
- Learn about configuration options - Explore advanced configuration
- Check out examples - See real-world use cases
- Read the API reference - Understand all available options
- Set up monitoring - Add observability to your deployment
Getting Help¶
If you encounter issues:
- Check the troubleshooting guide
- Review GitHub issues
- Join the community discussions