Monitoring¶
Set up comprehensive monitoring for your LlamaStack distributions.
Monitoring Overview¶
Monitor your LlamaStack deployments with:
- Metrics: Performance and resource usage
- Logs: Application and system logs
- Alerts: Proactive issue detection
- Dashboards: Visual monitoring
Metrics Collection¶
Prometheus Setup¶
Deploy Prometheus for metrics collection:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'llamastack'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            action: keep
            regex: llamastack
ServiceMonitor¶
If the Prometheus Operator is installed, create a ServiceMonitor so LlamaStack endpoints are discovered automatically:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: llamastack-monitor
spec:
  selector:
    matchLabels:
      app: llamastack
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
Key Metrics¶
Application Metrics¶
Monitor LlamaStack-specific metrics:
# Custom metrics exposed by LlamaStack
llamastack_requests_total
llamastack_request_duration_seconds
llamastack_active_connections
llamastack_model_load_time_seconds
llamastack_inference_latency_seconds
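These metrics can also be checked programmatically through the Prometheus HTTP API. A minimal sketch in Python (the in-cluster Prometheus URL is an assumption; point it at your own Prometheus service, and install the requests package):
import requests

# Query the 5-minute request rate for all LlamaStack instances.
resp = requests.get(
    "http://prometheus.monitoring.svc.cluster.local:9090/api/v1/query",
    params={"query": "rate(llamastack_requests_total[5m])"},
)
resp.raise_for_status()

# Each result carries the metric's labels and its current value.
for result in resp.json()["data"]["result"]:
    print(result["metric"].get("instance"), result["value"][1])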
Resource Metrics¶
Track resource usage:
# CPU and Memory
container_cpu_usage_seconds_total
container_memory_usage_bytes
container_memory_working_set_bytes
# Network
container_network_receive_bytes_total
container_network_transmit_bytes_total
# Storage
kubelet_volume_stats_used_bytes
kubelet_volume_stats_capacity_bytes
Logging¶
Centralized Logging¶
Set up log aggregation with Fluentd:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*llamastack*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      format json
    </source>

    <match kubernetes.**>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      index_name llamastack-logs
    </match>
Log Levels¶
Configure appropriate log levels:
spec:
  env:
    - name: LOG_LEVEL
      value: "info"   # debug, info, warn, error
    - name: LOG_FORMAT
      value: "json"   # json, text
Dashboards¶
Grafana Dashboard¶
Create a Grafana dashboard with panels for request rate, latency, and resource usage:
{
  "dashboard": {
    "title": "LlamaStack Monitoring",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(llamastack_requests_total[5m])",
            "legendFormat": "{{instance}}"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(llamastack_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          }
        ]
      },
      {
        "title": "Resource Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(container_cpu_usage_seconds_total[5m])",
            "legendFormat": "CPU"
          },
          {
            "expr": "container_memory_usage_bytes",
            "legendFormat": "Memory"
          }
        ]
      }
    ]
  }
}
Alerting¶
Prometheus Alerts¶
Define critical alerts:
groups:
  - name: llamastack.rules
    rules:
      - alert: LlamaStackDown
        expr: up{job="llamastack"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "LlamaStack instance is down"
          description: "LlamaStack instance {{ $labels.instance }} has been down for more than 1 minute."

      - alert: HighErrorRate
        expr: rate(llamastack_requests_total{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors per second."

      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(llamastack_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
          description: "95th percentile latency is {{ $value }} seconds."

      - alert: HighMemoryUsage
        expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Memory usage is above 90%."
AlertManager Configuration¶
Configure alert routing:
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@example.com'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    email_configs:
      - to: 'admin@example.com'
        subject: 'LlamaStack Alert: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
Health Checks¶
Liveness Probe¶
Configure a liveness probe so Kubernetes restarts the container if it stops responding:
spec:
  containers:
    - name: llamastack
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
Readiness Probe¶
Configure a readiness probe so traffic is only routed to pods that are ready to serve:
spec:
  containers:
    - name: llamastack
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 3
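These probes assume the container answers /health and /ready on port 8080. A minimal sketch of such a handler (illustrative only; the model_loaded flag is an assumption about how the application tracks readiness):
from http.server import BaseHTTPRequestHandler, HTTPServer

model_loaded = False  # flip to True once model loading completes

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Liveness: the process is up and able to answer HTTP.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        elif self.path == "/ready":
            # Readiness: report 503 until the service can serve traffic.
            self.send_response(200 if model_loaded else 503)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()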
Performance Monitoring¶
Custom Metrics¶
Expose custom application metrics with the Prometheus Python client (prometheus_client):
# Example Python code for custom metrics
from prometheus_client import Counter, Histogram, Gauge
REQUEST_COUNT = Counter('llamastack_requests_total', 'Total requests', ['method', 'endpoint'])
REQUEST_LATENCY = Histogram('llamastack_request_duration_seconds', 'Request latency')
ACTIVE_CONNECTIONS = Gauge('llamastack_active_connections', 'Active connections')
# In your application code
REQUEST_COUNT.labels(method='POST', endpoint='/inference').inc()
REQUEST_LATENCY.observe(response_time)
ACTIVE_CONNECTIONS.set(current_connections)
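To make these metrics scrapeable, the process also has to serve them; with prometheus_client that is a single call. The port below (8000) and the run_inference helper are assumptions, so match the port to whatever your ServiceMonitor's metrics endpoint targets:
from prometheus_client import start_http_server

# Expose /metrics for Prometheus to scrape.
start_http_server(8000)

# Histograms can also time a block of code directly:
with REQUEST_LATENCY.time():
    result = run_inference(request)  # hypothetical inference call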
Distributed Tracing¶
Set up distributed tracing with Jaeger:
apiVersion: v1
kind: ConfigMap
metadata:
  name: jaeger-config
data:
  config.yaml: |
    jaeger:
      endpoint: "http://jaeger-collector:14268/api/traces"
      service_name: "llamastack"
      sampler:
        type: "probabilistic"
        param: 0.1
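On the application side, one way to send spans to that collector is the OpenTelemetry Python SDK. This is a sketch under the assumption that you instrument the service yourself (requires the opentelemetry-sdk and opentelemetry-exporter-jaeger packages); the endpoint and 0.1 sampling ratio mirror the config above:
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased
from opentelemetry.exporter.jaeger.thrift import JaegerExporter

provider = TracerProvider(
    resource=Resource.create({"service.name": "llamastack"}),
    sampler=TraceIdRatioBased(0.1),  # probabilistic sampling, matching param: 0.1
)
provider.add_span_processor(
    BatchSpanProcessor(
        JaegerExporter(collector_endpoint="http://jaeger-collector:14268/api/traces")
    )
)
trace.set_tracer_provider(provider)

# Wrap request handling in a span.
tracer = trace.get_tracer("llamastack")
with tracer.start_as_current_span("inference"):
    ...  # handle the request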
Monitoring Best Practices¶
Resource Monitoring¶
Monitor these key resources:
# CPU usage
kubectl top pods -l app=llamastack
# Memory usage
kubectl top pods -l app=llamastack --containers
# Storage usage
kubectl exec -it <pod> -- df -h
# Network usage
kubectl exec -it <pod> -- netstat -i
Log Analysis¶
Analyze logs for issues:
# Check error logs
kubectl logs -l app=llamastack | grep ERROR
# Check recent logs
kubectl logs -l app=llamastack --since=1h
# Follow logs in real-time
kubectl logs -f -l app=llamastack
Troubleshooting Monitoring¶
Common Issues¶
Metrics Not Appearing:
# Check ServiceMonitor
kubectl get servicemonitor
# Check Prometheus targets
kubectl port-forward svc/prometheus 9090:9090
# Visit http://localhost:9090/targets
High Resource Usage:
# Check resource limits
kubectl describe pod <pod-name>
# Check node resources
kubectl describe node <node-name>
Alert Fatigue:
# Review alert thresholds
kubectl get prometheusrule
# Check alert history
kubectl logs -l app=alertmanager