ORCA Quick Start Guide¶
This guide will help you get ORCA up and running in your Kubernetes cluster.
Prerequisites¶
- Kubernetes cluster (1.28+)
- AWS account with appropriate permissions
- kubectl configured to access your cluster
- Go 1.21+ (if building from source)
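You can quickly confirm these prerequisites from your workstation; the commands below are standard tooling, not ORCA-specific:
# Cluster access and Kubernetes version
kubectl version
kubectl get nodes
# AWS credentials resolve to the expected account
aws sts get-caller-identity
# Go toolchain (only needed when building from source)
go version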
Installation Steps¶
1. Build ORCA¶
# Clone the repository
git clone https://github.com/scttfrdmn/orca.git
cd orca
# Build the binary
go build -o orca ./cmd/orca
# Or build with version info
VERSION=$(git describe --tags --always --dirty)
GIT_COMMIT=$(git rev-parse HEAD)
BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ)
go build \
  -ldflags="-X main.version=${VERSION} -X main.gitCommit=${GIT_COMMIT} -X main.buildDate=${BUILD_DATE}" \
  -o orca \
  ./cmd/orca
2. Configure AWS Credentials¶
ORCA needs AWS credentials to create EC2 instances. There are three options (examples for Options A and B are shown below):
- Option A: AWS Profile (development)
- Option B: Environment Variables
- Option C: IRSA (production, recommended): create an IAM role and service account (see deploy/README.md)
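A minimal sketch of Options A and B, using the standard AWS SDK environment variables (the profile name and key values are placeholders):
# Option A: use a named profile from ~/.aws/credentials
export AWS_PROFILE=my-orca-profile
export AWS_REGION=us-west-2
# Option B: set static credentials directly (fine for quick tests, avoid for long-lived deployments)
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-west-2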
3. Create Configuration File¶
Create config.yaml:
aws:
  region: us-west-2
  vpcID: vpc-xxxxx
  subnetID: subnet-xxxxx
  securityGroupIDs:
    - sg-xxxxx
  tags:
    Environment: production
    Project: orca
node:
  name: orca-aws-node
  labels:
    orca.research/provider: "aws"
    orca.research/region: "us-west-2"
  taints:
    - key: orca.research/burst-node
      value: "true"
      effect: NoSchedule
  cpu: "1000"
  memory: "4Ti"
  pods: "1000"
  gpu: "100"
instances:
  selectionMode: explicit
  defaultLaunchType: on-demand
logging:
  level: info
  format: json
metrics:
  enabled: true
  port: 8080
  path: /metrics
4. Run ORCA Locally (Testing)¶
# Start ORCA
./orca \
  --config config.yaml \
  --kubeconfig ~/.kube/config \
  --namespace kube-system \
  --log-level debug
# You should see:
# {"level":"info","time":"...","message":"Starting ORCA","version":"..."}
# {"level":"info","message":"Starting HTTP server","port":8080}
# {"level":"info","message":"Starting ORCA Virtual Kubelet node"}
# {"level":"info","message":"ORCA is running. Press Ctrl+C to stop.","http_port":8080}
5. Verify Node Registration¶
In another terminal:
# Check that orca-aws-node appears
kubectl get nodes
# Should show:
# NAME STATUS ROLES AGE VERSION
# orca-aws-node Ready <none> 10s v1.0.0-orca
# ...
# Check node details
kubectl describe node orca-aws-node
6. Deploy a Test Pod¶
Create test-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: test-burst
  annotations:
    orca.research/instance-type: "t3.small"
spec:
  nodeSelector:
    orca.research/provider: "aws"
  tolerations:
    - key: orca.research/burst-node
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: test
      image: nginx:latest
      ports:
        - containerPort: 80
Deploy and watch:
# Deploy pod
kubectl apply -f test-pod.yaml
# Watch pod status
kubectl get pods -w
# You should see:
# test-burst 0/1 Pending 0 1s
# test-burst 0/1 Pending 0 5s
# test-burst 0/1 Running 0 65s # After EC2 instance starts
7. Verify EC2 Instance Created¶
# List ORCA-managed instances
aws ec2 describe-instances \
  --filters "Name=tag:ManagedBy,Values=ORCA" \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name,Tags[?Key==`Name`].Value|[0]]' \
  --output table
# Should show:
# | i-0123456789abcdef | t3.small | running | orca-default-test-burst |
8. Test Health Checks¶
# Liveness check
curl http://localhost:8080/healthz
# {"status":"ok","service":"orca"}
# Readiness check
curl http://localhost:8080/readyz
# {"status":"ready","service":"orca"}
# Prometheus metrics
curl http://localhost:8080/metrics
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
# go_goroutines 42
# ...
9. Clean Up¶
# Delete the test pod
kubectl delete pod test-burst
# The EC2 instance will be automatically terminated
# Stop ORCA
# Press Ctrl+C in the ORCA terminal
# Verify instance terminated
aws ec2 describe-instances \
  --filters "Name=tag:ManagedBy,Values=ORCA" \
  --query 'Reservations[*].Instances[*].State.Name'
Production Deployment¶
To run ORCA in production as a Kubernetes Deployment:
1. Create Namespace¶
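The manifests in the following steps use the orca-system namespace:
kubectl create namespace orca-system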
2. Create RBAC Resources¶
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orca
  namespace: orca-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: orca-role
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/status"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "pods/status"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "create", "update", "patch", "delete", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: orca-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: orca-role
subjects:
  - kind: ServiceAccount
    name: orca
    namespace: orca-system
3. Create ConfigMap¶
apiVersion: v1
kind: ConfigMap
metadata:
  name: orca-config
  namespace: orca-system
data:
  config.yaml: |
    aws:
      region: us-west-2
      vpcID: vpc-xxxxx
      subnetID: subnet-xxxxx
      securityGroupIDs:
        - sg-xxxxx
      tags:
        Environment: production
        Project: orca
    node:
      name: orca-aws-node
      # ... rest of config
4. Create Deployment¶
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orca
  namespace: orca-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: orca
  template:
    metadata:
      labels:
        app: orca
    spec:
      serviceAccountName: orca
      containers:
        - name: orca
          image: orca:latest  # Build and push your image
          command:
            - /orca
          args:
            - --config=/config/config.yaml
            - --namespace=orca-system
          volumeMounts:
            - name: config
              mountPath: /config
          ports:
            - name: http
              containerPort: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
      volumes:
        - name: config
          configMap:
            name: orca-config
5. Deploy¶
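A sketch of the final step, assuming the manifests above were saved as rbac.yaml, configmap.yaml, and deployment.yaml (the file names are illustrative):
# Apply the RBAC, ConfigMap, and Deployment manifests
kubectl apply -f rbac.yaml
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
# Confirm ORCA is running and the virtual node registers
kubectl get pods -n orca-system
kubectl get nodes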
Troubleshooting¶
Node Not Appearing¶
# Check ORCA logs
kubectl logs -n orca-system deployment/orca
# Check RBAC permissions
kubectl auth can-i create nodes --as=system:serviceaccount:orca-system:orca
Pods Stuck in Pending¶
# Check pod events
kubectl describe pod <pod-name>
# Check ORCA logs for instance creation errors
kubectl logs -n orca-system deployment/orca | grep "CreateInstance"
# Verify AWS credentials
kubectl exec -n orca-system deployment/orca -- env | grep AWS
EC2 Instances Not Terminating¶
# Check ORCA logs
kubectl logs -n orca-system deployment/orca | grep "DeletePod"
# Manually check instances
aws ec2 describe-instances --filters "Name=tag:ManagedBy,Values=ORCA"
# Manually terminate if needed
aws ec2 terminate-instances --instance-ids i-xxxxx
Next Steps¶
- GPU Training Example
- Spot Instance Example
- Architecture Documentation
- Virtual Kubelet Integration
- Instance Selection Guide
Getting Help¶
- GitHub Issues: https://github.com/scttfrdmn/orca/issues
- Documentation: https://github.com/scttfrdmn/orca/tree/main/docs