Scalability
Orion handles high-throughput workloads with token-bucket rate limiting, semaphore-based backpressure, async processing queues, and stateless horizontal scaling, all configurable per channel.
Rate Limiting
Rate limiting operates at two levels: platform-wide (all requests) and per-channel (individual service endpoints).
Platform-level: enable in config:
[rate_limit]
enabled = true
default_rps = 100
default_burst = 50
[rate_limit.endpoints]
admin_rps = 50
data_rps = 200
Per-channel: configure in the channel’s config_json:
{
"rate_limit": {
"requests_per_second": 100,
"burst": 50
}
}
Rate limiting uses the token bucket algorithm: tokens replenish at the configured rate, and burst allows short spikes above the steady-state limit. When the bucket is empty, requests receive 429 Too Many Requests.
Per-client keying: use JSONLogic to compute rate limit keys from request data, enabling per-user or per-tenant limits:
{
"rate_limit": {
"requests_per_second": 10,
"burst": 5,
"key_logic": { "var": "headers.x-api-key" }
}
}
Rate limiter state is per-instance (in-memory). In multi-instance deployments, divide the configured RPS by the number of instances to approximate global limits, or use sticky sessions at the load balancer.
Backpressure
Semaphore-based concurrency limits prevent any single channel from overwhelming the system:
{
"backpressure": {
"max_concurrent": 200
}
}
When all semaphore permits are taken, additional requests receive 503 Service Unavailable immediately. This is load shedding. The system sheds excess load rather than queuing unboundedly, which protects latency for requests that are admitted.
Each channel has its own independent backpressure semaphore, so a spike in one channel doesn’t affect others.
Async Processing
For workloads that don’t need immediate responses, Orion supports async processing via a bounded trace queue:
# Submit for async processing (returns immediately with a trace ID)
curl -s -X POST http://localhost:8080/api/v1/data/orders/async \
-H "Content-Type: application/json" \
-d '{ "data": { "order_id": "ORD-123" } }'
# Poll for the result
curl -s http://localhost:8080/api/v1/data/traces/{trace-id}
The queue is backed by tokio::sync::mpsc channels with configurable concurrency:
[queue]
workers = 4 # Concurrent trace workers
buffer_size = 1000 # Channel buffer for pending traces
processing_timeout_ms = 60000 # Per-trace processing timeout
max_result_size_bytes = 1048576 # Max size of trace result (1 MB)
max_queue_memory_bytes = 104857600 # Max memory for queued traces (100 MB)
Failed traces go to the dead letter queue with automatic retry:
[queue]
dlq_retry_enabled = true
dlq_max_retries = 5
dlq_poll_interval_secs = 30
Completed traces are cleaned up automatically based on retention policy:
[queue]
trace_retention_hours = 72
trace_cleanup_interval_secs = 3600
Horizontal Scaling
Orion is designed for single-instance simplicity with multi-instance capability. Each instance is stateless; all persistent data lives in the shared database.
What works across instances:
| Component | How It Works |
|---|---|
| Database | All instances share the same database (PostgreSQL or MySQL recommended) |
| Kafka consumers | Consumer groups handle partition assignment automatically |
| Traces | Stored in the shared database; queries return consistent results |
| Workflows & Channels | Definitions live in the database; all instances load the same set |
| Audit logs | Stored in the shared database regardless of which instance handles the request |
Per-Instance State
The following components use in-memory state that is local to each instance:
| Component | Impact | Workaround |
|---|---|---|
| Rate Limiting | 3 instances at 100 RPS = 300 RPS effective global limit | Sticky sessions; divide configured RPS by instance count |
| Request Deduplication | Same idempotency key on two instances → processed twice | Sticky sessions, or Redis-backed dedup store |
| Response Caching | Lower cache hit rates (each instance has a cold cache) | Sticky sessions, or Redis-backed cache connector |
| Circuit Breakers | One instance may trip while others keep sending | Acceptable; monitor /health on each instance |
| Engine State | POST /admin/engine/reload only reloads the receiving instance | Script reload to hit all instances (see below) |
Reload all instances:
for host in $INSTANCE_HOSTS; do
curl -X POST "http://$host:8080/api/v1/admin/engine/reload" \
-H "Authorization: Bearer $API_KEY"
done
Alternatively, use a rolling restart strategy with your orchestrator (e.g., Kubernetes rolling deployment).
Topology Control
Use channel include/exclude filters to run different Orion instances for different channel groups:
# Instance A: order processing
[channels]
include = ["orders.*", "payments.*"]
# Instance B: analytics and reporting
[channels]
include = ["analytics.*", "reports.*"]
This enables microservice-style deployment where each instance handles a subset of channels, all sharing the same database.
Database Backend Recommendations
| Backend | Single Instance | Multiple Instances | Notes |
|---|---|---|---|
| SQLite | Recommended | Not recommended | WAL mode supports concurrent reads but only one writer. File-based, cannot be shared across hosts. |
| PostgreSQL | Supported | Recommended | Full multi-connection support. Use connection pooling (PgBouncer) for many instances. |
| MySQL | Supported | Supported | Ensure READ-COMMITTED isolation for best concurrency. |
For multi-instance deployments, use PostgreSQL with connection pooling (PgBouncer). Script engine reloads to broadcast to all instances after workflow or channel changes.