Backend Mode Selection Guide
🎯 Overview
This guide helps you choose the right backend mode for your use case. We offer three modes optimized for different deployment scenarios:
- Portable Mode (PyArrow) - Client-side compute, serverless-friendly
- Standalone Mode (PyO3) - Native Rust performance, zero network
- Distributed Mode (gRPC) - Multi-client memory sharing, streaming
📊 Quick Comparison Matrix
| Feature | Portable | Standalone | Distributed |
|---|---|---|---|
| Performance | ⭐⭐ (30-50% of Polars) | ⭐⭐⭐⭐⭐ (100% Polars) | ⭐⭐⭐⭐ (95-98% Polars) |
| Memory Usage | ⭐⭐⭐ (per-client duplication) | ⭐⭐⭐ (per-client duplication) | ⭐⭐⭐⭐⭐ (shared across clients) |
| Setup Complexity | ⭐⭐⭐⭐⭐ (pip install) | ⭐⭐⭐⭐ (pip install + PyO3) | ⭐⭐ (requires server) |
| Network Latency | ⭐⭐⭐⭐⭐ (zero - local) | ⭐⭐⭐⭐⭐ (zero - embedded) | ⭐⭐⭐ (1-10ms gRPC) |
| Serverless Support | ✅ Yes (WASM, Lambda) | ⚠️ Limited (cold start) | ❌ No (needs persistent server) |
| Multi-Client | ❌ No (each loads data) | ❌ No (each loads data) | ✅ Yes (shared memory) |
| Best For | Demos, WASM, cost savings | Production, single-client | Multi-tenant, streaming |
🚀 Mode 1: Portable (PyArrow)
What Is It?
Pure Python implementation using PyArrow for columnar operations. No Rust compilation required.
When to Use
- ✅ Demos and Trials: Share with users without Rust toolchain
- ✅ WASM Deployment: Run in browser via Pyodide/PyScript
- ✅ Serverless Functions: AWS Lambda, Azure Functions (no cold start penalty)
- ✅ Cost Optimization: Move compute to client-side (80-95% cost reduction)
- ✅ Cross-Platform: Works everywhere Python runs
When NOT to Use
- ❌ Performance-critical applications (30-50% slower than Standalone)
- ❌ Large-scale production workloads
- ❌ Real-time / low-latency requirements
Example
import os
os.environ["BACKEND_MODE"] = "portable"
from python.backends import get_router
router = get_router()
# Uses PyArrow engine (pure Python)
df = router.load_parquet("data.parquet")
result = df.select(["price", "volume"]).sum()
print(f"✅ Portable mode (PyArrow): {result}")
Performance Characteristics
Dataset Size: 1GB Parquet
Operation: select + filter + groupby
Portable (PyArrow): 3.2s
Standalone (PyO3): 1.1s ← 2.9x faster
Distributed (gRPC): 1.3s (1.1s + 0.2s network)
Verdict: Portable is slower BUT enables serverless deployment
Cost Analysis
Scenario: 1000 requests/day, 1GB dataset
Standalone (PyO3) in Container:
- Azure Container Instance: $50/month
- 24/7 running: $50/month
- Total: $50/month
Portable (PyArrow) in Azure Functions:
- Execution time: 3.2s × 1000 = 3200s/day
- Compute cost: 3200s × $0.000016/s = $1.54/month
- Total: $1.54/month ← 97% cost savings ✅
Verdict: 30% slower BUT 97% cheaper
🔥 Mode 2: Standalone (PyO3)
What Is It?
Native Rust PyO3 bindings to Polars. Zero network overhead, same performance as Polars.
When to Use
- ✅ Production Applications: Best performance for single-process workloads
- ✅ HFT / Low-Latency: Sub-millisecond query times required
- ✅ Single-User Tools: Personal research, backtesting, analysis
- ✅ Embedded Applications: Desktop tools, CLI utilities
- ✅ < 10 Concurrent Users: Memory duplication is acceptable
When NOT to Use
- ❌ 10+ concurrent users (memory duplication becomes expensive)
- ❌ Serverless deployments (requires compiled PyO3 module)
- ❌ Cross-platform distribution without Rust toolchain
Example
import os
os.environ["BACKEND_MODE"] = "standalone"
from python.backends import get_router
router = get_router()
# Uses PyO3 Polars bindings (native Rust)
df = router.load_parquet("data.parquet")
result = df.select(["price", "volume"]).sum()
print(f"✅ Standalone mode (PyO3): {result}")
Performance Characteristics
Dataset Size: 1GB Parquet
Operation: select + filter + groupby
Standalone (PyO3): 1.1s ← FASTEST ✅
Distributed (gRPC): 1.3s (1.1s compute + 0.2s network)
Portable (PyArrow): 3.2s
Verdict: Standalone is fastest for single-client workloads
Memory Analysis
Scenario: 10GB dataset, 5 concurrent users
Standalone (PyO3):
- Each user loads data: 10GB × 5 = 50GB RAM
- Cost: $200/month (large instance)
Distributed (gRPC):
- Server loads once: 10GB RAM
- Clients hold handles: 5MB × 5 = 25MB
- Cost: $50/month (small instance) ← 75% savings ✅
Verdict: Standalone is cheaper for < 10 users
🌐 Mode 3: Distributed (gRPC)
What Is It?
Client-server architecture with gRPC streaming. DataFrames live on server, clients hold handles.
When to Use
- ✅ Multi-Tenant Platforms: 10+ concurrent users querying same data
- ✅ Memory Efficiency: Datasets too large to duplicate per-client
- ✅ Streaming Pipelines: Real-time data processing (WebSocket, Kafka)
- ✅ Time-Series Operations: Rolling windows, OHLCV resampling
- ✅ Language-Agnostic: Clients in Python, Rust, Go, TypeScript
When NOT to Use
- ❌ Single-user applications (unnecessary complexity)
- ❌ < 10 concurrent users (operational overhead not justified)
- ❌ Serverless deployments (requires persistent server)
- ❌ Ultra-low-latency (<1ms) requirements
Example
import os
os.environ["BACKEND_MODE"] = "distributed"
os.environ["GRPC_SERVER"] = "localhost:50051"
from python.backends import get_router
router = get_router()
# Uses gRPC client (handle-based)
df = router.load_parquet("data.parquet") # Returns handle, not data
result = df.select(["price", "volume"]).sum().collect() # Collects over gRPC
print(f"✅ Distributed mode (gRPC): {result}")
Performance Characteristics
Dataset Size: 1GB Parquet
Operation: select + filter + groupby
Single Client:
Standalone: 1.1s ← Faster (no network)
Distributed: 1.3s (1.1s + 0.2s network)
10 Concurrent Clients:
Standalone: 11s (sequential due to memory contention)
Distributed: 1.5s (parallel, shared memory) ← 7.3x faster ✅
Verdict: Distributed wins at scale (10+ users)
Cost at Scale
Scenario: 10GB dataset, 50 concurrent users
Standalone (PyO3):
- Each user: 10GB RAM
- Total: 500GB RAM required
- Cost: $2000/month (massive instance) ❌
Distributed (gRPC):
- Server: 10GB RAM (loaded once)
- Clients: 50 × 5MB = 250MB handles
- Total: 10.25GB RAM required
- Cost: $100/month ← 95% savings ✅
Verdict: Distributed is only viable option at scale
🎯 Decision Tree
graph TD
A[Start] --> B{Serverless deployment?}
B -->|Yes| C[Use Portable Mode ✅]
B -->|No| D{How many users?}
D -->|1 user| E[Use Standalone Mode ✅]
D -->|2-9 users| F{Dataset fits in RAM?}
D -->|10+ users| G[Use Distributed Mode ✅]
F -->|Yes| E
F -->|No| G
📋 Step-by-Step Selection
Step 1: Deployment Environment
| Environment | Recommended Mode |
|---|---|
| AWS Lambda, Azure Functions | Portable (PyArrow) |
| Docker Container (single instance) | Standalone (PyO3) |
| Kubernetes (multi-pod) | Distributed (gRPC) |
| Desktop Application | Standalone (PyO3) |
| Browser (WASM) | Portable (PyArrow) |
Step 2: Number of Concurrent Users
| Users | Recommended Mode |
|---|---|
| 1 | Standalone (PyO3) |
| 2-9 | Standalone (PyO3) |
| 10-99 | Distributed (gRPC) |
| 100+ | Distributed (gRPC) |
Step 3: Dataset Size vs Available RAM
| Dataset | RAM | Recommended Mode |
|---|---|---|
| < 1GB | Any | Standalone (PyO3) |
| 1-10GB | < Dataset | Distributed (streaming) |
| 1-10GB | > Dataset | Standalone (PyO3) |
| > 10GB | Any | Distributed (streaming) |
Step 4: Performance Requirements
| Latency Target | Recommended Mode |
|---|---|
| < 1ms | Standalone (PyO3) |
| 1-10ms | Standalone or Distributed |
| > 10ms | Portable or Distributed |
🔧 Configuration Examples
Auto-Detection (Recommended)
from python.backends import get_router
# Router auto-detects best mode based on:
# 1. Environment variables (BACKEND_MODE)
# 2. Available imports (polarway, polars, grpc)
# 3. Data size (small → portable, large → distributed)
router = get_router()
df = router.load_parquet("data.parquet")
Explicit Mode Selection
import os
# Force Portable mode
os.environ["BACKEND_MODE"] = "portable"
# Force Standalone mode
os.environ["BACKEND_MODE"] = "standalone"
# Force Distributed mode
os.environ["BACKEND_MODE"] = "distributed"
os.environ["GRPC_SERVER"] = "polarway-server:50051"
from python.backends import get_router
router = get_router()
Fallback Strategy
from python.backends import BackendRouter
router = BackendRouter(
prefer_mode="standalone", # Try Standalone first
fallback_chain=["distributed", "portable"], # Fallback order
data_size_threshold_mb=1000 # Switch to distributed if > 1GB
)
# Example flow:
# 1. Try Standalone (PyO3)
# 2. If not available, try Distributed (gRPC)
# 3. If not available, fall back to Portable (PyArrow)
📊 Real-World Use Cases
Use Case 1: Personal Research (Standalone)
Scenario: PhD student analyzing 2GB of historical stock data on laptop.
# Standalone Mode
os.environ["BACKEND_MODE"] = "standalone"
router = get_router()
df = router.load_parquet("stocks_2020_2024.parquet")
# Fast local analysis
signals = df.filter(pl.col("volume") > 1e6).group_by("symbol").agg(...)
Why: Single user, data fits in RAM, needs fast iteration.
Use Case 2: Company Analytics Dashboard (Distributed)
Scenario: 50 employees querying 50GB customer database simultaneously.
# Distributed Mode
os.environ["BACKEND_MODE"] = "distributed"
os.environ["GRPC_SERVER"] = "analytics-server:50051"
router = get_router()
df = router.load_parquet("customers/*.parquet")
# Shared memory across all users
revenue_by_region = df.group_by("region").agg(pl.col("revenue").sum())
Why: Multi-user, large dataset, memory sharing critical.
Use Case 3: Serverless Data API (Portable)
Scenario: Azure Function that processes user-uploaded CSVs.
# Portable Mode (Azure Functions)
@azure_function
def process_csv(request):
os.environ["BACKEND_MODE"] = "portable"
router = get_router()
df = router.load_csv(request.file)
# Client-side compute (no server needed)
stats = df.describe()
return stats.to_json()
Why: Serverless, cost-sensitive, no persistent server.
🎓 Migration Guide
From Standalone to Distributed
# Before (Standalone)
import polars as pl
df = pl.read_parquet("data.parquet")
result = df.select(["price"]).sum()
# After (Distributed)
import os
os.environ["BACKEND_MODE"] = "distributed"
os.environ["GRPC_SERVER"] = "localhost:50051"
from python.backends import get_router
router = get_router()
df = router.load_parquet("data.parquet") # Returns handle
result = df.select(["price"]).sum().collect() # Add .collect()
Key Changes:
1. Set environment variables
2. Use get_router() instead of direct polars import
3. Add .collect() to materialize results
From Distributed to Portable
# Before (Distributed)
os.environ["BACKEND_MODE"] = "distributed"
router = get_router()
df = router.load_parquet("data.parquet").collect()
# After (Portable)
os.environ["BACKEND_MODE"] = "portable"
router = get_router()
df = router.load_parquet("data.parquet") # No .collect() needed
Key Changes:
1. Change BACKEND_MODE to "portable"
2. Remove .collect() calls (portable returns data directly)
3. Expect 2-3x slower performance
📈 Performance Benchmarks
Benchmark Setup
- Dataset: 1GB Parquet (10M rows)
- Operation:
select + filter + groupby + agg - Hardware: 8-core CPU, 16GB RAM
Results
| Mode | Single User | 10 Users | 100 Users |
|---|---|---|---|
| Portable | 3.2s | N/A (each loads) | N/A |
| Standalone | 1.1s | 11s (sequential) | OOM |
| Distributed | 1.3s | 1.5s (parallel) | 2.1s (parallel) |
Key Takeaways: - Standalone is fastest for single user - Distributed is only viable option for 10+ users - Portable trades 2-3x speed for 97% cost savings
🎯 Summary
Choose Portable If...
- 💰 Cost savings are priority (serverless)
- 🌐 Need to run in browser (WASM)
- 📦 Simple distribution without Rust toolchain
Choose Standalone If...
- ⚡ Performance is critical (sub-millisecond queries)
- 👤 Single user or < 10 users
- 🖥️ Data fits in memory
Choose Distributed If...
- 👥 10+ concurrent users
- 💾 Memory sharing saves costs
- 🌊 Streaming / time-series workloads
Rule of Thumb: Start with Standalone, upgrade to Distributed when you hit 10+ users or run out of memory. 🚀