Skip to content

Storage Architecture

Polarway's Hybrid Storage Layer is a three-tier architecture that balances performance, cost, and scalability.

Overview

Request ──► LRU Cache (2GB RAM) ──► Cache Hit (~1ms) ──► Return
            Cache Miss
            Parquet (zstd lvl19, 18× compression) ──► Load + Warm (~50ms) ──► Return
            Complex Query
            DuckDB (SQL Analytics) ──► Query (~45ms) ──► Return

Results: -20% cost (24 CHF vs 30 CHF/month for 100GB), 18× compression, 85%+ cache hit rate.

Storage Backends

CacheBackend (LRU)

In-memory LRU cache for hot data with configurable capacity:

use polarway_grpc::storage::CacheBackend;

let cache = CacheBackend::new(2 * 1024 * 1024 * 1024); // 2GB
cache.store("trades_today", df)?;

// Sub-millisecond reads for cached data
let df = cache.load("trades_today")?; // ~0.5ms
  • Capacity: Configurable (default 2GB)
  • Eviction: Least-Recently-Used
  • Hit Rate: 85%+ for typical time-series workloads
  • Latency: < 1ms

ParquetBackend

Compressed columnar storage with predicate pushdown:

use polarway_grpc::storage::ParquetBackend;

let backend = ParquetBackend::new("/data/cold");

// Store with zstd level 19 compression (18× ratio)
backend.store("btc_ticks_20260203", df)?;

// Load with column pruning and predicate pushdown
let df = backend.load("btc_ticks_20260203")?; // ~50ms
  • Compression: zstd level 19 (18:1 ratio)
  • Format: Apache Parquet (columnar)
  • Features: Column pruning, predicate pushdown, row group skipping
  • Latency: ~50ms for typical scans

DuckDB Backend

SQL analytics engine for complex queries on cold data:

use polarway_grpc::storage::DuckDBBackend;

let db = DuckDBBackend::new(":memory:");

let result = db.query("
    SELECT symbol, AVG(price) as avg_price, COUNT(*) as trades
    FROM read_parquet('/data/cold/*.parquet')
    WHERE timestamp > '2026-01-01'
    GROUP BY symbol
    ORDER BY trades DESC
")?;
  • Engine: DuckDB (embedded OLAP)
  • Features: Full SQL, window functions, CTEs, joins
  • Latency: ~45ms for analytical queries

HybridStorage Router

The HybridStorage router automatically selects the optimal backend:

from polarway import StorageClient

client = StorageClient(
    parquet_path="/data/cold",
    enable_cache=True,
    cache_size_gb=2.0
)

# Automatic routing: Cache → Parquet → DuckDB
df = client.load("trades_20260203")       # Cache hit: ~1ms
df = client.load("trades_20250101")       # Cache miss → Parquet: ~50ms

# SQL queries go directly to DuckDB
result = client.query("""
    SELECT symbol, AVG(price) as avg_price
    FROM read_parquet('/data/cold/*.parquet')
    WHERE timestamp > '2026-01-01'
    GROUP BY symbol
""")

Cost Comparison

Metric Polarway QuestDB TimescaleDB
Storage (100GB/mo) 24 CHF 30 CHF 45 CHF
Compression 18:1 1.07:1 3:1
Cache Layer Built-in LRU Limited None
SQL Analytics Full (DuckDB) InfluxQL PostgreSQL

Performance

Operation Latency Throughput
Cache hit < 1ms ~10M rows/s
Parquet scan ~50ms ~8M rows/s
DuckDB query ~45ms ~5M rows/s
Store (zstd) ~100ms ~4M rows/s

Next Steps