When NOT to Use Polarway

🎯 Purpose of This Document

Polarway is a powerful tool for specific use cases, but it's not always the right choice. This guide helps you make informed decisions about when to use Polarway vs alternatives like Polars, Pandas, DuckDB, or Spark.

❌ Don't Use Polarway When...

1. Single Client, In-Memory Workloads

Scenario: Running notebooks or scripts on your local machine with datasets that fit in RAM.

Why Not Polarway:
  • Network overhead: gRPC adds 1-10ms latency per operation
  • No benefit: A single client doesn't need shared memory
  • Complexity: Client-server architecture is overkill

Use Instead:
  • ✅ Polars - Same engine, zero network overhead, simpler setup
  • ✅ Pandas - More familiar API for exploratory analysis

Example:

# ❌ Don't do this (unnecessary overhead)
import polarway as pd
df = pd.read_parquet("local_file.parquet").collect()  # Network round-trip for no benefit

# ✅ Do this instead
import polars as pl
df = pl.read_parquet("local_file.parquet")  # Direct, no network overhead

2. Datasets Smaller Than 1GB

Scenario: Working with small to medium datasets that load into memory instantly.

Why Not Polarway:
  • Overhead exceeds benefit: Network serialization takes longer than computation
  • Simpler alternatives: Pandas/Polars are more straightforward
  • No streaming needed: Entire dataset fits in RAM

Use Instead:
  • ✅ Polars - Blazing fast for in-memory analytics
  • ✅ Pandas - Familiar API, good enough for small data
  • ✅ SQLite/DuckDB - Great for SQL-style queries on small data

Benchmark:

# 100MB dataset benchmark
# Polars:   0.8s (load + query)
# Polarway: 1.2s (network + load + query)
# Winner: Polars ✅
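
To run a comparison like this on your own data, a minimal timing harness might look like the following. It uses only the standard library. `best_of` and the two workload functions are hypothetical names, and the placeholder workloads would be replaced with your actual Polars and Polarway calls.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time (seconds) over `repeats` runs of fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Placeholder workloads (assumptions): swap in your real load + query code,
# e.g. a Polars read_parquet call versus the equivalent Polarway request.
def local_workload() -> int:
    return sum(range(100_000))


def remote_workload() -> int:
    time.sleep(0.005)  # stands in for a ~5 ms network round-trip
    return sum(range(100_000))


if __name__ == "__main__":
    print(f"local:  {best_of(local_workload):.4f}s")
    print(f"remote: {best_of(remote_workload):.4f}s")
```

Taking the minimum over several runs reduces noise from caching and background load, which matters when the difference being measured is a few hundred milliseconds.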

3. Exploratory Data Analysis (EDA)

Scenario: Jupyter notebooks with ad-hoc queries, visualizations, and iterative exploration.

Why Not Polarway:
  • Interactive overhead: Every operation requires a server round-trip
  • Debugging harder: Errors happen on the server, not locally
  • No notebook magic: Can't use df.head() interactively

Use Instead:
  • ✅ Pandas - Best for exploration, immediate results
  • ✅ Polars - Fast exploration with lazy API

Example:

# ❌ Polarway in notebooks (slow iteration)
df = polarway_client.read_parquet("data.parquet")
df.select("price").collect()  # Wait for network
df.filter(pl.col("price") > 100).collect()  # Wait again (assumes Polars-style expressions)
df.group_by("symbol").collect()  # And again...

# ✅ Polars in notebooks (instant feedback)
df = pl.read_parquet("data.parquet")
df.select("price")  # Instant
df.filter(pl.col("price") > 100)  # Instant
df.group_by("symbol").agg(pl.col("price").mean())  # Instant
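
To put the round-trip cost in perspective, the sketch below multiplies the 1-10 ms per-operation latency quoted earlier by the number of interactive operations in a session. The function name and the 200-operation session are illustrative assumptions, not measurements.

```python
def session_overhead_ms(n_operations: int, rtt_ms: float) -> float:
    """Total network wait for a session where every operation costs one round-trip."""
    return n_operations * rtt_ms


# Hypothetical notebook session with 200 collect() calls.
n_ops = 200
low = session_overhead_ms(n_ops, 1.0)    # best case: 1 ms per round-trip
high = session_overhead_ms(n_ops, 10.0)  # worst case: 10 ms per round-trip
print(f"{n_ops} operations add {low:.0f}-{high:.0f} ms of pure network wait")
```

This estimate covers latency only. Serializing results back to the client scales with result size and is not included.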

4. SQL-First Workflows

Scenario: Teams that prefer SQL over DataFrame APIs.

Why Not Polarway:
  • Limited SQL support: Polarway is DataFrame-first
  • Better alternatives: DuckDB, PostgreSQL have native SQL
  • No JDBC/ODBC: Can't connect BI tools directly

Use Instead:
  • ✅ DuckDB - Embedded SQL engine, Parquet-native, very fast
  • ✅ PostgreSQL - Production-ready, ACID compliance, rich ecosystem
  • ✅ ClickHouse - Columnar database for analytics

Example:

# ❌ Polarway with SQL (limited support)
result = polarway_client.sql("SELECT * FROM df WHERE price > 100")  # Limited SQL syntax

# ✅ DuckDB with SQL (full support)
import duckdb
result = duckdb.query("SELECT * FROM 'data.parquet' WHERE price > 100")

5. Production Web Applications

Scenario: Building REST APIs or web services that need low-latency responses.

Why Not Polarway:
  • Latency: Network round-trips add 1-10ms overhead
  • Complexity: Need to manage gRPC server lifecycle
  • Overkill: Most web apps don't need distributed DataFrames

Use Instead:
  • ✅ PostgreSQL/MySQL - Proven, ACID, connection pooling
  • ✅ Redis - Sub-millisecond latency for hot data
  • ✅ DuckDB - Embedded, in-process queries with no network round-trip

Architecture:

# ❌ Polarway in web API (unnecessary complexity)
@app.get("/stats")
async def get_stats():
    df = await polarway_client.read_parquet("data.parquet")
    stats = await df.describe().collect()  # 5-15ms total latency
    return stats

# ✅ PostgreSQL in web API (simpler, proven)
@app.get("/stats")
async def get_stats():
    stats = await db.query("SELECT AVG(price), COUNT(*) FROM orders")  # 1-3ms
    return stats

6. Real-Time OLTP Workloads

Scenario: High-frequency inserts, updates, deletes (e.g., order processing, user sessions).

Why Not Polarway:
  • Read-optimized: Polarway is for analytics, not transactions
  • No ACID: Can't guarantee consistency for concurrent writes
  • Wrong tool: DataFrames aren't for transactional data

Use Instead:
  • ✅ PostgreSQL - ACID compliance, row-level locking
  • ✅ MySQL/MariaDB - Proven for OLTP workloads
  • ✅ CockroachDB/YugabyteDB - Distributed ACID databases
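
As an illustration of the transactional guarantee that DataFrame engines lack, the sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL or MySQL. The `accounts` table and the transfer amounts are made up for the example. The point is that a failed transfer rolls back as a unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` between accounts atomically; return False if it fails."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, src=1, dst=2, amount=500)  # violates the CHECK constraint
balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
print(ok, balances)
```

The credit to the destination account is applied first and then undone when the debit fails, so neither balance changes.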

7. Machine Learning Training

Scenario: Training scikit-learn, TensorFlow, or PyTorch models.

Why Not Polarway:
  • No native integration: ML libraries expect NumPy/Pandas
  • Unnecessary overhead: Training data usually fits in RAM
  • Simpler pipelines: Load once, train many times

Use Instead:
  • ✅ Polars - Convert to Pandas/NumPy for ML libraries
  • ✅ Pandas - Native integration with scikit-learn
  • ✅ Ray Datasets - Distributed ML data loading

Example:

# ❌ Polarway for ML (extra conversion step)
from sklearn.ensemble import RandomForestClassifier
features = ["f1", "f2"]  # hypothetical feature column names
model = RandomForestClassifier()
df = polarway_client.read_parquet("train.parquet").collect()
X = df.select(features).to_pandas().values  # Extra conversion
y = df.select("label").to_pandas().values.ravel()  # 1-D labels for scikit-learn
model.fit(X, y)

# ✅ Polars for ML (direct conversion)
import polars as pl
df = pl.read_parquet("train.parquet")
X = df.select(features).to_numpy()  # Direct conversion
y = df["label"].to_numpy()  # 1-D labels for scikit-learn
model.fit(X, y)

8. Fewer Than 10 Concurrent Users

Scenario: Small team or personal projects with few simultaneous users.

Why Not Polarway:
  • Benefit threshold: Need 10+ concurrent users to justify a distributed architecture
  • Operational overhead: Managing server, monitoring, deployment
  • Cost: A server instance costs money; embedded Polars (PyO3) adds none

Use Instead:
  • ✅ Polars (PyO3) - Embed directly in application, zero network
  • ✅ Embedded DuckDB - SQL interface, embedded, fast

Cost Analysis:

1-10 users:
  PyO3 Polars: $0/month (embedded)
  Polarway:    $50-100/month (server instance)

10-100 users:
  PyO3 Polars: $200/month (each instance loads data)
  Polarway:    $50-100/month (shared memory)

100+ users:
  PyO3 Polars: $2000+/month (memory duplication)
  Polarway:    $100-300/month (shared memory) ✅

9. Cloud Functions / Serverless

Scenario: AWS Lambda, Azure Functions, Google Cloud Functions with short-lived compute.

Why Not Polarway:
  • Cold starts: gRPC connection adds 100-500ms to first request
  • Complexity: Need persistent server alongside ephemeral functions
  • Wrong model: Serverless expects stateless execution

Use Instead:
  • ✅ WASM Polars - Embed compute in function, no network
  • ✅ DuckDB WASM - SQL queries in browser/function
  • ✅ S3 Select / Athena - Query Parquet directly in S3

Architecture:

# ❌ Serverless function calling Polarway (cold start penalty)
@azure_function
def process_data(request):
    client = connect_polarway()  # 200ms cold start
    df = client.read_parquet("data.parquet")  # 50ms network
    return df.sum().collect()  # 30ms compute
    # Total: 280ms (250ms is overhead)

# ✅ Serverless with embedded WASM
@azure_function
def process_data(request):
    df = polars_wasm.read_parquet("data.parquet")  # 10ms
    return df.sum()  # 30ms compute
    # Total: 40ms (no network overhead) ✅
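
The latency budget in the comments above can be checked with a few lines of arithmetic. The figures are the illustrative ones from the example (200 ms cold start, 50 ms network, 30 ms compute), not measurements, and `overhead_ms` is a hypothetical helper.

```python
def overhead_ms(components: dict[str, float], useful: str = "compute") -> tuple[float, float]:
    """Return (total latency, latency not spent on the `useful` component)."""
    total = sum(components.values())
    return total, total - components[useful]


# Illustrative figures from the example above.
remote = {"cold_start": 200.0, "network": 50.0, "compute": 30.0}
embedded = {"load": 10.0, "compute": 30.0}

remote_total, remote_overhead = overhead_ms(remote)
embedded_total, embedded_overhead = overhead_ms(embedded)
print(f"remote:   {remote_total:.0f} ms total, {remote_overhead:.0f} ms overhead")
print(f"embedded: {embedded_total:.0f} ms total, {embedded_overhead:.0f} ms overhead")
```

The cold-start cost applies only to the first request on a fresh function instance. Warm invocations would pay the network cost alone.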

10. Compliance-Heavy Industries

Scenario: Finance, healthcare, government with strict data residency/privacy laws.

Why Not Polarway:
  • Data leaves the machine: gRPC sends data over the network
  • Audit complexity: Need to track data movement between client and server
  • Compliance risk: Some regulations restrict transferring data over a network

Use Instead:
  • ✅ Embedded Polars/DuckDB - Data never leaves machine
  • ✅ On-premises PostgreSQL - Full control, air-gapped if needed

✅ When Polarway DOES Make Sense

For balance, here's when Polarway is the right tool:

1. Multi-Client Analytics Platform ✅

  • 10+ concurrent users querying the same datasets
  • Memory sharing can cut RAM costs by 10-100x
  • Example: Company-wide analytics dashboard
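
The 10-100x figure follows from simple arithmetic if each embedded client loads its own copy of the dataset while a shared server holds one. A rough sketch, assuming identical dataset copies and ignoring per-client working memory (both simplifications):

```python
def ram_gb(dataset_gb: float, clients: int, shared: bool) -> float:
    """Approximate RAM needed to hold the dataset for `clients` users."""
    return dataset_gb if shared else dataset_gb * clients


dataset_gb = 8.0  # hypothetical dataset size
for clients in (10, 100):
    embedded = ram_gb(dataset_gb, clients, shared=False)
    shared = ram_gb(dataset_gb, clients, shared=True)
    print(f"{clients} clients: {embedded:.0f} GB embedded vs {shared:.0f} GB shared "
          f"({embedded / shared:.0f}x)")
```

Under these assumptions the saving equals the number of clients, which is where a 10-100x range for 10-100 users comes from.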

2. Streaming / Time-Series Pipelines ✅

  • Processing real-time data feeds (WebSocket, Kafka)
  • Rolling window operations on unbounded streams
  • Example: Real-time trading signals
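
A rolling window over an unbounded stream can be sketched in plain Python with a fixed-size deque. This is not Polarway's API, only an illustration of the operation. `rolling_mean` and the sample prices are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(values: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the last `window` values as each new value arrives."""
    buf: deque[float] = deque(maxlen=window)
    total = 0.0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # value about to be evicted by append()
        buf.append(v)
        total += v
        yield total / len(buf)


prices = [10.0, 12.0, 14.0, 16.0]  # stands in for a live feed
print(list(rolling_mean(prices, window=2)))
```

Because the function is a generator, memory use stays bounded by the window size no matter how long the stream runs.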

3. Larger-Than-RAM Datasets ✅

  • Datasets don't fit in memory (10GB+)
  • Need to stream and process in batches
  • Example: Processing 100GB of historical data on 16GB machine
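
Processing in batches means keeping only a running aggregate in memory rather than the full dataset. The sketch below computes a mean this way over any iterable of batches. The generator that produces batches is a stand-in for reading chunks from disk or from a server, and all names are hypothetical.

```python
from typing import Iterable, Iterator


def batches(n_rows: int, batch_size: int) -> Iterator[list[float]]:
    """Stand-in data source: yields rows 0..n_rows-1 in fixed-size chunks."""
    for start in range(0, n_rows, batch_size):
        yield [float(i) for i in range(start, min(start + batch_size, n_rows))]


def streaming_mean(chunks: Iterable[list[float]]) -> float:
    """Compute the mean while holding only one batch in memory at a time."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no rows to aggregate")
    return total / count


print(streaming_mean(batches(n_rows=1_000, batch_size=64)))
```

Aggregations that can be updated incrementally (sums, counts, minima, maxima) fit this pattern directly. Operations such as exact medians or global sorts need more than one pass or extra state.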

4. Functional Programming Enthusiasts ✅

  • Want Rust's Result/Option monads in Python
  • Value type safety and composable transformations
  • Example: Safety-critical data pipelines
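
For readers unfamiliar with the pattern, a Rust-style `Result` can be approximated in Python with two small dataclasses. This is a generic sketch and not Polarway's actual types. `Ok`, `Err`, `and_then`, `parse_price`, and `require_positive` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err:
    error: str


Result = Union[Ok[T], Err]


def and_then(result: "Result[T]", fn: Callable[[T], "Result[U]"]) -> "Result[U]":
    """Apply fn to a successful value; pass errors through unchanged."""
    return fn(result.value) if isinstance(result, Ok) else result


def parse_price(raw: str) -> "Result[float]":
    try:
        return Ok(float(raw))
    except ValueError:
        return Err(f"not a number: {raw!r}")


def require_positive(price: float) -> "Result[float]":
    return Ok(price) if price > 0 else Err(f"price must be positive: {price}")


print(and_then(parse_price("12.5"), require_positive))
print(and_then(parse_price("abc"), require_positive))
```

Each step returns a value that says explicitly whether it succeeded, so a pipeline can be chained without try/except blocks at every stage.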

5. Language-Agnostic Architecture ✅

  • Need to query from Python, Rust, Go, TypeScript
  • gRPC provides consistent API across languages
  • Example: Polyglot microservices architecture

🎯 Decision Tree

  1. < 1GB of data? → YES: Use Polars or Pandas ❌ · NO: Continue
  2. Single-user / single-process? → YES: Use Polars (PyO3) ❌ · NO: Continue
  3. 10+ concurrent users? → NO: Use Polars (PyO3) ❌ · YES: Continue
  4. Need streaming or time-series? → NO: Continue · YES: Use Polarway ✅
  5. Value functional programming? → NO: Consider DuckDB or PostgreSQL · YES: Use Polarway ✅
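
The same checks can be written as a small function. This is one reading of the tree, in which steps 4 and 5 are treated as alternative reasons to choose Polarway. The function and parameter names are hypothetical.

```python
def recommend_tool(
    data_gb: float,
    single_user: bool,
    concurrent_users: int,
    needs_streaming: bool,
    values_functional_style: bool,
) -> str:
    """Walk the decision tree top to bottom and return a recommendation."""
    if data_gb < 1:
        return "Polars or Pandas"
    if single_user or concurrent_users < 10:
        return "Polars (PyO3)"
    if needs_streaming or values_functional_style:
        return "Polarway"
    return "DuckDB or PostgreSQL"


print(recommend_tool(0.5, True, 1, False, False))
print(recommend_tool(50, False, 25, True, False))
print(recommend_tool(50, False, 25, False, False))
```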

📚 Alternatives Comparison

| Use Case | Recommended Tool | Why Not Polarway? |
|---|---|---|
| EDA in notebooks | Pandas, Polars | Network overhead slows iteration |
| Small data (<1GB) | Polars, DuckDB | Network overhead > compute time |
| SQL-first teams | DuckDB, PostgreSQL | Limited SQL support |
| Single user | Polars (PyO3) | No benefit from distributed architecture |
| OLTP workloads | PostgreSQL, MySQL | Not designed for transactions |
| ML training | Polars → NumPy | Extra conversion step |
| Serverless | WASM Polars, DuckDB | Cold start penalty |
| < 10 users | Polars (PyO3) | Operational overhead not justified |

🎓 Summary

Polarway is NOT a silver bullet. It excels at:
  • Multi-client analytics (10+ users)
  • Streaming time-series data
  • Functional programming patterns
  • Language-agnostic architectures

But for most common scenarios (EDA, small data, single user), simpler tools like Polars, Pandas, or DuckDB are better choices.

Rule of thumb: Start with Polars (PyO3). Only add Polarway when you have:
  1. 10+ concurrent users, OR
  2. Streaming/real-time requirements, OR
  3. Strong preference for functional programming

Don't prematurely optimize for scale you don't have yet. 🎯