Database Partitioning vs Sharding

 


🔹 What is Database Partitioning?

Partitioning means splitting a large table into smaller pieces (partitions) to make queries faster and maintenance easier. Importantly, all partitions live inside the same database instance.

Types of partitioning:

  • Horizontal partitioning: Splits data by rows (e.g., orders by year).

  • Vertical partitioning: Splits data by columns (e.g., frequently used vs. rarely used columns).

✅ Good for performance tuning when data still fits on one server.


🔹 Partitioning in Action (PostgreSQL Example)

Imagine an orders table with millions of rows. We want to split it by year.

Step 1: Create a partitioned table

CREATE TABLE orders ( order_id BIGSERIAL PRIMARY KEY, customer_id INT NOT NULL, order_date DATE NOT NULL, amount NUMERIC(10,2) NOT NULL ) PARTITION BY RANGE (order_date);

Step 2: Create partitions

CREATE TABLE orders_2023 PARTITION OF orders FOR VALUES FROM ('2023-01-01') TO ('2024-01-01'); CREATE TABLE orders_2024 PARTITION OF orders FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

Step 3: Insert data (automatic routing)

INSERT INTO orders (customer_id, order_date, amount) VALUES (101, '2023-05-20', 200.50); -- goes to orders_2023

Step 4: Query data

SELECT * FROM orders WHERE order_date >= '2024-01-01';

👉 Only scans the 2024 partition, not the whole table.


🔹 What is Database Sharding?

Sharding is like partitioning, but on a much larger scale: data is distributed across multiple databases and servers. Each shard holds a subset of data (usually by user ID, region, or range).

✅ Sharding is used when a single server can no longer handle the workload, common in web-scale systems like Facebook, Amazon, and Netflix.


🔹 Real-World Examples

  • Partitioning Example:
    A mid-sized e-commerce site keeps all orders in one database but partitions the orders table by year. This speeds up queries without adding new servers.

  • Sharding Example:
    Facebook splits users across multiple shards:

    • Shard 1 → Users 1–100 million (on Server A)

    • Shard 2 → Users 100,000,001–200 million (on Server B)

    • and so on…
      When User 12345 logs in, Facebook’s routing layer knows exactly which server to hit.


⚖️ Sharding vs Partitioning: Key Differences

AspectPartitioningSharding
LocationSame database/serverMultiple databases/servers
ManagementCentralized (one DB engine controls all partitions)Decentralized (each shard is its own DB)
ScalabilityLimited by one server’s capacityUnlimited (add more servers)
ComplexityEasier to manage, simpler queriesComplex (routing, cross-shard queries)
Use caseMedium-size apps needing performanceInternet-scale apps needing distribution

🎯 Takeaway

  • Use partitioning when your dataset is big but still fits on a single server.

  • Use sharding when you’ve outgrown a single server and need to scale horizontally across multiple machines.

Think of it like this:

  • Partitioning is organizing shelves inside one big library.

  • Sharding is building multiple libraries in different cities.

Comments

Popular posts from this blog

REST vs RPC vs GraphQL: Choosing the Right API Style

Fibonacci Agile Estimation

How to Add LICENSE.txt to Your .NET Project Using Azure Pipelines