Blog · March 15, 2025

Mohamed Elbarry
Database Design Best Practices
I’ve had to fix a production DB that was normalized in theory but full of JSON columns and missing indexes on the columns we actually filtered on. Queries that were fine at 10k rows fell over at 100k. Now I design for the access patterns first and keep the schema aligned with how the app queries. Here’s the approach I use.

On the construction company app we had metadata and file organization that had to scale, so indexing and access patterns mattered from day one.

I aim for at least 1NF and 2NF so updates stay in one place and there are no partial dependencies. I don’t push to 3NF everywhere if it would force too many joins for hot paths; sometimes a controlled duplicate is worth it. For 1NF that means atomic values per column and unique rows: no comma-separated lists in a single column.
```sql
-- Bad: multiple values in one column
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100),
  phone_numbers TEXT
);

-- Good: separate table for phone numbers
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100)
);

CREATE TABLE user_phone_numbers (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  phone_number VARCHAR(20),
  type VARCHAR(20)
);
```
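With phone numbers in their own table, the app reads them back with a join. A sketch of the kind of query this design supports (array_agg is PostgreSQL-specific; other databases use GROUP_CONCAT or similar):

```sql
-- Each user with their phone numbers collected into one array on the way out
SELECT u.id, u.name, array_agg(p.phone_number) AS phone_numbers
FROM users u
LEFT JOIN user_phone_numbers p ON p.user_id = u.id
GROUP BY u.id, u.name;
```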
For 2NF, non-key attributes must depend on the full primary key, not just part of it. For order items I keep order_id + product_id as the composite key and put quantity and price there; product name and base price live in products.
```sql
CREATE TABLE orders (
  order_id SERIAL PRIMARY KEY,
  customer_id INTEGER,
  order_date TIMESTAMP
);

-- products must exist before order_items can reference it
CREATE TABLE products (
  product_id SERIAL PRIMARY KEY,
  product_name VARCHAR(100),
  base_price DECIMAL(10,2)
);

CREATE TABLE order_items (
  order_id INTEGER REFERENCES orders(order_id),
  product_id INTEGER REFERENCES products(product_id),
  quantity INTEGER,
  price DECIMAL(10,2),  -- price at order time, distinct from products.base_price
  PRIMARY KEY (order_id, product_id)
);
```
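The price column in order_items is the kind of controlled duplicate mentioned earlier: it snapshots the price at order time, so changing products.base_price later doesn’t rewrite old orders. A sketch of copying the current price in at insert time (the order and product IDs are placeholders):

```sql
-- Copy the current base price into the order item (IDs are illustrative)
INSERT INTO order_items (order_id, product_id, quantity, price)
SELECT 1, 42, 3, base_price
FROM products
WHERE product_id = 42;
```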
Indexes help when the planner can use them to touch a small fraction of rows: that’s selectivity. High selectivity: IDs, emails, timestamps. Low selectivity: booleans or status columns with a few values; there the planner often chooses a sequential scan because reading the whole table is cheaper than the index.

So I add indexes for the columns (and column order) that appear in WHERE, JOIN, and ORDER BY, and I run ANALYZE so the planner has up-to-date stats. A covering index can avoid table lookups for some queries. I avoid indexing every column; each index costs writes and space. PostgreSQL’s docs on index types and on examining index usage are worth a read.
```sql
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(100) UNIQUE,   -- UNIQUE constraints create indexes automatically
  username VARCHAR(50) UNIQUE
);

-- Composite index for "orders by customer, sorted by date"
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Redundant if order_items already has PRIMARY KEY (order_id, product_id):
-- the PK index covers these lookups
CREATE INDEX idx_order_items_order_product ON order_items(order_id, product_id);

-- Covering index when the query only needs these columns
-- (on PostgreSQL 11+, INCLUDE keeps the non-search columns out of the key)
CREATE INDEX idx_orders_covering ON orders(customer_id, order_date)
  INCLUDE (status, total_amount);
```
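Whether those indexes actually earn their keep shows up in pg_stat_user_indexes: an index still at zero scans after a stretch of real traffic is a candidate to drop. A query I use for that (idx_scan counts index scans since the statistics were last reset):

```sql
-- Indexes that have never been scanned since stats reset
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, relname;
```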
Before adding or changing indexes I run EXPLAIN (ANALYZE, BUFFERS) on the slow query and look at sequential scans, high row counts, and buffer usage. PostgreSQL’s EXPLAIN docs spell out what each option does and how to read the output—honestly it’s the first place I go when a query’s slow.
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, COUNT(o.order_id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.customer_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.order_id) > 5
ORDER BY order_count DESC;
```
I use appropriate types (e.g. DECIMAL for money, TIMESTAMP WITH TIME ZONE for timestamps) and CHECK constraints so invalid data is rejected at the DB level.
```sql
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  price DECIMAL(10,2) NOT NULL CHECK (price > 0),
  quantity INTEGER NOT NULL DEFAULT 0 CHECK (quantity >= 0),
  is_active BOOLEAN NOT NULL DEFAULT true,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- The enum type has to exist before orders can use it (values are illustrative)
CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipped', 'cancelled');

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  status order_status NOT NULL DEFAULT 'pending'
);
```
When we need to hide rows without losing history I add deleted_at and a partial index so “active” queries stay fast.
```sql
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(100) NOT NULL UNIQUE,
  deleted_at TIMESTAMP WITH TIME ZONE NULL
);

CREATE INDEX idx_users_active ON users(email) WHERE deleted_at IS NULL;

CREATE OR REPLACE FUNCTION soft_delete_user(user_id INTEGER)
RETURNS VOID AS $$
BEGIN
  UPDATE users
  SET deleted_at = NOW()
  WHERE id = user_id AND deleted_at IS NULL;
END;
$$ LANGUAGE plpgsql;
```
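Callers then use the function instead of DELETE, and every “active” query filters on deleted_at IS NULL so the partial index above applies. A quick sketch (the user id is a placeholder):

```sql
-- Soft-delete one user, then list only active users
SELECT soft_delete_user(42);

SELECT id, name, email
FROM users
WHERE deleted_at IS NULL;
```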
I use pg_stat_statements (it’s an extension: it needs shared_preload_libraries plus CREATE EXTENSION) and pg_stat_user_tables to find slow queries and tables with a lot of churn so I can tune indexes or vacuum.
```sql
SELECT
  query,
  calls,
  total_exec_time,  -- total_time on PostgreSQL 12 and earlier
  mean_exec_time,   -- mean_time on PostgreSQL 12 and earlier
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

SELECT
  schemaname,
  relname,          -- pg_stat_user_tables exposes relname, not tablename
  n_tup_ins as inserts,
  n_tup_upd as updates,
  n_tup_del as deletes,
  n_live_tup as live_tuples,
  n_dead_tup as dead_tuples
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
```
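For tables that grow without bound, it’s cheaper to plan partitioning before they get huge than to retrofit it. A sketch of range-partitioning orders by month with PostgreSQL 10+ declarative partitioning (table and partition names are illustrative; note the partition key has to be part of the primary key):

```sql
-- Range partitioning by order month; the partition key must be in the PK
CREATE TABLE orders_partitioned (
  id BIGSERIAL,
  customer_id INTEGER NOT NULL,
  order_date TIMESTAMP WITH TIME ZONE NOT NULL,
  PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2025_01 PARTITION OF orders_partitioned
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
```

Old months can then be archived or detached as whole partitions instead of row-by-row deletes.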
Normalize enough to avoid update anomalies; denormalize only when a hot path needs it. Index based on actual query patterns and check with EXPLAIN. Use types and constraints so the DB enforces invariants. Plan for growth (e.g. partitioning, archiving) before the table gets huge.

One thing to try: run EXPLAIN on your top 5 slowest queries and add or adjust one index. Use The Index, Luke is a great resource for indexing.

Hope that helps. I'm currently looking for new challenges in the AI and Full Stack space. If you're building something interesting, let's chat.