ELI5: MongoDB Explained

Understanding documents, collections, and flexible schemas.

January 2, 2026 · 13 min read

What is MongoDB?

MongoDB is a NoSQL database that stores data as documents instead of rows. Each document is a JSON-like object that can have its own shape — no rigid table schema required. It’s like having a filing cabinet where each file can be organized differently.

SQL vs MongoDB mental model

In SQL, you think in tables, rows, and columns. In MongoDB, you think in collections and documents:

SQL      | MongoDB
---------|------------
Database | Database
Table    | Collection
Row      | Document
Column   | Field

What a document looks like

{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "name": "Sufi Afifi",
  "role": "Software Developer",
  "skills": ["Java", "React", "Spring Boot"],
  "experience": {
    "years": 2,
    "domains": ["fintech", "insurtech"]
  }
}

Notice how skills is an array and experience is a nested object. In SQL, these would require separate tables and JOIN queries. In MongoDB, it’s all in one document.

Understanding BSON: What MongoDB Actually Stores

While MongoDB documents look like JSON, they’re actually stored as BSON (Binary JSON). BSON is a binary-encoded format that’s:

  • Smaller on disk — binary encoding is more space-efficient than text JSON
  • Type-aware — BSON stores type information (is this a string, number, date, or ObjectId?)
  • Faster to parse — the server doesn’t need to parse text, just read binary data

Here’s what happens under the hood:

// You write this in JavaScript
db.users.insertOne({
  name: "Sufi",
  age: 25,
  joinDate: new Date("2024-01-15"),
  active: true
});

// MongoDB stores it as BSON internally
// The binary format includes type markers like:
// "name" → UTF-8 string type marker + "Sufi"
// "age" → 32-bit integer type marker + 25
// "joinDate" → UTC DateTime type marker + milliseconds since epoch
// "active" → boolean type marker + true

This is why MongoDB can efficiently store complex types (dates, UUIDs, binary data) without JSON’s limitations. When you retrieve documents, the BSON is automatically converted back to JSON-like objects your application can use.
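To make the type markers concrete, here is a minimal sketch that hand-encodes the one-field document { a: 1 } following the BSON layout (length prefix, type marker, null-terminated field name, value, terminator). This is for illustration only; real applications rely on the driver's BSON codec.

```javascript
// Hand-encode { a: 1 } per the BSON layout: an illustrative sketch,
// not a replacement for the driver's codec.
function encodeTinyBson() {
  const buf = Buffer.alloc(12);  // zero-filled, 12 bytes total
  buf.writeInt32LE(12, 0);       // int32: total document length
  buf.writeUInt8(0x10, 4);       // type marker 0x10 = 32-bit integer
  buf.write("a\0", 5, "ascii");  // field name as a null-terminated C-string
  buf.writeInt32LE(1, 7);        // the value 1, little-endian
  buf.writeUInt8(0x00, 11);      // document terminator
  return buf;
}

console.log(encodeTinyBson().toString("hex"));
// → 0c0000001061000100000000
```

The first four bytes are the length prefix, which is how the server can skip over a document without parsing it.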

The _id Field and ObjectId Explained

Every MongoDB document must have an _id field. If you don’t provide one, MongoDB auto-generates an ObjectId.

What is an ObjectId?

An ObjectId is a 12-byte identifier with an interesting structure:

ObjectId("507f1f77bcf86cd799439011")
          └──┬───┘└───┬────┘└─┬──┘
             │        │       └─ 3-byte counter (incremented per insertion)
             │        └─ 5-byte random value (unique per process)
             └─ 4-byte timestamp (seconds since Jan 1, 1970)

Breaking it down:

  • Timestamp (4 bytes): When the document was created (to the second)
  • Random (5 bytes): A random value generated once per process (older MongoDB versions used machine ID + process ID), ensuring uniqueness across servers
  • Counter (3 bytes): Initialized to a random value, then incremented for each ObjectId that process generates

This design means ObjectIds are sortable by creation time — you can query documents created between two dates without needing a separate createdAt field (though you can add one anyway).

// Extract timestamp from ObjectId
const id = ObjectId("507f1f77bcf86cd799439011");
const createdAt = id.getTimestamp();
console.log(createdAt); // Wed Oct 17 2012 21:13:27 GMT+0000

// Find documents created in the last hour
// ObjectId.createFromTime() accepts seconds since epoch
const oneHourAgo = ObjectId.createFromTime(Math.floor(Date.now() / 1000) - 3600);
db.events.find({ _id: { $gt: oneHourAgo } });

You can also use custom _id values (strings, numbers, UUIDs) — ObjectId is just MongoDB’s intelligent default.
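Because the timestamp is simply the first four bytes of the id, you can even extract it by hand from the hex string. A small sketch (the driver's getTimestamp() does this for you):

```javascript
// Decode the creation time from an ObjectId hex string by hand:
// the first 8 hex characters are the 4-byte seconds-since-epoch timestamp.
function objectIdToDate(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdToDate("507f1f77bcf86cd799439011").toISOString());
// → 2012-10-17T21:13:27.000Z
```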

Basic Operations with More Examples

Find with projections (selecting specific fields)

// Return available Java engineers
db.engineers.find({ stack: "Java", available: true });

// Return only name and skills, exclude _id
db.engineers.find(
  { available: true },
  { name: 1, skills: 1, _id: 0 }
);

// 1 = include field, 0 = exclude field

Insert operations

// Insert a single document
db.engineers.insertOne({
  name: "Sufi",
  stack: ["Java", "React"],
  available: true
});

// Insert multiple documents
db.engineers.insertMany([
  { name: "Alice", stack: ["Python"], available: true },
  { name: "Bob", stack: ["Go"], available: false }
]);

Update operators

// $set — set a field to a value
db.engineers.updateOne(
  { name: "Sufi" },
  { $set: { role: "Senior Developer" } }
);

// $push — add an item to an array
db.engineers.updateOne(
  { name: "Sufi" },
  { $push: { stack: "Next.js" } }
);

// $pull — remove an item from an array
db.engineers.updateOne(
  { name: "Sufi" },
  { $pull: { stack: "Java" } }
);

// $inc — increment a numeric field
db.engineers.updateOne(
  { name: "Sufi" },
  { $inc: { yearsOfExperience: 1 } }
);

// Update multiple documents
db.engineers.updateMany(
  { stack: "Python" },
  { $set: { language: "Python" } }
);
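To see what these operators do, here is the same sequence applied to a plain JavaScript object in memory (MongoDB applies them server-side; the sample document is made up):

```javascript
// Pure-JS illustration of the update operators' effect on a document:
const doc = { name: "Sufi", stack: ["Java", "React"], yearsOfExperience: 2 };

doc.role = "Senior Developer";                   // like $set
doc.stack.push("Next.js");                       // like $push
doc.stack = doc.stack.filter(s => s !== "Java"); // like $pull
doc.yearsOfExperience += 1;                      // like $inc

console.log(doc.stack);             // → [ 'React', 'Next.js' ]
console.log(doc.yearsOfExperience); // → 3
```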

Indexing: Why Queries Can Be Slow (And How to Fix It)

Imagine you have 1 million documents in your collection. Without an index, MongoDB must scan every single document to find matches — this is a collection scan and it’s slow.

What happens without an index

// Without an index, MongoDB scans all documents
db.users.find({ email: "sufi@example.com" });
// Check document 1: no match
// Check document 2: no match
// ... (check 999,998 more documents)
// Check document 1,000,000: match!
// Result after scanning 1M documents

What happens with an index

An index is like a reference book’s index — instead of reading every page, you look up the topic in the index and jump directly to the pages you need.

// Create an index on the email field
db.users.createIndex({ email: 1 });
// 1 = ascending order, -1 = descending

// Now the query jumps directly to matching documents
db.users.find({ email: "sufi@example.com" });
// MongoDB looks up "sufi@example.com" in the index
// Finds the document pointer immediately
// Result in milliseconds instead of seconds

Common index types

// Single-field index
db.users.createIndex({ email: 1 });

// Compound index (multiple fields)
// Useful for queries that filter by email AND status
db.users.createIndex({ email: 1, status: 1 });

// Text index (for searching within text)
db.posts.createIndex({ title: "text", content: "text" });
db.posts.find({ $text: { $search: "mongodb tutorial" } });

// View existing indexes
db.users.getIndexes();

// Delete an index
db.users.dropIndex({ email: 1 });

Rule of thumb

Create an index when:

  • A field is frequently used in find() queries
  • A field is used in sort operations
  • The collection has more than 10,000 documents

Warning: Indexes use disk space and slow down writes (inserts, updates, deletes). Every time you insert a document, MongoDB must update all relevant indexes. Use them strategically.
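The difference between a collection scan and an index lookup can be felt even in plain JavaScript. The toy below contrasts a linear scan with a Map lookup (real MongoDB indexes are B-trees, not hash maps, but the access-pattern contrast is the same; the data is synthetic):

```javascript
// Toy contrast: linear scan vs. indexed lookup over synthetic users.
const users = Array.from({ length: 100_000 }, (_, i) => ({
  _id: i,
  email: `user${i}@example.com`,
}));

// "Collection scan": checks documents one by one until it finds a match
const scanHit = users.find(u => u.email === "user99999@example.com");

// "Index": build once, then each lookup is a direct jump
const emailIndex = new Map(users.map(u => [u.email, u]));
const indexHit = emailIndex.get("user99999@example.com");

console.log(scanHit._id === indexHit._id); // → true
```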

Aggregation Pipeline: Complex Queries Made Simple

The aggregation pipeline is MongoDB’s powerful query language for transforming and analyzing data. It works like an assembly line — data flows through stages, each stage transforms it.

Basic aggregation example

// Pipeline with multiple stages
db.orders.aggregate([
  // Stage 1: Match (filter) - like WHERE in SQL
  { $match: { status: "completed" } },

  // Stage 2: Group - like GROUP BY in SQL
  { $group: {
    _id: "$customer_id",
    totalSpent: { $sum: "$amount" },
    orderCount: { $sum: 1 }
  }},

  // Stage 3: Sort
  { $sort: { totalSpent: -1 } },

  // Stage 4: Limit
  { $limit: 10 }
]);

// Returns the top 10 customers by total spending

Common pipeline stages

// $match — filter documents (like WHERE)
{ $match: { age: { $gte: 18 } } }

// $group — group by a field and aggregate
{ $group: {
  _id: "$department",
  averageSalary: { $avg: "$salary" },
  count: { $sum: 1 }
}}

// $sort — order results
{ $sort: { count: -1 } }

// $limit — return only N documents
{ $limit: 20 }

// $skip — skip the first N documents
{ $skip: 100 }

// $project — reshape documents, select fields
{ $project: {
  name: 1,
  email: 1,
  salary: 1,
  _id: 0
}}

// $lookup — join with another collection
{ $lookup: {
  from: "departments",
  localField: "department_id",
  foreignField: "_id",
  as: "department"
}}

// $unwind — flatten arrays
// If a document has skills: ["Java", "Python"]
// $unwind creates two documents with one skill each
{ $unwind: "$skills" }
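Since $unwind is the least SQL-like stage, here is its effect sketched in plain JavaScript with a made-up document:

```javascript
// One output document per array element: what $unwind produces.
const doc = { name: "Sufi", skills: ["Java", "Python"] };
const unwound = doc.skills.map(skill => ({ ...doc, skills: skill }));

console.log(unwound);
// → [ { name: 'Sufi', skills: 'Java' }, { name: 'Sufi', skills: 'Python' } ]
```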

Real-world aggregation example

// Find the average salary by department, only for departments with >5 people
db.employees.aggregate([
  { $match: { status: "active" } },
  { $group: {
    _id: "$department",
    avgSalary: { $avg: "$salary" },
    count: { $sum: 1 }
  }},
  { $match: { count: { $gt: 5 } } },
  { $sort: { avgSalary: -1 } }
]);
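If the stages feel abstract, this is roughly what that pipeline computes, written as plain JavaScript over a made-up employee array (the count threshold is lowered to fit the tiny sample):

```javascript
// Plain-JS sketch of the $match → $group/$avg → $match → $sort pipeline above.
const employees = [
  { department: "eng", status: "active", salary: 100 },
  { department: "eng", status: "active", salary: 120 },
  { department: "hr", status: "active", salary: 80 },
  { department: "hr", status: "inactive", salary: 90 },
];

const groups = {};
for (const e of employees.filter(e => e.status === "active")) {  // $match
  const g = (groups[e.department] ??= { _id: e.department, total: 0, count: 0 });
  g.total += e.salary;                                           // accumulate for $avg
  g.count += 1;                                                  // $sum: 1
}
const result = Object.values(groups)
  .map(g => ({ _id: g._id, avgSalary: g.total / g.count, count: g.count }))
  .filter(g => g.count > 1)                                      // second $match
  .sort((a, b) => b.avgSalary - a.avgSalary);                    // $sort

console.log(result);
// → [ { _id: 'eng', avgSalary: 110, count: 2 } ]
```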

Replica Sets: High Availability and Redundancy

A replica set is a group of MongoDB servers where one is the primary (accepts writes) and others are secondaries (read-only copies).

How it works

┌──────────────┐
│   Primary    │ ← All writes go here
│   Server 1   │
└──────────────┘

       │ Replicates data
       ├─────────┬─────────┐
       ▼         ▼         ▼
┌──────────────┐ ┌──────────────┐
│ Secondary    │ │ Secondary    │
│ Server 2     │ │ Server 3     │
└──────────────┘ └──────────────┘

Benefits:

  • High availability — if the primary fails, a secondary is automatically promoted
  • Read scaling — spread read queries across secondaries
  • Data backup — always have copies of your data

Tradeoffs:

  • Eventual consistency — reads from secondaries might be slightly behind the primary
  • Operational complexity — requires at least 3 servers (primary + 2 secondaries)

Basic replica set concept

// You don't query replica sets directly in code
// MongoDB handles failover automatically

// Connection string for replica set
mongodb://server1:27017,server2:27017,server3:27017/?replicaSet=myReplica

// MongoDB automatically:
// 1. Routes writes to the primary
// 2. Routes reads to the primary (or secondaries if configured)
// 3. Detects if the primary fails
// 4. Promotes a secondary to become the new primary
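As a toy illustration of the failover outcome only (the real mechanism is a Raft-like election among the members, which this does not model):

```javascript
// Toy failover sketch: if no healthy primary remains, promote the first
// healthy secondary. NOT the real election protocol.
function failover(members) {
  if (members.some(m => m.role === "primary" && m.healthy)) return members;
  const candidate = members.find(m => m.role === "secondary" && m.healthy);
  if (candidate) candidate.role = "primary";
  return members;
}

const replicaSet = [
  { host: "server1", role: "primary", healthy: false }, // primary just died
  { host: "server2", role: "secondary", healthy: true },
  { host: "server3", role: "secondary", healthy: true },
];

console.log(failover(replicaSet).find(m => m.role === "primary" && m.healthy).host);
// → server2
```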

MongoDB vs PostgreSQL: When to Use Each

Both are excellent databases. Here’s how to choose:

Scenario                   | MongoDB                                                                  | PostgreSQL
---------------------------|--------------------------------------------------------------------------|--------------------------------------------------
Rapidly evolving schema    | Great — add fields without migrations                                    | Needs ALTER TABLE migrations
Nested/hierarchical data   | Perfect fit — stores as documents                                        | Awkward — requires multiple tables
Complex relationships      | Difficult — limited JOIN support                                         | Perfect — built for complex joins
Data integrity             | Flexible — ACID transactions available, schema enforcement optional      | Strict — ACID transactions with enforced schemas
Queries on multiple fields | Good with indexes                                                        | Excellent — optimizer is mature
Text search                | Good with text indexes                                                   | Good with full-text search
Reporting/analytics        | Good — powerful aggregation pipeline, less mature for complex analytics  | Excellent — powerful window functions

Concrete scenarios

Use MongoDB when:

// 1. Document structure matches your objects
// Your app's data class:
class User {
  id: string;
  name: string;
  profile: { bio: string; avatar: URL };
  settings: { theme: "dark" | "light"; notifications: boolean };
  tags: string[];
}

// MongoDB document (perfect match):
{
  _id: ObjectId(),
  name: "Sufi",
  profile: { bio: "...", avatar: "..." },
  settings: { theme: "dark", notifications: true },
  tags: ["developer", "tech"]
}

// 2. You frequently fetch entire documents
// Query: "Get user with all their settings"
db.users.findOne({ _id: userId });
// One query, all data

// 3. Array fields grow over time
// Each document can have comments, tags, followers
// No need for separate tables

Use PostgreSQL when:

-- 1. Data has complex relationships
-- Users have many posts
-- Posts have many comments
-- Comments have many likes
-- These relationships change frequently
-- SQL JOINs handle this naturally

-- 2. Data integrity is critical
-- Bank transfers (ACID transactions)
-- Inventory management (must be consistent)
-- Order processing (no over-selling)

-- 3. You query across many relationships
-- "Find all users who commented on posts by authors in category X"
-- With JOINs: straightforward
-- With MongoDB: much more complex

Common Pitfalls to Avoid

Pitfall 1: Unbounded Arrays

// ❌ Bad: array grows without limit
db.posts.updateOne(
  { _id: postId },
  { $push: { comments: newComment } }
);
// After 1 million comments, this document is huge
// Queries get slower because MongoDB loads the entire document

// ✅ Better: store comments in a separate collection
db.comments.insertOne({
  postId: postId,
  author: "Sufi",
  text: "Great article!"
});

// Query them separately
db.comments.find({ postId: postId }).limit(20);

Pitfall 2: Missing Indexes

// ❌ Slow query without index
db.users.find({ email: "sufi@example.com" });
// Collection scan on 1M documents = ~5 seconds

// ✅ Fast with index
db.users.createIndex({ email: 1 });
db.users.find({ email: "sufi@example.com" });
// B-tree lookup = ~5 milliseconds

Pitfall 3: Schema-less Doesn’t Mean Schema-free

// ❌ Dangerous: trusting field existence
db.users.updateOne({ name: "Sufi" }, { $inc: { age: 1 } });
// If age doesn't exist, $inc creates it and sets it to 1
// Now you have inconsistent data — some users have age, some don't

// ✅ Better: handle the missing field explicitly on the application side
user.age = (user.age ?? 0) + 1;

// ✅ Or use MongoDB schema validation
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      required: ["name", "email"],
      properties: {
        name: { bsonType: "string" },
        email: { bsonType: "string" }
      }
    }
  }
});

Pitfall 4: N+1 Queries

// ❌ Inefficient: loop with queries
const users = db.users.find({ status: "active" }).toArray();
for (const user of users) {
  const profile = db.profiles.findOne({ userId: user._id });
  // This runs N queries (one per user)
}

// ✅ Better: use aggregation pipeline with $lookup
db.users.aggregate([
  { $match: { status: "active" } },
  { $lookup: {
    from: "profiles",
    localField: "_id",
    foreignField: "userId",
    as: "profile"
  }},
  { $unwind: "$profile" }
]);
// One aggregation query, much faster

When to use it

MongoDB works well when your data is application-shaped — meaning the document structure matches how your application reads and writes data. It’s useful for:

  • Rapid prototyping where the schema evolves frequently
  • Content management with varied document structures (blogs, products, profiles)
  • Real-time analytics where read speed matters more than normalization
  • Mobile apps where you want to cache documents locally
  • IoT/sensor data with flexible fields (different devices report different metrics)

It’s less ideal when you need:

  • Complex joins across many tables
  • Strict ACID transactions across collections
  • Heavily relational data (like ERP or financial systems)
  • Data warehousing and complex reporting

Mental model

Think of MongoDB as a digital filing cabinet where each file can be organized differently. The cabinet (collection) holds many files (documents), and you can search across all of them by any field. Indexes are like bookmarks that help you jump directly to the files you need.

SQL, by contrast, is like a spreadsheet with strict columns — every row must have the same columns, but you can easily summarize and combine data across sheets using formulas.

Neither is “better” — they solve different problems. Pick the one that matches your data’s shape.