MongoDB Interview Questions
Comprehensive collection of MongoDB interview questions and answers covering fundamental to advanced concepts.
1. Introduction to MongoDB
MongoDB is a popular open-source NoSQL database that uses a document-oriented data model. It differs from traditional relational databases in several key ways:
- Document-based: Stores data in flexible, JSON-like documents (BSON format)
- Schema-less: Documents in a collection can have different fields
- Scalable: Designed for horizontal scaling through sharding
- High performance: Supports indexing, ad-hoc queries, and real-time analytics
- Rich query language: Powerful querying and aggregation capabilities
MongoDB is particularly well-suited for applications with large amounts of unstructured or semi-structured data, rapid development cycles, and requirements for high scalability.
MongoDB offers several powerful features:
- Document Model: Data is stored as documents (similar to JSON objects) which makes it more natural to work with application data
- Ad-hoc Queries: Supports field, range, and regular expression queries with rich query language
- Indexing: Supports secondary indexes for faster query performance
- Replication: Provides high availability through replica sets
- Sharding: Horizontal scaling across multiple machines
- Aggregation Pipeline: Powerful data processing pipeline for complex analytics
- GridFS: Specification for storing large files
- ACID Transactions: Multi-document transactions support
- BSON Format: Binary JSON for efficient storage and traversal
- Change Streams: Real-time data change notifications
| Feature | MongoDB | Relational Databases |
|---|---|---|
| Data Model | Document-oriented (JSON-like) | Table-oriented (rows and columns) |
| Schema | Dynamic (schema-less) | Fixed (schema required) |
| Query Language | MongoDB query language | SQL |
| Joins | No native joins (but $lookup in aggregation) | Native join support |
| Scalability | Horizontal (sharding) | Primarily vertical |
| Transactions | Multi-document (since 4.0) | Full ACID support |
| Performance | High read performance | Consistent performance for complex queries |
2. Setting Up MongoDB
To install MongoDB Community Edition:
- Windows:
- Download the MongoDB MSI installer from the official website
- Run the installer and follow the wizard
- Add MongoDB's bin directory to your PATH
- Create data directory:
md \data\db
- Start MongoDB:
mongod
- macOS:
- Using Homebrew:
brew tap mongodb/brew
brew install mongodb-community
- Start MongoDB:
brew services start mongodb-community
- Linux (Ubuntu/Debian):
- Import public key:
wget -qO - https://www.mongodb.org/static/pgp/server-6.0.asc | sudo apt-key add -
- Create list file:
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-6.0.list
- Update packages:
sudo apt-get update
- Install MongoDB:
sudo apt-get install -y mongodb-org
- Start MongoDB:
sudo systemctl start mongod
After installation, verify it's working by connecting to the MongoDB shell: mongosh
Popular MongoDB GUI management tools include:
- MongoDB Compass (Official GUI)
- Visual schema exploration
- CRUD operations through UI
- Performance metrics
- Query building with visual tools
- Robo 3T (formerly Robomongo)
- Lightweight open-source GUI
- Shell integration
- Cross-platform
- NoSQLBooster
- SQL query support for MongoDB
- Visual query builder
- Aggregation pipeline builder
- Studio 3T
- Advanced querying tools
- Data import/export
- Visual aggregation pipeline builder
- DBeaver (Universal database tool with MongoDB support)
Basic MongoDB shell commands:
// Show databases
show dbs
// Use a database
use mydb
// Show collections in current database
show collections
// Create collection (implicitly created when first document inserted)
db.createCollection("users")
// Insert document
db.users.insertOne({name: "John", age: 30})
// Find documents
db.users.find()
// Count documents
db.users.countDocuments()
// Update document
db.users.updateOne({name: "John"}, {$set: {age: 31}})
// Delete document
db.users.deleteOne({name: "John"})
// Drop collection
db.users.drop()
// Drop database
db.dropDatabase()
3. CRUD Operations
MongoDB provides several methods for inserting documents:
- insertOne() - Insert a single document
db.collection.insertOne({ name: "Alice", age: 25, email: "alice@example.com" })
- insertMany() - Insert multiple documents
db.collection.insertMany([ {name: "Bob", age: 30}, {name: "Charlie", age: 35} ])
- insert() - Legacy method (can insert one or many; deprecated in favor of insertOne() and insertMany())
db.collection.insert({name: "David", age: 40})
Insert operations automatically create the collection if it doesn't exist. Documents are assigned a unique _id field if not provided.
MongoDB provides flexible querying capabilities:
- find() - Basic query
// Find all documents
db.users.find()
// Find with equality condition
db.users.find({age: 25})
// Find with projection (only return certain fields)
db.users.find({age: 25}, {name: 1, email: 1})
- findOne() - Returns first matching document
db.users.findOne({age: {$gt: 30}})
- Query Operators
// Comparison operators
db.users.find({age: {$gt: 25, $lt: 40}})
// Logical operators
db.users.find({$or: [{age: 25}, {name: "Alice"}]})
// Element operators
db.users.find({email: {$exists: true}})
// Array operators
db.users.find({skills: {$in: ["MongoDB", "Node.js"]}})
MongoDB provides several update methods:
- updateOne() - Update first matching document
db.users.updateOne( {name: "Alice"}, {$set: {age: 26, status: "active"}} )
- updateMany() - Update all matching documents
db.users.updateMany( {status: "inactive"}, {$set: {status: "active"}} )
- replaceOne() - Replace entire document
db.users.replaceOne( {name: "Alice"}, {name: "Alice", age: 26, email: "new@example.com"} )
- Update Operators
// $set - Set field value
// $unset - Remove field
// $inc - Increment numeric field
// $push - Add to array
// $pull - Remove from array
// $rename - Rename field
// $mul - Multiply field value
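To make the update operators above concrete, here is a minimal in-memory sketch of what each one does to a single document. It is plain JavaScript, not MongoDB's implementation; the function name applyUpdate is hypothetical, and it only handles top-level fields and scalar $pull values.

```javascript
// Illustrative sketch only: mimics a subset of update operators on a plain object.
// applyUpdate is a hypothetical helper, not part of MongoDB or mongosh.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy; the input is left untouched

  for (const [field, value] of Object.entries(update.$set ?? {})) {
    result[field] = value;
  }
  for (const field of Object.keys(update.$unset ?? {})) {
    delete result[field];
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    result[field] = (result[field] ?? 0) + amount; // a missing field starts from 0
  }
  for (const [field, factor] of Object.entries(update.$mul ?? {})) {
    result[field] = (result[field] ?? 0) * factor; // a missing field becomes 0
  }
  for (const [field, item] of Object.entries(update.$push ?? {})) {
    result[field] = [...(result[field] ?? []), item]; // creates the array if absent
  }
  for (const [field, item] of Object.entries(update.$pull ?? {})) {
    if (Array.isArray(result[field])) {
      result[field] = result[field].filter((x) => x !== item);
    }
  }
  for (const [from, to] of Object.entries(update.$rename ?? {})) {
    if (from in result) {
      result[to] = result[from];
      delete result[from];
    }
  }
  return result;
}

const updated = applyUpdate(
  { name: "Alice", age: 25, skills: ["MongoDB"] },
  { $set: { status: "active" }, $inc: { age: 1 }, $push: { skills: "Node.js" } }
);
console.log(updated);
```

The real server also rejects updates where two operators target the same field, which this sketch does not check.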
MongoDB provides several deletion methods:
- deleteOne() - Delete first matching document
db.users.deleteOne({name: "Alice"})
- deleteMany() - Delete all matching documents
db.users.deleteMany({status: "inactive"})
- remove() - Legacy method (can delete one or many; deprecated in favor of deleteOne() and deleteMany())
db.users.remove({name: "Bob"}, {justOne: true})
- drop() - Delete entire collection
db.users.drop()
4. Querying Documents
MongoDB provides various query operators:
- Comparison Operators
- $eq - Equal to
- $ne - Not equal to
- $gt - Greater than
- $gte - Greater than or equal to
- $lt - Less than
- $lte - Less than or equal to
- $in - Matches any value in array
- $nin - Matches none of values in array
- Logical Operators
- $and - Logical AND
- $or - Logical OR
- $not - Logical NOT
- $nor - Logical NOR
- Element Operators
- $exists - Field exists check
- $type - Field type check
- Array Operators
- $all - Array contains all elements
- $elemMatch - Element matches condition
- $size - Array size matches
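As a rough illustration of how the comparison, logical, and element operators listed above combine in a filter, the following sketch evaluates a filter against plain objects in memory. It is a simplification, not how MongoDB evaluates queries: the matches function is hypothetical, handles only scalar top-level fields, and supports only the operators shown in the code.

```javascript
// Illustrative in-memory matcher; NOT MongoDB's query engine.
// Supported: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $exists, $and, $or.
const comparators = {
  $eq: (value, arg) => value === arg,
  $ne: (value, arg) => value !== arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
};

function matches(doc, filter) {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$or") return cond.some((sub) => matches(doc, sub));
    if (key === "$and") return cond.every((sub) => matches(doc, sub));
    const value = doc[key];
    // A plain value is shorthand for { $eq: value }.
    if (cond === null || typeof cond !== "object") return value === cond;
    // Several operators on one field are combined with AND.
    return Object.entries(cond).every(([op, arg]) => {
      if (op === "$exists") return (key in doc) === arg;
      return comparators[op](value, arg);
    });
  });
}

const users = [
  { name: "Alice", age: 25, email: "alice@example.com" },
  { name: "Bob", age: 30 },
  { name: "Carol", age: 45 },
];

const inRange = users.filter((u) => matches(u, { age: { $gt: 25, $lt: 40 } }));
const eitherOr = users.filter((u) => matches(u, { $or: [{ age: 25 }, { name: "Carol" }] }));
const withEmail = users.filter((u) => matches(u, { email: { $exists: true } }));
console.log(inRange, eitherOr, withEmail);
```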
MongoDB provides several ways to query embedded documents:
- Dot Notation - Query specific fields in embedded documents
// Documents have structure: {name: "...", address: {city: "...", state: "..."}}
db.users.find({"address.city": "New York"})
- Exact Match - Match entire embedded document
db.users.find({address: {city: "New York", state: "NY"}})
- $elemMatch - For arrays of embedded documents
// Documents have structure: {name: "...", orders: [{product: "...", qty: n}]}
db.users.find({ orders: { $elemMatch: { product: "Laptop", qty: {$gt: 1} } } })
MongoDB provides cursor methods for sorting and limiting:
- sort() - Sort results
// Ascending sort
db.users.find().sort({age: 1})
// Descending sort
db.users.find().sort({age: -1})
// Multiple fields
db.users.find().sort({age: 1, name: -1})
- limit() - Limit number of results
db.users.find().limit(10)
- skip() - Skip documents
// Pagination example
db.users.find().skip(20).limit(10)
- countDocuments() - Count documents matching a filter
db.users.countDocuments({age: {$gt: 30}})
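The skip/limit pagination example can be generalized with a small helper that turns a 1-based page number into the values passed to skip() and limit(). The helper name pageWindow is hypothetical; this is a sketch of the arithmetic only.

```javascript
// Hypothetical helper: converts a 1-based page number into skip/limit values.
function pageWindow(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Page 3 with 10 documents per page corresponds to .skip(20).limit(10).
const win = pageWindow(3, 10);
console.log(win);
// In mongosh this could be used as:
//   db.users.find().sort({ _id: 1 }).skip(win.skip).limit(win.limit)
```

Because the server still walks past every skipped document, deep pages get slower as skip grows; filtering on an indexed field (for example, _id greater than the last one seen) is a common alternative.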
5. Aggregation Framework
The aggregation pipeline is a framework for data processing that transforms documents through a series of stages. Each stage processes the documents and passes the results to the next stage.
Key features:
- Processes data records and returns computed results
- Uses a multi-stage pipeline approach
- Can perform operations similar to SQL GROUP BY, JOIN, etc.
- Supports complex transformations and calculations
- Can optimize operations using indexes
Basic syntax:
db.collection.aggregate([
{ $stage1: { ... } },
{ $stage2: { ... } },
...
])
Common aggregation stages include:
- $match - Filters documents (like WHERE in SQL)
{ $match: { status: "A" } }
- $group - Groups documents by expression
{ $group: { _id: "$department", total: { $sum: "$salary" } } }
- $project - Reshapes documents (like SELECT in SQL)
{ $project: { name: 1, department: 1 } }
- $sort - Sorts documents
{ $sort: { age: -1 } }
- $limit - Limits number of documents
{ $limit: 5 }
- $skip - Skips documents
{ $skip: 10 }
- $lookup - Performs left outer join
{ $lookup: { from: "orders", localField: "_id", foreignField: "customerId", as: "customerOrders" } }
- $unwind - Deconstructs array fields
{ $unwind: "$tags" }
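To show how stages feed one another, the sketch below reproduces a $match, $group, $sort pipeline over an in-memory array. It is plain JavaScript rather than a call to aggregate(); the sample data and the function name totalsByDepartment are made up for illustration.

```javascript
// Hypothetical sample data standing in for an "employees" collection.
const employees = [
  { name: "Ann", department: "eng", salary: 120, status: "A" },
  { name: "Bo", department: "eng", salary: 100, status: "A" },
  { name: "Cy", department: "ops", salary: 90, status: "A" },
  { name: "Di", department: "ops", salary: 80, status: "B" },
];

// In-memory equivalent, in spirit, of the pipeline:
// [ { $match: { status: "A" } },
//   { $group: { _id: "$department", total: { $sum: "$salary" } } },
//   { $sort: { total: -1 } } ]
function totalsByDepartment(docs) {
  // $match: keep only documents with status "A"
  const matched = docs.filter((d) => d.status === "A");

  // $group: one accumulator per distinct department
  const totals = new Map();
  for (const d of matched) {
    totals.set(d.department, (totals.get(d.department) ?? 0) + d.salary);
  }

  // $sort: highest total first
  return [...totals]
    .map(([department, total]) => ({ _id: department, total }))
    .sort((a, b) => b.total - a.total);
}

const totals = totalsByDepartment(employees);
console.log(totals);
```

Placing $match first matters in the real pipeline too, since it reduces the number of documents every later stage has to process.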
Common aggregation operators include:
- Arithmetic Operators
$add, $subtract, $multiply, $divide, $mod, $abs, $ceil, $floor
- Array Operators
$arrayElemAt, $concatArrays, $filter, $size, $slice, $map
- Comparison Operators
$eq, $ne, $gt, $gte, $lt, $lte
- Conditional Operators
$cond, $ifNull, $switch
- Date Operators
$dateToString, $dayOfMonth, $year
- String Operators
$concat, $substr, $toLower, $toUpper
- Accumulators (used in $group)
$sum, $avg, $min, $max, $push, $addToSet, $first, $last
6. Indexes and Performance
Indexes are special data structures that store a small portion of the collection's data in an easy-to-traverse form. They improve query performance by reducing the number of documents MongoDB needs to examine.
Key characteristics:
- Indexes can significantly improve query performance
- They are defined at the collection level
- MongoDB automatically creates an index on the _id field
- Indexes consume additional storage space
- They add overhead for write operations
Basic index operations:
// Create index
db.collection.createIndex({field: 1}) // 1 for ascending, -1 for descending
// List indexes
db.collection.getIndexes()
// Drop index
db.collection.dropIndex("index_name")
MongoDB supports several index types:
- Single Field Index - Index on a single field
db.users.createIndex({name: 1})
- Compound Index - Index on multiple fields
db.users.createIndex({name: 1, age: -1})
- Multikey Index - Index on array fields
db.users.createIndex({tags: 1})
- Text Index - For text search
db.articles.createIndex({content: "text"})
- Geospatial Index - For geospatial queries
db.places.createIndex({location: "2dsphere"})
- Hashed Index - For hash-based sharding
db.users.createIndex({_id: "hashed"})
- TTL Index - For automatic document expiration
db.logs.createIndex({createdAt: 1}, {expireAfterSeconds: 3600})
- Partial Index - Only indexes documents that meet criteria
db.users.createIndex( {name: 1}, {partialFilterExpression: {age: {$gt: 18}}} )
- Sparse Index - Only indexes documents with the field
db.users.createIndex({email: 1}, {sparse: true})
MongoDB provides several tools for query analysis:
- explain() - Shows query execution plan
db.users.find({age: {$gt: 30}}).explain("executionStats")
Key metrics to examine:
- executionTimeMillis - Total execution time
- totalDocsExamined - Documents scanned
- totalKeysExamined - Index keys examined
- stage - Operation type (COLLSCAN vs IXSCAN)
- Database Profiler - Logs slow operations
// Enable profiling
db.setProfilingLevel(1, {slowms: 100})
// View profile data
db.system.profile.find().sort({ts: -1}).limit(10)
- Index Usage Stats
db.collection.aggregate([{$indexStats: {}}])
- mongotop/mongostat - Command-line monitoring tools
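One way to read the explain() metrics listed above is to reduce them to a few signals: whether the winning plan contains a COLLSCAN, and how many documents were examined per document returned. The sketch below assumes the classic explain shape (queryPlanner.winningPlan with nested inputStage, plus executionStats); newer server versions may nest the plan differently. The function names and the sample output are hypothetical.

```javascript
// Collects stage names from a winning plan by following inputStage/inputStages.
function collectStages(plan, found = []) {
  if (!plan) return found;
  found.push(plan.stage);
  collectStages(plan.inputStage, found);
  for (const child of plan.inputStages ?? []) collectStages(child, found);
  return found;
}

// Hypothetical helper: summarizes an explain("executionStats") result.
function summarizeExplain(explain) {
  const stats = explain.executionStats;
  const stages = collectStages(explain.queryPlanner.winningPlan);
  return {
    stages,
    collectionScan: stages.includes("COLLSCAN"),
    // Ratio near 1 means little wasted work; null when nothing was returned.
    docsExaminedPerReturned:
      stats.nReturned === 0 ? null : stats.totalDocsExamined / stats.nReturned,
    executionTimeMillis: stats.executionTimeMillis,
  };
}

// Hand-written sample resembling an indexed query; not real server output.
const sampleExplain = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "age_1" } },
  },
  executionStats: {
    nReturned: 10,
    totalKeysExamined: 10,
    totalDocsExamined: 10,
    executionTimeMillis: 1,
  },
};

const summary = summarizeExplain(sampleExplain);
console.log(summary);
```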
7. Data Modeling
MongoDB supports several data modeling approaches:
- Embedded Documents
- Store related data in a single document structure
- Good for one-to-one or one-to-few relationships
- Provides better read performance
- Example: Store address inside user document
- Document References
- Store references (IDs) to related documents
- Good for one-to-many or many-to-many relationships
- Requires additional queries to resolve references
- Example: Store user ID in order document
- Hybrid Approach
- Combine embedding and referencing
- Embed frequently accessed data, reference less used data
- Example: Embed recent orders in user, reference older orders
Considerations when modeling data:
- Query patterns (how data will be accessed)
- Write/read ratio
- Data relationships
- Data growth
- Atomicity requirements
Common MongoDB data modeling patterns include:
- Attribute Pattern - For sets of fields with similar characteristics
// Instead of {cpu: "...", ram: "...", hdd: "..."}
{
  specifications: [
    {name: "cpu", value: "..."},
    {name: "ram", value: "..."},
    {name: "hdd", value: "..."}
  ]
}
- Bucket Pattern - Group data into buckets (e.g., time-series)
// Instead of individual readings
{
  sensor_id: 123,
  start_date: ISODate("2023-01-01"),
  end_date: ISODate("2023-01-02"),
  readings: [
    {time: ISODate("2023-01-01T00:00"), value: 25},
    {time: ISODate("2023-01-01T01:00"), value: 26},
    // ...
  ]
}
- Polymorphic Pattern - Different document shapes in same collection
// Products collection with different types
{_id: 1, type: "book", title: "...", author: "..."}
{_id: 2, type: "movie", title: "...", director: "..."}
- Extended Reference Pattern - Copy frequently accessed fields
// Order with embedded user info
{
  _id: 123,
  user_id: 456,
  user_name: "Alice",
  user_email: "alice@example.com",
  items: [...]
}
- Subset Pattern - Keep only a subset of data in memory
// Main collection
{_id: 1, name: "...", details: "..."}
// Subset collection (frequently accessed)
{_id: 1, name: "..."}
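The bucket pattern can be sketched as a function that groups individual readings into one document per sensor per day, matching the shape shown above. The daily bucket size, the UTC day boundary, and the function name bucketByDay are assumptions made for this example.

```javascript
// Hypothetical helper: groups readings into one bucket document per UTC day.
function bucketByDay(sensorId, readings) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const buckets = new Map();
  for (const reading of readings) {
    const day = reading.time.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    if (!buckets.has(day)) {
      const start = new Date(`${day}T00:00:00Z`);
      buckets.set(day, {
        sensor_id: sensorId,
        start_date: start,
        end_date: new Date(start.getTime() + DAY_MS),
        readings: [],
      });
    }
    buckets.get(day).readings.push({ time: reading.time, value: reading.value });
  }
  return [...buckets.values()];
}

const dailyBuckets = bucketByDay(123, [
  { time: new Date("2023-01-01T00:00:00Z"), value: 25 },
  { time: new Date("2023-01-01T01:00:00Z"), value: 26 },
  { time: new Date("2023-01-02T05:00:00Z"), value: 24 },
]);
console.log(dailyBuckets.length);
```

Capping bucket size (by time window or reading count) keeps each document well under the 16 MB limit and avoids unbounded array growth.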
MongoDB handles relationships differently than relational databases:
- One-to-One
- Embed the related document directly
- Example: User and profile (embed profile in user)
- One-to-Few
- Embed an array of subdocuments
- Example: Blog post and comments (embed comments in post)
- One-to-Many
- Use document references (a parent ID in each child, or an array of child IDs in the parent)
- Example: User and orders (store user ID in each order)
- Many-to-Many
- Use document references in both collections
- Example: Students and courses (store arrays of IDs in both)
- Tree Structures
- Parent references (store parent ID in each child)
- Child references (store array of child IDs in parent)
- Materialized paths (store full path as string)
- Nested sets (store left/right values)
8. Replication
Replication in MongoDB is the process of synchronizing data across multiple servers to provide:
- High availability - Automatic failover if primary goes down
- Data redundancy - Multiple copies of data
- Disaster recovery - Protection against data loss
- Read scalability - Distribute read operations
Key components:
- Replica Set - Group of MongoDB instances that maintain the same data
- Primary Node - Accepts all write operations
- Secondary Nodes - Replicate primary's data (can accept reads)
- Arbiter - Special member that votes in elections but doesn't store data
Replica sets typically have an odd number of members (minimum 3) to ensure proper election.
MongoDB replication works through:
- Oplog (Operations Log)
- Primary records all write operations in its oplog
- Secondaries copy and apply these operations asynchronously
- Oplog is a capped collection (fixed size)
- Heartbeats
- Members send periodic heartbeats to each other
- Used to detect failures and trigger elections
- Elections
- When primary becomes unavailable, secondaries hold an election
- Member with highest priority and most recent oplog usually wins
- Requires majority of voting members
- Read Preference
- Clients can specify where to route read operations
- Options: primary (default), primaryPreferred, secondary, secondaryPreferred, nearest
- Write Concern
- Specifies how many members must acknowledge writes
- Examples: w:1 (primary only), w:majority, w:2 (primary plus one secondary)
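The "majority of voting members" rule behind elections explains why odd member counts are recommended. A small sketch of the arithmetic, with hypothetical function names:

```javascript
// Votes needed to win an election in a set with the given number of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

Going from three to four members raises the required majority without raising the number of failures tolerated, which is why a fourth member adds cost but no extra availability.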
Basic steps to set up a replica set:
- Start MongoDB instances with replica set option
mongod --replSet rs0 --port 27017 --dbpath /data/db1
mongod --replSet rs0 --port 27018 --dbpath /data/db2
mongod --replSet rs0 --port 27019 --dbpath /data/db3
- Connect to one instance and initiate the replica set
rs.initiate({
  _id: "rs0",
  members: [
    {_id: 0, host: "localhost:27017"},
    {_id: 1, host: "localhost:27018"},
    {_id: 2, host: "localhost:27019"}
  ]
})
- Check replica set status
rs.status()
- Add additional members if needed
rs.add("localhost:27020")
- Configure member priorities and other settings
cfg = rs.conf()
cfg.members[0].priority = 2
cfg.members[1].priority = 1
cfg.members[2].priority = 0.5
rs.reconfig(cfg)
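Once the set is running, clients usually connect with a URI that lists several members and names the replica set, so the driver can find the current primary after a failover. The helper below is a hypothetical sketch that builds such a URI from the hosts used in the steps above.

```javascript
// Hypothetical helper: builds a replica set connection string.
function replicaSetUri(hosts, replicaSetName, dbName = "") {
  const query = `replicaSet=${encodeURIComponent(replicaSetName)}`;
  return `mongodb://${hosts.join(",")}/${dbName}?${query}`;
}

const uri = replicaSetUri(
  ["localhost:27017", "localhost:27018", "localhost:27019"],
  "rs0"
);
console.log(uri);
// Could then be passed to the shell, for example: mongosh "<uri>"
```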
9. Sharding
Sharding is MongoDB's approach to horizontal scaling, where data is distributed across multiple machines (shards). Key benefits:
- Horizontal scaling - Distribute data across multiple servers
- Increased throughput - Parallel operations across shards
- Larger dataset support - Beyond single server capacity
Sharding components:
- Shard - Individual MongoDB instance (or replica set) storing subset of data
- Config Servers - Store cluster metadata and chunk mappings
- Mongos - Router process that directs operations to appropriate shards
- Chunk - Contiguous range of shard key values (default size 128 MB since MongoDB 6.0; 64 MB in earlier versions)
Sharding is transparent to applications - they connect to mongos as if it were a regular MongoDB server.
MongoDB sharding works through:
- Shard Key
- Field or fields used to distribute data
- Critical choice affecting performance
- Types: Hashed (even distribution), Ranged (locality)
- Chunk Splitting
- Data divided into chunks based on shard key
- Chunks split when they grow beyond chunk size
- Balancing
- Balancer process redistributes chunks to equalize data
- Moves chunks from overloaded to underloaded shards
- Query Routing
- mongos routes queries to appropriate shards
- Targeted queries (with shard key) go to specific shards
- Broadcast queries (without shard key) go to all shards
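The targeted versus broadcast distinction can be approximated with a simple check: does the filter constrain the leading field of the shard key? The sketch below encodes only that rule of thumb. Real routing also depends on the kind of condition (for example, range predicates on a hashed key are still broadcast), and the function name routingKind is hypothetical.

```javascript
// Rule-of-thumb sketch: a query can be targeted when its filter includes
// the leading shard key field; otherwise mongos has to ask every shard.
function routingKind(filter, shardKeyFields) {
  const leadingField = shardKeyFields[0];
  const hasLeadingField = Object.prototype.hasOwnProperty.call(filter, leadingField);
  return hasLeadingField ? "targeted" : "broadcast";
}

console.log(routingKind({ user_id: 42, status: "A" }, ["user_id"])); // filter includes the shard key
console.log(routingKind({ status: "A" }, ["user_id"]));              // filter omits the shard key
```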
Basic steps to set up sharding:
- Start config servers (replica set recommended)
mongod --configsvr --replSet configRS --port 27019 --dbpath /data/configdb
- Start mongos (router) process
mongos --configdb configRS/localhost:27019 --port 27017
- Start shard servers (replica sets recommended)
mongod --shardsvr --replSet shard1 --port 27018 --dbpath /data/shard1
mongod --shardsvr --replSet shard2 --port 27020 --dbpath /data/shard2
- Connect to mongos and add shards
sh.addShard("shard1/localhost:27018")
sh.addShard("shard2/localhost:27020")
- Enable sharding for a database
sh.enableSharding("mydb")
- Shard a collection
sh.shardCollection("mydb.mycol", {user_id: "hashed"})
10. Security
MongoDB offers several security features:
- Authentication
- SCRAM (default) - Salted Challenge Response Authentication Mechanism
- X.509 certificates - For internal authentication and client authentication
- LDAP proxy - Enterprise feature for LDAP integration
- Kerberos - Enterprise feature for Kerberos authentication
- Authorization (Role-Based Access Control)
- Built-in roles (read, readWrite, dbAdmin, etc.)
- Custom roles with granular privileges
- Collection-level access control
- Encryption
- TLS/SSL for network encryption
- Encrypted storage engine (Enterprise feature)
- Client-side field level encryption
- Auditing (Enterprise feature)
- Log authentication and authorization events
- Track schema changes
- Monitor CRUD operations
- Network Security
- IP binding
- Firewall configuration
- VPN connections
To enable authentication:
- Start MongoDB with authentication disabled initially
- Connect to the MongoDB instance
- Create admin user
use admin
db.createUser({ user: "admin", pwd: "securepassword", roles: ["root"] })
- Restart MongoDB with authentication enabled
mongod --auth --port 27017 --dbpath /data/db
- Connect and authenticate
mongosh -u admin -p securepassword --authenticationDatabase admin
- Create additional users as needed
use mydb
db.createUser({ user: "appuser", pwd: "apppassword", roles: ["readWrite"] })
MongoDB provides built-in roles at different levels:
- Database User Roles
- read - Read data
- readWrite - Read and write data
- Database Administration Roles
- dbAdmin - Administrative tasks
- userAdmin - Manage users
- dbOwner - Combines readWrite, dbAdmin, userAdmin
- Cluster Administration Roles
- clusterAdmin - Full cluster management
- clusterManager - Monitoring and maintenance
- clusterMonitor - Read-only monitoring
- hostManager - Manage servers
- Backup/Restore Roles
- backup - Backup data
- restore - Restore data
- All-Database Roles
- readAnyDatabase
- readWriteAnyDatabase
- userAdminAnyDatabase
- dbAdminAnyDatabase
- Superuser Roles
- root - Full superuser access
Custom roles can be created with specific privileges.
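As a sketch of what a custom role looks like, the document below grants read-only access to a single collection. The role name, database, and collection are hypothetical; the overall shape (role, privileges with resource and actions, inherited roles) is the one db.createRole() expects.

```javascript
// Hypothetical role definition: read-only access to mydb.orders.
const reportingRole = {
  role: "reportingReader", // hypothetical role name
  privileges: [
    {
      resource: { db: "mydb", collection: "orders" }, // hypothetical target
      actions: ["find"], // read documents only
    },
  ],
  roles: [], // no inherited roles
};

console.log(JSON.stringify(reportingRole, null, 2));
// In mongosh, as a user allowed to manage roles, this could be applied with:
//   use mydb
//   db.createRole(reportingRole)
```

Listing only the actions an application needs is one way to apply the least-privilege principle instead of assigning a broad built-in role.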
11. Transactions
MongoDB supports multi-document ACID transactions starting from version 4.0 (for replica sets) and 4.2 (for sharded clusters).
Key characteristics:
- Provides atomicity across multiple documents
- Works across multiple collections and databases
- Supports read and write concerns
- Has performance overhead - should be used judiciously
Transaction operations:
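// Start a session
const session = db.getMongo().startSession()
// Obtain the collection through the session so its operations run inside the transaction
const accounts = session.getDatabase(db.getName()).accounts
try {
  // Start transaction
  session.startTransaction({
    readConcern: {level: "snapshot"},
    writeConcern: {w: "majority"}
  })
  // Debit only if the source account has sufficient funds
  const debit = accounts.updateOne(
    {_id: 1, balance: {$gte: 100}},
    {$inc: {balance: -100}}
  )
  if (debit.modifiedCount !== 1) {
    throw new Error("Insufficient funds or account not found")
  }
  // Credit the destination account
  accounts.updateOne(
    {_id: 2},
    {$inc: {balance: 100}}
  )
  // Commit transaction
  session.commitTransaction()
} catch (error) {
  // Abort transaction on error
  session.abortTransaction()
  throw error
} finally {
  session.endSession()
}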
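Transactions can fail for transient reasons such as write conflicts, and the server marks those errors with the TransientTransactionError label so the whole transaction can be retried. The helper below is a generic, synchronous sketch of that retry loop. It assumes the thrown error exposes an errorLabels array; the exact error shape depends on the shell or driver in use, and the function name runWithRetry is hypothetical.

```javascript
// Hypothetical retry wrapper: re-runs the whole transaction function when the
// error carries the "TransientTransactionError" label, up to maxAttempts times.
function runWithRetry(transactionFn, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return transactionFn(attempt);
    } catch (error) {
      const labels = error.errorLabels ?? []; // assumption: labels exposed as an array
      const transient = labels.includes("TransientTransactionError");
      if (!transient || attempt >= maxAttempts) throw error;
      // Otherwise fall through and try again with a fresh transaction.
    }
  }
}

// Simulated usage: the first two attempts hit a transient conflict.
let attempts = 0;
const outcome = runWithRetry(() => {
  attempts++;
  if (attempts < 3) {
    const conflict = new Error("write conflict");
    conflict.errorLabels = ["TransientTransactionError"];
    throw conflict;
  }
  return "committed";
});
console.log(outcome, attempts);
```

The function passed in should start, run, and commit (or abort) the transaction itself, so that each retry begins from a clean state.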
MongoDB transactions have some limitations:
- Performance Impact
- Slower than single-document operations
- Not suitable for high-throughput use cases
- Time Limit
- Default 60-second timeout (configurable)
- Operations taking longer will abort
- Memory Usage
- All modifications must fit in memory
- Large transactions may fail
- DDL Operations
- Before MongoDB 4.4, collections and indexes cannot be created inside a transaction
- From 4.4, they can be created in a transaction that is not a cross-shard write transaction
- Some other DDL operations remain restricted
- Feature Support
- Some commands cannot be used in transactions
- Certain operations have restrictions
Best practice is to use appropriate data modeling to minimize the need for transactions.
Key differences between MongoDB and RDBMS transactions:
| Feature | MongoDB | RDBMS |
|---|---|---|
| Scope | Multiple documents (can span collections) | Multiple rows (can span tables) |
| Default | Single-document atomicity by default | Explicit transactions often required |
| Performance | Higher overhead, not for high-throughput | Optimized for transactions |
| Isolation | Snapshot isolation | Various isolation levels |
| Duration | Limited (default 60s) | Can be long-running |
| Sharding | Supported (with limitations) | Varies by database |
12. MongoDB Atlas
MongoDB Atlas is the fully managed cloud database service for MongoDB, offering:
- Fully Managed - Automated provisioning, patching, upgrades
- Global Clusters - Deploy across multiple cloud regions
- Scalability - Easily scale up/down or out
- Security - Encryption, VPC peering, auditing
- Monitoring - Performance metrics and alerts
- Backups - Continuous and point-in-time recovery
- Integrations - BI connectors, triggers, serverless functions
Atlas is available on AWS, Azure, and Google Cloud, with multiple pricing tiers from free shared clusters to dedicated enterprise-grade instances.
To deploy a cluster in MongoDB Atlas:
- Sign up for an Atlas account
- Create a new project
- Click "Build a Cluster"
- Choose cloud provider and region
- Select cluster tier (M0 free tier available)
- Configure additional options:
- Cluster name
- MongoDB version
- Backup options
- Additional settings (BI connector, etc.)
- Click "Create Cluster"
- Configure database users and the IP access list (formerly called the IP whitelist)
- Connect to your cluster using the provided connection string
Cluster provisioning typically takes 5-10 minutes.
Atlas provides several additional features beyond standard MongoDB:
- Atlas Search - Full-text search capabilities
- Atlas Data Lake - Query data in S3 buckets
- Atlas Online Archive - Automatically archive old data
- Atlas Charts - Data visualization tool
- Atlas Triggers - Serverless functions for events
- Atlas App Services - Backend application platform
- BI Connector - SQL interface for BI tools
- Performance Advisor - Query optimization suggestions
- Global Clusters - Geographically distributed deployments
- Serverless Instances - Auto-scaling based on workload
13. Best Practices
MongoDB schema design best practices:
- Understand your access patterns - Design for how data will be queried
- Favor embedding for:
- One-to-one relationships
- One-to-few relationships
- Data that's always accessed together
- Use references for:
- One-to-many relationships
- Many-to-many relationships
- Large hierarchical data sets
- Consider write/read ratio - Optimize for your dominant operation
- Use appropriate data types - Proper types improve performance
- Plan for growth - Avoid unbounded document growth
- Denormalize carefully - Balance read performance vs. data consistency
- Implement document versioning if schema may change
MongoDB performance best practices:
- Use indexes effectively
- Create indexes to support your queries
- Use compound indexes for multiple fields
- Monitor index usage and remove unused indexes
- Optimize queries
- Use projection to return only needed fields
- Use covered queries when possible
- Avoid $where and JavaScript expressions
- Use explain() to analyze queries
- Hardware considerations
- Use SSDs for storage
- Ensure sufficient RAM for working set
- Consider dedicated servers for production
- Write concern and read preference
- Use appropriate write concern for your needs
- Distribute reads to secondaries when possible
- Sharding considerations
- Choose a good shard key
- Monitor chunk distribution
- Pre-split chunks for initial data load
MongoDB security best practices:
- Enable authentication - Always require authentication
- Use role-based access control - Follow principle of least privilege
- Encrypt communications - Use TLS/SSL for all connections
- Secure network exposure
- Bind to private IPs where possible
- Use firewalls to restrict access
- Consider VPN for remote access
- Regularly update MongoDB - Apply security patches
- Enable auditing (Enterprise feature) - Track sensitive operations
- Secure backups - Encrypt and protect backup data
- Monitor for suspicious activity - Set up alerts for unusual patterns
- Follow MongoDB security checklist - Refer to MongoDB documentation
14. Tricky interview questions (topic-wise)
BSON, replication, and aggregation edge cases that show up after “I know CRUD”—grouped so you can drill one theme per study session.
BSON types, comparison, and query matching
Why do { a: 42 } and { a: "42" } behave differently in filters and indexes?
MongoDB preserves BSON types. Numeric 42 and string "42" are different values; comparison order follows BSON type order (null, numbers, strings, objects, arrays, etc.). Tricky interviews probe whether you rely on implicit string/numeric equality across drivers. Always normalize at the application boundary or with schema validation.
Does { field: { $eq: null } } match both missing fields and explicit nulls?
Yes. $eq: null matches documents where the field is absent or the field is null. If you need only explicit nulls, combine with $type: 10 (BSON Null) or use schema validation. This is a common pitfall when carrying over SQL “IS NULL” intuition.
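The three cases (null or missing, explicit null only, missing only) can be illustrated with plain JavaScript predicates over sample documents. These filters only mirror the matching rules described above; they are not MongoDB queries, and the sample documents are made up.

```javascript
// Hypothetical sample documents.
const docs = [
  { _id: 1, field: null },    // explicit null
  { _id: 2 },                 // field missing
  { _id: 3, field: "value" }, // field present with a value
];

// Mirrors { field: { $eq: null } }: explicit null OR missing.
const nullOrMissing = docs.filter((d) => !("field" in d) || d.field === null);

// Mirrors { field: { $type: 10 } }: explicit null only.
const explicitNull = docs.filter((d) => "field" in d && d.field === null);

// Mirrors { field: { $exists: false } }: missing only.
const missingOnly = docs.filter((d) => !("field" in d));

console.log(
  nullOrMissing.map((d) => d._id),
  explicitNull.map((d) => d._id),
  missingOnly.map((d) => d._id)
);
```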
Why is $where discouraged even when it “works”?
JavaScript evaluation cannot use indexes, breaks aggregation pushdown, and scales poorly on large collections. It is acceptable for tiny admin scripts but bad for production hot paths. Interview follow-up: prefer aggregation, partial indexes, or precomputed fields.
Indexes: prefixes, multikey, and covered queries
Given a compound index { a: 1, b: 1, c: 1 }. Which queries can use the index efficiently?
Equality on prefix fields preserves index usefulness: a, then a,b, then a,b,c in order. A range on b often stops further c keys from being used in the same compound way; know IXSCAN vs FETCH in explain("executionStats"). Include projection to discuss covered queries when all fields come from the index.
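The prefix rule can be expressed as a small function: walk the index fields in order, keep going while the query has an equality condition on the field, allow one range condition, and stop at the first gap. This is a rule-of-thumb model for interviews, not the query planner's actual algorithm, and the function name usableIndexFields is hypothetical.

```javascript
// Rule-of-thumb sketch: which leading index fields can tightly bound the scan.
function usableIndexFields(indexFields, equalityFields, rangeFields = []) {
  const used = [];
  for (const field of indexFields) {
    if (equalityFields.includes(field)) {
      used.push(field); // equality keeps the prefix going
      continue;
    }
    if (rangeFields.includes(field)) {
      used.push(field); // a range still narrows the scan, but ends the tight prefix
    }
    break; // first range or missing field stops the walk
  }
  return used;
}

const index = ["a", "b", "c"];
console.log(usableIndexFields(index, ["a"]));        // equality on a
console.log(usableIndexFields(index, ["a", "b"]));   // equality on a and b
console.log(usableIndexFields(index, ["a"], ["b"])); // equality on a, range on b
console.log(usableIndexFields(index, ["b"]));        // no condition on the leading field
```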
Indexing an array field produces a multikey index: one document generates multiple index keys. MongoDB cannot compound certain multikey patterns the same way (e.g., two parallel arrays on one compound index is rejected). Interviewers check you’ve hit “cannot index parallel arrays” errors in modeling reviews.
Why can hint() make performance worse after data growth?
Hints force a plan choice that may have been optimal for yesterday’s cardinality. Data skew, new query shapes, or storage engine cache state change best plans. Strong answer: re-benchmark with explain, adjust indexes, don’t cargo-cult hints.
Aggregation: $lookup, memory, and ordering
When does $lookup behave like a correlated subquery, and why is that expensive?
The pipeline form of $lookup with a subpipeline can execute once per incoming document, multiplying work if it is not selective. Classic DBA question: compare to embedding vs pre-joining in ETL, and watch allowDiskUse when stages exceed memory limits.
Only stages that define order ($sort near the end, or ordered operators that preserve input) give guarantees. Pipelines without $sort may reorder for optimization—don’t assume $group output order matches insertion unless documented for that stage pairing.
How can $facet multiply memory pressure?
$facet runs multiple subpipelines over the same input document stream; each branch can materialize results. Large fan-out facets need limits, earlier $match pruning, or redesign outside MongoDB for OLAP-scale workloads.
Replication, elections, and read concerns
Can writeConcern: { w: "majority" } still lose acknowledged writes on a full cluster failure?
Majority is defined against replica set members that can persist the oplog; certain simultaneous failures or misconfigured PSA (Primary–Secondary–Arbiter) setups can create durability edge cases if a data-bearing node was never part of the majority commit point. Pair this with a discussion of j: true journaling and backup RPO/RTO expectations; it is not a Mongo-only myth.
readPreference: secondary can return lagging data; maxStalenessSeconds caps acceptable lag. “Tricky” follow-up: causal consistency sessions and after-write reads—when must you read from primary?
Replica sets elect at most one primary at a time via Raft-like consensus; brief windows of divergent writes happen only in misconfigured networks or forced reconfigs. Interview answer: majority nodes, correct electable priorities, and network partition design.
Sharding, chunk splits, and hot shard keys
Why is a monotonically increasing _id (e.g., ObjectId-only) a bad shard key for write-heavy inserts?
Inserts concentrate on the chunk covering the high end of the key range, creating a hot shard and serialization on chunk migration. Better patterns: hashed sharding on _id, compound keys with high cardinality prefix, or pre-split + careful key design.
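The hot-shard effect can be simulated by counting where monotonically increasing keys land under ranged versus hashed placement. Everything here is a stand-in: four chunks, fixed hypothetical split points, and a simple multiplicative hash instead of MongoDB's real hashing function.

```javascript
const CHUNKS = 4;

// Ranged placement: split points chosen when keys were still small (hypothetical).
const rangeBounds = [250000, 500000, 750000];
function rangedChunk(key) {
  const idx = rangeBounds.findIndex((bound) => key < bound);
  return idx === -1 ? CHUNKS - 1 : idx; // everything above the last bound goes to the last chunk
}

// Hashed placement: stand-in multiplicative hash, NOT MongoDB's hash function.
function hashedChunk(key) {
  const hash = (key * 2654435761) % 2 ** 32;
  return Math.floor(hash / (2 ** 32 / CHUNKS));
}

// Counts how many of `count` consecutive keys land in each chunk.
function distribution(chunkOf, firstKey, count) {
  const counts = new Array(CHUNKS).fill(0);
  for (let key = firstKey; key < firstKey + count; key++) {
    counts[chunkOf(key)]++;
  }
  return counts;
}

const rangedCounts = distribution(rangedChunk, 1000000, 1000);
const hashedCounts = distribution(hashedChunk, 1000000, 1000);
console.log("ranged:", rangedCounts);
console.log("hashed:", hashedCounts);
```

With ranged placement every new key falls into the last chunk, while the hashed variant spreads the same inserts across all chunks, at the cost of losing range locality for queries.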
mongos may broadcast the query to all shards (scatter/gather), which explodes latency at scale. Tricky panel: how partial indexes or embedding the shard key in query patterns from the app layer prevents cluster-wide scans.
Transactions, document limits, and working set
The 16 MB BSON document limit still caps a single document; transactions coordinate locks across documents but don’t relax per-document size rules. Large blob patterns belong in GridFS or external object storage—common architecture trap question.
They retain snapshot history and cache pressure, blocking eviction and increasing conflict rates for writers. Answer ties to monitoring slow transactions, shortening critical sections, and schema patterns that avoid cross-shard transactions when possible.
TTL indexes, clocks, and collations
A background thread removes documents whose TTL date field is older than the server’s mongod clock by expireAfterSeconds; deletions are not instantaneous. Bad NTP skew causes surprise retention—ops-minded interview follow-up.
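The expiry rule can be written as a predicate: a document becomes eligible once its date field plus expireAfterSeconds is in the past, and documents without a date in that field never expire. The sketch below models eligibility only; actual removal happens later, when the background task next runs (about once every 60 seconds). The function name isExpired is hypothetical.

```javascript
// Sketch of TTL eligibility; the server's background task does the actual deletes.
function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  // Documents whose indexed field is not a date are never expired by a TTL index.
  if (!(value instanceof Date)) return false;
  return value.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const logEntry = { message: "login", createdAt: new Date("2023-01-01T00:00:00Z") };

console.log(isExpired(logEntry, "createdAt", 3600, new Date("2023-01-01T00:30:00Z")));
console.log(isExpired(logEntry, "createdAt", 3600, new Date("2023-01-01T01:30:00Z")));
console.log(isExpired({ message: "no date" }, "createdAt", 3600, new Date("2023-01-01T01:30:00Z")));
```

Since `now` comes from the server's clock, a skewed clock shifts when documents become eligible, which is the retention surprise described above.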
You need indexes created with a matching collation (or compatible key pattern); otherwise plans may ignore the index or require expensive transforms. Trick: mixing string cases in queries without collation on the operation side.