Russ Bateman
5 October 2012
last update:
For training, if you are using a separate mongod from the one already running on your host, it must be launched on a different port:
$ mongod --port 30000
Thereafter (as seen occasionally in these notes), run tools against that port explicitly:
$ mongorestore --port 30000 -d digg dump/digg
Here's what the training subdirectory looked like on my host by the time it was over:
~ $ tree -d mongo-training/
mongo-training/
├── bin
└── data
    ├── db
    │   └── journal
    ├── dump
    │   ├── digg
    │   └── training
    ├── rs1
    │   └── journal
    ├── rs2
    │   ├── journal
    │   └── _tmp
    ├── rs3
    │   ├── journal
    │   └── _tmp
    └── rs4
        ├── journal
        └── _tmp
18 directories
The questions answered in the exercises below come from the original MongoDB Training Course Manual, which may no longer exist.
To see options to configure:
$ bin/mongod --help
Journaling (100ms default), disk flush (60s default), data-file size, logfiles, etc. MongoDB writes to a memory-mapped journal file, then asynchronously writes to the database. If the node goes down, the journal is replayed.
$ mongod -v
$ mongod -vv    (up to 5 v's)
You can store these changes/settings in the configuration file, mongodb.conf. Typically, this file is on the path /etc/mongodb.conf, but in formal, data-center installations, it often ends up on a different path, e.g.: /data/mongodb/mongodb.conf.
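A minimal sketch of such a configuration file (the paths and values here are illustrative assumptions, not taken from the course):

port = 27017
dbpath = /var/lib/mongodb
logpath = /var/log/mongodb/mongodb.log
logappend = true
journal = true
fork = true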
$ mongo --port 30000
In shell, issue the following to see how MongoDB was launched:
> db.adminCommand( { getCmdLineOpts: 1 } )
{
    "argv" : [ "/usr/bin/mongod", "--config", "/etc/mongodb.conf" ],
    "parsed" : {
        "config" : "/etc/mongodb.conf",
        "dbpath" : "/var/lib/mongodb",
        "logappend" : "true",
        "logpath" : "/var/log/mongodb/mongodb.log"
    },
    "ok" : 1
}
> db.adminCommand( { getCmdLineOpts: 1 } )
{ "argv" : [ "bin/mongod", "--port", "30000" ], "parsed" : { "port" : 30000 }, "ok" : 1 }
Everything in the shell is a command:
> db.adminCommand( { listDatabases : 1 } )
{
"databases" : [
{
"name" : "training",
"sizeOnDisk" : 218103808,
"empty" : false
},
{
"name" : "digg",
"sizeOnDisk" : 218103808,
"empty" : false
},
{
"name" : "twitter",
"sizeOnDisk" : 486539264,
"empty" : false
},
{
"name" : "local",
"sizeOnDisk" : 1,
"empty" : true
},
{
"name" : "test",
"sizeOnDisk" : 1,
"empty" : true
}
],
"totalSize" : 922746880,
"ok" : 1
}
...which is the long-hand for:
> show dbs
digg 0.203125GB
local (empty)
test (empty)
training 0.203125GB
twitter 0.453125GB
...which is a wrapper for the longer command (that organizes stuff differently).
Here's how to see what the size of a document would be:
> Object.bsonsize( { "hello" : "world" } )
22
> db.people.insert( { "name" : "Smith", "age" : 30 } )
> for( i = 0; i < 1000; i++ ) { db.people.insert( { a : 20 } ); }
> db.people.find()
{ "_id" : ObjectId("50634c529cf47e02347c2ee2"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ee3"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ee4"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ee5"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ee6"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ee7"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ee8"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ee9"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2eea"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2eeb"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2eec"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2eed"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2eee"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2eef"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ef0"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ef1"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ef2"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ef3"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ef4"), "a" : 20 }
{ "_id" : ObjectId("50634c529cf47e02347c2ef5"), "a" : 20 }
Type "it" for more
> db.exercise.insert( { "_id":"Jill", x : 1 } )
> db.exercise.findOne()
{ "_id" : "Jill", "x" : 1 }
> db.exercise.insert( { "_id":"Jill", x : 2 } )
E11000 duplicate key error index: test.exercise.$_id_  dup key: { : "Jill" }
> db.exercise.insert( { "_id" : 1.78 } )
> db.exercise.find()
{ "_id" : "Jill", "x" : 1 }
{ "_id" : 1.78 }
> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.people", "name" : "_id_" }
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.exercise", "name" : "_id_" }
> db.foo.insert( { a : { b : 2 } } )
> db.foo.find()
{ "_id" : ObjectId("506350809cf47e02347c32cb"), "a" : { "b" : 2 } }
(How to discover how 'find' works...)
> db.foo.find
function (query, fields, limit, skip, batchSize, options) {
    return new DBQuery(this._mongo, this._db, this, this._fullName, this._massageObject(query), fields, limit, skip, batchSize, options || this.getQueryOptions());
}
> db.foo.insert( { a : 100, b : 200 } )
> db.foo.find( { a : 100 } )
{ "_id" : ObjectId("506350f89cf47e02347c32cc"), "a" : 100, "b" : 200 }
> db.foo.find( { a : 100, b : 200 }, { a : true } )
{ "_id" : ObjectId("506350f89cf47e02347c32cc"), "a" : 100 }
> db.serverStatus()
{
"host" : "russ-elite-book:30000",
"version" : "2.2.0",
"process" : "mongod",
"pid" : 4305,
"uptime" : 6099,
"uptimeMillis" : NumberLong(6098936),
"uptimeEstimate" : 6030,
"localTime" : ISODate("2012-09-26T19:09:23.219Z"),
"locks" : {
"." : {
"timeLockedMicros" : {
"R" : NumberLong(152395),
"W" : NumberLong(798928)
},
"timeAcquiringMicros" : {
"R" : NumberLong(5791312),
"W" : NumberLong(13902)
}
},
"admin" : {
"timeLockedMicros" : {
},
"timeAcquiringMicros" : {
}
},
"local" : {
"timeLockedMicros" : {
"r" : NumberLong(5088),
"w" : NumberLong(0)
},
"timeAcquiringMicros" : {
"r" : NumberLong(829),
"w" : NumberLong(0)
}
},
"digg" : {
"timeLockedMicros" : {
"r" : NumberLong(12318),
"w" : NumberLong(2280581)
},
"timeAcquiringMicros" : {
"r" : NumberLong(339),
"w" : NumberLong(29432)
}
},
"test" : {
"timeLockedMicros" : {
"r" : NumberLong(2532),
"w" : NumberLong(1361803)
},
"timeAcquiringMicros" : {
"r" : NumberLong(144),
"w" : NumberLong(1140)
}
},
"training" : {
"timeLockedMicros" : {
"r" : NumberLong(3918),
"w" : NumberLong(1443396)
},
"timeAcquiringMicros" : {
"r" : NumberLong(235),
"w" : NumberLong(2509)
}
},
"twitter" : {
"timeLockedMicros" : {
"r" : NumberLong(3650),
"w" : NumberLong(2556026)
},
"timeAcquiringMicros" : {
"r" : NumberLong(206),
"w" : NumberLong(837704)
}
}
},
"globalLock" : {
"totalTime" : NumberLong("6098936000"),
"lockTime" : NumberLong(798928),
"currentQueue" : {
"total" : 0,
"readers" : 0,
"writers" : 0
},
"activeClients" : {
"total" : 0,
"readers" : 0,
"writers" : 0
}
},
"mem" : {
"bits" : 64,
"resident" : 142,
"virtual" : 1068,
"supported" : true,
"mapped" : 448,
"mappedWithJournal" : 896
},
"connections" : {
"current" : 2,
"available" : 817
},
"extra_info" : {
"note" : "fields vary by platform",
"heap_usage_bytes" : 65816224,
"page_faults" : 0
},
"indexCounters" : {
"btree" : {
"accesses" : 149948,
"hits" : 149948,
"misses" : 0,
"resets" : 0,
"missRatio" : 0
}
},
"backgroundFlushing" : {
"flushes" : 101,
"total_ms" : 171,
"average_ms" : 1.693069306930693,
"last_ms" : 0,
"last_finished" : ISODate("2012-09-26T19:08:44.294Z")
},
"cursors" : {
"totalOpen" : 0,
"clientCursors_size" : 0,
"timedOut" : 2
},
"network" : {
"bytesIn" : 108016512,
"bytesOut" : 19095,
"numRequests" : 75547
},
"opcounters" : {
"insert" : 65438,
"query" : 450,
"update" : 0,
...
"local" : {
"accessesNotInMemory" : 0,
"pageFaultExceptionsThrown" : 0
},
"test" : {
"accessesNotInMemory" : 0,
"pageFaultExceptionsThrown" : 0
},
"training" : {
"accessesNotInMemory" : 0,
"pageFaultExceptionsThrown" : 0
},
"twitter" : {
"accessesNotInMemory" : 0,
"pageFaultExceptionsThrown" : 0
}
},
"ok" : 1
}
How findOne() works:
> db.foo.findOne
function (query, fields, options) {
var cursor = this._mongo.find(this._fullName, this._massageObject(query) || {}, fields, -1, 0, 0, options || this.getQueryOptions());
if (!cursor.hasNext()) {
return null;
}
var ret = cursor.next();
if (cursor.hasNext()) {
throw "findOne has more than 1 result!";
}
if (ret.$err) {
throw "error " + tojson(ret);
}
return ret;
}
> db.foo.find( { "a.b" : 2 } ) { "_id" : ObjectId("506350809cf47e02347c32cb"), "a" : { "b" : 2 } } > db.foo.insert( { a : { b : 2 } } ) > db.foo.find( { "a.b" : 2 } ) { "_id" : ObjectId("506363a59cf47e02347c32ce"), "a" : { "b" : 2 } } > db.foo.find( { a : { b : 2 } } ) { "_id" : ObjectId("506363a59cf47e02347c32ce"), "a" : { "b" : 2 } } > db.foo.insert( { a : { b : 2, c : 1 } } ) > db.foo.find( { a : { b : 2 } } ) { "_id" : ObjectId("506363a59cf47e02347c32ce"), "a" : { "b" : 2 } } > db.foo.find( { "a.b" : 2 } ) { "_id" : ObjectId("506363a59cf47e02347c32ce"), "a" : { "b" : 2 } } { "_id" : ObjectId("506363d79cf47e02347c32cf"), "a" : { "b" : 2, "c" : 1 } }
This illustrates that a document,
{ "a" : { "b":2, "c":1 } }
is matched by { "a.b":2 } because that query is only concerned with matching that a.b is 2, not whether the rest of the document matches. By contrast, { "a":{"b":2} } says that the subdocument (underneath a) is closed, to include only { "b":2 }.
Matching operators:
> db.foo.drop()
true
> db.foo.insert( { a:100, b:200 } )
> db.foo.insert( { a:50, b:200 } )
> db.foo.find()
{ "_id" : ObjectId("506365b29cf47e02347c32d0"), "a" : 100, "b" : 200 }
{ "_id" : ObjectId("506365b79cf47e02347c32d1"), "a" : 50, "b" : 200 }
> db.foo.find( { a:{ $gte:60 } } )
{ "_id" : ObjectId("506365b29cf47e02347c32d0"), "a" : 100, "b" : 200 }
> db.foo.find( { a:{ $in: [50, 60] } } )
{ "_id" : ObjectId("506365b79cf47e02347c32d1"), "a" : 50, "b" : 200 }
1.
> use training
switched to db training
> show collections
scores
system.indexes
> db.scores.findOne()
{ "_id" : ObjectId("4c90f2543d937c033f424701"), "kind" : "quiz", "score" : 50, "student" : 0 }
> db.scores.find( { "score" : { $lt : 65 } } )
{ "_id" : ObjectId("4c90f2543d937c033f424701"), "kind" : "quiz", "score" : 50, "student" : 0 }
{ "_id" : ObjectId("4c90f2543d937c033f424703"), "kind" : "exam", "score" : 56, "student" : 0 }
{ "_id" : ObjectId("4c90f2543d937c033f424706"), "kind" : "exam", "score" : 58, "student" : 1 }
{ "_id" : ObjectId("4c90f2543d937c033f424709"), "kind" : "exam", "score" : 53, "student" : 2 }
{ "_id" : ObjectId("4c90f2543d937c033f42470a"), "kind" : "quiz", "score" : 58, "student" : 3 }
{ "_id" : ObjectId("4c90f2543d937c033f424710"), "kind" : "quiz", "score" : 54, "student" : 5 }
{ "_id" : ObjectId("4c90f2543d937c033f424711"), "kind" : "essay", "score" : 50, "student" : 5 }
{ "_id" : ObjectId("4c90f2543d937c033f424712"), "kind" : "exam", "score" : 50, "student" : 5 }
{ "_id" : ObjectId("4c90f2543d937c033f424714"), "kind" : "essay", "score" : 53, "student" : 6 }
{ "_id" : ObjectId("4c90f2543d937c033f424715"), "kind" : "exam", "score" : 51, "student" : 6 }
{ "_id" : ObjectId("4c90f2543d937c033f424718"), "kind" : "exam", "score" : 63, "student" : 7 }
{ "_id" : ObjectId("4c90f2543d937c033f424719"), "kind" : "quiz", "score" : 57, "student" : 8 }
{ "_id" : ObjectId("4c90f2543d937c033f42471e"), "kind" : "exam", "score" : 60, "student" : 9 }
{ "_id" : ObjectId("4c90f2543d937c033f42471f"), "kind" : "quiz", "score" : 50, "student" : 10 }
{ "_id" : ObjectId("4c90f2543d937c033f424722"), "kind" : "quiz", "score" : 64, "student" : 11 }
{ "_id" : ObjectId("4c90f2543d937c033f424725"), "kind" : "quiz", "score" : 59, "student" : 12 }
{ "_id" : ObjectId("4c90f2543d937c033f424729"), "kind" : "essay", "score" : 63, "student" : 13 }
{ "_id" : ObjectId("4c90f2543d937c033f42472e"), "kind" : "quiz", "score" : 54, "student" : 15 }
{ "_id" : ObjectId("4c90f2543d937c033f424731"), "kind" : "quiz", "score" : 54, "student" : 16 }
{ "_id" : ObjectId("4c90f2543d937c033f424734"), "kind" : "quiz", "score" : 61, "student" : 17 }
2.
> db.scores.find( { } ).sort( { "score" : 1 } ).limit( 1 )
{ "_id" : ObjectId("4c90f2543d937c033f424701"), "kind" : "quiz", "score" : 50, "student" : 0 }
> db.scores.find( { } ).sort( { "score" : -1 } ).limit( 1 )
{ "_id" : ObjectId("4c90f2543d937c033f42471c"), "kind" : "quiz", "score" : 99, "student" : 9 }
3.
> db.stories.find( { "shorturl" : { $gt : { "view_count" : 1000 } } } ).count()
10000
4.
> db.stories.find( { $or : [ { "media" : "news" }, { "media" : "images" } ] } ).count()
8986
> db.stories.find( { "topic.name" : "Comedy" } ).count()
422
> db.stories.find( { $or : [ { "media" : "news" }, { "media" : "images" } ], "topic.name" : "Comedy" } ).count()
308
5.
> db.stories.find( { $or : [ { "topic.name" : "Television" }, { "media" : "videos" } ] }
).count()
1218
> db.stuff.insert( { _id:123, "foo" : "bar" } )
> db.stuff.find()
{ "_id" : 123, "foo" : "bar" }
> db.stuff.update( { _id:123 }, { "hello" : "world" } )
> db.stuff.find()
{ "_id" : 123, "hello" : "world" }
This is "replace"--uncommon. This works thus:
> db.func.update
function (query, obj, upsert, multi) {
assert(query, "need a query");
assert(obj, "need an object");
var firstKey = null;
for (var k in obj) {
firstKey = k;
break;
}
if (firstKey != null && firstKey[0] == "$") {
this._validateObject(obj);
} else {
this._validateForStorage(obj);
}
if (typeof upsert === "object") {
assert(multi === undefined, "Fourth argument must be empty when specifying upsert and multi with an object.");
opts = upsert;
multi = opts.multi;
upsert = opts.upsert;
}
this._db._initExtraInfo();
this._mongo.update(this._fullName, query, obj, upsert ? true : false, multi ? true : false);
this._db._getExtraInfo("Updated");
}
This is updating with $set--the document is modified in place rather than replaced...
> db.stuff.update( { _id:123 }, { $set : { "foo" : "bar" } } )
> db.stuff.find()
{ "_id" : 123, "foo" : "bar", "hello" : "world" }
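For an actual upsert, pass true as the third argument so that a missing document gets created (a sketch; the _id value 124 is just an assumption):

> db.stuff.update( { _id:124 }, { $set : { "foo" : "baz" } }, true )
> db.stuff.find( { _id:124 } )
{ "_id" : 124, "foo" : "baz" }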
Pushing...
> db.stuff.insert( { a:1, b:[] } )
> db.stuff.find()
{ "_id" : 123, "foo" : "bar", "hello" : "world" }
{ "_id" : ObjectId("5063737c9cf47e02347c32d3"), "a" : 1, "b" : [ ] }
> db.stuff.update( { a : 1 }, { $push : { b : 2 } } )
> db.stuff.find()
{ "_id" : 123, "foo" : "bar", "hello" : "world" }
{ "_id" : ObjectId("5063737c9cf47e02347c32d3"), "a" : 1, "b" : [ 2 ] }
Updating, incrementing...
> db.stuff.insert( { _id:1, a : 10 } )
> db.stuff.find()
{ "_id" : 123, "foo" : "bar", "hello" : "world" }
{ "_id" : ObjectId("5063737c9cf47e02347c32d3"), "a" : 1, "b" : [ 2 ] }
{ "_id" : ObjectId("506373ca9cf47e02347c32d4"), "a" : 1, "c" : [ ] }
{ "_id" : 1, "a" : 10 }
> db.stuff.update( { _id:1 }, { $inc : { "a" : 5 } } )
> db.stuff.find()
{ "_id" : 123, "foo" : "bar", "hello" : "world" }
{ "_id" : ObjectId("5063737c9cf47e02347c32d3"), "a" : 1, "b" : [ 2 ] }
{ "_id" : ObjectId("506373ca9cf47e02347c32d4"), "a" : 1, "c" : [ ] }
{ "_id" : 1, "a" : 15 }
1.
> db.scores.update( { "score" : { $gt : 90 } }, { $set : { "grade" : "A" } }, false, true ) > db.scores.find( { "score" : { $gt : 90 } } ) { "_id" : ObjectId("4c90f2543d937c033f42470d"), "grade" : "A", "kind" : "quiz", "score" : 98, "student" : 4 } { "_id" : ObjectId("4c90f2543d937c033f424713"), "grade" : "A", "kind" : "quiz", "score" : 98, "student" : 6 } { "_id" : ObjectId("4c90f2543d937c033f42471c"), "grade" : "A", "kind" : "quiz", "score" : 99, "student" : 9 } { "_id" : ObjectId("4c90f2543d937c033f424724"), "grade" : "A", "kind" : "exam", "score" : 96, "student" : 11 } { "_id" : ObjectId("4c90f2543d937c033f42472c"), "grade" : "A", "kind" : "essay", "score" : 92, "student" : 14 } { "_id" : ObjectId("4c90f2543d937c033f42472f"), "grade" : "A", "kind" : "essay", "score" : 95, "student" : 15 } { "_id" : ObjectId("4c90f2543d937c033f424732"), "grade" : "A", "kind" : "essay", "score" : 94, "student" : 16 } { "_id" : ObjectId("4c90f2543d937c033f424748"), "grade" : "A", "kind" : "exam", "score" : 92, "student" : 23 } { "_id" : ObjectId("4c90f2543d937c033f42474d"), "grade" : "A", "kind" : "essay", "score" : 93, "student" : 25 } { "_id" : ObjectId("4c90f2543d937c033f424755"), "grade" : "A", "kind" : "quiz", "score" : 98, "student" : 28 } { "_id" : ObjectId("4c90f2543d937c033f424758"), "grade" : "A", "kind" : "quiz", "score" : 94, "student" : 29 } { "_id" : ObjectId("4c90f2543d937c033f424761"), "grade" : "A", "kind" : "quiz", "score" : 98, "student" : 32 } { "_id" : ObjectId("4c90f2543d937c033f424764"), "grade" : "A", "kind" : "quiz", "score" : 95, "student" : 33 } { "_id" : ObjectId("4c90f2543d937c033f424766"), "grade" : "A", "kind" : "exam", "score" : 91, "student" : 33 } { "_id" : ObjectId("4c90f2543d937c033f424768"), "grade" : "A", "kind" : "essay", "score" : 98, "student" : 34 } { "_id" : ObjectId("4c90f2543d937c033f424774"), "grade" : "A", "kind" : "essay", "score" : 93, "student" : 38 } { "_id" : ObjectId("4c90f2543d937c033f424776"), "grade" : "A", "kind" : "quiz", "score" : 91, "student" : 39 } { "_id" : ObjectId("4c90f2543d937c033f42477a"), "grade" : "A", "kind" : "essay", "score" : 96, "student" : 40 } { "_id" : ObjectId("4c90f2543d937c033f42477b"), "grade" : "A", "kind" : "exam", "score" : 98, "student" : 40 } { "_id" : ObjectId("4c90f2543d937c033f424788"), "grade" : "A", "kind" : "quiz", "score" : 98, "student" : 45 } Type "it" for more
Uh... this didn't really work. In fact, I goofed and got "grade" inserted, then fixed it. Trying the second half of the exercise, I never could get it to work. The query works; the update doesn't.
> db.scores.find( { $and : [ { "score" : { $lte : 90 } }, { "score" : { $gt : 80 } } ] }, { $set : { "grade" : "B" } }, false, true )
error: { "$err" : "Unsupported projection option: grade", "code" : 13097 }
Here, I kept doing "find" instead of update (as I was copying and pasting in order not to have to re-type).
> db.scores.find( { $and : [ { "score" : { $lte : 90 } }, { "score" : { $gt : 80 } } ] } )
{ "_id" : ObjectId("4c90f2543d937c033f424707"), "grade" : "B", "kind" : "quiz", "score" : 90, "student" : 2 }
{ "_id" : ObjectId("4c90f2543d937c033f42470f"), "grade" : "B", "kind" : "exam", "score" : 86, "student" : 4 }
{ "_id" : ObjectId("4c90f2543d937c033f4247b0"), "grade" : "B", "kind" : "essay", "score" : 85, "student" : 58 }
{ "_id" : ObjectId("4c90f2543d937c033f4247bf"), "grade" : "B", "kind" : "essay", "score" : 83, "student" : 63 }
{ "_id" : ObjectId("4c90f2543d937c033f4247d1"), "grade" : "B", "kind" : "essay", "score" : 85, "student" : 69 }
{ "_id" : ObjectId("4c90f2543d937c033f4247e6"), "grade" : "B", "kind" : "essay", "score" : 87, "student" : 76 }
{ "_id" : ObjectId("4c90f2543d937c033f4247f2"), "grade" : "B", "kind" : "essay", "score" : 84, "student" : 80 }
{ "_id" : ObjectId("4c90f2543d937c033f4247fb"), "grade" : "B", "kind" : "essay", "score" : 90, "student" : 83 }
{ "_id" : ObjectId("4c90f2543d937c033f4247fc"), "grade" : "B", "kind" : "exam", "score" : 88, "student" : 83 }
{ "_id" : ObjectId("4c90f2543d937c033f424800"), "grade" : "B", "kind" : "quiz", "score" : 90, "student" : 85 }
{ "_id" : ObjectId("4c90f2543d937c033f424803"), "grade" : "B", "kind" : "quiz", "score" : 84, "student" : 86 }
{ "_id" : ObjectId("4c90f2543d937c033f42480a"), "grade" : "B", "kind" : "essay", "score" : 90, "student" : 88 }
{ "_id" : ObjectId("4c90f2543d937c033f424817"), "grade" : "B", "kind" : "exam", "score" : 87, "student" : 92 }
{ "_id" : ObjectId("4c90f2543d937c033f424819"), "grade" : "B", "kind" : "essay", "score" : 84, "student" : 93 }
{ "_id" : ObjectId("4c90f2543d937c033f42481b"), "grade" : "B", "kind" : "quiz", "score" : 89, "student" : 94 }
{ "_id" : ObjectId("4c90f2543d937c033f42481d"), "grade" : "B", "kind" : "exam", "score" : 87, "student" : 94 }
{ "_id" : ObjectId("4c90f2543d937c033f424826"), "grade" : "B", "kind" : "exam", "score" : 87, "student" : 97 }
{ "_id" : ObjectId("4c90f2543d937c033f424829"), "grade" : "B", "kind" : "exam", "score" : 82, "student" : 98 }
{ "_id" : ObjectId("4c90f2543d937c033f42482e"), "grade" : "B", "kind" : "essay", "score" : 83, "student" : 100 }
{ "_id" : ObjectId("4c90f2543d937c033f42483b"), "grade" : "B", "kind" : "exam", "score" : 86, "student" : 104 }
Type "it" for more
2.
> db.scores.update( { "score" : { $lt : 60 } }, { $inc : { "score" : 10 } }, false, true ) > db.scores.find( { "score" : { $lt : 60 } } ) > db.scores.find( { "score" : { $lt : 70 } } ) { "_id" : ObjectId("4c90f2543d937c033f424701"), "kind" : "quiz", "score" : 60, "student" : 0 } { "_id" : ObjectId("4c90f2543d937c033f424703"), "kind" : "exam", "score" : 66, "student" : 0 } { "_id" : ObjectId("4c90f2543d937c033f424706"), "kind" : "exam", "score" : 68, "student" : 1 } { "_id" : ObjectId("4c90f2543d937c033f424709"), "kind" : "exam", "score" : 63, "student" : 2 } { "_id" : ObjectId("4c90f2543d937c033f42470a"), "kind" : "quiz", "score" : 68, "student" : 3 } { "_id" : ObjectId("4c90f2543d937c033f424710"), "kind" : "quiz", "score" : 64, "student" : 5 } { "_id" : ObjectId("4c90f2543d937c033f424711"), "kind" : "essay", "score" : 60, "student" : 5 } { "_id" : ObjectId("4c90f2543d937c033f424712"), "kind" : "exam", "score" : 60, "student" : 5 } { "_id" : ObjectId("4c90f2543d937c033f424714"), "kind" : "essay", "score" : 63, "student" : 6 } { "_id" : ObjectId("4c90f2543d937c033f424715"), "kind" : "exam", "score" : 61, "student" : 6 } { "_id" : ObjectId("4c90f2543d937c033f424718"), "kind" : "exam", "score" : 63, "student" : 7 } { "_id" : ObjectId("4c90f2543d937c033f424719"), "kind" : "quiz", "score" : 67, "student" : 8 } { "_id" : ObjectId("4c90f2543d937c033f42471e"), "kind" : "exam", "score" : 60, "student" : 9 } { "_id" : ObjectId("4c90f2543d937c033f42471f"), "kind" : "quiz", "score" : 60, "student" : 10 } { "_id" : ObjectId("4c90f2543d937c033f424722"), "kind" : "quiz", "score" : 64, "student" : 11 } { "_id" : ObjectId("4c90f2543d937c033f424725"), "kind" : "quiz", "score" : 60, "student" : 12 } { "_id" : ObjectId("4c90f2543d937c033f424729"), "kind" : "essay", "score" : 63, "student" : 13 } { "_id" : ObjectId("4c90f2543d937c033f42472e"), "kind" : "quiz", "score" : 64, "student" : 15 } { "_id" : ObjectId("4c90f2543d937c033f424731"), "kind" : "quiz", "score" : 64, "student" : 16 } { "_id" : ObjectId("4c90f2543d937c033f424734"), "kind" : "quiz", "score" : 61, "student" : 17 } Type "it" for more
Write locking, which is favored over read locking, exists since 2.2 at the per-database level. This means that if there's contention from lots of writes going on, it may be useful to move to a one-collection-per-database layout for Account, Address, Payment, Partner.
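A sketch of what that layout looks like from the shell (the database and collection names merely echo the examples above):

> var accounts = db.getSiblingDB( "accounts" )
> var payments = db.getSiblingDB( "payments" )
> accounts.account.insert( { owner : "Smith" } )    // takes the accounts database's write lock
> payments.payment.insert( { amount : 10 } )        // takes the payments lock; no contention with the above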
How does finding a document work...
> db.tweets.find( { "user.followers_count" : 1000 } ).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 8,
"nscannedObjects" : 51428,
"nscanned" : 51428,
"nscannedObjectsAllPlans" : 51428,
"nscannedAllPlans" : 51428,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 134,
"indexBounds" : {
},
"server" : "russ-elite-book:30000"
}
Shows that we looked through 51428 documents and it took 134 milliseconds. This isn't good. Create an index:
> db.tweets.ensureIndex( { "user.followers_count" : 1 } )
> show collections
system.indexes
tweets
> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "twitter.tweets", "name" : "_id_" }
{ "v" : 1, "key" : { "user.followers_count" : 1 }, "ns" : "twitter.tweets", "name" : "user.followers_count_1" }
Then, we re-run the query:
> db.tweets.find( { "user.followers_count" : 1000 } ).explain()
{
"cursor" : "BtreeCursor user.followers_count_1",
"isMultiKey" : false,
"n" : 8,
"nscannedObjects" : 8,
"nscanned" : 8,
"nscannedObjectsAllPlans" : 8,
"nscannedAllPlans" : 8,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"user.followers_count" : [
[
1000,
1000
]
]
},
"server" : "russ-elite-book:30000"
}
...and notice the difference. We only needed to look through 8 documents and it took "no time" at all.
Multikey indices; tags is an array...
> db.foo.insert( { name:"Raleigh", "tags": [ "north", "carolina", "unc" ] } ) > db.foo.ensureIndex( { tags : 1 } ) > db.foo.find( { tags : "north" } ).explain() { "cursor" : "BtreeCursor tags_1", "isMultiKey" : true, "n" : 1, "nscannedObjects" : 1, "nscanned" : 1, "nscannedObjectsAllPlans" : 1, "nscannedAllPlans" : 1, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 0, "indexBounds" : { "tags" : [ [ "north", "north" ] ] }, "server" : "russ-elite-book:30000" }
Only 64 indices per collection.
Polish off indices...
> db.foo.getIndexes()
[
    { "v" : 1, "key" : { "_id" : 1 }, "ns" : "twitter.foo", "name" : "_id_" },
    { "v" : 1, "key" : { "tags" : 1 }, "ns" : "twitter.foo", "name" : "tags_1" }
]
> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "twitter.tweets", "name" : "_id_" }
{ "v" : 1, "key" : { "user.followers_count" : 1 }, "ns" : "twitter.tweets", "name" : "user.followers_count_1" }
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "twitter.foo", "name" : "_id_" }
{ "v" : 1, "key" : { "tags" : 1 }, "ns" : "twitter.foo", "name" : "tags_1" }
> db.foo.reIndex()
...
A sparse index is one in which documents missing the fields explicitly referenced in the index are not included, which narrows the search space. For instance, imagine the following document:
{ title : "Don't Stop Me Now", artist : "Queen", metadata : { genre : "rock", length : 120, bps : 120, key : "A" } }
If the index is built with the fields in the subdocument, metadata, thus:
> db.foo.ensureIndex( { "metatdata.genre" : 1, > ..."metadata.length" : 1, > ..."metadata.bps" : 1, > ..."metadata.key" : 1 }, > ...{ sparse : true } } )
...those titles for which there's no record of what key they are in will not encumber the index.
The problem with doing this, however, is that as new fields are conveniently added to the schema, the whole collection must be traversed for each new index added. The advantages of sparse indices are:
...while the disadvantages are:
One solution to this problem, to continue the example, is multikeying, which groups elements in an array (instead of a subdocument as here), making the metadata items just part of the same, single field:
{ title : "Don't Stop Me Now", artist : "Queen", metadata : [ { genre : "rock" }, { length : 120 }, { bps : 120 }, { key : "A" } ] }
The index is built and finds conducted:
> db.foo.ensureIndex( { "metadata" : 1 } )
> db.foo.find( { metadata : { length : 120 } } )
There is a different problem with this approach: anything but a direct match on an element reverts to the basic cursor, i.e. simple traversal of the entire collection again. The advantages of this approach were:
...but the disadvantages are:
To have but a single index, yet one that's powerful enough to match on fields no matter how they are represented, use this approach. In essence it's an array of key/value pairs.
{ title : "Don't Stop Me Now", artist : "Queen", metadata : [ { key : "genre", value : "rock" }, { key : "length", value : 120 }, { key : "bps", value : 120 }, { key : "key", value : "A" } ] }
The index is built thus:
> db.foo.ensureIndex( { "metatdata.key" : 1, "metadata.value" : 1 } ) > db.foo.find( { metatdata : { duration : 120 } } )
The advantages to this approach are:
...while the (inevitable) disadvantages are:
The new document is uglier and less straightforward; storage is amplified as "key" and "value" are repeated everywhere, over and over.
(I took no notes; http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart is a good resource for recovering SQL addicts.)
(I took few notes because I'm really a Java guy.)
(Yup, the duplicate section number was in fact in the otherwise very nice 10gen manual too. Sorry.)
There's a heartbeat that goes around between mongod processes.
Replication happens in a few seconds or a few minutes.
The primary replica is "elected" by all the active nodes.
By default, you can only read and write via the primary (reads from secondaries must be explicitly enabled with slaveOk).
If there's no majority and the primary isn't up, then the system goes into a WAIT state.
How to elect a new primary...
After the former primary comes back up, it's only a secondary.
An arbiter is a node that isn't a replica (performs no database function), but can vote for a new primary.
By default, all replicas start out with equal priority (1).
A capped collection is a circular buffer.
You can copy the database by freezing the replica for a sufficient number of seconds during the copy (cp -R).
> db.runCommand( { getLastError:1, w:3, wtimeout:1000 } )
...this must not return until the write has been committed to at least 3 replicas (or until the 1000ms timeout expires).
What's in the database?
db
├── digg.0
├── digg.1
├── digg.ns
├── journal
│   ├── j._0
│   ├── prealloc.1
│   └── prealloc.2
├── mongod.lock
├── test.0
├── test.1
├── test.ns
├── training.0
├── training.1
├── training.ns
├── twitter.0
├── twitter.1
├── twitter.2
└── twitter.ns
1.
Three.
2.
DC-A | DC-B
-----+-----
  0  |  5
  1  |  4
  2  |  3
  3  |  2
  4  |  1
  5  |  0
3.
Think about the scenarios above and the answers below.
4.
Three and two.
5.
Nothing can happen as there aren't enough voting members.
6.
?
(I offer a live example of setting up four replica nodes and an arbiter across two pieces of hardware at http://www.javahotchocolate.com/notes/mongodb-replica.html).
Note in what's going on below that I was already using port 30000 in order not to monkey with MongoDB running on 27017.
Start all of this by creating a few subdirectories for the replicas to live in:
$ mkdir rs1 rs2 rs3
#!/bin/sh
# subdirectories data/rs1, rs2 and rs3 must already exist.
../bin/mongod --port 30000 --replSet foo --logpath "1.log" --dbpath /data/rs1 --fork
../bin/mongod --port 30001 --replSet foo --logpath "2.log" --dbpath /data/rs2 --fork
../bin/mongod --port 30002 --replSet foo --logpath "3.log" --dbpath /data/rs3 --fork
It does no good to feed this to the mongo shell from the command line, since after launching the shell, config will still not be defined. Instead, simply copy and paste it into the shell.
config = { _id:"foo", members: [ { _id:0, host:"localhost:30000" }, { _id:1, host:"localhost:30001" }, { _id:2, host:"localhost:30002" } ] }
$ mongo --port 30000
> (paste replica-config.js here so that config is defined)
> rs.initiate( config )
> rs.status()
Here's the illustration...
bash shell work...
~/mongo-training/data $ ./erect-replicas.sh
forked process: 5669
all output going to: /home/russ/mongo-training/data/1.log
log file [/home/russ/mongo-training/data/1.log] exists; copied to temporary file [/home/russ/mongo-training/data/1.log.2012-09-27T21-04-17]
child process started successfully, parent exiting
forked process: 5717
all output going to: /home/russ/mongo-training/data/2.log
log file [/home/russ/mongo-training/data/2.log] exists; copied to temporary file [/home/russ/mongo-training/data/2.log.2012-09-27T21-04-17]
forked process: 5724
all output going to: /home/russ/mongo-training/data/3.log
log file [/home/russ/mongo-training/data/3.log] exists; copied to temporary file [/home/russ/mongo-training/data/3.log.2012-09-27T21-04-17]
Mongo shell work...
~/mongo-training/data $ ../bin/mongo --port 30000
MongoDB shell version: 2.2.0
connecting to: 127.0.0.1:30000/test
> config =
... { _id:"foo", members:
...     [
...         { _id:0, host:"localhost:30000" },
...         { _id:1, host:"localhost:30001" },
...         { _id:2, host:"localhost:30002" }
...     ]
... }
{
    "_id" : "foo",
    "members" : [
        { "_id" : 0, "host" : "localhost:30000" },
        { "_id" : 1, "host" : "localhost:30001" },
        { "_id" : 2, "host" : "localhost:30002" }
    ]
}
> rs.initiate( config )
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}
> rs.status()
{
    "set" : "foo",
    "date" : ISODate("2012-09-27T21:12:22Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "localhost:30000",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 485,
            "optime" : Timestamp(1348780280000, 1),
            "optimeDate" : ISODate("2012-09-27T21:11:20Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "localhost:30001",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 62,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2012-09-27T21:12:21Z"),
            "pingMs" : 62
        },
        {
            "_id" : 2,
            "name" : "localhost:30002",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 62,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2012-09-27T21:12:22Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}
foo:PRIMARY>
Looking at replica status...
foo:PRIMARY> rs.conf()
{
"_id" : "foo",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "localhost:30000"
},
{
"_id" : 1,
"host" : "localhost:30001"
},
{
"_id" : 2,
"host" : "localhost:30002"
}
]
}
Here's how to modify a configuration element:
foo:PRIMARY> rs.conf() { "_id" : "foo", "version" : 1, "members" : [ { "_id" : 0, "host" : "localhost:30000" }, { "_id" : 1, "host" : "localhost:30001" }, { "_id" : 2, "host" : "localhost:30002" } ] } foo:PRIMARY> var config=rs.conf() foo:PRIMARY> config.members[ 2 ].priority = 0 0 foo:PRIMARY> rs.reconfig( config ) Thu Sep 27 16:00:47 DBClientCursor::init call() failed Thu Sep 27 16:00:47 query failed : admin.$cmd { replSetReconfig: { _id: "foo", version: 2, members: [ { _id: 0, host: "localhost:30000" }, { _id: 1, host: "localhost:30001" }, { _id: 2, host: "localhost:30002", priority: 0.0 } ] } } to: 127.0.0.1:30000 Thu Sep 27 16:00:47 trying reconnect to 127.0.0.1:30000 Thu Sep 27 16:00:47 reconnect 127.0.0.1:30000 ok reconnected to server after rs command (which is normal) foo:PRIMARY> rs.conf() { "_id" : "foo", "version" : 2, "members" : [ { "_id" : 0, "host" : "localhost:30000" }, { "_id" : 1, "host" : "localhost:30001" }, { "_id" : 2, "host" : "localhost:30002", "priority" : 0 } ] } foo:PRIMARY>
As soon as you reconfigure, the shell loses connection with the primary until it reconnects.
foo:PRIMARY> rs.status()
{
"set" : "foo",
"date" : ISODate("2012-09-27T22:02:39Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "localhost:30000",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 3502,
"optime" : Timestamp(1348783247000, 1),
"optimeDate" : ISODate("2012-09-27T22:00:47Z"),
"self" : true
},
{
"_id" : 1,
"name" : "localhost:30001",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 110,
"optime" : Timestamp(1348783247000, 1),
"optimeDate" : ISODate("2012-09-27T22:00:47Z"),
"lastHeartbeat" : ISODate("2012-09-27T22:02:39Z"),
"pingMs" : 0,
"errmsg" : "syncing to: localhost:30000"
},
{
"_id" : 2,
"name" : "localhost:30002",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 112,
"optime" : Timestamp(1348783247000, 1),
"optimeDate" : ISODate("2012-09-27T22:00:47Z"),
"lastHeartbeat" : ISODate("2012-09-27T22:02:39Z"),
"pingMs" : 0,
"errmsg" : "syncing to: localhost:30000"
}
],
"ok" : 1
}
The errmsg isn't really an error at all.
foo:PRIMARY> var config=rs.conf()
foo:PRIMARY> config.members[ 2 ].hidden = true
true
foo:PRIMARY> rs.reconfig( config )
Thu Sep 27 16:08:06 DBClientCursor::init call() failed
Thu Sep 27 16:08:06 query failed : admin.$cmd { replSetReconfig: { _id: "foo", version: 3, members: [ { _id: 0, host: "localhost:30000" }, { _id: 1, host: "localhost:30001" }, { _id: 2, host: "localhost:30002", priority: 0.0, hidden: true } ] } } to: 127.0.0.1:30000
Thu Sep 27 16:08:06 trying reconnect to 127.0.0.1:30000
Thu Sep 27 16:08:06 reconnect 127.0.0.1:30000 ok
reconnected to server after rs command (which is normal)
foo:PRIMARY> rs.conf()
{
    "_id" : "foo",
    "version" : 3,
    "members" : [
        { "_id" : 0, "host" : "localhost:30000" },
        { "_id" : 1, "host" : "localhost:30001" },
        { "_id" : 2, "host" : "localhost:30002", "priority" : 0, "hidden" : true }
    ]
}
foo:PRIMARY>
Looking at the oplog...
foo:PRIMARY> use local
switched to db local
foo:PRIMARY> show tables
oplog.rs
slaves
system.indexes
system.replset
foo:PRIMARY> db.me.find()
foo:PRIMARY> use twitter
switched to db twitter
foo:PRIMARY> db.tweets.insert( { "name", "Smith" } )
Thu Sep 27 16:11:28 SyntaxError: missing : after property id (shell):1
foo:PRIMARY> use local
switched to db local
foo:PRIMARY> show collections
oplog.rs
slaves
system.indexes
system.replset
foo:PRIMARY> db.oplog.rs.find()
{ "ts" : Timestamp(1348780280000, 1), "h" : NumberLong(0), "op" : "n", "ns" : "", "o" : { "msg" : "initiating set" } }
{ "ts" : Timestamp(1348783247000, 1), "h" : NumberLong("6246938530832973841"), "op" : "n", "ns" : "", "o" : { "msg" : "Reconfig set", "version" : 2 } }
{ "ts" : Timestamp(1348783686000, 1), "h" : NumberLong("-9112696901504822284"), "op" : "n", "ns" : "", "o" : { "msg" : "Reconfig set", "version" : 3 } }
This isn't explained very well. bind_ip is used particularly when the host has more than one IP address (interface) associated with it and you don't want MongoDB listening on every one of them, but only on one or two. It accepts a comma-delimited list.
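For example (a sketch; the 10.0.0.5 address is an assumption), in mongodb.conf:

bind_ip = 127.0.0.1,10.0.0.5

...or on the command line:

$ mongod --bind_ip 127.0.0.1,10.0.0.5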
This example isn't very complete, but here's how. To add an arbiter, you need
Launch it thus:
$ mongod --port 37016 --replSet foo
Add it using the MongoDB shell:
$ mongo --port 30000
foo:PRIMARY> rs.addArb( "localhost:37021" )
You'll see the arbiter using rs.status() along with the rest of the replica set details.
1.
foo:PRIMARY> config.members[ 2 ].slaveDelay = 60
60
foo:PRIMARY> rs.reconfig( config )
Thu Sep 27 16:24:35 DBClientCursor::init call() failed
Thu Sep 27 16:24:35 query failed : admin.$cmd { replSetReconfig: { _id: "foo", version: 4, members: [ { _id: 0, host: "localhost:30000" }, { _id: 1, host: "localhost:30001" }, { _id: 2, host: "localhost:30002", priority: 0.0, hidden: true, slaveDelay: 60.0 } ] } } to: 127.0.0.1:30000
Thu Sep 27 16:24:35 trying reconnect to 127.0.0.1:30000
Thu Sep 27 16:24:35 reconnect 127.0.0.1:30000 ok
reconnected to server after rs command (which is normal)
foo:PRIMARY> rs.conf()
{
    "_id" : "foo",
    "version" : 4,
    "members" : [
        { "_id" : 0, "host" : "localhost:30000" },
        { "_id" : 1, "host" : "localhost:30001" },
        { "_id" : 2, "host" : "localhost:30002", "priority" : 0, "slaveDelay" : 60, "hidden" : true }
    ]
}
2.
foo:PRIMARY> rs.status() { "set" : "foo", "date" : ISODate("2012-09-27T22:25:43Z"), "myState" : 1, "members" : [ { "_id" : 0, "name" : "localhost:30000", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 4886, "optime" : Timestamp(1348784675000, 1), "optimeDate" : ISODate("2012-09-27T22:24:35Z"), "self" : true }, { "_id" : 1, "name" : "localhost:30001", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 66, "optime" : Timestamp(1348784675000, 1), "optimeDate" : ISODate("2012-09-27T22:24:35Z"), "lastHeartbeat" : ISODate("2012-09-27T22:25:41Z"), "pingMs" : 0, "errmsg" : "syncing to: localhost:30000" }, { "_id" : 2, "name" : "localhost:30002", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 66, "optime" : Timestamp(1348784675000, 1), "optimeDate" : ISODate("2012-09-27T22:24:35Z"), "lastHeartbeat" : ISODate("2012-09-27T22:25:41Z"), "pingMs" : 0, "errmsg" : "syncing to: localhost:30000" } ], "ok" : 1 } foo:PRIMARY> db.adminCommand( { replSetStepDown:1 } ) Thu Sep 27 16:30:12 DBClientCursor::init call() failed Thu Sep 27 16:30:12 query failed : admin.$cmd { replSetStepDown: 1.0 } to: 127.0.0.1:30000 Thu Sep 27 16:30:12 Error: error doing query: failed src/mongo/shell/collection.js:155 Thu Sep 27 16:30:12 trying reconnect to 127.0.0.1:30000 Thu Sep 27 16:30:12 reconnect 127.0.0.1:30000 ok foo:SECONDARY> rs.status() { "set" : "foo", "date" : ISODate("2012-09-27T22:32:45Z"), "myState" : 2, "syncingTo" : "localhost:30001", "members" : [ { "_id" : 0, "name" : "localhost:30000", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 5308, "optime" : Timestamp(1348784675000, 1), "optimeDate" : ISODate("2012-09-27T22:24:35Z"), "self" : true }, { "_id" : 1, "name" : "localhost:30001", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 150, "optime" : Timestamp(1348784675000, 1), "optimeDate" : ISODate("2012-09-27T22:24:35Z"), "lastHeartbeat" : ISODate("2012-09-27T22:32:43Z"), "pingMs" : 0 }, { "_id" : 2, "name" : "localhost:30002", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 150, "optime" : Timestamp(1348784675000, 1), "optimeDate" : ISODate("2012-09-27T22:24:35Z"), "lastHeartbeat" : ISODate("2012-09-27T22:32:43Z"), "pingMs" : 0 } ], "ok" : 1 }
3.
~/mongo-training/data $ ps -ef | grep [m]ongo
mongodb   1343     1  0 09:11 ?        00:01:08 /usr/bin/mongod --config /etc/mongodb.conf
russ      5490     1  0 14:57 ?        00:00:38 ../bin/mongod --port 30001 --replSet foo --logpath 2.log --dbpath /data/rs2 --fork
russ      5578     1  0 14:59 ?        00:00:37 ../bin/mongod --port 30002 --replSet foo --logpath 3.log --dbpath /data/rs3 --fork
russ      5669     1  0 15:04 ?        00:00:30 ../bin/mongod --port 30000 --replSet foo --logpath 1.log --dbpath /data/rs1 --fork
russ      5767  2085  0 15:08 pts/2    00:00:00 ../bin/mongo --port 30000
russ      7936  5124  0 16:26 pts/4    00:00:00 mongo --port 30001
~/mongo-training/data $ kill -9 5669
And, from the Mongo shell that was looking at the mongod process we just killed...
foo:SECONDARY> rs.status()
Thu Sep 27 16:36:31 DBClientCursor::init call() failed
Thu Sep 27 16:36:31 query failed : admin.$cmd { replSetGetStatus: 1.0 } to: 127.0.0.1:30000
Thu Sep 27 16:36:31 Error: error doing query: failed src/mongo/shell/collection.js:155
Thu Sep 27 16:36:31 trying reconnect to 127.0.0.1:30000
Thu Sep 27 16:36:31 reconnect 127.0.0.1:30000 failed couldn't connect to server 127.0.0.1:30000
But, from the original secondary, now become primary (in exercise 2):
foo:PRIMARY> rs.status()
{
"set" : "foo",
"date" : ISODate("2012-09-27T22:37:36Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "localhost:30000",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : Timestamp(1348784675000, 1),
"optimeDate" : ISODate("2012-09-27T22:24:35Z"),
"lastHeartbeat" : ISODate("2012-09-27T22:35:46Z"),
"pingMs" : 0,
"errmsg" : "socket exception [CONNECT_ERROR] for localhost:30000"
},
{
"_id" : 1,
"name" : "localhost:30001",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 5985,
"optime" : Timestamp(1348784675000, 1),
"optimeDate" : ISODate("2012-09-27T22:24:35Z"),
"self" : true
},
{
"_id" : 2,
"name" : "localhost:30002",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 780,
"optime" : Timestamp(1348784675000, 1),
"optimeDate" : ISODate("2012-09-27T22:24:35Z"),
"lastHeartbeat" : ISODate("2012-09-27T22:37:34Z"),
"pingMs" : 0
}
],
"ok" : 1
}
4. Now, use the original command that created the original primary (which we've just killed) to set a replica back up:
~/mongo-training/data $ cat erect-replicas.sh
#!/bin/sh
# subdirectories data/rs1, rs2 and rs3 must already exist.
../bin/mongod --port 30000 --replSet foo --logpath "1.log" --dbpath /data/rs1 --fork
../bin/mongod --port 30001 --replSet foo --logpath "2.log" --dbpath /data/rs2 --fork
../bin/mongod --port 30002 --replSet foo --logpath "3.log" --dbpath /data/rs3 --fork
~/mongo-training/data $ ../bin/mongod --port 30000 --replSet foo --logpath "1.log" --dbpath /data/rs1 --fork
forked process: 8767
all output going to: /home/russ/mongo-training/data/1.log
log file [/home/russ/mongo-training/data/1.log] exists; copied to temporary file [/home/russ/mongo-training/data/1.log.2012-09-27T22-39-11]
child process started successfully, parent exiting
No! Don't do rs.add() for a replica that already existed. This is not the exercise. Create a brand new replica! So, kill this one for now and don't use it.
~/mongo-training/data $ ps -ef | grep [m]ongo
mongodb   1343     1  0 09:11 ?        00:01:09 /usr/bin/mongod --config /etc/mongodb.conf
russ      5490     1  0 14:57 ?        00:00:39 ../bin/mongod --port 30001 --replSet foo --logpath 2.log --dbpath /data/rs2 --fork
russ      5578     1  0 14:59 ?        00:00:38 ../bin/mongod --port 30002 --replSet foo --logpath 3.log --dbpath /data/rs3 --fork
russ      5767  2085  0 15:08 pts/2    00:00:00 ../bin/mongo --port 30000
russ      7936  5124  0 16:26 pts/4    00:00:00 mongo --port 30001
russ      8767     1  1 16:39 ?        00:00:02 ../bin/mongod --port 30000 --replSet foo --logpath 1.log --dbpath /data/rs1 --fork
~/mongo-training/data $ kill -9 8767
~/mongo-training/data $ ../bin/mongod --port 30003 --replSet foo --logpath "4.log" --dbpath /data/rs4 --fork
forked process: 9031
all output going to: /home/russ/mongo-training/data/4.log
Add the new replica...
foo:PRIMARY> rs.add( "localhost:30003" )
{
"errmsg" : "exception: need most members up to reconfigure, not ok : localhost:30003",
"code" : 13144,
"ok" : 0
}
Oops, didn't create the new subdirectory, so it didn't run:
~/mongo-training/data $ mkdir rs4
~/mongo-training/data $ ../bin/mongod --port 30003 --replSet foo --logpath "4.log" --dbpath /data/rs4 --fork
forked process: 9352
all output going to: /home/russ/mongo-training/data/4.log
log file [/home/russ/mongo-training/data/4.log] exists; copied to temporary file [/home/russ/mongo-training/data/4.log.2012-09-27T22-46-35]
child process started successfully, parent exiting
Try it again:
foo:PRIMARY> rs.add( "localhost:30003" )
{ "down" : [ "localhost:30000" ], "ok" : 1 }
foo:PRIMARY> rs.status()
{
    "set" : "foo",
    "date" : ISODate("2012-09-27T22:49:27Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "localhost:30000",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : Timestamp(1348784675000, 1),
            "optimeDate" : ISODate("2012-09-27T22:24:35Z"),
            "lastHeartbeat" : ISODate("2012-09-27T22:44:05Z"),
            "pingMs" : 0,
            "errmsg" : "socket exception [CONNECT_ERROR] for localhost:30000"
        },
        {
            "_id" : 1,
            "name" : "localhost:30001",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 6696,
            "optime" : Timestamp(1348786131000, 1),
            "optimeDate" : ISODate("2012-09-27T22:48:51Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "localhost:30002",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 1491,
            "optime" : Timestamp(1348784675000, 1),
            "optimeDate" : ISODate("2012-09-27T22:24:35Z"),
            "lastHeartbeat" : ISODate("2012-09-27T22:49:27Z"),
            "pingMs" : 0
        },
        {
            "_id" : 3,
            "name" : "localhost:30003",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 36,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2012-09-27T22:49:25Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}
After a few minutes, this will change to a full blown SECONDARY:
foo:PRIMARY> rs.status()
{
"set" : "foo",
"date" : ISODate("2012-09-27T22:49:27Z"),
"myState" : 1,
"members" : [
...
{
"_id" : 3,
"name" : "localhost:30003",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 122,
"optime" : Timestamp(1348786131000, 1),
"optimeDate" : ISODate("2012-09-27T22:48:51Z"),
"lastHeartbeat" : ISODate("2012-09-27T22:50:51Z"),
"pingMs" : 0
}
],
"ok" : 1
}
This would be the real way to set up replicas (as compared to what we did above), i.e.: one node per hardware host. The steps to do this are:
$ mongod --config config-file # (typically /etc/mongodb.conf)
$ mongo [ --port port-number ] [ --host hostname ]
A simple configuration file might contain:
port=37018
replSet=myset
logpath=/mongodb/data/myset.log
dbpath=/mongodb/data/americas
fork=true
logappend=true
See page 35.
RAID 10 combines RAID 1 (mirroring) with RAID 0 (striping).
A good idea is to use a couple of replicas to stand in for a back-up.
flushes and comes back...
foo:PRIMARY> use admin
switched to db admin
foo:PRIMARY> db.runCommand( { fsync:1 } )
{ "numFiles" : 6, "ok" : 1 }
with lock, locks against writes...
foo:PRIMARY> db.runCommand( { fsync:1, lock:true } )
{
    "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
    "seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
    "ok" : 1
}
foo:PRIMARY> db.fsyncUnlock()
{ "ok" : 1, "info" : "unlock completed" }
(Some monitoring yesterday afternoon already...)
Using mongostat...
~/mongo-training/data $ ../bin/mongostat --port 30000 2
connected to: 127.0.0.1:30000
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
0 0 0 0 0 0 0 448m 1g 95m 0 twitter:0.0% 0 0|0 0|0 31b 1k 2 10:08:49
0 0 0 0 0 0 0 448m 1g 95m 0 twitter:0.0% 0 0|0 0|0 31b 1k 2 10:08:51
0 0 0 0 0 0 0 448m 1g 95m 0 twitter:0.0% 0 0|0 0|0 31b 1k 2 10:08:53
0 0 0 0 0 0 0 448m 1g 95m 0 twitter:0.0% 0 0|0 0|0 31b 1k 2 10:08:55
^C
mongotop...
~/mongo-training/data $ ../bin/mongotop --port 30000
connected to: 127.0.0.1:30000
ns total read write 2012-09-28T16:10:17
twitter.system.namespaces 0ms 0ms 0ms
twitter.system.indexes 0ms 0ms 0ms
training.system.namespaces 0ms 0ms 0ms
training.system.indexes 0ms 0ms 0ms
test.system.namespaces 0ms 0ms 0ms
test.system.indexes 0ms 0ms 0ms
^C
> use test
switched to db test
> show collections
exercise
foo
people
system.indexes
> db.setProfilingLevel( 1, 250 )
{ "was" : 0, "slowms" : 100, "ok" : 1 }
> db.foo.insert( { "party":"pooper" } )
> db.foo.find().pretty()
{ "_id" : ObjectId("506365b29cf47e02347c32d0"), "a" : 100, "b" : 200 }
{ "_id" : ObjectId("506365b79cf47e02347c32d1"), "a" : 50, "b" : 200 }
{ "_id" : ObjectId("5063674f9cf47e02347c32d2"), "a" : 100 }
{ "_id" : ObjectId("5065ce8913c1801431eb47fe"), "party" : "pooper" }
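Operations slower than the threshold land in the current database's system.profile collection; a quick way to inspect the most recent entry (a sketch):

> db.system.profile.find().sort( { ts : -1 } ).limit( 1 ).pretty()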
A shard consists of a replica set (or even just a single node, but that's not reliable). Shards are based on collections, since they are formulated using a shard key, taken as one or more fields, and are therefore of a single collection.
See http://www.mongodb.org/display/DOCS/Sharding+Introduction.
If you wish a replica set to be a shard, you add an option. See page 31, second line:
$ mongod --replSet ... --shardsvr
Sharding makes use of:
* Note, however, that this makes problems a lot harder to debug. Do this
for performance, but not in developer or QA environments where you would be better
off just using a single mongos.
** In developer and QA environments, it's unnecessary to erect more than
one configuration server.
The mongos provides an interface for applications to interact with sharded clusters hiding the complexity of data partitioning. It receives queries from the applications and uses metadata from the configuration server to route queries to appropriate mongod instances (on the various replica nodes). From the application's perspective, mongos behaves identically to a mongod, but it is, of course, more of a router to one or more mongods.
A mongos instance is lightweight and doesn't require a data directory. It can be run on an application server or even a server running a mongod process. By default, it runs on port 27017.
The mongos binary consists of a balancer and a router. Everything else is mongod. The configuration servers are mongod binaries (as shown on this page).
          | shard 1 | shard 2
documents | 1-50    | 51-100
          | s1a     | s2a
          | s1b     | s2b
          | s1c     | s2c
Note that a document could switch shards. Sharding doesn't work at the document level, but at the chunk level.
Shard 1, documents 1-50, contains 10 chunks (example hypothesized); shard 2, documents 51-100 in twenty chunks (but was originally 10). The balancer may ask a shard how many chunks. If the difference is greater than 8, then some chunks will be moved from the larger shard to the smaller one. After chunk migration, the migrated chunks exist in both shards. The map is updated at the configuration servers, then the unneeded chunks are removed from the shard giving them up. Even if the system went down before the delete happened, the shard will no longer respond to requests for the migrated chunks.
Chunks are (up to) 64MB; they get split as soon as they grow bigger than that.
MongoDB 2.2 has "tagged" sharding that obviates maintaining our continent distinction. If you wish to shard such that all countries' documents end up in a shard whose primary is located in a data center on a particular continent, you'd need to tag them (since using the ISO country code isn't going to result naturally in all of Europe ending upon in the EMEA data center). A solution might be to create a continent code to go along with the country code, but with tagging, that's not necessary.
Run the mongos binary on each application server.
A mongod, called a configuration (or config) server, maintains sharded-cluster metadata in a configuration database for the mongos instances. A sharded cluster operates with a group of three configuration servers that use a two-phase commit process, ensuring immediate consistency and reliability. For testing, it's possible to deploy with a single configuration server, but this isn't recommended for production, as that mongod would be a single point of failure.
Each configuration server instance can run alongside the usual mongod instance of a common replica node.
Mongo only talks to C1 (of the three configuration servers shown). When that's gone, it talks to C2, etc.
If the query contains the shard key, it will go directly to the shard containing it. See this presentation:
http://www.slideshare.net/fullscreen/TylerBrock/sharding-mongo-dc-2012/71
Routed by shard key:
> db.users.find( { email:"[email protected]" } );
How to choose shard key?
See bad example (in slide presentation).
{ node:1 }
{ node:1, time:1 }
mongos sorts everything in memory.
Try to make it so that writes are distributed, but reads go directly to a single shard.
...how to set up MongoDB sharding. This assumes the presence of one (or more) existing replica sets, ready to be inducted into a shard. In this example of setting up an existing replica set in one shard, our replica set is named "my-replica". Three different port numbers figure in what follows: the configuration server's port and the mongos' port, both created here, and the port of any node of the existing replica set, whose set-up is not shown here (see higher up in this document for how to do that).
$ mongod --configsvr --dbpath path --port port
...where
$ mongos --configdb hostname:port [, hostname:port, hostname:port ] \
--port port \
--logpath path
...where
Repeat this step for as many instances of mongos you need. As noted elsewhere, this would likely be a single one for testing, but perhaps one each on application nodes in production.
$ mongo --port port
> sh.addShard( "my-replica/hostname:port" )
...where
Repeat this step for as many instances of mongos as you created. Do not attach any replica set to more than one mongos.
> sh.enableSharding( "foodb" ) > sh.shardOn( "foodb.bar", { drinkid:1, _id } )
The previous set of instructions was for a simple, single shard and its replica set. That isn't much of an example because, while the single sharded replica set is a useful concept, real, industrial-strength sharding would involve multiple shards (and their replica sets). In general, follow the instructions above for detail that's not as unctuously developed in the following. These steps carry the same numbering as the previous ones.
# mongo --host hostname --port port
...where hostname is the hostname (or IP address) of the host running the mongos and port is the port number, possibly assigned in the mongos' configuration file, or on its command line at launch, on which the mongos is listening. (Remember, the default is 27017.)
> sh.addShard( "first-replica-set-name/hostname:port" ) > sh.addShard( "second-replica-set-name/hostname:port" )
For example, let's say that we connected to our mongos dæmon thus:
# mongo --host 16.86.192.103 --port 47017
...and our replica sets were named "humpity" and "dumpity", we would add our replica sets in this way:
> sh.addShard( "humpity/16.86.192.103:47017" ) > sh.addShard( "dumpity/16.86.192.103:47017" )
It's much easier to do this stuff in configuration files than by hand, especially each time one, two or all of the elements of a complex cluster go down. These configuration files are required:
TTL - "time to live": "Any document that is 3 days old or older will be deleted."
In MongoDB parlance, a cluster is a sharded set-up.
Siddarth did the exercise, page 31 under OSX. There were two egregious typos, command-line argument order and use of an invalid port number (the highest possible port number is 65535).
$ mongos --configdb localhost:57017,localhost:57018,localhost:57019 --logpath "mongos-1.log" \
        --port 60000 --fork
Then use the shell to connect to mongos; again, the port number is bogus, use 60000.
Suggestion: specify a smaller chunk size, as 64MB is too big for a simple test/exercise.
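A smaller chunk size (in megabytes) can be given when launching mongos (a sketch of just that option):

$ mongos ... --chunkSize 1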
Fill the database collection with records before invoking db.adminCommand( { addshard:"s1/localhost:37017" } ):
for( i = 0; i < 10000; i++ ) { db.users.insert( { a:i } ) }
mongos> sh.shardCollection( "test.users", { a:1 } )
Have to create index...
mongos> db.users.ensureIndex( { a:1 } )
Stop the balancer...
> sh.stopBalancer()
Waiting for active hosts...
Waiting for the balancer lock...
Waiting again for active hosts after balancer is off...
GridFS is a datastore for big files; can do store, retrieve, list and remove.
GridFS fs = new GridFS( ... );
In the database, the chunks collection is the file (in small segments) and the files collection is its metadata.
To see how many chunks:
> use test
switched to db test
> db.fs.chunks.count()

mongofiles is a binary that uploads files to MongoDB.
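A sketch of mongofiles usage (the file name is an assumption):

$ mongofiles --port 30000 put report.pdf
$ mongofiles --port 30000 list
$ mongofiles --port 30000 get report.pdf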
Why would you want to store files in a database in the first place? There can be valid reasons for doing it; the best one is reducing complexity. It comes at some expense, but it might be worth it. Let's say you have 10GB of data in files like PDFs, images, etc....
The major advantages to doing this are:
The major disadvantages are:
Unless you're really trying to keep complexity down, or have some other special use case, store files like this on Amazon S3 or equivalent.
mongodump and mongorestore operate using BSON; mongoimport and mongoexport deal in JSON.
Creates a dump subdirectory with one deeper subdirectory per database. In each there is collection.bson, containing the actual data, and collection.metadata.json, which is human-readable and contains the metadata.
Document-level dump. Option --oplog will dump the oplog too. Other options: --host host, --port port, etc.
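For example (a sketch against the training instance on port 30000):

$ mongodump --port 30000 --oplog
$ mongorestore --port 30000 --oplogReplay dump/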
You could instead use MongoDB's fsync (with lock) to lock out writes, then Unix rsync to copy the files.
Really, though, rely on well formed and behaved replicas to ensure back-up.
$ mongod --auth
$ mongo
> use admin
switched to db admin
> db.addUser( 'sid', 'sid' )
{
    "user" : "sid",
    "readOnly" : false,
    "pwd" : "2cf3ac4bac67006e2ba5795b15b954bb",
    "_id" : ObjectId("5066116413c1801431eb4800")
}
> show dbs
Fri Sep 28 15:07:45 uncaught exception: listDatabases failed:{ "errmsg" : "need to login", "ok" : 0 }
> exit
bye
~/mongo-training/data $ mongo --port 30000 -u sid -p sid admin
MongoDB shell version: 2.2.0
connecting to: 127.0.0.1:30000/admin
> show dbs
admin     0.203125GB
config    0.203125GB
digg      0.203125GB
local     (empty)
test      0.203125GB
training  0.203125GB
twitter   0.453125GB
Now we're here as the admin user. Add another user, but you must add it to a chosen database. Thereafter, you log in with that user (for example):
> use twitter
> db.addUser( 'dan', 'dan', true )
{
    "user" : "dan",
    "readOnly" : true,
    "pwd" : "d41d78804bfdccc043d535435c39db94",
    "_id" : ObjectId("506612051107781f30ddf871")
}
> exit
bye
~/mongo-training/data $ mongo --port 30000 -u dan -p dan twitter
See this stuff...
> db
twitter
> db.system.users.find()
{ "_id" : ObjectId("5066116413c1801431eb4800"), "user" : "sid", "readOnly" : false, "pwd" : "2cf3ac4bac67006e2ba5795b15b954bb" }
{ "_id" : ObjectId("506612051107781f30ddf871"), "user" : "dan", "readOnly" : true, "pwd" : "d41d78804bfdccc043d535435c39db94" }
For inter-shard communication, launch shards using
$ mongod ... --keyfile key.txt ...
key.txt contains a shared key used to authenticate members to one another; it must be the same across all shards.
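One way to generate such a key file (a sketch; the key is just random base64):

$ openssl rand -base64 258 > key.txt
$ chmod 600 key.txt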
http://www.10gen.com/presentations/mongosv-2011/mongodbs-new-aggregation-framework
http://docs.mongodb.org/manual/applications/aggregation*
http://api.mongodb.org/wiki/current/Aggregation.html
* The aggregation pipeline begins with the collection articles and selects the author and tags fields using the $project aggregation operator. The $unwind operator produces one output document per tag. Finally, the $group operator pivots these fields.
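A sketch of that pipeline in the 2.2 shell (the articles collection and its author and tags fields follow the linked example):

> db.articles.aggregate(
...     { $project : { author : 1, tags : 1 } },
...     { $unwind : "$tags" },
...     { $group : { _id : { tags : "$tags" }, authors : { $addToSet : "$author" } } }
... )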