Transcription of Notes on MongoDB
A full-featured MongoDB sample covering embedded array functionality, written
from the CRUD point-of-view, may be found here .
Mongo is nicely accessed via Morphia, a sort-of
object-relational mapper (ORM) that is very lightweight, for Mongo doesn't
need much of this.
Myriad miscellaneous notes
Some stuff is from me, from the course I took from 10-gen, from synthesizing
[email protected] or from ripping off voices like Scott
Hernandez, Jenna Deboisblanc and others directly.
You should quickly notice that MongoDB is manipulated using cute little
JavaScript commands. It's what's enclosed between { } .
The continuation character is the ellipsis: ... . You can just
keep typing, but it won't close unless and until you've completed the
JavaScript syntactically.
What are tables in relational databases are called "collections" in
MongoDB.
What are rows in relational database tables are called "documents" in
MongoDB.
Lost? Want to say something like describe table? Use findOne(). It will
show you one document from the collection (the first one it finds, not a
random one), which is enough to see what fields are in use.
To pretty-print JSON, append .forEach( printjson ) to a query statement.
Datafiles in MongoDB start at 64MB and double in size with each additional
datafile (made necessary by exhaustion of the original allocation), up to
2GB. It's possible therefore for the allocated size to exceed the actual data
size.
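The doubling described above can be sketched in a few lines of plain JavaScript. This only models the pattern in the note (64MB start, 2GB cap); the function name and its parameters are made up for illustration and say nothing about how MongoDB actually allocates files.

```javascript
// Sketch only: models the allocation pattern described in the note above,
// not MongoDB's actual allocator. All names here are hypothetical.
function datafileSizesMB(fileCount, firstMB = 64, capMB = 2048) {
  const sizes = [];
  let next = firstMB;
  for (let i = 0; i < fileCount; i++) {
    sizes.push(next);
    next = Math.min(next * 2, capMB); // double until the cap is reached
  }
  return sizes;
}

const sizes = datafileSizesMB(7);
const totalMB = sizes.reduce((sum, mb) => sum + mb, 0);
console.log(sizes.join(", ")); // sizes of the first seven datafiles, in MB
console.log(totalMB);          // total space allocated, in MB
```

Under these assumptions, a database that has just spilled into a seventh file has 6080MB allocated even if that last file is nearly empty.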
Logging statements are piped to stdout by default. For more
verbose logging, launch the dæmon with increasing numbers of v's
(mongod -vvvvv).
There is a configuration file that's very desirable for real Mongo use.
It's well worth looking into.
db.help(); can be useful in a pinch.
The best help, however, is http://www.mongodb.org; use its search engine.
Collection stats. Referring to...
> db.runCommand( { collStats : "users", scale : 1024 * 1024 } )
storageSize is the amount of space allocated to the collection,
which is not the size or number of files in the system. Size refers to the
size of the collection. Neither of the two includes index size.
The data size is not related to the number of chunks. The compact()
command does not reduce the disk space of the collection, but rather
defragments and condenses the data on disk.
There is a ReST interface to MongoDB that can be turned on when launching
via the --rest option. For example, to get a list of
collections, the URL is
http://127.0.0.1:28017/test/system/namespaces/ .
How many databases should be created? In MongoDB version 2.2 (the next
stable release) there is database-level locking (currently there's a global
write lock), which may be a reason to create multiple databases.
How many shards should be created? The answer depends upon your use cases, and
specific system metrics are necessary before specific advice can be given. You
need to consider expected write volume as well as data size. If your data fits in RAM,
you'll be able to handle more inserts per second than if you're forced to go to
disk. It's probably a good idea to limit the data on a single node to 70% of
available disk space (filling the disk to 100% should be avoided at all costs).
The concept of configuration server in MongoDB relates to sharding
only. Configuration servers are special mongod instances that maintain the
sharded cluster's metadata in a configuration database. A sharded cluster
operates with a group of three configuration servers using a two-phase commit
process to ensure immediate consistency and reliability. (These mongod
processes have nothing to do with replication, replica sets or replica nodes.)
See more about this here.
For sharding, see here
and here .
How many replicas should be maintained? Three nodes per replica set are
recommended for data durability. If you expect a high number of reads (much
greater than writes), additional replicas will help to spread the load.
Additional replica members will increase fault tolerance, which may be important
for your application. In addition, you may find it beneficial to have a single
replica lag behind the rest as a way to jump back in time (if, for example, a
bug in your application corrupts the data on your primary).
The recommended minimum of three consists of either two full nodes and one
arbiter, or three full nodes.
Break up data across multiple databases in MongoDB? Breaking up your data
into separate databases makes the data more portable, makes it easier to
store the data on separate disks, and gives you the option of specifying
different authentication schemes or backup strategies. In MongoDB 2.2 (the
next stable release), write locks are distributed by database rather than a
global lock, which will improve concurrency.
Breaking your data into separate clusters affords the same or more
flexibility, but there's an overhead associated with creating a new cluster.
Each cluster should have its own replica set for data redundancy.
To embed or not to embed? That is, documents inside other documents such
as an array of addresses in a user account? Eliot Horowitz says:
One-to-many relationships may be good to embed.
If the possible list of entries embedded is unbounded and large
(thousands) then linking to documents in a separate collection might be
better (i.e.: _ids from collection "accounts" might be used in
collection "addresses" to bind address documents to accounts).
Benchmark a few different variants to see what performs best in your
case.
Modification of _id after established isn't tolerated:
> db.Accounts.update( { "name" : "Jimmy" },
... { "$set" : { "_id" : ObjectId( "000000000000000000000001" ) } } );
Mod on _id not allowed
But, you can force Mongo to use your value for _id :
> db.accounts.insert( { "_id" : ObjectId( "000000000000000000000001" ), "name" : "Jimmy" } );
> db.accounts.find( { "name" : "Jimmy" } );
{ "_id" : ObjectId("000000000000000000000001"), "name" : "Jimmy" }
MongoDB Java driver error codes. The driver throws different classes of
exceptions depending on the error's context, likely IOException or
one of:
Data integrity. MongoDB MMS can help in finding problems where corruption
stops certain nodes from responding in a timely fashion. It will e-mail you
immediately. Journaling helps plus the usual back-up system, taking snapshots on
a daily basis. See
http://mongodb.org/display/DOCS/Backups . See also
http://www.mongodb.org/display/DOCS/Durability+and+Repair , in a
replica set:
http://www.mongodb.org/display/DOCS/Replica+Set+Design+Concepts ,
also this thread:
http://groups.google.com/forum/?fromgroups#!topic/mongodb-user/R3bB06Z0n-c
Limiting the database size. If this is important, try the trick
outlined here:
http://souptonuts.sourceforge.net/quota_tutorial.html .
It's possible to design one's schema using embedded documents, non-embedded
(i.e.: separate documents) or a bucket (hybrid) structure. There's an
excellent and short post about this
here .
For help mapping from SQL, there is an
SQL to Mongo Mapping Chart.
MongoDB service start (on Ubuntu). This is done:
$ service mongodb start
However, it may not "take", as you'll see if you look for the process. This
usually means that it was shut down badly and left a lock file behind. Remove
this lock file thus:
$ rm /var/lib/mongodb/mongod.lock
Solution to getting MongoDB logging to come into our log files. This can
be had if using Slf4j. See
http://stackoverflow.com/questions/869945/how-to-send-java-util-logging-to-log4j .
To reach MongoDB's HTTP console from a browser, add 1000 to the port on which
it's running. If your local host is running Mongo on the default port (27017),
use http://localhost:28017.
Some links require the ReST service to run; accomplish this by launching with
--rest.
Commercial, inter-node SSL support for MongoDB is had at
10gen Customer Downloads
and the price for this, very steep, can be seen in the "Enterprise" column
here .
Many and more great links...
Upstart
When you install MongoDB using the Debian package, it establishes itself as a
service via Upstart, which isn't what you want if you're running the local
installation as a replica.
Whatever the reason for your interest in this matter, note that the script that
governs the Upstart nature of MongoDB is /etc/init/mongodb.conf,
not to be confused with /etc/mongodb.conf, which configures how MongoDB
runs once started. Whether it starts at all, how it's stopped, etc. is the
business of the Upstart configuration file.
# Ubuntu upstart file at /etc/init/mongodb.conf
limit nofile 20000 20000
kill timeout 300 # wait 300s between SIGTERM and SIGKILL.
pre-start script
    mkdir -p /var/lib/mongodb/
    mkdir -p /var/log/mongodb/
end script
start on runlevel [2345]
stop on runlevel [06]
script
    ENABLE_MONGODB="yes"
    if [ -f /etc/default/mongodb ]; then
        . /etc/default/mongodb
    fi
    if [ "x$ENABLE_MONGODB" = "xyes" ]; then
        exec start-stop-daemon --start --quiet --chuid mongodb --exec /usr/bin/mongod -- --config /etc/mongodb.conf
    fi
end script
A good link discussing this is
ubuntu: start(upstart) second instance of mongodb .
Quick Start
Installation
See http://www.javahotchocolate.com/tutorials/mongodb.html .
Start up the console...
...and look around, including seeing what databases are available, switching
focus to a database (use), examining a collection, forcing JSON output to be
formatted, etc. (Some vertical white space inserted for clarity.)
$ mongo
MongoDB shell version: 2.0.5
connecting to: test
> show dbs
accountmgrdb 0.203125GB
local (empty)
morphia_example 0.203125GB
my_database 0.203125GB
russ_trystuff_db 0.203125GB
test 0.203125GB
yourdb 0.203125GB
> use accountmgrdb
switched to db accountmgrdb
> show collections
Accounts
system.indexes
> db.Accounts.findOne();
{
"_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
"email" : "[email protected] ",
"password" : "passpass",
"firstname" : "René",
"lastname" : "de St. Exupéry",
"fullname" : "René de St. Exupéry",
"phone" : "33 (0) 3.29.90.66.65",
"mobile" : "33 (0) 3.29.90.66.65",
"fax" : "33 (0) 3.29.90.66.63"
}
> db.Accounts.find( { "firstname" : "René" } );
{ "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"), "email" : "[email protected] ", \
"password" : "passpass", "firstname" : "René", "lastname" : "de St. Exupéry", \
"fullname" : "René de St. Exupéry", \
"phone" : "33 (0) 3.29.90.66.65", "mobile" : "33 (0) 3.29.90.66.65", "fax" : "33 (0) 3.29.90.66.63" }
> db.Accounts.find( { "firstname" : "René" } ).forEach( printjson);
{
"_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
"email" : "[email protected] ",
"password" : "passpass",
"firstname" : "René",
"lastname" : "de St. Exupéry",
"fullname" : "René de St. Exupéry",
"phone" : "33 (0) 3.29.90.66.65",
"mobile" : "33 (0) 3.29.90.66.65",
"fax" : "33 (0) 3.29.90.66.63"
}
CRUD
Now let's have some real, useful fun...
Create
...a new user or two:
> db.Accounts.insert ( { "email":"[email protected] ", "password":"do 'em every time",
... "firstname":"Jack" } );
Now, to make certain the new account was added...
> db.Accounts.find ( { "firstname":"Jack" } ).forEach( printjson );
{
"_id" : ObjectId("4fbbcb4e1b599c3db4747a6e"),
"email" : "[email protected] ",
"password" : "do 'em every time",
"firstname" : "Jack"
}
Let's add a second account for grins...
> db.Accounts.insert ( { "email":"[email protected] ", "password":"don't hurt me",
... "firstname":"Bea" } );
Read
...or locate stuff that might be in the database.
> db.Accounts.find ( { "firstname":"Jack" } ).forEach( printjson );
{
"_id" : ObjectId("4fbbcb4e1b599c3db4747a6e"),
"email" : "[email protected] ",
"password" : "do 'em every time",
"firstname" : "Jack"
}
If you wish to show all accounts whose e-mail addresses end in ".uk" use a
regular expression! (Gotta love that, eh?)
> db.Accounts.find ( { "email": /[.]uk$/ } ).forEach( printjson );
{
"_id" : ObjectId("4fbbcdf21b599c3db4747a6f"),
"email" : "[email protected] .uk ",
"password" : "do 'em every time",
"firstname" : "Jack"
}
{
"_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
"email" : "beatrice.pansy@ladies-club.uk ",
"password" : "don't hurt me",
"firstname" : "Bea"
}
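The pattern in that query is written as a JavaScript regular-expression literal, so it can be tried on plain strings before being used in a query. The sample addresses below are made up, and since the server evaluates the expression with its own regex engine, this is only a quick sanity check.

```javascript
// The same regular expression as in the query above, applied to plain strings.
// The sample addresses are hypothetical.
const endsInUk = /[.]uk$/;
const emails = [
  "jack@example.co.uk",
  "rene@example.fr",
  "bea@example.uk",
  "uk@example.com",
];

const matches = emails.filter((email) => endsInUk.test(email));
console.log(matches);

// '$' anchors the match to the very end of the value, so stray trailing
// whitespace in a stored address defeats the pattern.
console.log(endsInUk.test("bea@example.uk "));
```

The character class [.] is just a readable way of escaping the dot; without it, "." would match any character.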
Find all documents:
> db.Accounts.find ( { } );
{ "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"), "email" : "[email protected] ", \
"password" : "passpass", "firstname" : "René", "lastname" : "de St. Exupéry", \
"fullname" : "René de St. Exupéry", \
"phone" : "33 (0) 3.29.90.66.65", "mobile" : "33 (0) 3.29.90.66.65", "fax" : "33 (0) 3.29.90.66.63" }
{ "_id" : ObjectId("4fbbcdf21b599c3db4747a6f"), "email" : "[email protected] ", \
"password" : "do 'em every time", "firstname" : "Jack" }
{ "_id" : ObjectId("4fbbce2d1b599c3db4747a70"), "email" : "[email protected] ", \
"firstname" : "Bea", "lastname" : "pansy", "password" : "don't hurt me" }
See particular fields (i.e.: columns in SQL) only in query
results. This abbreviates the document returned to only those fields that
are to be used (_id is included unless explicitly excluded). This is
reminiscent of naming columns in an SQL SELECT rather than using SELECT *.
> db.Addresses.find ( { }, { "addresstype" : 1 } );
{ "_id" : ObjectId("4fc7a6b1e4b022644086cff6"), "addresstype" : 1 }
{ "_id" : ObjectId("4fc7c92be4b0cd36353c4a02"), "addresstype" : 2 }
Find document by subdocument.
Imagine a collection of documents each with a subdocument named data like:
> db.tuples.findOne ();
{
"_id" : ObjectId("502fb6a9674c381db9e9249a"),
"rats" : "large mice",
"x" : 1,
"data" : {
"this" : "uh-huh",
"that" : "oh-oh",
"other" : "poo-poo-pee-doo"
}
}
Search for such a document by matching exactly one or more tuples. Here are
two possible queries:
> db.tuples.find ( { "data.this":"uh-huh", "data.that":"oh-oh" } );
{ "_id" : ObjectId("502fb6a9674c381db9e9249a"), "rats" : "large mice", "x" : 1, \
"data" : { "this" : "uh-huh", "that" : "oh-oh", "other" : "poo-poo-pee-doo" } }
> db.tuples.find ( { $and : [ { "data.this":"uh-huh" }, { "data.that":"oh-oh" } ] } );
{ "_id" : ObjectId("502fb6a9674c381db9e9249a"), "rats" : "large mice", "x" : 1, \
"data" : { "this" : "uh-huh", "that" : "oh-oh", "other" : "poo-poo-pee-doo" } }
See list of query operators .
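The dotted keys used in the subdocument queries above ("data.this") name a path into the document. A minimal sketch of that idea in plain JavaScript follows; getPath is a hypothetical helper written for this note, not anything MongoDB exposes.

```javascript
// Sketch only: a hypothetical helper showing how a dotted key such as
// "data.this" walks into a subdocument. It is not MongoDB code.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value !== null && typeof value === "object" ? value[key] : undefined),
    doc
  );
}

const tuple = {
  rats: "large mice",
  x: 1,
  data: { this: "uh-huh", that: "oh-oh", other: "poo-poo-pee-doo" },
};

console.log(getPath(tuple, "data.this"));    // value found inside the subdocument
console.log(getPath(tuple, "data.missing")); // no such key in the subdocument
console.log(getPath(tuple, "x.y"));          // "x" is not a subdocument at all
```

A path that runs off the end of the document simply yields nothing, which mirrors how a query on a missing field quietly fails to match.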
Update
Find Bea's document (record) and add in her last name. Then, find and display
the whole document.
> db.Accounts.update ( { "firstname":"Bea" }, { $set: { "lastname":"pansy" } } );
> db.Accounts.find ( { "firstname":"Bea" } ).forEach( printjson );
{
"_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
"email" : "[email protected] ",
"firstname" : "Bea",
"lastname" : "pansy",
"password" : "don't hurt me"
}
See list of update operators .
Delete
Remove a document from the collection. The empty command prompt caret shows
Mongo's answer to a failed query (nothing).
> db.Accounts.remove ( { "firstname":"Jack" } );
> db.Accounts.find ( { "firstname":"Jack" } ).forEach( printjson );
>
Query operators
MongoDB queries are clever in that they are more or less "query by example".
Beyond matching on plain values, these are possible operators for use in
queries:
$gt
$gte
$lt
$lte
$ne
$in
$nin
$mod
$regex
$options
$all
$size
$exists
$type
$not
$or
$nor
$elemMatch
$where
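To make "query by example" concrete, here is a toy matcher in plain JavaScript covering a handful of the operators listed above. Everything in it (matchesQuery, the operator table, the sample people) is invented for illustration; it is a sketch of the idea and makes no claim about how MongoDB evaluates queries.

```javascript
// Sketch only: a toy "query by example" matcher for a few of the operators
// listed above. Hypothetical code, not MongoDB's query engine.
const operators = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
  $exists: (value, arg) => (value !== undefined) === arg,
};

// True when a condition looks like { $gt: 30, ... } rather than a plain value.
function isOperatorObject(condition) {
  return (
    condition !== null &&
    typeof condition === "object" &&
    !Array.isArray(condition) &&
    Object.keys(condition).length > 0 &&
    Object.keys(condition).every((key) => key.startsWith("$"))
  );
}

function matchesQuery(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (isOperatorObject(condition)) {
      return Object.entries(condition).every(([op, arg]) => {
        if (!(op in operators)) throw new Error("unsupported operator: " + op);
        return operators[op](value, arg);
      });
    }
    return value === condition; // plain value: match by example
  });
}

const people = [
  { name: "Jack", age: 41 },
  { name: "Bea", age: 67, club: "ladies" },
  { name: "René", age: 29 },
];
const found = people.filter((p) =>
  matchesQuery(p, { age: { $gt: 30, $lte: 67 }, club: { $exists: false } })
);
console.log(found.map((p) => p.name));
```

The point of the sketch is the shape of the query document: a field maps either to a value to match or to a small document of operators that must all hold.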
Update operators
Along with $set , these are possible operators for doing update
operations:
$unset
$inc
$push
$pushAll
$pull
$pullAll
$pop
$addToSet
$rename
$bit
Update terminology
Upsert means to create a document where none existed to be
updated (or merely update as instructed).
Multi-updates are updates fired on all documents that match the
query.
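The two terms can be imitated on an in-memory array. In this sketch the update function and its upsert and multi options are hypothetical names chosen for the example, matching is by plain equality only, and fields are merged in the manner of $set. It assumes the usual default of touching only the first matching document unless multi is asked for.

```javascript
// Sketch only: an in-memory imitation of the upsert and multi-update ideas
// described above. The function and option names are hypothetical.
function update(collection, query, fields, { upsert = false, multi = false } = {}) {
  const isMatch = (doc) => Object.keys(query).every((key) => doc[key] === query[key]);
  const targets = collection.filter(isMatch);

  if (targets.length === 0) {
    if (upsert) collection.push({ ...query, ...fields }); // nothing to update: create it
    return upsert ? 1 : 0;
  }
  const chosen = multi ? targets : targets.slice(0, 1); // default: first match only
  chosen.forEach((doc) => Object.assign(doc, fields));
  return chosen.length; // number of documents touched
}

const accounts = [
  { firstname: "Jack", country: "uk" },
  { firstname: "Bea", country: "uk" },
];

const single = update(accounts, { country: "uk" }, { verified: true });
const many = update(accounts, { country: "uk" }, { newsletter: true }, { multi: true });
const missed = update(accounts, { firstname: "René" }, { country: "fr" });
const upserted = update(accounts, { firstname: "René" }, { country: "fr" }, { upsert: true });
console.log(single, many, missed, upserted);
console.log(accounts);
```

Without upsert, the update for René quietly does nothing; with it, a new document is built from the query and the fields to set.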
Deletion options
After using a database, here's how to drop a) the database, b) a collection,
c) a document (DELETE FROM Account WHERE...).
> db.dropDatabase()
> db.Account.drop()
> db.Account.remove( { ... } )
sort()
How to sort query results: a) ascending order (1 ),
b) descending order (-1 ):
> db.Accounts.find().sort ( { "email": 1 } ).forEach( printjson );
{
"_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
"email" : "[email protected] ",
"firstname" : "Bea",
"lastname" : "pansy",
"password" : "don't hurt me"
}
{
"_id" : ObjectId("4fbbcdf21b599c3db4747a6f"),
"email" : "[email protected] ",
"password" : "do 'em every time",
"firstname" : "Jack"
}
{
"_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
"email" : "[email protected] ",
"password" : "passpass",
"firstname" : "René",
"lastname" : "de St. Exupéry",
"fullname" : "René de St. Exupéry",
"phone" : "33 (0) 3.29.90.66.65",
"mobile" : "33 (0) 3.29.90.66.65",
"fax" : "33 (0) 3.29.90.66.63"
}
> db.Accounts.find().sort ( { "email": -1 } ).forEach( printjson );
{
"_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
"email" : "[email protected] ",
"password" : "passpass",
"firstname" : "René",
"lastname" : "de St. Exupéry",
"fullname" : "René de St. Exupéry",
"phone" : "33 (0) 3.29.90.66.65",
"mobile" : "33 (0) 3.29.90.66.65",
"fax" : "33 (0) 3.29.90.66.63"
}
{
"_id" : ObjectId("4fbbcdf21b599c3db4747a6f"),
"email" : "[email protected] ",
"password" : "do 'em every time",
"firstname" : "Jack"
}
{
"_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
"email" : "[email protected] ",
"firstname" : "Bea",
"lastname" : "pansy",
"password" : "don't hurt me"
}
Indices (indexes)
Indexing is a way to improve performance when data are well known. When
encountering performance issues, poorly designed indices are usually to
blame.
Look at the output from explain().
> db.Accounts.find ( { "firstname":"Bea" } );
{ "_id" : ObjectId("4fbbce2d1b599c3db4747a70"), "email" : "[email protected] ", \
"firstname" : "Bea", "lastname" : "pansy", "password" : "don't hurt me" }
> db.Accounts.find ( { "firstname":"Bea" } ).explain ();
{
"cursor" : "BasicCursor",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
[THIS IS A BAD EXAMPLE BECAUSE OUR DATA ARE NEITHER RICH NOR NUMEROUS.]
Create an index thus:
> db.Accounts.ensureIndex ( { "email": 1 } );
There are multikey indices for fields containing arrays, in which each entry
in the array appears in the index, and compound indices, in which two or more
fields are indexed together. Often, to get the best performance, a compound
index is desirable, e.g.: querying the list of a user's tweets sorted by
creation date.
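Why a compound index suits the tweets example can be seen by ordering entries the way such an index would hold them. The sketch below sorts an in-memory array by user ascending, then by creation date descending; the field names, the sample tweets and compareByKeys are all hypothetical, and this illustrates the ordering only, not MongoDB's index machinery.

```javascript
// Sketch only: orders entries the way a compound index on
// { user: 1, created: -1 } would hold them. Field names and sample tweets
// are hypothetical.
function compareByKeys(keys) {
  return (a, b) => {
    for (const [field, direction] of Object.entries(keys)) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const tweets = [
  { user: "bea", created: 3, text: "tea time" },
  { user: "jack", created: 1, text: "hello" },
  { user: "bea", created: 7, text: "club night" },
  { user: "jack", created: 5, text: "again" },
];

const indexOrder = [...tweets].sort(compareByKeys({ user: 1, created: -1 }));
const beasTweets = indexOrder.filter((t) => t.user === "bea").map((t) => t.text);
console.log(indexOrder.map((t) => t.user + ":" + t.created));
console.log(beasTweets);
```

Because one user's entries sit next to each other and are already in date order, a query for that user's tweets sorted by date amounts to reading one contiguous run.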
The Java side
This is sort of thrown together, simplistic and tentative. I may come back to do
something a little better.
Download the MongoDB Java driver and Javadoc JARs from
here . Adjust the version as necessary.
The current Java driver (Javadoc) documentation is usually found at:
http://api.mongodb.org/java/current/ .
Here's our POJO...
package com.acme.accountmgr;
import org.bson.types.ObjectId;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
public class Account
{
  ObjectId id;
  String email;
  String password;
  String firstname;
  String lastname;
  String fullname;
  public Account() { }
  public Account ( DBObject bson )
  {
    BasicDBObject b = ( BasicDBObject ) bson;
    // Mongo stores the identifier under "_id", not "id"
    this.id = ( ObjectId ) b.get( "_id" );
    this.email = ( String ) b.get( "email" );
    this.password = ( String ) b.get( "password" );
    this.firstname = ( String ) b.get( "firstname" );
    this.lastname = ( String ) b.get( "lastname" );
    this.fullname = ( String ) b.get( "fullname" );
  }
  // getId() is needed by the update and delete code further on
  public ObjectId getId() { return this.id; }
  public void setId( ObjectId id ) { this.id = id; }
  public String getEmail() { return this.email; }
  public void setEmail( String email ) { this.email = email; }
  public String getPassword() { return this.password; }
  public void setPassword( String password ) { this.password = password; }
  public String getFirstname() { return this.firstname; }
  public void setFirstname( String firstname ) { this.firstname = firstname; }
  public String getLastname() { return this.lastname; }
  public void setLastname( String lastname ) { this.lastname = lastname; }
  public String getFullname() { return this.fullname; }
  public void setFullname( String fullname ) { this.fullname = fullname; }
}
The notes earlier were all console work; the Java driver is available of course.
This code assumes that database we were playing with.
package com.acme.accountmgr;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.bson.types.ObjectId;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;
import com.mongodb.MongoException;
public class MongoDemo
{
  // java.util.logging is used here only to keep the sample self-contained
  private static final Logger log = Logger.getLogger( MongoDemo.class.getName() );
  // kept as a field so that the CRUD methods can reach the collection
  private DBCollection collection;
  public MongoDemo()
  {
    try
    {
      Mongo mongo = new Mongo( "localhost", 27017 );
      DB db = mongo.getDB( "accountmgrdb" );
      this.collection = db.getCollection( "Accounts" );
    }
    catch( UnknownHostException e )
    {
      log.log( Level.SEVERE, "MongoDB host not found", e );
    }
    catch( MongoException e )
    {
      log.log( Level.SEVERE, "Runtime error attempting MongoDB connection", e );
    }
  }
CRUD
Create
  public void create( Account account )
  {
    BasicDBObject document = new BasicDBObject();
    document.put( "email", account.getEmail() );
    document.put( "password", account.getPassword() );
    document.put( "firstname", account.getFirstname() );
    document.put( "lastname", account.getLastname() );
    document.put( "fullname", account.getFullname() );
    // insert into the same collection the other CRUD methods use
    collection.insert( document );
  }
Read
  public List< Account > read( String property, String value )
  {
    List< Account > list = new ArrayList< Account >();
    BasicDBObject query = new BasicDBObject();
    DBCursor cursor;
    query.put( property, value );
    cursor = collection.find( query );
    while( cursor.hasNext() )
    {
      DBObject object = cursor.next();
      list.add( new Account( object ) );
    }
    return list;
  }
Update
  public void update( Account existing, Account replacement )
  {
    BasicDBObject document = new BasicDBObject();
    // for each field, fall back on the existing value when the replacement has none
    document.put( "_id", existing.getId() );
    document.put( "email", ( replacement.getEmail() == null ) ? existing.getEmail() : replacement.getEmail() );
    document.put( "password", ( replacement.getPassword() == null ) ? existing.getPassword() : replacement.getPassword() );
    document.put( "firstname", ( replacement.getFirstname() == null ) ? existing.getFirstname() : replacement.getFirstname() );
    document.put( "lastname", ( replacement.getLastname() == null ) ? existing.getLastname() : replacement.getLastname() );
    document.put( "fullname", ( replacement.getFullname() == null ) ? existing.getFullname() : replacement.getFullname() );
    // update() takes DBObjects: select by _id, then replace with the merged document
    collection.update( new BasicDBObject( "_id", existing.getId() ), document );
  }
Delete
  public void delete( Account account )
  {
    BasicDBObject delete = new BasicDBObject().append( "_id", account.getId() );
    collection.remove( delete );
  }
}
JARs
$or in Java
How to embed $or , etc. in queries. Here, we're looking for a
document in which a is either 10 or 5. First, we create the factors
with values 10 and 5. Then, we create an operation that will OR them. We get
a query ready.
Next, we add the factors one at a time to the OR operation. Then, we tuck them
into the query.
// needs import com.mongodb.BasicDBList;
public boolean lookForTensOrFives( ObjectId oid )
{
  BasicDBObject factor1 = new BasicDBObject();
  BasicDBObject factor2 = new BasicDBObject();
  BasicDBList or = new BasicDBList();
  BasicDBObject query = new BasicDBObject();
  factor1.put( "a", 10 );
  factor2.put( "a", 5 );
  or.add( factor1 );
  or.add( factor2 );
  query.put( "$or", or );
  DBCursor cursor = collection.find( query );
  boolean matched = false;
  while( cursor.hasNext() )
  {
    DBObject found = cursor.next();
    // as many times as we get here, 'found' is a document that matches!
    matched = true;
  }
  return matched;
}
ObjectId used as OIDs
When sorting out ObjectIds, between String and
ObjectId, try the following. The point is that if you don't
know if the thing coming in is a string or an oid, this helper will
ensure it's what Mongo wants (_id, etc.).
import org.bson.types.ObjectId;
...
ObjectId makeObjectId( Object id )
{
if( id instanceof String )
return new ObjectId( ( String ) id );
else if( id instanceof ObjectId )
return ( ObjectId ) id;
throw new RuntimeException( "Cannot convert " + id + " to an ObjectId" );
}
void method( String someOid, ObjectId anotherOid )
{
ObjectId oidA = makeObjectId( someOid );
ObjectId oidB = makeObjectId( anotherOid );
...
Common errors
Since JSON makes copious use of double-quoting and one sees double-quotes all
over the place, it's easy to get lulled into looking in the wrong place for a
failed query. For example, let's say you're representing some object type as an
integer, but it shows up in some method as a string (for whatever reason).
When stepping through the debugger, you may not notice that you have to pass it
to BasicDBObject.put() as an Integer: a query on the string "1" will not match
a document holding the integer 1.
void method( String type )
{
BasicDBObject query = new BasicDBObject();
query.put( "type", Integer.parseInt( type ) );
DBCursor cursor = collection.find( query );
...
Schema solutions: arrays
What's done in a JOIN in SQL might be done in the same document in
MongoDB since the schema is so fluid. Here are various renderings of addresses
in a user account.
Array
This is probably what I'd prefer since I like to tout an addresstype .
{
"_id" : "4fc5520ae4b0aa302dd16e0c",
"email" : "[email protected] ",
"password" : "passpass",
"addresses" :
[
{
"addresstype" : 3,
"fullname" : "Yosemite Sam Tucker",
"street1" : "PO Box 32",
"city" : "Culver City",
"state" : "ca",
"country" : "us",
"postalcode" : "90211",
"isdefault" : false
},
{
"addresstype" : 1,
"fullname" : "Yosemite Sam Tucker",
"street1" : "1321 Hollywood Blvd",
"street2" : "(back lot)",
"city" : "Culver City",
"state" : "ca",
"country" : "us",
"postalcode" : "90211",
"isdefault" : false
}
]
}
Subdocument
This works only if our interface makes use of strings such as "homeaddresses",
"shippingaddresses", etc. It's nicer to look at in the Mongo console, but
presents maybe no other benefit since users won't have gazillions of addresses
anyway.
{
"_id" : "4fc5520ae4b0aa302dd16e0c",
"email" : "[email protected] ",
"password" : "passpass",
"homeaddresses" :
[
{
"fullname" : "Yosemite Sam Tucker",
"street1" : "1321 Hollywood Blvd",
"street2" : "(back lot)",
"city" : "Culver City",
"state" : "ca",
"country" : "us",
"postalcode" : "90211",
"isdefault" : true
}
],
"shippingaddresses" :
[
{
"fullname" : "Yosemite Sam Tucker",
"street1" : "PO Box 32",
"city" : "Culver City",
"state" : "ca",
"country" : "us",
"postalcode" : "90211",
"isdefault" : true
},
{
"fullname" : "Grandma Tucker",
"street1" : "2234 Cowtown Lane",
"city" : "Hastings",
"state" : "ne",
"country" : "us",
"postalcode" : "68901",
"isdefault" : false
}
]
}
Exploring $set updates
Here's some exploring of update. I used additional vertical space to
make things clearer. The two updates done here do different things. When
$set is used, it adds the new construct to what's there. When not,
it replaces everything but the _id.
> use funstuff
switched to db funstuff
> db.fun.insert( { _id : 123, "fun" : "things" } );
> db.fun.findOne()
{ "_id" : 123, "fun" : "things" }
> db.fun.update( { _id:123 }, { $set: { hello: "world" } } );
> db.fun.findOne()
{ "_id" : 123, "fun" : "things", "hello" : "world" }
> db.fun.remove( { _id : 123 } )
> db.fun.insert( { _id : 123, "fun" : "things" } );
> db.fun.update( { _id:123 }, { hello: "world" } );
> db.fun.findOne()
{ "_id" : 123, "hello" : "world" }
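The difference between the two updates can be imitated on plain JavaScript objects. The helper names applySet and applyReplace are made up for this sketch, which mirrors the console session above rather than anything inside MongoDB.

```javascript
// Sketch only: imitates the two update styles on plain objects.
// Hypothetical helper names; not MongoDB's implementation.
function applySet(doc, fields) {
  // keep what's there, add or overwrite only the named fields
  return { ...doc, ...fields };
}

function applyReplace(doc, replacement) {
  // keep only _id, take everything else from the replacement
  return { _id: doc._id, ...replacement };
}

const original = { _id: 123, fun: "things" };
const afterSet = applySet(original, { hello: "world" });
const afterReplace = applyReplace(original, { hello: "world" });
console.log(JSON.stringify(afterSet));
console.log(JSON.stringify(afterReplace));
```

The first result still carries "fun"; the second has lost it, just as in the session above.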
More exploring $set updates
Here's a rather more complex exploration with interleaved Java code (that,
at first at least, wasn't tested even for syntax).
// What's going on in Enchiladas...
> db.Enchiladas.findOne();
{
"_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5"),
"email" : "[email protected] ",
"password" : "passpass",
"isdefault" : false
}
// Initialize 'sam' with the bucket that interests us.
> var sam = db.Enchiladas.findOne( { "_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5") } );
BasicDBObject query = new BasicDBObject();
query.put( "_id", new ObjectId( "4fccc8dde4b0d5de2eeab3c5" ) );
DBCursor cursor = collection.find( query );
DBObject sam = null;
while( cursor.hasNext() )
{
sam = cursor.next();
break;
}
// sam "points" at his account!
// Create a new field in document 'sam' to hold the address:
> sam.address = { "addresstype":2, "fullname":"Yosemite Sam Tucker", "street1":"1321 Hollywood Blvd", \
"street2":"(back lot)", "city":"Culver City", "state":"ca", "country":"us", "postalcode":"90211", \
"isdefault":false }
{
"addresstype" : 2,
"fullname" : "Yosemite Sam Tucker",
"street1" : "1321 Hollywood Blvd",
"street2" : "(back lot)",
"city" : "Culver City",
"state" : "ca",
"country" : "us",
"postalcode" : "90211",
"isdefault" : false
}
BasicDBObject address = new BasicDBObject();
address.put( "addresstype", 2 );
address.put( "fullname", "Yosemite Sam Tucker" );
address.put( "street1", "1321 Hollywood Blvd" );
address.put( "street2", "(back lot)" );
address.put( "city", "Culver City" );
address.put( "state", "ca" );
address.put( "country", "us" );
address.put( "postalcode", "90211" );
address.put( "isdefault", false );
// this will replace what's sam with what's address: we don't want that!
collection.update( sam, address );
// update fodder here (how to do "$set"...)
// the statements as if in Mongo console (JavaScript) are built progressively...
BasicDBObject newsam = new BasicDBObject().append( "address", address );
BasicDBObject augmented = new BasicDBObject().append( "$set", newsam );
collection.update( sam, augmented );
// Here's the update adding an address to Sam's bucket:
> db.Enchiladas.update( { "_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5") }, sam );
// Now, when we look at all we've got, we see different stuff in one bucket than in another.
> db.Enchiladas.find( { } ).forEach( printjson );
{
"_id" : ObjectId("4fcccadbe4b0d5de2eeab3c6"),
"email" : "[email protected] ",
"password" : "passpass",
"isdefault" : false
}
{
"_id" : ObjectId("4fcccb686ccd0f44d66a18a4"),
"email" : "poop.abc.com",
"ipaddress" : "192168.0.9",
"password" : "passpass"
}
{
"_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5"),
"email" : "[email protected] ",
"password" : "passpass",
"isdefault" : false,
"address" : {
"addresstype" : 2,
"fullname" : "Yosemite Sam Tucker",
"street1" : "1321 Hollywood Blvd",
"street2" : "(back lot)",
"city" : "Culver City",
"state" : "ca",
"country" : "us",
"postalcode" : "90211",
"isdefault" : false
}
}
// find all the documents (no query, so nothing is filtered out)...
DBCursor all = collection.find();
while( all.hasNext() )
  System.out.println( all.next() );
Quick and dirty Mongo set-up code
I have a project named TryIt in which I prototype things quickly if I
wish to experiment. Here's a class in it. It might be referenced from other
notes on this page.
package experiment;
import java.net.UnknownHostException;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;
public class MongoSetup
{
Mongo mongo = null;
String database;
public MongoSetup()
{
setup();
}
public MongoSetup( String database )
{
setup();
this.database = database;
}
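  private void setup()
  {
    try
    {
      mongo = new Mongo();
    }
    catch( UnknownHostException e )
    {
      // say what went wrong instead of printing a blank line
      System.out.println( "MongoDB host not found: " + e.getMessage() );
    }
  }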
public String getDatabase() { return this.database; }
public void setDatabase( String database ) { this.database = database; }
public DBCollection getCollection( String collection )
{
return this.getCollection( this.database, collection );
}
public DBCollection getCollection( String database, String collection )
{
return mongo.getDB( database ).getCollection( collection );
}
}
Exploring arrays...
This example is from
MongoDB -> Home -> Drivers -> Java Language Center -> Java Types
public static void main( String[] args )
{
MongoSetup mongo = new MongoSetup( "funstuff" );
ArrayList< Serializable > x = new ArrayList< Serializable >();
x.add( 1 );
x.add( 2 );
x.add( new BasicDBObject( "foo", "bar" ) );
x.add( 4 );
BasicDBObject doc = new BasicDBObject( "odd-array", x );
DBCollection collection = mongo.getCollection( "array_demo" );
collection.insert( doc );
}
The Java snippet above created the Mongo console experience below.
> use funstuff
switched to db funstuff
> show collections
array_demo
system.indexes
> db.array_demo.findOne();
{
"_id" : ObjectId("4fce18d55a374a574039b45b"),
"odd-array" : [
1,
2,
{
"foo" : "bar"
},
4
]
}
What's going on?
A (Java-shabby) array is created for the purpose of demonstrating wild
arrays embedded in a Mongo document. A Mongo document is created and the
array embedded as odd-array before being inserted into the database
collection shown.
How to add an array to a MongoDB document in Java...
That is, a complex object array.
public class IdentityType
{
private String identity;
private String type;
public IdentityType() { }
public IdentityType( String identity, String type ) { this.identity = identity; this.type = type; }
public String getIdentity() { return identity; }
public void setIdentity( String identity ) { this.identity = identity; }
public String getType() { return type; }
public void setType( String type ) { this.type = type; }
public String toString()
{
StringBuilder sb = new StringBuilder();
sb.append( "{\n" );
sb.append( " identity: " + this.identity + "\n" );
sb.append( " type: " + this.type + "\n" );
sb.append( "\n}" );
return sb.toString();
}
}
This is the relevant POJO code:
private List< IdentityType > idtypes = new ArrayList< IdentityType >();
public List< IdentityType > getIdtypes() { return this.idtypes; }
public void setIdtypes( List< IdentityType > types ) { this.idtypes = types; }
public void addIdtype( IdentityType type ) { this.idtypes.add( type ); }
Here's the trip from POJO to MongoDB document:
public DBObject getBsonFromPojo()
{
if( getIdtypes().size() > 0 )
{
List< BasicDBObject > list = new ArrayList< BasicDBObject >();
for( IdentityType idt : getIdtypes() )
{
BasicDBObject idtype = new BasicDBObject();
idtype.put( "identity", idt.getIdentity() );
idtype.put( "type", idt.getType() );
list.add( idtype );
}
document.put( "idtypes", list );
}
return document;
}
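The trip back from MongoDB to POJO takes one conversion per array element,
because the driver hands back a list of DBObjects rather than of IdentityType:
public void makePojoFromBson( DBObject bson )
{
BasicDBObject b = ( BasicDBObject ) bson;
...
List< IdentityType > types = new ArrayList< IdentityType >();
List< ? > list = ( List< ? > ) b.get( "idtypes" );
if( list != null )
for( Object o : list )
types.add( new IdentityType( ( String ) ( ( DBObject ) o ).get( "identity" ), ( String ) ( ( DBObject ) o ).get( "type" ) ) );
setIdtypes( types );
}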
The $ (positional) operator for updating array elements
The $ (dollar sign) stands for the position of the first array item matched
by the query, that is, by the first half of an update operation.
Imagine a document like (there happens to be only one in this collection):
> db.accounts.findOne();
{
"_id" : ObjectId("4fcf8b055a3770c10a741edb"),
"addresses" : [
{
"_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
"type" : 1,
"street" : "123 My Street",
"city" : "Bedford Falls",
"state" : "NJ"
},
{
"_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
"type" : 3,
"street" : "789 My Street",
"city" : "Bedford Falls",
"state" : "NJ"
}
],
"name" : "Jack"
}
You wish to re-type the second of the two addresses from 3 to
2 . First, create a query that will identify that address.
> db.accounts.find( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) } ).pretty();
{
"_id" : ObjectId("4fcf8b055a3770c10a741edb"),
"addresses" : [
{
"_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
"type" : 1,
"street" : "123 My Street",
"city" : "Bedford Falls",
"state" : "NJ"
},
{
"_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
"type" : 3,
"street" : "789 My Street",
"city" : "Bedford Falls",
"state" : "NJ"
}
],
"name" : "Jack"
}
With the right address getting isolated, you can now use the positional
operator to set its type field to 2 .
> db.accounts.update( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) },
... { "$set" : { "addresses.$.type" : 2 } } );
That did it. You can now reuse the query to determine that it actually
happened.
> db.accounts.find( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) } ).pretty();
{
"_id" : ObjectId("4fcf8b055a3770c10a741edb"),
"addresses" : [
{
"_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
"type" : 1,
"street" : "123 My Street",
"city" : "Bedford Falls",
"state" : "NJ"
},
{
"_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
"city" : "Bedford Falls",
"state" : "NJ",
"street" : "789 My Street",
"type" : 2
}
],
"name" : "Jack"
}
To change two or more fields, just add more comma-separated field-value pairs
to the $set :
> db.accounts.update( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) },
... { "$set" : { "addresses.$.type" : 3, "addresses.$.city" : "Potterville" } } );
> db.accounts.findOne();
{
"_id" : ObjectId("4fcf8b055a3770c10a741edb"),
"addresses" : [
{
"_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
"type" : 1,
"street" : "123 My Street",
"city" : "Bedford Falls",
"state" : "NJ"
},
{
"_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
"city" : "Potterville",
"state" : "NJ",
"street" : "789 My Street",
"type" : 3
}
],
"city" : "Potterville",
"name" : "Jack"
}
In Java...
In Java, the update above looks something like this. Just as above, notice the
dots and dollar signs. Incidentally, if only one of these fields, say
street , were to change, the others just wouldn't be passed (which is why
each field is checked for null before it is added).
private static void update( ObjectId accountoid, Address address )
{
BasicDBObject match = new BasicDBObject();
match.put( "_id", accountoid );
match.put( "addresses._id", address.getOid() );
BasicDBObject addressSpec = new BasicDBObject();
Integer type;
String temp;
if( ( type = address.getType() ) != null )
addressSpec.put( "addresses.$.type", type );
if( ( temp = address.getStreet() ) != null )
addressSpec.put( "addresses.$.street", temp );
if( ( temp = address.getCity() ) != null )
addressSpec.put( "addresses.$.city", temp );
if( ( temp = address.getState() ) != null )
addressSpec.put( "addresses.$.state", temp );
BasicDBObject update = new BasicDBObject();
update.put( "$set", addressSpec );
collection.update( match, update );
}
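To make the meaning of addresses.$ concrete without a running server, here is a minimal sketch in plain Java collections. It is only a model of the semantics described above, not the driver and not the server's implementation: the class PositionalSetModel, its methods and the shortened ids "ed9" and "ed8" are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of { "$set" : { "addresses.$.<field>" : value } }; no MongoDB driver involved. */
final class PositionalSetModel
{
    /**
     * Finds the first element whose matchField equals matchValue (that index is what "$" stands for)
     * and applies the changes to that element only. Returns the index, or -1 if nothing matched.
     */
    static int setOnFirstMatch( List< Map< String, Object > > array, String matchField,
                                Object matchValue, Map< String, Object > changes )
    {
        for( int i = 0; i < array.size(); i++ )
        {
            Map< String, Object > element = array.get( i );
            if( matchValue.equals( element.get( matchField ) ) )
            {
                element.putAll( changes );   // fields absent from changes are left untouched
                return i;
            }
        }
        return -1;
    }

    /** Builds a small stand-in for one embedded address document (hypothetical helper). */
    static Map< String, Object > address( String id, int type, String city )
    {
        Map< String, Object > a = new LinkedHashMap< String, Object >();
        a.put( "_id", id );
        a.put( "type", type );
        a.put( "city", city );
        return a;
    }

    public static void main( String[] args )
    {
        List< Map< String, Object > > addresses = new ArrayList< Map< String, Object > >();
        addresses.add( address( "ed9", 1, "Bedford Falls" ) );
        addresses.add( address( "ed8", 3, "Bedford Falls" ) );

        Map< String, Object > changes = new LinkedHashMap< String, Object >();
        changes.put( "type", 2 );

        int position = setOnFirstMatch( addresses, "_id", "ed8", changes );
        System.out.println( "matched position " + position + ": " + addresses.get( position ) );
    }
}
```

Only the matched element changes; the first address keeps its type of 1, which is the same outcome as in the shell session.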
MongoDB semantics
Here is some semantic fall-out from MongoDB terminology and things we say about
MongoDB.
Sharding —Where data is split between more than one replica set.
What is in one shard isn't in another. Sharding in MongoDB must be carefully
configured: it doesn't come for free, and you must do a lot of extra work to achieve
it. Among other reasons to shard, sharding can be used to solve issues of
geographic colocation of data and scaling of that data.
MongoDB configuration server —This is a special instance of the
mongod dæmon that maintains sharded-cluster metadata to give to
instances of mongos . It's the "how-to" section of the sharding
mongos brain. There should be three of these since MongoDB is dead
in the water without at least one in good health. A configuration server (also
called a "config server") can only mean sharding.
Replica set —A collection of replica nodes. A MongoDB shard must
have one of these, but a replica set doesn't need to be in a shard to stand on
its own. A replica set ensures that data is written to more than one node
(place)—effectively duplicating it or better. Note that as soon as you say
multiple replica sets, you are necessarily referring to a sharded configuration.
Replica node —A single instance of the mongod dæmon
running usually alone on a VM or host.
mongod —This is the basic MongoDB dæmon. In a sense,
it just is MongoDB.
mongos —This is a special dæmon that connects an
application to a MongoDB sharding set-up and controls reading and writing to the
appropriate shard for the data concerned. It uses information from a special
mongod erected as a MongoDB configuration server. mongos can
only mean sharding.
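The division of labor between a config server and mongos can be pictured with a toy router. This is a sketch under loud assumptions: it models range-based routing with a sorted map, the class ShardRouterModel and the shard names are invented, and nothing here reflects how MongoDB actually stores or balances its metadata.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of range-based routing: lower bound of each key range -> name of the shard that owns it. */
final class ShardRouterModel
{
    private final TreeMap< String, String > rangeStartToShard = new TreeMap< String, String >();

    /** Plays the part of the metadata a config server would hand to mongos. */
    void assign( String rangeStart, String shard )
    {
        rangeStartToShard.put( rangeStart, shard );
    }

    /** Plays the part of mongos: pick the one shard whose range contains the key. */
    String route( String shardKey )
    {
        Map.Entry< String, String > owner = rangeStartToShard.floorEntry( shardKey );
        if( owner == null )
            throw new IllegalArgumentException( "no range covers key: " + shardKey );
        return owner.getValue();
    }

    public static void main( String[] args )
    {
        ShardRouterModel router = new ShardRouterModel();
        router.assign( "", "shard-a" );     // keys from "" up to, but not including, "m"
        router.assign( "m", "shard-b" );    // keys from "m" upward
        System.out.println( router.route( "jack" ) );     // falls in the first range
        System.out.println( router.route( "potter" ) );   // falls in the second range
    }
}
```

Every key lands in exactly one range, which is the "what is in one shard isn't in another" property from the definition of sharding.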
WriteConcern
See
http://www.littlelostmanuals.com/2011/11/overview-of-basic-mongodb-java-write.html .
Explore also "MongoDB tagging."
A better much later treatment exists as a subsection on write concerns to my
MongoDB Error-handling Notes .
(no write-concern argument)
Writes to the driver, which must then send the operation, potentially over the wire, to reach mongod .
WriteConcern.SAFE
Returns after operation known to have reached mongod .
WriteConcern.JOURNAL_SAFE
Returns after operation known to have reached mongod and
written to its journal.
WriteConcern.MAJORITY
Like SAFE , but returns after operation has been written to a
simple majority of nodes in the replica set.
WriteConcern.FSYNC_SAFE
Returns after operation has been written to the server data file.
Persist new records like user accounts, addresses and payment methods,
etc. with
collection.save( account, WriteConcern.FSYNC_SAFE );
--takes a comparatively long time.
Persist updates to addresses and payment methods with
collection.save( new_address/new_payment, WriteConcern.FSYNC_SAFE );
--because these are really new operations (the old one is "forgotten" and
left in place). Use
collection.merge( old_address/old_payment, WriteConcern.SAFE );
to update the old entity with the “forgotten” flag.
Voting to replace a primary...
With respect to a replica set in Mongo, if the primary and/or other nodes are
lost, you must have a "quorum" of voting nodes in order to elect a new primary
and to retain full transactional status, i.e.: reading and
writing. If you don't have a quorum, in many cases you can continue supporting
reads, but no writes.
A quorum (my terminology) is "more than half of the original number
of nodes in the replica set". 10gen doesn't use this obvious word, but they
should: in a voting body, a quorum is the smallest number of members
that can make a decision in the absence of others.
Also, a Mongo replica set should have an odd number of voting members so that a
vote cannot end in a tie. So, if we start with a primary and four secondaries, that
makes 5, and 3 votes are a majority. Lose the primary and we have 4 left, which is
still enough to elect a new primary. If the set had an even number of members to
begin with, an arbiter should be added to make it odd; an arbiter holds no data,
but when voting, it does count as a member.
Note: There is nothing wrong with arbiters; they're practically free being
only mongod s requiring virtually no disk and precious little memory.
In a second case, if we started with a primary and three secondaries, that would
make 4 total. Lose the primary and there are 3 voting members; 3 of the original 4
is still more than half, so that is enough to elect a new primary.
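The arithmetic behind these cases is small enough to write down. The sketch below assumes that a primary can be elected whenever a strict majority (more than half) of the configured voting members is still reachable, and that arbiters are counted among the voters. ElectionMath and its method names are made up for the illustration.

```java
/** Majority arithmetic for replica-set elections, assuming a strict majority of configured voters is required. */
final class ElectionMath
{
    /** Smallest number of votes that is more than half of the configured voting members. */
    static int votesNeeded( int configuredVoters )
    {
        return configuredVoters / 2 + 1;
    }

    /** True when the members still reachable can elect a primary on their own. */
    static boolean canElect( int configuredVoters, int reachableVoters )
    {
        return reachableVoters >= votesNeeded( configuredVoters );
    }

    public static void main( String[] args )
    {
        // { configured voters, voters still reachable }
        int[][] cases = { { 5, 4 }, { 4, 3 }, { 4, 2 }, { 3, 1 } };
        for( int[] c : cases )
            System.out.println( c[ 0 ] + " configured, " + c[ 1 ] + " reachable, "
                    + votesNeeded( c[ 0 ] ) + " needed: "
                    + ( canElect( c[ 0 ], c[ 1 ] ) ? "can elect" : "cannot elect" ) );
    }
}
```

Under this rule a set of 4 needs 3 votes and so survives the loss of only one member, exactly like a set of 3, which is one way to see why an even-sized set buys nothing over the next smaller odd one.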
Locking in MongoDB...
...is done at the database level beginning in 2.2. Someday, it's slated to be
more granular still, at the collection level.
In MongoDB locks aren't really locks in the RDBMS sense, but more mutexes that a
process takes while in a critical section of work being done.
A lock isn't held across multiple documents (rows) as it would be in RDBMS; the
duration of the lock is measured in microseconds.
Coming from RDBMS, one shouldn't expect that locks will be a limiting factor in
MongoDB because locks can be used tens of thousands of times per second for
writes (and reads).
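The mutex-around-a-critical-section idea can be sketched with the JDK's own locks. This is a toy model only, assuming one readers-writer lock per database name that is held for the duration of a single document write; DatabaseLockModel and everything in it are invented for illustration and say nothing about mongod's internals.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of database-level locking: one readers-writer lock per database name. */
final class DatabaseLockModel
{
    private static final class Database
    {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        int documentsWritten;     // guarded by lock
    }

    private final ConcurrentHashMap< String, Database > databases = new ConcurrentHashMap< String, Database >();

    private Database database( String name )
    {
        return databases.computeIfAbsent( name, n -> new Database() );
    }

    /** The lock is taken for one short critical section and released at once. */
    void writeOneDocument( String name )
    {
        Database db = database( name );
        db.lock.writeLock().lock();
        try { db.documentsWritten++; }
        finally { db.lock.writeLock().unlock(); }
    }

    int documentsWritten( String name )
    {
        Database db = database( name );
        db.lock.readLock().lock();
        try { return db.documentsWritten; }
        finally { db.lock.readLock().unlock(); }
    }

    /** Starts the given number of writer threads against one database and returns the final count. */
    static int runWriters( int threads, int writesPerThread )
    {
        final DatabaseLockModel model = new DatabaseLockModel();
        Thread[] writers = new Thread[ threads ];
        for( int t = 0; t < threads; t++ )
        {
            writers[ t ] = new Thread( () -> {
                for( int i = 0; i < writesPerThread; i++ )
                    model.writeOneDocument( "funstuff" );
            } );
            writers[ t ].start();
        }
        try
        {
            for( Thread w : writers )
                w.join();
        }
        catch( InterruptedException e )
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException( "interrupted while waiting for writers", e );
        }
        return model.documentsWritten( "funstuff" );
    }

    public static void main( String[] args )
    {
        System.out.println( runWriters( 4, 1000 ) );   // prints 4000: no write is lost
    }
}
```

Writers to different database names would take different locks and never wait on each other, which is the gain of database-level over global locking.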
Colorizing the MongoDB interactive shell...
You know that you can configure a few things using what are called "rc" files,
typically kept in your home folder. You've seen them:
.exrc, .vimrc, .bashrc
etc.
So, you shouldn't be surprised that someone came up with a very fun and useful
way of injecting color into your MongoDB interactive shell session. Enter
Tyler Brock who replaces the (for now at least) zero-length .mongorc.js
file with his own. You do have to be running, at the very oldest, MongoDB 2.2.x.
You can git (pun intended) his stuff and set it up for the next time you run
the MongoDB shell by doing the following:
Pick a subdirectory where you'd like to drop his stuff. You'll be
updating it, if there's ever need, the same way you'd ever update sources
controlled by Git. I put mine under ~/dev .
Do this:
$ cd ~/dev
$ git clone git@github.com:TylerBrock/mongo-hacker.git
$ cd ~
$ ln -s ~/dev/mongo-hacker/mongo_hacker.js .mongorc.js
Then just launch (relaunch) MongoDB to see color when you do stuff:
$ mongo
If ever there's reason to do it, update mongo-hacker:
$ cd ~/dev/mongo-hacker
$ git pull origin master
If you decide this is evil and you no longer want to be part of it, do this:
$ rm -rf ~/dev/mongo-hacker
$ rm ~/.mongorc.js
Enjoy the ride. Here's something you might see. I don't care for all the colors,
but there's probably a way to change that. I also don't like the long prompt
he's added; I'll definitely smoke that. (Just edit mongo_hacker.js , look
for "prompt" and comment out the whole paragraph of code.)
Benchmarking MongoDB...
An example.
package com.mongodb;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;
import java.net.UnknownHostException;
public class PerfTest
{
public static void main( String[] args ) throws UnknownHostException
{
MongoClient m = new MongoClient();
DBCollection c = m.getDB( "test" ).getCollection( "PerfTest" );
/* Add this in to insert 500 documents before running the test:
* c.drop();
* for( int i = 0; i < 500; i++ )
* c.insert( new BasicDBObject( "_id", i ) );
*/
c.findOne();
DBCursor cursor = c.find();
long startTime = System.nanoTime();
try
{
while( cursor.hasNext() )
cursor.next();
}
finally
{
cursor.close();
}
long estimatedTime = System.nanoTime() - startTime;
double seconds = ( double ) estimatedTime / 1000000000.0;
System.out.println( "Done in " + seconds );
}
}
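The example above warms up with one findOne() and then takes a single timing. A slightly more forgiving harness, sketched here with the JDK only, repeats the measurement and reports the middle sample so that one slow run doesn't dominate. TimingSketch and medianNanos are hypothetical names, and the task being timed in main is a stand-in rather than a MongoDB call.

```java
import java.util.Arrays;

/** Minimal timing harness: untimed warm-up runs, then the middle of several timed runs. */
final class TimingSketch
{
    /**
     * Runs the task warmups times without timing it, then times it runs times and returns
     * the middle sample in nanoseconds (the upper middle when runs is even).
     */
    static long medianNanos( Runnable task, int warmups, int runs )
    {
        if( runs < 1 )
            throw new IllegalArgumentException( "runs must be at least 1" );
        for( int i = 0; i < warmups; i++ )
            task.run();
        long[] samples = new long[ runs ];
        for( int i = 0; i < runs; i++ )
        {
            long start = System.nanoTime();
            task.run();
            samples[ i ] = System.nanoTime() - start;
        }
        Arrays.sort( samples );
        return samples[ runs / 2 ];
    }

    public static void main( String[] args )
    {
        // Stand-in workload; in the benchmark above this would be the cursor loop.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for( int i = 0; i < 500; i++ )
                sb.append( i );
        };
        double seconds = medianNanos( task, 3, 11 ) / 1000000000.0;
        System.out.println( "Median run took " + seconds + " seconds" );
    }
}
```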
MongoDB 2.6 webinar notes
Index maintenance
Inconvenient to add new indices to existing collections, especially if big.
Now possible to add it in the background.
Auto-cancelation of operations by posting a maximum time in milliseconds for
any operation, granular.
Write commands delivered to the server (inserts, updates and deletes). All
operations now deliverable in bulk, by some order, etc. Enables asynchronous
communication with server.
Power of 2 allocation enabled by default, resulting in easier predictability of
storage requirements.
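Why power-of-2 allocation makes storage predictable is easiest to see from the rounding itself. The sketch below only rounds a document size up to the next power of two; any minimum or maximum the server applies is deliberately ignored, and PowerOfTwoSizing and allocationFor are made-up names.

```java
/** Rounds a document size up to the next power of two; server-side minimums and caps are ignored here. */
final class PowerOfTwoSizing
{
    static int allocationFor( int documentBytes )
    {
        if( documentBytes < 1 || documentBytes > ( 1 << 30 ) )
            throw new IllegalArgumentException( "size out of range: " + documentBytes );
        int highest = Integer.highestOneBit( documentBytes );
        // Already a power of two: keep it. Otherwise step up to the next one.
        return ( highest == documentBytes ) ? documentBytes : highest << 1;
    }

    public static void main( String[] args )
    {
        int[] sizes = { 100, 128, 129, 1000 };
        for( int size : sizes )
            System.out.println( size + " bytes -> " + allocationFor( size ) + " allocated" );
    }
}
```

In this model a document that grows from 100 to 120 bytes still fits the 128 bytes it was given, and freed slots come in only a handful of sizes, so they are easy to reuse.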
Developers
Improvements to query system.
Index intersection, query introspection.
Integrated text search. Beta in 2.4, released in 2.6 and integrated.
New update operators $mul, $min, $max. The update code is now testable and
extensible, so adding these is feasible whereas in the past it was very hard.
Aggregation pipeline enhancements: available since 2.4, but 2.6 unlocks large
data sets (unlimited result set size vs. 16MB).
Enterprise security
Authentication:
Kerberos (2.4), LDAP (2.6), x.509 (2.6).
Authorization:
User-defined roles for DBs and collections.
Encryption:
Mixed-mode SSL. Obfuscation: Field-level redaction via aggregation framework.
Auditing:
Trails can be written to separate file or system log.
MMS
Monitoring, of course.
Back-up