Transcription of Notes on MongoDB

A full-featured MongoDB sample covering embedded array functionality, written from the CRUD point-of-view, may be found here. Mongo is nicely accessed via Morphia, a sort of object-relational mapper (ORM), or more properly an object-document mapper, that is very lightweight, since Mongo doesn't need much of this.

Myriad miscellaneous notes

Some stuff is from me, from the course I took from 10gen, from synthesizing [email protected] or from ripping off voices like Scott Hernandez, Jenna Deboisblanc and others directly.

You should quickly notice that MongoDB is manipulated using cute little JavaScript commands. Their arguments, enclosed between { }, are JavaScript object literals (JSON, more or less).

The continuation character is the ellipsis: ... . You can just keep typing, but it won't close unless and until you've completed the JavaScript syntactically.

What are tables in relational databases are called "collections" in MongoDB.

What are rows in relational database tables are called "documents" in MongoDB.

Lost? Want to say something like describe table? Use findOne(). It will show you one document from the collection (an arbitrary one, not a random sample), which is usually enough to see what fields are in use.

To pretty-print JSON output, append .forEach( printjson ) to a query statement.

Datafiles in MongoDB start at 64 MB and double in size with each additional datafile (made necessary by exhaustion of the original allocation), up to 2 GB. It's possible therefore for the allocated size to exceed the size of the actual data.
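
The doubling rule is easy to tabulate. What follows is a minimal sketch that assumes nothing beyond the figures quoted in this note (first file 64 MB, each later file twice the previous one, capped at 2 GB). The class and method names are made up for the illustration; they are not part of MongoDB.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a MongoDB API): tabulates datafile sizes under the
// rule quoted above, i.e. start at 64 MB and double up to a 2 GB cap.
final class DatafileSizes
{
    static final long FIRST_MB = 64;
    static final long CAP_MB   = 2048;

    // Sizes, in MB, of the first 'count' datafiles of one database.
    static List< Long > sizesMb( int count )
    {
        List< Long > sizes = new ArrayList< Long >();
        long         next  = FIRST_MB;

        for( int i = 0; i < count; i++ )
        {
            sizes.add( next );
            next = Math.min( next * 2, CAP_MB );
        }

        return sizes;
    }

    // Total space, in MB, allocated once 'count' datafiles exist.
    static long totalMb( int count )
    {
        long total = 0;

        for( long size : sizesMb( count ) )
            total += size;

        return total;
    }

    public static void main( String[] args )
    {
        System.out.println( sizesMb( 7 ) );   // [64, 128, 256, 512, 1024, 2048, 2048]
        System.out.println( totalMb( 7 ) );   // 6080
    }
}
```

Under these assumptions, a database that has just spilled into its seventh file has about 6 GB allocated even if that last file is nearly empty.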

Logging statements are piped to stdout by default. For more verbose logging, launch the dæmon with increasing numbers of vs (mongod -vvvvv).

There is a configuration file that's very desirable for real Mongo use. It's well worth looking into.

db.help(); can be useful in a pinch.

The best help, however, is http://www.mongodb.org; use its search engine.
Collection stats. Referring to...
```
    > db.runCommand( { collStats : "users", scale : 1024 * 1024 } )
    
```
storageSize is the amount of space allocated to the collection, which is not the size or number of files in the system. Size refers to the size of the collection. Neither of the two includes index size.

The data size is not related to the number of chunks. The compact() command does not reduce the disk space of the collection, but rather defragments and condenses the data on disk.
There is a ReST interface to MongoDB that can be turned on when launching via the --rest option. For example, to get a list of collections, the URL is http://127.0.0.1:28017/test/system/namespaces/.
How many databases should be created? In MongoDB version 2.2 (the next stable release) there is database-level locking (currently there's a global write lock), which may be a reason to create multiple databases.
How many shards should be created? This question depends upon use cases and specific system metrics are necessary to provide specific advice. You need to consider expected write volume as well as data size. If your data fits in RAM, you'll be able to handle more inserts per second than if you're forced to go to disk. It's probably a good idea to limit the data on a single node to 70% of available disk space (filling the disk to 100% should be avoided at all costs).
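
The 70% guideline translates into a quick lower bound on the number of shards. This is only a sketch of that one constraint: it ignores write volume and whether the data fit in RAM, both of which the note above says must also be considered. The class name, method name and sample figures are invented for the example.

```java
// Hypothetical sizing helper: covers only the disk-space rule of thumb above
// (keep each node at or below 70% of its disk); it says nothing about write
// volume or RAM, which also matter.
final class ShardEstimate
{
    static final double MAX_DISK_FILL = 0.70;

    // Fewest shards that keep every node under the fill limit.
    static int minimumShards( double dataGb, double diskPerNodeGb )
    {
        if( dataGb < 0 || diskPerNodeGb <= 0 )
            throw new IllegalArgumentException( "sizes must be positive" );

        double usablePerNodeGb = diskPerNodeGb * MAX_DISK_FILL;

        return Math.max( 1, ( int ) Math.ceil( dataGb / usablePerNodeGb ) );
    }

    public static void main( String[] args )
    {
        // 1000 GB of data on nodes with 500 GB disks: 350 GB usable per node
        System.out.println( minimumShards( 1000, 500 ) );   // 3
    }
}
```
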
The concept of configuration server in MongoDB relates to sharding only: configuration servers are special mongod instances that maintain the sharded cluster's metadata in a configuration database. A sharded cluster operates with a group of three configuration servers using a two-phase commit process to ensure immediate consistency and reliability. (These mongod processes have nothing to do with replication, replica sets or replica nodes.) See more about this here.
For sharding, see here and here.
How many replicas should be maintained? Three nodes per replica set are recommended for data durability. If you expect a high number of reads (much greater than writes), additional replicas will help to spread the load. Additional replica members will increase fault tolerance, which may be important for your application. In addition, you may find it beneficial to have a single replica lag behind the rest as a way to jump back in time (if, for example, a bug in your application corrupts the data on your primary).

The recommended minimum of three is comprised of two full and one arbiter, or three full nodes.
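
One way to see why three is the practical minimum: a replica set can elect a primary only while a strict majority of its voting members can reach one another. That election rule is background, not something stated in these notes. The arithmetic below is a sketch with made-up names. An arbiter counts as a voting member even though it holds no data.

```java
// Hypothetical helper illustrating the majority arithmetic behind replica-set sizing.
final class ReplicaMajority
{
    // Votes needed to elect a primary: a strict majority of voting members.
    static int majority( int votingMembers )
    {
        return votingMembers / 2 + 1;
    }

    // Voting members that can be lost while a primary can still be elected.
    static int faultTolerance( int votingMembers )
    {
        return votingMembers - majority( votingMembers );
    }

    public static void main( String[] args )
    {
        for( int n = 1; n <= 5; n++ )
            System.out.println( n + " members: majority " + majority( n )
                                  + ", tolerates " + faultTolerance( n ) + " down" );
    }
}
```

Two members tolerate no failures at all, three tolerate one, and a fourth adds nothing over three; hence the usual advice to use an odd number.
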
Break up data across multiple databases in MongoDB? Breaking up your data into separate databases makes the data more portable, makes it easier to store the data on separate disks, and gives you the option of specifying different authentication schemes or backup strategies. In MongoDB 2.2 (the next stable release), write locks are distributed by database rather than a global lock, which will improve concurrency.

Breaking your data into separate clusters affords the same or more flexibility, but there's an overhead associated with creating a new cluster. Each cluster should have its own replica set for data redundancy.
To embed or not to embed? That is, documents inside other documents such as an array of addresses in a user account? Eliot Horowitz says:
- One-to-many relationships may be good to embed.
- If the possible list of entries embedded is unbounded and large (thousands) then linking to documents in a separate collection might be better (i.e.: _ids from collection "accounts" might be used in collection "addresses" to bind address documents to accounts).
- Benchmark a few different variants to see what performs best in your case.

Modification of _id once established isn't tolerated:

    > db.Accounts.update( { "name" : "Jimmy" },
    ... { "$set" : { "_id" : ObjectId( "000000000000000000000001" ) } } );
    Mod on _id not allowed

But, you can force Mongo to use your value for _id:

    > db.accounts.insert( { "_id" : ObjectId( "000000000000000000000001" ), "name" : "Jimmy" } );
    > db.accounts.find( { "name" : "Jimmy" } );
    { "_id" : ObjectId("000000000000000000000001"), "name" : "Jimmy" }

MongoDB Java driver error codes. The driver throws different classes of exceptions depending on the error's context, likely IOException or one of:

Data integrity. MongoDB MMS can help in finding problems where corruption stops certain nodes from responding in a timely fashion. It will e-mail you immediately. Journaling helps plus the usual back-up system, taking snapshots on a daily basis. See http://mongodb.org/display/DOCS/Backups. See also http://www.mongodb.org/display/DOCS/Durability+and+Repair, in a replica set: http://www.mongodb.org/display/DOCS/Replica+Set+Design+Concepts, also this thread: http://groups.google.com/forum/?fromgroups#!topic/mongodb-user/R3bB06Z0n-c

Limiting the database size. If this is important, try the trick outlined here: http://souptonuts.sourceforge.net/quota_tutorial.html.

It's possible to design one's schema using embedded documents, non-embedded (i.e.: separate documents) or a bucket (hybrid) structure. There's an excellent and short post about this here.

For help mapping from SQL, there is an SQL to Mongo Mapping Chart.

MongoDB service start (on Ubuntu). This is done:

    $ service mongodb start

However, it may not "take", as you'll see if you look for the process. This can happen because it got shut down badly and left a lock file behind. Remove this lock file thus:

    $ rm /var/lib/mongodb/mongod.lock

Solution to getting MongoDB logging to come into our log files. This can be had if using Slf4j. See http://stackoverflow.com/questions/869945/how-to-send-java-util-logging-to-log4j.

To reach MongoDB's status page over HTTP, add 1000 to the port on which it's running. If your local host is running Mongo on the default port (27017), use http://localhost:28017. Some links require the ReST service; enable it by launching with --rest.

Commercial, inter-node SSL support for MongoDB is had at 10gen Customer Downloads and the price for this, very steep, can be seen in the "Enterprise" column here.

Many and more great links...

MongoDB basics for everyone! (a six-part series, pretty good).
Vimeo presentation on Mongo Java driver.
MongoDB Gotchas & How to Avoid Them.
Java Development with MongoDB, a good slide presentation.
Super article on MongoDB replica lag: MongoDB: Replication Lag and the Facts of Life.
Forgotten admin password.
6 Rules of Thumb for MongoDB Schema Design
How to Implement Robust and Scalable Transactions across Documents with MongoDB

Upstart

When you install MongoDB using the Debian package, it establishes itself as a service via Upstart which isn't what you want if you're running the local installation as a replica.

Whatever the reason for your interest in this matter, note that the script that governs the Upstart nature of MongoDB is /etc/init/mongodb.conf, not to be confused with /etc/mongodb.conf. The latter configures how MongoDB runs once started; the Upstart file governs whether and when the service starts, and how it is stopped.

    # Ubuntu upstart file at /etc/init/mongodb.conf

    limit nofile 20000 20000

    kill timeout 300 # wait 300s between SIGTERM and SIGKILL.

    pre-start script
      mkdir -p /var/lib/mongodb/
      mkdir -p /var/log/mongodb/
    end script

    start on runlevel [2345]
    stop on runlevel [06]

    script
      ENABLE_MONGODB="yes"
      if [ -f /etc/default/mongodb ]; then
        . /etc/default/mongodb
      fi
      if [ "x$ENABLE_MONGODB" = "xyes" ]; then
        exec start-stop-daemon --start --quiet --chuid mongodb --exec  /usr/bin/mongod -- --config /etc/mongodb.conf
      fi
    end script

A good link discussing this is ubuntu: start(upstart) second instance of mongodb.

Quick Start

Installation

See http://www.javahotchocolate.com/tutorials/mongodb.html.

Start up the console...

...and look around, including seeing what databases are available, switching focus to a database (use), examining a collection, forcing JSON output to be formatted, etc. (Some vertical white space inserted for clarity.)

    $ mongo
    MongoDB shell version: 2.0.5
    connecting to: test

    > show dbs
    accountmgrdb      0.203125GB
    local (empty)
    morphia_example   0.203125GB
    my_database       0.203125GB
    russ_trystuff_db  0.203125GB
    test              0.203125GB
    yourdb            0.203125GB

    > use accountmgrdb
    switched to db accountmgrdb

    > show collections
    Accounts
    system.indexes

    > db.Accounts.findOne();
    {
        "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
        "email" : "[email protected]",
        "password" : "passpass",
        "firstname" : "René",
        "lastname" : "de St. Exupéry",
        "fullname" : "René de St. Exupéry",
        "phone" : "33 (0) 3.29.90.66.65",
        "mobile" : "33 (0) 3.29.90.66.65",
        "fax" : "33 (0) 3.29.90.66.63"
    }

    > db.Accounts.find( { "firstname" : "René" } );
    { "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"), "email" : "[email protected]", \
        "password" : "passpass", "firstname" : "René", "lastname" : "de St. Exupéry", \
        "fullname" : "René de St. Exupéry", \
        "phone" : "33 (0) 3.29.90.66.65", "mobile" : "33 (0) 3.29.90.66.65", "fax" : "33 (0) 3.29.90.66.63" }

    > db.Accounts.find( { "firstname" : "René" } ).forEach( printjson);
    {
        "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
        "email" : "[email protected]",
        "password" : "passpass",
        "firstname" : "René",
        "lastname" : "de St. Exupéry",
        "fullname" : "René de St. Exupéry",
        "phone" : "33 (0) 3.29.90.66.65",
        "mobile" : "33 (0) 3.29.90.66.65",
        "fax" : "33 (0) 3.29.90.66.63"
    }

CRUD

Now let's have some real, useful fun...

Create

...a new user or two:

    > db.Accounts.insert( { "email":"[email protected]", "password":"do 'em every time",
    ... "firstname":"Jack" } );

Now, to make certain the new account was added...

    > db.Accounts.find( { "firstname":"Jack" } ).forEach( printjson );
    {
        "_id" : ObjectId("4fbbcb4e1b599c3db4747a6e"),
        "email" : "[email protected]",
        "password" : "do 'em every time",
        "firstname" : "Jack"
    }

Let's add a second account for grins...

    > db.Accounts.insert( { "email":"[email protected]", "password":"don't hurt me",
    ... "firstname":"Bea" } );

Read

...or locate stuff that might be in the database.

    > db.Accounts.find( { "firstname":"Jack" } ).forEach( printjson );
    {
        "_id" : ObjectId("4fbbcb4e1b599c3db4747a6e"),
        "email" : "[email protected]",
        "password" : "do 'em every time",
        "firstname" : "Jack"
    }

If you wish to show all accounts whose e-mail addresses end in ".uk" use a regular expression! (Gotta love that, eh?)

    > db.Accounts.find( { "email": /[.]uk$/ } ).forEach( printjson );
    {
        "_id" : ObjectId("4fbbcdf21b599c3db4747a6f"),
        "email" : "[email protected].uk",
        "password" : "do 'em every time",
        "firstname" : "Jack"
    }
    {
        "_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
        "email" : "beatrice.pansy@ladies-club.uk",
        "password" : "don't hurt me",
        "firstname" : "Bea"
    }
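
The same expression can be tried outside the shell. This sketch uses only java.util.regex. The addresses are made up for the illustration and the class name is hypothetical. As far as I know, the Java driver also accepts a java.util.regex.Pattern as the value put into a query object, but that isn't exercised here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class: applies the shell example's regular expression in plain Java.
final class EmailSuffixFilter
{
    // Same expression as in the shell query: a literal dot, then "uk", at the end.
    static final Pattern ENDS_IN_UK = Pattern.compile( "[.]uk$" );

    static List< String > endingInUk( List< String > emails )
    {
        List< String > matches = new ArrayList< String >();

        for( String email : emails )
        {
            // find() looks for the pattern anywhere in the string; matches()
            // would demand that the whole string be ".uk".
            if( ENDS_IN_UK.matcher( email ).find() )
                matches.add( email );
        }

        return matches;
    }

    public static void main( String[] args )
    {
        // made-up addresses, for illustration only
        List< String > emails = Arrays.asList( "jack@example.co.uk", "bea@example.uk",
                                               "rene@example.fr", "puck@exampleuk" );

        System.out.println( endingInUk( emails ) );   // [jack@example.co.uk, bea@example.uk]
    }
}
```

The last address ends in "uk" without a dot, which is why the dot is bracketed in the expression: an unescaped dot would match any character.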

Find all documents:

    > db.Accounts.find( { } );
    { "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"), "email" : "[email protected]", \
        "password" : "passpass", "firstname" : "René", "lastname" : "de St. Exupéry", \
        "fullname" : "René de St. Exupéry", \
        "phone" : "33 (0) 3.29.90.66.65", "mobile" : "33 (0) 3.29.90.66.65", "fax" : "33 (0) 3.29.90.66.63" }
    { "_id" : ObjectId("4fbbcdf21b599c3db4747a6f"), "email" : "[email protected]", \
        "password" : "do 'em every time", "firstname" : "Jack" }
    { "_id" : ObjectId("4fbbce2d1b599c3db4747a70"), "email" : "[email protected]", \
        "firstname" : "Bea", "lastname" : "pansy", "password" : "don't hurt me" }

See field (i.e.: column in SQL) particulars only for query results. This abbreviates the document returned to only those fields that are to be used (_id comes along unless explicitly excluded). This is reminiscent of naming columns in an SQL SELECT instead of using SELECT *.

    > db.Addresses.find( { }, { "addresstype" : 1 } );
    { "_id" : ObjectId("4fc7a6b1e4b022644086cff6"), "addresstype" : 1 }
    { "_id" : ObjectId("4fc7c92be4b0cd36353c4a02"), "addresstype" : 2 }

Find document by subdocument.

Imagine a collection of documents each with a subdocument named data like:

    > db.tuples.findOne();
    {
        "_id" : ObjectId("502fb6a9674c381db9e9249a"),
        "rats" : "large mice",
        "x" : 1,
        "data" : {
            "this" : "uh-huh",
            "that" : "oh-oh",
            "other" : "poo-poo-pee-doo"
        }
    }

Search for such a document by matching exactly one or more tuples. Here are two possible queries:

    > db.tuples.find( { "data.this":"uh-huh", "data.that":"oh-oh" } );
    { "_id" : ObjectId("502fb6a9674c381db9e9249a"), "rats" : "large mice", "x" : 1, \
        "data" : { "this" : "uh-huh", "that" : "oh-oh", "other" : "poo-poo-pee-doo" } }
    > db.tuples.find( { $and : [ { "data.this":"uh-huh" }, { "data.that":"oh-oh" } ] } );
    { "_id" : ObjectId("502fb6a9674c381db9e9249a"), "rats" : "large mice", "x" : 1, \
        "data" : { "this" : "uh-huh", "that" : "oh-oh", "other" : "poo-poo-pee-doo" } }

See list of query operators.

Update

Find Bea's document (record) and add in her last name. Then, find and display the whole document.

    > db.Accounts.update( { "firstname":"Bea" }, { $set: { "lastname":"pansy" } } );
    > db.Accounts.find( { "firstname":"Bea" } ).forEach( printjson );
    {
        "_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
        "email" : "[email protected]",
        "firstname" : "Bea",
        "lastname" : "pansy",
        "password" : "don't hurt me"
    }

See list of update operators.

Delete

Remove a document from the collection. The bare command prompt that follows shows Mongo's answer to a query that finds nothing: no output at all.

    > db.Accounts.remove( { "firstname":"Jack" } );
    > db.Accounts.find( { "firstname":"Jack" } ).forEach( printjson );
    >

Query operators

MongoDB queries are clever in that they are more or less "query by example".

These are possible operators for use in query documents:

$gt
$gte
$lt
$lte
$ne
$in
$nin
$mod
$regex
$options
$all
$size
$exists
$type
$not
$or
$nor
$elemMatch
$where

Update operators

Along with $set, these are possible operators for doing update operations:

$unset
$inc
$push
$pushAll
$pull
$pullAll
$pop
$addToSet
$rename
$bit

Update terminology

Upsert means to create a document where none existed to be updated (or merely update as instructed).

Multi-updates are updates fired on all documents that match the query.
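
In the shell, these two behaviors are the third and fourth arguments of update( query, update, upsert, multi ). The sketch below is an in-memory model of what the two flags mean, not driver code: documents are plain Maps, the collection is a List, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// In-memory model only: documents are Maps, the collection is a List.
final class UpdateFlagsModel
{
    // Builds a document from alternating keys and values.
    static Map< String, Object > doc( Object... keysAndValues )
    {
        Map< String, Object > map = new LinkedHashMap< String, Object >();

        for( int i = 0; i + 1 < keysAndValues.length; i += 2 )
            map.put( ( String ) keysAndValues[ i ], keysAndValues[ i + 1 ] );

        return map;
    }

    // True when every field of 'query' is present in 'doc' with an equal value.
    static boolean matches( Map< String, Object > doc, Map< String, Object > query )
    {
        for( Map.Entry< String, Object > field : query.entrySet() )
        {
            if( !Objects.equals( field.getValue(), doc.get( field.getKey() ) ) )
                return false;
        }

        return true;
    }

    // Applies 'fields' as a $set would; returns how many documents were written.
    static int update( List< Map< String, Object > > collection, Map< String, Object > query,
                       Map< String, Object > fields, boolean upsert, boolean multi )
    {
        int written = 0;

        for( Map< String, Object > document : collection )
        {
            if( !matches( document, query ) )
                continue;

            document.putAll( fields );
            written++;

            if( !multi )
                break;              // without multi, only the first match is touched
        }

        if( written == 0 && upsert )
        {
            // nothing matched: create a document from the query plus the new fields
            Map< String, Object > created = new LinkedHashMap< String, Object >( query );
            created.putAll( fields );
            collection.add( created );
            written = 1;
        }

        return written;
    }

    public static void main( String[] args )
    {
        List< Map< String, Object > > accounts = new ArrayList< Map< String, Object > >();
        accounts.add( doc( "firstname", "Bea",  "country", "uk" ) );
        accounts.add( doc( "firstname", "Jack", "country", "uk" ) );

        System.out.println( update( accounts, doc( "country", "uk" ), doc( "active", true ), false, false ) );
        System.out.println( update( accounts, doc( "country", "uk" ), doc( "active", true ), false, true ) );
        System.out.println( update( accounts, doc( "firstname", "Rene" ), doc( "country", "fr" ), true, false ) );
        System.out.println( accounts );
    }
}
```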

Deletion options

After using a database, here's how to drop a) the database, b) a collection, c) a document (DELETE FROM Account WHERE...).

    > db.dropDatabase()
    > db.Accounts.drop()
    > db.Accounts.remove( { ... } )

sort()

How to sort query results: a) ascending order (1), b) descending order (-1):

    > db.Accounts.find().sort( { "email": 1 } ).forEach( printjson );
    {
        "_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
        "email" : "[email protected]",
        "firstname" : "Bea",
        "lastname" : "pansy",
        "password" : "don't hurt me"
    }
    {
        "_id" : ObjectId("4fbbcdf21b599c3db4747a6f"),
        "email" : "[email protected]",
        "password" : "do 'em every time",
        "firstname" : "Jack"
    }
    {
        "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
        "email" : "[email protected]",
        "password" : "passpass",
        "firstname" : "René",
        "lastname" : "de St. Exupéry",
        "fullname" : "René de St. Exupéry",
        "phone" : "33 (0) 3.29.90.66.65",
        "mobile" : "33 (0) 3.29.90.66.65",
        "fax" : "33 (0) 3.29.90.66.63"
    }
    > db.Accounts.find().sort( { "email": -1 } ).forEach( printjson );
    {
        "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
        "email" : "[email protected]",
        "password" : "passpass",
        "firstname" : "René",
        "lastname" : "de St. Exupéry",
        "fullname" : "René de St. Exupéry",
        "phone" : "33 (0) 3.29.90.66.65",
        "mobile" : "33 (0) 3.29.90.66.65",
        "fax" : "33 (0) 3.29.90.66.63"
    }
    {
        "_id" : ObjectId("4fbbcdf21b599c3db4747a6f"),
        "email" : "[email protected]",
        "password" : "do 'em every time",
        "firstname" : "Jack"
    }
    {
        "_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
        "email" : "[email protected]",
        "firstname" : "Bea",
        "lastname" : "pansy",
        "password" : "don't hurt me"
    }

Indices (indexes)

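Indexing is a way to improve performance when data are well known. When encountering performance issues, missing or poorly designed indices are often to blame.

Look at the output from explain(). A cursor of BasicCursor means that no index was used: every document in the collection was scanned (nscanned) to find the matches (n).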

    > db.Accounts.find( { "firstname":"Bea" } );
    { "_id" : ObjectId("4fbbce2d1b599c3db4747a70"), "email" : "[email protected]", \
        "firstname" : "Bea", "lastname" : "pansy", "password" : "don't hurt me" }
    > db.Accounts.find( { "firstname":"Bea" } ).explain();
    {
        "cursor" : "BasicCursor",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 1,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {
        }
    }

[THIS IS A BAD EXAMPLE BECAUSE OUR DATA ARE NEITHER RICH NOR NUMEROUS.]

Create an index thus:

    > db.Accounts.ensureIndex( { "email": 1 } );

There are multikey indices for fields containing arrays, in which each entry in the array appears in the index, and compound indices, in which two or more fields are indexed together. Often, to get the best performance, a compound index is desirable, e.g.: querying the list of a user's tweets sorted by creation date.

The Java side

This is sort of thrown together, simplistic and tentative. I may come back to do something a little better.

Download the MongoDB Java driver and Javadoc JARs from here. Adjust the version as necessary.

The current Java driver (Javadoc) documentation is usually found at: http://api.mongodb.org/java/current/.

Here's our POJO...

package com.acme.accountmgr;

import org.bson.types.ObjectId;

import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;

public class Account
{
    ObjectId id;
    String   email;
    String   password;
    String   firstname;
    String   lastname;
    String   fullname;

    public Account() { }

    public Account ( DBObject bson )
    {
        BasicDBObject b = ( BasicDBObject ) bson;

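        this.id        = ( ObjectId ) b.get( "_id" );    // MongoDB's key is "_id", not "id"
        this.email     = ( String )   b.get( "email" );
        this.password  = ( String )   b.get( "password" );
        this.firstname = ( String )   b.get( "firstname" );
        this.lastname  = ( String )   b.get( "lastname" );
        this.fullname  = ( String )   b.get( "fullname" );
    }

    // getId() is needed by the update and delete code further down
    public ObjectId getId() { return this.id; }
    public void setId( ObjectId id ) { this.id = id; }

    public String getEmail() { return this.email; }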
    public void setEmail( String email ) { this.email = email; }

    public String getPassword() { return this.password; }
    public void setPassword( String password ) { this.password = password; }

    public String getFirstname() { return this.firstname; }
    public void setFirstname( String firstname ) { this.firstname = firstname; }

    public String getLastname() { return this.lastname; }
    public void setLastname( String lastname ) { this.lastname = lastname; }

    public String getFullname() { return this.fullname; }
    public void setFullname( String fullname ) { this.fullname = fullname; }
}

The notes earlier were all console work; the Java driver is available too, of course. This code assumes the database we were playing with.

package com.acme.accountmgr;

import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;
import com.mongodb.MongoException;

public class MongoDemo
{
    // Assumes Slf4j (mentioned earlier in these notes) as the logging API.
    private static final Logger log = LoggerFactory.getLogger( MongoDemo.class );

    // The Accounts collection, kept as a field so that the CRUD methods below can reach it.
    private DBCollection collection;

    public MongoDemo()
    {
        try
        {
            Mongo mongo = new Mongo( "localhost", 27017 );
            DB    db    = mongo.getDB( "accountmgrdb" );

            this.collection = db.getCollection( "Accounts" );
        }
        catch( UnknownHostException e )
        {
            log.error( "MongoDB host not found", e );
        }
        catch( MongoException e )
        {
            log.error( "Runtime error attempting MongoDB connection", e );
        }
    }

CRUD

Create

    public void create( Account account )
    {
        BasicDBObject document = new BasicDBObject();

        document.put( "email",     account.getEmail() );
        document.put( "password",  account.getPassword() );
        document.put( "firstname", account.getFirstname() );
        document.put( "lastname",  account.getLastname() );
        document.put( "fullname",  account.getFullname() );

        this.collection.insert( document );
    }

Read

    public List< Account > read( String property, String value )
    {
        BasicDBObject   query = new BasicDBObject();
        List< Account > list  = new ArrayList< Account >();    // the matching accounts, returned below
        DBCursor        cursor;

        query.put( property, value );

        cursor = collection.find( query );

        while( cursor.hasNext() )
        {
            DBObject object = cursor.next();
            list.add( new Account( object ) );
        }

        return list;
    }

Update

    public void update( Account existing, Account replacement )
    {
        // the query selects the existing document by its _id...
        BasicDBObject query    = new BasicDBObject( "_id", existing.getId() );
        // ...and this document replaces it, each field falling back on the existing value
        BasicDBObject document = new BasicDBObject();

        document.put( "_id", existing.getId() );
        document.put( "email",     ( replacement.getEmail() == null )     ? existing.getEmail()     : replacement.getEmail() );
        document.put( "password",  ( replacement.getPassword() == null )  ? existing.getPassword()  : replacement.getPassword() );
        document.put( "firstname", ( replacement.getFirstname() == null ) ? existing.getFirstname() : replacement.getFirstname() );
        document.put( "lastname",  ( replacement.getLastname() == null )  ? existing.getLastname()  : replacement.getLastname() );
        document.put( "fullname",  ( replacement.getFullname() == null )  ? existing.getFullname()  : replacement.getFullname() );

        collection.update( query, document );
    }

Delete

    public void delete( Account account )
    {
        BasicDBObject delete = new BasicDBObject().append( "_id", account.getId() );
        collection.remove( delete );
    }
}

JARs

mongo-2.7.3.jar

$or in Java

How to embed $or, etc. in queries. Here, we're looking for a document in which a is either 10 or 5. First, we create the factors with values 10 and 5. Then, we create an operation that will OR them. We get a query ready.

Next, we add the factors one at a time to the OR operation. Then, we tuck them into the query.

public boolean lookForTensOrFives( ObjectId oid )
{
    BasicDBObject factor1 = new BasicDBObject();
    BasicDBObject factor2 = new BasicDBObject();
    BasicDBList   or      = new BasicDBList();      // com.mongodb.BasicDBList
    BasicDBObject query   = new BasicDBObject();
    boolean       matched = false;

    factor1.put( "a", 10 );
    factor2.put( "a", 5 );

    or.add( factor1 );
    or.add( factor2 );

    query.put( "$or", or );                         // { "$or" : [ { "a" : 10 }, { "a" : 5 } ] }

    DBCursor cursor = collection.find( query );

    while( cursor.hasNext() )
    {
        DBObject found = cursor.next();
        // as many times as we get here, 'found' is a document that matches!
        matched = true;
    }

    return matched;
}

ObjectId used as OIDs

When sorting out ObjectIds, between String and ObjectId, try the following. The point is that if you don't know if the thing coming in is a string or an oid, this helper will ensure it's what Mongo wants (_id, etc.).

    import org.bson.types.ObjectId;
    ...

        ObjectId makeObjectId( Object id )
        {
            if( id instanceof String )
                return new ObjectId( ( String ) id );
            else if( id instanceof ObjectId )
                return ( ObjectId ) id;

            throw new RuntimeException( "Cannot convert " + id + " to an ObjectId" );
        }

        void method( String someOid, ObjectId anotherOid )
        {
            ObjectId oidA = makeObjectId( someOid );
            ObjectId oidB = makeObjectId( anotherOid );
            ...

Common errors

Since JSON makes copious use of double-quoting and one sees double-quotes all over the place, it's easy to get lulled into looking in the wrong place for a failed query. For example, say you're representing some object type as an integer in the database, but it shows up in some method as a string (for whatever reason). When stepping through the debugger, you may not notice that the value must be passed to BasicDBObject.put() as an Integer: a query for the string "2" will not match a document holding the integer 2.

    void method( String type )
    {
        BasicDBObject query = new BasicDBObject();

        query.put( "type", Integer.parseInt( type ) );

        DBCursor cursor = collection.find( query );
        ...
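
The effect is easy to reproduce without a database. Here is a minimal sketch in which a Map stands in for the stored document and a hypothetical fieldMatches() stands in for an equality match on one field; the real comparison happens inside MongoDB, of course.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an equality match: value and type must both agree.
final class TypeMismatchDemo
{
    static boolean fieldMatches( Map< String, Object > doc, String field, Object wanted )
    {
        return wanted.equals( doc.get( field ) );
    }

    public static void main( String[] args )
    {
        Map< String, Object > stored = new HashMap< String, Object >();
        stored.put( "type", 2 );                    // stored as an integer

        String incoming = "2";                      // arrives as a string

        System.out.println( fieldMatches( stored, "type", incoming ) );                       // false
        System.out.println( fieldMatches( stored, "type", Integer.parseInt( incoming ) ) );   // true
    }
}
```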

Schema solutions: arrays

What's done in a JOIN in SQL might be done in the same document in MongoDB since the schema is so fluid. Here are various renderings of addresses in a user account.

Array

This is probably what I'd prefer since I like to tout an addresstype.

    {
        "_id" : "4fc5520ae4b0aa302dd16e0c",
        "email" : "[email protected]",
        "password" : "passpass",
        "addresses" :
        [
            {
                "addresstype" : 3,
                "fullname" : "Yosemite Sam Tucker",
                "street1" : "PO Box 32",
                "city" : "Culver City",
                "state" : "ca",
                "country" : "us",
                "postalcode" : "90211",
                "isdefault" : false
            },
            {
                "addresstype" : 1,
                "fullname" : "Yosemite Sam Tucker",
                "street1" : "1321 Hollywood Blvd",
                "street2" : "(back lot)",
                "city" : "Culver City",
                "state" : "ca",
                "country" : "us",
                "postalcode" : "90211",
                "isdefault" : false
            }
        ]
    }

Subdocument

This works only if our interface makes use of strings such as "homeaddresses", "shippingaddresses", etc. It's nicer to look at in the Mongo console, but presents maybe no other benefit since users won't have gazillions of addresses anyway.

    {
        "_id" : "4fc5520ae4b0aa302dd16e0c",
        "email" : "[email protected]",
        "password" : "passpass",
        "homeaddresses" :
        [
            {
                "fullname" : "Yosemite Sam Tucker",
                "street1" : "1321 Hollywood Blvd",
                "street2" : "(back lot)",
                "city" : "Culver City",
                "state" : "ca",
                "country" : "us",
                "postalcode" : "90211",
                "isdefault" : true
            }
        ],
        "shippingaddresses" :
        [
            {
                "fullname" : "Yosemite Sam Tucker",
                "street1" : "PO Box 32",
                "city" : "Culver City",
                "state" : "ca",
                "country" : "us",
                "postalcode" : "90211",
                "isdefault" : true
            },
            {
                "fullname" : "Grandma Tucker",
                "street1" : "2234 Cowtown Lane",
                "city" : "Hastings",
                "state" : "ne",
                "country" : "us",
                "postalcode" : "68901",
                "isdefault" : false
            }
        ]
    }

Exploring $set updates

Here's some exploring of update. I used additional vertical space to make things clearer. The two updates done here do different things. When $set is used, it adds the new construct to what's there. When it isn't, the update document replaces everything but the _id.

  > use funstuff
  switched to db funstuff

  > db.fun.insert( { _id : 123, "fun" : "things" } );
  > db.fun.findOne()
  { "_id" : 123, "fun" : "things" }

  > db.fun.update( { _id:123 }, { $set: { hello: "world" } } );
  > db.fun.findOne()
  { "_id" : 123, "fun" : "things", "hello" : "world" }

  > db.fun.remove( { _id : 123 } )
  > db.fun.insert( { _id : 123, "fun" : "things" } );

  > db.fun.update( { _id:123 }, { hello: "world" } );
  > db.fun.findOne()
  { "_id" : 123, "hello" : "world" }
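
The difference between the two updates can be modeled in a few lines. This is only an illustration with Maps standing in for documents; the class and method names are made up, and nothing here touches the driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory model of the two update styles shown in the shell session above.
final class SetVersusReplace
{
    // Like update( query, { $set : fields } ): existing fields stay, new ones are merged in.
    static Map< String, Object > applySet( Map< String, Object > doc, Map< String, Object > fields )
    {
        Map< String, Object > result = new LinkedHashMap< String, Object >( doc );
        result.putAll( fields );
        return result;
    }

    // Like update( query, replacement ): everything except _id is thrown away.
    static Map< String, Object > applyReplace( Map< String, Object > doc, Map< String, Object > replacement )
    {
        Map< String, Object > result = new LinkedHashMap< String, Object >();
        result.put( "_id", doc.get( "_id" ) );
        result.putAll( replacement );
        return result;
    }

    public static void main( String[] args )
    {
        Map< String, Object > original = new LinkedHashMap< String, Object >();
        original.put( "_id", 123 );
        original.put( "fun", "things" );

        Map< String, Object > hello = new LinkedHashMap< String, Object >();
        hello.put( "hello", "world" );

        System.out.println( applySet( original, hello ) );       // {_id=123, fun=things, hello=world}
        System.out.println( applyReplace( original, hello ) );   // {_id=123, hello=world}
    }
}
```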

More exploring $set updates

Here's a rather more complex exploration with interleaved Java code (that, at first at least, wasn't tested even for syntax).

    // What's going on in Enchiladas...
    > db.Enchiladas.findOne();
    {
        "_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5"),
        "email" : "[email protected]",
        "password" : "passpass",
        "isdefault" : false
    }

    // Initialize 'sam' with the bucket that interests us.
    > var sam = db.Enchiladas.findOne( { "_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5") } );
      
        BasicDBObject query = new BasicDBObject();

        query.put( "_id", new ObjectId( "4fccc8dde4b0d5de2eeab3c5" ) );

        DBCursor cursor = collection.find( query );
        DBObject sam    = null;

        while( cursor.hasNext() )
        {
            sam = cursor.next();
            break;
        }

        // sam "points" at his account!



    // Create a new field in document 'sam' to hold the address:
    > sam.address = { "addresstype":2, "fullname":"Yosemite Sam Tucker", "street1":"1321 Hollywood Blvd", \
        "street2":"(back lot)", "city":"Culver City", "state":"ca", "country":"us", "postalcode":"90211", \
        "isdefault":false }
    {
        "addresstype" : 2,
        "fullname" : "Yosemite Sam Tucker",
        "street1" : "1321 Hollywood Blvd",
        "street2" : "(back lot)",
        "city" : "Culver City",
        "state" : "ca",
        "country" : "us",
        "postalcode" : "90211",
        "isdefault" : false
    }
      
        BasicDBObject address = new BasicDBObject();

        address.put( "addresstype", 2 );
        address.put( "fullname", "Yosemite Sam Tucker" );
        address.put( "street1", "1321 Hollywood Blvd" );
        address.put( "street2", "(back lot)" );
        address.put( "city", "Culver City" );
        address.put( "state", "ca" );
        address.put( "country", "us" );
        address.put( "postalcode", "90211" );
        address.put( "isdefault", false );

        // this would replace the whole of sam's document with address: we don't want that!
        // collection.update( sam, address );

        // update fodder here (how to do "$set"...)
        // the statements as if in Mongo console (JavaScript) are built progressively...
        BasicDBObject newsam    = new BasicDBObject().append( "address", address );
        BasicDBObject augmented = new BasicDBObject().append( "$set", newsam );
        collection.update( sam, augmented );


    // Here's the update adding an address to Sam's bucket:
    > db.Enchiladas.update( { "_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5") }, sam );

    // Now, when we look at all we've got, we see different stuff in one bucket than in another.
    > db.Enchiladas.find( { } ).forEach( printjson );
    {
        "_id" : ObjectId("4fcccadbe4b0d5de2eeab3c6"),
        "email" : "[email protected]",
        "password" : "passpass",
        "isdefault" : false
    }
    {
        "_id" : ObjectId("4fcccb686ccd0f44d66a18a4"),
        "email" : "poop.abc.com",
        "ipaddress" : "192168.0.9",
        "password" : "passpass"
    }
    {
        "_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5"),
        "email" : "[email protected]",
        "password" : "passpass",
        "isdefault" : false,
        "address" : {
            "addresstype" : 2,
            "fullname" : "Yosemite Sam Tucker",
            "street1" : "1321 Hollywood Blvd",
            "street2" : "(back lot)",
            "city" : "Culver City",
            "state" : "ca",
            "country" : "us",
            "postalcode" : "90211",
            "isdefault" : false
        }
    }
      
        // find all the documents (no query this time)...
        cursor = collection.find();

        while( cursor.hasNext() )
            System.out.println( cursor.next() );

Quick and dirty Mongo set-up code

I have a project named TryIt in which I prototype things quickly if I wish to experiment. Here's a class in it. It might be referenced from other notes on this page.

package experiment;

import java.net.UnknownHostException;

import com.mongodb.DBCollection;
import com.mongodb.Mongo;

public class MongoSetup
{
    Mongo  mongo = null;
    String database;

    public MongoSetup()
    {
        setup();
    }

    public MongoSetup( String database )
    {
        setup();
        this.database = database;
    }

    private void setup()
    {
        try
        {
            mongo = new Mongo();
        }
        catch( UnknownHostException e )
        {
            System.out.println( "Unable to reach MongoDB: " + e.getMessage() );
        }
    }

    public String getDatabase() { return this.database; }
    public void setDatabase( String database ) { this.database = database; }

    public DBCollection getCollection( String collection )
    {
        return this.getCollection( this.database, collection );
    }

    public DBCollection getCollection( String database, String collection )
    {
        return mongo.getDB( database ).getCollection( collection );
    }
}

Exploring arrays...

This example is from MongoDB -> Home -> Drivers -> Java Language Center -> Java Types

public static void main( String[] args )
{
    MongoSetup mongo = new MongoSetup( "funstuff" );

    ArrayList< Serializable > x = new ArrayList< Serializable >();

    x.add( 1 );
    x.add( 2 );
    x.add( new BasicDBObject( "foo", "bar" ) );
    x.add( 4 );

    BasicDBObject doc = new BasicDBObject( "odd-array", x );

    DBCollection collection = mongo.getCollection( "array_demo" );

    collection.insert( doc );
}

The Java snippet above created the Mongo console experience below.

    > use funstuff
    switched to db funstuff
    > show collections
    array_demo
    system.indexes
    > db.array_demo.findOne();
    {
        "_id" : ObjectId("4fce18d55a374a574039b45b"),
        "odd-array" : [
            1,
            2,
            {
                "foo" : "bar"
            },
            4
        ]
    }

What's going on?

A (Java-shabby) array is created for the purpose of demonstrating wild arrays embedded in a Mongo document. A Mongo document is created and the array is embedded in it as odd-array before the document is inserted into the database collection shown.

How to add an array to a MongoDB document in Java...

That is, a complex object array.

public class IdentityType
{
    private String identity;
    private String type;

    public IdentityType() { }
    public IdentityType( String identity, String type ) { this.identity = identity; this.type = type; }

    public String getIdentity() { return identity; }
    public void setIdentity( String identity ) { this.identity = identity; }
    public String getType() { return type; }
    public void setType( String type ) { this.type = type; }

    public String toString()
    {
        StringBuilder sb = new StringBuilder();

        sb.append( "{\n" );
        sb.append( "  identity: " + this.identity + "\n" );
        sb.append( "  type:     " + this.type + "\n" );
        sb.append( "\n}" );

        return sb.toString();
    }
}

This is the relevant POJO code:

    private List< IdentityType > idtypes = new ArrayList< IdentityType >();
    public List< IdentityType >  getIdtypes()                             { return this.idtypes; }
    public void                  setIdtypes( List< IdentityType > types ) { this.idtypes = types; }
    public void                  addIdtype( IdentityType type )           { this.idtypes.add( type ); }

Here's the trip from POJO to MongoDB document:

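    public DBObject getBsonFromPojo()
    {
        // the document being built; the POJO's other fields would be put here too
        BasicDBObject document = new BasicDBObject();

        if( getIdtypes().size() > 0 )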
        {
            List< BasicDBObject > list = new ArrayList< BasicDBObject >();

            for( IdentityType idt : getIdtypes() )
            {
                BasicDBObject idtype = new BasicDBObject();

                idtype.put( "identity", idt.getIdentity() );
                idtype.put( "type", idt.getType() );
                list.add( idtype );
            }

            document.put( "idtypes", list );
        }

        return document;
    }

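The trip back from MongoDB to POJO needs a little care: the driver hands back a BasicDBList of DBObjects, not a list of IdentityTypes, so each element must be rebuilt:

    public void makePojoFromBson( DBObject bson )
    {
        BasicDBObject b = ( BasicDBObject ) bson;

        ...
        BasicDBList list = ( BasicDBList ) b.get( "idtypes" );

        if( list != null )
            for( Object o : list )
            {
                DBObject idt = ( DBObject ) o;
                addIdtype( new IdentityType( ( String ) idt.get( "identity" ), ( String ) idt.get( "type" ) ) );
            }
    }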

The $ (positional) operator for updating array elements

The $ (dollar sign) stands for the position of the first array element matched by the query, that is, by the first half of an update operation.

Imagine a document like (there happens to be only one in this collection):

    > db.accounts.findOne();
    {
      "_id" : ObjectId("4fcf8b055a3770c10a741edb"),
      "addresses" : [
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
          "type" : 1,
          "street" : "123 My Street",
          "city" : "Bedford Falls",
          "state" : "NJ"
        },
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
          "type" : 3,
          "street" : "789 My Street",
          "city" : "Bedford Falls",
          "state" : "NJ"
        }
      ],
      "name" : "Jack"
    }

You wish to re-type the second of the two addresses from 3 to 2. First, create a query that will identify that address.

    > db.accounts.find( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
    ... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) } ).pretty();
    {
      "_id" : ObjectId("4fcf8b055a3770c10a741edb"),
      "addresses" : [
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
          "type" : 1,
          "street" : "123 My Street",
          "city" : "Bedford Falls",
          "state" : "NJ"
        },
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
          "type" : 3,
          "street" : "789 My Street",
          "city" : "Bedford Falls",
          "state" : "NJ"
        }
      ],
      "name" : "Jack"
    }

With the query matching the right address, you can now use the positional operator to set its type field to 2.

    > db.accounts.update( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
    ... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) },
    ... { "$set" : { "addresses.$.type" : 2 } } );

That did it. You can now reuse the query to determine that it actually happened.

    > db.accounts.find( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
    ... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) } ).pretty();
    {
      "_id" : ObjectId("4fcf8b055a3770c10a741edb"),
      "addresses" : [
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
          "type" : 1,
          "street" : "123 My Street",
          "city" : "Bedford Falls",
          "state" : "NJ"
        },
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
          "city" : "Bedford Falls",
          "state" : "NJ",
          "street" : "789 My Street",
          "type" : 2
        }
      ],
      "name" : "Jack"
    }
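
As a mental model only (plain Java collections stand in for the document, and positionalSet is a hypothetical helper, not a driver call), the positional operator behaves roughly as follows: the query finds the first array element that matches, and $ then stands for that element's index in the $set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PositionalModel
{
    // Set 'field' to 'value' on the first element whose 'matchField' equals 'matchValue'.
    // Returns the index that "$" would have stood for, or -1 if nothing matched.
    public static int positionalSet( List< Map< String, Object > > array,
                                     String matchField, Object matchValue,
                                     String field, Object value )
    {
        for( int i = 0; i < array.size(); i++ )
        {
            if( matchValue.equals( array.get( i ).get( matchField ) ) )
            {
                array.get( i ).put( field, value );
                return i;
            }
        }

        return -1;
    }

    // Build a cut-down address holding only the fields this sketch needs.
    public static Map< String, Object > address( String id, int type )
    {
        Map< String, Object > a = new LinkedHashMap< String, Object >();

        a.put( "_id", id );
        a.put( "type", type );
        return a;
    }

    public static void main( String[] args )
    {
        List< Map< String, Object > > addresses = new ArrayList< Map< String, Object > >();

        addresses.add( address( "4fcf8b055a3770c10a741ed9", 1 ) );
        addresses.add( address( "4fcf8b055a3770c10a741ed8", 3 ) );

        int position = positionalSet( addresses, "_id", "4fcf8b055a3770c10a741ed8", "type", 2 );

        System.out.println( position + " " + addresses.get( position ) );
        // 1 {_id=4fcf8b055a3770c10a741ed8, type=2}
    }
}
```

Only the first match is touched, which is why the query has to pin down exactly one address.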

To change two or more fields, just add more comma-separated key-value pairs to the $set:

    > db.accounts.update( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
    ... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) },
    ... { "$set" : { "addresses.$.type" : 3, "addresses.$.city" : "Potterville" } } );
    > db.accounts.findOne();
    {
      "_id" : ObjectId("4fcf8b055a3770c10a741edb"),
      "addresses" : [
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
          "type" : 1,
          "street" : "123 My Street",
          "city" : "Bedford Falls",
          "state" : "NJ"
        },
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
          "city" : "Potterville",
          "state" : "NJ",
          "street" : "789 My Street",
          "type" : 3
        }
      ],
      "name" : "Jack"
    }

In Java...

In Java, the positional update above would look something like this. Just as above, notice the dots and dollar signs. Incidentally, if only one of these fields, say street, were to change, the others just wouldn't be passed (which is why each getter is checked for null before its field is put).

private static void update( ObjectId accountoid, Address address )
{
    BasicDBObject match = new BasicDBObject();
    match.put( "_id", accountoid );
    match.put( "addresses._id", address.getOid() );

    BasicDBObject addressSpec = new BasicDBObject();
    Integer type;
    String temp;

    if( ( type = address.getType() ) != null )
        addressSpec.put( "addresses.$.type", type );
    if( ( temp = address.getStreet() ) != null )
        addressSpec.put( "addresses.$.street", temp );
    if( ( temp = address.getCity() ) != null )
        addressSpec.put( "addresses.$.city", temp );
    if( ( temp = address.getState() ) != null )
        addressSpec.put( "addresses.$.state", temp );

    BasicDBObject update = new BasicDBObject();
    update.put( "$set", addressSpec );

    collection.update( match, update );
}

MongoDB semantics

Here is some semantic fall-out from MongoDB terminology and things we say about MongoDB.

Sharding: Where data is split between more than one replica set. What is in one shard isn't in another. Sharding in MongoDB must be carefully configured; it doesn't come for free and you must do a lot of extra work to achieve it. Among other reasons to shard, sharding can be used to solve issues of geographic co-location of data and scaling of that data.

MongoDB configuration server: This is a special instance of the mongod dæmon that maintains sharded-cluster metadata to give to instances of mongos. It's the "how-to" section of the sharding mongos brain. There should be three of these since a sharded cluster is dead in the water without at least one in good health. A configuration server (also called a "config server") can only mean sharding.

Replica set: A collection of replica nodes. A MongoDB shard must have one of these, but a replica set doesn't need to be in a shard to stand on its own. A replica set ensures that data is written to more than one node (place), effectively duplicating it or better. Note that as soon as you say multiple replica sets, you are necessarily referring to a sharded configuration.

Replica node: A single instance of the mongod dæmon running usually alone on a VM or host.

mongod: This is the basic MongoDB dæmon. In a sense, it just is MongoDB.

mongos: This is a special dæmon that connects an application to a MongoDB sharding set-up and controls reading and writing to the appropriate shard for the data concerned. It uses information from a special mongod erected as a MongoDB configuration server. mongos can only mean sharding.

WriteConcern

See http://www.littlelostmanuals.com/2011/11/overview-of-basic-mongodb-java-write.html. Explore also "MongoDB tagging."

A better much later treatment exists as a subsection on write concerns to my MongoDB Error-handling Notes.

(no write-concern argument)	Writes to the driver, which must still send the operation, potentially over the wire, to reach `mongod`.
`WriteConcern.SAFE`	Returns after operation known to have reached `mongod`.
`WriteConcern.JOURNAL_SAFE`	Returns after operation known to have reached `mongod` and written to its journal.
`WriteConcern.MAJORITY`	Like `SAFE`, but returns after operation has been written to a simple majority of nodes in the replica set.
`WriteConcern.FSYNC_SAFE`	Returns after operation has been written to the server data file.

Persist new records like user accounts, addresses and payment methods, etc. with

collection.save( account, WriteConcern.FSYNC_SAFE );

--takes a comparatively long time.

Persist updates to addresses and payment methods with

collection.save( new_address/new_payment, WriteConcern.FSYNC_SAFE );

--because these are really new operations (the old one is "forgotten" and left in place). Use

collection.merge( old_address/old_payment, WriteConcern.SAFE );

to update the old entity with the “forgotten” flag.

Voting to replace a primary...

With respect to a replica set in Mongo, if the primary and/or other nodes are lost, you must have a "quorum" of voting nodes in order to elect a new primary and to retain full transactional status, i.e.: reading and writing. If you don't have a quorum, in many cases you can continue supporting reads, but no writes.

A quorum (my terminology) is "a strict majority, that is, more than half, of the voting members configured in the replica set". 10gen doesn't use this obvious word, but they should: in a voting body, a quorum is the smallest number of members that can make a decision in the absence of others.

Also, a replica set should have an odd number of voting members. An even number isn't forbidden, but it lets a network partition split the set into two equal halves, neither of which holds a majority.

So, if we start with a primary and four secondaries, that makes 5, and a majority is 3. Lose the primary and we have 4 left. That's still more than half of the original 5, so the survivors can elect a new primary without help. Arbiters count as voting members, both for calculating the quorum and when voting; they simply hold no data. An arbiter is worth adding whenever the set would otherwise have an even number of voting members.

Note: There is nothing wrong with arbiters; they're practically free, being only mongods requiring virtually no disk and precious little memory.

In a second case, if we started with a primary and three secondaries, that would make 4 total, and a majority is again 3. Lose the primary and there are 3 voting members left, which is just enough to elect a new primary; lose one more and the set can no longer accept writes.
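
The arithmetic can be sketched as follows. This assumes the usual rule that a primary is elected by a strict majority of the voting members configured in the set (not of those still alive); ElectionMath, majority and canElectPrimary are hypothetical names used only for illustration.

```java
class ElectionMath
{
    // Smallest number of votes that is strictly more than half of the configured voters.
    public static int majority( int votingMembers )
    {
        return votingMembers / 2 + 1;
    }

    // True if the members still reachable are enough to elect a primary.
    public static boolean canElectPrimary( int votingMembers, int reachable )
    {
        return reachable >= majority( votingMembers );
    }

    public static void main( String[] args )
    {
        System.out.println( majority( 5 ) );             // 3
        System.out.println( canElectPrimary( 5, 4 ) );   // true
        System.out.println( canElectPrimary( 4, 3 ) );   // true
        System.out.println( canElectPrimary( 4, 2 ) );   // false
    }
}
```

Note that 4 and 5 voting members both need 3 votes, which is why an even-sized set buys no extra fault tolerance over the next smaller odd-sized one.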

Locking in MongoDB...

...is done at the database level beginning in 2.2. Someday, it's slated to be more granular still, at the collection level.

In MongoDB locks aren't really locks in the RDBMS sense, but more mutexes that a process takes while in a critical section of work being done.

A lock isn't held across multiple documents (rows) as it would be in RDBMS; the duration of the lock is measured in microseconds.

Coming from RDBMS, one shouldn't expect that locks will be a limiting factor in MongoDB because locks can be used tens of thousands of times per second for writes (and reads).
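
By way of analogy only (this is not how mongod is implemented; TinyDatabase and its methods are hypothetical), here is a sketch of a single database-wide mutex that is taken for one document's critical section at a time and never held across a multi-document operation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class TinyDatabase
{
    // One lock for the whole "database".
    private final ReentrantLock          lock      = new ReentrantLock();
    private final Map< Integer, String > documents = new HashMap< Integer, String >();

    // The lock is held only for the critical section of a single-document write.
    public void write( int id, String value )
    {
        lock.lock();
        try
        {
            documents.put( id, value );
        }
        finally
        {
            lock.unlock();
        }
    }

    // A multi-document write takes and releases the lock once per document;
    // it is never held across the whole batch.
    public void writeAll( Map< Integer, String > batch )
    {
        for( Map.Entry< Integer, String > entry : batch.entrySet() )
            write( entry.getKey(), entry.getValue() );
    }

    // Reads take the same lock, again only briefly.
    public int count()
    {
        lock.lock();
        try
        {
            return documents.size();
        }
        finally
        {
            lock.unlock();
        }
    }
}
```

Because each hold is so short, other writers get a turn between any two documents of a batch, which is the behaviour the notes above describe.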

Colorizing the MongoDB interactive shell...

You know that you can configure a few things using what are called "rc" files, typically kept in your home folder. You've seen them:

    .exrc, .vimrc, .bashrc

etc.

So, you shouldn't be surprised that someone came up with a very fun and useful way of injecting color into your MongoDB interactive shell session. Enter Tyler Brock who replaces the (for now at least) zero-length .mongorc.js file with his own. You do have to be running, at very oldest, MongoDB 2.2.x.

You can git (pun intended) his stuff and set it up for the next time you run the MongoDB shell by doing the following:

Pick a subdirectory where you'd like to drop his stuff. You'll be updating it, if there's ever need, the same way you'd ever update sources controlled by Git. I put mine under ~/dev.

Do this:

    $ cd ~/dev
    $ git clone git@github.com:TylerBrock/mongo-hacker.git
    $ cd ~
    $ ln -s ~/dev/mongo-hacker/mongo_hacker.js .mongorc.js

Then just launch (relaunch) MongoDB to see color when you do stuff:
```
    $ mongo
```

If ever there's reason to do it, update mongo-hacker:

    $ cd ~/dev/mongo-hacker
    $ git pull origin master

If you decide this is evil and you no longer want to be part of it, do this:
```
    $ rm -rf ~/dev/mongo-hacker
    $ rm ~/.mongorc.js
```

Enjoy the ride. Here's something you might see. I don't care for all the colors, but there's probably a way to change that. I also don't like the long prompt he's added; I'll definitely smoke that. (Just edit mongo_hacker.js, look for "prompt" and comment out the whole paragraph of code.)

Benchmarking MongoDB...

An example.

package com.mongodb;

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;

import java.net.UnknownHostException;

public class PerfTest
{
	public static void main( String[] args ) throws UnknownHostException
	{
		MongoClient  m = new MongoClient();
		DBCollection c = m.getDB( "test" ).getCollection( "PerfTest" );

		/* Add this in to insert 500 documents before running the test:
		 *     c.drop();
		 *     for( int i = 0; i < 500; i++ )
		 *         c.insert( new BasicDBObject( "_id", i ) );
		 */
		c.findOne();

		DBCursor cursor = c.find();
		long startTime = System.nanoTime();

		try
		{
			while( cursor.hasNext() )
				cursor.next();
		}
		finally
		{
			cursor.close();
		}

		long   estimatedTime = System.nanoTime() - startTime;
		double seconds       = ( double ) estimatedTime / 1000000000.0;

		System.out.println( "Done in " + seconds );
	}
}
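
The same timing pattern can be tried without a running mongod. This is a minimal sketch in which Stopwatch, time and the Runnable workload are hypothetical stand-ins for the cursor loop above; the untimed first run plays the role of the c.findOne() warm-up.

```java
class Stopwatch
{
    // Run 'work' once untimed to warm up, then time a second run; returns elapsed seconds.
    public static double time( Runnable work )
    {
        work.run();

        long startTime     = System.nanoTime();
        work.run();
        long estimatedTime = System.nanoTime() - startTime;

        return ( double ) estimatedTime / 1000000000.0;
    }

    public static void main( String[] args )
    {
        double seconds = time( new Runnable()
        {
            public void run()
            {
                // stand-in workload: replace with whatever is being measured
                long sum = 0;

                for( int i = 0; i < 500; i++ )
                    sum += i;
            }
        } );

        System.out.println( "Done in " + seconds );
    }
}
```

A single timed run is noisy; repeating the measurement several times and comparing the results gives a better picture.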

MongoDB 2.6 webinar notes

Index maintenance

Inconvenient to add new indices to existing collections, especially if big. Now possible to add it in the background.

Auto-cancelation of operations by posting a maximum time in milliseconds for any operation, granular.

Write commands delivered to the server (inserts, updates and deletes). All operations now deliverable in bulk, by some order, etc. Enables asynchronous communication with server.

Power of 2 allocation enabled by default, resulting in easier predictability of storage requirements.
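
To see why power-of-2 allocation makes storage easier to predict, here is a rough sketch of the rounding involved. It assumes only that each record's allocation is rounded up to the next power of two; allocationFor is a hypothetical name, and the real server applies further rules (minimum sizes, very large documents) that are not modelled here.

```java
class PowerOfTwo
{
    // Round a document size in bytes up to the next power of two.
    // Valid for sizes up to 2^30, far beyond the 16MB document limit.
    public static int allocationFor( int documentBytes )
    {
        if( documentBytes <= 1 )
            return 1;

        return Integer.highestOneBit( documentBytes - 1 ) << 1;
    }

    public static void main( String[] args )
    {
        System.out.println( allocationFor( 700 ) );    // 1024
        System.out.println( allocationFor( 1024 ) );   // 1024
        System.out.println( allocationFor( 1025 ) );   // 2048
    }
}
```

A document that grows from 700 to 1000 bytes still fits in its 1024-byte slot, so it need not move, and freed slots come in a small number of reusable sizes.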

Developers

Improvements to query system.

Index intersection, query introspection.

Integrated text search. Beta in 2.4, released in 2.6 and integrated.

New update operators $mul, $min, $max. Now testable and extensible to add these whereas in the past it was very hard.
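
As a rough model of what $min and $max do to a single numeric field (a plain Java map stands in for the document; MinMaxModel and its methods are hypothetical, not driver calls): $min only writes when the new value is lower than the current one, $max only when it is higher, and both create the field if it is missing.

```java
import java.util.HashMap;
import java.util.Map;

class MinMaxModel
{
    // $min: keep the smaller of the current and the offered value.
    public static void min( Map< String, Integer > doc, String field, int value )
    {
        Integer current = doc.get( field );

        if( current == null || value < current )
            doc.put( field, value );
    }

    // $max: keep the larger of the current and the offered value.
    public static void max( Map< String, Integer > doc, String field, int value )
    {
        Integer current = doc.get( field );

        if( current == null || value > current )
            doc.put( field, value );
    }

    public static void main( String[] args )
    {
        Map< String, Integer > doc = new HashMap< String, Integer >();

        doc.put( "lowScore", 200 );
        min( doc, "lowScore", 150 );    // 150 is lower, so the field changes
        min( doc, "lowScore", 180 );    // 180 is not lower, so nothing happens
        max( doc, "highScore", 900 );   // the field is missing, so it is created

        System.out.println( doc.get( "lowScore" ) + " " + doc.get( "highScore" ) );   // 150 900
    }
}
```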

Aggregation pipeline enhancements. The pipeline has been around since 2.4, but 2.6 unlocks large data sets (unlimited result-set size vs. 16MB).

Enterprise security

Authentication:
Kerberos (2.4), LDAP (2.6), x.509 (2.6).
Authorization:
User-defined roles for DBs and collections.
Encryption:
Mixed-mode SSL. Obfuscation: Field-level redaction via aggregation framework.
Auditing:
Trails can be written to separate file or system log.

MMS

Monitoring, of course.

Back-up

Takes 5 minutes to set up
Can use S3 with control over deleting back copies, etc.

And also automation from a web-based interface. That is, automating set-up of new databases. In other words, MongoDB's answer to Chef. Useful if you don't want to walk the Chef road.