MongoDB with Python: a quick introduction (I)



Here are some basic steps for data manipulation in MongoDB using Python.

Download pymongo
pymongo is a native Python driver for MongoDB.
The PyMongo distribution contains tools for working with MongoDB.

(1) Installing PyMongo is very simple if you have setuptools installed. To install setuptools:
(a) Download the egg file for your version of Python: get it here.
(b) Once it has downloaded, execute the egg as if it were a shell script:
$ sudo sh setuptools-0.6c11-py2.6.egg

(2) With setuptools installed, you can install pymongo using:
$ sudo easy_install pymongo
Searching for pymongo
Best match: pymongo 2.0.1
Processing pymongo-2.0.1-py2.6-linux-i686.egg
pymongo 2.0.1 is already the active version in easy-install.pth

Using /usr/local/lib/python2.6/dist-packages/pymongo-2.0.1-py2.6-linux-i686.egg
Processing dependencies for pymongo
Finished processing dependencies for pymongo

(3) Or you can install PyMongo from source:
$ git clone git://github.com/mongodb/mongo-python-driver.git pymongo
$ cd pymongo/
$ python setup.py install

To test whether the installation was successful, import the pymongo package in the Python shell. The import should complete without raising an exception:
jdoe@lambda:$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import pymongo
>>>

Connect to the MongoDB server and check that you're connected to the local host on the default port.
>>> from pymongo import Connection
>>> connection = Connection()            -- create a connection with the default server/port
>>> connection                           -- print connection details
Connection('localhost', 27017)

-- You can also explicitly specify the host and TCP port where the MongoDB service you want to connect to is running.
>>> connection = Connection('192.117.47.23', 20120)
>>>
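
The Connection class shown above belongs to the PyMongo 2.x series used in this post. Later PyMongo releases replaced it with MongoClient, so if you are on a newer driver the same step might look like the sketch below. The helper names mongo_uri and connect are my own, not part of PyMongo, and the import sits inside connect() so that the URI helper can be tried without the driver installed.

```python
def mongo_uri(host="localhost", port=27017):
    """Build a MongoDB connection URI for a single server."""
    return "mongodb://%s:%d" % (host, port)


def connect(host="localhost", port=27017):
    """Open a client for the given server (assumes PyMongo 3 or later)."""
    # Imported lazily: only this function needs the driver.
    from pymongo import MongoClient
    return MongoClient(mongo_uri(host, port))


# The default server/port, and the explicit host/port used in the text above.
print(mongo_uri())
print(mongo_uri("192.117.47.23", 20120))
```

Calling connect() with no arguments targets the same default server and port as Connection() does in the session above.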

Connect to a database
Once connected to the database server, you need to connect to a specific MongoDB database.
>>> connection.database_names()       --- list the available databases in the server
[u'mynewdb', u'local', u'test']
>>>
>>> db = connection['mynewdb']        --- connects to 'mynewdb'
>>>  
>>> db.name                           --- list name of database you're connected to
u'mynewdb'
>>>

Access database collections
Collections can be thought of as analogous to tables in relational databases. To see the existing collections in the database:
>>> db.collection_names()           --- list existing collections
[u'mycollection', u'system.indexes', u'things', u'comments']
>>>
>>> things = db['things']
>>>
>>> things.name                     --- print collection name
u'things'
>>>
>>> things.database                 --- database that holds the collection
Database(Connection('localhost', 27017), u'mynewdb')
>>>
>>> things.count()                  --- get the number of existing documents in the collection
5
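
On newer PyMongo releases the methods database_names(), collection_names() and count() used above were replaced by list_database_names(), list_collection_names() and count_documents(). Here is a minimal sketch of the same inspection with the newer names. The function summarize is a hypothetical helper of mine, and it accepts any object that behaves like a PyMongo Database.

```python
def summarize(db):
    """Return {collection name: document count} for a PyMongo Database."""
    summary = {}
    for name in db.list_collection_names():
        # An empty filter {} counts every document in the collection.
        summary[name] = db[name].count_documents({})
    return summary
```

With a real client you would call it as summarize(client['mynewdb']), where client is a MongoClient instance.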


  • Manipulating data in MongoDB with CRUD operations: Create, Retrieve, Update, Delete
  • These are the atomic operations used to manipulate the data.
  • These are method calls equivalent to DML statements in relational databases (Insert, Select, Update, Delete).
  • Comparing data manipulation operations on a relational table and on a MongoDB collection:

Relational database: table BLOG (author, post, tags, date)
MongoDB: collection blog (fields not statically defined)

INSERT statement
SQL> INSERT into BLOG
     Values ('joe', v_post, 'MongoDB, Python', sysdate)
>>> import datetime
>>> post = { "author": "joe",
...         "text": "Blogging about MongoDB",
...         "tags": ["MongoDB", "Python"],
...         "date": datetime.datetime.utcnow()}
>>> db.blog.insert(post)

SELECT statement
SQL> SELECT * from BLOG
     Where author = 'joe'
>>> db.blog.find({"author": "joe"})

UPDATE statement
SQL> Update BLOG set tags = 'MongoDB, Python'
     where author = 'joe'
>>> db.blog.update({"author": "joe"},
...         {"$set": {"tags": ["MongoDB", "Python"]}})

DELETE statement
SQL> DELETE from BLOG where author = 'joe'
>>> db.blog.remove({"author": "joe"})
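
For readers on a newer PyMongo, where insert(), update() and remove() are no longer available, the four operations compared above could be sketched roughly as follows. The function names are hypothetical helpers of mine. Each one takes the collection as its first argument, so nothing here assumes a particular server or database name.

```python
import datetime


def create_post(blog, author, text, tags):
    """INSERT: add one document and return its generated _id."""
    post = {"author": author, "text": text, "tags": list(tags),
            "date": datetime.datetime.now(datetime.timezone.utc)}
    return blog.insert_one(post).inserted_id


def posts_by(blog, author):
    """SELECT ... WHERE author = ?: return the matching documents as a list."""
    return list(blog.find({"author": author}))


def set_tags(blog, author, tags):
    """UPDATE ... SET tags = ? WHERE author = ?: return the modified count."""
    result = blog.update_many({"author": author},
                              {"$set": {"tags": list(tags)}})
    return result.modified_count


def delete_posts_by(blog, author):
    """DELETE ... WHERE author = ?: return the number of documents removed."""
    return blog.delete_many({"author": author}).deleted_count
```

I chose update_many and delete_many here because a SQL UPDATE or DELETE touches every matching row. update_one and delete_one stop after the first match.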

Creating a new collection
Databases and Collections in MongoDB are created only when the first data is inserted.
$ ipython
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
Type "copyright", "credits" or "license" for more information.
...
In [2]: import pymongo                  --- import pymongo package
In [3]: from pymongo import Connection
In [4]: from bson import ObjectId

In [5]: connection = Connection()
In [6]: connection
Out[6]: Connection('localhost', 27017)  --- connected to localhost, in the default TCP port
In [7]: connection.database_names()     --- list existing databases
Out[7]: [u'test', u'local']

In [8]: db = connection['blogdb']       --- connect to a new database. 
                                        --- It will be created when the first object is inserted.

In [9]: post = { "author": "John", 
...          "text": "Blogging about MongoDB"};

In [10]: db.posts.insert(post)                      --- The first insert creates the new collection 'posts'
Out[10]: ObjectId('...')
In [11]: db.collection_names()
Out[11]: [u'system.indexes', u'posts']


Note: Collections can also be organized in namespaces, defined using a dot notation. For example, you could create two collections named: book.info and book.authors.

Inserting a document in a collection
  • In MongoDB, the documents within a collection do not all need to have the same number and type of fields ("columns"). In other words, schemas in MongoDB are dynamic and can vary within one collection.
  • PyMongo uses dictionary objects to represent JSON-style documents.
  • To add a new document to a collection, using ipython:
In [9]: post = { 
   ...:     'author': 'Joann',
   ...:     'text': 'Just finished reading the Book of Nights'}

In [10]: db.posts.insert(post)        --- Method call to create a new document (post)
Out[10]: ObjectId('4eb99ad5a9e15833b1000000')

In [17]: for post in db.posts.find():   --- listing all documents in the posts collection.
             post
   ....:     
   ....:     
Out[17]: 
{u'_id': ObjectId('4eb99ad5a9e15833b1000000'),
 u'author': u'Joann',
 u'text': u'Just finished reading the Book of Nights'}
  • Note that you don't need to specify the "_id" field when inserting a new document into a collection.
  • The document identifier is generated automatically (PyMongo adds an ObjectId before sending the document if you don't supply one) and is unique across the collection.
  • You can also execute bulk inserts:
In [13]: many_posts = [{'author': 'David',
   ....:                'text' : "David's Blog"},
   ....:               {'author': 'Monique',
   ....:                'text' : 'My photo blog'}]

In [14]: db.posts.insert(many_posts)
Out[14]: [ObjectId('4eb9bcada9e15809f3000000'), ObjectId('4eb9bcada9e15809f3000001')]

In [15]: for post in db.posts.find():
   ....:     post
   ....:     
   ....:     
Out[15]: 
{u'_id': ObjectId('4eb99ad5a9e15833b1000000'),
 u'author': u'Joann',
 u'text': u'Just finished reading the Book of Nights'}
Out[15]: 
{u'_id': ObjectId('4eb9bcada9e15809f3000000'),
 u'author': u'David',
 u'text': u"David's Blog"}
Out[15]: 
{u'_id': ObjectId('4eb9bcada9e15809f3000001'),
 u'author': u'Monique',
 u'text': u'My photo blog'}
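
The generated _id values in the listings above are ObjectIds: 12 bytes, printed as 24 hex digits, of which the first 4 bytes hold the creation time in seconds since the Unix epoch. As a small illustration, the sketch below decodes that timestamp from the hex string using only the standard library. The function objectid_timestamp is my own helper, not a PyMongo API.

```python
import datetime


def objectid_timestamp(oid_hex):
    """Return the UTC creation time encoded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId has 12 bytes, i.e. 24 hex digits")
    # The first 4 bytes (8 hex digits) are seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.datetime.fromtimestamp(seconds, datetime.timezone.utc)


# The _id of the first post inserted above.
print(objectid_timestamp("4eb99ad5a9e15833b1000000"))
```

If you have the bson package that ships with PyMongo, the generation_time attribute of an ObjectId gives you the same information.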

Selecting (reading) documents inside collections
  • Data in MongoDB is represented by structures of key-value pairs, using JSON-style documents.
  • Let's query the collection "things" and ask for ONE document in that collection. Use the find_one() method.
>>> things.find_one()                             --- returns the first document in the collection
{u'_id': ObjectId('4eb787821b02fd09c403b219'), u'name': u'mongo'}

Here it returned a document containing two fields (key-value pairs): 
  "_id": ObjectId('4eb787821b02fd09c403b219')  --- (an identifier for the document), and 
  "name": 'mongo'                              --- a "column" "name" with its associated value, the string 'mongo'.

We can also define criteria for the query. For example,
(a) return one document with field "name" equal to "mongo"
>>> things.find_one({"name":"mongo"});
{u'_id': ObjectId('4eb787821b02fd09c403b219'), u'name': u'mongo'}
>>>

(b) return one document with field "name" equal to "book"
>>> things.find_one({"name":"book"});
{u'keywords': [u'NoSQL', u'MongoDB', u'PyMongo'], u'date': datetime.datetime(2011, 11, 7, 19, 47, 44, 722000), u'_id': ObjectId('...'), u'name': u'book', u'title': u'Mastering MongoDB'}

Note: The dynamic nature of MongoDB schemas can be seen in the results of the queries above. Here the collection "things" has two documents with different numbers of fields ("columns") and different datatypes:
 {"name": "mongo"}
 {"name": "book", "title": "Mastering MongoDB", "keywords": ["NoSQL", "MongoDB", "PyMongo"], "date": datetime.datetime(2011, 11, 7, 19, 47, 44, 722000)}
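
To make the query criteria above more concrete, here is a simplified, in-memory sketch of how an equality filter picks a document out of a schema-less collection. It is only an illustration written in plain Python: the real server also supports query operators, matching inside arrays and other rules that this version ignores, and the function names are my own.

```python
def matches(document, criteria):
    """Return True if every key in criteria equals the document's value."""
    return all(key in document and document[key] == value
               for key, value in criteria.items())


def find_one(documents, criteria=None):
    """Return the first matching document, or None if nothing matches."""
    for document in documents:
        if matches(document, criteria or {}):
            return document
    return None


# Two documents with different fields, as in the "things" collection.
things = [
    {"name": "mongo"},
    {"name": "book", "title": "Mastering MongoDB",
     "keywords": ["NoSQL", "MongoDB", "PyMongo"]},
]

print(find_one(things))
print(find_one(things, {"name": "book"}))
```

Documents that lack a field named in the criteria simply do not match, which is why documents of different shapes can live in the same collection.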

Querying more than one document
A query returns a cursor pointing to all the documents that matched the query criteria.
To see these documents you need to iterate through the cursor elements:
>>> for thing in things.find():
...     thing
... 
{u'_id': ObjectId('...'), u'name': u'mongo'}
{u'x': 4.0, u'_id': ObjectId('...'), u'j': 1.0}
{u'x': 4.0, u'_id': ObjectId('...'), u'j': 2.0}
{u'x': 4.0, u'_id': ObjectId('...'), u'j': 3.0}
{u'x': 4.0, u'_id': ObjectId('...'), u'j': 4.0}
{u'keywords': [u'NoSQL', u'MongoDB', u'PyMongo'], u'date': datetime.datetime(...), u'_id': ObjectId('...'), u'name': u'book', u'title': u'Mastering MongoDB'}
{u'keywords': [u'programming', u'Python', u'MongoDB'], u'date': datetime.datetime(...), u'_id': ObjectId('...'), u'name': u'book', u'title': u'Python and MongoDB'}
{u'name': u'book', u'title': u'Python Notes', u'keywords': [u'programming', u'Python'], u'year': 2011, u'date': datetime.datetime(...), u'_id': ObjectId('4...')}

-- Alternatively, you can explicitly define a cursor variable: 
>>> cursor = things.find()
>>> for x in cursor:
...     x
... 
{u'x': 4.0, u'_id': ObjectId('...'), u'j': 1.0}
{u'x': 4.0, u'_id': ObjectId('...'), u'j': 2.0}
{u'x': 4.0, u'_id': ObjectId('...'), u'j': 3.0}
{u'x': 4.0, u'_id': ObjectId('...'), u'j': 4.0}
{u'keywords': [u'NoSQL', u'MongoDB', u'PyMongo'], u'date': datetime.datetime(...), u'_id': ObjectId('...'), u'name': u'book', u'title': u'Mastering MongoDB'}
{u'keywords': [u'programming', u'Python', u'MongoDB'], u'date': datetime.datetime(...), u'_id': ObjectId('...'), u'name': u'book', u'title': u'Python and MongoDB'}
{u'name': u'book', u'title': u'Python Notes', u'keywords': [u'programming', u'Python'], u'year': 2011, u'date': datetime.datetime(...), u'_id': ObjectId('...')}
>>> 


You can also return only some of the document fields (similar to a SQL query that returns only a subset of the table columns). The "_id" field is always returned unless you exclude it explicitly.
>>> for thing in things.find({"name":"book"}, {"keywords": 1}):
...     thing
... 
{u'keywords': [u'NoSQL', u'MongoDB', u'PyMongo'], u'_id': ObjectId('...')}
{u'keywords': [u'programming', u'Python', u'MongoDB'], u'_id': ObjectId('...')}
{u'keywords': [u'programming', u'Python'], u'_id': ObjectId('...')}
>>> 
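
The second argument to find() in the query above is a projection. As a rough, in-memory sketch of what an inclusion projection does to each document (the function project is my own illustration and covers only the simple case of top-level fields marked 1, plus the optional "_id": 0):

```python
def project(document, fields):
    """Keep only the fields marked 1; _id is kept unless explicitly set to 0."""
    wanted = {name for name, flag in fields.items() if flag}
    if fields.get("_id", 1):
        wanted.add("_id")
    return {name: value for name, value in document.items() if name in wanted}


book = {"_id": 1, "name": "book", "title": "Python Notes",
        "keywords": ["programming", "Python"]}

print(project(book, {"keywords": 1}))            # keywords plus _id
print(project(book, {"keywords": 1, "_id": 0}))  # keywords only
```

Passing {"keywords": 1, "_id": 0} to find() is how you would drop the identifier from the results shown above.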

Updating documents in collections
  • MongoDB supports atomic updates of individual document fields as well as more traditional updates that replace an entire document.
  • Use the update() method with a plain document as its second argument to entirely replace the document matching the criteria with a new document.
  • If you want to modify only some attributes of a document, you need to use an update modifier such as $set.
  • update() usually takes two parameters:
    • the first selects the documents that will be updated (similar to the WHERE clause in SQL);
    • the second contains the new values for the document attributes.


Example: insert a new document in the blog collection, and update the tag values.
(1) Insert a new document in the blog collection

>>> import datetime
>>> new_post = { "author": "Monique",
...       "text": "Sharding in MongoDB",
...       "tags": ["MongoDB"],
...       "date": datetime.datetime.utcnow()}
>>>
>>> db.blog.insert(new_post)
ObjectId('...')
>>> 

(2) list documents in the collection
>>> for post in db.blog.find():
...     post
... 
{u'date': datetime.datetime(2011, 11, 7, 22, 10, 43, 77000), u'text': u'Blogging about MongoDB', u'_id': ObjectId('...'), u'author': u'John', u'tags': [u'MongoDB', u'NoSQL', u'Python']}
{u'date': datetime.datetime(2011, 11, 8, 1, 5, 32, 604000), u'text': u'Sharding in MongoDB', u'_id': ObjectId('...'), u'author': u'Monique', u'tags': [u'MongoDB']}
>>> 

Now, update the post whose author is Monique.
(1) Replace the document with an entirely new document
>>> db.blog.update({"author":"Monique"}, { "author": "Monique", "text": "Sharding in MongoDB", "tags": ["MongoDB", "scalability"], "date": datetime.datetime.utcnow()});
>>> for post in db.blog.find():
...     post
... 
{u'date': datetime.datetime(2011, 11, 7, 22, 10, 43, 77000), u'text': u'Blogging about MongoDB', u'_id': ObjectId('...'), u'author': u'John', u'tags': [u'MongoDB', u'NoSQL', u'Python']}
{u'date': datetime.datetime(2011, 11, 8, 1, 8, 43, 416000), u'text': u'Sharding in MongoDB', u'_id': ObjectId('...'), u'author': u'Monique', u'tags': [u'MongoDB', u'scalability']}
>>> 

Note that the previous update replaced the document entirely, even though all you needed to do was add one new tag to the tags field. If you call update() and pass only the new values for the tags attribute, the stored document is replaced by one that contains only the tags field (plus its _id); the author, text and date fields are lost:
>>> db.blog.update({"author":"Monique"}, { "tags": ["MongoDB", "scalability"]});
>>>
>>> for post in db.blog.find():
...     post
... 
{u'date': datetime.datetime(...), u'text': u'Blogging about MongoDB', u'_id': ObjectId('...'), u'author': u'John', u'tags': [u'MongoDB', u'NoSQL', u'Python']}
{u'_id': ObjectId('...'), u'tags': [u'MongoDB', u'scalability']}                  --- updated document: only _id and tags remain
>>> 

(2) To update only some fields of a document, use the $set update modifier.
  • The $set modifier works like the SET clause of an SQL UPDATE statement, in which you specify the columns that will be updated
>>> db.blog.update({"author":"Monique"}, { "$set": {"tags": ["MongoDB","Scalability"]}});
>>>
>>> for post in db.blog.find():
...     post
... 
{u'date': datetime.datetime(...), u'text': u'Blogging about MongoDB', u'_id': ObjectId('...'), u'author': u'John', u'tags': [u'MongoDB', u'NoSQL', u'Python']}
{u'date': datetime.datetime(...), u'text': u'Sharding in MongoDB', u'_id': ObjectId('...'), u'tags': [u'MongoDB', u'Scalability'], u'author': u'Monique'}
>>> 

(3) Since the "tags" field is an array, you can add a single value with the $push update modifier instead of rewriting the whole array.
  • $push appends the value to the field if the field is an existing array; if the field is not present, it sets the field to the array [value].
>>> db.blog.update({"author":"Monique"}, { "$push": {"tags":"Python"}});
>>> for post in db.blog.find():
...     post
... 
{u'date': datetime.datetime(2011, 11, 7, 22, 10, 43, 77000), u'text': u'Blogging about MongoDB', u'_id': ObjectId('4eb857b3a9e158609c000004'), u'author': u'John', u'tags': [u'MongoDB', u'NoSQL', u'Python']}
{u'date': datetime.datetime(2011, 11, 8, 1, 5, 32, 604000), u'text': u'Sharding in MongoDB', u'_id': ObjectId('4eb88081a9e158609c000005'), u'tags': [u'MongoDB', u'Scalability', u'Python'], u'author': u'Monique'}
>>> 
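
The three update styles shown in this section differ only in what ends up stored. The sketch below imitates them on a plain Python dictionary so you can compare the results side by side without a server. It is a deliberately simplified model of my own (only $set and $push, top-level fields only), not how the server is implemented.

```python
import copy


def apply_update(document, update):
    """Return a new document showing what a (simplified) update would store."""
    if not any(key.startswith("$") for key in update):
        # No modifiers: the update document replaces everything except _id.
        replaced = copy.deepcopy(update)
        if "_id" in document:
            replaced["_id"] = document["_id"]
        return replaced
    result = copy.deepcopy(document)
    for field, value in update.get("$set", {}).items():
        result[field] = value  # overwrite or create the field
    for field, value in update.get("$push", {}).items():
        result.setdefault(field, []).append(value)  # create the array if absent
    return result


post = {"_id": 5, "author": "Monique", "text": "Sharding in MongoDB",
        "tags": ["MongoDB"]}

print(apply_update(post, {"tags": ["MongoDB", "scalability"]}))           # replacement
print(apply_update(post, {"$set": {"tags": ["MongoDB", "Scalability"]}}))  # $set
print(apply_update(post, {"$push": {"tags": "Python"}}))                  # $push
```

The first call loses author and text, while the $set and $push calls keep every other field intact.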


Deleting documents from collections
To delete a document from a collection, use the remove() method, passing as a parameter a query document that either (a) uniquely identifies the document you want to delete or (b) identifies the set of documents you want to delete.
>>> db.blog.remove({"author":"Monique"})
>>> for post in db.blog.find():
...     post
... 
{u'date': datetime.datetime(2011, 11, 7, 22, 10, 43, 77000), u'text': u'Blogging about MongoDB', u'_id': ObjectId('4eb857b3a9e158609c000004'), u'author': u'John', u'tags': [u'MongoDB', u'NoSQL', u'Python']}
>>>

