Here are some basic steps for data manipulation in MongoDB using Python.
- Manipulating data in MongoDB with CRUD operations: Create, Retrieve, Update, Delete
- These are method calls equivalent to DML statments in relational databases:
- (
Insert, Select, Update, Delete
). - CRUD operations are atomic operations.
- An atomic operation has an all-or-nothing semantic. This means, for example, that update methods that affect various document attributes have to be valid for all of them, or will fail.
- However, MongoDB does not support traditional locking and complex transactions.
- Differently of traditional relational databases, Insert, Update and Delete operations in MongoDB work in a "fire and forget" model:
- The client issue the method call but do not wait for confirmation on whether or when the statement was actually executed in the database. Once the call is placed over the network, the application returns.
- You may need to explicitly check (with another call:
getLastErrorObj()
) the status of the operation. (see example below) - When a
GetLastErrorObj()
command is issued it forces the data to be sent over the wire to the server and it returns to the client an error document if there was an error in the last operation.
- Comparing data manipulating operations in a relational table and in a MongoDB collection:
Relational Database | MongoDB |
---|---|
Table BLOG (author, post, tags, date) | Collection BLOG (Columns not statically defined) |
INSERT statement SQL> INSERT into BLOG | >>> post = { "author": "joe", |
SELECT statement SQL> SELECT * from BLOG | >>> db.blog.find({"Author": "joe"}) |
UPDATE statement SQL> Update BLOG set tags = "MongoDB, Python" | >>> db.blog.update({"author":"joe"}, |
DELETE statement SQL> DELETE from BLOG where author = "joe" | >>> db.blog.remove({"author":"joe"}) |
Creating a new collection |
$ ipython Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) Type "copyright", "credits" or "license" for more information. ... In [2]: import pymongo --- import pymongo package In [3]: from pymongo import Connection In [4]: from bson import ObjectId In [5]: connection = Connection() In [6]: connection Out[6]: Connection('localhost', 27017) --- connected to localhost, in the default TCP port In [7]: connection.database_names() --- list existing databases Out[7]: [u'test', u'local'] In [8]: db = connection['blogdb'] --- connect to a new database. --- It will be created when the first object is inserted. In [9]: post = { "author": "John", ... "text": "Blogging about MongoDB"}; In [9]: db.posts.insert(post); --- The first insert creates the new collection 'posts' Out[9]: ObjectId('...') In [10]: db.collection_names() [u'system.indexes', u'posts']Notes:
- Within the database, there are several namespaces.
- In fact, each collection is one namespace, including some system namespaces ($freelist, indexes).
- You can further organize different collection within a single namespace by prepending each collection name with a common namespace name.
- For example, the collections
book.info
andbook.authors
are in the same namespacebook
.
Inserting a document in a collection |
- In MongoDB documents within a collection do not have all to have the same number and type of fields ("columns"). In other words, schemas in MongoDB are dynamic, and can vary within one collection.
- PyMongo uses dictionary objects to represent JSON-style documents.
- To add a new document to a collection, using ipython:
In [9]: post = { ...: 'author': 'Joann', ...: 'text': 'Just finished reading the Book of Nights'} In [10]: db.posts.insert(post) --- Method call to create a new document (post) Out[10]: ObjectId('4eb99ad5a9e15833b1000000') In [17]: for post in db.posts.find(): --- listing all documents in the posts collection. post ....: ....: Out[18]: {u'_id': ObjectId('4eb99ad5a9e15833b1000000'), u'author': u'Joann', u'text': u'Just finished reading the Book of Nights'}
- Note that you don't need to specify the "_id" field when inserting a new document into a collection.
- The document identifier is automatically generated by the database and is unique across the collection.
ObjectID("4bface1a2231316e04f3c434") 4bface1a -- timestamp of object creation (4 bytes) 223131 -- derived from client machine's IP (3 bytes) 6e04 -- derived from client's process id (2 bytes) f3c434 -- counter (3 bytes)
In [1]: post = {'author': 'David', ....: 'text' : "David's Blog"} In [2]: db.posts.insert(post) Out[2]: ObjectId('4eb9bcada9e15809f3000000') In [3]: print post["_id"].generation_time 2011-11-08 23:35:09+00:00 |
- You can also execute bulk inserts:
In [13]: many_posts = [{'author': 'David', ....: 'text' : "David's Blog"}, ....: {'author': 'Monique', ....: 'text' : 'My photo blog'}] In [14]: db.posts.insert(many_posts) Out[14]: [ObjectId('4eb9bcada9e15809f3000000'), ObjectId('4eb9bcada9e15809f3000001')] In [15]: for post in db.posts.find(): ....: post ....: ....: Out[15]: {u'_id': ObjectId('4eb99ad5a9e15833b1000000'), u'author': u'Joann', u'text': u'Just finished reading the Book of Nights'} Out[15]: {u'_id': ObjectId('4eb9bcada9e15809f3000000'), u'author': u'David', u'text': u"David's Blog"} Out[15]: {u'_id': ObjectId('4eb9bcada9e15809f3000001'), u'author': u'Monique', u'text': u'My photo blog'}
Selecting (reading) documents inside collections |
- Data in MongoDB is represented by structures of key-value pairs, using JSON-style documents.
- The query language used by MongoDB is essentially "example-based," plus some modifiers.
- Query results are delivered via cursors.
-- The find method below returns a cursor to variable c. -- The method query all documents in which attribute x equals to 20, and returns 10, starting with the 11th one. var c = db.collection.find({x:20}).skip(10).limit(10) c.next() -- iteract through the cursor with .next() c.next()
find_one()
method.In [48]: db.posts.find_one() --- returns the first document in the collection Out[48]: {u'_id': ObjectId('4eb99ad5a9e15833b1000000'), u'author': u'Joann', u'text': u'Just finished reading the Book of Nights'}
We can also define criteria for the query. For example,
(a) return one document with field "author" equal to "Joann"
In [49]: db.posts.find_one({'author':'Joann'}) Out[49]: {u'_id': ObjectId('4eb99ad5a9e15833b1000000'), u'author': u'Joann', u'text': u'Just finished reading the Book of Nights'}
Querying more than one document
A query returns a cursor pointing to all the documents that matched the query criteria.
To see these documents you need to iteract through the cursor elements:
In [51]: for post in db.posts.find(): post ....: ....: Out[52]: {u'_id': ObjectId('4eb99ad5a9e15833b1000000'), u'author': u'Joann', u'text': u'Just finished reading the Book of Nights'} Out[52]: {u'_id': ObjectId('4eb9bcada9e15809f3000000'), u'author': u'David', u'text': u"David's Blog"} Out[52]: {u'_id': ObjectId('4eb9bcada9e15809f3000001'), u'author': u'Monique', u'text': u'My photo blog'} -- Alternatively, you can explicitly define a cursor variable: In [53]: cursor = db.posts.find() In [54]: for x in cursor: ....: x ....: ....: Out[54]: {u'_id': ObjectId('4eb99ad5a9e15833b1000000'), u'author': u'Joann', u'text': u'Just finished reading the Book of Nights'} Out[54]: {u'_id': ObjectId('4eb9bcada9e15809f3000000'), u'author': u'David', u'text': u"David's Blog"} Out[54]: {u'_id': ObjectId('4eb9bcada9e15809f3000001'), u'author': u'Monique', u'text': u'My photo blog'}
You can also return only some of the document fields. (Similar to a SQL query that returns only a subset of the table columns).
In [64]: for post in db.posts.find({}, {'text': 1}): post ....: ....: Out[65]: {u'_id': ObjectId('...'), u'text': u'Just finished reading the Book of Nights'} Out[65]: {u'_id': ObjectId('...'), u'text': u"David's Blog"} Out[65]: {u'_id': ObjectId('...'), u'text': u'My photo blog'}
Updating documents in collections |
- MongoDB supports atomic updates in document fields as well as more traditional updates for replacing an entire document.
- Use the
update()
method to entirely replace the document matching criteria with a new document. - If you want to modify only some attributes of a document, you need to use one of the
$set
modifier. update()
usually takes two parameters:- the first select the documents that will be updated (similar to the WHERE clause on SQL);
- the second parameter contains the new values for the document attributes.
Now, update the post where the author was Monique.
(1) substitute the document for an entirely new document
In [68]: post = db.posts.find_one({'author': 'Monique'}) In [69]: post Out[69]: {u'_id': ObjectId('4eb9bcada9e15809f3000001'), u'author': u'Monique', u'text': u'My photo blog'} In [70]: post['author'] += 'Berger' # expand the contents of 'author' In [71]: post['edited'] = True # add a new field In [72]: post # the variable now has an extra field Out[72]: {u'_id': ObjectId('4eb9bcada9e15809f3000001'), u'author': u'MoniqueBerger', 'edited': True, u'text': u'My photo blog'} In [73]: db.posts.save(post) # update the entire document. Out[73]: ObjectId('4eb9bcada9e15809f3000001')
(2) Another way to update only some fields of a document, is to use the
$set
update modifier. - The
$set
modifier works like the SET clause on an SQL Update statement, with which you can specify the columns that will be updated
In [95]: db.posts.update({'author': 'Monique Berger'}, { '$set': {'text': 'My GREAT photo blog'}}) # updating In [96]: post = db.posts.find_one({'author': 'Monique Berger'}) In [97]: post Out[97]: {u'_id': ObjectId('4eb9bcada9e15809f3000001'), u'author': u'Monique Berger', u'edited': True, u'text': u'My GREAT photo blog'}
You can add a new field and increment its value with
$push
.Bellow, a comment and a reply are added to the document.
In [18]: comment = { '_id' : ObjectId(), ....: 'by' : 'David', ....: 'text' : 'Post some photos, then' ....: } In [19]: comment Out[19]: {'_id': ObjectId('4eba1d3ba9e1580f1c000000'), 'by': 'David', 'text': 'Post some photos, then'} In [23]: db.posts.update({'_id': post['_id']}, {'$push' : {'comments': comment}, '$inc' : {'nComments' : 1}}) In [29]: reply = { ....: '_id': ObjectId(), ....: 'inReplyTo': comment['_id'], ....: 'by' : 'Marcel', ....: 'text' : 'Yes, post some photos!' ....: } In [30]: db.posts.update({'_id' : post['_id']}, ....: {'$push': {'comments': reply}, '$inc': {'nComments': 1}}) In [31]: post = db.posts.find_one({'comments.by' : 'David'}) In [32]: post Out[32]: {u'_id': ObjectId('4eb9bcada9e15809f3000001'), u'author': u'Monique Berger', u'comments': [{u'_id': ObjectId('4eba1d3ba9e1580f1c000000'), u'by': u'David', u'text': u'Post some photos, then'}, {u'_id': ObjectId('4eba1ea7a9e1580f1c000001'), u'by': u'Marcel', u'inReplyTo': ObjectId('4eba1d3ba9e1580f1c000000'), u'text': u'Yes, post some photos!'}], u'edited': True, u'nComments': 2, u'text': u'My GREAT photo blog'}Note that
$push
appends value to an existing array. If the array in the field do not exist,
$push
creates the field and the array with one element.In this case,
$push
created the field "comments" and inserted a comment. Subsequently, a reply comment is also inserted in the field, with the value of number of comments to the post incremented.
# Updating the comment:
In [34]: db.posts.update({'_id': post['_id'], 'comments._id': comment['_id']}, ....: { '$set' : {'comments.$.text': 'Post some photos and write about them!'}}) In [35]: post = db.posts.find_one({'comments.by' : 'David'}) In [36]: post Out[36]: {u'_id': ObjectId('4eb9bcada9e15809f3000001'), u'author': u'Monique Berger', u'comments': [{u'_id': ObjectId('4eba1d3ba9e1580f1c000000'), u'by': u'David', u'text': u'Post some photos and write about them!'}, {u'_id': ObjectId('4eba1ea7a9e1580f1c000001'), u'by': u'Marcel', u'inReplyTo': ObjectId('4eba1d3ba9e1580f1c000000'), u'text': u'Yes, post some photos!'}], u'edited': True, u'nComments': 2, u'text': u'My GREAT photo blog'}
# Now you may want to remove the reply to the comment on Monique's post:
In [38]: db.posts.update({'_id': post['_id']}, ....: {'$pull': {'comments' : {'_id': reply['_id']}}, '$inc': {'nComments': -1}}) In [39]: post = db.posts.find_one({'comments.by' : 'David'}) In [40]: post Out[40]: {u'_id': ObjectId('4eb9bcada9e15809f3000001'), u'author': u'Monique Berger', u'comments': [{u'_id': ObjectId('4eba1d3ba9e1580f1c000000'), u'by': u'David', u'text': u'Post some photos and write about them!'}], u'edited': True, u'nComments': 1, u'text': u'My GREAT photo blog'} # Now updating tags and tag statistics in the post: # adding tags to the post; creating a tags collection; maintain tag count; # upsert option: update if the record that matches already exists. If not, create the object and then apply update. In [41]: for tag in ['photos', 'python', 'mongodb']: ....: db.posts.update({'_id': post['_id']}, { '$addToSet': { 'tags': tag}}) ....: db.tags.update({'_id': tag}, { '$inc':{'count': 1}}, upsert=True) ....: ....: In [42]: post = db.posts.find_one({'tags' : 'photos'}) In [43]: post Out[43]: {u'_id': ObjectId('4eb9bcada9e15809f3000001'), u'author': u'Monique Berger', u'comments': [{u'_id': ObjectId('4eba1d3ba9e1580f1c000000'), u'by': u'David', u'text': u'Post some photos and write about them!'}], u'edited': True, u'nComments': 1, u'tags': [u'photos', u'python', u'mongodb'], u'text': u'My GREAT photo blog'} # listing the tags collection: In [45]: for tag in db.tags.find(): tag ....: ....: Out[46]: {u'_id': u'photos', u'count': 1} Out[46]: {u'_id': u'python', u'count': 1} Out[46]: {u'_id': u'mongodb', u'count': 1}
Deleting documents from collections |
remove
, passing as parameter a document field that either (a) uniquely identifies the document you want to delete or (b) identifies the set of documents you want to delete.s>>> db.blog.remove({"author":"Monique"}) >>> for post in db.blog.find(): ... post ... {u'date': datetime.datetime(2011, 11, 7, 22, 10, 43, 77000), u'text': u'Blogging about MongoDB', u'_id': ObjectId('4eb857b3a9e158609c000004'), u'author': u'John', u'tags': [u'MongoDB', u'NoSQL', u'Python']} >>>
On Atomic Operations |
- In MongoDB, only operations executed against a single document can be performed as atomic operations.
- An atomic operation is a set of operations that can be combined in such a way that the entire set appear to be a single operation to the rest of the system. An operation is atomic when:
- 1. No other process knows about the changes being made until the entire set of operations has completed.
- 2. if one of the operations fails, the entire set fails.
- In traditional RDBMSs, atomicity is guaranteed through locks.
During an update operation, for example, all rows that need to be modified are locked and either all modifications succeed or none will. - No other transaction can see the locked rows while the update process is being performed.
- MongoDB, however, does not support locking or transactions (begin-commit-rollback).
- In this way, while atomic operations are guaranteed to succeed, MongoDB cannot prevent race conditions in which data is changed by user A while user B's update in the same data is being performed.
- The result in this case is that, although user B's atomic update completes successfully, it ultimately modified the wrong data: it modified the value set by User A, and not the value User B had initially read.