Showing posts with label dbms. Show all posts
Showing posts with label dbms. Show all posts

Installing MongoDB on Ubuntu

Recently I have been trying MongoDB and other NoSQL databases for a project.
So I decided to compile some of my notes on posts here.

MongoDB: Download, Install and Configuration

Installing MongoDB on Ubuntu
To install MongoDB on Ubuntu, you can use the packages made available by 10gen, following the steps below:

(1) add a line to your /etc/apt/sources.list
deb dist 10gen

(2) Add the 10gen GPG key, or apt will disable the repository (apt uses encryption keys to verify if the repository is trusted and disables untrusted ones).
jdoe@quark:/etc/apt$ sudo apt-key adv --keyserver --recv 7F0CEB10
Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --secret-keyring /etc/apt/secring.gpg --trustdb-name /etc/apt/trustdb.gpg --keyring /etc/apt/trusted.gpg --primary-keyring /etc/apt/trusted.gpg --keyserver --recv 7F0CEB10
gpg: requesting key 7F0CEB10 from hkp server
gpg: key 7F0CEB10: public key "Richard Kreuter " imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)

(3) To install the package, update the sources and then install:
$ sudo apt-get update
$ sudo apt-get install mongodb-10gen

(4) Create directory for datafiles and database logs.
MongoDB by default try to store datafiles in /data/db.
If the directory does not exist, the server will fail to start unless you explicitly assign a different, existing location for the datafiles.
For security reasons, make sure the directory is created as a non-root user.
(a) You can create the default directory: 
$ sudo mkdir -p  /data/db/
S sudo chown `id -u` /data/db

(b) you can choose to store datafiles somewhere else. If you choose this, make sure to specify the datafile location with the --dbpath option when starting the MongoDB server.
$ sudo mkdir -p  
S sudo chown `id -u` 

(5) You can test the installation, by calling the mongodb shell
jdoe@quark:~$ mongo
MongoDB shell version: 2.0.1
connecting to: test

The installation creates the mongodb user and install files according to the default configuration below:
Installed architecture
  • binaries installed on /usr/bin
jdoe@quark:/usr/bin$ ls -l mongo*
-rwx... mongo            -- database shell
-rwx... mongod           -- mongodb daemon. This is the core database process
-rwx... mongodump        -- hotbackups. creates a binary representation of the entire database, collections or collection objects
-rwx... mongoexport      -- exports a collection to JSON or CSV
-rwx... mongofiles       -- tool for using GridFS, a mechanism for manipulating large files in MongoDB
-rwx... mongoimport      -- imports a JSON/CSV/TSV file into a MongoDB
-rwx... mongorestore     -- restores the output of mongodump
-rwx... mongos           -- sharding controller. Provides automatic load balancing and partitioning
-rwx... mongostat        -- show usage statistics (numbers and percentuals) on a running monodb instance 
-rwx... mongotop         -- provide read/write statistics on collections and namespaces in a mongodb instance
  • Configuration file installed on /etc/mongodb.conf
  • database files will be created in: dbpath=/var/lib/mongodb
  • log files will be created in : logpath=/var/log/mongodb/mongodb.log

Starting up and Stopping MongoDB
  • mongod is MongoDB core database process. It can be manually started to run in the foreground or as a daemon.
  • MongoDB is a database server: it runs in the foreground or background and waits for connections from the user.
  • There are a number of options with which mongod can be initialized.
  • The startup options fall into general, replication, master/slave, replica set and sharding categories. Some of the startup options are:
--port      num   TCP port which mongodb will use
--maxConns  num   max # of simultaneous connections
--logpath   path  log file path
--logappend       append instead of overwrite log file
--fork            fork server process (daemon)
--auth            authenticate users
--dbpath    path  directory for datafiles
--directoryperdb  each database will be stored in its own directory
--shutdown        shutdowns server

(a) start mongodb running in the foreground in a terminal. Data stored in /mongodb/data. mongodb uses default port 27017.
(You need to create the /mongodb/data first).
jdoe@quark:~$ mkdir -p /mongodb/data
jdoe@quark:~$ mongod --dbpath /mongodb/data  
Sun Nov  6 19:05:09 [initandlisten] options: { dbpath: "/mongodb/data" }
Sun Nov  6 19:05:09 [websvr] admin web console waiting for connections on port 28017
Sun Nov  6 19:05:09 [initandlisten] waiting for connections on port 27017

jdoe@quark:~$ ps -ef | grep mongo
jdoe   20519 16142  0 19:05 pts/1    00:00:10 mongod --dbpath /mongodb/data
jdoe   20566 20034  0 19:49 pts/2    00:00:00 grep mongo

jdoe@quark:~$ ls -l /mongodb/data
total 4
-rwxr-xr-x 1 jdoe jdoe 6 2011-11-06 19:05 mongod.lock

(b) start mongodb as a daemon, running on TCP port 20012. Data stored in /mongodb/data. Logs on /mongodb/log.
jdoe@quark:~$ mongod --fork --port 20012 --dbpath /mongodb/data/ --logpath /mongodb/logs/mongodblog --logappend 
forked process: 20655
jdoe@quark:~$ all output going to: /mongodb/logs/mongodblog

Alternatively, you can start/stop mongoDB by:
jdoe@quark:~$ sudo start mongodb
mongodb start/running, process 2824

jdoe@quark:~$ sudo stop mongodb
mongodb stop/waiting

Stopping MongoDB

  1. Contro-C will do it, if the server is running on the foreground. Mongo waits until all ongoing operations complete and then exits.
  2. Alternatively:
(a) call mongod with --shutdown option
jdoe@quark:~$ mongod --dbpath /mongodb/data --shutdown
killing process with pid: 20746
(b) use database shell (mongo)
(Here note the confusing output of the db.shutdownServer() call. Although the messages suggest failure, the database is shutdown as expected).
jdoe@quark:~$ mongo quark:20012
MongoDB shell version: 2.0.1
connecting to: quark:20012/test
> use admin
switched to db admin
> db.shutdownServer()
Sun Nov  6 20:23:59 DBClientCursor::init call() failed
Sun Nov  6 20:23:59 query failed : admin.$cmd { shutdown: 1.0 } to: quark:20012
server should be down...

Structured Storage approaches

(Extracts from a good discussion posted by James Hamilton in his blog. You can check the original article and comments here.)

Although a couple of years old, James Hamilton provided a good requisite-based breakdown of data storage systems. These are some of his points:
  • The world of structured storage extends far beyond relational (Oracle, DB2, SQL Server, MySQL, NoSQL, etc) systems.
  • Many applications do not need the rich programming model of relational systems and some are better seviced by lighter-weight, easier-to-administers, and easier-to-scale solutions.

  • Structured storage approaches can be classified based on customer major requirements.
  • These are Feature-first, scale-first, simple structured storage and purpose-optimized stores.

(1) Feature-First
  • Traditional Relational database management systems (RDBMS) are the structured storage system of choice here.
  • Driven by requirements for Enterprise financial systems, human resource systems, customer relationship management systems (FIN, HR, CRMs)
    • Examples here include Oracle, MySQL, SQL Server, PostgreSQL, Sybase, DB2.
  • Cloud solutions here include:

(2) Scale-First
  • This is the domain of very high scale website (i.e. facebook, Gmail, Amazon, Yahoo, etc)
  • Scaling capabilities are more important than more features and none could run on a single rdbms.
  • The problem here is that the full relational database model (including joins, aggregations, use of stored procedures) is difficult to scale (especially in distributed contexts).
  • Distributing data across tens to thousands of rdbms instances and still maintain support for the distributed data as if it were under a single rdbms engine is difficult.
  • As an alternative, very high scale may be supported with the use of key-value store solutions. These include HBase, Amazon SimpleDB (cloud-based), Project Valdemort, Cassandra, Hypertable, etc

(3) Simple Structure storage
  • Applications that have a structure storage requirement but do not need features, cost and complexity of RDBMSs neither have very high scalability requirements.
  • Some implementations include:
  • Facebook: email inbox search (Cassandra)
  • Amazon: retail shopping card (Dynamo)
  • Berkeey DB

(4) Purpose-Optimized stores
  • Mike Stonebraker argued that the existing commercial RDBMS offerings do not meet the needs of many important market segments
  • Some special purpose real-time, stream processing solutions (StreamBase, Vertica, VoltDB) have beat the RDBMS benchmart by +30x...

Mike Stonebraker, One Size fits all