Although it is a couple of years old, James Hamilton's requirements-based breakdown of data storage systems is still a good one. These are some of his points:
- The world of structured storage extends far beyond relational systems (Oracle, DB2, SQL Server, MySQL, etc.).
- Many applications do not need the rich programming model of relational systems, and some are better serviced by lighter-weight, easier-to-administer, and easier-to-scale solutions.
- Structured storage approaches can be classified by the customer's major requirements.
- The four classes are feature-first, scale-first, simple structured storage, and purpose-optimized stores.
(1) Feature-First
- Traditional relational database management systems (RDBMS) are the structured storage system of choice here.
- This class is driven by the requirements of enterprise financial systems, human resource systems, and customer relationship management systems (FIN, HR, CRM).
- Examples here include Oracle, MySQL, SQL Server, PostgreSQL, Sybase, DB2.
- Cloud solutions here include:
- Amazon RDS (Relational Database Service) is a cloud-based service that offers the functionality of Oracle or MySQL databases from the cloud.
- Microsoft SQL Azure takes the same approach.
- Oracle Public Cloud, just launched at Oracle OpenWorld 2011.
(2) Scale-First
- This is the domain of very high scale websites (e.g. Facebook, Gmail, Amazon, Yahoo, etc.).
- Scaling capabilities matter more than additional features, and none of these sites could run on a single RDBMS.
- The problem here is that the full relational database model (including joins, aggregations, and stored procedures) is difficult to scale, especially in distributed contexts.
- Distributing data across tens to thousands of RDBMS instances while still supporting it as if it were under a single RDBMS engine is difficult. As an alternative, very high scale may be supported with key-value store solutions. These include HBase, Amazon SimpleDB (cloud-based), Project Voldemort, Cassandra, Hypertable, etc.
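To make the trade-off above concrete, here is a minimal sketch of a hash-partitioned key-value store. It is an illustration only, not how HBase, Cassandra, or any of the systems listed actually work: the class name `ShardedKeyValueStore`, its methods, and the use of in-memory dicts as stand-ins for storage nodes are all assumptions made for the example.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys over several in-memory shards."""

    def __init__(self, shard_count):
        # Each dict stands in for one storage node (hypothetical; a real
        # deployment would talk to separate servers).
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash keeps the key-to-shard mapping identical across runs.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        # A write touches exactly one shard.
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        # A lookup by key also touches exactly one shard.
        return self._shard_for(key).get(key, default)

    def scan(self, predicate):
        # Anything that is not a lookup by key (a filter, an aggregate,
        # a join) has to visit every shard.
        results = []
        for shard in self.shards:
            results.extend((k, v) for k, v in shard.items() if predicate(k, v))
        return results


store = ShardedKeyValueStore(shard_count=4)
store.put("user:1", {"name": "Ada", "city": "London"})
store.put("user:2", {"name": "Linus", "city": "Helsinki"})
print(store.get("user:1"))
print(store.scan(lambda key, value: value["city"] == "Helsinki"))
```

Reads and writes by key go to a single shard, so adding shards adds capacity. The `scan` method hints at why joins and aggregations are harder: they fan out to every shard, and the work of combining results moves into the application.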
(3) Simple Structured Storage
- Applications that have a structured storage requirement but need neither the features, cost, and complexity of an RDBMS nor very high scalability.
- Some implementations include:
- Facebook: email inbox search (Cassandra)
- Amazon: retail shopping cart (Dynamo)
- Berkeley DB
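As a rough illustration of this class, the sketch below keeps a shopping cart in an embedded key-value file using Python's standard `dbm` module. This is a stand-in chosen for convenience, not the Berkeley DB or Dynamo API; the function names `save_cart` and `load_cart`, the customer IDs, and the JSON encoding of the cart are all assumptions.

```python
import dbm
import json
import os
import tempfile


def save_cart(db_path, customer_id, items):
    # "c" opens the database for reading and writing, creating it if needed.
    with dbm.open(db_path, "c") as db:
        # dbm stores bytes, so the key and the JSON-encoded cart are encoded.
        db[customer_id.encode("utf-8")] = json.dumps(items).encode("utf-8")


def load_cart(db_path, customer_id):
    # Opening with "c" means a missing database simply yields empty carts.
    with dbm.open(db_path, "c") as db:
        raw = db.get(customer_id.encode("utf-8"))
    return json.loads(raw.decode("utf-8")) if raw else []


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "carts")
    save_cart(path, "customer-42", ["book", "lamp"])
    print(load_cart(path, "customer-42"))
    print(load_cart(path, "customer-99"))
```

There is no schema, no query language, and no server to administer: the whole interface is a lookup and a store by key, which is what makes this class cheaper to run than an RDBMS when nothing more is needed.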
(4) Purpose-Optimized stores
- Mike Stonebraker argued that the existing commercial RDBMS offerings do not meet the needs of many important market segments.
- Some special-purpose real-time and stream-processing solutions (StreamBase, Vertica, VoltDB) have beaten RDBMS benchmarks by 30x or more...
Readings:
Mike Stonebraker, "One Size Fits All"