Structured Storage approaches

(Extracts from a good discussion posted by James Hamilton in his blog. You can check the original article and comments here.)

Although a couple of years old, James Hamilton provided a good requisite-based breakdown of data storage systems. These are some of his points:
  • The world of structured storage extends far beyond relational (Oracle, DB2, SQL Server, MySQL, NoSQL, etc) systems.
  • Many applications do not need the rich programming model of relational systems and some are better seviced by lighter-weight, easier-to-administers, and easier-to-scale solutions.

  • Structured storage approaches can be classified based on customer major requirements.
  • These are Feature-first, scale-first, simple structured storage and purpose-optimized stores.

(1) Feature-First
  • Traditional Relational database management systems (RDBMS) are the structured storage system of choice here.
  • Driven by requirements for Enterprise financial systems, human resource systems, customer relationship management systems (FIN, HR, CRMs)
    • Examples here include Oracle, MySQL, SQL Server, PostgreSQL, Sybase, DB2.
  • Cloud solutions here include:

(2) Scale-First
  • This is the domain of very high scale website (i.e. facebook, Gmail, Amazon, Yahoo, etc)
  • Scaling capabilities are more important than more features and none could run on a single rdbms.
  • The problem here is that the full relational database model (including joins, aggregations, use of stored procedures) is difficult to scale (especially in distributed contexts).
  • Distributing data across tens to thousands of rdbms instances and still maintain support for the distributed data as if it were under a single rdbms engine is difficult.
  • As an alternative, very high scale may be supported with the use of key-value store solutions. These include HBase, Amazon SimpleDB (cloud-based), Project Valdemort, Cassandra, Hypertable, etc

(3) Simple Structure storage
  • Applications that have a structure storage requirement but do not need features, cost and complexity of RDBMSs neither have very high scalability requirements.
  • Some implementations include:
  • Facebook: email inbox search (Cassandra)
  • Amazon: retail shopping card (Dynamo)
  • Berkeey DB

(4) Purpose-Optimized stores
  • Mike Stonebraker argued that the existing commercial RDBMS offerings do not meet the needs of many important market segments
  • Some special purpose real-time, stream processing solutions (StreamBase, Vertica, VoltDB) have beat the RDBMS benchmart by +30x...

Mike Stonebraker, One Size fits all

No comments:

Post a Comment