
The Future of Relational Databases (and NoSQL)

Robin Schumacher of DataStax recently penned a thought-provoking article entitled “Why NoSQL Can Be Safer than an RDBMS”. In it, he responds to an article by Sean Doherty of EnterpriseDB entitled “The Future of Enterprise Data: RDBMS Will Be There”.

My first thought was, "here we go again." But I soon realized that these were not hasty predictions of the demise of either the RDBMS or NoSQL; both authors were making the case that RDBMS and NoSQL will each be around for a long time. They also made some other very interesting statements. Some of them brought a smile, and some had me scratching my head, a lot.

For example, Sean Doherty says,

But, here’s what happens when new technologies like these appear. Database companies review and evaluate the new technology, determine its value and viability, and then integrate it into their relational database products, giving the customer the best of all worlds.

I believe this is true and have written about it in detail before (see The NoSQL vs. SQL hoopla, another turn of the screw!).

Robin claims that

[Referring to the incorporation of new innovations into traditional RDBMS] I don’t see this happening with a NoSQL technology like Cassandra because the fundamental architecture of relational engines cannot support the same use cases as Cassandra. RDBMSs were built to scale up not out for both reads and writes. They are best at handling structured data, offer high but not continuous availability, and are lousy at easily entering, distributing and synchronizing data that is widely dispersed from a geographical standpoint.

He is correct in stating that RDBMSs were originally designed for scale-up, but they were designed with both read and write workloads in mind. It is, however, entirely incorrect to contend that relational databases are unsuitable for scale-out, including geographically distributed scale-out.

As a pertinent example, we just demonstrated the ability to ingest data at a rate of a million rows a second into a distributed set of MySQL database nodes controlled by the ParElastic ® Database Virtualization Engine ™. All the data is inserted using standard SQL INSERT statements and can be queried at all times with standard SQL statements. ParElastic virtualizes the underlying databases and makes them appear as a single database to the application/user. Read more about that here. Yes, that is 1,000,000 rows per second into a standard, fully ACID set of MySQL databases.
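To make the idea of "many databases that look like one" concrete, here is a minimal sketch of hash-based routing in front of several independent SQL databases. It is only an illustration and not ParElastic's implementation: the `ShardedTable` class, the `events` table, and the CRC32 routing rule are all hypothetical, and in-memory SQLite databases stand in for the MySQL nodes.

```python
import sqlite3
import zlib


class ShardedTable:
    """Toy router in front of several SQL databases (hypothetical design)."""

    def __init__(self, num_shards):
        # Each in-memory SQLite database stands in for one MySQL node.
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for db in self.shards:
            db.execute(
                "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
            )

    def _shard_for(self, event_id):
        # Deterministic hash routing: the same id always maps to the same shard.
        return self.shards[zlib.crc32(str(event_id).encode()) % len(self.shards)]

    def insert(self, event_id, payload):
        db = self._shard_for(event_id)
        with db:  # one local transaction on the owning shard
            db.execute(
                "INSERT INTO events (id, payload) VALUES (?, ?)",
                (event_id, payload),
            )

    def get(self, event_id):
        # Point lookups touch exactly one shard.
        row = self._shard_for(event_id).execute(
            "SELECT payload FROM events WHERE id = ?", (event_id,)
        ).fetchone()
        return row[0] if row else None

    def count(self):
        # Aggregates fan out to every shard and are combined by the router.
        return sum(
            db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
            for db in self.shards
        )


table = ShardedTable(num_shards=4)
for i in range(1000):
    table.insert(i, f"event-{i}")
```

The application only ever issues ordinary INSERT and SELECT statements; the router decides which node runs them. Each insert is a local transaction on a single node, which is why write throughput in a design like this can grow with the number of nodes.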

Given a schema that relaxes the same constraints that Cassandra relaxes in order to scale out geographically, a scale-out RDBMS can achieve the same scale-out characteristics.

Cassandra achieves its scale-out capabilities (in part) by relaxing constraints on data consistency. The same relaxation, when applied to a distributed RDBMS-based infrastructure, produces similar scale-out benefits!
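One common way to reason about "relaxing consistency" is quorum arithmetic: with N replicas, a write acknowledged by W of them and a read that contacts R of them are guaranteed to overlap only when R + W > N. The sketch below is a generic worst-case illustration of that rule, not the implementation of Cassandra or of any RDBMS; the `Replica` class and `worst_case_read` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int
    value: str


def worst_case_read(n, w, r, old, new):
    """Return what a reader sees in the least favorable replica choice."""
    replicas = [Replica(1, old) for _ in range(n)]
    # The write is acknowledged after reaching only the first w replicas.
    for rep in replicas[:w]:
        rep.version, rep.value = 2, new
    # Adversarial read: contact the r replicas furthest from the write.
    contacted = replicas[n - r:]
    # The reader trusts the highest version among the replicas it contacted.
    return max(contacted, key=lambda rep: rep.version).value


strict = worst_case_read(n=3, w=2, r=2, old="old", new="new")   # R + W > N
relaxed = worst_case_read(n=3, w=1, r=1, old="old", new="new")  # R + W <= N
```

With W = 2 and R = 2 out of 3 replicas the reader always sees the new value; with W = 1 and R = 1 it can see the old one. The relaxed setting waits on fewer nodes per operation, and that trade is available to any replicated data store, relational or not.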

It is also a complete mischaracterization (good marketing material though) to claim that RDBMSs rely on an older master/slave design. Here is what Robin has to say in this regard:

This is why data in Cassandra can actually be safer than in an RDBMS, which relies on an older master/slave design that can’t support a true active-everywhere environment.

 

Multi-master replication is a well-known and widely used concept in traditional relational databases.

It is therefore a blatant fallacy that RDBMSs cannot support active-everywhere environments. The exact same laws of physics that apply to RDBMSs also apply to NoSQL solutions. People often forget that the CAP Theorem (or Brewer’s Conjecture) applies equally to NoSQL solutions and to RDBMSs; it applies to any distributed data store. If your data is partitionable and you don’t have foreign keys that span shards, you can relax consistency in an RDBMS as well!
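The condition about foreign keys can be illustrated with a small sketch. If parent and child rows are both placed by the same shard key, every foreign key can be checked locally by the node that owns the rows, and no cross-node coordination is needed. Everything below is an assumption made for illustration: the `customers` and `orders` tables, the modulo routing rule, and the use of in-memory SQLite databases as stand-ins for the nodes.

```python
import sqlite3

NUM_SHARDS = 3


def open_shard():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " total REAL NOT NULL)"
    )
    return db


shards = [open_shard() for _ in range(NUM_SHARDS)]


def shard_for(customer_id):
    # Both tables are placed by customer_id, so an order always lands on
    # the same shard as the customer it references.
    return shards[customer_id % NUM_SHARDS]


def add_customer(customer_id, name):
    db = shard_for(customer_id)
    with db:  # local transaction on one shard
        db.execute(
            "INSERT INTO customers (id, name) VALUES (?, ?)", (customer_id, name)
        )


def add_order(order_id, customer_id, total):
    db = shard_for(customer_id)
    with db:  # the foreign key is verified by this shard alone
        db.execute(
            "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
            (order_id, customer_id, total),
        )
```

Calling `add_customer(7, "Ada")` and then `add_order(1, 7, 19.5)` succeeds on a single shard, while `add_order(2, 8, 5.0)` raises `sqlite3.IntegrityError` because customer 8 does not exist on the shard that would own it. Note that order ids are unique only within a shard in this sketch; a real system would need a global id scheme.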

Conclusion

George Santayana said, "Those who cannot remember the past are condemned to repeat it". This is surprisingly appropriate in the present context.

As relational databases evolve and better address some of the issues of scalability, people will no longer have to look to NoSQL solutions.

Robin writes that

[Referring to two customers] They started out with an RDBMS for their application and quickly hit scaling and performance walls that an RDBMS couldn’t overcome. Enter NoSQL and Cassandra. Today, each company is handling big data use cases with ease.

Those customers will be able to use a simple, scalable RDBMS-based solution, and the need to adopt NoSQL will go away.

NoSQL solutions are “special purpose solutions” and, as we saw with OLAP and Object Databases in days past, they will be relegated to a niche as RDBMSs become more capable. I agree with Sean that in the end, NoSQL (a special purpose solution) will in fact go the way of ODBMS, XML databases and OLAP/MDX.

 

Comments

The NoSQL guys are trying to support full-blown ACID, as opposed to picking two of the three properties in Eric Brewer's CAP theorem, and also to support a standard SQL interface. Hortonworks has a project called Stinger, which is meant to move Hive to standard SQL (http://hortonworks.com/labs/stinger/). They recently announced their intent to support ACID (http://hortonworks.com/blog/adding-acid-to-apache-hive/).

It is not easy to take an ecosystem of ETL and reporting tools that has been tuned to standard SQL and the relational model for decades and get it used to a totally new model.

Companies like FB and LinkedIn can always get used to this kind of thing because they develop their own databases and are their own customers too.
