In the earlier post on NoSQL and the #nonsensql hoopla, we talked about general-purpose and specialized data management solutions.

In technology, both specialized and general-purpose solutions have their place. Specialized solutions can be highly optimized for a particular use case, but tend to add complexity or performance penalties for other use cases. General-purpose solutions may not offer optimal performance for all use cases, but typically provide good performance for most use cases.

In the long run, thanks to Moore’s Law, general-purpose technology solutions have a big advantage. Workloads that once demanded extreme performance, and could only be handled by specialized solutions, can be handled by general-purpose solutions once hardware becomes fast enough.

This can be seen in the evolution of data management systems. In the database realm, relational databases epitomize generalization. The application need only know the logical grouping of data elements (columns in a table) and express its questions in a standard query language (Structured Query Language, or SQL).

NoSQL solutions, like the flat-file (CICS/ISAM) applications, object databases and cubes before them, require the application not only to know the physical representation of the data but also to maintain its integrity. These solutions epitomize specialization: the application has intimate knowledge of a data tier that has been optimized for the application at hand.



The picture above illustrates this spectrum.

Specialization has larger consequences than just having to program to a custom API, having to find people with specific talent, or facing vendor lock-in. Once you specialize your data representation for a specific use case, your application begins to fall apart as soon as you need a query that doesn’t fit that representation.

Consider the case of this blog, which is hosted on WordPress, an open source Content Management System (CMS) that stores all its content in a MySQL database. The underlying representation of the blog could just as well have been a document store, as each blog post is nothing more than a document. Each comment would be stored right next to the blog post. I’m positive that one could make such a data store perform better than the general-purpose RDBMS we use.

However, as soon as one wanted to include the Tag Cloud module, performance would fall off a cliff, as each attempt to refresh the tag cloud would read the entire database!
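To make the tag cloud problem concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the sample posts, the `posts` and `post_tags` tables (this is not WordPress's actual schema), and the "document store", which is modeled naively as a list of JSON strings with no secondary index on tags. A real document store may offer indexing or aggregation features that soften the problem.

```python
import json
import sqlite3
from collections import Counter

# Hypothetical sample data: three posts, each carrying its own tags.
POSTS = [
    {"id": 1, "title": "NoSQL hoopla", "tags": ["nosql", "databases"]},
    {"id": 2, "title": "OLAP cubes", "tags": ["olap", "databases"]},
    {"id": 3, "title": "Object databases", "tags": ["databases", "history"]},
]


def tag_cloud_from_documents(documents):
    """Naive document store: every stored document is read and parsed."""
    counts = Counter()
    for raw in documents:
        post = json.loads(raw)  # the whole post is loaded just to see its tags
        counts.update(post["tags"])
    return dict(counts)


def build_relational_store(posts):
    """Load the same posts into two tables (an assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE post_tags (post_id INTEGER REFERENCES posts(id), tag TEXT)"
    )
    conn.execute("CREATE INDEX idx_post_tags_tag ON post_tags (tag)")
    for post in posts:
        conn.execute(
            "INSERT INTO posts VALUES (?, ?)", (post["id"], post["title"])
        )
        conn.executemany(
            "INSERT INTO post_tags VALUES (?, ?)",
            [(post["id"], tag) for tag in post["tags"]],
        )
    return conn


def tag_cloud_from_tables(conn):
    """Relational store: one aggregate query over the narrow post_tags table."""
    rows = conn.execute(
        "SELECT tag, COUNT(*) FROM post_tags GROUP BY tag"
    ).fetchall()
    return dict(rows)


document_store = [json.dumps(post) for post in POSTS]
relational_store = build_relational_store(POSTS)

doc_cloud = tag_cloud_from_documents(document_store)
sql_cloud = tag_cloud_from_tables(relational_store)
```

Both functions produce the same counts, but the document version has to touch every post body to get them, while the relational version asks a new question of data whose layout was never tailored to any one question.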

General-purpose solutions provide much better performance over a wide range of applications and are much easier to adapt as the demands of the application evolve. That is why we keep returning to the general-purpose relational solution once our short infatuation with the “shiny new object” has passed.

We did it with object databases, we did it with OLAP cubes, and now history is repeating itself with NoSQL!