Blog

The parallels between NoSQL and self-inflicted torture

It never ceases to amaze me how much pain and contortion people who use NoSQL databases need to go through.

Consider (for example) this blog post on the MongoHQ blog.

http://blog.mongohq.com/improve-performance-by-removing-query-logic/

Here's the executive summary for you (in relational database speak).

Q: I have an entity that tracks documents and a document has an attribute called publication status and this includes such things as 'draft', 'waiting proof', 'approved', 'published' and so on. I would like to get a list of documents that are still unpublished and that includes documents that have statuses of 'draft', 'waiting proof' or 'approved' (but not the status 'published'). Write me a program that would get these.

A: Denormalize the schema and add a new column called "unpublished" and introduce logic into your application that will set the unpublished field (a boolean) to be TRUE when the document publication status is one of the various not published values and then write a query that would search the table with the restriction:

AND UNPUBLISHED IS TRUE

And dear reader, you do this because the "performance" is better.

Here's the SQL variant based on a simple one table structure.

CREATE TABLE DOCUMENTS ( ... STATUS CHAR(30), ... );

SELECT ... FROM DOCUMENTS WHERE STATUS NOT IN ( 'published' );

Or maybe you need to enumerate the statuses for some reason such as that there are various 'published' statues.

SELECT ... FROM DOCUMENTS WHERE STATUS IN ( 'draft', 'waiting proof', 'approved' );

And you could create an index on STATUS and the select would be really quick. On a small 3 node ParElastic configuration running on Amazon AWS with 71 million documents, with each document record being approximately 850 bytes, I ran this query and sent the results off to a file in under 0.5s.

So here's the accounting for those of you who are keeping score at home.

Description

NoSQL

SQL

1. Write the obvious NoSQL program to do this as described in the MongoDB blog post (the thing called "a slow query in Mon­goDB")

a while

0 min

2.  Find that the obvious NoSQL program is slow and attempt to debug it

a while

0 min

3. Find the MongoDB blog post on the subject

a while

0 min

4. Write the code to manage the new field

a while

0 min

5. Write the code to query the new field

quick

0 min

6. Debug the program when someone else changes it and the magic field is not set right (database people call this anomalies when they discuss normalization)

a while

0 min

7. Write a simple query against a database

 

30 sec

8. Relaxing on the beach knowing that the program will perform properly …

0 min

a while

Tell me again, why is this NoSQL way of thinking a good idea?

Comments

That hard part is the sort - as Chris' said, see my blog at ilearnasigoalong.blogspot.com for details.

See Ilearnasigoalong.blogspot.co.UK for an explanation of how this problem and solution are exactly the same in an RDBMS and this isn't a NoSQL issue.

Add new comment

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
Image CAPTCHA
Enter the characters shown in the image.