Recently, when I was at Percona Live, I sat through nine sessions on sharding from big and successful companies like Twitter and Pinterest. There is no doubt that sharding has been an integral part of their success so far.
In the previous blog post, I illustrated one of the issues you find with sharding: the application needs to include code to do things that the database should do.
Here is a common problem that people face with sharded databases, illustrated with a simple schema. Assume that a database contains sales data, one row per sale, and that salesmen are organized into regions: North, South, East and West. [...]
In the last blog post I described the differences between the terms “Parallel Database” and “Sharding”. In this post I’d like to illustrate some of the complexity that sharding introduces to the application.
The parallel database architecture has several benefits over a sharding approach. In a parallel database, [...]
Recently, as I have described the ParElastic architecture, I have been asked how Sharding is different from a Parallel Database. They are similar concepts and the block diagrams look similar, so the confusion is understandable.
It occurs to me that the best answer to the question is this:
A ‘parallel database’ is a database architecture; sharding is
an application architecture.
Put slightly differently, parallelism is a database architecture choice (another choice being Symmetric Multiprocessing or SMP). From the perspective of the database client, what you see is a single database. The fact that data is partitioned and that a collection of servers works collaboratively to process queries is an aspect of the inner workings of the parallel database. A query submitted to a parallel database targets all the data and the result stream is the “answer” to the query. [...]
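To make the contrast concrete, here is a minimal sketch of what a sharded application ends up doing for itself. Everything in it is an assumption made for illustration: the `sales` table, the sample rows, and the helper names are hypothetical, and in-memory SQLite databases stand in for the shards. With a parallel database, the client would submit one query and read back the answer; here the application sends the query to every shard and merges the partial results.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory SQLite database standing in for one shard (hypothetical setup)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


# Hypothetical data: sales rows are spread over two shards, regardless of region.
shards = [
    make_shard([("North", 100.0), ("South", 50.0)]),
    make_shard([("North", 300.0), ("East", 20.0)]),
]


def average_sale_by_region(shards):
    """Send the query to every shard, then merge the partial results in the application."""
    query = "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region"
    totals = {}
    for conn in shards:
        for region, total, count in conn.execute(query):
            prev_total, prev_count = totals.get(region, (0.0, 0))
            totals[region] = (prev_total + total, prev_count + count)
    # An average cannot be merged by averaging per-shard averages,
    # so each shard returns a sum and a count and the division happens here.
    return {region: total / count for region, (total, count) in totals.items()}


averages = average_sale_by_region(shards)
```

Note that the merge logic depends on the query: a sum can simply be added up across shards, but an average has to be rebuilt from sums and counts. That kind of knowledge is what moves into the application when the database is sharded.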
In January, I attended a presentation by Prof. Donald Kossmann (Professor at ETH Zurich) entitled “Predictable Performance for Unpredictable Workloads”. In this presentation, Prof. Kossmann described SwissBox, a highly unconventional “database appliance designed to process thousands of concurrent queries and updates with bounded query response times and strict data freshness guarantees”.
One of the examples presented was the dramatic change in traffic faced by Amadeus, a global computerized reservation system, after the eruption of Mt. Eyjafjallajökull in Iceland. [...]