Last week, I was at Percona Live in San Francisco. One of the things that struck me was the number of sessions devoted to “sharding” at this event and other similar events. By my count, there were 7 sessions with sharding in their name and others that discussed it as part of broader scalability topics. Database scalability is clearly a big problem. While many people are proud of their sharding implementations, let’s face it; Sharding Sucks!

If you are wondering what this horrible picture is, it’s an example of a simple sharding based solution.
Ideally, you want the application developer to be focusing on the big red box on the very top, the application. Instead sharding forces developers to expend an enormous amount of their effort in developing and maintaining complex and fragile sharding infrastructure (the gray area between the application and the database).
Even if you use one of the many “sharding-in-a-box” solutions out there, life is just a little bit better. It still sucks!
The application still needs to deal with the fact that the data is silo’ed. Queries still must be directed to the correct shard or shards. Results from multiple shards still must be cobbled together and returned to the application.
In some cases, the application has to take care of some duplicate elimination and sorting! If you happen to need to combine data from multiple shards for a logical join, application code needs to be written to handle it. This sucks!
If all the developers in your company have to attend all those sessions at Percona Live to understand sharding, don’t you want a simpler solution!
Applications should not have to deal with all this complexity. They should pose questions to the data tier in a simple, standard language and receive results that they can work on. Anything less than that is a hack!
Scale-out parallel databases like Netezza or Teradata solve these problems elegantly. They present the application with a simple database interface and transparently handle all the inner working and implications of horizontal scale out. This is why we architected ParElastic as a true parallel database built on unmodified rdbms servers and not as just yet another sharding library. Sharding Sucks!



Add new comment