Sharding by Hash Partitioning - A Database Scalability Pattern to Achieve Evenly Sharded Database Clusters

At the beginning of the 21st century, the requirements of web applications increased dramatically in scale. Applications such as social networks, e-commerce, and media sharing started to generate large volumes of data traffic, and companies started to track this valuable data. The database systems responsible for storing all this information had to scale in order to handle the load. With the emergence of cloud computing, scaling out a database system has become an affordable solution, making data sharding a viable scalability option. To benefit from data sharding, however, database designers have to identify the best way to distribute data among the nodes of a sharded cluster. This paper discusses database sharding distribution models, specifically a technique known as hash partitioning. Our objective is to catalog, in the format of a Database Scalability Pattern, the best practice of sharding data among the nodes of a database cluster using the hash partitioning technique, so that load is balanced evenly between the database servers. In this way, we intend to make the mapping between the scenario and its solution publicly available, helping developers identify when to adopt the pattern instead of other sharding techniques.
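As an illustration of the general idea, the following is a minimal Python sketch of hash partitioning, not the implementation described in the paper. It assumes string partitioning keys, a fixed number of shards, and MD5 used purely as a distribution function rather than for security; the names `shard_for` and `distribution` are hypothetical and chosen only for this example.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a partitioning key to a shard index in [0, num_shards).

    Assumption: MD5 is used only to spread keys evenly, not for security.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer,
    # then reduce it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(keys, num_shards: int) -> list:
    """Count how many of the given keys land on each shard."""
    counts = Counter(shard_for(k, num_shards) for k in keys)
    return [counts.get(i, 0) for i in range(num_shards)]


if __name__ == "__main__":
    # Hypothetical keys; sequential identifiers still spread across shards.
    sample_keys = [f"user-{i}" for i in range(10_000)]
    print(distribution(sample_keys, 4))
```

Because the shard index depends only on the key's hash, the same key is always routed to the same shard, and keys tend to spread evenly regardless of their natural ordering. A plain modulo scheme such as this one remaps most keys when the number of shards changes, which is one reason to weigh it against other sharding techniques.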
