The Basics of Database Sharding

Channel: BrentOzar   |   2013/05/01
Play Video
1
The Basics of Database Sharding
The Basics of Database Sharding
::2013/05/01::
Play Video
2
An MPP database for the cloud using the LAMP stack and Shard-Query
An MPP database for the cloud using the LAMP stack and Shard-Query
::2014/03/25::
Play Video
3
Building a scalable row, column or document-style store with MySQL and Shard-Query
Building a scalable row, column or document-style store with MySQL and Shard-Query
::2013/11/19::
Play Video
4
Surge 2010 ~ Why Some Architects Almost Never Shard Their Applications
Surge 2010 ~ Why Some Architects Almost Never Shard Their Applications
::2011/12/21::
Play Video
5
Introduction to Shard Query the MPP distributed query engine for MySQL  - PART 1
Introduction to Shard Query the MPP distributed query engine for MySQL - PART 1
::2013/10/14::
Play Video
6
15a  Building a Sharded Environment
15a Building a Sharded Environment
::2014/02/04::
Play Video
7
O
O'Reilly Webcast: How Sharding Works
::2011/02/05::
Play Video
8
Video 02   MySQL Architecture Part 1
Video 02 MySQL Architecture Part 1
::2014/01/19::
Play Video
9
Scaling Rails Applications: Scaling Your Database / Part 2 (Episode #17)
Scaling Rails Applications: Scaling Your Database / Part 2 (Episode #17)
::2013/02/25::
Play Video
10
Scaling Your Database in the Cloud
Scaling Your Database in the Cloud
::2012/06/20::
Play Video
11
Typical deployment diagram
Typical deployment diagram
::2013/01/02::
Play Video
12
Running an E-Commerce Database In The Cloud
Running an E-Commerce Database In The Cloud
::2013/11/20::
Play Video
13
Distributed Database - Replication, Query Processing and Concurrency Control
Distributed Database - Replication, Query Processing and Concurrency Control
::2013/10/14::
Play Video
14
Stock Trading and Analysis using Oracle NoSQL Database
Stock Trading and Analysis using Oracle NoSQL Database
::2013/07/15::
Play Video
15
Dayz - First Look & Dying From Starvation Like a Noob
Dayz - First Look & Dying From Starvation Like a Noob
::2013/12/26::
Play Video
16
Scaling Solr with SolrCloud , Rafal Kuć, Sematext
Scaling Solr with SolrCloud , Rafal Kuć, Sematext
::2013/11/28::
Play Video
17
Victoria Dudin: Scaling MySQL with Transparent Database Sharding
Victoria Dudin: Scaling MySQL with Transparent Database Sharding
::2012/03/19::
Play Video
18
FOSDEM 2009 MySQL High Availability Solutions
FOSDEM 2009 MySQL High Availability Solutions
::2009/03/25::
Play Video
19
Open Mic Night
Open Mic Night
::2012/09/24::
Play Video
20
Administering and Monitoring SolrCloud Clusters, Rafal Kuć, Consultant Software Engineer, Sematext
Administering and Monitoring SolrCloud Clusters, Rafal Kuć, Consultant Software Engineer, Sematext
::2013/12/13::
Play Video
21
Garantia Data - The in-memory NOSQL Cloud
Garantia Data - The in-memory NOSQL Cloud
::2012/07/24::
Play Video
22
How should scalability of MySQL in the cloud work? |  Marc Shrewood- - SkySQL  |  MySQLconf
How should scalability of MySQL in the cloud work? | Marc Shrewood- - SkySQL | MySQLconf
::2011/06/06::
Play Video
23
MySQL scalability without sharding! Joins 10-100x faster!
MySQL scalability without sharding! Joins 10-100x faster!
::2012/08/21::
Play Video
24
#bbuzz: Ryan Zezeski "Yokozuna, Scaling Solr with Riak"
#bbuzz: Ryan Zezeski "Yokozuna, Scaling Solr with Riak"
::2013/06/19::
Play Video
25
Brahe - Mass scale flexible indexing
Brahe - Mass scale flexible indexing
::2013/05/22::
Play Video
26
CUBRID Manager - Part I - CUBRID Trainings Channel
CUBRID Manager - Part I - CUBRID Trainings Channel
::2010/05/10::
Play Video
27
Lessons Learned: Refactoring a Solr-Based API App, Torsten Koester, smatch.com/ Shopping24
Lessons Learned: Refactoring a Solr-Based API App, Torsten Koester, smatch.com/ Shopping24
::2013/11/11::
Play Video
28
O
O'Reilly Webcast: How Pinterest Architected and Built Their Sharded MySQL Datastore
::2012/07/08::
Play Video
29
Monitoring and Managing a Complex Data Cloud
Monitoring and Managing a Complex Data Cloud
::2013/02/01::
Play Video
30
MongoDB Episode 1 - Sharding - Part 3
MongoDB Episode 1 - Sharding - Part 3
::2012/02/09::
Play Video
31
Meetup: Building and scaling web applications
Meetup: Building and scaling web applications
::2013/11/12::
Play Video
32
Ashutosh Bapat: Scaling out by distributing and replicating data in Postgres XC
Ashutosh Bapat: Scaling out by distributing and replicating data in Postgres XC
::2013/08/25::
Play Video
33
MySQL 5.7 - What sh!t we done broke & what we want you to do instead
MySQL 5.7 - What sh!t we done broke & what we want you to do instead
::2014/01/23::
Play Video
34
Big Data Scalability - Under 20 Seconds: Why Sharding Works
Big Data Scalability - Under 20 Seconds: Why Sharding Works
::2012/09/28::
Play Video
35
Successful and Cost Effective Data Warehouse... The MySQL Way
Successful and Cost Effective Data Warehouse... The MySQL Way
::2010/04/16::
Play Video
36
Luka Lesson - The Future Ancients
Luka Lesson - The Future Ancients
::2013/11/01::
Play Video
37
SolrCloud: Example A
SolrCloud: Example A
::2011/07/05::
Play Video
38
How To Make A Web Based Proxy (Glype v1.4.4) Easy Installation
How To Make A Web Based Proxy (Glype v1.4.4) Easy Installation
::2013/03/25::
Play Video
39
Building a Near Real-time Search Engine and Analytics for logs using Solr
Building a Near Real-time Search Engine and Analytics for logs using Solr
::2013/05/21::
Play Video
40
Jon Hoffman - MongoDB Days 2013 - theCUBE
Jon Hoffman - MongoDB Days 2013 - theCUBE
::2013/06/23::
Play Video
41
WEBINAR: The Easy Way to Manage your Sharded MongoDB Setup on Any Cloud
WEBINAR: The Easy Way to Manage your Sharded MongoDB Setup on Any Cloud
::2012/04/29::
Play Video
42
Spider:  Sharding for the Masses
Spider: Sharding for the Masses
::2009/08/26::
Play Video
43
Surge 2010 ~ Panel Discussion: "SQL vs NoSQL"
Surge 2010 ~ Panel Discussion: "SQL vs NoSQL"
::2011/12/21::
Play Video
44
Surge 2010 ~ Availability, the Cloud and Everything
Surge 2010 ~ Availability, the Cloud and Everything
::2011/12/21::
Play Video
45
Surge 2010 ~ From disaster to stability: scaling challenges of my.opera.com
Surge 2010 ~ From disaster to stability: scaling challenges of my.opera.com
::2011/12/21::
Play Video
46
Store Hierarchial Data
Store Hierarchial Data
::2013/09/11::
Play Video
47
Surge 2010 ~ The most common MySQL scalability mistakes, and how to avoid them.
Surge 2010 ~ The most common MySQL scalability mistakes, and how to avoid them.
::2011/12/21::
Play Video
48
Tekkit Lite Ep 3: I
Tekkit Lite Ep 3: I'm sharding
::2013/07/17::
Play Video
49
Alex Alexander  MySQL MHA vs Continuent Tungsten Comparison Use Full Screen Mode Please
Alex Alexander MySQL MHA vs Continuent Tungsten Comparison Use Full Screen Mode Please
::2012/05/03::
Play Video
50
Elasticsearch Einführung - Teil 5 - Replication & Sharding
Elasticsearch Einführung - Teil 5 - Replication & Sharding
::2013/04/24::
NEXT >>
RESULTS [51 .. 101]
From Wikipedia, the free encyclopedia
Jump to: navigation, search

A database shard is a horizontal partition in a database or search engine. Each individual partition is referred to as a shard or database shard.

Database architecture[edit]

Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than being split into columns (which is what normalization and vertical partitioning do, to differing extents). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.

There are numerous advantages to this partitioning approach. Since the tables are divided and distributed into multiple servers, the total number of rows in each table in each database is reduced. This reduces index size, which generally improves search performance. A database shard can be placed on separate hardware, and multiple shards can be placed on multiple machines. This enables a distribution of the database over a large number of machines, which means that the database performance can be spread out over multiple machines, greatly improving performance. In addition, if the database shard is based on some real-world segmentation of the data (e.g., European customers v. American customers) then it may be possible to infer the appropriate shard membership easily and automatically, and query only the relevant shard.[1]

In practice, sharding is far more complex. Although it has been done for a long time by hand-coding (especially where rows have an obvious grouping, as per the example above), this is often inflexible. There is a desire to support sharding automatically, both in terms of adding code support for it, and for identifying candidates to be sharded separately. Consistent hashing is one form of automatic sharding to spread large loads across multiple smaller services and servers.[2][3]

Where distributed computing is used to separate load between multiple servers (either for performance or reliability reasons), a shard approach may also be useful.

Shards compared to horizontal partitioning[edit]

Horizontal partitioning splits one or more tables by row, usually within a single instance of a schema and a database server. It may offer an advantage by reducing index size (and thus search effort) provided that there is some obvious, robust, implicit way to identify in which table a particular row will be found, without first needing to search the index, e.g., the classic example of the 'CustomersEast' and 'CustomersWest' tables, where their zip code already indicates where they will be found.

Sharding goes beyond this: it partitions the problematic table(s) in the same way, but it does this across potentially multiple instances of the schema. The obvious advantage would be that search load for the large partitioned table can now be split across multiple servers (logical or physical), not just multiple indexes on the same logical server.

Splitting shards across multiple isolated instances requires more than simple horizontal partitioning. The hoped-for gains in efficiency would be lost, if querying the database required both instances to be queried, just to retrieve a simple dimension table. Beyond partitioning, sharding thus splits large partitionable tables across the servers, while smaller tables are replicated as complete units.

This is also why sharding is related to a shared nothing architecture—once sharded, each shard can live in a totally separate logical schema instance / physical database server / data center / continent. There is no ongoing need to retain shared access (from between shards) to the other unpartitioned tables in other shards.

This makes replication across multiple servers easy (simple horizontal partitioning does not). It is also useful for worldwide distribution of applications, where communications links between data centers would otherwise be a bottleneck.

There is also a requirement for some notification and replication mechanism between schema instances, so that the unpartitioned tables remain as closely synchronized as the application demands. This is a complex choice in the architecture of sharded systems: approaches range from making these effectively read-only (updates are rare and batched), to dynamically replicated tables (at the cost of reducing some of the distribution benefits of sharding) and many options in between.

Support for shards[edit]

CUBRID
CUBRID supports sharding from version 9.0
dbShards
CodeFutures dbShards is a product dedicated to database shards.[4]
eXtreme Scale
eXtreme Scale is a cross-process in-memory key/value datastore (a variety of NoSQL datastore). It uses sharding to achieve scalability across processes for both data and MapReduce-style parallel processing.[5]
Hibernate ORM
Hibernate Shards provides support for shards, although there has been little activity since 2007.[6][7]
IBM Informix
IBM supports sharding in Informix since version 12.1 xC1 as part of the MACH11 technology. Informix 12.10 xC2 added full compatibility with MongoDB drivers, allowing the mix of regular relational tables with NoSQL collections, still supporting sharding, failover and ACID properties.[8][9]
MongoDB
MongoDB supports sharding from version 1.6
MySQL Cluster
Auto-Sharding: Database is automatically and transparently partitioned across low cost commodity nodes, allowing scale-out of read and write queries, without requiring changes to the application.[10]
Plugin for Grails
Grails supports sharding using the Grails Sharding Plugin.[11]
Ruby ActiveRecord
Octopus works as a database sharding and replication extension for the ActiveRecord ORM.
ScaleBase's Data Traffic Manager
ScaleBase's Data Traffic Manager is a software product dedicated to automating MySQL database sharding without requiring changes to applications.[12]
Solr Search Server
Solr enterprise search server provides sharding capabilities.[13]
Spanner
Spanner is Google's global scale distributed database that shards data across multiple Paxos state machines to scale to "millions of machines across hundreds of datacenters and trillions of database rows".[14]
SQLAlchemy ORM
SQLAlchemy is an object-relational mapper for the Python programming language that provides sharding capabilities.[15]
SQL Azure
Microsoft supports sharding in SQL Azure through "federations".[16][17]

Disadvantages of sharding[edit]

Sharding a database table before it has been optimized locally causes premature complexity. Sharding should be used only when all other options for optimization are inadequate. The introduced complexity of database sharding causes the following potential problems:

  • Increased complexity of SQL - Increased bugs because the developers have to write more complicated SQL to handle sharding logic.
  • Sharding introduces complexity - The sharding software that partitions, balances, coordinates, and ensures integrity can fail.
  • Single point of failure - Corruption of one shard due to network/hardware/systems problems causes failure of the entire table.
  • Failover servers more complex - Failover servers must themselves have copies of the fleets of database shards.
  • Backups more complex - Database backups of the individual shards must be coordinated with the backups of the other shards.
  • Operational complexity added - Adding/removing indexes, adding/deleting columns, modifying the schema becomes much more difficult.

These historical complications of do-it-yourself sharding are now being addressed by independent software vendors who provide autosharding solutions.

See also[edit]

References[edit]

  1. ^ Rahul Roy (July 28, 2008). "Shard - A Database Design". 
  2. ^ Facebook. "Consistent Hashing". 
  3. ^ Ries, Eric. "Sharding for Startups". 
  4. ^ "dbShards product overview". 
  5. ^ http://publib.boulder.ibm.com/infocenter/wxsinfo/v7r1/index.jsp?topic=%2Fcom.ibm.websphere.extremescale.over.doc%2Fcxsovwork.html
  6. ^ "Hibernate Shards". 2/ 8/2007. 
  7. ^ "Hibernate Shards". 
  8. ^ "New Grid queries for Informix". 
  9. ^ "NoSQL support in Informix". 
  10. ^ "MySQL Cluster Features & Benefits". 2012-11-23. 
  11. ^ "Grails Sharding Plugin". 
  12. ^ "ScaleBase's Data Traffic Manager product architecture overview". 
  13. ^ "Distributed Search". 
  14. ^ Corbett, James C; Dean, Jeffrey; Epstein, Michael; Fikes, Andrew; Frost, Christopher; Furman, JJ; Ghemawat, Sanjay; Gubarev, Andrey; Heiser, Christopher; Hochschild, Peter; Hsieh, Wilson; Kanthak, Sebastian; Kogan, Eugene; Li, Hongyi; Lloyd, Alexander; Melnik, Sergey; Mwaura, David; Nagle, David; Quinlan, Sean; Rao, Rajesh; Rolig, Lindsay; Saito, Yasushi; Szymaniak, Michal; Taylor, Christopher; Wang, Ruth; Woodford, Dale. "Spanner: Google’s Globally-Distributed Database". Proceedings of OSDI 2012. Google. Retrieved 24 February 2014. 
  15. ^ "Basic example of using the SQLAlchemy Sharding API.". 
  16. ^ "Federations in SQL Azure: Database Solutions with Unlimited Scalability". 
  17. ^ "Federations in SQL Azure". 

Informix JSON data sharding

Wikipedia content is licensed under the GFDL License

Mashpedia enables any individual or company to promote their own Youtube-hosted videos or Youtube Channels, offering a simple and effective plan to get them in front of our engaged audience.

Want to learn more? Please contact us at: hello@mashpedia.com

Powered by YouTube
LEGAL
  • Mashpedia © 2014