Key-Range Partitions

Partition data in sorted key ranges to efficiently handle range queries.

Problem

To split data across a set of cluster nodes, each data item needs to be mapped to a node. If users want to query a range of keys by specifying only the start and end keys, and the mapping does not preserve key order, every partition must be queried to collect the matching values. Querying every partition for a single request is far from optimal.

Take a key-value store as an example. We can store the author names using hash-based mapping, as in Fixed Partitions.

Keys    Hash                                     Partition †  Node
alice   133299819613694460644197938031451912208  0            0
bob     63479738429015246738359000453022047291   1            1
mary    37724856304035789372490171084843241126   5            1
philip  83980963731216160506671196398339418866   2            2

† Partition = Hash % Number of partitions (9)

If a user wants to get values for a range of names, say those beginning with the letters "a" through "f", there is no way to know which partitions to fetch data from when the hash of the key is used to map keys to partitions. All partitions need to be queried to get the required values.
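The effect can be sketched in a few lines of Python. The hash values are copied from the table above; the text does not say which hash function produced them, so none is assumed here. The function names are hypothetical and only illustrate the point.

```python
# Hash values copied from the table; the hash function that produced them
# is not specified in the text, so the values are used as given.
HASHES = {
    "alice": 133299819613694460644197938031451912208,
    "bob": 63479738429015246738359000453022047291,
    "mary": 37724856304035789372490171084843241126,
    "philip": 83980963731216160506671196398339418866,
}
NUM_PARTITIONS = 9


def partition_for(key):
    """Partition = Hash % Number of partitions."""
    return HASHES[key] % NUM_PARTITIONS


def partitions_for_range(start, end):
    """Partitions that might hold keys between start and end.

    Hash order is unrelated to key order, so the range bounds say nothing
    about placement: every partition is a candidate.
    """
    return list(range(NUM_PARTITIONS))


if __name__ == "__main__":
    for name in sorted(HASHES):
        print(name, partition_for(name))
    print(partitions_for_range("a", "f"))
```

Keys that sit next to each other in sorted order, such as "mary" and "philip", land in unrelated partitions (5 and 2), which is why the range bounds cannot narrow the search.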

Solution

Create logical partitions over sorted ranges of keys. The partitions can then be mapped to cluster nodes. To query a range of data, the client finds the partitions whose key ranges overlap the requested range and queries only those partitions to get the required values.
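A minimal sketch of this lookup follows, written in Python. The split points, node names, and function names are all assumptions made for illustration; they do not come from the pattern's own implementation. The query range is treated as half-open: start inclusive, end exclusive.

```python
from bisect import bisect_left, bisect_right

# Hypothetical split points. Partition 0 holds keys below "g", partition 1
# holds "g" <= key < "n", partition 2 holds "n" <= key < "t", and
# partition 3 holds keys from "t" upward.
SPLIT_POINTS = ["g", "n", "t"]

# Hypothetical mapping of logical partitions to cluster nodes.
PARTITION_TO_NODE = {0: "node-a", 1: "node-b", 2: "node-c", 3: "node-a"}


def partition_for(key):
    """Index of the partition whose key range contains key."""
    return bisect_right(SPLIT_POINTS, key)


def partitions_for_range(start, end):
    """Partitions overlapping the half-open key range [start, end)."""
    first = bisect_right(SPLIT_POINTS, start)
    # bisect_left keeps an end that equals a split point from pulling in
    # the partition that begins at that split point.
    last = bisect_left(SPLIT_POINTS, end)
    return list(range(first, last + 1))


def nodes_for_range(start, end):
    """Nodes the client has to contact for the range [start, end)."""
    return sorted({PARTITION_TO_NODE[p] for p in partitions_for_range(start, end)})


if __name__ == "__main__":
    print(partitions_for_range("a", "g"))  # names starting with "a" to "f" -> [0]
    print(nodes_for_range("a", "g"))       # -> ['node-a']
    print(partitions_for_range("m", "u"))  # -> [1, 2, 3]
```

Because the split points are kept sorted, a binary search over them is enough to find the first and last partition for a range, and only the nodes hosting those partitions are contacted.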

For more details, see Chapter 20 of the online ebook at oreilly.com.

This pattern is part of Patterns of Distributed Systems

23 November 2023