Understanding Custom Partitioning and Data Colocation
Custom partitioning and data colocation can be used separately or in conjunction with one another.
Use custom partitioning to group like entries into region buckets within a region. By default, Geode assigns new entries to buckets based on the entry key’s hash code. With custom partitioning, you can assign your entries to buckets in whatever way you want.
You can generally get better performance if you use custom partitioning to group similar data within a region. For example, a query run on all accounts created in January runs faster if all January account data is hosted by a single member. Grouping all data for a single customer can improve performance of data operations that work on customer data. Data aware function execution also takes advantage of custom partitioning.
With custom partitioning, you have two choices:
- Standard custom partitioning. With standard custom partitioning, you group entries into buckets, but you do not specify where the buckets reside. Geode always keeps the entries in the buckets you have specified, but may move the buckets around for load balancing. See Standard Custom Partitioning for implementation and configuration details.
Fixed custom partitioning. With fixed custom partitioning, you specify the exact member where each region entry resides. You assign an entry to a partition and then to a bucket within that partition. You name specific members as primary and secondary hosts of each partition.
This gives you complete control over the locations of your primary and any secondary buckets for the region. This can be useful when you want to store specific data on specific physical machines or when you need to keep data close to certain hardware elements.
Fixed partitioning has these requirements and caveats:
- Geode cannot rebalance fixed partition region data, because it cannot move the buckets around among the host members. You must carefully consider your expected data loads for the partitions you create.
- With fixed partitioning, the region configuration is different between host members. Each member identifies the named partitions it hosts, and whether it is hosting the primary copy or a secondary copy. You then program a fixed-partition resolver to return the partition id, so the entry is placed on the right members. Only one member can be primary for a particular partition name, and that member cannot be the partition’s secondary.
See Fixed Custom Partitioning for implementation and configuration details.
With data colocation, Geode stores entries that are related across multiple data regions in a single member. Geode does this by storing all of the regions’ buckets with the same ID together in the same member. During rebalancing operations, Geode moves these bucket groups together or not at all.
So, for example, if you have one region with customer contact information and another region with customer orders, you can use colocation to keep all contact information and all orders for a single customer in a single member. This way, any operation done for a single customer uses the cache of only a single member.
This figure shows two regions with data colocation where the data is partitioned by customer type.
Data colocation requires the same data partitioning mechanism for all of the colocated regions. You can use the default partitioning provided by Geode or any of the custom partitioning strategies.
You must use the same high availability settings across your colocated regions.
See Colocate Data from Different Partitioned Regions for implementation and configuration details.