
How Replication and Preloading Work

To work with replicated and preloaded regions, you should understand how their data is initialized and maintained in the cache.

Replicated and preloaded regions are configured by using one of the REPLICATE region shortcut settings, or by setting the region attribute data-policy to replicate, persistent-replicate, or preloaded.

Initialization of Replicated and Preloaded Regions

At region creation, the system initializes the preloaded or replicated region with the most complete and up-to-date data set it can find. The system uses these data sources to initialize the new region, following this order of preference:

  1. Another replicated region that is already defined in the cluster.
  2. For persistent replicate only: disk files, followed by a union of all copies of the region in the distributed cache.
  3. For preloaded regions only: another preloaded region that is already defined in the cluster.
  4. The union of all copies of the region in the distributed cache.
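The order of preference above can be sketched as a small decision function. This is an illustrative model only, not Geode code: the class InitSourceChooser, its enums, and the two boolean inputs are hypothetical names introduced here, and the sketch assumes a persistent replicate region always has disk files to read.

```java
// Illustrative model of the documented order of preference.
// None of these names are Geode classes; they are hypothetical.
final class InitSourceChooser {
    enum Kind { REPLICATE, PERSISTENT_REPLICATE, PRELOADED }

    enum Source {
        REPLICATED_PEER,        // 1. another replicated region in the cluster
        DISK_FILES_THEN_UNION,  // 2. persistent replicate only
        PRELOADED_PEER,         // 3. preloaded only
        UNION_OF_ALL_COPIES     // 4. fallback for every kind
    }

    // Walks the four documented options in order and returns the first that applies.
    static Source choose(Kind kind, boolean replicatedPeerExists, boolean preloadedPeerExists) {
        if (replicatedPeerExists) {
            return Source.REPLICATED_PEER;
        }
        if (kind == Kind.PERSISTENT_REPLICATE) {
            // Assumption: disk files are present for a persistent region.
            return Source.DISK_FILES_THEN_UNION;
        }
        if (kind == Kind.PRELOADED && preloadedPeerExists) {
            return Source.PRELOADED_PEER;
        }
        return Source.UNION_OF_ALL_COPIES;
    }

    public static void main(String[] args) {
        // A preloaded region with no replicated peer but with a preloaded peer.
        System.out.println(choose(Kind.PRELOADED, false, true));
    }
}
```

In this model a replicated peer always wins, regardless of the kind of region being created, which mirrors the first item in the list.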

If the source region crashes while a new region is being initialized from a replicated or preloaded region, the initialization starts over.

If a union of regions is used for initialization, as in the figure, and one of the individual source regions goes away during the initialization (due to cache closure, member crash, or region destruction), the new region may contain a partial data set from the lost source region. When this happens, no warning is logged and no exception is thrown. The new region still contains the complete data sets from the remaining members’ regions.
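The partial-data outcome can be illustrated with a toy model. This is a sketch under stated assumptions, not how Geode transfers data: the class UnionInitModel, the SourceCopy type, and the readableBeforeLoss counter are hypothetical, and the loss of a source is simulated by simply stopping the copy after a fixed number of entries.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of initialization from a union of region copies.
// Hypothetical names throughout; this is not Geode's transfer logic.
final class UnionInitModel {
    // One member's copy of the region. readableBeforeLoss models how many
    // entries can be copied before the source goes away.
    static final class SourceCopy {
        final Map<String, String> entries;
        final int readableBeforeLoss;

        SourceCopy(Map<String, String> entries, int readableBeforeLoss) {
            this.entries = entries;
            this.readableBeforeLoss = readableBeforeLoss;
        }
    }

    // Builds the new region as the union of whatever could be read from each source.
    static Map<String, String> initializeFromUnion(List<SourceCopy> sources) {
        Map<String, String> result = new LinkedHashMap<>();
        for (SourceCopy source : sources) {
            int read = 0;
            for (Map.Entry<String, String> e : source.entries.entrySet()) {
                if (read >= source.readableBeforeLoss) {
                    break; // source is gone: stop silently, no warning or exception
                }
                result.put(e.getKey(), e.getValue());
                read++;
            }
        }
        return result;
    }
}
```

With one source lost after its first entry and a second source fully readable, the result holds one entry from the first source and every entry from the second, and nothing signals that data is missing.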

Behavior of Replicated and Preloaded Regions After Initialization

Once initialized, the preloaded region operates like a region with the normal data-policy, receiving distributions only for entries that are already defined in its local cache.

If the region is configured as a replicated region, it receives all new creations in the distributed region from the other members. This is the push distribution model. Unlike the preloaded region, the replicated region has a contract that states it will hold all entries that are present anywhere in the distributed region.
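The difference between the two behaviors after initialization can be sketched with a small model. The class LocalRegionModel and its methods are hypothetical and greatly simplified; they only illustrate which incoming distributions each kind of region applies, not how Geode delivers them.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of one member's local view of a distributed region.
// Hypothetical names; not part of the Geode API.
final class LocalRegionModel {
    private final boolean replicated;
    private final Map<String, String> entries = new HashMap<>();

    // initialData stands in for whatever the region received during initialization.
    LocalRegionModel(boolean replicated, Map<String, String> initialData) {
        this.replicated = replicated;
        this.entries.putAll(initialData);
    }

    // Handles an entry distributed by another member.
    // Returns true if the local region applied it.
    boolean onRemoteEvent(String key, String value) {
        // Replicated: accept everything, including keys never seen locally.
        // Preloaded (normal after init): accept only keys already held locally.
        if (replicated || entries.containsKey(key)) {
            entries.put(key, value);
            return true;
        }
        return false;
    }

    // Returns a copy of the current local contents.
    Map<String, String> snapshot() {
        return new HashMap<>(entries);
    }
}
```

If both models start with a single entry and another member then creates a second key, the replicated model ends with two entries, while the preloaded model still holds only the first.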