Apache Geode CHANGELOG

Restoring Redundancy in Partitioned Regions

Restoring redundancy is a member operation. It affects all partitioned regions defined by the member, regardless of whether the member hosts data for the regions.

Restoring redundancy creates new redundant copies of buckets on members hosting the region and by default reassigns which members host the primary buckets to give better load balancing. It does not move buckets from one member to another. The reassignment of primary hosts can be prevented using the appropriate flags, as described below. See Configure High Availability for a Partitioned Region for further detail on redundancy.
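The behavior described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: the class `RestoreRedundancySketch`, its `Bucket` type, and the greedy "least-loaded member" placement are all hypothetical and are not Geode's actual placement logic. It shows just the two properties stated above: copies are added but never moved, and primaries are optionally reassigned to spread load.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT Geode code and NOT Geode's placement algorithm.
public class RestoreRedundancySketch {

    /** One bucket: the members holding a copy, and which of them is primary. */
    public static final class Bucket {
        public final Set<String> hosts = new LinkedHashSet<>();
        public String primary;

        public Bucket(String primary, String... otherHosts) {
            this.primary = primary;
            hosts.add(primary);
            hosts.addAll(Arrays.asList(otherHosts));
        }
    }

    public static void restore(List<Bucket> buckets, List<String> members,
                               int redundantCopies, boolean reassignPrimaries) {
        // Count how many bucket copies each member currently holds.
        Map<String, Integer> load = new LinkedHashMap<>();
        for (String m : members) load.put(m, 0);
        for (Bucket b : buckets)
            for (String h : b.hosts) load.merge(h, 1, Integer::sum);

        // Step 1: create missing copies. Existing copies are never removed or moved.
        for (Bucket b : buckets) {
            while (b.hosts.size() < redundantCopies + 1) {
                String target = null;
                for (String m : members) {
                    if (b.hosts.contains(m)) continue;
                    if (target == null || load.get(m) < load.get(target)) target = m;
                }
                if (target == null) break; // not enough members to satisfy redundancy
                b.hosts.add(target);
                load.merge(target, 1, Integer::sum);
            }
        }

        if (!reassignPrimaries) return; // primaries stay where they were

        // Step 2: greedily give each bucket's primary to its host with the fewest primaries.
        Map<String, Integer> primaries = new HashMap<>();
        for (Bucket b : buckets) {
            String best = null;
            for (String h : b.hosts) {
                if (best == null
                        || primaries.getOrDefault(h, 0) < primaries.getOrDefault(best, 0)) {
                    best = h;
                }
            }
            b.primary = best;
            primaries.merge(best, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        // Four buckets whose only copy is on member A; B and C have just joined.
        List<Bucket> buckets = new ArrayList<>();
        for (int i = 0; i < 4; i++) buckets.add(new Bucket("A"));

        restore(buckets, Arrays.asList("A", "B", "C"), 1, true);

        for (int i = 0; i < buckets.size(); i++) {
            Bucket b = buckets.get(i);
            System.out.println("bucket " + i + " hosts=" + b.hosts + " primary=" + b.primary);
        }
    }
}
```

In this model every bucket keeps its original copy on A and gains one new copy. Passing false for the last argument leaves every primary on A, which mirrors the option to prevent primary reassignment.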

For efficiency, when starting multiple members, trigger the restore redundancy operation a single time, after you have added all members.

Initiate a restore redundancy operation using one of the following:

  • gfsh command. First, start a gfsh prompt and connect to the cluster. Then type the following command:

    gfsh>restore redundancy

    Optionally, you can specify regions to include in or exclude from the operation (--include-region, --exclude-region), and prevent the operation from reassigning which members host primary copies (--reassign-primaries=false). Type help restore redundancy at the gfsh prompt, or see the restore redundancy command reference, for more information.

  • API call:

    ResourceManager manager = cache.getResourceManager();
    // start() launches the operation asynchronously and returns a future for its results
    CompletableFuture<RestoreRedundancyResults> future = manager.createRestoreRedundancyOperation().start();
    // Block until the operation completes, then get the results
    // (get() can throw InterruptedException or ExecutionException)
    RestoreRedundancyResults results = future.get();
    // These are some of the details we can get about the run from the API
    System.out.println("Restore redundancy operation status is " + results.getStatus());
    System.out.println("Results for each included region: " + results.getMessage());
    System.out.println("Number of regions with no redundant copies: " + results.getZeroRedundancyRegionResults().size());
    // regionName is the name of one of the regions included in the operation
    System.out.println("Results for region " + regionName + ": " + results.getRegionResult(regionName).getMessage());
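The API call above blocks on future.get() with no time limit. If that is undesirable, one possible approach is to wait with a timeout instead. The following is a minimal sketch using only the standard java.util.concurrent classes. The helper `await` and the class `AwaitWithTimeout` are hypothetical names, and a plain `CompletableFuture<String>` stands in for the future returned by the restore redundancy operation.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AwaitWithTimeout {

    /**
     * Waits up to the given timeout for the future to complete.
     * Returns an empty Optional if the wait times out. The timeout only
     * ends the wait; it does not cancel the future.
     */
    public static <T> Optional<T> await(CompletableFuture<T> future, long timeout, TimeUnit unit)
            throws ExecutionException, InterruptedException {
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a future that has already finished.
        CompletableFuture<String> done = CompletableFuture.completedFuture("SUCCESS");
        System.out.println(await(done, 30, TimeUnit.SECONDS).orElse("still running"));

        // Stand-in for a future that has not finished within the timeout.
        CompletableFuture<String> pending = new CompletableFuture<>();
        System.out.println(await(pending, 100, TimeUnit.MILLISECONDS).orElse("still running"));
    }
}
```

Because the helper is generic, the same call shape would apply to a `CompletableFuture<RestoreRedundancyResults>`, with the caller deciding whether to keep waiting or report that the operation is still in progress.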

If you have startup-recovery-delay=-1 configured for your partitioned region, you will need to perform a restore redundancy operation on your region after you restart any members in your cluster in order to recover any lost redundancy.

If you have startup-recovery-delay set to a low number, you may need to wait extra time until the region has recovered redundancy.