Creating Backups for System Recovery and Operational Management
A backup is a copy of persisted data from a disk store. A backup is used to restore the disk store to the state it was in when the backup was made. The appropriate backup and restore procedures depend on whether the distributed system is online or offline. An online system has currently running members. An offline system does not have any running members.
- Making a Backup While the System Is Online
- What a Full Online Backup Saves
- What an Incremental Online Backup Saves
- Disk Store Backup Directory Structure and Contents
- Offline Members—Manual Catch-Up to an Online Backup
- Restore Using a Backup Made While the System Was Online
Making a Backup While the System Is Online
The gfsh command backup disk-store creates a backup of the disk stores for all members running in the distributed system. The backup works by passing commands to the running system members; therefore, the members need to be online for this operation to succeed. Each member with persistent data creates a backup of its own configuration and disk stores. The backup does not block any activities within the distributed system, but it does use resources.
Note: Do not try to create backup files from a running system by using your operating system’s file copy commands. This would create incomplete and unusable copies.
Preparing to Make a Backup
- Consider compacting your disk store before making a backup. If auto-compaction is turned off, you may want to do a manual compaction to save on the quantity of data copied over the network by the backup. For more information on configuring a manual compaction, see Manual Compaction.
- Take the backup when region operations are quiescent, to avoid the possibility of an inconsistency between region data and an asynchronous event queue (AEQ) or a WAN Gateway sender (which uses a persistent queue). A region operation that causes a persisted write to a region involves a disk operation. The associated queue operation also causes a disk operation. These two disk operations are not made atomically, so if a backup is made between the two disk operations, then the backup represents inconsistent data in the region and the queue.
- Run the backup during a period of low activity in your system. The backup does not block system activities, but it uses file system resources on all hosts in your distributed system, and it can affect performance.
Configure each member with any additional files or directories to be backed up by modifying the member’s cache.xml file. Additional items that ought to be included in the backup:
- application jar files
- other files that the application needs when starting, such as a file that sets the classpath
For example, to include the file myExtraBackupStuff in the backup, add a <backup> element that names it to the member’s cache.xml file.
Directories are copied recursively, and any disk stores found in them are excluded from this user-specified backup.
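As a quick check of what a member will add to the backup, the <backup> entries can be pulled out of its cache.xml file. The sketch below is a hypothetical helper, not part of gfsh. It assumes each entry is written on a single line in the form <backup>path</backup>, which is an assumption about the file layout rather than something shown in this section.

```shell
# Hypothetical helper: print the path inside each <backup> element of a
# cache.xml file. Assumes one element per line, written as
# <backup>path</backup>; other layouts are not handled.
list_backup_entries() {
  sed -n 's:.*<backup>\(.*\)</backup>.*:\1:p' "$1"
}
```

Running list_backup_entries against a member's cache.xml prints one path per line, which can then be compared with the files the application needs at startup.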
Back up to a SAN (recommended) or to a directory that all members can access. Make sure the directory exists and has the proper permissions for all members to write to the directory and create subdirectories.
The directory specified for the backup can be used multiple times. Each time a backup is made, a new subdirectory is created within the specified directory, and that new subdirectory’s name represents the date and time.
You can use one of two locations for the backup:
- a single physical location, such as a network file server, for example: /export/fileServerDirectory/gemfireBackupLocation
- a directory that is local to all host machines in the system
Make sure all members with persistent data are running in the system, because offline members cannot back up their disk stores. Output from the backup command will not identify members hosting replicated regions that are offline.
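The directory checks described above can be scripted before the backup is started. The following is a minimal sketch. check_backup_dir is a hypothetical helper, and it only tests the host and user account it runs under, so it would need to be run on each host (or against the shared mount) as the account that runs the members.

```shell
# Hypothetical helper: confirm that a backup target directory exists and
# that the current user can create a subdirectory in it.
check_backup_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  # Create and remove a scratch subdirectory to prove write access.
  probe="$dir/.backup-probe-$$"
  if mkdir "$probe" 2>/dev/null; then
    rmdir "$probe"
    echo "ok: $dir"
    return 0
  fi
  echo "not writable: $dir"
  return 1
}
```

For example, check_backup_dir /export/fileServerDirectory/gemfireBackupLocation reports whether that location is ready to receive a new timestamped subdirectory.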
How to Do a Full Online Backup
If auto-compaction is disabled, and manual compaction is needed:
gfsh>compact disk-store --name=Disk1
Run the gfsh backup disk-store command, specifying the backup directory location. For example:
gfsh>backup disk-store --dir=/export/fileServerDirectory/gemfireBackupLocation
The output will list information for each member that has successfully backed up disk stores. The tabular information will contain the member’s name, its UUID, the directory backed up, and the host name of the member.
Any online member that fails to complete its backup will leave a file named INCOMPLETE_BACKUP in its highest level backup directory. The existence of this file identifies that the backup file contains only a partial backup, and it cannot be used in a restore operation.
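Checking for the marker file can be automated after each backup run. The sketch below is a hypothetical helper that searches a backup location for files named INCOMPLETE_BACKUP. It relies only on the file name given above and makes no other assumption about the directory layout.

```shell
# Hypothetical helper: print the path of every INCOMPLETE_BACKUP marker
# found under a backup location. No output means no marker was found.
find_incomplete_backups() {
  find "$1" -type f -name INCOMPLETE_BACKUP | sort
}
```

If find_incomplete_backups prints any path, the member backup that contains it is partial and should not be used for a restore.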
Validate the backup for later recovery use. On the command line, each backup can be checked with commands such as:
cd 2010-04-10-11-35/straw_14871_53406_34322/diskstores/ds1
gfsh validate offline-disk-store --name=ds1 --disk-dirs=/home/dsmith/dir1
How to Do an Incremental Backup
An incremental backup contains items that have changed since a previous backup was made.
To do an incremental backup, specify the backup directory that the incremental backup will be based upon with the --baseline-dir argument. For example:
gfsh>backup disk-store --dir=/export/fileServerDirectory/gemfireBackupLocation --baseline-dir=/export/fileServerDirectory/gemfireBackupLocation/2012-10-01-12-30
The output will appear the same as the output for a full online backup.
Any online member that fails to complete its incremental backup will leave a file named INCOMPLETE_BACKUP in its highest level backup directory. The existence of this file identifies that the backup file contains only a partial backup, and it cannot be used in a restore operation. The next time a backup is made, a full backup will be made.
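When incremental backups are taken regularly, the baseline is often the most recent subdirectory of the backup location. The sketch below shows one way to find it. latest_backup_dir is a hypothetical helper, and it assumes the subdirectory names begin with a four-digit year and follow the date-and-time pattern seen in this document, so that sorting the names alphabetically also sorts them by time.

```shell
# Hypothetical helper: print the newest timestamped subdirectory of a
# backup location. Assumes names such as 2012-10-18-13-44-53, where
# lexical order matches chronological order.
latest_backup_dir() {
  backup_location="$1"
  for d in "$backup_location"/[0-9][0-9][0-9][0-9]-*/; do
    # An unmatched glob stays literal, so test before printing.
    if [ -d "$d" ]; then
      printf '%s\n' "${d%/}"
    fi
  done | sort | tail -n 1
}
```

The printed path can then be supplied as the value of --baseline-dir in the backup disk-store command.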
What a Full Online Backup Saves
For each member with persistent data, a full backup includes the following:
- Disk store files for all members containing persistent region data.
- Files and directories specified in the cache.xml configuration file as <backup> elements.
- JAR files that were deployed using the gfsh deploy command.
- Configuration files from the member startup: gemfire.properties, including the properties with which the member was started, and cache.xml, if used. These configuration files are not automatically restored, to avoid interfering with more recent configurations. In particular, if these are extracted from a master jar file, copying the separate files into your working area can override the files in the jar. If you want to back up and restore these files, add them as custom <backup> elements.
- A restore script, called restore.bat on Windows and restore.sh on Linux. This script may later be used to do a restore. The script copies files back to their original locations.
What an Incremental Online Backup Saves
An incremental backup saves the difference between the last backup and the current data. An incremental backup copies only operation logs that are not already present in the baseline directories for each member. For incremental backups, the restore script contains explicit references to operation logs in one or more previously chained incremental backups. When the restore script is run from an incremental backup, it also restores the operation logs from previous incremental backups that are part of the backup chain.
If members are missing from the baseline directory because they were offline or did not exist at the time of the baseline backup, those members place full backups of all their files into the incremental backup directory.
Disk Store Backup Directory Structure and Contents
$ cd thebackupdir
$ ls -R
./2012-10-18-13-44-53:
dasmith_e6410_server1_8623_v1_33892 dasmith_e6410_server2_8940_v2_45565
./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892:
config diskstores README.txt restore.sh user
./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892/config:
cache.xml
./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892/diskstores:
DEFAULT
./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892/diskstores/DEFAULT:
dir0
./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892/diskstores/DEFAULT/dir0:
BACKUPDEFAULT_1.crf BACKUPDEFAULT_1.drf BACKUPDEFAULT.if
./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892/user:
Offline Members—Manual Catch-Up to an Online Backup
If you must have a member offline during an online backup, you can manually back up its disk stores. Bring this member’s files into the online backup framework manually, and create a restore script by hand starting with a copy of another member’s script:
- Duplicate the directory structure of a backed up member for this member.
- Rename directories as needed to reflect this member’s particular backup, including disk store names.
- Clear out all files other than the restore script.
- Copy in this member’s files.
- Modify the restore script to work for this member.
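The first and third steps in the list above are mechanical and can be sketched in shell. seed_offline_member_backup is a hypothetical helper, and it assumes the template member's backup has a restore.sh at its top level. Renaming directories, copying in the member's files, and editing the restore script still have to be done by hand.

```shell
# Hypothetical helper: start a manual backup directory for an offline
# member from another member's backup.
#   $1  backup directory of a member that was backed up online
#   $2  new directory to create for the offline member
seed_offline_member_backup() {
  template="$1"
  target="$2"
  # Duplicate the directory tree only; no files are copied here.
  (cd "$template" && find . -type d) | while IFS= read -r d; do
    mkdir -p "$target/$d"
  done
  # The only file carried over is the restore script, as a starting point.
  cp "$template/restore.sh" "$target/restore.sh"
}
```

After the helper runs, the target holds an empty copy of the directory structure plus restore.sh, ready for the remaining manual steps.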
Restore Using a Backup Made While the System Was Online
The restore.sh or restore.bat script copies files back to their original locations.
- Restore your disk stores while cache members are offline and the system is down.
- Look at each of the restore scripts to see where they will place the files and make sure the destination locations are ready. A restore script will refuse to copy over files with the same names.
- Run each restore script on the host where the backup originated.
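Reviewing every restore script before running it is easier when they are printed together. The sketch below is a hypothetical helper that finds each restore.sh or restore.bat under a backup directory and prints it under a header line. It only reads the scripts and does not run them.

```shell
# Hypothetical helper: print every restore script found under a backup
# directory, each preceded by a header line with its path.
review_restore_scripts() {
  find "$1" -type f \( -name restore.sh -o -name restore.bat \) | sort |
  while IFS= read -r script; do
    printf '== %s ==\n' "$script"
    cat "$script"
  done
}
```

Reading the output shows the destination paths each script will write to, so those locations can be checked before the restore is started on each host.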
The restore copies these files back to their original location:
- Disk store files for all stores containing persistent region data.
- Any files or directories you have configured to be backed up in the