Troubleshooting and System Recovery
This section provides strategies for handling common errors and failure situations.
Producing Artifacts for Troubleshooting
There are several types of files that are critical for troubleshooting.
Diagnosing System Problems
This section provides possible causes and suggested responses for system problems.
System Failure and Recovery
This section describes alerts for and appropriate responses to various kinds of system failures. It also helps you plan a strategy for data recovery.
Handling Forced Cache Disconnection Using Autoreconnect
A Geode member may be forcibly disconnected from a cluster if the member is unresponsive for a period of time, or if a network partition separates one or more members into a group that is too small to act as the cluster.
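As a rough illustration of how reconnect behavior might be tuned, the sketch below assembles the relevant settings into a java.util.Properties object. The property names (disable-auto-reconnect, max-wait-time-reconnect, max-num-reconnect-tries) and the values used in main are taken from the Geode properties reference as best understood here and should be checked against your Geode version. The class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: collects auto-reconnect settings in one place.
final class AutoReconnectSettings {

    // Builds a Properties object holding the three reconnect-related settings.
    // Property names are assumed from the Geode properties reference.
    public static Properties build(boolean disableAutoReconnect,
                                   long maxWaitMillis,
                                   int maxTries) {
        if (maxWaitMillis < 0 || maxTries < 0) {
            throw new IllegalArgumentException("wait time and tries must be non-negative");
        }
        Properties props = new Properties();
        props.setProperty("disable-auto-reconnect", Boolean.toString(disableAutoReconnect));
        props.setProperty("max-wait-time-reconnect", Long.toString(maxWaitMillis));
        props.setProperty("max-num-reconnect-tries", Integer.toString(maxTries));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only: keep auto-reconnect enabled,
        // wait 60 seconds between attempts, and try three times.
        Properties props = build(false, 60_000L, 3);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting Properties object would typically be handed to the member at startup (for example through the cache factory or a gemfire.properties file); that wiring is omitted here because it depends on how the member is launched.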
Recovering from Application and Cache Server Crashes
When the application or cache server crashes, its local cache is lost, and any resources it owned (for example, distributed locks) are released. The member must recreate its local cache upon recovery.
Recovering from Machine Crashes
When a machine crashes because of a shutdown, power loss, hardware failure, or operating system failure, all of its applications and cache servers and their local caches are lost.
Recovering from ConflictingPersistentDataExceptions
A ConflictingPersistentDataException while starting up persistent members indicates that you have multiple copies of some persistent data, and Geode cannot determine which copy to use.
Preventing and Recovering from Disk Full Errors
It is important to monitor the disk usage of Geode members. If a member lacks sufficient disk space for a disk store, the member attempts to shut down the disk store and its associated cache, and logs an error message. A shutdown due to a member running out of disk space can cause loss of data, data file corruption, log file corruption and other error conditions that can negatively impact your applications.
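One simple way to watch disk usage from outside Geode is to check the file store that holds a disk store directory. The sketch below uses only the standard java.nio.file API. The class name, the default directory, and the 80 percent warning threshold are assumptions chosen for illustration, not Geode settings.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical monitor: reports how full the volume under a directory is.
final class DiskUsageCheck {

    // Percentage (0-100) of space in use, given total and usable byte counts.
    public static double usedPercent(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        return 100.0 * (totalBytes - usableBytes) / totalBytes;
    }

    // Percentage of space in use on the file store that contains dir.
    public static double usedPercent(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return usedPercent(store.getTotalSpace(), store.getUsableSpace());
    }

    // True when usage has reached or passed the warning threshold.
    public static boolean exceeds(double usedPercent, double thresholdPercent) {
        return usedPercent >= thresholdPercent;
    }

    public static void main(String[] args) throws IOException {
        // Directory to check; pass the disk store directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        double threshold = 80.0; // assumed warning level, adjust to your environment
        double used = usedPercent(dir);
        System.out.printf("%s is %.1f%% full%n", dir.toAbsolutePath(), used);
        if (exceeds(used, threshold)) {
            System.out.println("WARNING: disk usage is at or above " + threshold + "%");
        }
    }
}
```

Running a check like this on a schedule, or feeding the same numbers into an existing monitoring system, gives early warning before a member has to shut down its disk store.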
Understanding and Recovering from Network Outages
The safest response to a network outage is to restart all the processes and bring up a fresh data set.
Log Messages and Solutions
This section provides strategies for responding to a variety of system log messages.