@Deprecated public final class SystemFailure extends Object
This class represents a catastrophic failure of the system, especially the Java virtual machine.
Any class may, at any time, indicate that a system failure has occurred by calling initiateFailure(Error) (or, less commonly, setFailure(Error)).
In practice, the most common type of failure that is likely to be reported by an otherwise healthy JVM is OutOfMemoryError. However, GemFire will report any occurrence of VirtualMachineError as a JVM failure.
When a failure is reported, you must assume that the JVM has broken its fundamental execution contract with your application. No programming invariant can be assumed to be true, and your entire application must be regarded as corrupted.
When startThreads() is called, a "watchdog" Thread is started that periodically checks to see if system corruption has been reported. When system corruption is detected, this thread proceeds to:
1. Shut down the GemFire cache. After this has successfully ended, the watchdog launches the failure action.
2. Run the failure action, a user-defined Runnable set via setFailureAction(Runnable). By default, this Runnable prints the failure stack trace to System.err. If you feel you need to perform an action before exiting the JVM, this hook gives you a means of attempting some action. Whatever you attempt should be extremely simple, since your Java execution environment has been corrupted.
   GemStone recommends that you employ Java Service Wrapper to detect when your JVM exits and to perform appropriate failure and restart actions.
3. Exit the JVM. If the application has granted permission to exit (via setExitOK(boolean)), the watchdog calls System.exit(int) with an argument of 1. If you have not granted this class permission to close the JVM, you are strongly advised to call System.exit(int) yourself in your failure action (in the previous step).
Each of these actions will be run exactly once in the above described order. However, if either step throws any type of error (Throwable), the watchdog will assume that the JVM is still under duress (esp. an OutOfMemoryError), will wait a bit, and then retry the failed action.
It bears repeating that you should be very cautious of any Runnables you ask this class to run. By definition the JVM is very sick when failure has been signalled.
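The run-once, in-order, retry-on-error behaviour described above can be sketched as follows. This is only an illustration under stated assumptions, not GemFire's implementation: the class `RetryingSteps`, the method `runAll`, and the retry delay parameter are hypothetical names introduced here.

```java
import java.util.List;

// Hypothetical sketch (not GemFire code): run each step once, in order,
// retrying a step after a pause whenever it throws any Throwable.
final class RetryingSteps {
  // Returns the total number of attempts made across all steps.
  static int runAll(List<Runnable> steps, long retryDelayMillis) {
    int attempts = 0;
    for (Runnable step : steps) {
      boolean done = false;
      while (!done) {
        attempts++;
        try {
          step.run();
          done = true; // this step succeeded and is never run again
        } catch (Throwable t) {
          // Assume the JVM is still under duress: wait, then retry this step.
          try {
            Thread.sleep(retryDelayMillis);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return attempts; // stop retrying when interrupted
          }
        }
      }
    }
    return attempts;
  }

  public static void main(String[] args) {
    int[] failuresLeft = {2};
    Runnable flaky = () -> {
      if (failuresLeft[0]-- > 0) {
        throw new IllegalStateException("simulated failure");
      }
    };
    Runnable simple = () -> {};
    System.out.println("attempts: " + runAll(List.of(flaky, simple), 10L));
  }
}
```

Note that a step which never succeeds keeps this loop running indefinitely, which is one more reason to keep any failure action as simple as possible.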
In addition to the watchdog, startThreads() creates a second thread (the "proctor") that monitors free memory. It does this by examining free memory, total memory, and maximum memory. If the amount of available memory stays below a given threshold for more than MEMORY_MAX_WAIT seconds, the watchdog is notified.
Note that the proctor can be effectively disabled by setting the failure memory threshold to a negative value.
The proctor is a second line of defense, attempting to detect OutOfMemoryError conditions in circumstances where nothing alerted the watchdog. For instance, a third-party jar might incorrectly handle this error and leave your virtual machine in a "stuck" state.
Note that the proctor does not relieve you of the obligation to follow the best practices in the next section.
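To make the proctor's check concrete, here is a minimal sketch of how the three Runtime figures might be combined and compared against a threshold. The class `MemoryProbe`, both method names, and the exact formula are assumptions made for illustration; the text above does not specify how GemFire combines these values.

```java
// Hypothetical sketch (not GemFire code) of a memory check like the proctor's.
final class MemoryProbe {
  // Memory the JVM can still obtain: free space in the current heap plus
  // the room the heap can still grow by (max - total).
  static long availableBytes(long free, long total, long max) {
    return free + (max - total);
  }

  // True when available memory is below the threshold and has been low for
  // longer than maxWaitMillis. Available memory is never negative, so a
  // negative threshold can never be undercut and the check never fires.
  static boolean shouldNotify(long available, long threshold,
      long lowSinceMillis, long nowMillis, long maxWaitMillis) {
    if (available >= threshold) {
      return false;
    }
    return nowMillis - lowSinceMillis > maxWaitMillis;
  }

  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    long available = availableBytes(rt.freeMemory(), rt.totalMemory(), rt.maxMemory());
    System.out.println("available bytes: " + available);
  }
}
```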
If you catch either Error or Throwable, you must also check for VirtualMachineError like so:
```java
catch (VirtualMachineError err) {
  SystemFailure.initiateFailure(err);
  // If this ever returns, rethrow the error. We're poisoned
  // now, so don't let this thread continue.
  throw err;
}
```
To check periodically for a reported failure, you may use the checkFailure() utility function, but you are not required to (you could just see if getFailure() returns a non-null result).
A job processing loop is a good candidate, for instance, in org.apache.org.jgroups.protocols.UDP#run(), which implements Thread.run():
```java
for (;;) {
  SystemFailure.checkFailure();
  if (mcast_recv_sock == null || mcast_recv_sock.isClosed()) break;
  if (Thread.currentThread().isInterrupted()) break;
  ...
}
```
Whenever you catch Error or Throwable, you should also make sure that you aren't dealing with a corrupted JVM:
```java
catch (Throwable t) {
  // Whenever you catch Error or Throwable, you must also
  // catch VirtualMachineError (see above). However, there is
  // _still_ a possibility that you are dealing with a cascading
  // error condition, so you also need to check to see if the JVM
  // is still usable:
  SystemFailure.checkFailure();
  ...
}
```
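The fragments above can be combined into one self-contained sketch. Because SystemFailure itself is not available outside GemFire, the example uses a hypothetical stand-in: the class `FailureAwareWorker` and its `report`, `check`, `current`, and `runTask` methods are illustrative names only, and `check` throws an IllegalStateException where the real class documents an InternalGemFireError.

```java
// Hypothetical stand-in for the SystemFailure pattern, for illustration only.
final class FailureAwareWorker {
  private static volatile Error failure; // null until a failure is reported

  // Records the failure if none has been recorded yet.
  static void report(Error f) {
    if (failure == null) {
      failure = f;
    }
  }

  // Throws if a failure has been reported anywhere in this JVM.
  static void check() {
    Error f = failure;
    if (f != null) {
      throw new IllegalStateException("system failure reported", f);
    }
  }

  static Error current() {
    return failure;
  }

  // Runs a task using the recommended catch ordering.
  static String runTask(Runnable task) {
    try {
      task.run();
      return "ok";
    } catch (VirtualMachineError err) {
      report(err);
      throw err; // poisoned: do not let this thread continue
    } catch (Throwable t) {
      check(); // a cascading error may mean the JVM is no longer usable
      return "recovered from " + t.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(runTask(() -> {
      throw new IllegalArgumentException("bad input");
    }));
  }
}
```

The VirtualMachineError clause comes first because Java selects the first matching catch clause; placing Throwable first would not compile, since the more specific clause would be unreachable.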
Modifier and Type | Field and Description |
---|---|
protected static Error | failure - Deprecated. The underlying failure. This is usually an instance of VirtualMachineError, but it is not required to be such. |
static long | MEMORY_MAX_WAIT - Deprecated. This is the maximum amount of time, in seconds, that the proctor thread will tolerate seeing free memory stay below setFailureMemoryThreshold(long), after which point it will declare a system failure. |
Modifier and Type | Method and Description |
---|---|
static void | checkFailure() - Deprecated. Utility function to check for failures. |
static void | emergencyClose() - Deprecated. Attempt to close any and all GemFire resources. |
static Error | getFailure() - Deprecated. Returns the catastrophic system failure, if any. |
static void | initiateFailure(Error f) - Deprecated. Signals that a system failure has occurred and then throws an AssertionError. |
static boolean | isJVMFailureError(Error err) - Deprecated. Returns true if the given Error is fatal to the JVM and it should be shut down. |
static void | loadEmergencyClasses() - Deprecated. Since it requires object memory to unpack a jar file, make sure this JVM has loaded the classes necessary for closure before it becomes necessary to use them. |
protected static void | logFine(String name, String s) - Deprecated. Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored. |
protected static void | logInfo(String name, String s) - Deprecated. Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored. |
protected static boolean | logWarning(String name, String s, Throwable t) - Deprecated. Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored. |
static boolean | setExitOK(boolean newVal) - Deprecated. Indicate whether it is acceptable to call System.exit(int) after failure processing has completed. |
static void | setFailure(Error failure) - Deprecated. Set the underlying system failure, if not already set. |
static Runnable | setFailureAction(Runnable action) - Deprecated. Sets a user-defined action that is run in the event that failure has been detected. |
static long | setFailureMemoryThreshold(long newVal) - Deprecated. Set the memory threshold under which system failure will be notified. |
static void | signalCacheClose() - Deprecated. Should be invoked when GemFire cache is closing or closed. |
static void | signalCacheCreate() - Deprecated. Should be invoked when GemFire cache is being created. |
static void | startThreads() - Deprecated. This starts up the watchdog and proctor threads. |
static void | stopThreads() - Deprecated. This stops the threads that implement this service. |
protected static volatile Error failure
The underlying failure. This is usually an instance of VirtualMachineError, but it is not required to be such.
See Also: getFailure(), initiateFailure(Error)
public static final long MEMORY_MAX_WAIT
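This is the maximum amount of time, in seconds, that the proctor thread will tolerate seeing free memory stay below the threshold set by setFailureMemoryThreshold(long), after which point it will declare a system failure.
The default is 15 sec. This can be set using the system property gemfire.SystemFailure.MEMORY_MAX_WAIT.
See Also: setFailureMemoryThreshold(long)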
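As an illustration of the property lookup described above, the following sketch resolves the wait time from the named system property and falls back to the documented default of 15 seconds. The class `WaitConfig` and its method are hypothetical; how GemFire itself reads the property is not shown in this document, and the sketch re-reads the property on every call purely for simplicity.

```java
// Hypothetical sketch: resolve the wait time (in seconds) from the system
// property named in the text, falling back to the documented default of 15.
final class WaitConfig {
  static final String PROPERTY = "gemfire.SystemFailure.MEMORY_MAX_WAIT";

  static long memoryMaxWaitSeconds() {
    // Long.getLong returns the default when the property is unset or
    // cannot be parsed as a number.
    return Long.getLong(PROPERTY, 15L);
  }

  public static void main(String[] args) {
    // Typically supplied at startup, e.g.
    //   java -Dgemfire.SystemFailure.MEMORY_MAX_WAIT=30 ...
    System.out.println(memoryMaxWaitSeconds());
  }
}
```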
public static boolean setExitOK(boolean newVal)
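Indicate whether it is acceptable to call System.exit(int) after failure processing has completed.
This may be dynamically modified while the system is running.
Parameters:
newVal - true if it is OK to exit the process
public static boolean isJVMFailureError(Error err)
Returns true if the given Error is fatal to the JVM and it should be shut down. Call initiateFailure(Error) or setFailure(Error) if this returns true.
Parameters:
err - an Error
public static void signalCacheCreate()
Should be invoked when GemFire cache is being created.
public static void signalCacheClose()
Should be invoked when GemFire cache is closing or closed.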
public static void loadEmergencyClasses()
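Since it requires object memory to unpack a jar file, make sure this JVM has loaded the classes necessary for closure before it becomes necessary to use them.
Note that just touching the class in order to load it is usually sufficient, so all an implementation needs to do is to reference the same classes used in emergencyClose().
Just make sure to do it while you still have memory to succeed!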
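One explicit way to "touch" classes ahead of time is sketched below. This is an assumption-laden illustration rather than GemFire's approach: the class `Preloader` and its `preload` method are hypothetical, and an implementation could equally well just reference the classes directly, as the text suggests.

```java
// Hypothetical sketch: force classes to load and initialize ahead of time,
// while memory is still available.
final class Preloader {
  // Returns the number of classes successfully loaded.
  static int preload(String... classNames) {
    int loaded = 0;
    for (String name : classNames) {
      try {
        // initialize=true also runs static initializers now rather than later.
        Class.forName(name, true, Preloader.class.getClassLoader());
        loaded++;
      } catch (ClassNotFoundException | LinkageError e) {
        // Skip classes that cannot be loaded; the rest are still worth preloading.
      }
    }
    return loaded;
  }

  public static void main(String[] args) {
    System.out.println(preload("java.util.concurrent.atomic.AtomicBoolean"));
  }
}
```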
public static void emergencyClose()
Attempt to close any and all GemFire resources. This method should neither acquire any synchronization mutexes nor create any objects.
The former is because the system is in an undefined state and attempting to acquire the mutex may cause a hang.
The latter is because the likelihood is that we are invoking this method due to memory exhaustion, so any attempt to create an object will also cause a hang.
This method is not meant to be called directly, although nothing prevents it. It is public to document the contract that is implemented by emergencyClose in other parts of the system.
public static void checkFailure() throws InternalGemFireError, Error
Utility function to check for failures.
Throws:
InternalGemFireError - if the system has been corrupted
Error - if the system has been corrupted and a thread-specific AssertionError cannot be allocated
See Also: initiateFailure(Error)
public static void initiateFailure(Error f) throws InternalGemFireError, Error
Signals that a system failure has occurred and then throws an AssertionError.
Parameters:
f - the failure to set
Throws:
IllegalArgumentException - if f is null
InternalGemFireError - always; this method does not return normally.
Error - if a thread-specific AssertionError cannot be allocated.
public static void setFailure(Error failure)
Set the underlying system failure, if not already set.
This method does not generate an error, and should only be used in circumstances where execution needs to continue, such as when re-implementing ThreadGroup.uncaughtException(Thread, Throwable).
Parameters:
failure - the system failure
Throws:
IllegalArgumentException - if you attempt to set the failure to null
public static Error getFailure()
Returns the catastrophic system failure, if any.
This is usually (though not necessarily) an instance of VirtualMachineError.
A return value of null indicates that no system failure has yet been detected.
Object synchronization can implicitly require object creation (fat locks in JRockit for instance), so the underlying value is not synchronized (it is a volatile). This means the return value from this call is not necessarily the first failure reported by the JVM.
Note that even if it were synchronized, it would only be a proximal indicator near the time that the JVM crashed, and may not actually reflect the underlying root cause that generated the failure. For instance, if your JVM is running short of memory, this Throwable is probably an innocent victim and not the actual allocation (or series of allocations) that caused your JVM to exhaust memory.
If this function returns a non-null value, keep in mind that the JVM is very limited. In particular, any attempt to allocate objects may fail if the original failure was an OutOfMemoryError.
public static Runnable setFailureAction(Runnable action)
Sets a user-defined action that is run in the event that failure has been detected.
This action is run after the GemFire cache has been shut down. If it throws any error, it will be reattempted indefinitely until it succeeds. This action may be dynamically modified while the system is running.
The default action prints the failure stack trace to System.err.
Parameters:
action - the Runnable to use
See Also: initiateFailure(Error)
public static long setFailureMemoryThreshold(long newVal)
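Set the memory threshold under which system failure will be notified.
This can be set using the system property gemfire.SystemFailure.chronic_memory_threshold.
Parameters:
newVal - threshold in bytes
See Also: Runtime.freeMemory()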
protected static boolean logWarning(String name, String s, Throwable t)
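Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored.
Parameters:
name - the name of the logger
s - string to print
t - the call stack, if any
protected static void logInfo(String name, String s)
Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored.
Parameters:
name - the name of the logger
s - string to print
protected static void logFine(String name, String s)
Logging can require allocation of objects, so we wrap the logger so that failures are silently ignored.
Parameters:
name - the name of the logger
s - string to print
public static void startThreads()
This starts up the watchdog and proctor threads.
public static void stopThreads()
This stops the threads that implement this service.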
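The idea behind the logWarning, logInfo, and logFine wrappers above (logging that silently tolerates its own failure) can be sketched as follows. This is not GemFire's logger: the sketch uses java.util.logging as a stand-in, the class `QuietLog` is hypothetical, and the meaning given to the boolean return value is an assumption, since this document does not describe it.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: a logging wrapper that never propagates a failure.
final class QuietLog {
  // Returns true if the message was handed to the logger, false if the
  // attempt to log failed for any reason (for example, memory exhaustion).
  static boolean warning(String name, String s, Throwable t) {
    try {
      Logger.getLogger(name).log(Level.WARNING, s, t);
      return true;
    } catch (Throwable ignored) {
      // Deliberately ignored: logging must not make a bad situation worse.
      return false;
    }
  }

  public static void main(String[] args) {
    warning("demo", "low on memory", null);
  }
}
```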