Apache Geode
Slow distributed-ack Messages
In systems with distributed-ack regions, a sudden large number of distributed-no-ack operations can cause distributed-ack operations to take a long time to complete.
The distributed-no-ack operations can come from anywhere. They may be updates to distributed-no-ack regions or they may be other distributed-no-ack operations, like destroys, performed on any region in the cache, including the distributed-ack regions.
The main reasons why a large number of distributed-no-ack messages may delay distributed-ack operations are:
- For any single socket connection, all operations are executed serially. If there are any other operations buffered for transmission when a distributed-ack is sent, the distributed-ack operation must wait to get to the front of the line before being transmitted. Of course, the operation’s calling process is also left waiting.
- The distributed-no-ack messages are buffered by their threads before transmission. If many messages are buffered and then sent to the socket at once, the line for transmission might be very long.
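The queuing effect described in the two points above can be sketched with a toy model. This is not Geode code: the class and method names are hypothetical, and the model only counts how many buffered messages are transmitted ahead of a distributed-ack message that joins a single first-in, first-out socket queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only (hypothetical names, no Geode classes involved).
// It models one socket as a FIFO queue and counts queue positions.
class SharedSocketModel {

    // Returns how many messages are sent before the ack message when it
    // joins a queue that already holds `bufferedNoAck` no-ack messages.
    static int sendsBeforeAck(int bufferedNoAck) {
        Queue<String> socketQueue = new ArrayDeque<>();
        for (int i = 0; i < bufferedNoAck; i++) {
            socketQueue.add("no-ack");
        }
        socketQueue.add("ack"); // the ack operation arrives last

        int sent = 0;
        // Drain the queue serially, as a single connection would.
        while (!"ack".equals(socketQueue.remove())) {
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        // Shared socket: the ack waits behind everything already buffered.
        System.out.println("shared socket:    " + sendsBeforeAck(1000));
        // Dedicated socket: nothing is queued ahead of the ack.
        System.out.println("dedicated socket: " + sendsBeforeAck(0));
    }
}
```

In this model the wait grows linearly with the number of buffered distributed-no-ack messages, which is why the delay shows up only after a sudden burst.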
You can take these steps to reduce the impact of this problem:
- If you’re using TCP, check whether you have socket conservation enabled for your members. It is configured by setting the Geode property conserve-sockets to true. If enabled, each application’s threads will share sockets unless you override the setting at the thread level. Work with your application programmers to see whether you might disable sharing entirely or at least for the threads that perform distributed-ack operations. These include operations on distributed-ack regions and also netSearches performed on regions of any distributed scope. (Note: netSearch is only performed on regions with a data-policy of empty, normal and preloaded.) If you give each thread that performs distributed-ack operations its own socket, you effectively let it scoot to the front of the line ahead of the distributed-no-ack operations that are being performed by other threads. The thread-level override is done by calling the DistributedSystem.setThreadsSocketPolicy(false) method.
- Reduce your buffer sizes to slow down the distributed-no-ack operations. These changes slow down the threads performing distributed-no-ack operations and allow the distributed-ack operations to be sent in a more timely manner.
  - If you’re using UDP (you either have multicast-enabled regions or have set disable-tcp to true in gemfire.properties), consider reducing the byteAllowance of mcast-flow-control to something smaller than the default of 3.5 megabytes.
  - If you’re using TCP/IP, reduce the socket-buffer-size in gemfire.properties.
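As a rough illustration of the settings named in the steps above, the following sketch collects them in a standard java.util.Properties object. The property names come from the text; the class name and all values are assumptions chosen for illustration, not recommendations, and the three-part mcast-flow-control value (byteAllowance, rechargeThreshold, rechargeBlockMs) reflects the format as understood from the Geode property reference. The sketch deliberately avoids Geode classes so that it compiles on its own.

```java
import java.util.Properties;

// Hypothetical helper class; the values below are illustrative only.
class AckTuningSketch {

    static Properties tunedProperties() {
        Properties props = new Properties();
        // Disable socket sharing for the whole member.
        props.setProperty("conserve-sockets", "false");
        // TCP/IP: a smaller socket buffer (assumed value, in bytes).
        props.setProperty("socket-buffer-size", "16384");
        // UDP: smaller byteAllowance as the first field (assumed format:
        // byteAllowance,rechargeThreshold,rechargeBlockMs).
        props.setProperty("mcast-flow-control", "524288,0.25,5000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = tunedProperties();
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
        // With Geode on the classpath, these properties would typically be
        // passed to new CacheFactory(props).create(). If conserve-sockets
        // stays true instead, a thread that performs distributed-ack work
        // would call DistributedSystem.setThreadsSocketPolicy(false), as
        // described in the text, before its first operation.
    }
}
```

The same name and value pairs can be placed directly in gemfire.properties. Whether to disable sharing for the whole member or only for selected threads depends on how many sockets the application can afford to open.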