Apache Geode CHANGELOG

Using Indexes on Equi-Join Queries using Multiple Regions

To query across multiple regions, identify all equi-join conditions. Then, create as few indexes for the equi-join conditions as you can while still joining all regions.

If there are equi-join conditions that redundantly join two regions (in order to more finely filter the data, for example), then creating redundant indexes for these joins will negatively impact performance. Create indexes only on one equi-join condition for each region pair.

In this example query:

SELECT DISTINCT * 
FROM /investors inv, /securities sc, /orders or, 
inv.ordersPlaced inv_op, or.securities or_sec 
    WHERE inv_op.orderID = or.orderID 
    AND or_sec.secID = sc.secID

All conditions are required to join the regions, so you would create four indexes, two for each equi-join condition:

FROM clause Indexed expression
/investors inv, inv.ordersPlaced inv_op inv_op.orderID
/orders or, or.securities or_sec or.orderID
FROM clause Indexed expression
/orders or, or.securities or_sec or_sec.secID
/securities sc sc.secID

Adding another condition to the example:

SELECT DISTINCT * 
FROM /investors inv, /securities sc, /orders or, 
inv.ordersPlaced inv_op, or.securities or_sec, sc.investors sc_invs 
    WHERE inv_op.orderID = or.orderID 
    AND or_sec.secID = sc.secID
                AND inv.investorID = sc_invs.investorID

You would still only want to use four indexes in all, as that’s all you need to join all of the regions. You would need to choose the most performant two of the following three index pairs:

FROM clause Indexed expression
/investors inv, inv.ordersPlaced inv_op inv_op.orderID
/orders or, or.securities or_sec or.orderID
FROM clause Indexed expression
/orders or, or.securities or_sec or_sec.secID
/securities sc, sc.investors sc_invs sc.secID
FROM clause Indexed expression
/investors inv, inv.ordersPlaced inv_op inv.investorID
/securities sc, sc.investors sc_invs sc_invs.investorID

The most performant set is that which narrows the data to the smallest result set possible. Examine your data and experiment with the three index pairs to see which provides the best performance.