Skip navigation links

Package org.apache.geode.cache.query

GemFire's query service, an implementation of an OQL/SQL-style query processor with supporting indexes.

See: Description

Package org.apache.geode.cache.query Description

GemFire's query service, an implementation of an OQL/SQL-style query processor with supporting indexes. The query service enables data to be retrieved efficiently from a GemFire cache through the use of a high-level query language. The query language allows you to access the data based on multiple attributes and to tranform the results. The service also allows you to create indexes in order to optimize the performance of data retrieval.

The QueryService interface provides the public entry point to query execution, query compilation, and index creation. A QueryService instance is obtained from RegionService.getQueryService(). Queries can also be performed by using the shortcut methods Region.query(java.lang.String), Region.selectValue(java.lang.String), and Region.existsValue(java.lang.String).

This package description contains the following sections:

Terminology

 Attribute
A named member of a type. An attribute is mapped to a Java class member in three possible ways with the following priority until a match is found. If the attribute is named x, then:
  1. public method getX()

  2. public method  x()

  3. public field  x


(attribute) path
A sequence of attributes separated by a dot (.) that is applied to objects where the value of one attribute is used to apply the next attribute in the sequence. For example, the path x.y.z as it applies to object A is equivalent to A.getX().getY().getZ().
 
collection
In the context of the query language, a collection includes not only instances of java.util.Collection, but arrays and maps as well. Iteration over a map results in the iteration over its entries as Map.Entry.
region path
A fully-qualified specification of a region in the cache. The query language only supports full paths starting from a cache root region. The minimum region path is a forward slash ('/') followed by a root region name. The region path for a non-root region, A, begins with a forward slash and region A' s root region name followed by the series of region names required to navigate to region A, with a forward slash delminiter between region names.
 
struct
A data type that has a fixed number of elements, each of which has a field name and can contain an object value. An element of a struct is typically referred to by an attribute name, for example, if an address were represented by a struct, then you might access the address elements in this manner: address.zipCode = 12345; address.city = 'San Francisco'. Query evaluation may generate structs whose elements can be referred to as an attribute in the query language. Structures that are returned to Java are immutable and implement the Struct interface.

Query Language


The Query Language supported in GemFire is essentially a subset of OQL (ODMG 3.0), which in turn is based on SQL-92.

The query language provides the basic set of clauses, expressions, and operators that allow you to retrieve elements from objects stored in cache regions. This section describes the operators, keywords, and operations that the query language supports for this release.

NOTE: Query language keywords such as SELECT, DATE, and NULL, are case-insensitive. Identifiers in query strings, such as attribute names, method names, and path expressions, are case-sensitive.

Query language keywords are capitalized in the documentation to identify them as keywords.

Region Paths

In the context of a query, the name of a region is specified by its full path starting with a forward slash (/), and delimited by the
forward slash (/) between regions names. Region names in a region path are restricted to alphanumeric characters or underscore characters only.

Many queries need to operate on the values of a Region. Some queries, however, need to operate on the Region itself, e.g. to get the keys or entries of the region. To facilitate this, when a region path evaluates to a Region in the context of a query, the type of object it refers to has the following interface:

 interface QRegion extends org.apache.geode.cache.Region, java.util.Collection { }

Thus, the interface inherits all the attributes and methods from both Region and Collection. When used as a Collection, the elements are the values of the region. This enables a query to use a Region as either a collection of its values or as a Region itself.

Example:

 /root/employees

Can be used as a Collection of values in region /root/employees

 /root/employees.keys

Returns the set of keys in region /root/employees

SELECT Statement

The SELECT statement lets you filter data from collection(s) of objects based on a predicate and optionally transform the results.  

 SELECT DISTINCT projectionList FROM fromClause [WHERE expression]

The DISTINCT keyword is required in this version of GemFire.

The semantics of the SELECT statement are as follows. The FROM clause specifies the collections to iterate over, the WHERE clause specifies a boolean expression to filter the elements in the iteration(s), and the projectionAttributes performs an optional transformation on each element in the result.    

The FROM clause establishes a naming scope for the rest of the fromExpression (if more collections are listed) as well as for the WHERE clause and the projection attributes. Explicit variables may also be declared in the FROM clause. These are added to the current scope, one variable for each collection in the list. The variable becomes an "iterator variable" in
that it is bound to the elements of the associated collection as they are iterated over. Each expression in the FROM clause can be any expression that evaluates to a collection. For example, an expression in the FROM clause is typically a path expression that resolves to a region in the Cache so that the values in the region become the collection of objects to filter. Other possibilities include a query parameter that is bound to a collection, or a nested SELECT statement.  

Result of a Select Statement

The result of a SELECT statement is a collection that implements the SelectResults interface.

The SelectResults is a simple collection of unstructured objects in two cases:

  1. If only one expression is specified by the projection list and no field name is specified

  2. If the SELECT list is * and there is a single collection in the FROM clause

In all other cases, it is a collection of structs containing the data.

The name of each field in the struct is either specified explicitly by an identifier (using the fieldname:expr syntax in the SELECT list or an explicit variable name in the FROM clause) or it defaults to the last attribute name specified in the path expression of the associated expression. If default names can not be decided, arbitrary unique names are generated by the query processor.

Typing of Iteration Elements in the FROM clause

In order for the query processor to be able to resolve implicit attribute names in a select statement, the types of the elements in the iterator collections specified in the FROM clause must be provided.

In the following example, the type of the values in the region and the type of the elements in the positions.values collection must be available to the query processor in order to resolve the implicit attributes id and sharesOutstanding:

SELECT DISTINCT id FROM /portfolios, positions.values WHERE sharesOutstanding > 100

There are three possible ways of providing this type information:

  1. If it is a Region, provide the type in the valueConstraint or keyConstraint RegionAttributes.

  2. Use the TYPE keyword to specify the type of the elements. Note that application-defined types must be imported, e.g.:
    IMPORT com.acme.Portfolio;
    IMPORT com.acme.Position;
    SELECT DISTINCT id
    FROM /portfolios TYPE Portfolio, positions.values TYPE Position
    WHERE sharesOutstanding > 100

  3. Use typecasting:
    IMPORT com.acme.Portfolio;
    IMPORT com.acme.Position;
    SELECT DISTINCT id
    FROM (collection<Portfolio>)/portfolios, (collection<Position>)positions.values
    WHERE sharesOutstanding > 100

If explicit iterator variables are used, this typing is not necessary since the names can be resolved without requiring type information. However, in the future GemFire may allow queries to be compiled into bytecodes, which will require this type information. It is therefore recommended that this type information always be provided to ensure forward compatibility.

Joins

Since the collections in the FROM clause are not necessarily related to each other, SELECT statements could be used to join data between two different regions:

 SELECT p FROM /Persons p, /Flowers f WHERE p.name = f.name

This returns all the Persons from the /Persons region that has the same name as some Flower in the /Flowers region. There could be indexes on both regions for the name attribute, but only one index could be used during query evaluation. GemFire does not currently support the creation of indexes targeted for joins across more than one region.

Method Invocation

The query language supports method invocation in query expressions. The query processor maps attributes in query strings using the rules described under Attribute in the terminology section. If you know that the attribute name maps to a public method that takes no parameters, you can simply include the method name in the query string as an attribute. For example, emps.isEmpty is equivalent to emps.isEmpty(). In the following example, the query invokes isEmpty() on subordinates, and
returns the set of all employees with no subordinates.  

 SELECT DISTINCT * FROM /root/employees WHERE subordinates.isEmpty

GemFire also supports invoking public methods with parameters. To invoke methods with parameters, provide the method arguments between parentheses. For overloaded methods, the query processor matches the runtime argument types with the parameter types required by the method to resolve which method to call. If more than one method can be
invoked, the query processor chooses the method whose parameter types are the most specific for the given arguments.

Methods that are declared with a void return type return NULL when executed in a query.

Operators

GemFire supports the following operator types in expressions:

Comparison Operators

Comparison operators compare two values and return the results, either true or false. GemFire query language supports the following comparison operators:

           =     <>     <     <=     >     >=

The operator != is also accepted as equivalent to <>.  

Logical Operators

The operators AND and OR allow you to create more complex expressions by combining expressions to produce a boolean result (true or false).

Unary Operators

GemFire supports the unary operator NOT.

Map and Index Operators

If expr is an expression that evaluates to an integer i, then a[expr] extracts the (i+1)th element of array, list, or string a. (The first element has rank 0). The same operator is used to access a value by key in a Map or Region, in which case the type of expr can be any Object. In the case of a Region, the map operator performs a non-distributed get that does not cause a netSearch if the value is not present in the region locally. (i.e. equivalent to getEntry(expr).getValue()).  

Dot and Forward Slash Operators

The dot operator separates attribute names in a path expression, and specifies the navigation through object attributes. The forward slash is used to navigate through subregions as described above under Region Paths.

Functions

The query language supports the functions:

ELEMENT(query)
Extracts a single element from a collection or array. This function throws a FunctionDomainException if the argument is not a collection or array with exactly one element.
IS_DEFINED(query)
Returns true if the query evaluates to a non-null value.
IS_UNDEFINED(query)
Returns true if the query cannot be evaluated or evaluates to null.

Construction Expressions

The construction expression currently implemented is the set constructor. A set can be constructed using SET(e1, e2, ..., en) where e1, e2, ..., en are expressions. This constructor creates and returns the set containing the elements e1, e2, ..., en.  


IN Expression

If e1 and e2 are expressions, e2 is a collection, and e1 is an object or a literal having the same type or a subtype as the elements of e2, then

 e1 IN e2

is an expression of type boolean. It returns true if element e1 is not UNDEFINED and belongs to collection e2, it returns false if e1 is not UNDEFINED and does not belong to collection e2, and it returns UNDEFINED if e1 is UNDEFINED.

Example:
 

 2 IN SET(1, 2, 3)

This returns true.

As another example where the collection membership is a subquery:
If, in addition to a region with employees in it there is a separate region named /payroll that contains payroll objects with attributes empId for employee id and rate for payrate, this query returns the names of the employees earning a rate of 8:

 SELECT name FROM /root/employees WHERE empId IN (SELECT empId FROM /payroll WHERE rate = 8)

Literals

Query language expressions can contain literals as well as operators and attribute names. The following lists the literal types that GemFire supports.

boolean
A boolean value, either TRUE or FALSE
integer and long
An integer literal is of type long if it is suffixed with the ASCII letter L. Otherwise it is of type int.
floating point
A floating-point literal is of type float if it is suffixed with an ASCII letter F. Otherwise its type is double and it can optionally be suffixed with an ASCII letter D. A double or floating point literal can optionally include an exponent suffix of E or e, followed by a signed or unsigned number.
string
String literals are delimited by single quotation marks. Embedded single-quotation marks are doubled. For example, the character string 'Hello' evaluates to the value Hello, while the character string 'He said, ''Hello''' evaluates to He said, 'Hello'. Embedded newlines are kept as part of the string literal.
char
A literal is of type char if it is a string literal prefixed by the keyword CHAR, otherwise it is of type string. The CHAR literal for the single-quotation mark character is CHAR '''' (four single quotation marks).
date
A java.sql.Date object that uses the JDBC format prefixed with the DATE keyword: DATE yyyy-mm-dd In the Date, yyyy represents the year, mm represents the month, and dd represents the day. The year must be represented by four digits; a two-digit shorthand for the year is not allowed.
time
A java.sql.Time object that uses the JDBC format (based on a 24-hour clock) prefixed with the TIME keyword: TIME hh:mm:ss In the Time, hh represents the hours, mm represents the minutes, and ss represents the seconds. The hours field is based on a 24-hour clock.
timestamp
A java.sql.Timestamp object that uses the JDBC format with a TIMESTAMP prefix: TIMESTAMP yyyy-mm-dd hh:mm:ss.fffffffff In the Timestamp, yyyy-mm-dd represents the date, hh:mm:ss represents the time, and fffffffff represents the fractional seconds (up to nine digits).
NULL
UNDEFINED

Comparing Values with java.util.Date

You can compare the temporal literal values DATE, TIME, and TIMESTAMP with java.util.Date values. However, there is no literal for java.util.Date values in the query language.

Rules for UNDEFINED

The special value UNDEFINED behaves according to the following rules:

Query Parameter References

Query parameters are identified by a dollar sign ($) followed by a digit that represents the parameter's position in the parameter array passed in the execute() method. Counting begins at one (1). That is, $1 references the first bound attribute, and $2 references the second bound attribute, and so on.  

Comments

You can include comments in the query string. To insert a one-line comment, begin the line with two dashes (--). To insert a multiple-line comment, begin the comment block with /* and end it with */.

Indexes

An index provides significant performance gains for query execution by enabling the query evaluation to avoid the often substantial work of iterating through every object in a collection of objects, instead looking up results in a preconstructed index. GemFire provides functional indexes that support attribute (or more complex functions of attributes) comparison with a constant using any of the relational operators. An index is kept current with respect to a region automatically. The RegionAttribute synchronousIndexMaintenance specifies whether the indexes for the region are kept current atomically when the region is modified (synchronous), or whether they are brought up to date in a background thread (asynchronous). The default is asynchronous since this provides the best performance. Asynchronous index maintenance may batch up multiple updates to the same key in the region and only apply the most recent.

The QueryService provides the following methods for creating an index:

Index createIndex(String name, IndexType indexType, String indexedExpression,
                           String fromClause);
Index createIndex(String name, IndexType indexType, String indexedExpression,
                           String fromClause, String imports);

The indexType currently must be either IndexType.FUNCTIONAL or PRIMARY_KEY.

The fromClause  resolves to a collection or list of collections which will correspond to the fromClause or part of a fromClause in a SELECT statement. The fromClause must contain one and only one region path, and the collections that the expressions evaluate to must have the characteristic that each struct generated by those collections is dependent on one and only one entry in the referenced region (otherwise and exception is thrown since the index wouldn't be able to be maintained on single entry updates). References to query parameters are not allowed.

The indexedExpression should be a function of the elements of the collection (or collection of structs) referenced in the fromExpression, and is used to optimize the comparison of the same path found in a WHERE clause when used to compare against a constant expression (i.e. an expression that does not depend on the iterator context) using the relational operators. The exact use and specification of the indexedExpression varies depending on the index type. See IndexType. No query parameters or region paths are permitted.

The imports provides packages and classes used in variable typing and typecasts.

In addition to functional indexes, GemFire also supports a primary key index. A primary key index uses the keys in the region itself. By creating a primary key index, you make the query service aware of the relationship between the values in the region and the keys in the region and enable the relationship to be used to optimize the execution of queries. For example, if the values in a region are employee objects and the keys in the region is the attribute empId on those employees, then you can create a primary key index on that region with the indexedExpression "empId". Note that primary key indexes are not sorted so they will only be used for equality tests. To obtain a sorted index on the region keys, create a functional index instead.

Index creation is also supported in the cache.xml for creating indexes declaratively at the same time regions are created via the cache.xml.

RESERVED WORDS

abs all and andthen any array as asc avg bag bag boolean by by byte char collection count date declare define desc dictionary distinct double element enum except exists false first flatten float for from group having import in int intersect interval is_defined is_undefined last like list listtoset long map max min mod nil not null octet or order orelse query select set short some string struct sum time timestamp true type undefine undefined union unique unsigned where

LANGUAGE GRAMMAR

symbol      ::=        expression

n                   is a nonterminal symbol that has to appear at some place within the grammar on the left side of a rule, all nonterminal symbols have to be derived to be terminal symbols.

t                    represents the terminal symbol t

x y                 represents x followed by y

x | y 
(x| y )           represents x or y,

[ x ]               x or empty

{ x }              possibly empty sequence of x



query_program ::= [ imports ; ]  query  [;]
imports ::= import { ; import }
import ::= IMPORT qualifiedName [ AS identifier ]
query ::= selectExpr  | expr
selectExpr ::= SELECT DISTINCT projectionAttributes fromClause [ whereClause ]
projectionAttributes ::= *
                                       | projectionList
projectionList ::= projection { , projection }
projection ::= field
                      | expr [ AS identifier ] 
field ::= identifier : expr
fromClause ::= FROM iteratorDef { , iteratorDef }
iteratorDef ::=   expr [ [ AS ] identifier ] [ TYPE type ]                 
                          |  identifier IN expr  [ TYPE type ]              
whereClause ::= WHERE expr
expr ::= castExpr
castExpr ::= orExpr
                    | left_paren type right_paren castExpr
orExpr ::= andExpr { OR andExpr }
andExpr ::= equalityExpr { AND equalityExpr }
equalityExpr ::= relationalExpr { ( = | <>  | != ) relationalExpr }
relationalExpr ::= inExpr { ( < | <= | > | >= ) inExpr }
inExpr ::= unaryExpr { IN unaryExpr }
unaryExpr ::= [ NOT ] unaryExpr
postfixExpr ::=  primaryExpr { left_bracket expr right_bracket }
                     | primaryExpr { dot identifier [ argList ] }
argList ::= left_paren [ valueList ] right_paren
qualifiedName ::= identifier { dot identifier }
primaryExpr ::= conversionExpr
                    | identifier [ argList ]
                    | undefinedExpr
                    | collectionConstruction
                    | queryParam
                    | literal
                    | ( query )
conversionExpr ::= element ( query )
undefinedExpr ::= IS_UNDEFINED ( query )
                              | IS_DEFINED ( query )
collectionConstruction ::= SET left_paren [ valueList ] right_paren
valueList ::= expr { , expr }
queryParam ::= $ integerLiteral
identifier ::= letter { letter | digit | _ }
literal ::= booleanLiteral
                | integerLiteral
                | longLiteral
                | doubleLiteral
                | floatLiteral
                | charLiteral
                | stringLiteral
                | dateLiteral
                | timeLiteral
                | timestampLiteral
                | NULL
                | UNDEFINED
booleanLiteral ::= TRUE | FALSE
integerLiteral ::= [ - ] digit { digit }
longLiteral ::= integerLiteral L
floatLiteral ::= [ - ] digit { digit } dot digit { digit }
                         [ ( E | e ) [ + | - ] digit { digit } ] F
doubleLiteral ::= [ - ] digit { digit } dot digit { digit }
                            [ ( E | e ) [ + | - ] digit { digit } ] [ D ]
charLiteral ::= CHAR ' character '
stringLiteral ::= ' { character } '
dateLiteral ::= DATE ' integerLiteral – integerLiteral – integerLiteral '
timeLiteral ::= TIME
                    ' integerLiteral : integerLiteral : integerLiteral'
timestampLiteral ::= TIMESTAMP
                    ' integerLiteral – integerLiteral - integerLiteral
                    integerLiteral : integerLiteral : digit { digit } [ . digit
                   { digit } ] '
letter ::= (any unicode letter)
character ::= (any character)
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
dot ::= .
left_paren ::= (
right_paren ::= )
left_bracket ::= [
right_bracket ::= ]

Language Notes:

1. Query language keywords (for example, SELECT, NULL, DATE) are

case-insensitive; identifiers are case-sensitive.

2. Comment lines begin with --

3. Comment blocks are delimited by /* and */

4. String literals are delimited by single-quotes; embedded single-quotes

are doubled. Examples:

'Hello' value = Hello

'He said, ''Hello''' value = He said, 'Hello'

5. Character literals begin with the CHAR keyword followed by the character

in single quotation marks. The single-quotation mark character itself is

represented as CHAR ' ' ' ' (with four single quotation marks).

6. In the TIMESTAMP literal, there is amaximum of nine digits after the

decimal point.

Skip navigation links