
MarkLogic 12 Product Documentation
cts.wordQuery
cts.wordQuery(
text as String[],
[options as String[]],
[weight as Number?]
) as cts.wordQuery
Summary
Returns a query matching text content containing a given phrase.
Parameters
text
Some words or phrases to match.
When multiple strings are specified,
the query matches if any string matches.
options
Options to this query. The default is ().
Options include:
- "case-sensitive"
- A case-sensitive query.
- "case-insensitive"
- A case-insensitive query.
- "diacritic-sensitive"
- A diacritic-sensitive query.
- "diacritic-insensitive"
- A diacritic-insensitive query.
- "punctuation-sensitive"
- A punctuation-sensitive query.
- "punctuation-insensitive"
- A punctuation-insensitive query.
- "whitespace-sensitive"
- A whitespace-sensitive query.
- "whitespace-insensitive"
- A whitespace-insensitive query.
- "stemmed"
- A stemmed query.
- "unstemmed"
- An unstemmed query.
- "wildcarded"
- A wildcarded query.
- "unwildcarded"
- An unwildcarded query.
- "exact"
- An exact match query. Shorthand for "case-sensitive",
"diacritic-sensitive", "punctuation-sensitive",
"whitespace-sensitive", "unstemmed", and "unwildcarded".
- "lang=iso639code"
- Specifies the language of the query. The iso639code
portion is case-insensitive, and uses the languages
specified by
ISO 639.
The default is specified in the database configuration.
- "distance-weight=number"
- A weight applied based on the minimum distance between matches
of this query. Higher weights add to the importance of
proximity (as opposed to term matches) when the relevance order is
calculated.
The default value is 0.0 (no impact of proximity). The
weight should be between 64 and -16.
Weights greater than 64 will have the same effect as a
weight of 64.
This parameter has no effect if the
word positions
index is not enabled. This parameter has no effect on searches that
use score-simple, score-random, or score-zero (because those scoring
algorithms do not consider term frequency, proximity is irrelevant).
- "min-occurs=number"
- Specifies the minimum number of occurrences required. If
fewer than this number of words occur, the fragment does not match.
The default is 1.
- "max-occurs=number"
- Specifies the maximum number of occurrences allowed. If
more than this number of words occur, the fragment does not match.
The default is unbounded.
- "synonym"
- Specifies that all of the terms in the $text parameter are
considered synonyms for scoring purposes. The result is that
occurrences of more than one of the synonyms are scored as if
there are more occurrences of the same term (as opposed to
having a separate term that contributes to score).
- "lexicon-expand=value"
- The value is one of full, prefix-postfix, off, or
heuristic (the default is heuristic).
An option with a value of lexicon-expand=full
specifies that wildcards are resolved by expanding the pattern to
words in a lexicon (if there is one available), and turning the query
into a series of cts:word-queries, even if this takes a long
time to evaluate.
An option with a value of lexicon-expand=prefix-postfix
specifies that wildcards are resolved by expanding the pattern to the
pre- and postfixes of the words in the word lexicon (if there is one),
and turning the query into a series of character queries, even if it
takes a long time to evaluate.
An option with a value of lexicon-expand=off
specifies that wildcards are only resolved by looking up character
patterns in the search pattern index, not in the lexicon.
An option with a value of lexicon-expand=heuristic,
which is the default, specifies that wildcards are resolved by using
a series of internal rules, such as estimating the number of lexicon
entries that need to be scanned, seeing if the estimate crosses
certain thresholds, and (if appropriate), using another way besides
lexicon expansion to resolve the query.
- "lexicon-expansion-limit=number"
- Specifies the limit for lexicon expansion. This restricts
the number of lexicon expansions that can be performed. If the limit is
exceeded, the server may raise an error, depending on whether the "limit-check"
option is set. The default value is 4096.
- "limit-check"
- Specifies that an error will be raised if the lexicon expansion
exceeds the specified limit.
- "no-limit-check"
- Specifies that an error will not be raised if the lexicon expansion
exceeds the specified limit. The server will try to resolve the wildcard.
"no-limit-check" is the default if neither "limit-check" nor "no-limit-check"
is explicitly specified.
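As a rough illustration of how the option strings above fit together, the following sketch expands the "exact" shorthand and builds "name=value" options from a plain object. The helpers expandExact and valueOptions are hypothetical names introduced here for illustration; they are not part of the MarkLogic API. The final call only runs where the cts object is available.

```javascript
// Sketch only: these helpers are hypothetical, not part of the MarkLogic API.

// The six options that the documentation lists as the meaning of "exact".
const EXACT_EXPANSION = [
  "case-sensitive",
  "diacritic-sensitive",
  "punctuation-sensitive",
  "whitespace-sensitive",
  "unstemmed",
  "unwildcarded",
];

// Replace every "exact" entry with the options it abbreviates.
function expandExact(options) {
  return options.flatMap((opt) => (opt === "exact" ? EXACT_EXPANSION : [opt]));
}

// Build "name=value" option strings from a plain object.
function valueOptions(settings) {
  return Object.entries(settings).map(([name, value]) => `${name}=${value}`);
}

const options = [
  ...expandExact(["exact"]),
  ...valueOptions({ lang: "en", "min-occurs": 2, "distance-weight": 10 }),
];

// Inside MarkLogic Server-Side JavaScript the array can be passed directly;
// elsewhere `cts` is undefined and this call is skipped.
if (typeof cts !== "undefined") {
  cts.search(cts.wordQuery("MarkLogic Corporation", options));
}
```

Passing "exact" itself to cts.wordQuery should be equivalent to passing the expanded list; the expansion is spelled out here only to make the shorthand visible.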
weight
A weight for this query.
Higher weights move search results up in the relevance
order. The default is 1.0. The
weight should be between 64 and -16.
Weights greater than 64 will have the same effect as a
weight of 64.
Weights with an absolute value less than 0.0625 (between -0.0625 and
0.0625) are rounded to 0, which means that they do not contribute to the
score.
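The weight rules above can be modelled with a small function. This is a sketch of the documented behavior only, not the server's implementation; effectiveWeight is a hypothetical name, and values below -16 are passed through unchanged because the documentation does not say how they are handled.

```javascript
// Hypothetical helper modelling the documented weight rules; not a MarkLogic API.
function effectiveWeight(weight = 1.0) {
  // Absolute values below 0.0625 are rounded to 0 and do not affect the score.
  if (Math.abs(weight) < 0.0625) return 0;
  // Weights above 64 have the same effect as a weight of 64.
  if (weight > 64) return 64;
  // Assumption: every other value, including values below -16, is returned
  // unchanged because the documentation does not describe that case.
  return weight;
}
```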
Usage Notes
If neither "case-sensitive" nor "case-insensitive"
is present, $text is used to determine case sensitivity.
If $text contains no uppercase, it specifies "case-insensitive".
If $text contains uppercase, it specifies "case-sensitive".
If neither "diacritic-sensitive" nor "diacritic-insensitive"
is present, $text is used to determine diacritic sensitivity.
If $text contains no diacritics, it specifies "diacritic-insensitive".
If $text contains diacritics, it specifies "diacritic-sensitive".
If neither "punctuation-sensitive" nor "punctuation-insensitive"
is present, $text is used to determine punctuation sensitivity.
If $text contains no punctuation, it specifies "punctuation-insensitive".
If $text contains punctuation, it specifies "punctuation-sensitive".
If neither "whitespace-sensitive" nor "whitespace-insensitive"
is present, the query is "whitespace-insensitive".
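The defaulting rules for case, diacritic, punctuation, and whitespace sensitivity can be sketched as follows. The function inferSensitivity is hypothetical, it handles a single text string, and it does not expand the "exact" shorthand. The character classes it uses (Unicode uppercase letters, combining marks after NFD normalization, Unicode punctuation) are assumptions, since the documentation does not define these terms precisely.

```javascript
// Hypothetical helper; the character classes below are assumptions about what
// counts as "uppercase", "diacritics", and "punctuation".
function inferSensitivity(text, options = []) {
  const has = (name) => options.includes(name);

  // Use the explicit option if one is present, otherwise derive it from the text.
  const pick = (base, textIsSensitive) => {
    if (has(`${base}-sensitive`)) return `${base}-sensitive`;
    if (has(`${base}-insensitive`)) return `${base}-insensitive`;
    return textIsSensitive ? `${base}-sensitive` : `${base}-insensitive`;
  };

  return [
    pick("case", /\p{Lu}/u.test(text)),
    // Decompose accented letters so their combining marks can be detected.
    pick("diacritic", /\p{M}/u.test(text.normalize("NFD"))),
    pick("punctuation", /\p{P}/u.test(text)),
    // Whitespace never depends on the text; the default is insensitive.
    pick("whitespace", false),
  ];
}
```

Under these assumptions, "MarkLogic Corporation" defaults to a case-sensitive query, while "to be, or not to be" defaults to a case-insensitive but punctuation-sensitive one.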
If neither "wildcarded" nor "unwildcarded"
is present, the database configuration and $text determine wildcarding.
If the database has any wildcard indexes enabled ("three character
searches", "two character searches", "one character searches", or
"trailing wildcard searches") and if $text contains either of the
wildcard characters '?' or '*', it specifies "wildcarded".
Otherwise it specifies "unwildcarded".
If neither "stemmed" nor "unstemmed"
is present, the database configuration determines stemming.
If the database has "stemmed searches" enabled, it specifies "stemmed".
Otherwise it specifies "unstemmed".
If the query is a wildcarded query and also a phrase query
(contains two or more terms), the wildcard terms in the query
are unstemmed.
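The wildcarding and stemming defaults can be sketched in the same way. Everything here is illustrative: inferWildcardAndStemming is a hypothetical name, the db object with its anyWildcardIndex and stemmedSearches flags is an invented stand-in for the database configuration, and splitting the text on whitespace is only an approximation of how the server tokenizes a phrase.

```javascript
// Hypothetical helper; `db` is an illustrative object, not a MarkLogic structure.
function inferWildcardAndStemming(text, options, db) {
  const has = (name) => options.includes(name);
  const hasWildcardChar = (s) => /[?*]/.test(s);

  let wildcarded;
  if (has("wildcarded")) wildcarded = true;
  else if (has("unwildcarded")) wildcarded = false;
  // Default: needs a wildcard index and a '?' or '*' in the text.
  else wildcarded = Boolean(db.anyWildcardIndex) && hasWildcardChar(text);

  let stemmed;
  if (has("stemmed")) stemmed = true;
  else if (has("unstemmed")) stemmed = false;
  // Default: follows the database "stemmed searches" setting.
  else stemmed = Boolean(db.stemmedSearches);

  // Assumption: whitespace splitting approximates the terms of the phrase.
  const terms = text.trim().split(/\s+/);
  // In a wildcarded phrase query (two or more terms), wildcard terms are unstemmed.
  const unstemmedTerms =
    wildcarded && terms.length >= 2 ? terms.filter(hasWildcardChar) : [];

  return { wildcarded, stemmed, unstemmedTerms };
}
```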
Negative "min-occurs" or "max-occurs" values will be treated as 0 and
non-integral values will be rounded down. An error will be raised if
the "min-occurs" value is greater than the "max-occurs" value.
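The normalization of "min-occurs" and "max-occurs" described above can be sketched as a small function. This models the documented rules only; normalizeOccurs is a hypothetical name, and the RangeError stands in for whatever error the server actually raises.

```javascript
// Hypothetical helper modelling the documented occurrence rules.
function normalizeOccurs(minOccurs = 1, maxOccurs = Infinity) {
  // Non-integral values are rounded down; negative values are treated as 0.
  const clean = (n) => Math.max(0, Math.floor(n));
  const min = clean(minOccurs);
  const max = clean(maxOccurs);
  if (min > max) {
    // Assumption: the error type is illustrative, not the server's error code.
    throw new RangeError(`min-occurs (${min}) is greater than max-occurs (${max})`);
  }
  return { min, max };
}
```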
Relevance adjustment for the "distance-weight" option depends on
the closest proximity of any two matches of the query. For example,
cts.wordQuery(["dog","cat"], ["distance-weight=10"])
will adjust relevance based on the distance between the closest pair of
matches of either "dog" or "cat" (the pair may consist only of matches of
"dog", only of matches of "cat", or a match of "dog" and a match of "cat").
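To make the "closest pair" idea concrete, the following sketch finds the smallest gap between any two match positions, regardless of which term each match belongs to. The function closestPairDistance and its input shape (a map from term to word positions) are hypothetical, and measuring distance as a difference of word positions is an assumption; how the server turns this distance into a relevance adjustment is not shown.

```javascript
// Hypothetical helper: smallest gap between any two matches of the query.
function closestPairDistance(positionsByTerm) {
  // Pool the positions of all terms, since the pair may mix or repeat terms.
  const all = Object.values(positionsByTerm)
    .flat()
    .sort((a, b) => a - b);

  let best = Infinity; // stays Infinity when there are fewer than two matches
  for (let i = 1; i < all.length; i++) {
    best = Math.min(best, all[i] - all[i - 1]);
  }
  return best;
}

// "dog" at word positions 3 and 40, "cat" at 10 and 12:
// the closest pair is the two "cat" matches, 2 positions apart.
const distance = closestPairDistance({ dog: [3, 40], cat: [10, 12] });
```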
Example
cts.search(
cts.wordQuery("MarkLogic Corporation"));
=> .. A Sequence containing relevance-ordered
documents containing the phrase 'MarkLogic Corporation'.
Example
cts.search(
cts.wordQuery("MarkLogic Corporation",
["case-insensitive"]));
=> .. A Sequence containing relevance-ordered
documents containing the phrase 'MarkLogic Corporation'
or any other case-shift like 'marklogic corporation',
'MARKLOGIC Corporation', etc.
Example
cts.search(
cts.wordQuery("to be, or not to be",
["punctuation-insensitive"]));
=> .. A Sequence containing relevance-ordered
documents containing the phrase 'to be, or not to be',
ignoring punctuation.
Example
// the following query uses the "synonym" option to make the
// terms "cat" and "kitty" treated the same for scoring
// purposes
cts.search(
cts.wordQuery(["cat", "kitty"], ["synonym"]) );
=> Returns a Sequence containing relevance-ordered
documents containing at least one of the specified terms,
where the words "cat" and "kitty" are treated, for scoring
purposes, as if they are both the word "cat". Without
the synonym option, there would be one contribution to
the score from "cat" matches and one from "kitty" matches.
Copyright © 2025 MarkLogic Corporation. MARKLOGIC is a
registered trademark of MarkLogic Corporation.