MarkLogic Server 11.0 Product Documentation
cts.fieldValueCoOccurrences
cts.fieldValueCoOccurrences(
field-name-1 as String,
field-name-2 as String,
[options as String[]],
[query as cts.query?],
[quality-weight as Number?],
[forest-ids as (Number|String)[]]
) as Sequence
Summary
Returns value co-occurrences (that is, pairs of values, both of which appear
in the same fragment) from the specified field value lexicon(s). The
values are returned as
an ArrayNode with two children, each child
containing one of the co-occurring values. You can use
cts.frequency
on each item returned to find how many times the pair occurs.
Value lexicons are implemented using range indexes; consequently
this function requires a field range index for each field specified
in the function. If there is not a range index configured for each
of the specified fields, an exception is thrown.
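The relationship between the returned pairs and cts.frequency can be sketched as follows. This is a minimal illustration, not part of the product API: the helper name pairsWithFrequency and its api parameter are hypothetical. Inside MarkLogic you would pass the global cts object; the stub below exists only so the sketch can run outside the server, and its data are made up.

```javascript
// Hypothetical helper: collect each co-occurring pair with its frequency.
// `api` is assumed to expose fieldValueCoOccurrences and frequency, as the
// MarkLogic `cts` global does. It is injected so the sketch can be run
// against a stub.
function pairsWithFrequency(api, fieldName1, fieldName2, options) {
  const rows = [];
  for (const pair of api.fieldValueCoOccurrences(fieldName1, fieldName2, options)) {
    // Inside the server each item is an ArrayNode with two children.
    // Assumption: convert it to a plain array with toObject() when available.
    const values = typeof pair.toObject === "function" ? pair.toObject() : pair;
    rows.push({
      first: values[0],
      second: values[1],
      frequency: api.frequency(pair), // frequency is read from the original item
    });
  }
  return rows;
}

// Stub standing in for `cts`; the values and frequencies are illustrative only.
const frequencyStubCts = {
  fieldValueCoOccurrences: () => [
    ["John Griffith", "Will Shields"],
    ["Will Shields", "John Griffith"],
  ],
  frequency: () => 1,
};
const exampleRows = pairsWithFrequency(frequencyStubCts, "aname1", "aname2", ["item-order"]);
```

In a server-side module the equivalent call would be pairsWithFrequency(cts, "aname1", "aname2", ["item-order"]), assuming both fields have field range indexes.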
Parameters |
field-name-1 |
A string: the name of the first field. The field must have a field range index.
|
field-name-2 |
A string: the name of the second field. The field must have a field range index.
|
options |
Options. The default is ().
Options include:
- "ascending"
- Co-occurrences should be returned in ascending order.
- "descending"
- Co-occurrences should be returned in descending order.
- "any"
- Co-occurrences from any fragment should be included.
- "document"
- Co-occurrences from document fragments should be included.
- "properties"
- Co-occurrences from properties fragments should be included.
- "locks"
- Co-occurrences from locks fragments should be included.
- "frequency-order"
- Co-occurrences should be returned ordered by frequency.
- "item-order"
- Co-occurrences should be returned ordered by item.
- "fragment-frequency"
- Frequency should be the number of fragments with
an included co-occurrence.
This option is used with
cts.frequency.
- "item-frequency"
- Frequency should be the number of occurrences of
an included co-occurrence.
This option is used with
cts.frequency.
- "type=type"
- For both lexicons, use the type specified by type
(int, unsignedInt, long, unsignedLong, float, double, decimal,
dateTime, time, date, gYearMonth, gYear, gMonth, gDay,
yearMonthDuration, dayTimeDuration, string, or anyURI)
- "type-1=type"
- For the first lexicon, use the type specified by type
(int, unsignedInt, long, unsignedLong, float, double, decimal,
dateTime, time, date, gYearMonth, gYear, gMonth, gDay,
yearMonthDuration, dayTimeDuration, string, or anyURI)
- "type-2=type"
- For the second lexicon, use the type specified by type
(int, unsignedInt, long, unsignedLong, float, double, decimal,
dateTime, time, date, gYearMonth, gYear, gMonth, gDay,
yearMonthDuration, dayTimeDuration, string, or anyURI)
- "collation=URI"
- For both lexicons, use the collation specified by
URI.
- "collation-1=URI"
- For the first lexicon, use the collation specified by
URI.
- "collation-2=URI"
- For the second lexicon, use the collation specified by
URI.
- "timezone=TZ"
- Return timezone sensitive values (dateTime, time, date,
gYearMonth, gYear, gMonth, and gDay) adjusted to the timezone
specified by TZ.
Example timezones: Z, -08:00, +01:00.
- "ordered"
- Include co-occurrences only when the value from the first lexicon
appears before the value from the second lexicon.
Requires that word positions be enabled for both lexicons.
- "proximity=N"
- Include co-occurrences only when the values appear within
N words of each other.
Requires that word positions be enabled for both lexicons.
- "limit=N"
- Return no more than N co-occurrences. You should not
use this option with the "skip" option. Use "truncate" instead.
- "skip=N"
- Skip over fragments selected by the
cts:query
to treat the Nth fragment as the first fragment.
Co-occurrences from skipped fragments are not included.
This option affects the number of fragments selected
by the cts:query to calculate frequencies.
Only applies when a $query parameter is specified.
- "sample=N"
- Return only co-occurrences from the first N
fragments after skip selected by the
cts:query.
This option does not affect the number of fragments selected
by the cts:query to calculate frequencies.
Only applies when a $query parameter is specified.
- "truncate=N"
- Include only co-occurrences from the first N
fragments after skip selected by the
cts:query.
This option also affects the number of fragments selected
by the cts:query to calculate frequencies.
Only applies when a $query parameter is specified.
- "score-logtfidf"
- Compute scores using the logtfidf method.
Only applies when a
$query parameter is specified.
- "score-logtf"
- Compute scores using the logtf method.
Only applies when a
$query parameter is specified.
- "score-simple"
- Compute scores using the simple method.
Only applies when a
$query parameter is specified.
- "score-random"
- Compute scores using the random method.
Only applies when a
$query parameter is specified.
- "score-zero"
- Compute all scores as zero.
Only applies when a
$query parameter is specified.
- "checked"
- Word positions should be checked when resolving the query.
- "unchecked"
- Word positions should not be checked when resolving the query.
- "too-many-positions-error"
- If too much memory is needed to perform positions calculations
to check whether a document matches a query,
return an XDMP-TOOMANYPOSITIONS error,
instead of accepting the document as a match.
- "eager"
- Perform most of the work concurrently before returning
the first item from the indexes, and only some of the work
sequentially while iterating through the rest of the items.
This usually takes the shortest time for a complete item-order
result or for any frequency-order result.
- "lazy"
- Perform only some of the work concurrently before returning
the first item from the indexes, and most of the work
sequentially while iterating through the rest of the items.
This usually takes the shortest time for a small item-order
partial result.
- "concurrent"
- Perform the work concurrently in another thread. This is a hint
to the query optimizer to help parallelize the lexicon work, allowing
the calling query to continue performing other work while the lexicon
processing occurs. This is especially useful in cases where multiple
lexicon calls occur in the same query (for example, resolving many
facets in a single query).
- "map"
- Return results as
a JavaScript Object instead of as
a Sequence.
- "coordinate-system=name"
- Use the lexicon that is configured with the specified coordinate
system. Allowed values: "wgs84", "wgs84/double", "raw",
"raw/double". Only applicable if the lexicon value type is
point or long-lat-point.
- "precision=value"
- Use the lexicon that is configured with the specified precision.
Allowed values:
float and double.
Only applicable if the lexicon value type is point or
long-lat-point. This value takes precedence over the
precision implicit in the coordinate system name.
|
query |
Only include co-occurrences in fragments selected by the cts:query,
and compute frequencies from this set of included co-occurrences.
The co-occurrences do not need to match the query, but they must occur in
fragments selected by the query.
The fragments are not filtered to ensure they match the query,
but instead selected in the same manner as
"unfiltered" cts.search
operations. If a string
is entered, the string is treated as a cts:word-query of the
specified string.
|
quality-weight |
A document quality weight to use when computing scores.
The default is 1.0.
|
forest-ids |
A sequence of IDs of forests to which the search will be constrained.
An empty sequence means to search all forests in the database.
The default is ().
|
Usage Notes
Only one of "frequency-order" or "item-order" may be specified
in the options parameter. If neither "frequency-order" nor "item-order"
is specified, then the default is "item-order".
Only one of "fragment-frequency" or "item-frequency" may be specified
in the options parameter. If neither "fragment-frequency" nor
"item-frequency" is specified, then the default is "fragment-frequency".
Only one of "ascending" or "descending" may be specified
in the options parameter. If neither "ascending" nor "descending"
is specified, then the default is "ascending" if "item-order" is
specified, and "descending" if "frequency-order" is specified.
Only one of "eager" or "lazy" may be specified
in the options parameter. If neither "eager" nor "lazy"
is specified, then the default is "eager" if "frequency-order" or "map"
is specified, otherwise "lazy".
Only one of "any", "document", "properties", or "locks"
may be specified in the options parameter.
If none of "any", "document", "properties", or "locks" are specified
and there is a $query parameter, then the default is "document".
If there is no $query parameter then the default is "any".
Only one of the "score-logtfidf", "score-logtf", "score-simple",
"score-random", or "score-zero" options may be specified in the options
parameter.
If none of "score-logtfidf", "score-logtf", "score-simple", "score-random",
or "score-zero" are specified, then the default is "score-logtfidf".
Only one of the "checked" or "unchecked" options may be specified
in the options parameter.
If neither "checked" nor "unchecked" is specified,
then the default is "checked".
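The exclusivity rules and defaults described so far can be summarized in a small sketch. The helper below is hypothetical (it is not a MarkLogic function) and only mirrors the rules stated in these usage notes: it rejects option lists that name two members of the same group and reports which choice would apply by default.

```javascript
// Mutually exclusive option groups, as listed in the usage notes.
const OPTION_GROUPS = {
  order: ["frequency-order", "item-order"],
  frequency: ["fragment-frequency", "item-frequency"],
  direction: ["ascending", "descending"],
  evaluation: ["eager", "lazy"],
  fragments: ["any", "document", "properties", "locks"],
  scoring: ["score-logtfidf", "score-logtf", "score-simple", "score-random", "score-zero"],
  positions: ["checked", "unchecked"],
};

// Hypothetical helper: return the effective choice for each group.
// `options` is the array passed as the options parameter; `hasQuery` says
// whether a query parameter is supplied.
function resolveOptionDefaults(options, hasQuery) {
  const chosen = {};
  for (const [group, names] of Object.entries(OPTION_GROUPS)) {
    const present = names.filter((name) => options.includes(name));
    if (present.length > 1) {
      throw new Error(`Only one of ${names.join(", ")} may be specified`);
    }
    chosen[group] = present[0]; // undefined when the caller named none
  }
  const order = chosen.order || "item-order";
  return {
    order,
    frequency: chosen.frequency || "fragment-frequency",
    // Direction default depends on the ordering in effect.
    direction: chosen.direction || (order === "item-order" ? "ascending" : "descending"),
    // "eager" is the default with "frequency-order" or "map", otherwise "lazy".
    evaluation: chosen.evaluation ||
      (order === "frequency-order" || options.includes("map") ? "eager" : "lazy"),
    // Fragment scope default depends on whether a query is present.
    fragments: chosen.fragments || (hasQuery ? "document" : "any"),
    scoring: chosen.scoring || "score-logtfidf",
    positions: chosen.positions || "checked",
  };
}

const effectiveOptions = resolveOptionDefaults(["frequency-order", "map"], true);
```

The server applies these rules itself; a helper like this is only useful for checking an options array before making the call.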
If "collation=URI" is not specified in the options parameter,
then the default collation is used. If a lexicon with that collation
does not exist, an error is thrown.
If "sample=N" is not specified in the options parameter,
then all included co-occurrences may be returned.
If a $query
parameter
is not present, then "sample=N" has no effect.
If "truncate=N" is not specified in the options parameter,
then co-occurrences from all fragments selected by the
$query
parameter are included.
If a $query
parameter is not present, then
"truncate=N" has no effect.
To incrementally fetch a subset of the co-occurrences returned by this
function, use
fn.subsequence
on the output, rather than
the "skip" option. The "skip" option is based on fragments matching the
query
parameter (if present), not on values. A fragment
matched by query might contain multiple occurrences or no occurrences.
The number of fragments skipped does not correspond to the number of
values. Also, the skip is applied to the relevance ordered query matches,
not to the ordered co-occurrences list.
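Paging by position in the result, as recommended above, might look like the following sketch. The helper name coOccurrencePage and its parameters are hypothetical. The ctsApi and fnApi arguments stand for the MarkLogic cts and fn globals and are injected only so the sketch can run against stubs outside the server.

```javascript
// Hypothetical helper: fetch one page of co-occurrences by position in the
// ordered result, not by fragment.
function coOccurrencePage(ctsApi, fnApi, fieldName1, fieldName2, pageNumber, pageSize) {
  const all = ctsApi.fieldValueCoOccurrences(fieldName1, fieldName2, ["item-order"]);
  // fn.subsequence positions are 1-based.
  const start = (pageNumber - 1) * pageSize + 1;
  return fnApi.subsequence(all, start, pageSize);
}

// Stubs for running outside the server; the values are illustrative only.
const pagingStubCts = {
  fieldValueCoOccurrences: () => [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"], ["c", "x"]],
};
const pagingStubFn = {
  subsequence: (items, start, length) => items.slice(start - 1, start - 1 + length),
};
const secondPage = Array.from(
  coOccurrencePage(pagingStubCts, pagingStubFn, "aname1", "aname2", 2, 2)
);
```

In a server-side module the call would be coOccurrencePage(cts, fn, "aname1", "aname2", 2, 2). Because the page is cut from the ordered co-occurrence list, page boundaries correspond to values rather than to fragments.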
When using the "skip" option, use the "truncate" option rather than
the "limit" option to control the number of matching fragments from which
to draw values.
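As a small sketch of the recommendation above, the hypothetical helper below builds an options array that pairs "skip" with "truncate" and refuses a "limit" option. The helper name and its parameters are illustrative assumptions; both options only take effect when a query parameter is also passed to the function.

```javascript
// Hypothetical helper: build options that window the fragments selected by
// the query, pairing "skip" with "truncate" rather than "limit".
function fragmentWindowOptions(skip, truncate, extraOptions = []) {
  if (extraOptions.some((option) => option.startsWith("limit="))) {
    throw new Error('Use "truncate" rather than "limit" together with "skip"');
  }
  return [...extraOptions, `skip=${skip}`, `truncate=${truncate}`];
}

const windowOptions = fragmentWindowOptions(100, 50, ["item-order"]);
```

The resulting array would be passed as the options argument, together with a query.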
Example
******
Suppose we insert these two documents in the database.
Document 1:
<doc>
<name1>
<i11>John</i11><e12>Smith</e12><i13>Griffith</i13>
</name1>
<name2>
<i21>Will</i21><e22>Tim</e22><i23>Shields</i23>
</name2>
</doc>
Document 2:
<doc>
<name1>
<i11>Will<e12>Frank</e12>Shields</i11>
</name1>
<name2>
<i21>John<e22>Tim</e22>Griffith</i21>
</name2>
</doc>
*******
// Now suppose we have two fields aname1 and aname2 defined on the database.
// The field aname1 includes element "name1" and excludes "e12".
// The field aname2 includes element "name2" and excludes "e22".
// Both fields have field range indexes configured with positions ON.
cts.fieldValueCoOccurrences("aname1","aname2");
=>
["John Griffith", "Will Shields"]
["Will Shields", "John Griffith"]
Copyright © 2024 MarkLogic Corporation. MARKLOGIC is a
registered trademark of MarkLogic Corporation.