cts.elementValueCoOccurrences

cts.elementValueCoOccurrences(
   $element-name-1 as xs.QName,
   $element-name-2 as xs.QName,
   [$options as String[]],
   [$query as cts.query?],
   [$quality-weight as Number?],
   [$forest-ids as String[]]
) as ValueIterator

Summary

Returns value co-occurrences (that is, pairs of values, both of which appear in the same fragment) from the specified element value lexicon(s). The values are returned as an ArrayNode with two children, each child containing one of the co-occurring values. You can use cts.frequency on each item returned to find how many times the pair occurs. Value lexicons are implemented using range indexes; consequently this function requires an element range index for each element specified in the function. If there is not a range index configured for each of the specified elements, an exception is thrown.
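
As a rough illustration of the terms in the summary, the following plain JavaScript sketch models what a co-occurrence and its fragment frequency mean. It does not call MarkLogic and is not how the server computes results. The function name coOccurrences, the first and second properties, and the sample data are all hypothetical.

```javascript
// Conceptual model only: plain JavaScript, not MarkLogic's implementation.
// Each fragment lists the values found for the first and second element.
function coOccurrences(fragments) {
  const counts = new Map();
  for (const fragment of fragments) {
    const seen = new Set(); // count each pair at most once per fragment
    for (const a of fragment.first) {
      for (const b of fragment.second) {
        const key = JSON.stringify([a, b]);
        if (!seen.has(key)) {
          seen.add(key);
          counts.set(key, (counts.get(key) || 0) + 1);
        }
      }
    }
  }
  // Each result is a two-item pair plus the number of fragments containing it.
  return Array.from(counts, ([key, frequency]) => ({
    pair: JSON.parse(key),
    frequency: frequency
  }));
}

// Hypothetical sample data: three fragments.
const sampleFragments = [
  { first: ["red"], second: ["apple"] },
  { first: ["red"], second: ["apple", "cherry"] },
  { first: ["green"], second: ["apple"] }
];
const pairs = coOccurrences(sampleFragments);
console.log(pairs);
```

In this model the pair ["red", "apple"] has a frequency of 2 because it appears in two fragments, which corresponds to the "fragment-frequency" notion described in the options.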

Parameters
$element-name-1 An element QName.
$element-name-2 An element QName.
$options Options. The default is ().

Options include:

"ascending"
Co-occurrences should be returned in ascending order.
"descending"
Co-occurrences should be returned in descending order.
"any"
Co-occurrences from any fragment should be included.
"document"
Co-occurrences from document fragments should be included.
"properties"
Co-occurrences from properties fragments should be included.
"locks"
Co-occurrences from locks fragments should be included.
"frequency-order"
Co-occurrences should be returned ordered by frequency.
"item-order"
Co-occurrences should be returned ordered by item.
"fragment-frequency"
Frequency should be the number of fragments with an included co-occurrence. This option is used with cts.frequency.
"item-frequency"
Frequency should be the number of occurrences of an included co-occurrence. This option is used with cts.frequency.
"type=type"
For both lexicons, use the type specified by type (int, unsignedInt, long, unsignedLong, float, double, decimal, dateTime, time, date, gYearMonth, gYear, gMonth, gDay, yearMonthDuration, dayTimeDuration, string, or anyURI)
"type-1=type"
For the first lexicon, use the type specified by type (int, unsignedInt, long, unsignedLong, float, double, decimal, dateTime, time, date, gYearMonth, gYear, gMonth, gDay, yearMonthDuration, dayTimeDuration, string, or anyURI)
"type-2=type"
For the second lexicon, use the type specified by type (int, unsignedInt, long, unsignedLong, float, double, decimal, dateTime, time, date, gYearMonth, gYear, gMonth, gDay, yearMonthDuration, dayTimeDuration, string, or anyURI)
"collation=URI"
For both lexicons, use the collation specified by URI.
"collation-1=URI"
For the first lexicon, use the collation specified by URI.
"collation-2=URI"
For the second lexicon, use the collation specified by URI.
"timezone=TZ"
Return timezone sensitive values (dateTime, time, date, gYearMonth, gYear, gMonth, and gDay) adjusted to the timezone specified by TZ. Example timezones: Z, -08:00, +01:00.
"ordered"
Include co-occurrences only when the value from the first lexicon appears before the value from the second lexicon. Requires that word positions be enabled for both lexicons.
"proximity=N"
Include co-occurrences only when the values appear within N words of each other. Requires that word positions be enabled for both lexicons.
"limit=N"
Return no more than N co-occurrences.
"skip=N"
Skip over fragments selected by the cts:query to treat the Nth fragment as the first fragment. Co-occurrences from skipped fragments are not included. This option affects the number of fragments selected by the cts:query to calculate frequencies. Only applies when a $query parameter is specified.
"sample=N"
Return only co-occurrences from the first N fragments after skip selected by the cts:query. This option does not affect the number of fragments selected by the cts:query to calculate frequencies. Only applies when a $query parameter is specified.
"truncate=N"
Include only co-occurrences from the first N fragments after skip selected by the cts:query. This option also affects the number of fragments selected by the cts:query to calculate frequencies. Only applies when a $query parameter is specified.
"score-logtfidf"
Compute scores using the logtfidf method. Only applies when a $query parameter is specified.
"score-logtf"
Compute scores using the logtf method. Only applies when a $query parameter is specified.
"score-simple"
Compute scores using the simple method. Only applies when a $query parameter is specified.
"score-random"
Compute scores using the random method. Only applies when a $query parameter is specified.
"score-zero"
Compute all scores as zero. Only applies when a $query parameter is specified.
"checked"
Word positions should be checked when resolving the query.
"unchecked"
Word positions should not be checked when resolving the query.
"too-many-positions-error"
If too much memory is needed to perform positions calculations to check whether a document matches a query, return an XDMP-TOOMANYPOSITIONS error, instead of accepting the document as a match.
"eager"
Perform most of the work concurrently before returning the first item from the indexes, and only some of the work sequentially while iterating through the rest of the items. This usually takes the shortest time for a complete item-order result or for any frequency-order result.
"lazy"
Perform only some of the work concurrently before returning the first item from the indexes, and most of the work sequentially while iterating through the rest of the items. This usually takes the shortest time for a small item-order partial result.
"concurrent"
Perform the work concurrently in another thread. This is a hint to the query optimizer to help parallelize the lexicon work, allowing the calling query to continue performing other work while the lexicon processing occurs. This is especially useful in cases where multiple lexicon calls occur in the same query (for example, resolving many facets in a single query).
"map"
Return results as a JavaScript Object instead of as a ValueIterator.
$query Only include co-occurrences in fragments selected by the cts:query, and compute frequencies from this set of included co-occurrences. The co-occurrences do not need to match the query, but they must occur in fragments selected by the query. The fragments are not filtered to ensure they match the query, but instead selected in the same manner as "unfiltered" cts.search operations. If a string is entered, the string is treated as a cts:word-query of the specified string.
$quality-weight A document quality weight to use when computing scores. The default is 1.0.
$forest-ids A sequence of IDs of forests to which the search will be constrained. An empty sequence means to search all forests in the database. The default is ().

Usage Notes

Only one of "frequency-order" or "item-order" may be specified in the options parameter. If neither "frequency-order" nor "item-order" is specified, then the default is "item-order".

Only one of "fragment-frequency" or "item-frequency" may be specified in the options parameter. If neither "fragment-frequency" nor "item-frequency" is specified, then the default is "fragment-frequency".

Only one of "ascending" or "descending" may be specified in the options parameter. If neither "ascending" nor "descending" is specified, then the default is "ascending" if "item-order" is specified, and "descending" if "frequency-order" is specified.

Only one of "eager" or "lazy" may be specified in the options parameter. If neither "eager" nor "lazy" is specified, then the default is "eager" if "frequency-order" or "map" is specified, otherwise "lazy".

Only one of "any", "document", "properties", or "locks" may be specified in the options parameter. If none of "any", "document", "properties", or "locks" are specified and there is a $query parameter, then the default is "document". If there is no $query parameter then the default is "any".

Only one of the "score-logtfidf", "score-logtf", "score-simple", "score-random", or "score-zero" options may be specified in the options parameter. If none of "score-logtfidf", "score-logtf", "score-simple", "score-random", or "score-zero" are specified, then the default is "score-logtfidf".

Only one of the "checked" or "unchecked" options may be specified in the options parameter. If neither "checked" nor "unchecked" is specified, then the default is "checked".
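
The defaulting rules in the notes above can be easier to follow when written out. The sketch below is a hypothetical helper in plain JavaScript (resolveDefaults is not part of the cts API) that takes an options array and reports the effective settings according to those notes. It only restates the documented rules and does not query the server.

```javascript
// Hypothetical helper, not part of the cts API: applies the documented
// defaults to an options array so the effective settings are explicit.
function resolveDefaults(options, hasQuery) {
  const groups = {
    order: ["frequency-order", "item-order"],
    frequency: ["fragment-frequency", "item-frequency"],
    direction: ["ascending", "descending"],
    evaluation: ["eager", "lazy"],
    scope: ["any", "document", "properties", "locks"],
    score: ["score-logtfidf", "score-logtf", "score-simple",
            "score-random", "score-zero"],
    positions: ["checked", "unchecked"]
  };
  const chosen = {};
  for (const [name, members] of Object.entries(groups)) {
    const present = members.filter((m) => options.includes(m));
    if (present.length > 1) {
      // Mirrors the "only one of ..." rule for each group.
      throw new Error("Only one of " + members.join(", ") + " may be specified");
    }
    chosen[name] = present[0];
  }
  const order = chosen.order || "item-order";
  return {
    order: order,
    frequency: chosen.frequency || "fragment-frequency",
    direction: chosen.direction ||
      (order === "item-order" ? "ascending" : "descending"),
    evaluation: chosen.evaluation ||
      (order === "frequency-order" || options.includes("map") ? "eager" : "lazy"),
    scope: chosen.scope || (hasQuery ? "document" : "any"),
    score: chosen.score || "score-logtfidf",
    positions: chosen.positions || "checked"
  };
}

const resolved = resolveDefaults(["frequency-order", "ordered", "map"], true);
console.log(resolved);
```

A helper like this could be used while debugging to confirm which ordering or scope a call will use before running it against a large database.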

If "collation=URI" is not specified in the options parameter, then the default collation is used. If a lexicon with that collation does not exist, an error is thrown.

If "sample=N" is not specified in the options parameter, then all included co-occurrences may be returned. If a $query parameter is not present, then "sample=N" has no effect.

If "truncate=N" is not specified in the options parameter, then co-occurrences from all fragments selected by the $query parameter are included. If a $query parameter is not present, then "truncate=N" has no effect.
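
The difference between "sample=N" and "truncate=N" is subtle, so here is a small conceptual sketch in plain JavaScript. It is only a model of the descriptions above, not the server's implementation. The function windowFragments and its property names are hypothetical, the input is assumed to be the fragments remaining after any "skip=N" has been applied, and applying "sample" inside the truncated set when both are given is an assumption.

```javascript
// Conceptual model only; the names here are hypothetical, not cts API.
// `afterSkip` stands for the query-selected fragments left after any skip.
function windowFragments(afterSkip, settings) {
  const truncate = settings && settings.truncate;
  const sample = settings && settings.sample;
  // "truncate=N" narrows the fragments used for inclusion AND frequencies.
  const counted = truncate === undefined ? afterSkip : afterSkip.slice(0, truncate);
  // "sample=N" narrows which fragments contribute returned co-occurrences,
  // while frequencies are still calculated over `counted`.
  const returned = sample === undefined ? counted : counted.slice(0, sample);
  return { counted: counted, returned: returned };
}

const fragments = ["f1", "f2", "f3", "f4"];
const sampled = windowFragments(fragments, { sample: 2 });
const truncated = windowFragments(fragments, { truncate: 2 });
console.log(sampled.counted.length, sampled.returned.length);
console.log(truncated.counted.length, truncated.returned.length);
```

In this model, sampling leaves all four fragments in the frequency calculation but returns co-occurrences from only two, whereas truncating reduces both sets to two.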

Example

//     This query has the database fragmented on SPEECH and
//     finds the first 3 SPEAKERs that co-occur in a SPEECH
//     in the play Hamlet.
//     Requires an element range index on SPEAKER with range
//     value positions enabled on the range index.
  
fn.subsequence(
  cts.elementValueCoOccurrences(
    xs.QName("SPEAKER"), xs.QName("SPEAKER"),
    ["frequency-order","ordered"],
    cts.documentQuery("/shakespeare/plays/hamlet.xml")), 1, 3);
  =>
["MARCELLUS", "BERNARDO"]
["ROSENCRANTZ", "GUILDENSTERN"]
["HORATIO", "MARCELLUS"]

Example

//     this query has the database fragmented on SPEECH and
//     finds SPEAKERs that co-occur in a SPEECH in the play Hamlet, 
//     returned as a map
  
cts.elementValueCoOccurrences(
    xs.QName("SPEAKER"), xs.QName("SPEAKER"),
    ["frequency-order","ordered", "map"],
    cts.documentQuery("/shakespeare/plays/hamlet.xml"))

  =>
{
        "HORATIO":"MARCELLUS", 
        "CORNELIUS":"VOLTIMAND", 
        "MARCELLUS":
                ["BERNARDO", "HORATIO"], 
        "ROSENCRANTZ":"GUILDENSTERN"
}
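
In the map output above, a key with a single co-occurring value maps to a string, while a key with several values maps to an array. Code that consumes the map may be simpler if every value is an array. The following is a minimal sketch of a hypothetical helper (normalizeCoOccurrenceMap is not part of the cts API), assuming the map has the shape shown in the example output.

```javascript
// Hypothetical helper: wrap single values in an array so that every key
// maps to an array, regardless of how many values co-occur with it.
function normalizeCoOccurrenceMap(map) {
  const out = {};
  for (const key of Object.keys(map)) {
    const value = map[key];
    out[key] = Array.isArray(value) ? value : [value];
  }
  return out;
}

// Input copied from the example output above.
const shaped = normalizeCoOccurrenceMap({
  "HORATIO": "MARCELLUS",
  "CORNELIUS": "VOLTIMAND",
  "MARCELLUS": ["BERNARDO", "HORATIO"],
  "ROSENCRANTZ": "GUILDENSTERN"
});
console.log(shaped);
```

After normalization, a check such as shaped[key].length counts co-occurring values rather than the characters of a string.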

Example

// This example uses the co-occurrences between the URI lexicon
// and an element range index to effectively join documents together.

// Load sample data 
declareUpdate();
xdmp.documentInsert("/test1.xml", xdmp.unquote(
'  <test1><hello>this is a value</hello></test1>').next().value);
xdmp.documentInsert("/test2.xml", xdmp.unquote(
'  <test2><hello>this is a value</hello></test2>').next().value);
xdmp.documentInsert("/test3.xml", xdmp.unquote(
'  <test3><hello>this is a different value</hello></test3>').next().value);

// ********** Run the following as a separate query **********
// Requires an element range index on 'hello' and the URI lexicon.
// This query finds 'hello' element values that occur in more than one 
// document. It is an efficient way to join documents using range indexes.

var x =
  cts.elementValueCoOccurrences(
  // note the special xdmp:document QName for the URI lexicon 
    xs.QName("hello"), xs.QName("xdmp:document"),
  // uses the "map" option, so returns the results as a JavaScript object
    ["frequency-order", "map",
     "collation-1=http://marklogic.com/collation/",
     "collation-2=http://marklogic.com/collation/codepoint"]);
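var res = [];
// Iterate through each property and keep the ones that have
// 2 or more values. A key with a single value maps to a string rather
// than an array (as in the map output of the previous example), so
// check for an array before testing its length.
for (var y in x) {
  if (Array.isArray(x[y]) && x[y].length > 1) {
    var o = {};
    o[y] = x[y];
    res.push(o);
  }
}
res;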

// returns the values that occur in more than one document (URI) 

=>
[{"this is a value":["/test1.xml", "/test2.xml"]}]
