MarkLogic 10 Product Documentation
xdmp.encodingLanguageDetect

xdmp.encodingLanguageDetect(
   document as Node
) as Array

Summary

Analyzes binary, text, or XML data and suggests possible encoding and language pairs, with a confidence score for each pair. Scores of 10 and above are high-confidence recommendations. Results are returned in order of decreasing score. Accuracy may be poor for short documents.

Parameters
document Node to be analyzed for possible encodings and languages. If the node is an XML element or document node, the function uses the string value of the node (the equivalent of fn:string($node)) to detect the encoding and language.

Usage Notes

If the input is very small (for example, fewer than two words), this built-in returns the empty sequence.

For best results, the input should be at least several hundred bytes.
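Because very small inputs can produce no candidates, calling code may want to guard against an empty result before reading the first entry. The following is a minimal sketch, not part of the MarkLogic API: topCandidate is a hypothetical helper, and it assumes the empty result surfaces in JavaScript as null, undefined, or an empty array. Since results are ordered by decreasing score, the first entry is taken as the best guess.

```javascript
// Hypothetical helper (not a MarkLogic built-in): return the best-scoring
// candidate, or null when detection produced nothing.
// Assumption: an empty result appears as null, undefined, or an empty array.
function topCandidate(result) {
  if (!result || result.length === 0) {
    return null;
  }
  // Results are ordered by decreasing score, so the first entry is the best.
  return result[0];
}

// Plain data standing in for the return value of xdmp.encodingLanguageDetect.
const detected = [
  {encoding: "utf-8", language: "en", score: 10.4685125490417},
  {encoding: "utf-8", language: "ro", score: 10.2732191159561}
];

const bestGuess = topCandidate(detected);
const noGuess = topCandidate([]);
```

Inside MarkLogic, the same helper could be applied directly to the value returned by xdmp.encodingLanguageDetect, under the assumption stated above.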

Example

xdmp.encodingLanguageDetect(xdmp.documentGet("/space/appserver/test.sjs"));
=>
[
  {"encoding":"utf-8","language":"en","score":10.4685125490417},
  {"encoding":"utf-8","language":"ro","score":10.2732191159561},
  {"encoding":"utf-8","language":"fr","score":9.73656934079629},
  ...
]
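The high-confidence threshold described in the Summary can be applied to a result like the one above with an ordinary array filter. This is a sketch under stated assumptions: highConfidence is a hypothetical helper rather than a MarkLogic function, the threshold of 10 comes from the Summary, and the sample data is limited to the three entries shown in the example output.

```javascript
// Hypothetical helper (not a MarkLogic built-in): keep only candidates whose
// score is at or above the threshold. The default of 10 follows the Summary.
function highConfidence(candidates, threshold = 10) {
  // Treat a missing result as "no candidates".
  if (!candidates) {
    return [];
  }
  return Array.from(candidates).filter((c) => c.score >= threshold);
}

// The three entries shown in the example output; elided entries are omitted.
const sample = [
  {encoding: "utf-8", language: "en", score: 10.4685125490417},
  {encoding: "utf-8", language: "ro", score: 10.2732191159561},
  {encoding: "utf-8", language: "fr", score: 9.73656934079629}
];

const confident = highConfidence(sample);
```

With this sample, the "en" and "ro" entries pass the threshold and the "fr" entry does not. Two close high-confidence scores, as here, suggest the caller should still treat the language as ambiguous.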