MarkLogic Server 11.0 Product Documentation
xdmp.encodingLanguageDetect

xdmp.encodingLanguageDetect(
   document as Node
) as Array

Summary

Analyzes binary, text, or XML data and suggests possible pairs of encoding and language, with a confidence score for each pair. Scores of 10 and above are high-confidence recommendations. The results are returned in order of decreasing score. Accuracy may be poor for short documents.

Parameters
document	Node to be analyzed for possible encodings and languages. If the node is an XML element or document node, the function takes the string value of the specified node (equivalent of `fn:string($node)`) to detect the encoding and language.

Usage Notes

If the input is very small (for example, fewer than two words), this built-in returns the empty sequence.

For best results, the input should be at least several hundred bytes.
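Because very small inputs can produce no result at all, callers may want to guard against that case before reading the top-ranked pair. The following is a minimal sketch, not part of the MarkLogic API: `bestGuess` is a hypothetical helper, and it assumes that "no result" may surface in JavaScript as `null`, `undefined`, or an empty array or sequence.

```javascript
// Hypothetical helper (not a MarkLogic built-in): return the top-ranked
// encoding/language pair, or null when detection produced nothing.
// Assumption: an empty result may appear as null, undefined, or an
// empty array/iterable, so all three are treated the same way.
function bestGuess(results) {
  if (results == null) {
    return null;
  }
  // Array.from accepts arrays and other iterables.
  const pairs = Array.from(results);
  // Results are documented as ordered by decreasing score,
  // so the first entry is the strongest suggestion.
  return pairs.length > 0 ? pairs[0] : null;
}

// Usage with placeholder data shaped like the documented output.
const guess = bestGuess([
  {"encoding": "utf-8", "language": "en", "score": 10.4685125490417}
]);
const none = bestGuess([]);
```

Inside MarkLogic, the argument would be the value returned by `xdmp.encodingLanguageDetect`; a `null` return from the helper signals that the input was probably too short to analyze.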

Example

xdmp.encodingLanguageDetect(xdmp.documentGet("/space/appserver/test.sjs"));
=>
[
  {"encoding":"utf-8","language":"en","score":10.4685125490417},
  {"encoding":"utf-8","language":"ro","score":10.2732191159561},
  {"encoding":"utf-8","language":"fr","score":9.73656934079629},
  ...
]
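The Summary treats scores of 10 and above as high-confidence recommendations. As a rough sketch of applying that threshold to a result, the hypothetical helper `highConfidencePairs` below keeps only the pairs at or above it. The helper name and the `threshold` parameter are illustrative assumptions, and the sample data is copied from the first three entries of the example output above so that the snippet runs outside MarkLogic.

```javascript
// Hypothetical helper (not a MarkLogic built-in): keep only the pairs whose
// score meets the threshold. The default of 10 comes from the Summary.
// Assumption: `results` is an array of {encoding, language, score} objects.
function highConfidencePairs(results, threshold = 10) {
  // filter() preserves the original (decreasing score) order.
  return (results || []).filter((pair) => pair.score >= threshold);
}

// First three entries of the documented example output.
const sampleResults = [
  {"encoding": "utf-8", "language": "en", "score": 10.4685125490417},
  {"encoding": "utf-8", "language": "ro", "score": 10.2732191159561},
  {"encoding": "utf-8", "language": "fr", "score": 9.73656934079629}
];

// Keeps the "en" and "ro" pairs; "fr" falls below the threshold.
const confident = highConfidencePairs(sampleResults);
```

Note that two languages clear the threshold for the same document here, so a high score on its own does not guarantee a unique answer.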
