MarkLogic 9 Product Documentation
xdmp:encoding-language-detect

xdmp:encoding-language-detect(
   $document as node()
) as element()*

Summary

Analyzes binary, text, or XML data and suggests possible encoding and language pairs, with a confidence score for each pair. Scores of 10 and above are high-confidence recommendations. The results are returned in order of decreasing score. Accuracy may be poor for short documents.
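
As a minimal sketch of how the score threshold might be applied, the following query keeps only the pairs that score 10 or above. The file path is a placeholder, and the eld prefix is an arbitrary name bound here to the namespace of the returned elements (xdmp:encoding-language-detect).

```xquery
xquery version "1.0-ml";
declare namespace eld = "xdmp:encoding-language-detect";

(: "/tmp/unknown.dat" is a placeholder path, not a real file. :)
let $candidates :=
  xdmp:encoding-language-detect(xdmp:document-get("/tmp/unknown.dat"))
(: The score element is untyped, so cast it before comparing. :)
return $candidates[xs:double(eld:score) ge 10]
```

If no pair reaches the threshold, the query returns the empty sequence, and the caller decides whether a lower-scoring candidate is acceptable.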

Parameters
document: The node to be analyzed for possible encodings and languages. If the node is an XML element or document node, the function uses the string value of the node (equivalent to fn:string($document)) to detect the encoding and language.

Usage Notes

If the input is very small (for example, fewer than two words), this function returns the empty sequence.

For best results, the input should be at least several hundred bytes.
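
Because the result may be empty for small inputs, callers may want to guard against it. The following is a sketch only: the "undetermined" fallback string and the eld prefix are illustrative assumptions, not part of the API.

```xquery
xquery version "1.0-ml";
declare namespace eld = "xdmp:encoding-language-detect";

(: A deliberately short input; very small inputs may yield
   the empty sequence. :)
let $input := text { "Hi" }
(: Results are ordered by decreasing score, so [1] is the best guess. :)
let $best := xdmp:encoding-language-detect($input)[1]
return
  if (fn:empty($best))
  then "undetermined"  (: hypothetical fallback value :)
  else fn:concat($best/eld:encoding, "/", $best/eld:language)
```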

Example

xdmp:encoding-language-detect(xdmp:document-get("/tmp/unknown.dat"))
=>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>windows-1252</encoding>
  <language>en</language>
  <score>9.834</score>
</encoding-language>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>windows-1252</encoding>
  <language>it</language>
  <score>8.976</score>
</encoding-language>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>windows-1250</encoding>
  <language>sl</language>
  <score>8.265</score>
</encoding-language>
...
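
One possible use of the result is to decode a binary file with the top-ranked encoding. The sketch below assumes the file is loaded with the binary format option of xdmp:document-get and that the detected encoding name is accepted by xdmp:binary-decode. The path is the same placeholder as in the example above, and the eld prefix is an arbitrary choice.

```xquery
xquery version "1.0-ml";
declare namespace eld = "xdmp:encoding-language-detect";

(: Load the file as binary so that no encoding is assumed up front. :)
let $doc := xdmp:document-get("/tmp/unknown.dat",
  <options xmlns="xdmp:document-get">
    <format>binary</format>
  </options>)
(: Select the binary node whether or not it is wrapped in a document node. :)
let $bin := $doc/descendant-or-self::binary()
(: Results are ordered by decreasing score, so [1] is the best guess. :)
let $best := xdmp:encoding-language-detect($bin)[1]
return
  if (fn:exists($best))
  then xdmp:binary-decode($bin, fn:string($best/eld:encoding))
  else ()  (: nothing detected; leave the decision to the caller :)
```

Taking the first result without checking its score is a simplification. For short or ambiguous inputs, it may be safer to require a score of 10 or above before decoding.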