xdmp:encoding-language-detect( $document as node() ) as element()*
Analyzes binary, text, or XML data and suggests possible pairs of encoding and language, with a confidence score for each pair. Scores of 10 and above are high confidence recommendations. The results are given in order of decreasing score. Accuracy may be poor for short documents.
If the input is very small (for example, less than two words), then this built-in returns the empty sequence.
For best results, the input should be at least several hundred bytes.
xdmp:encoding-language-detect(xdmp:document-get("/tmp/unknown.dat")) => <encoding-language xmlns="xdmp:encoding-language-detect"> <encoding>windows-1252</encoding> <language>en</language> <score>9.834</score> </encoding-language> <encoding-language xmlns="xdmp:encoding-language-detect"> <encoding>windows-1252</encoding> <language>it</language> <score>8.976</score> </encoding-language> <encoding-language xmlns="xdmp:encoding-language-detect"> <encoding>windows-1250</encoding> <language>sl</language> <score>8.265</score> </encoding-language> ...