MarkLogic 12 EA 1 Product Documentation
xdmp:word-convert

xdmp:word-convert(
   $doc as node(),
   $filename as xs:string,
   [$options as (element()|map:map)?]
) as node()*

Summary

Converts a Microsoft Word document to XHTML. Returns several nodes: a parts node, the converted XHTML document node, and any other document parts (for example, CSS files and images). The first node is the parts node, which contains a manifest of all of the parts generated as a result of the conversion. Does not convert documents in Microsoft Office 2007 and later formats.

Parameters
doc The Microsoft Word document to convert to XHTML, as a binary node().
filename The root for the names of the converted files and directories. If the specified filename includes an extension, the extension is appended to the root with an underscore. The directory for the other parts of the conversion (images, for example) has the string "_parts" appended to the root. For example, if you specify a filename of "myFile.doc", the generated names will be "myFile_doc.xhtml" for the XML node and "myFile_doc_parts" for the directory containing any other parts generated by the conversion (images, CSS files, and so on).
options Options with which to customize this operation. You can specify options either as an XML options element in the "xdmp:word-convert" namespace or as a map:map. The option names below are XML element local names. When using a map, replace any hyphens with camel casing; for example, "an-option" becomes "anOption" when used as a map:map key. This function supports the following options, plus the options from the xdmp:tidy function.
tidy
Specify true to run tidy on the document, or false to skip it. If you run tidy, you can also include xdmp:tidy options. Any tidy option elements must be in the xdmp:tidy namespace.
compact
Specify true to produce "compact" HTML, that is, without style information. The default is false.
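The filename rule above can be sketched in plain XQuery to show how the output names are derived. local:converted-names is a hypothetical helper, not a MarkLogic function. Treating only the text after the last dot as the extension, and leaving a name without a dot unchanged, are assumptions beyond what the rule above states.

```xquery
(: Hypothetical helper, not part of the MarkLogic API: predicts the names
 : xdmp:word-convert derives from its $filename argument. Treating only the
 : last ".ext" as the extension, and leaving names without a dot unchanged,
 : are assumptions. :)
declare function local:converted-names($filename as xs:string) as xs:string+
{
  (: "myFile.doc" -> "myFile_doc": the extension joins the root with "_" :)
  let $root := fn:replace($filename, "\.([^./]*)$", "_$1")
  return (
    fn:concat($root, ".xhtml"),  (: name of the converted XHTML node :)
    fn:concat($root, "_parts")   (: directory for images, CSS, and so on :)
  )
};

local:converted-names("myFile.doc")
(: ("myFile_doc.xhtml", "myFile_doc_parts") :)
```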

Usage Notes

This function is part of a separate package that may generate temporary files. These temporary files are not covered by encryption at rest.

This function is not available on Mac OS X.

Starting with release 9.0-4, this function requires a separate converter installation package; see MarkLogic Converters Installation Changes Starting at Release 9.0-4 in the Installation Guide for All Platforms.

This function supports the following file formats: Microsoft Word 97/98 (Mac)/2000/2001 (Mac)/XP/2003 (Native format).

The convert functions return several nodes. The first node is a manifest listing the various parts of the conversion. Typically there will be an XML part, a CSS part, and some image parts. Each part is returned as a separate node, in the order shown in the manifest.

For example, given the following manifest:

<parts>
  <part>myFile_doc.xhtml</part>
  <part>myFile_doc_parts/conv.css</part>
  <part>myFile_doc_parts/toc.xml</part>
</parts>

the first node returned by the query is the manifest, the second is the "myFile_doc.xhtml" node, the third is the "myFile_doc_parts/conv.css" node, and the fourth is the "myFile_doc_parts/toc.xml" node.
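Because manifest entry n names returned node n + 1, a common pattern is to walk the manifest and store each node under the name it lists. The sketch below is illustrative only: the input path "myFile.doc" and the "/converted/" URI prefix are placeholders, and the wildcard test *:part is used so the code does not depend on the namespace of the manifest elements.

```xquery
xquery version "1.0-ml";

(: Sketch: convert a Word file and store every returned part in the
 : database under the name the manifest gives it. "myFile.doc" and the
 : "/converted/" URI prefix are placeholders. :)
let $results :=
  xdmp:word-convert(xdmp:document-get("myFile.doc"), "myFile.doc")
let $manifest := $results[1]
(: Manifest entry $i describes result node $i + 1. The *:part wildcard
 : matches part elements whatever their namespace. :)
for $part at $i in $manifest//*:part
return
  xdmp:document-insert(
    fn:concat("/converted/", fn:string($part)),
    $results[$i + 1]
  )
```

With the manifest shown above, this would store "/converted/myFile_doc.xhtml", "/converted/myFile_doc_parts/conv.css", and "/converted/myFile_doc_parts/toc.xml".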

Example

(: This example uses a combination of xdmp:word-convert options ("tidy")
 : and xdmp:tidy options ("clean"), expressed as an options element. :)
let $results := 
  xdmp:word-convert(
    xdmp:document-get("myFile.doc"),
    "myFile.doc",
    <options xmlns="xdmp:word-convert" xmlns:tidy="xdmp:tidy">
      <tidy>true</tidy>
      <tidy:clean>yes</tidy:clean>
    </options>
  ),
  $manifest := $results[1]
return $results[2 to last()]

(: returns all of the converted nodes :)

Example

(: This example uses a combination of xdmp:word-convert options ("tidy")
 : and xdmp:tidy options ("clean"), expressed as a map:map. :)
let $results := 
  xdmp:word-convert(
    xdmp:document-get("myFile.doc"),
    "myFile.doc",
    map:map() => map:with("tidy", fn:true())
              => map:with("clean", "yes")
  ),
  $manifest := $results[1]
return $results[2 to last()]

(: returns all of the converted nodes :)
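
Example

The compact option described under Parameters can be combined with tidy in the same way. This is a sketch rather than a tested recipe; "myFile.doc" is a placeholder path, and only the documented "compact" and "tidy" option names are used.

```xquery
xquery version "1.0-ml";

(: Sketch: request "compact" HTML (no style information) and run tidy,
 : passing both word-convert options as a map:map. "myFile.doc" is a
 : placeholder path. :)
let $results :=
  xdmp:word-convert(
    xdmp:document-get("myFile.doc"),
    "myFile.doc",
    map:map() => map:with("compact", fn:true())
              => map:with("tidy", fn:true())
  )
(: $results[1] is the parts manifest; the nodes after it are the
 : converted parts, in manifest order. :)
return $results[2 to last()]
```

Expressed as an options element, the same settings would be <compact>true</compact> and <tidy>true</tidy> children of an options element in the "xdmp:word-convert" namespace.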