
xdmp.wordConvert

xdmp.wordConvert(
   $doc as Node,
   $filename as String,
   [$options as Object?]
) as ValueIterator

Summary

Converts a Microsoft Word document to XHTML. Returns several nodes, including a parts node, the converted XHTML document node, and any other document parts (for example, CSS files and images). The first node is the parts node, which contains a manifest of all of the parts generated as a result of the conversion. This function does not convert documents in the Microsoft Office 2007 and later formats (such as .docx).

Parameters
$doc The Microsoft Word document to convert to XHTML, as a binary node.
$filename The root name for the converted files and directories. If the specified filename includes an extension, the extension is appended to the root with an underscore. The directory for the other parts of the conversion (images, for example) has the string "_parts" appended to the root. For example, if you specify a filename of "myFile.doc", the generated names are "myFile_doc.xhtml" for the XHTML node and "myFile_doc_parts" for the directory containing any other parts generated by the conversion (images, CSS files, and so on).
$options The options object for this conversion. In addition to the options shown below, you can include any xdmp.tidy options.
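The $filename naming rule can be sketched as a small pure-JavaScript function, which is handy for predicting where the converted node and its parts directory will be named before you run a conversion. The function name convertedNames is hypothetical and not part of the MarkLogic API. How it treats names with several dots, or with no extension, is an assumption; the rule above only covers the "myFile.doc" case.

```javascript
// Hypothetical helper, not part of the MarkLogic API: predicts the names that
// xdmp.wordConvert derives from its $filename argument, following the rule
// described for that parameter. Handling of names with several dots, or with
// no extension at all, is an assumption.
function convertedNames(filename) {
  const dot = filename.lastIndexOf(".");
  // Assumption: only the last dot separates an extension, and a leading or
  // trailing dot does not count ("myFile.doc" -> root "myFile_doc").
  const hasExtension = dot > 0 && dot < filename.length - 1;
  const root = hasExtension
    ? filename.slice(0, dot) + "_" + filename.slice(dot + 1)
    : filename;
  return {
    xhtml: root + ".xhtml",    // name of the converted XHTML node
    partsDir: root + "_parts", // directory for CSS files, images, and so on
  };
}

const names = convertedNames("myFile.doc");
console.log(names.xhtml);    // myFile_doc.xhtml
console.log(names.partsDir); // myFile_doc_parts
```

Returning both names from one function keeps the root computation in a single place, so the XHTML name and the parts directory cannot drift apart.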

Options include:

tidy

Specify true to run tidy on the converted document, or false to skip it. If you run tidy, you can also specify any xdmp.tidy options.

compact

Specify true to produce "compact" HTML, that is, without style information. The default is false.

Sample Options:

The following sample options object runs tidy on the generated HTML and enables the tidy "clean" option for the conversion:
{
  "tidy": true,
  // "clean" is a tidy option that is passed through to tidy;
  // it takes "yes" or "no"
  "clean": "yes"
}

Usage Notes

The convert functions return several nodes. The first node is a manifest listing the parts of the conversion. Typically there is an XML part, a CSS part, and some image parts. Each part is returned as a separate node, in the order shown in the manifest.

For example, given the following manifest:

<parts>
  <part>myFile_doc.xhtml</part>
  <part>myFile_doc_parts/conv.css</part>
  <part>myFile_doc_parts/toc.xml</part>
</parts>

the first item returned is the manifest, the second is the "myFile_doc.xhtml" node, the third is the "myFile_doc_parts/conv.css" node, and the fourth is the "myFile_doc_parts/toc.xml" node.
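Because the parts arrive in manifest order, you can pair each part name with its node by walking the results once. The following is a minimal plain-JavaScript sketch of that pairing, not MarkLogic code: the function name pairParts is hypothetical, and it assumes the manifest item has already been reduced to an array of part-name strings, whereas a real manifest is an XML parts node that you would need to read yourself. The stand-in data uses strings in place of real nodes.

```javascript
// Hypothetical sketch in plain JavaScript, not MarkLogic code: maps each part
// name in the manifest to the node returned in the same position. It assumes
// the manifest item has already been reduced to an array of part-name strings;
// a real MarkLogic manifest is an XML <parts> node.
function pairParts(iter) {
  // iter is anything with a next() method returning { value, done },
  // used the same way as the ValueIterator in the example on this page.
  const first = iter.next();
  if (first.done) throw new Error("empty conversion result");
  const manifest = first.value; // item 1: the manifest
  const byName = new Map();
  for (const name of manifest) {
    const next = iter.next(); // items 2..n arrive in manifest order
    if (next.done) throw new Error("no node returned for part " + name);
    byName.set(name, next.value);
  }
  return byName;
}

// Stand-in data mirroring the manifest above; strings replace real nodes.
const fakeResults = [
  ["myFile_doc.xhtml", "myFile_doc_parts/conv.css", "myFile_doc_parts/toc.xml"],
  "<html>...</html>",
  "p { margin: 0 }",
  "<toc/>",
];
const parts = pairParts(fakeResults.values());
console.log(parts.get("myFile_doc_parts/conv.css")); // p { margin: 0 }
```

Failing loudly when the results run out before the manifest does makes a mismatch between the manifest and the returned nodes visible instead of silently dropping parts.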

Example

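// Load a Word file from the filesystem and convert it.
var results = xdmp.wordConvert(
                xdmp.documentGet("/space/Hello.doc"),
                "Hello.doc");
// The first item is always the manifest (the parts node).
var manifest = results.next().value;
// The second item is the converted XHTML document.
var wordAsXHTML = results.next().value;
wordAsXHTML;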

=> The Word document converted to XHTML. The results variable
   is a ValueIterator whose first item is the manifest and whose
   remaining items are the converted nodes.

    Powered by MarkLogic Server 7.0-4.1 and rundmc | Terms of Use | Privacy Policy