options |
The options are based on the open source HTML Tidy configuration options,
available at http://tidy.sourceforge.net/docs/quickref.html.
Most of the tidy options are available through this function, with
the following exceptions:
- The character encoding for the output is always UTF-8.
- The filesystem options which allow you to specify where to save output
are not supported (although there are many ways to achieve this through
functions such as xdmp:save).
- The output is always XHTML.
- Entities other than the built-in HTML entities are always
output in numeric form.
You can specify options as either an XML element
in the "xdmp:tidy" namespace, or as a map:map. The
option names below are XML element localnames. When using a map,
replace the hyphens with camel casing. For example, "an-option"
becomes "anOption" when used as a map:map key.
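The naming rule above can be sketched in a few lines of Python. This is only an illustration of the hyphen-to-camel-case convention described here; the helper name to_map_key is hypothetical and is not part of any MarkLogic API, and the handling of name parts that begin with a digit (as in "word-2000") is an assumption.

```python
def to_map_key(option_name: str) -> str:
    """Convert a hyphenated option localname to a camel-cased map key.

    Illustrative helper only (hypothetical name): the first part is kept
    as-is and each following part has its first letter upper-cased.
    """
    first, *rest = option_name.split("-")
    return first + "".join(part.capitalize() for part in rest)


# Build a map-style dictionary of options from hyphenated names.
element_style = {"add-xml-decl": "yes", "indent-spaces": "4"}
map_style = {to_map_key(name): value for name, value in element_style.items()}
print(map_style)  # {'addXmlDecl': 'yes', 'indentSpaces': '4'}
```

Option names without hyphens, such as "clean" or "wrap", pass through unchanged.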
This function supports the following options:
HTML, XHTML, and XML Options
<add-xml-decl >
- Default Value:
no
Description: This option specifies if Tidy should add the XML
declaration when outputting XML or XHTML. Note that if the input
already includes an <?xml ... ?> declaration then
this option will be ignored.
<add-xml-space >
- Default Value:
no
Description: This option specifies if Tidy should add
xml:space="preserve" to elements such as <PRE>,
<STYLE> and <SCRIPT> when generating XML. This is needed if
the whitespace in such elements is to be parsed appropriately without
having access to the DTD.
<alt-text >
- Default Value: n/a
Description: This option specifies the default "alt=" text Tidy uses
for <IMG> attributes. This feature is dangerous as it suppresses
further accessibility warnings. You are responsible for making your
documents accessible to people who can not see the images!
<assume-xml-procins >
- Default Value:
no
Description: This option specifies if Tidy should change the parsing
of processing instructions to require ?> as the terminator rather
than >. This option is automatically set if the input is in XML.
<bare >
- Default Value:
no
Description: This option specifies if Tidy should strip Microsoft specific
HTML from Word 2000 documents, and output spaces rather than
non-breaking spaces where they exist in the input.
<clean >
- Default Value:
no
Description: This option specifies if Tidy should strip out surplus
presentational tags and attributes replacing them by style rules and
structural markup as appropriate. It works well on the HTML saved by
Microsoft Office products.
<css-prefix >
- Default Value: n/a
Description: This option specifies the prefix that Tidy uses for style
rules. By default, "c" will be used.
<doctype >
- Default Value:
auto
Possible Values: auto, omit, strict, loose, transitional,
or a user-specified FPI string
Description:
This option specifies the DOCTYPE declaration generated by Tidy.
If set to omit, the output won't contain a DOCTYPE declaration.
If set to auto (the default), Tidy will use an educated
guess based upon the contents of the document. If set to strict,
Tidy will set the DOCTYPE to the strict DTD. If set to loose,
the DOCTYPE is set to the loose (transitional) DTD. Alternatively, you can
supply a string for the formal public identifier (FPI). For example:
doctype: "-//ACME//DTD HTML 3.14159//EN"
If you specify the FPI for an XHTML document, Tidy will set the
system identifier to the empty string. Tidy leaves the DOCTYPE for
generic XML documents unchanged. Specifying a doctype of omit
implies that the numeric-entities option is set to yes.
<drop-empty-paras >
- Default Value:
yes
Description:
This option specifies if Tidy should discard empty paragraphs. If
set to no, empty paragraphs are replaced by a pair of <BR>
elements as HTML4 precludes empty paragraphs.
<drop-font-tags >
- Default Value:
no
Description:
This option specifies if Tidy should discard <FONT>
and <CENTER> tags without creating the corresponding
style rules. This option can be set independently of the clean option.
<drop-proprietary-attributes >
- Default Value:
no
Description:
This option specifies if Tidy should strip out proprietary attributes,
such as MS data binding attributes.
<enclose-block-text >
- Default Value:
no
Description:
This option specifies if Tidy should insert a <P> element to enclose
any text it finds in any element that allows mixed content for HTML
transitional but not HTML strict.
<enclose-text >
- Default Value:
no
Description:
This option specifies if Tidy should enclose any text it finds in
the body element within a <P> element. This is useful when you want
to take existing HTML and use it with a style sheet.
<escape-cdata >
- Default Value:
no
Description:
This option specifies if Tidy should convert <![CDATA[]]>
sections to normal text.
<fix-backslash >
- Default Value:
yes
Description:
This option specifies if Tidy should replace backslash characters
"\" in URLs by forward slashes "/".
<fix-bad-comments >
- Default Value:
yes
Description:
This option specifies if Tidy should replace unexpected hyphens
with "=" characters when it comes across adjacent hyphens. The
default is yes. This option is provided for users of Cold Fusion
which uses the comment syntax: <!--- --->
<fix-uri >
- Default Value:
yes
Description:
This option specifies if Tidy should check attribute values that carry
URIs for illegal characters and if such are found, escape them as
HTML 4 recommends.
<hide-comments >
- Default Value:
no
Description:
This option specifies if Tidy should omit comments from the output.
<hide-endtags >
- Default Value:
no
Description:
This option specifies if Tidy should omit optional end-tags when
generating the pretty printed markup. This option is ignored if
you are outputting to XML.
<indent-cdata >
- Default Value:
no
Description:
This option specifies if Tidy should indent <![CDATA[]]>
sections.
<input-xml >
- Default Value:
no
Description:
This option specifies if Tidy should use the XML parser rather than
the error correcting HTML parser.
<join-classes >
- Default Value:
no
Description:
This option specifies if Tidy should combine class names to generate a
single new class name, if multiple class assignments are detected on
an element.
<join-styles >
- Default Value:
yes
Description:
This option specifies if Tidy should combine styles to generate a
single new style, if multiple style values are detected on an element.
<literal-attributes >
- Default Value:
no
Description:
This option specifies if Tidy should ensure that whitespace characters
within attribute values are passed through unchanged.
<logical-emphasis >
- Default Value:
no
Description:
This option specifies if Tidy should replace any occurrence of <I>
by <EM> and any occurrence of <B> by <STRONG>. In both
cases, the attributes are preserved unchanged. This option can be set
independently of the clean and drop-font-tags options.
<lower-literals >
- Default Value:
yes
Description:
This option specifies if Tidy should convert the value of an attribute
that takes a list of predefined values to lower case. This is required for
XHTML documents.
<merge-divs >
- Default Value:
yes
Description:
This option can be used to modify the behavior of the clean option
when clean is set to yes. It specifies if Tidy should merge
nested <div> elements such as
<div><div>...</div></div>.
<ncr >
- Default Value:
yes
Description:
This option specifies if Tidy should allow numeric character
references.
<new-blocklevel-tags >
- Default Value: none
Description:
This option specifies new block-level tags. This option takes a space or
comma separated list of tag names. Unless you declare new tags, Tidy will
refuse to generate a tidied file if the input includes previously unknown
tags. Note you can't change the content model for elements such
as <TABLE>, <UL>, <OL> and <DL>.
<new-empty-tags >
- Default Value: none
Description:
This option specifies new empty inline tags. This option takes a space
or comma separated list of tag names. Unless you declare new tags, Tidy
will refuse to generate a tidied file if the input includes previously
unknown tags. Remember to also declare empty tags as either inline or
blocklevel.
<new-inline-tags >
- Default Value: none
Description:
This option specifies new non-empty inline tags. This option takes a
space or comma separated list of tag names. Unless you declare new tags,
Tidy will refuse to generate a tidied file if the input includes
previously unknown tags.
<new-pre-tags >
- Default Value: none
Description:
This option specifies new tags that are to be processed in exactly the
same way as HTML's <PRE> element. This option takes a space or
comma separated list of tag names. Unless you declare new tags, Tidy
will refuse to generate a tidied file if the input includes previously
unknown tags. Note you can not as yet add new CDATA elements (similar
to <SCRIPT>).
<numeric-entities >
- Default Value:
no
Description:
This option specifies if Tidy should output entities other than the
built-in HTML entities (&amp;, &lt;, &gt; and &quot;) in the numeric
rather than the named entity form.
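As a rough illustration of the difference between named and numeric entity forms, the following Python sketch rewrites named entity references as decimal numeric references while leaving the four built-in entities alone. It is not Tidy's implementation; the function name to_numeric_entities is hypothetical, and the entity table comes from Python's standard html.entities module rather than from Tidy.

```python
import re
from html.entities import name2codepoint

# The four entities that stay in named form.
BUILT_IN = {"amp", "lt", "gt", "quot"}


def to_numeric_entities(text: str) -> str:
    """Rewrite named entity references as decimal numeric references."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Leave built-in entities and unrecognized names untouched.
        if name in BUILT_IN or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


print(to_numeric_entities("a&nbsp;b &amp; c &copy;"))  # a&#160;b &amp; c &#169;
```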
<output-html >
- Default Value:
no
Description:
This option specifies if Tidy should generate pretty printed output,
writing it as HTML.
<output-xhtml >
- Default Value:
yes
Description:
This option specifies if Tidy should generate pretty printed output,
writing it as extensible HTML. This option causes Tidy to set the
DOCTYPE and default namespace as appropriate to XHTML. If a DOCTYPE or
namespace is given, they will be checked for consistency with the content
of the document. In the case of an inconsistency, the corrected values
will appear in the output. For XHTML, entities can be written as named
or numeric entities according to the setting of the
numeric-entities option. The original case of tags and
attributes will be preserved, regardless of other options.
<output-xml >
- Default Value:
yes
Description:
This option specifies if Tidy should pretty print output, writing it as
well-formed XML. Any entities not defined in XML 1.0 will be written
as numeric entities to allow them to be parsed by an XML parser. The
original case of tags and attributes will be preserved, regardless
of other options.
<quote-ampersand >
- Default Value:
yes
Description:
This option specifies if Tidy should output unadorned & characters
as &amp;.
<quote-marks >
- Default Value:
no
Description:
This option specifies if Tidy should output " characters as &quot;, as
is preferred by some editing environments. The apostrophe character '
is written out as &#39; since many web browsers don't yet
support &apos;.
<quote-nbsp >
- Default Value:
yes
Description:
This option specifies if Tidy should output non-breaking space characters
as entities, rather than as the Unicode character value 160 (decimal).
<repeated-attributes >
- Default Value:
keep-last
Possible Values: keep-first, keep-last
Description:
This option specifies if Tidy should keep the first or last attribute
if an attribute is repeated (for example, if a tag has two
align attributes).
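The two policies can be pictured with a small Python sketch. This models only the behavior described above, not Tidy's internals; the function name resolve_repeated and the list-of-pairs representation of attributes are assumptions made for illustration.

```python
def resolve_repeated(attrs, policy="keep-last"):
    """Collapse repeated attributes according to the given policy.

    attrs is a list of (name, value) pairs in source order.
    """
    if policy not in ("keep-first", "keep-last"):
        raise ValueError(f"unknown policy: {policy}")
    result = {}
    for name, value in attrs:
        # With keep-first, later duplicates are ignored;
        # with keep-last, they overwrite the earlier value.
        if policy == "keep-first" and name in result:
            continue
        result[name] = value
    return result


attrs = [("align", "left"), ("class", "note"), ("align", "right")]
print(resolve_repeated(attrs, "keep-first"))  # {'align': 'left', 'class': 'note'}
print(resolve_repeated(attrs, "keep-last"))   # {'align': 'right', 'class': 'note'}
```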
<replace-color >
- Default Value:
no
Description:
This option specifies if Tidy should replace numeric values in color
attributes by HTML/XHTML color names where defined, e.g. replace
"#ffffff" with "white".
<show-body-only >
- Default Value:
no
Description:
This option specifies if Tidy should print only the contents of the body
tag as an HTML fragment. Useful for incorporating existing whole pages
as a portion of another page.
<uppercase-attributes >
- Default Value:
no
Description:
This option specifies if Tidy should output attribute names in upper case.
The default is no, which results in lower case attribute names, except
for XML input, where the original case is preserved.
<uppercase-tags >
- Default Value:
no
Description:
This option specifies if Tidy should output tag names in upper case.
The default is no, which results in lower case tag names, except for
XML input, where the original case is preserved.
<word-2000 >
- Default Value:
no
Description:
This option specifies if Tidy should go to great pains to strip out all
the surplus stuff Microsoft Word 2000 inserts when you save Word
documents as "Web pages". Doesn't handle embedded images or VML.
Diagnostic Options
<accessibility-check >
- Default Value: 0
Possible Values: 0, 1, 2, or 3
Description:
This option specifies what level of accessibility checking, if any,
that Tidy should do. Level 0 is equivalent to Tidy Classic's
accessibility checking. For more information on Tidy's accessibility
checking, see the web site for the
Adaptive Technology Resource Centre at the University of Toronto.
<show-errors >
- Default Value:
6
Possible Values: Any integer.
Description:
This option specifies the number Tidy uses to determine if further
errors should be shown. If set to 0, then no errors are shown.
<show-warnings >
- Default Value:
yes
Description:
This option specifies if Tidy should show warnings. Setting it to no
can be useful when a few errors are hidden among many warning messages.
Pretty Print Options
<break-before-br >
- Default Value:
no
Description:
This option specifies if Tidy should output a line break before each
<BR> element.
<indent >
- Default Value:
no
Possible Values: no , yes , auto
Description:
This option specifies if Tidy should indent block-level tags. If set
to auto, this option causes Tidy to decide whether or not
to indent the content of tags such as TITLE, H1-H6, LI, TD, or P
depending on whether or not the content includes a block-level element.
You are advised to avoid setting indent to yes as this
can expose layout bugs in some browsers.
<indent-attributes >
- Default Value:
no
Description:
This option specifies if Tidy should begin each attribute on a new
line.
<indent-spaces >
- Default Value:
2
Possible Values: Any integer.
Description:
This option specifies the number of spaces Tidy uses to indent content,
when indentation is enabled.
<markup >
- Default Value:
yes
Description:
This option specifies if Tidy should generate a pretty printed version
of the markup. Note that Tidy won't generate a pretty printed version
if it finds significant errors (see force-output).
<punctuation-wrap >
- Default Value:
no
Description:
This option specifies if Tidy should line wrap after some Unicode or
Chinese punctuation characters.
<split >
- Default Value:
no
Description:
This option specifies if Tidy should create a sequence of slides from
the input, splitting the markup prior to each successive <H2>.
The slides are written to "slide001.html", "slide002.html" etc.
<tab-size >
- Default Value: 8
Possible Values: Any integer.
Description:
This option specifies the number of columns that Tidy uses between
successive tab stops. It is used to map tabs to spaces when reading
the input. Tidy never outputs tabs.
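To make the tab-stop arithmetic concrete, here is a minimal Python sketch that uses the standard str.expandtabs method, which pads each tab up to the next multiple of the tab size. It illustrates the general mapping of tabs to spaces only; whether Tidy's handling matches it in every edge case is an assumption, and the function name expand_tabs is hypothetical.

```python
def expand_tabs(line: str, tab_size: int = 8) -> str:
    """Replace each tab with spaces up to the next tab stop."""
    return line.expandtabs(tab_size)


# With the default of 8, "a" occupies column 0, so the tab adds 7 spaces.
print(repr(expand_tabs("a\tb")))      # 'a       b'
# With a tab size of 4, "ab" occupies columns 0-1, so the tab adds 2 spaces.
print(repr(expand_tabs("ab\tc", 4)))  # 'ab  c'
```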
<vertical-space >
- Default Value:
no
Description:
This option specifies if Tidy should add some empty lines for
readability.
<wrap >
- Default Value: 68
Possible Values: Any integer.
Description:
This option specifies the right margin Tidy uses for line wrapping.
Tidy tries to wrap lines so that they do not exceed this length.
Set wrap to zero if you want to disable line wrapping.
<wrap-asp >
- Default Value:
yes
Description:
This option specifies if Tidy should line wrap text contained within
ASP pseudo elements, which look as follows:
<% ... %> .
<wrap-attributes >
- Default Value:
no
Description:
This option specifies if Tidy should line wrap attribute values,
for easier editing. This option can be set independently of
wrap-script-literals.
<wrap-jste >
- Default Value:
yes
Description:
This option specifies if Tidy should line wrap text contained within
JSTE pseudo elements, which look as follows:
<# ... #> .
<wrap-php >
- Default Value:
yes
Description:
This option specifies if Tidy should line wrap text contained within
PHP pseudo elements, which look as follows:
<?php ... ?> .
<wrap-script-literals >
- Default Value:
no
Description:
This option specifies if Tidy should line wrap string literals that
appear in script attributes. Tidy wraps long script string literals
by inserting a backslash character before the line break.
<wrap-sections >
- Default Value:
yes
Description:
This option specifies if Tidy should line wrap text contained
within <![ ... ]> section tags.
Miscellaneous Options
<force-output >
- Default Value:
no
Description:
This option specifies if Tidy should produce output even if errors
are encountered. Use this option with care - if Tidy reports an error,
this means Tidy was not able to, or is not sure how to, fix the
error, so the resulting output may not be what you expect.
<keep-time >
- Default Value:
no
Description:
This option specifies if Tidy should keep the original modification
time of files that Tidy modifies in place. The default is no. Setting
the option to yes allows you to tidy files without causing these files
to be uploaded to a web server when using a tool such as SiteCopy.
Note this feature is not supported on some platforms.
<quiet >
- Default Value:
no
Description:
This option specifies if Tidy should suppress the summary of the
numbers of errors and warnings, as well as the welcome and
informational messages.
<tidy-mark >
- Default Value:
yes
Description:
This option specifies if Tidy should add a meta element to the
document head to indicate that the document has been tidied.
Tidy won't add a meta element if one is already present.
|