public interface MarkLogicConstants
Use the mapreduce.marklogic.input.* properties when using MarkLogic Server as an input source. Use the mapreduce.marklogic.output.* properties when using MarkLogic Server to store your results.
Modifier and Type | Field and Description |
---|---|
static String | ADVANCED_MODE: Value string of advanced mode for input.mode. |
static String | ASSIGNMENT_POLICY: The config property name ("mapreduce.marklogic.output.assignmentpolicy") which, if set, indicates the assignment policy for output documents. |
static String | BASIC_MODE: Value string of basic mode for input.mode. |
static String | BATCH_SIZE: The config property name ("mapreduce.marklogic.output.batchsize") which, if set, indicates the number of records in one request. |
static String | BIND_SPLIT_RANGE: The config property name ("mapreduce.marklogic.input.bindsplitrange") which, if set to true, specifies that the input query declares and references external variables "splitstart" and "splitend" under the namespace "http://marklogic.com/hadoop". |
static String | COLLECTION_FILTER: The config property name ("mapreduce.marklogic.input.filter.collection") which, if set, indicates to include only documents with one or more of the specified collection URIs when using ForestInputFormat. |
static String | CONTENT_TYPE: The config property name ("mapreduce.marklogic.output.content.type") which, if set, indicates the type of content to be inserted when using ContentOutputFormat. |
static String | COPY_COLLECTIONS: The config property name ("mapreduce.marklogic.copycollections") which, if set, specifies whether to copy document collections from source to destination. |
static String | COPY_METADATA: The config property name ("mapreduce.marklogic.copymetadata") which, if set, specifies whether to copy document metadata from source to destination. |
static String | COPY_QUALITY: The config property name ("mapreduce.marklogic.copyquality") which, if set, specifies whether to copy document quality from source to destination. |
static int | DEFAULT_BATCH_SIZE: Default batch size. |
static String | DEFAULT_CONTENT_TYPE: Default content type. |
static long | DEFAULT_LOCAL_MAX_SPLIT_SIZE: The default maximum split size for input splits, used if input.maxsplitsize is not specified and running in local mode. |
static long | DEFAULT_MAX_SPLIT_SIZE: The default maximum split size for input splits, used if input.maxsplitsize is not specified. |
static String | DEFAULT_OUTPUT_CONTENT_ENCODING: Default output content encoding. |
static String | DEFAULT_OUTPUT_XML_REPAIR_LEVEL: Default output XML repair level. |
static String | DEFAULT_PROPERTY_OPERATION_TYPE: Default property operation type. |
static int | DEFAULT_TXN_SIZE: Default transaction size. |
static String | DIRECTORY_FILTER: The config property name ("mapreduce.marklogic.input.filter.directory") which, if set, indicates to include only documents with one of the specified directory URIs when using ForestInputFormat. |
static String | DOCUMENT_SELECTOR: The config property name ("mapreduce.marklogic.input.documentselector") which, if set, specifies the document selection portion of the path expression used to retrieve data from the server. |
static String | EXECUTION_MODE: The config property name ("mapreduce.marklogic.mode") which, if set, indicates whether the job is running in local or distributed mode. |
static String | EXTRACT_URI |
static String | INDENTED: The config property name ("mapreduce.marklogic.input.indented") which, if set, specifies whether to format data retrieved from MarkLogic with indentation. |
static String | INPUT_DATABASE_NAME: Not yet implemented. |
static String | INPUT_HOST: The config property name ("mapreduce.marklogic.input.host") which, if set, specifies the MarkLogic Server host to use for input operations. |
static String | INPUT_KEY_CLASS: The config property name ("mapreduce.marklogic.input.keyclass") which, if set, specifies the name of the class of the map input keys for KeyValueInputFormat. |
static String | INPUT_LEXICON_FUNCTION_CLASS: The config property name ("mapreduce.marklogic.input.lexiconfunctionclass") which, if set, specifies the name of the class implementing LexiconFunction which will be used to generate input. |
static String | INPUT_MODE: The config property name ("mapreduce.marklogic.input.mode") which, if set, specifies whether to use basic or advanced input query mode. |
static String | INPUT_PASSWORD: The config property name ("mapreduce.marklogic.input.password") which, if set, specifies the cleartext password to use for authentication with input.username. |
static String | INPUT_PORT: The config property name ("mapreduce.marklogic.input.port") which, if set, specifies the port number of the input XDBC server on the MarkLogic Server host specified by the input.host property. |
static String | INPUT_QUERY: The config property name ("mapreduce.marklogic.input.query") which, if set, specifies the query used to retrieve input records from MarkLogic Server. |
static String | INPUT_QUERY_LANGUAGE: The config property name ("mapreduce.marklogic.input.querylanguage") which, if set, specifies the query language used for the input query and split query. |
static String | INPUT_QUERY_TIMESTAMP: The config property name ("mapreduce.marklogic.input.querytimestamp") which, if set, specifies data retrieval from MarkLogic Server at the specified timestamp. |
static String | INPUT_RESTRICT_HOSTS: The config property name ("mapreduce.marklogic.input.restricthosts") which, if set, specifies whether to restrict the input hosts that mlcp will connect to. |
static String | INPUT_SSL_OPTIONS_CLASS: The config property name ("mapreduce.marklogic.input.ssloptionsclass") which, if set, specifies the name of the class implementing SslConfigOptions which will be used if input.usessl is set to true. |
static String | INPUT_SSL_PROTOCOL: The config property name ("mapreduce.marklogic.input.sslprotocol") which, if set, specifies the SSL protocol which will be used if input.usessl is set to true. |
static String | INPUT_USE_SSL: The config property name ("mapreduce.marklogic.input.usessl") which, if set, specifies whether the connection to the input server is SSL enabled; false is assumed if not set. |
static String | INPUT_USERNAME: The config property name ("mapreduce.marklogic.input.username") which, if set, specifies the MarkLogic Server user name under which input queries and operations run. |
static String | INPUT_VALUE_CLASS: The config property name ("mapreduce.marklogic.input.valueclass") which, if set, specifies the name of the class of the map input value for KeyValueInputFormat, ValueInputFormat and DocumentInputFormat. |
static String | MAX_SPLIT_SIZE: The config property name ("mapreduce.marklogic.input.maxsplitsize") which, if set, specifies the maximum number of fragments per input split. |
static long | MIN_NODEUPDATE_VERSION: Minimum MarkLogic version to accept node-update permissions. |
static String | MODE_DISTRIBUTED |
static String | MODE_LOCAL |
static String | MR_NAMESPACE: The namespace ("http://marklogic.com/hadoop") in which the split range external variables are defined. |
static String | NODE_OPERATION_TYPE: The config property name ("mapreduce.marklogic.output.node.optype") which, if set, indicates what node operation to perform during output. |
static String | OUTPUT_CLEAN_DIR: The config property name ("mapreduce.marklogic.output.content.cleandir") which, if set, indicates whether or not to remove the output directory. |
static String | OUTPUT_COLLECTION: The config property name ("mapreduce.marklogic.output.content.collection") which, if set, specifies a comma-separated list of collections to which generated output documents are added. |
static String | OUTPUT_CONTENT_ENCODING: The config property name ("mapreduce.marklogic.output.content.encoding") which, if set, specifies the charset encoding to be used by the server when loading this document. |
static String | OUTPUT_CONTENT_LANGUAGE: The config property name ("mapreduce.marklogic.output.content.language") which, if set, specifies the language name to associate with inserted documents. |
static String | OUTPUT_CONTENT_NAMESPACE: The config property name ("mapreduce.marklogic.output.content.namespace") which, if set, specifies the namespace to associate with inserted documents. |
static String | OUTPUT_DATABASE_NAME: The config property name ("mapreduce.marklogic.output.databasename") which, if set, specifies the MarkLogic Server database to use for output operations. |
static String | OUTPUT_DIRECTORY: The config property name ("mapreduce.marklogic.output.content.directory") which, if set, specifies the MarkLogic Server database directory where output documents are created. |
static String | OUTPUT_FAST_LOAD: The config property name ("mapreduce.marklogic.output.content.fastload") which, if set, indicates whether or not to use the fast load mode to load content into MarkLogic. |
static String | OUTPUT_FOREST_HOST: Internal use only. |
static String | OUTPUT_GRAPH: Default graph for RDF. |
static String | OUTPUT_HOST: The config property name ("mapreduce.marklogic.output.host") which, if set, specifies the MarkLogic Server host to use for output operations. |
static String | OUTPUT_KEY_TYPE: The config property name ("mapreduce.marklogic.output.keytype") which, if set, specifies the data type of the output keys for KeyValueOutputFormat. |
static String | OUTPUT_KEY_VARNAME: Value string of the output key external variable name. |
static String | OUTPUT_NAMESPACE: The config property name ("mapreduce.marklogic.output.node.namespace") which, if set, indicates the namespace used for output. |
static String | OUTPUT_OVERRIDE_GRAPH: Graph override for RDF. |
static String | OUTPUT_PARTITION: The config property name ("mapreduce.marklogic.output.partition") which, if set, specifies the partition where output documents are created. |
static String | OUTPUT_PASSWORD: The config property name ("mapreduce.marklogic.output.password") which, if set, specifies the cleartext password to use for authentication with output.username. |
static String | OUTPUT_PERMISSION: The config property name ("mapreduce.marklogic.output.content.permission") which, if set, specifies a comma-separated list of role-capability pairs to associate with created output documents. |
static String | OUTPUT_PORT: The config property name ("mapreduce.marklogic.output.port") which, if set, specifies the port number of the output MarkLogic Server specified by the output.host property. |
static String | OUTPUT_PROPERTY_ALWAYS_CREATE: The config property name ("mapreduce.marklogic.output.property.alwayscreate") which, if set to true, causes PropertyOutputFormat to create document properties for reduce output key-value pairs even when no document exists with the target URI. |
static String | OUTPUT_QUALITY: The config property name ("mapreduce.marklogic.output.content.quality") which, if set, specifies the document quality for created output documents. |
static String | OUTPUT_QUERY: The config property name ("mapreduce.marklogic.output.query") which, if set, specifies the statement to execute against MarkLogic Server. |
static String | OUTPUT_QUERY_LANGUAGE: The config property name ("mapreduce.marklogic.output.querylanguage") which, if set, specifies the query language used for the output query. |
static String | OUTPUT_RESTRICT_HOSTS: The config property name ("mapreduce.marklogic.output.restricthosts") which, if set, specifies whether to restrict the output hosts that mlcp will connect to. |
static String | OUTPUT_SSL_OPTIONS_CLASS: The config property name ("mapreduce.marklogic.output.ssloptionsclass") which, if set, specifies the name of the class implementing SslConfigOptions which will be used if output.usesslprotocol is set to SSLv3. |
static String | OUTPUT_SSL_PROTOCOL: The config property name ("mapreduce.marklogic.output.sslprotocol") which, if set, specifies the SSL protocol which will be used if output.usessl is set to true. |
static String | OUTPUT_STREAMING: The config property name ("mapreduce.marklogic.output.content.streaming") which, if set, specifies whether to use streaming to insert content. |
static String | OUTPUT_URI_PREFIX: The config property name ("mapreduce.marklogic.output_uriprefix") which, if set, specifies a string to prepend to all document URIs. |
static String | OUTPUT_URI_REPLACE: The config property name ("mapreduce.marklogic.output.urireplace") which, if set, specifies a comma-separated list of regex pattern and replacement string pairs: the first item of each pair matches a URI segment, and the second, enclosed in single quotes, is the string to replace it with. |
static String | OUTPUT_URI_SUFFIX: The config property name ("mapreduce.marklogic.output_urisuffix") which, if set, specifies a string to append to all document URIs. |
static String | OUTPUT_USE_SSL: The config property name ("mapreduce.marklogic.output.usessl") which, if set, specifies whether the connection to the output server is SSL enabled; false is assumed if not set. |
static String | OUTPUT_USERNAME: The config property name ("mapreduce.marklogic.output.username") which, if set, specifies the MarkLogic Server user name under which output operations run. |
static String | OUTPUT_VALUE_TYPE: The config property name ("mapreduce.marklogic.output.valuetype") which, if set, specifies the data type of the map output value for KeyValueOutputFormat. |
static String | OUTPUT_VALUE_VARNAME: Value string of the output value external variable name. |
static String | OUTPUT_XML_REPAIR_LEVEL: The config property name ("mapreduce.marklogic.output.content.repairlevel") which, if set, specifies the document repair level for this options object. |
static String | PATH_NAMESPACE: The config property name ("mapreduce.marklogic.input.namespace") which, if set, specifies a list of namespaces to use when evaluating the path expression constructed from the input.documentselector and input.subdocumentexpr properties. |
static String | PROPERTY_OPERATION_TYPE: The config property name ("mapreduce.marklogic.output.property.optype") which, if set, indicates what property operation to perform during output when using PropertyOutputFormat. |
static String | QUERY_FILTER: The config property name ("mapreduce.marklogic.input.filter.query") which, if set, indicates to include only documents matching the cts query when using MarkLogicInputFormat. |
static String | RECORD_TO_FRAGMENT_RATIO: The config property name ("mapreduce.marklogic.input.recordtofragmentratio") which, if set, specifies the ratio of the number of retrieved records to the number of accessed fragments. |
static String | REDACTION_RULE_COLLECTION: The config property name ("mapreduce.marklogic.input.redaction.rules") which, if set, specifies a comma-separated list of redaction rule collection URIs. |
static String | SPLIT_END_VARNAME: Use this external variable name ("splitend") in your advanced mode input query to access the end value of the record range in an input split when "mapreduce.marklogic.input.bindsplitrange" is true. |
static String | SPLIT_QUERY: The config property name ("mapreduce.marklogic.input.splitquery") which, if set, specifies the query MarkLogic Server uses to generate input splits. |
static String | SPLIT_START_VARNAME: Use this external variable name ("splitstart") in your advanced mode input query to access the start value of the record range in an input split when "mapreduce.marklogic.input.bindsplitrange" is true. |
static String | SUBDOCUMENT_EXPRESSION: The config property name ("mapreduce.marklogic.input.subdocumentexpr") which, if set, specifies the path expression used to retrieve sub-document records from the server. |
static String | TEMPORAL_COLLECTION: The config property name ("mapreduce.marklogic.output.temporalcollection") which, if set, indicates the temporal collection for documents. |
static String | TXN_SIZE: The config property name ("mapreduce.marklogic.output.transactionsize") which, if set, indicates the number of requests in one transaction. |
static String | TYPE_FILTER: The config property name ("mapreduce.marklogic.input.filter.type") which, if set, indicates to include only documents with one of the specified types when using ForestInputFormat. |
static final String INPUT_USERNAME
The config property name ("mapreduce.marklogic.input.username") which, if set, specifies the MarkLogic Server user name under which input queries and operations run. Required if using MarkLogic Server for input.
static final String INPUT_PASSWORD
The config property name ("mapreduce.marklogic.input.password") which, if set, specifies the cleartext password to use for authentication with input.username. Required if using MarkLogic Server for input.
static final String INPUT_HOST
The config property name ("mapreduce.marklogic.input.host") which, if set, specifies the MarkLogic Server host to use for input operations. Required if using MarkLogic Server for input.
static final String INPUT_PORT
The config property name ("mapreduce.marklogic.input.port") which, if set, specifies the port number of the input XDBC server on the MarkLogic Server host specified by the input.host property. Required if using MarkLogic Server for input.
NOTE: Within a cluster, all nodes supplying MapReduce input data must use the same XDBC server port number.
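The four input connection properties above are each documented as required when MarkLogic Server is the input source. The following is a minimal sketch of a pre-flight check for them. It uses a plain Map in place of a Hadoop Configuration, and the class name InputConnectionCheck and its method are hypothetical helpers, not part of the connector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the connector API.
final class InputConnectionCheck {
    // Property names exactly as documented on this page.
    static final String[] REQUIRED = {
        "mapreduce.marklogic.input.username",
        "mapreduce.marklogic.input.password",
        "mapreduce.marklogic.input.host",
        "mapreduce.marklogic.input.port"
    };

    // Returns the required input properties that are absent or blank.
    static List<String> missing(Map<String, String> conf) {
        List<String> out = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = conf.get(name);
            if (value == null || value.trim().isEmpty()) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapreduce.marklogic.input.host", "localhost"); // placeholder host
        conf.put("mapreduce.marklogic.input.port", "8000");      // placeholder port
        System.out.println(missing(conf)); // lists the username and password properties
    }
}
```

In a real job the same names would typically be passed to Hadoop's Configuration.set(name, value) or referenced through the constants of this interface.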
static final String INPUT_USE_SSL
The config property name ("mapreduce.marklogic.input.usessl") which, if set, specifies whether the connection to the input server is SSL enabled; false is assumed if not set.
static final String INPUT_SSL_PROTOCOL
The config property name ("mapreduce.marklogic.input.sslprotocol") which, if set, specifies the SSL protocol which will be used if input.usessl is set to true.
static final String INPUT_SSL_OPTIONS_CLASS
The config property name ("mapreduce.marklogic.input.ssloptionsclass") which, if set, specifies the name of the class implementing SslConfigOptions which will be used if input.usessl is set to true.
static final String DOCUMENT_SELECTOR
The config property name ("mapreduce.marklogic.input.documentselector") which, if set, specifies the document selection portion of the path expression used to retrieve data from the server. Only used if using MarkLogic Server for input in basic mode.
The XQuery path expression step given in this property must select a sequence of document nodes. To further refine the input selection to nodes or values within the documents, use input.subdocumentexpr. If this property is not set, fn:collection() is used. For more information, see the overview.
This property is only usable when basic mode is specified with the input.mode property. If more powerful input customization is needed, use advanced mode and specify a complete input query with the input.query property.
The path expression step given in this property must be searchable. A searchable expression is one which can be optimized using indexes. See the Query and Performance Tuning Guide for more information on searchable path expressions.
The following selects all documents:
<property>
  <name>mapreduce.marklogic.input.documentselector</name>
  <value>fn:collection()</value>
</property>
static final String SUBDOCUMENT_EXPRESSION
The config property name ("mapreduce.marklogic.input.subdocumentexpr") which, if set, specifies the path expression used to retrieve sub-document records from the server. Used only if using MarkLogic Server for input in basic mode. If not set, the document nodes selected by the document selector are used.
The XQuery path expression step given in this property should select a sequence of nodes or atomic values from the set of documents selected by the path step given in the input.documentselector property. For more information, see the overview.
This property is only usable when basic mode is specified with the input.mode property. If more powerful input customization is needed, use advanced mode and specify a complete input query with the input.query property.
The following selects every wp:a element that has an href attribute, across all documents:
<property>
  <name>mapreduce.marklogic.input.documentselector</name>
  <value>fn:collection()</value>
</property>
<property>
  <name>mapreduce.marklogic.input.subdocumentexpr</name>
  <value>//wp:a[@href]</value>
</property>
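The same basic-mode settings can also be assembled in code. The sketch below is only illustrative: it collects the properties in a plain Map rather than a Hadoop Configuration, and BasicModeInput is a hypothetical class name. The property names and example values are the ones shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the connector API.
final class BasicModeInput {
    // Builds basic-mode input properties. A null subdocument expression is
    // left unset, in which case the selected document nodes are used.
    static Map<String, String> build(String documentSelector, String subdocumentExpr) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapreduce.marklogic.input.mode", "basic");
        conf.put("mapreduce.marklogic.input.documentselector", documentSelector);
        if (subdocumentExpr != null) {
            conf.put("mapreduce.marklogic.input.subdocumentexpr", subdocumentExpr);
        }
        return conf;
    }

    public static void main(String[] args) {
        // Prints each property name and value, one per line.
        build("fn:collection()", "//wp:a[@href]")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Because the example path uses the wp prefix, a matching input.namespace value would also be needed for the expression to resolve.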
static final String INPUT_LEXICON_FUNCTION_CLASS
The config property name ("mapreduce.marklogic.input.lexiconfunctionclass") which, if set, specifies the name of the class implementing LexiconFunction which will be used to generate input.
static final String PATH_NAMESPACE
The config property name ("mapreduce.marklogic.input.namespace") which, if set, specifies a list of namespaces to use when evaluating the path expression constructed from the input.documentselector and input.subdocumentexpr properties.
Specify the namespaces as comma-separated alias-URI pairs. For example:
<property>
  <name>mapreduce.marklogic.input.namespace</name>
  <value>wp, "http://www.mediawiki.org.xml/export-0.4/"</value>
</property>
If a namespace URI includes a comma, you must set this property programmatically, rather than in a config file.
static final String SPLIT_QUERY
The config property name ("mapreduce.marklogic.input.splitquery") which, if set, specifies the query MarkLogic Server uses to generate input splits. This property is required (and only usable) in advanced mode; see the input.mode property for details.
The split query must return a sequence of (forest id, record count, hostname) tuples. The host name and forest id identify the forest associated with the split. The count is an estimate of the number of key-value pairs in the split.
The default split query used in basic input mode computes a rough estimate based on the number of documents in the database.
static final String MAX_SPLIT_SIZE
The config property name ("mapreduce.marklogic.input.maxsplitsize") which, if set, specifies the maximum number of fragments per input split. Optional. Default: 50000L. The default should be suitable for most applications.
static final String INPUT_DATABASE_NAME
The config property name ("mapreduce.marklogic.input.databasename") which, if set, specifies the name of the MarkLogic Server database from which to create input splits.
static final String INPUT_KEY_CLASS
The config property name ("mapreduce.marklogic.input.keyclass") which, if set, specifies the name of the class of the map input keys for KeyValueInputFormat. Optional. Default: Text.
static final String INPUT_VALUE_CLASS
The config property name ("mapreduce.marklogic.input.valueclass") which, if set, specifies the name of the class of the map input value for KeyValueInputFormat, ValueInputFormat and DocumentInputFormat. Optional. Default: Text for KeyValueInputFormat and ValueInputFormat, DatabaseDocument for DocumentInputFormat.
static final String INPUT_MODE
The config property name ("mapreduce.marklogic.input.mode") which, if set, specifies whether to use basic or advanced input query mode. Allowable values are basic and advanced. Optional. Default: basic.
Only basic mode is supported at this time.
Basic mode enables use of the input.documentselector, input.subdocumentexpr, and input.namespace properties. Advanced mode enables use of the input.query and input.splitquery properties.
static final String BASIC_MODE
Value string of basic mode for input.mode.
static final String ADVANCED_MODE
Value string of advanced mode for input.mode.
static final String INPUT_QUERY
The config property name ("mapreduce.marklogic.input.query") which, if set, specifies the query used to retrieve input records from MarkLogic Server. This property is required when advanced is specified in the input.mode property.
The value of this property must be a fully formed query, suitable for evaluation by xdmp:eval, and must return a sequence. The items in the sequence depend on the InputFormat subclass configured for the job. For details, see "Advanced Input Mode" in the Hadoop MapReduce Connector Developer's Guide.
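The mode rules above (input.query and input.splitquery are required in advanced mode, and the mode defaults to basic) can be expressed as a small consistency check. This is a sketch under stated assumptions: it inspects a plain Map rather than a Hadoop Configuration, InputModeCheck is a hypothetical name, and the connector's own validation may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the connector API.
final class InputModeCheck {
    // Returns human-readable problems with the input mode settings, if any.
    static List<String> problems(Map<String, String> conf) {
        List<String> out = new ArrayList<>();
        // "basic" is the documented default when input.mode is not set.
        String mode = conf.getOrDefault("mapreduce.marklogic.input.mode", "basic");
        if (mode.equals("advanced")) {
            String[] required = {
                "mapreduce.marklogic.input.query",
                "mapreduce.marklogic.input.splitquery"
            };
            for (String name : required) {
                if (!conf.containsKey(name)) {
                    out.add("missing " + name);
                }
            }
        } else if (!mode.equals("basic")) {
            out.add("unknown input mode: " + mode);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.marklogic.input.mode", "advanced");
        conf.put("mapreduce.marklogic.input.query", "fn:collection()"); // placeholder query
        System.out.println(problems(conf)); // reports the missing split query
    }
}
```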
static final String INPUT_QUERY_TIMESTAMP
The config property name ("mapreduce.marklogic.input.querytimestamp") which, if set, specifies data retrieval from MarkLogic Server at the specified timestamp.
static final String BIND_SPLIT_RANGE
The config property name ("mapreduce.marklogic.input.bindsplitrange") which, if set to true, specifies that the input query declares and references external variables "splitstart" and "splitend" under the namespace "http://marklogic.com/hadoop". The connector binds to these variables with the start and end of an input split instead of constraining the query with the split range.
For details, see "Optimizing Your Input Query" in the Hadoop MapReduce Connector Developer's Guide.
static final String MR_NAMESPACE
The namespace ("http://marklogic.com/hadoop") in which the split range external variables are defined. The split range variables "splitstart" and "splitend" are in this namespace when using advanced input mode and "mapreduce.marklogic.input.bindsplitrange" is true. Declare a namespace prefix for this namespace in your input query and qualify references to "splitstart" and "splitend" by the prefix. For details, see "Optimizing Your Input Query" in the Hadoop MapReduce Connector Developer's Guide.
static final String SPLIT_START_VARNAME
Use this external variable name ("splitstart") in your advanced mode input query to access the start value of the record range in an input split when "mapreduce.marklogic.input.bindsplitrange" is true.
The variable must be declared and referenced in the namespace "http://marklogic.com/hadoop". For details, see "Optimizing Your Input Query" in the Hadoop MapReduce Connector Developer's Guide.
static final String SPLIT_END_VARNAME
Use this external variable name ("splitend") in your advanced mode input query to access the end value of the record range in an input split when "mapreduce.marklogic.input.bindsplitrange" is true.
The variable must be declared and referenced in the namespace "http://marklogic.com/hadoop". For details, see "Optimizing Your Input Query" in the Hadoop MapReduce Connector Developer's Guide.
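As a rough illustration of the declarations described above, the sketch below builds the prolog of an advanced-mode input query as a Java string. The namespace URI and variable names come from this page; the class name SplitRangeQuery, the xs:integer type on the variables, the example prefix mlmr, and the query body (including treating the end of the range as inclusive) are assumptions made for the example.

```java
// Hypothetical helper for illustration; not part of the connector API.
final class SplitRangeQuery {
    // Namespace documented for the split range variables.
    static final String MR_NAMESPACE = "http://marklogic.com/hadoop";

    // Prepends the namespace and external variable declarations to a query body.
    // The xs:integer type annotation is an assumption, not taken from this page.
    static String withSplitRange(String prefix, String body) {
        return "declare namespace " + prefix + " = \"" + MR_NAMESPACE + "\";\n"
            + "declare variable $" + prefix + ":splitstart as xs:integer external;\n"
            + "declare variable $" + prefix + ":splitend as xs:integer external;\n"
            + body;
    }

    public static void main(String[] args) {
        // Illustrative body only: selects the documents in the split's range.
        String body = "fn:subsequence(fn:collection(), $mlmr:splitstart, "
            + "$mlmr:splitend - $mlmr:splitstart + 1)";
        System.out.println(withSplitRange("mlmr", body));
    }
}
```

The resulting string would be supplied as the input.query value, with input.bindsplitrange set to true and input.mode set to advanced.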
static final String RECORD_TO_FRAGMENT_RATIO
The config property name ("mapreduce.marklogic.input.recordtofragmentratio") which, if set, specifies the ratio of the number of retrieved records to the number of accessed fragments. Optional. Default: 1.0 (one record per fragment) for documents, 100 for nodes and values.
The record to fragment ratio is used for progress estimation.
static final String INDENTED
The config property name ("mapreduce.marklogic.input.indented") which, if set, specifies whether to format data retrieved from MarkLogic with indentation. Optional. Valid values: TRUE, FALSE, SERVERDEFAULT. Default: false.
static final String COLLECTION_FILTER
The config property name ("mapreduce.marklogic.input.filter.collection") which, if set, indicates to include only documents with one or more of the specified collection URIs when using ForestInputFormat.
static final String DIRECTORY_FILTER
The config property name ("mapreduce.marklogic.input.filter.directory") which, if set, indicates to include only documents with one of the specified directory URIs when using ForestInputFormat.
static final String QUERY_FILTER
The config property name ("mapreduce.marklogic.input.filter.query") which, if set, indicates to include only documents matching the cts query when using MarkLogicInputFormat.
static final String TYPE_FILTER
The config property name ("mapreduce.marklogic.input.filter.type") which, if set, indicates to include only documents with one of the specified types when using ForestInputFormat.
static final String EXTRACT_URI
static final String OUTPUT_USERNAME
The config property name ("mapreduce.marklogic.output.username") which, if set, specifies the MarkLogic Server user name under which output operations run. Required if using MarkLogic Server for output.
static final String OUTPUT_PASSWORD
The config property name ("mapreduce.marklogic.output.password") which, if set, specifies the cleartext password to use for authentication with output.username. Required if using MarkLogic Server for output.
static final String OUTPUT_HOST
The config property name ("mapreduce.marklogic.output.host") which, if set, specifies the MarkLogic Server host to use for output operations. Required if using MarkLogic Server for output.
static final String OUTPUT_FOREST_HOST
Internal use only.
static final String OUTPUT_PORT
The config property name ("mapreduce.marklogic.output.port") which, if set, specifies the port number of the output MarkLogic Server specified by the output.host property. Required if using MarkLogic Server for output.
static final String OUTPUT_DATABASE_NAME
The config property name ("mapreduce.marklogic.output.databasename") which, if set, specifies the MarkLogic Server database to use for output operations. The default value is the target database assigned to the AppServer.
static final String OUTPUT_USE_SSL
The config property name ("mapreduce.marklogic.output.usessl") which, if set, specifies whether the connection to the output server is SSL enabled; false is assumed if not set.
static final String OUTPUT_SSL_PROTOCOL
The config property name ("mapreduce.marklogic.output.sslprotocol") which, if set, specifies the SSL protocol which will be used if output.usessl is set to true.
static final String OUTPUT_SSL_OPTIONS_CLASS
The config property name ("mapreduce.marklogic.output.ssloptionsclass") which, if set, specifies the name of the class implementing SslConfigOptions which will be used if output.usesslprotocol is set to SSLv3.
static final String OUTPUT_DIRECTORY
The config property name ("mapreduce.marklogic.output.content.directory") which, if set, specifies the MarkLogic Server database directory where output documents are created.
If output.content.cleandir is false (the default) then an error occurs if the directory already exists. If output.content.cleandir is true, then the directory is removed as part of the job submission process.
static final String OUTPUT_CONTENT_ENCODING
The config property name ("mapreduce.marklogic.output.content.encoding") which, if set, specifies the charset encoding to be used by the server when loading this document. The encoding provided will be passed to the server at document load time and must be a name that it recognizes. The document byte stream will be transcoded to UTF-8 for storage.
static final String DEFAULT_OUTPUT_CONTENT_ENCODING
Default output content encoding.
static final String OUTPUT_COLLECTION
The config property name ("mapreduce.marklogic.output.content.collection") which, if set, specifies a comma-separated list of collections to which generated output documents are added. Optional. Relevant only when using MarkLogic Server for output with ContentOutputFormat.
Example:
<property>
  <name>mapreduce.marklogic.output.content.collection</name>
  <value>latest,top10</value>
</property>
static final String OUTPUT_GRAPH
Default graph for RDF.
static final String OUTPUT_OVERRIDE_GRAPH
Graph override for RDF.
static final String OUTPUT_PERMISSION
The config property name ("mapreduce.marklogic.output.content.permission") which, if set, specifies a comma-separated list of role-capability pairs to associate with created output documents. Optional. If not set, the default permissions for output.username are used. Relevant only when using MarkLogic Server for output with ContentOutputFormat.
Example:
<property>
  <name>mapreduce.marklogic.output.content.permission</name>
  <value>dls-user,update,dls-user,read</value>
</property>
See "URI Privileges and Permissions on Documents" in the Understanding and Using Security Guide for more information about roles and capabilities.
If the property value includes a comma embedded in a role name, you must set this property in your code, rather than in a configuration file.
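To make the role-capability format concrete, here is a small sketch that assembles the permission value from alternating role and capability arguments. PermissionValue is a hypothetical helper, not part of the connector; it only produces the string and does not address role names that contain commas.

```java
// Hypothetical helper for illustration; not part of the connector API.
final class PermissionValue {
    // Formats alternating role, capability arguments as
    // "role,capability,role,capability".
    static String of(String... roleCapabilityPairs) {
        if (roleCapabilityPairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected role, capability pairs");
        }
        return String.join(",", roleCapabilityPairs);
    }

    public static void main(String[] args) {
        // Reproduces the example value shown above.
        System.out.println(of("dls-user", "update", "dls-user", "read"));
    }
}
```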
static final String OUTPUT_QUALITY
The config property name ("mapreduce.marklogic.output.content.quality") which, if set, specifies the document quality for created output documents. Optional. Relevant only when using MarkLogic Server for output with ContentOutputFormat.
Quality affects the search relevance of a document. The value must be a positive or negative integer. For more information about document quality, see "Relevance Scores: Understanding and Customizing" in the Search Developer's Guide.
static final String OUTPUT_STREAMING
The config property name ("mapreduce.marklogic.output.content.streaming") which, if set, specifies whether to use streaming to insert content. When streaming is set to true, the content will not be fully buffered in memory, hence will consume less memory but will disable auto-retry if there is a problem inserting the content.
static final String OUTPUT_CLEAN_DIR
The config property name ("mapreduce.marklogic.output.content.cleandir") which, if set, indicates whether or not to remove the output directory. Only applicable to ContentOutputFormat. Default: false.
When set to true, the output directory specified by the output.content.directory property is removed. When set to false, an exception is thrown if the output content directory already exists.
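The interaction between the clean-directory setting and an existing output directory can be summarized as a small decision function. This is only a sketch of the behavior described above; OutputDirectoryPolicy and its Outcome values are hypothetical names, and the connector's actual checks happen on the server side during job submission.

```java
// Hypothetical model of the documented behavior; not part of the connector API.
final class OutputDirectoryPolicy {
    enum Outcome { PROCEED, REMOVE_EXISTING, FAIL }

    // cleanDir mirrors output.content.cleandir; directoryExists is whether the
    // directory named by output.content.directory is already present.
    static Outcome decide(boolean cleanDir, boolean directoryExists) {
        if (!directoryExists) {
            return Outcome.PROCEED;          // nothing to clean or conflict with
        }
        return cleanDir
            ? Outcome.REMOVE_EXISTING        // directory is removed first
            : Outcome.FAIL;                  // an exception is thrown
    }

    public static void main(String[] args) {
        System.out.println(decide(false, true));
        System.out.println(decide(true, true));
    }
}
```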
static final String OUTPUT_FAST_LOAD
The config property name ("mapreduce.marklogic.output.content.fastload") which, if set, indicates whether or not to use the fast load mode to load content into MarkLogic. Default: false.
Setting it to true when the documents to be loaded already exist may cause an XDMP-DBDUPURI error if the original documents were inserted when the database had a different forest count. Fast load mode is always used if "mapreduce.marklogic.output.content.directory" is set.
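A minimal sketch of enabling fast load explicitly, assuming the documents being loaded do not already exist in the target database:

```xml
<!-- Sketch only: safe when the target URIs are new to the database -->
<property>
  <name>mapreduce.marklogic.output.content.fastload</name>
  <value>true</value>
</property>
```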
static final String NODE_OPERATION_TYPE
The config property name ("mapreduce.marklogic.output.node.optype") which, if set, indicates what node operation to perform during output. Required if using MarkLogic Server for output with NodeOutputFormat. Valid choices: INSERT_BEFORE, INSERT_AFTER, INSERT_CHILD, REPLACE.
See Also: NodeOpType, NodeOutputFormat, Constant Field Values
static final String OUTPUT_PROPERTY_ALWAYS_CREATE
The config property name ("mapreduce.marklogic.output.property.alwayscreate") which, if set to true, causes PropertyOutputFormat to create document properties for reduce output key-value pairs even when no document exists with the target URI. Default: false.
By default, PropertyOutputFormat does not create a property for a document URI unless the document already exists.
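For example, a job using PropertyOutputFormat that should attach properties even to URIs with no existing document could be configured roughly as follows. The second property is the one named by PROPERTY_OPERATION_TYPE; SET_PROPERTY is its documented default and is spelled out here only for clarity.

```xml
<!-- Sketch only: create properties even when the target document is missing -->
<property>
  <name>mapreduce.marklogic.output.property.alwayscreate</name>
  <value>true</value>
</property>
<property>
  <!-- Documented default, shown explicitly -->
  <name>mapreduce.marklogic.output.property.optype</name>
  <value>SET_PROPERTY</value>
</property>
```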
static final String OUTPUT_NAMESPACE
The config property name ("mapreduce.marklogic.output.node.namespace") which, if set, indicates the namespace used for output. This is used only in NodeOutputFormat, to resolve element names in the node path.
static final String EXECUTION_MODE
The config property name ("mapreduce.marklogic.mode") which, if set, indicates whether the job is running in local or distributed mode.
static final String MODE_DISTRIBUTED
static final String MODE_LOCAL
static final long DEFAULT_MAX_SPLIT_SIZE
Default maximum split size, used when input.maxsplitsize is not specified.
static final long DEFAULT_LOCAL_MAX_SPLIT_SIZE
Default maximum split size, used when input.maxsplitsize is not specified and running in local mode.
static final String PROPERTY_OPERATION_TYPE
The config property name ("mapreduce.marklogic.output.property.optype") which, if set, indicates what property operation to perform during output when using PropertyOutputFormat. Ignored if not using PropertyOutputFormat. Optional. Valid choices: SET_PROPERTY, ADD_PROPERTY. Default: SET_PROPERTY.
static final String DEFAULT_PROPERTY_OPERATION_TYPE
static final String CONTENT_TYPE
The config property name ("mapreduce.marklogic.output.content.type") which, if set, indicates the type of content to be inserted when using ContentOutputFormat. Optional. Valid choices: XML, JSON, TEXT, BINARY, MIXED, UNKNOWN. Default: XML.
static final String OUTPUT_KEY_TYPE
The config property name ("mapreduce.marklogic.output.keytype") which, if set, specifies the data type of the output keys for KeyValueOutputFormat. Optional. Default: xs:string.
static final String OUTPUT_VALUE_TYPE
The config property name ("mapreduce.marklogic.output.valuetype") which, if set, specifies the data type of the map output value for KeyValueOutputFormat. Optional. Default: xs:string.
static final String OUTPUT_QUERY
The config property name ("mapreduce.marklogic.output.query") which, if set, specifies the statement to execute against MarkLogic Server. This property is required for KeyValueOutputFormat.
The statement may declare and reference two external variables, "key" and "value", under the namespace "http://marklogic.com/hadoop". The connector binds them to the output key and value, using the user-specified data types.
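A minimal sketch of an output query for KeyValueOutputFormat, assuming string keys that are document URIs and string values that are collection URIs. The mlmr prefix and the call to xdmp:document-add-collections are illustrative choices, not something this page prescribes; the key and value types shown are the documented defaults.

```xml
<!-- Sketch only: adds each output value as a collection on the document named by the key -->
<property>
  <name>mapreduce.marklogic.output.keytype</name>
  <value>xs:string</value>
</property>
<property>
  <name>mapreduce.marklogic.output.valuetype</name>
  <value>xs:string</value>
</property>
<property>
  <name>mapreduce.marklogic.output.query</name>
  <value>
    xquery version "1.0-ml";
    declare namespace mlmr = "http://marklogic.com/hadoop";
    (: key and value are bound by the connector for each output pair :)
    declare variable $mlmr:key as xs:string external;
    declare variable $mlmr:value as xs:string external;
    xdmp:document-add-collections($mlmr:key, $mlmr:value)
  </value>
</property>
```

If the query needed characters such as < or &, they would have to be escaped or wrapped in a CDATA section to keep the configuration file well-formed.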
static final String OUTPUT_KEY_VARNAME
static final String OUTPUT_CONTENT_LANGUAGE
The config property name ("mapreduce.marklogic.output.content.language") which, if set, specifies the language name to associate with inserted documents. A value of en indicates that the document is in English. The default is null, which indicates to use the server default.
static final String OUTPUT_CONTENT_NAMESPACE
The config property name ("mapreduce.marklogic.output.content.namespace") which, if set, specifies the namespace to associate with inserted documents. The default is null, which indicates that the default namespace should be used.
static final String OUTPUT_VALUE_VARNAME
static final String OUTPUT_XML_REPAIR_LEVEL
The config property name ("mapreduce.marklogic.output.content.repairlevel") which, if set, specifies the document repair level for this options object.
static final String OUTPUT_PARTITION
The config property name ("mapreduce.marklogic.output.partition") which, if set, specifies the partition where output documents are created.
static final String OUTPUT_URI_REPLACE
The config property name ("mapreduce.marklogic.output.urireplace") which, if set, specifies a comma-separated list of regex pattern and string pairs. In each pair, the first item is a pattern that matches a URI segment and the second is the replacement string, enclosed in single quotes ('').
static final String OUTPUT_URI_PREFIX
The config property name ("mapreduce.marklogic.output_uriprefix") which, if set, specifies a string to prepend to all document URIs.
static final String OUTPUT_URI_SUFFIX
The config property name ("mapreduce.marklogic.output_urisuffix") which, if set, specifies a string to append to all document URIs.
static final String DEFAULT_OUTPUT_XML_REPAIR_LEVEL
static final String DEFAULT_CONTENT_TYPE
static final String BATCH_SIZE
The config property name ("mapreduce.marklogic.output.batchsize") which, if set, indicates the number of records in one request. Optional. Currently only applies to ContentOutputFormat.
static final int DEFAULT_BATCH_SIZE
static final int DEFAULT_TXN_SIZE
static final String TXN_SIZE
The config property name ("mapreduce.marklogic.output.transactionsize") which, if set, indicates the number of requests in one transaction. Optional.
static final String ASSIGNMENT_POLICY
The config property name ("mapreduce.marklogic.output.assignmentpolicy") which, if set, indicates the assignment policy for output documents. Optional.
static final String TEMPORAL_COLLECTION
The config property name ("mapreduce.marklogic.output.temporalcollection") which, if set, indicates the temporal collection for documents. Optional.
static final String INPUT_QUERY_LANGUAGE
The config property name ("mapreduce.marklogic.input.querylanguage") which, if set, specifies the query language to use for the input query and split query. Optional. Valid values: XQuery, Javascript. Default: XQuery.
static final String OUTPUT_QUERY_LANGUAGE
The config property name ("mapreduce.marklogic.output.querylanguage") which, if set, specifies the query language to use for the output query.
Optional. Valid values: XQuery, Javascript. Default: XQuery.
static final String REDACTION_RULE_COLLECTION
The config property name ("mapreduce.marklogic.input.redaction.rules") which, if set, specifies a comma-separated list of redaction rule collection URIs. Optional. If not set, no data will be redacted.
static final String COPY_COLLECTIONS
The config property name ("mapreduce.marklogic.copycollections") which, if set, specifies whether to copy document collections from source to destination.
static final String COPY_QUALITY
The config property name ("mapreduce.marklogic.copyquality") which, if set, specifies whether to copy document quality from source to destination.
static final String COPY_METADATA
The config property name ("mapreduce.marklogic.copymetadata") which, if set, specifies whether to copy document metadata from source to destination.
static final String INPUT_RESTRICT_HOSTS
The config property name ("mapreduce.marklogic.input.restricthosts") which, if set, specifies whether to restrict input hosts that mlcp will connect to.
static final String OUTPUT_RESTRICT_HOSTS
The config property name ("mapreduce.marklogic.output.restricthosts") which, if set, specifies whether to restrict output hosts that mlcp will connect to.
static final long MIN_NODEUPDATE_VERSION
Copyright © 2020 MarkLogic
Corporation. All Rights Reserved.
Complete online documentation for MarkLogic Server,
XQuery and related components may be found at
developer.marklogic.com