VALUEIN - Currently only ForestDocument is supported,
but other types such as Text or BytesWritable are possible candidates
to be added.

public class ForestReader<VALUEIN> extends org.apache.hadoop.mapreduce.RecordReader<DocumentURIWithSourceInfo,VALUEIN> implements MarkLogicConstants
| Modifier and Type | Field and Description |
|---|---|
protected long |
bytesRead |
protected Collection<String> |
colFilters |
protected
org.apache.hadoop.conf.Configuration |
conf |
protected BiendianDataInputStream |
dataIs |
protected int |
deletedCnt |
protected Collection<String> |
dirFilters |
protected boolean |
done |
protected int |
fragCnt |
protected DocumentURIWithSourceInfo |
key |
protected
org.apache.hadoop.fs.Path |
largeForestDir |
static
org.apache.commons.logging.Log |
LOG |
protected int |
nascentCnt |
protected BiendianDataInputStream |
ordIs |
protected int |
position |
protected int |
prevDocid |
protected BiendianDataInputStream |
qualIs |
protected
org.apache.hadoop.mapreduce.lib.input.FileSplit |
split |
protected String |
srcId |
protected BiendianDataInputStream |
tsIs |
protected Collection<String> |
typeFilters |
protected VALUEIN |
value |
protected Class<? extends
org.apache.hadoop.io.Writable> |
valueClass |
ADVANCED_MODE,
ASSIGNMENT_POLICY,
BASIC_MODE,
BATCH_SIZE,
BIND_SPLIT_RANGE,
COLLECTION_FILTER,
CONTENT_TYPE,
COPY_COLLECTIONS,
COPY_METADATA,
COPY_QUALITY,
DEFAULT_BATCH_SIZE,
DEFAULT_CONTENT_TYPE,
DEFAULT_LOCAL_MAX_SPLIT_SIZE,
DEFAULT_MAX_SPLIT_SIZE,
DEFAULT_OUTPUT_CONTENT_ENCODING,
DEFAULT_OUTPUT_XML_REPAIR_LEVEL,
DEFAULT_PROPERTY_OPERATION_TYPE,
DEFAULT_TXN_SIZE,
DIRECTORY_FILTER,
DOCUMENT_SELECTOR,
EXECUTION_MODE,
EXTRACT_URI,
INDENTED,
INPUT_DATABASE_NAME,
INPUT_HOST,
INPUT_KEY_CLASS,
INPUT_LEXICON_FUNCTION_CLASS,
INPUT_MODE,
INPUT_PASSWORD,
INPUT_PORT,
INPUT_QUERY,
INPUT_QUERY_LANGUAGE,
INPUT_QUERY_TIMESTAMP,
INPUT_RESTRICT_HOSTS,
INPUT_SSL_OPTIONS_CLASS,
INPUT_SSL_PROTOCOL,
INPUT_USE_SSL,
INPUT_USERNAME,
INPUT_VALUE_CLASS,
MAX_SPLIT_SIZE,
MIN_NODEUPDATE_VERSION,
MODE_DISTRIBUTED,
MODE_LOCAL,
MR_NAMESPACE,
NODE_OPERATION_TYPE,
OUTPUT_CLEAN_DIR,
OUTPUT_COLLECTION,
OUTPUT_CONTENT_ENCODING,
OUTPUT_CONTENT_LANGUAGE,
OUTPUT_CONTENT_NAMESPACE,
OUTPUT_DATABASE_NAME,
OUTPUT_DIRECTORY,
OUTPUT_FAST_LOAD,
OUTPUT_FOREST_HOST,
OUTPUT_GRAPH,
OUTPUT_HOST,
OUTPUT_KEY_TYPE,
OUTPUT_KEY_VARNAME,
OUTPUT_NAMESPACE,
OUTPUT_OVERRIDE_GRAPH,
OUTPUT_PARTITION,
OUTPUT_PASSWORD,
OUTPUT_PERMISSION,
OUTPUT_PORT,
OUTPUT_PROPERTY_ALWAYS_CREATE,
OUTPUT_QUALITY,
OUTPUT_QUERY,
OUTPUT_QUERY_LANGUAGE,
OUTPUT_RESTRICT_HOSTS,
OUTPUT_SSL_OPTIONS_CLASS,
OUTPUT_SSL_PROTOCOL,
OUTPUT_STREAMING,
OUTPUT_URI_PREFIX,
OUTPUT_URI_REPLACE,
OUTPUT_URI_SUFFIX,
OUTPUT_USE_SSL,
OUTPUT_USERNAME,
OUTPUT_VALUE_TYPE,
OUTPUT_VALUE_VARNAME,
OUTPUT_XML_REPAIR_LEVEL,
PATH_NAMESPACE,
PROPERTY_OPERATION_TYPE,
QUERY_FILTER,
RECORD_TO_FRAGMENT_RATIO,
REDACTION_RULE_COLLECTION,
SPLIT_END_VARNAME,
SPLIT_QUERY,
SPLIT_START_VARNAME,
SUBDOCUMENT_EXPRESSION,
TEMPORAL_COLLECTION,
TXN_SIZE,
TYPE_FILTER

| Constructor and Description |
|---|
ForestReader() |
| Modifier and Type | Method and Description |
|---|---|
protected boolean |
applyFilter(String uri,
ExpandedTree tree) |
void |
close() |
DocumentURIWithSourceInfo |
getCurrentKey() |
VALUEIN |
getCurrentValue() |
float |
getProgress() |
void |
initialize(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext context) |
boolean |
nextKeyValue() |
protected void |
setKey(String uri,
String sub,
int line, int col)
Apply URI prefix and suffix configuration
options and set the result as DocumentURI key.
|
protected void |
setSkipKey(String sub,
int line, int col, String reason)
Set the result as DocumentURI key.
|
public static final org.apache.commons.logging.Log LOG
protected org.apache.hadoop.mapreduce.lib.input.FileSplit split
protected long bytesRead
protected org.apache.hadoop.conf.Configuration conf
protected BiendianDataInputStream dataIs
protected BiendianDataInputStream ordIs
protected BiendianDataInputStream tsIs
protected BiendianDataInputStream qualIs
protected DocumentURIWithSourceInfo key
protected VALUEIN value
protected Class<? extends org.apache.hadoop.io.Writable> valueClass
protected int position
protected int prevDocid
protected boolean done
protected org.apache.hadoop.fs.Path largeForestDir
protected int nascentCnt
protected int deletedCnt
protected int fragCnt
protected Collection<String> colFilters
protected Collection<String> dirFilters
protected Collection<String> typeFilters
protected String srcId
public void close()
           throws IOException

Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
Overrides: close in class org.apache.hadoop.mapreduce.RecordReader<DocumentURIWithSourceInfo,VALUEIN>
Throws: IOException

public DocumentURIWithSourceInfo getCurrentKey() throws IOException, InterruptedException
Overrides: getCurrentKey in class org.apache.hadoop.mapreduce.RecordReader<DocumentURIWithSourceInfo,VALUEIN>
Throws: IOException, InterruptedException

public VALUEIN getCurrentValue() throws IOException, InterruptedException

Overrides: getCurrentValue in class org.apache.hadoop.mapreduce.RecordReader<DocumentURIWithSourceInfo,VALUEIN>
Throws: IOException, InterruptedException
public float getProgress()
throws IOException,
InterruptedException
Overrides: getProgress in class org.apache.hadoop.mapreduce.RecordReader<DocumentURIWithSourceInfo,VALUEIN>
Throws: IOException, InterruptedException
public void initialize(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext context)
throws IOException,
InterruptedException
Overrides: initialize in class org.apache.hadoop.mapreduce.RecordReader<DocumentURIWithSourceInfo,VALUEIN>
Throws: IOException, InterruptedException
public boolean nextKeyValue()
throws IOException,
InterruptedException
Overrides: nextKeyValue in class org.apache.hadoop.mapreduce.RecordReader<DocumentURIWithSourceInfo,VALUEIN>
Throws: IOException, InterruptedException

protected void setKey(String uri, String sub, int line, int col)

Parameters:
uri - Source string of document URI.
sub - Sub-entry of the source of the document origin.
line - Line number in the source if applicable; -1 otherwise.
col - Column number in the source if applicable; -1 otherwise.

protected void setSkipKey(String sub, int line, int col, String reason)

Parameters:
sub - Sub-entry of the source of the document origin.
line - Line number in the source if applicable; -1 otherwise.
col - Column number in the source if applicable; -1 otherwise.
reason - Reason for skipping.

protected boolean applyFilter(String uri, ExpandedTree tree)
Copyright © 2020 MarkLogic
Corporation. All Rights Reserved.
Complete online documentation for MarkLogic Server,
XQuery and related components may be found at
developer.marklogic.com