The LineSplitter class splits line-delimited JSON, XML or TEXT
files into individual lines. It also works with gzip-compressed
line-delimited files.
Sets the document format on the splitter. The
extension of the URLs generated for splits is determined by this
format. Set the format before splitting. If it is not set, the
default is JSON.
Parameters:
format - the document content format.
getCount
public long getCount()
Returns the number of objects in the
stream.
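The text above does not say when the count is updated. The sketch below shows one plausible design, assuming the count is incremented lazily as the stream is consumed, which is how Java streams usually behave. CountingSketch and its counted method are hypothetical names for illustration and are not part of LineSplitter.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not the LineSplitter implementation.
class CountingSketch {
    private final AtomicLong count = new AtomicLong();

    // Wrap a stream so each element that passes through bumps the counter.
    Stream<String> counted(Stream<String> lines) {
        return lines.peek(line -> count.incrementAndGet());
    }

    long getCount() {
        return count.get();
    }

    public static void main(String[] args) {
        CountingSketch sketch = new CountingSketch();
        Stream<String> stream = sketch.counted(Stream.of("a", "b", "c"));
        System.out.println(sketch.getCount()); // prints 0: nothing consumed yet
        List<String> all = stream.collect(Collectors.toList());
        System.out.println(sketch.getCount()); // prints 3: all elements consumed
    }
}
```

Under that assumption, a count read before the stream is fully consumed would be lower than the final total.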
Takes the input stream and converts it into a
stream of StringHandle. The content can be a line-delimited JSON,
XML or TEXT file, or a gzip-compressed line-delimited file. When
splitting gzip files, pass a GZIPInputStream to the splitter.
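To illustrate why the gzip input has to be wrapped in a GZIPInputStream, here is a minimal JDK-only sketch of line splitting over compressed content. It is not the LineSplitter implementation: GzipLineSketch and its helper methods are hypothetical, and skipping blank lines is an assumption made for the example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not the LineSplitter implementation.
class GzipLineSketch {
    // Compress text in memory so the example needs no file on disk.
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decompress first, then split on line boundaries.
    // Assumption: blank lines are skipped.
    static List<String> lines(InputStream compressed) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            return reader.lines()
                    .filter(line -> !line.trim().isEmpty())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = gzip("{\"id\":1}\n{\"id\":2}\n");
        System.out.println(lines(new ByteArrayInputStream(data)));
    }
}
```

Without the GZIPInputStream wrapper, the reader would see compressed bytes rather than text lines, so the decompression step has to come before any line splitting.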
Takes the input stream and converts it into a
stream of DocumentWriteOperation. The content can be a
line-delimited JSON, XML or TEXT file, or a gzip-compressed
line-delimited file. When splitting gzip files, pass a
GZIPInputStream to the splitter.
Takes the input stream and input file name and
converts them into a stream of DocumentWriteOperation. The content
can be a line-delimited JSON, XML or TEXT file, or a
gzip-compressed line-delimited file. When splitting gzip files,
pass a GZIPInputStream to the splitter.
splitFilename - the name of the input file,
including its extension. It is used to generate URLs for the split
files. The splitFilename can be provided either here or in a
user-defined UriMaker.
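As a rough illustration of how a file name and a format extension could combine into a URL for each split, consider the sketch below. The naming scheme, the SplitUriSketch class and the uriFor method are all hypothetical. The URLs that LineSplitter or a UriMaker actually produce may look different.

```java
// Hypothetical illustration only; the real URL scheme may differ.
class SplitUriSketch {
    // Assumed scheme: "/<base name><index>.<extension>".
    static String uriFor(String splitFilename, long index, String extension) {
        int dot = splitFilename.lastIndexOf('.');
        // Drop the original extension, if any, so the format's extension replaces it.
        String base = dot > 0 ? splitFilename.substring(0, dot) : splitFilename;
        return "/" + base + index + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(uriFor("orders.jsonl", 1, "json")); // prints "/orders1.json"
    }
}
```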
Takes the input stream and converts it into a
stream of StringHandle. The content can be a line-delimited JSON
file, a line-delimited XML file or a gzip-compressed line-delimited
JSON file.
Parameters:
input - the incoming input stream.
charset - the character encoding the document
uses.
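The charset parameter matters because the same bytes decode to different text under different encodings. The JDK-only sketch below shows lines being decoded with an explicit charset. CharsetLineSketch is a hypothetical name and this is not the LineSplitter implementation.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the LineSplitter implementation.
class CharsetLineSketch {
    // Decode the bytes with the caller's charset, then split into lines.
    static List<String> lines(InputStream input, Charset charset) {
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(input, charset))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" contains an e with an acute accent, encoded here as ISO-8859-1.
        byte[] latin1 = "caf\u00e9\nna\u00efve\n".getBytes(StandardCharsets.ISO_8859_1);
        List<String> decoded =
                lines(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(decoded.size()); // prints 2
    }
}
```

Decoding the same bytes as UTF-8 would replace the accented characters with the Unicode replacement character, which is why the caller needs to name the encoding the document actually uses.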
Takes the Reader input and converts it into a
stream of StringHandle. The content can be a line-delimited JSON
file, a line-delimited XML file or a gzip-compressed line-delimited
JSON file.