Sets the size of each batch (usually 50-500). Experiment with your
custom job to tune this value; tuning it is one of the best ways to
achieve optimal throughput.
This method cannot be called after the job has started.
Parameters:
batchSize - the batch size; must be 1 or greater
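To make the effect of batchSize concrete, the following is a minimal, library-independent sketch of how a list of items divides into batches of a given size. The class BatchSizeSketch and its partition method are hypothetical names invented for this illustration; they are not part of this API, and the job's real batching logic may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the library.
class BatchSizeSketch {

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        // Mirrors the documented constraint: batchSize must be 1 or greater.
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be 1 or greater");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 1050; i++) {
            items.add(i);
        }
        List<List<Integer>> batches = partition(items, 100);
        System.out.println("batches: " + batches.size());
        System.out.println("last batch size: " + batches.get(batches.size() - 1).size());
    }
}
```

A smaller batch size produces more, lighter requests; a larger one produces fewer, heavier requests. Which end gives better throughput depends on the job, which is why the value is worth tuning by experiment.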
Sets the number of threads this job uses internally to process
batches concurrently (usually > 10). Experiment with your custom job
and client environment to tune this value; tuning it is one of the
best ways to achieve optimal throughput or to throttle the server
resources this job uses. Setting this to 1 does not guarantee that
batches will be processed sequentially, because the calling thread
will sometimes also process batches.
Unless otherwise noted by a subclass, this method cannot be
called after the job has started.
Parameters:
threadCount - the number of threads to use in this
Batcher
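The note that a thread count of 1 does not guarantee sequential processing can be illustrated with a standard java.util.concurrent pattern. The sketch below is an assumption about one mechanism that produces this behavior (a bounded queue with a caller-runs rejection policy); it does not claim to reproduce this job's internals. The class CallerRunsSketch and the method runBatches are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the library's implementation.
class CallerRunsSketch {

    // Submits batchCount tasks to a one-thread pool and reports which
    // kinds of thread ("worker" or "caller") ended up running them.
    static Set<String> runBatches(int batchCount) {
        Set<String> threadsUsed = ConcurrentHashMap.newKeySet();
        CountDownLatch callerRan = new CountDownLatch(1);
        Thread caller = Thread.currentThread();

        // One worker thread, a queue holding one pending task, and a policy
        // that runs rejected tasks on the submitting thread.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            for (int i = 0; i < batchCount; i++) {
                pool.execute(() -> {
                    if (Thread.currentThread() == caller) {
                        threadsUsed.add("caller");
                        callerRan.countDown();
                    } else {
                        threadsUsed.add("worker");
                        // Keep the worker busy until the caller has run a task
                        // (or one second passes), so the queue fills up.
                        awaitQuietly(callerRan);
                    }
                });
            }
        } finally {
            pool.shutdown();
        }
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threadsUsed;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // With three tasks, the worker takes the first, the queue holds the
        // second, and the third is rejected and runs on the calling thread.
        System.out.println(runBatches(3));
    }
}
```

Because the caller and the single worker can both be running a task at the same moment, code that needs strictly ordered processing should not rely on a thread count of 1 alone.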
Updates the ForestConfiguration this job uses to spread its writes
or reads. This can be called mid-job to accommodate node failures or
other changes without restarting the job. Ideally, this
ForestConfiguration will come from
DataMovementManager.readForestConfig(), perhaps wrapped by something
like FilteredForestConfiguration.
Parameters:
forestConfig - the updated list of forests with
their hosts, etc.
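The idea of swapping a configuration mid-job without a restart can be sketched independently of this library. In the example below, the class HostRotationSketch and its methods updateHosts and nextHost are hypothetical, and a plain list of host names stands in for a ForestConfiguration; how the real job applies an updated configuration is not shown here and may differ.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a job that spreads work across hosts;
// not the library's implementation.
class HostRotationSketch {
    private final AtomicReference<List<String>> hosts;
    private final AtomicInteger next = new AtomicInteger();

    HostRotationSketch(List<String> initialHosts) {
        hosts = new AtomicReference<>(checked(initialHosts));
    }

    // Swaps in a new host list while work is in flight; later calls to
    // nextHost() see the updated list without any restart.
    void updateHosts(List<String> updatedHosts) {
        hosts.set(checked(updatedHosts));
    }

    // Picks the next host in round-robin order from the current list.
    String nextHost() {
        List<String> current = hosts.get();
        return current.get(Math.floorMod(next.getAndIncrement(), current.size()));
    }

    private static List<String> checked(List<String> list) {
        if (list == null || list.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        return List.copyOf(list); // immutable snapshot
    }

    public static void main(String[] args) {
        HostRotationSketch job =
                new HostRotationSketch(List.of("host-a", "host-b", "host-c"));
        System.out.println(job.nextHost());

        // Simulate a node failure: drop host-b and keep going.
        job.updateHosts(List.of("host-a", "host-c"));
        for (int i = 0; i < 4; i++) {
            System.out.println(job.nextHost());
        }
    }
}
```

Holding the list in an AtomicReference lets worker threads read a consistent snapshot while another thread replaces it, which is the property that makes a mid-job update safe in this sketch.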