Time vs. Space: Configuring Batch and Transaction Size
You can tune the document insertion throughput and memory requirements of a job by configuring its batch size and transaction size.
-batch_size controls the number of updates per request to the server.
-transaction_size controls the number of requests to the server per transaction.
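The default batch size is 100 and the maximum batch size is 200. (However, some options can affect the default.) The default transaction size is 1 and the maximum transaction size is 4000/actualBatchSize. Because the number of updates per transaction is the batch size multiplied by the transaction size, the defaults yield 100 updates per transaction, and no transaction can exceed 4000 updates. The maximum transaction size therefore ranges from 20 (at the maximum batch size of 200) to 4000 (at a batch size of 1).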
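The limits above can be checked with a little arithmetic. The following is a minimal sketch, not part of the product: the helper names are hypothetical, it uses only the default and maximum values stated in this section, and it assumes that 4000/actualBatchSize is rounded down to a whole number of requests.

```python
# Hypothetical helpers that mirror the documented limits for the
# -batch_size and -transaction_size options. Not part of the product.
DEFAULT_BATCH_SIZE = 100
MAX_BATCH_SIZE = 200
DEFAULT_TRANSACTION_SIZE = 1
UPDATE_CAP = 4000  # numerator of 4000/actualBatchSize


def max_transaction_size(actual_batch_size: int) -> int:
    """Largest allowed transaction size for a given actual batch size.

    Assumption: 4000/actualBatchSize is rounded down to an integer.
    """
    if not 1 <= actual_batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch size must be in 1..{MAX_BATCH_SIZE}")
    return UPDATE_CAP // actual_batch_size


def updates_per_transaction(batch_size: int = DEFAULT_BATCH_SIZE,
                            transaction_size: int = DEFAULT_TRANSACTION_SIZE) -> int:
    """Updates committed together: updates per request x requests per transaction."""
    limit = max_transaction_size(batch_size)
    if not 1 <= transaction_size <= limit:
        raise ValueError(f"transaction size must be in 1..{limit}")
    return batch_size * transaction_size


print(updates_per_transaction())         # 100 with the stated defaults
print(max_transaction_size(200))         # 20 at the maximum batch size
print(max_transaction_size(1))           # 4000 at a batch size of 1
print(updates_per_transaction(100, 40))  # 4000, the cap
```

Raising an error for out-of-range values, rather than silently clamping, makes it obvious when a chosen combination would exceed the documented ceiling.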
Selecting a batch size is a speed vs. memory tradeoff. Each request to the server carries a fixed overhead, so sending fewer, larger batches reduces the total overhead of a job. However, unless you use -streaming or -document_type mixed, all the updates in a batch stay in memory until the request is sent, so larger batches consume more memory.
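To make the tradeoff concrete, here is a hedged simulation of the non-streaming case described above: documents are buffered until a batch is full, then "sent". No server is contacted; the function name is hypothetical, and memory is counted in documents held, not bytes.

```python
from typing import Iterable, List, Tuple


def simulate_batches(docs: Iterable[str], batch_size: int) -> Tuple[int, int]:
    """Buffer documents until a batch is full, then simulate sending it.

    Returns (requests_sent, peak_documents_in_memory). "Sending" just
    clears the buffer; this models per-request overhead and buffering only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffer: List[str] = []
    requests = 0
    peak = 0
    for doc in docs:
        buffer.append(doc)               # update held in memory until sent
        peak = max(peak, len(buffer))
        if len(buffer) == batch_size:
            requests += 1                # one request per full batch
            buffer.clear()
    if buffer:                           # flush the final partial batch
        requests += 1
        buffer.clear()
    return requests, peak


docs = [f"doc{i}" for i in range(1000)]
print(simulate_batches(docs, 10))   # (100, 10): many requests, little memory
print(simulate_batches(docs, 200))  # (5, 200): few requests, more memory
```

Going from a batch size of 10 to 200 cuts the number of requests by a factor of 20 while multiplying the peak number of buffered updates by the same factor.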
Transactions introduce overhead on MarkLogic Server, so performing multiple updates per transaction can improve insertion throughput. However, an open transaction holds locks on fragments with pending updates, potentially increasing lock contention and affecting overall application performance.
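The transaction tradeoff can be sketched the same way. This hypothetical helper counts how many transactions a job commits (a proxy for per-transaction overhead) and how many updates a single open transaction may hold at once (a proxy for locking, under the assumption that each update touches one fragment). It is an illustration under those assumptions, not a model of how MarkLogic Server actually schedules locks.

```python
import math
from typing import Tuple


def transaction_profile(total_updates: int, batch_size: int,
                        transaction_size: int) -> Tuple[int, int]:
    """Return (commits, max_updates_pending_in_one_transaction).

    Assumption: one fragment per update, so the second value bounds
    how many fragments a single open transaction may keep locked.
    """
    if batch_size < 1 or transaction_size < 1:
        raise ValueError("batch_size and transaction_size must be >= 1")
    updates_per_txn = batch_size * transaction_size
    commits = math.ceil(total_updates / updates_per_txn)
    pending = min(total_updates, updates_per_txn)
    return commits, pending


print(transaction_profile(10_000, 100, 1))   # (100, 100): more commits, fewer locks held
print(transaction_profile(10_000, 100, 40))  # (3, 4000): fewer commits, more locks held
```

Larger transactions amortize commit overhead but keep more fragments locked for longer, which is the contention risk the paragraph above describes.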
It is also possible to overwhelm MarkLogic Server if you have too many concurrent sessions active.