If You Want to Reduce the Number of "Large" Merges
In most cases, MarkLogic Server performs relatively small merges just often enough to keep the system properly optimized. Small merges are reasonably fast and generally not very disruptive. In some cases, however, you might find that your merges are too large and take too much time. Exactly what constitutes a “large” merge is difficult to define, but if you determine that your merges are too large, you can try configuring your settings to avoid very large merges.
One way to avoid large merges is to set the merge max size value. If you do set this value, however, treat it only as a temporary way to control your maximum merge size, because it can lead to a state where the database really needs to perform a large merge but cannot. Such a situation can lead to a poorly optimized system. One way to think about large merges is to compare them to sleep: a person can go without much sleep for relatively short periods of time (a day or two, or maybe even three for some people), but eventually the person needs sleep or begins to function extremely poorly. Similarly, if a database is growing, it will eventually need to perform a large merge. Also, be careful not to set merge max size to too small a value: you might end up with a very large number of stands in your database, which can cause it to perform poorly and, when it reaches the maximum number of stands (64), will cause it to go offline.
Another way to reduce the number of large merges is to lower the value for merge min ratio to 1. A value of 1 for merge min ratio will not stop large merges from happening, but large merges will occur only when the number of fragments in all of the other stands combined is at least the number of fragments in your largest stand. Therefore, the only time a merge will involve more than half of your forest is when the combined fragment count of all but the largest stand is equal to or greater than the fragment count of the largest stand. To illustrate this, consider a forest with the following scenario:
Stand 1: 10,000 fragments
Stand 2: 5,000 fragments
Stand 3: 1,000 fragments
Stand 4: 500 fragments
If the merge min ratio is set to 1, then a stand can merge if the following ratio is less than 1:
(number of fragments in the stand) / (total number of fragments in all smaller stands)
Substituting in the values from the example for Stand 1 yields
10000/(5000 + 1000 + 500) = 10000/6500 = 1.54
which is greater than 1. Therefore, Stand 1 is not merged.
Next, putting in the values for Stand 2 yields
5000/(1000 + 500) = 5000/1500 = 3.33
which is greater than 1. Therefore, Stand 2 is not merged.
Next, putting in the values for Stand 3 yields
1000/500 = 2.0
which is greater than 1. Therefore, Stand 3 is not merged.
Therefore, if the forest remains in a steady state (that is, no new content is added), then a merge min ratio of 1 will cause this forest to not be merged.
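The arithmetic above can be checked with a short script. This is only a sketch of the ratio calculation as this section describes it, not MarkLogic's actual merge policy; the function name `merge_ratios` and the constant `MERGE_MIN_RATIO` are hypothetical names chosen for illustration.

```python
def merge_ratios(fragment_counts):
    """Pair each stand's fragment count with its ratio, largest stand first.

    ratio = fragments in the stand / fragments in all smaller stands.
    The smallest stand has no smaller stands, so its ratio is None.
    """
    ordered = sorted(fragment_counts, reverse=True)
    pairs = []
    for i, count in enumerate(ordered):
        smaller_total = sum(ordered[i + 1:])
        ratio = count / smaller_total if smaller_total else None
        pairs.append((count, ratio))
    return pairs


MERGE_MIN_RATIO = 1  # hypothetical constant standing in for the setting
stands = [10000, 5000, 1000, 500]  # Stands 1 through 4 from the example

ratios = merge_ratios(stands)
# A stand qualifies for merging only if its ratio is below the threshold.
mergeable = [count for count, ratio in ratios
             if ratio is not None and ratio < MERGE_MIN_RATIO]

for count, ratio in ratios:
    print(count, None if ratio is None else round(ratio, 2))
print("mergeable:", mergeable)
```

For the four stands in the example, no ratio falls below 1, so the list of mergeable stands is empty, which matches the steady-state conclusion.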
Now, consider that a load is happening during this time and a stand that has 501 fragments is saved into the forest. The result is 5 stands as follows:
Stand 1: 10,000 fragments
Stand 2: 5,000 fragments
Stand 3: 1,000 fragments
Stand 4: 500 fragments
Stand 5: 501 fragments
Now, substituting in the values for Stand 3 yields
1000/(500 + 501) = 1000/1001 = 0.99
which is less than 1. Therefore, Stand 3 is merged.
Note that Stands 4 and 5 are smaller than Stand 3, so the sum of the fragments in those stands appears in the denominator of the merge min ratio calculation. Therefore, Stands 3, 4, and 5 are merged. With a merge min ratio of 1, this forest is merged down to 3 stands: Stands 1 and 2 remain unmerged, and Stands 3, 4, and 5 are merged together into a new stand. The stands then look like this:
Stand 1: 10,000 fragments
Stand 2: 5,000 fragments
Stand 3 (new): 2,001 fragments
Note that, in a real-world scenario with relatively large forests, this situation (where the smaller stands together have as many fragments as the largest stand) will not happen very often, but it will happen occasionally. For example, if another 3,000 fragments accumulated in this forest, then Stand 1 would merge with the other stands.
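The load scenario and the 3,000-fragment figure can be reproduced with a small simulation. This is a simplified model based only on the ratio rule described in this section; the helper names `merge_once` and `fragments_until_merge` are hypothetical. It also assumes that a merged stand keeps every fragment from its inputs, whereas a real merge may drop deleted fragments.

```python
import math


def merge_once(fragment_counts, merge_min_ratio=1):
    """Merge the largest qualifying stand with every smaller stand."""
    ordered = sorted(fragment_counts, reverse=True)
    for i, count in enumerate(ordered):
        smaller_total = sum(ordered[i + 1:])
        if smaller_total and count / smaller_total < merge_min_ratio:
            # Assumption: the merged stand keeps every fragment.
            return ordered[:i] + [count + smaller_total]
    return ordered  # no stand qualifies, so nothing is merged


def fragments_until_merge(largest, smaller_counts, merge_min_ratio=1):
    """Fewest extra fragments that push the largest stand's ratio below the threshold."""
    # largest / (smaller + extra) < r  is the same as  smaller + extra > largest / r
    needed_total = math.floor(largest / merge_min_ratio) + 1
    return max(0, needed_total - sum(smaller_counts))


# Five stands after the 501-fragment stand is saved into the forest.
after_load = merge_once([10000, 5000, 1000, 500, 501])
print(after_load)  # [10000, 5000, 2001]

# Extra fragments needed before Stand 1 qualifies for a merge.
print(fragments_until_merge(after_load[0], after_load[1:]))  # 3000
```

Under these assumptions, the model merges Stands 3, 4, and 5 into a single 2,001-fragment stand, and Stand 1 qualifies for a merge once 3,000 more fragments have accumulated in the smaller stands.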