Administrator's Guide — Chapter 26

Fragments

When loading data into a database, you have the option of specifying how XML documents are partitioned for storage into smaller blocks of information called fragments. For large XML documents, size can be an issue, and using fragments may help manage performance of your system. In general, fragments for XML documents should be sized between 10K and 100K. Fragments set too small or too big can slow down performance, so proper fragment sizing is important.

The actual fragmentation of an XML document is completely transparent to an application developer. At the application level, the document appears to be a single integral structure, regardless of how it is stored and managed as fragments on disk. Fragmentation is an application-transparent tuning mechanism.

However, fragmentation does impact relevance ranking. The relevance-ranking algorithm considers both term frequency within a target piece of content and overall term frequency within the database to rank results by relevance. Rather than consider term frequency across the entire XML document for ranking purposes, MarkLogic Server considers term frequency within the individual fragment (and its descendants) being ranked. Consequently, different fragmentation strategies may impact relevance rankings--particularly in situations when a single fragment may straddle multiple XML structures that you are trying to differentiate on a relevance basis.
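To make the effect concrete, here is a minimal sketch that counts a term across a whole document and then within each candidate fragment. It uses Python's standard `xml.etree.ElementTree` and plain word counts only. The sample document and the `term_count` helper are hypothetical, and this is not MarkLogic Server's scoring algorithm.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, for illustration only.
DOC = """
<CitationSet>
  <Citation><Title>Graph search</Title><Abstract>search search index</Abstract></Citation>
  <Citation><Title>Storage</Title><Abstract>index layout</Abstract></Citation>
</CitationSet>
"""


def term_count(element, term):
    """Count occurrences of `term` in all text under `element` (simple word match)."""
    words = " ".join(element.itertext()).lower().split()
    return words.count(term.lower())


root = ET.fromstring(DOC)

# Whole-document view: a single count for the entire document.
whole_doc_count = term_count(root, "search")

# Per-fragment view: one count per <Citation>, as if <Citation> were a fragment root.
per_fragment_counts = [term_count(c, "search") for c in root.iter("Citation")]

print(whole_doc_count, per_fragment_counts)
```

The term frequency seen for each `<Citation>` differs from the document-wide figure, which is why the choice of fragment boundaries can shift relevance rankings.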

With MarkLogic Server, you specify fragmentation rules that are used to partition your XML documents. These rules are applied one document at a time. However, fragmentation rules are specified at the database level--on the assumption that databases contain many documents with similar structures where the same fragmentation rules should be applied.

Fragmentation rules are applied to documents during document loads, updates, and database reindexing. Adding fragmentation rules after documents have been loaded does not immediately change the fragmentation of those documents; the new rules take effect the next time each document is updated or reindexed. If reindexer enable is set to true for the database, the documents are eventually reindexed and take on the new fragmentation policy. Otherwise, to apply new fragmentation rules to already loaded content, you must reload the documents or reindex the database.

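Use the following procedures for managing fragmentation rules:

  • Choosing a Fragmentation Strategy
  • Defining Fragment Roots
  • Defining Fragment Parents
  • Viewing Fragment Rules
  • Deleting Fragment Rules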

Choosing a Fragmentation Strategy

Proper fragmentation is important to performance. Before you specify how to fragment the XML data being loaded, you need to plan your fragmentation strategy. Apply the following guidelines:

  • Fragments are described generically using XML element names.
  • Fragments for XML documents should be between 10K and 100K in size (these are just general guidelines; in some situations, larger or smaller fragment sizes can work fine, and there are many factors that will affect performance for a given fragment size including disk block size, how many fragments are in the database, how often fragments are accessed, the types of queries used in the application, and so on).
  • Fragments can be (and in many cases, should be) nested hierarchically.
  • Smaller fragment sizes allow more efficient element-level updates in the database, but excessively small fragments can slow down both loading speed and query performance.
  • Larger fragment sizes can also slow down query performance by requiring excessive loading of data from disk in resolving queries.
  • In general, within the size range set above, larger fragment sizes deliver higher performance overall than smaller fragment sizes.
  • Text and small binary documents must fit in a single fragment. Therefore, set the database's in memory tree size parameter to 1 to 2 MB larger than your largest text or small binary file. The largest small binary file size is always constrained by the 'large size threshold' database configuration setting.
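One way to apply the sizing guideline before loading is to measure candidate elements in a sample document. The sketch below is a rough planning aid, not a MarkLogic tool. It assumes that the UTF-8 serialized size of an element is a reasonable proxy for fragment size, and the names `serialized_size`, `classify`, and `survey` are hypothetical.

```python
import xml.etree.ElementTree as ET

MIN_BYTES = 10 * 1024   # lower guideline from the text (10K)
MAX_BYTES = 100 * 1024  # upper guideline from the text (100K)


def serialized_size(element):
    """Approximate size in bytes of an element serialized as UTF-8 XML."""
    return len(ET.tostring(element, encoding="unicode").encode("utf-8"))


def classify(size_bytes):
    """Compare a size against the general 10K to 100K guideline."""
    if size_bytes < MIN_BYTES:
        return "below guideline"
    if size_bytes > MAX_BYTES:
        return "above guideline"
    return "within guideline"


def survey(xml_text, localname):
    """Classify every element named `localname` in a sample document."""
    root = ET.fromstring(xml_text)
    return [classify(serialized_size(el)) for el in root.iter(localname)]


# Hypothetical sample: one citation of about 20K and one of about 100 bytes.
sample = (
    "<CitationSet>"
    "<Citation>" + "x" * 20_000 + "</Citation>"
    "<Citation>" + "x" * 100 + "</Citation>"
    "</CitationSet>"
)
report = survey(sample, "Citation")
print(report)
```

Because the 10K to 100K range is only a general guideline, treat the result as a starting point and confirm it against real load and query behavior.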

After you decide how to fragment your data, you can use either of the following methods:

  • Fragment Roots
  • Fragment Parents

Both methods turn your fragmentation strategy into concrete rules for the system.

Fragment Roots

If a document contains many instances of an XML structure that share a common element name, then these structures make sensible fragments. With MarkLogic Server, you can use this common element name as a fragment root.

The following diagram shows an XML document rooted at <CitationSet> that contains many instances of a <Citation> node. Each <Citation> node contains further XML and averages between 15K and 20K in size. Based on this information, <Citation> is a sensible element to use as a fragment root:
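As a plain-text stand-in for that structure, the following sketch builds a small `<CitationSet>` document and lists the elements that a fragment root rule on `Citation` would select. It is a conceptual illustration using Python's `xml.etree.ElementTree`. The sample content and the `fragments_by_root` helper are hypothetical, and how the server stores the rest of the document is not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, much smaller than the 15K to 20K citations described above.
DOC = """<CitationSet>
  <Citation id="1"><Title>First</Title></Citation>
  <Citation id="2"><Title>Second</Title></Citation>
  <Citation id="3"><Title>Third</Title></Citation>
</CitationSet>"""


def fragments_by_root(xml_text, root_localname):
    """Return every element whose name matches the fragment root name."""
    root = ET.fromstring(xml_text)
    return list(root.iter(root_localname))


fragments = fragments_by_root(DOC, "Citation")
fragment_ids = [f.get("id") for f in fragments]
print(fragment_ids)
```

Every element selected this way shares one name, which is what makes a single fragment root rule sufficient for this document shape.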

Fragment Parents

If your document contains many different XML substructures, each of which is a good candidate to be a fragment, then it would be time-consuming to specify each substructure as a fragment root. Instead, you can specify fragments by setting the parent of these substructures to be a fragment parent--so that every substructure under this parent becomes a separate fragment, regardless of its name.

The following diagram shows a document with substructures of different names:

In this case, you can use the <Products> element as a fragment parent, and the <Books>, <Movies>, <Music>, <Games> and <Toys> children automatically become fragments.
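The same idea can be sketched in code: given a parent element name, every direct child becomes a candidate fragment regardless of its own name. This is a conceptual illustration only. The sample content and the `fragments_by_parent` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the example described above.
DOC = """<Products>
  <Books><Item>Novel</Item></Books>
  <Movies><Item>Drama</Item></Movies>
  <Music><Item>Album</Item></Music>
  <Games><Item>Puzzle</Item></Games>
  <Toys><Item>Blocks</Item></Toys>
</Products>"""


def fragments_by_parent(xml_text, parent_localname):
    """Return the direct children of every element named `parent_localname`."""
    root = ET.fromstring(xml_text)
    children = []
    for parent in root.iter(parent_localname):
        children.extend(list(parent))  # direct children only, not deeper descendants
    return children


fragment_names = [el.tag for el in fragments_by_parent(DOC, "Products")]
print(fragment_names)
```

A single rule on `Products` covers all five differently named children, whereas fragment roots would need one entry per element name.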

Defining Fragment Roots

To define a rule for a fragment root, complete the following procedure:

  1. Click the Databases icon on the left tree menu.
  2. Determine the database for which you are specifying a new fragment rule.
  3. Click the icon for this database, either in the tree menu or the Database Summary page.
  4. Click the Fragment Roots icon.
  5. Click the Create tab. The Fragment Roots Configuration page displays:

  6. Enter the namespace URI of the XML element that you are using as a rule for the fragment root.

    Every XML element is associated with a namespace. For the fragment rule to be precise, you must specify the namespace of the XML element. Leaving the namespace URI field blank specifies the universal unnamed namespace.

    Alternatively, you can specify that the rule for the fragment root is namespace independent by putting an asterisk (*) in the namespace URI field.

  7. Enter the element name in the localname field.

    The local name is the name of the XML element used as the root of a fragment. If you have more than one fragment root rule associated with the specified namespace, you can provide a comma-separated list of element names.

  8. To add more fragment roots, click the More Items button and repeat steps 6 and 7 for each fragment root as needed.
  9. Scroll to the top or bottom and click OK.

The new fragment root rules are added to the database. These rules are applied to XML documents loaded into the specified database from this point on.
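The namespace and localname fields in steps 6 and 7 can be read as a simple matching rule. The sketch below models that reading in Python: a blank namespace URI matches only elements in no namespace, an asterisk matches any namespace, and the localname field may hold a comma-separated list. The helper names and the `http://example.com/cite` namespace are hypothetical, and the server's actual matching logic may differ in details.

```python
import xml.etree.ElementTree as ET


def split_tag(tag):
    """Split an ElementTree tag of the form '{uri}local' into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag  # no namespace


def rule_matches(namespace_uri, localnames, element):
    """Check an element against one rule as entered in the two form fields."""
    uri, local = split_tag(element.tag)
    names = [n.strip() for n in localnames.split(",") if n.strip()]
    namespace_ok = namespace_uri == "*" or namespace_uri == uri
    return namespace_ok and local in names


# Hypothetical element in a hypothetical namespace.
citation = ET.fromstring('<c:Citation xmlns:c="http://example.com/cite"/>')

any_ns = rule_matches("*", "Citation", citation)
no_ns = rule_matches("", "Citation", citation)
exact_ns = rule_matches("http://example.com/cite", "Citation, Review", citation)
print(any_ns, no_ns, exact_ns)
```

Under this reading, leaving the namespace field blank for a namespaced element produces no match, which is a common reason a rule appears to have no effect.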

Defining Fragment Parents

To define a rule for a fragment parent, perform the following steps:

  1. Click the Databases icon on the left tree menu.
  2. Determine the database for which you are specifying a new fragment parent.
  3. Click the icon for this database, either in the tree menu or the Database Summary page.
  4. Click the Fragment Parents icon.
  5. Click the Create tab. The Fragment Parents Configuration page displays:

  6. Enter the namespace URI of the XML element that you are using as a rule for the fragment parent.

    Every XML element is associated with a namespace. For the fragment rule to be precise, you must specify the namespace of the XML element. Leaving the namespace URI field blank specifies the universal unnamed namespace.

    Alternatively, you can specify that the rule for the fragment parent is namespace independent by putting an asterisk (*) in the namespace URI field.

  7. Enter the element name in the localname field.

    The local name is the name of the parent XML element whose children will be fragment roots. If you have more than one fragment parent rule associated with the specified namespace, you can provide a comma-separated list of element names.

  8. To add more fragment parents, click the More Items button and repeat steps 6 and 7 for each fragment parent as needed.
  9. Scroll to the top or bottom and click OK.

The new fragment parent rules are added to the database. These rules are applied to XML documents loaded into the specified database from this point on.

Viewing Fragment Rules

To view fragment rules that are in effect, perform the following steps:

  1. Click the Databases icon on the left tree menu.
  2. Locate the database whose fragment rules you want to view, either in the tree menu or the Database Summary page.
  3. Click the icon for this database.
  4. Determine whether to view the rules for the fragment root or fragment parent.
  5. Click either the Fragment Roots icon or Fragment Parents icon, under the specified database.

The following example shows that the Documents database has only one rule defined for a fragment parent. The rule states that any direct child of an <RDF> element, regardless of the namespace for the <RDF> element, should form the root of a fragment:

Deleting Fragment Rules

To delete fragment rules for a specific database, perform the following steps:

  1. Click the Databases icon on the left tree menu.
  2. Locate the database that contains the fragment rules you want to delete, either in the tree menu or the Database Summary page.
  3. Click the icon for this database.
  4. Determine whether you need to delete a rule for a fragment root or fragment parent.
  5. Click either the Fragment Roots icon or Fragment Parents icon, under the specified database.
  6. Locate the fragment rule you want to delete and click Delete.
  7. A confirmation message displays. Confirm the delete and click OK.

The fragment rule is dropped from the database.

Deleting fragment rules has no impact on the fragmentation that has already been applied to documents loaded into the database, unless reindexing is enabled for the database.
