
Information Studio Developer's Guide — Chapter 5

Scripting Information Studio Tasks

You can use the info API to programmatically accomplish the same tasks that are described for the Information Studio interface in the chapters Creating and Configuring Databases and REST Servers and Creating and Configuring Flows.

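This chapter describes the info API and how to use it to create a database, configure its text indexes, load data, and establish and apply ingestion policies, including across multiple load operations.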

The info API

The info API provides functions to create databases and load them with data. The info API functions that manage databases are built on top of the admin API described in the Scripting Administrative Tasks Guide. In addition, the info API provides functions that simplify and enhance database load operations.

Creating a Database

The info:database-create function simplifies the task of programmatically creating forests and databases. Creating forests and databases using the admin API is described in Creating and Configuring Forests and Databases in the Scripting Administrative Tasks Guide. When you use the info:database-create function, you trade the finer-grained control provided by the admin functions for simplicity.

For example, the following query creates a new database, named Sample-Database, with two forests per host. By default, the database is located in the Default group and the forest data is placed in the default location (/MarkLogic/Data/Forests) on each host in the Default group. Each forest is given a name like Sample-Database-<unique-id>, where <unique-id> is a unique number generated by the API. The Sample-Database database is configured with the default security and schema databases, Security and Schemas.

xquery version "1.0-ml"; 
import module namespace info = 
   "http://marklogic.com/appservices/infostudio" 
      at "/MarkLogic/appservices/infostudio/info.xqy";
info:database-create("Sample-Database", 2)

The info:database-create function provides optional parameters to control the location of your forest data, as well as which databases to use to manage security, schema and trigger data. The function also accepts a group parameter. The info API determines which hosts are in the group and creates the specified number of forests for each host in the group.

For example, the following query creates Sample-Database with three forests per host. The database is located in the MyGroup group and the forest data is placed in the c:\myData directory on each host in the MyGroup group. The security database is MySecurity, the schema database is MySchemas, and the triggers database is MyTriggers.

xquery version "1.0-ml"; 
import module namespace info = 
   "http://marklogic.com/appservices/infostudio" 
      at "/MarkLogic/appservices/infostudio/info.xqy";
info:database-create(
  "Sample-Database",
  3,
  "MyGroup",
  "c:\myData",
  "MySecurity",
  "MySchemas",
  "MyTriggers")
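Because the forest count is per host, the total number of forests created depends on how many hosts the group contains. The following is a minimal sketch that predicts the total for the example above. It assumes the MarkLogic built-ins xdmp:group, which returns the ID of a named group, and xdmp:group-hosts, which returns the IDs of the hosts in a group; it does not create anything.

```xquery
xquery version "1.0-ml";

(: Assumption: xdmp:group("MyGroup") returns the group ID and
   xdmp:group-hosts returns the IDs of the hosts in that group. :)
let $forests-per-host := 3
let $hosts := xdmp:group-hosts(xdmp:group("MyGroup"))
(: One set of forests is created on every host in the group. :)
return fn:count($hosts) * $forests-per-host
```

For a group of four hosts, for example, this returns 12.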

Configuring the Database Text Indexes

You can configure a database's text indexes with the info:database-set-feature function, in a manner similar to that described in Configuring Text Indexes.

For example, the following query enables the wildcard and position settings and disables fast reverse searches:

xquery version "1.0-ml"; 
import module namespace info = 
   "http://marklogic.com/appservices/infostudio" 
      at "/MarkLogic/appservices/infostudio/info.xqy";
let $settings :=
  <settings xmlns="http://marklogic.com/appservices/infostudio">
    <wildcard>true</wildcard>
    <position>true</position>
    <reverse>false</reverse>
  </settings>
return
  info:database-set-feature("Sample-Database", $settings)

The following table lists the possible elements in a database settings node, their purpose, and possible values:

Element | Description | Possible Values
wildcard | Enables three-character searches and codepoint word lexicon indexing. Use this setting for more efficient wildcard searches on the documents in your database. | true, false
position | Enables word position indexing. Use this setting for more efficient phrase searches on the documents in your database. | true, false
reverse | Enables fast reverse searches. Use this setting to index saved queries in order to speed up reverse query searches. This option requires a special license. | true, false

Loading Data into Databases

The info API enables you to script the operations described in Creating and Configuring Flows.

When a database load operation is initiated, Information Studio immediately returns a ticket URI. Load operations are asynchronous, so the ticket is returned before the load operation has completed. You can pass the ticket URI to the info:ticket function to return the contents of the ticket, which include the status of the load and any errors encountered. Information Studio updates the ticket status as the load proceeds: initially the status is 'active', and when the load has completed it is updated to 'completed'. Under special circumstances, other statuses can be set on the ticket.

The simplest way to load data into the database is to call the info:load function. The following example loads the files from the C:\mydocs directory into the Sample-Database:

xquery version "1.0-ml"; 
import module namespace info = 
   "http://marklogic.com/appservices/infostudio" 
       at "/MarkLogic/appservices/infostudio/info.xqy";
info:load("C:\mydocs", (), (), "Sample-Database")
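Because info:load returns before the load finishes, a script that needs the outcome has to read the ticket afterward. The following is a minimal sketch of a separate query that reports a ticket's status. It assumes that the ticket URI returned by the query above is passed in as the external variable $ticket-uri (a name chosen for this example) and that the ticket document records its state in a status element in the infostudio namespace.

```xquery
xquery version "1.0-ml";
import module namespace info =
   "http://marklogic.com/appservices/infostudio"
       at "/MarkLogic/appservices/infostudio/info.xqy";

(: Hypothetical external variable: the ticket URI returned by an
   earlier call to info:load. :)
declare variable $ticket-uri as xs:string external;

(: Assumption: the ticket records its state in an info:status element,
   for example "active" while loading and "completed" when done. :)
let $status := fn:string((info:ticket($ticket-uri)//info:status)[1])
return
  if ($status eq "completed")
  then "Load finished"
  else fn:concat("Load not finished; status is: ", $status)
```

If the status is still 'active', run the query again later.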

The info:load function also enables you to specify an ingestion policy and/or deltas for an ingestion policy to fine-tune how documents are to be loaded into the database. Ingestion policies are discussed in Establishing Ingestion Policies.

Establishing Ingestion Policies

When using the Information Studio interface, you establish an ingestion policy using the ingestion settings described in Configuring Ingestion Options. This section describes how to programmatically establish and use ingestion policies.

Ingestion policies control which documents are loaded into the database and to what URI location, as well as the permissions, collections, and format type to be assigned to each document. If you are bulk loading a large number of files into a database, you may want to break the load operation into multiple transactions. Ingestion policies enable you to control the maximum number of files to be loaded during a single transaction. Ingestion policies also enable you to control whether to overwrite existing files in the database or generate an error when an attempt is made to overwrite an existing file.
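Assuming each transaction is filled up to the maximum before the next one starts, the number of transactions in a bulk load is the file count divided by max-docs-per-transaction, rounded up. A minimal standard-XQuery sketch of that arithmetic, using a hypothetical count of 1,050 files and the default maximum of 100 documents per transaction:

```xquery
xquery version "1.0";

(: Hypothetical figures for illustration. :)
declare variable $file-count as xs:integer := 1050;
declare variable $max-docs-per-transaction as xs:integer := 100;

(: Ceiling division: a partial final batch still needs its own
   transaction, so 10 full batches plus 1 partial batch. :)
xs:integer(fn:ceiling($file-count div $max-docs-per-transaction))
(: returns 11 :)
```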

You can create a default policy that is used whenever no policy is specified for a load operation. When you set a policy, you specify only the options you want to change; Information Studio merges your changes with the global default policy settings.
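For example, a policy that differs from the global defaults only in its transaction size could be set as in the following sketch; every element it omits keeps its global default value. The policy name bulk-policy and the value 500 are hypothetical.

```xquery
xquery version "1.0-ml";
import module namespace info =
   "http://marklogic.com/appservices/infostudio"
       at "/MarkLogic/appservices/infostudio/info.xqy";

(: Only max-docs-per-transaction is specified; Information Studio
   merges it with the global defaults for all other settings.
   "bulk-policy" is a hypothetical policy name. :)
info:policy-set(
  "bulk-policy",
  <options xmlns="http://marklogic.com/appservices/infostudio">
    <max-docs-per-transaction>500</max-docs-per-transaction>
  </options>)
```

A later load can then name the policy, for example info:load("C:\mydocs", "bulk-policy", (), "Sample-Database").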

The following is an example of a simple default ingestion policy:

let $policy :=
<options name="default" xmlns="http://marklogic.com/appservices/infostudio">
  <collection>http://marklogic.com/appservices/infostudio</collection>
  <error-handling>continue-with-warning</error-handling>
  <fab-retention-duration>P30D</fab-retention-duration>
  <file-filter>^[^\.]</file-filter>
  <max-docs-per-transaction>100</max-docs-per-transaction>
  <overwrite>overwrite</overwrite>
  <ticket-retention-duration>P30D</ticket-retention-duration>
  <uri>
    <literal>/content</literal>
    <filename/>
    <literal>.</literal>
    <ext/>
  </uri>
</options>

You can set the ingestion policy as the default policy by calling the info:policy-set function, as follows:

info:policy-set("default", $policy)
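Two values in the default policy above are easy to misread: the file-filter regular expression ^[^\.] and the P30D retention durations. The following standard-XQuery sketch, which does not use Information Studio, evaluates both; the file names and the start date are hypothetical.

```xquery
xquery version "1.0";

(: ^[^\.] matches any name whose first character is not a dot,
   so hidden files such as .mydoc are not selected. :)
let $filter := "^[^\.]"
let $names := ("report.xml", "notes.txt", ".mydoc")
let $selected := $names[fn:matches(., $filter)]

(: P30D is a duration of 30 days; adding it to a start time gives the
   end of a 30-day retention period. :)
let $started := xs:dateTime("2014-01-01T00:00:00Z")
let $expires := $started + xs:dayTimeDuration("P30D")

return ($selected, $expires)
(: returns report.xml, notes.txt, 2014-01-31T00:00:00Z :)
```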

The following table lists all of the possible elements in an ingestion policy, their purpose, and possible values:

Element | Description | Possible Values and Default Value

annotation

A description of the policy, or any other notation.

Any string

Default: None

overwrite

Specify how to manage files that already exist in the database.

Specify overwrite to overwrite existing files in the database, skip to leave existing files unchanged and continue with the load, or error to leave existing files unchanged and generate an error.

overwrite skip error

Default: overwrite

error-handling

How to handle load errors. Specify continue-with-warning to continue the load, or error to abort it, when an error is encountered.

continue-with-warning error

Default: continue-with-warning

collection

The URI of a collection.

By default, existing collections are overridden by the specified collection. You can use the add attribute to add the collection to any existing collections, rather than overriding them.

The collection URI.

Default: None

max-docs-per-transaction

The maximum number of documents to ingest in a single transaction. If a load contains more documents than this maximum, it is split into multiple transactions.

Any xs:unsignedInt

Default: 100

file-filter

The filter used to select documents in the filesystem. This can be any XQuery regular expression. The default regular expression selects all documents in the directory and its subdirectories except those whose names start with a dot, such as .mydoc.

Any valid XQuery regular expression

Default: ^[^\.]

repair

Specify full to attempt to repair malformed XML content in each document during ingestion. Specifying no value, or none, causes documents containing malformed XML content to be rejected with an error.

none full

Default: None

format

Ingest documents as a particular format: XML, text, or binary. If no value is specified, documents are ingested in any format. Documents that are not of the specified format generate an error.

xml text binary

Default: None

default-namespace

Apply a default namespace to all nodes that do not have an associated namespace.

The namespace URI.

Default: None

default-language

Add an xml:lang attribute to the root element of every ingested document to indicate that it is written in a particular language, such as English or French. If no value is specified, ingested documents are not tagged with an xml:lang attribute.

ar de en es fa fr it ko nl pt ru zh zh-Hant

Default: None

uri

The URI structure for the ingested documents in the database. For a complete discussion, see Configuring the URI Structure.

<literal/> <path @[strip-prefix]/> <guid/> <filename/> <ext/>

Default:

<literal>/content</literal> <path/> <literal>/</literal> <filename/> <literal>.</literal> <ext/>

encoding

Ingest documents as a particular encoding type, such as UTF-8 or ASCII. See the Search Developer's Guide for a list of character set encodings by language. Documents are converted from the specified encoding to UTF-8. The string specified for the encoding option is matched to an encoding name according to the Unicode Charset Alias Matching rules; see http://www.unicode.org/reports/tr22/#Charset_Alias_Matching. The Auto option enables automatic encoding detection; if no encoding can be detected, UTF-8 is assumed.

A valid encoding type

Default: UTF-8

filesize-limit-kb

The maximum size of a file, in kilobytes, that can be loaded without generating a load error.

xs:unsignedInt

Default: None

permission

The permissions to set on the loaded documents, expressed in the form:
<permission>
  <role>role</role>
  <capability>capability</capability>
</permission>

Possible roles are:

app-user, alert-user, alert-admin, alert-execution, dls-admin, dls-user, flexrep-admin, flexrep-user, infostudio-user

as well as any custom roles you have created.

Possible capabilities are:

read, insert, update, execute

Default: None

quality

Associate all ingested documents with the specified quality value. A positive value increases the relevance score of the document in text search functions; a negative value decreases it. If no value is specified, the default document quality is used.

xs:integer

Default: 1

forest

The name of a specific forest in which to load the documents.

xs:string

Default: None

ticket-retention-duration

The length of time to keep the state data for tickets in the App-Services database. For an overview of the App-Services database, see Application Services App Server and Databases.

xs:duration

Default: P30D (30 days)

fab-retention-duration

The length of time to keep the document ingestion data generated by the load operation in the Fab database. For an overview of the Fab database, see Application Services App Server and Databases.

xs:duration

Default: P30D (30 days)
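As an illustration of several of these elements used together, the following sketch loads XML documents only, attempts full repair of malformed markup, raises document quality, and grants read access to the infostudio-user role, all as a one-off delta on the default policy. The directory C:\xmldocs and the quality value 5 are hypothetical; the element names and values come from the table above.

```xquery
xquery version "1.0-ml";
import module namespace info =
   "http://marklogic.com/appservices/infostudio"
       at "/MarkLogic/appservices/infostudio/info.xqy";

(: A delta that overrides selected policy elements for this load only.
   "C:\xmldocs" is a hypothetical directory. :)
let $delta :=
  <options xmlns="http://marklogic.com/appservices/infostudio">
    <format>xml</format>
    <repair>full</repair>
    <quality>5</quality>
    <permission>
      <role>infostudio-user</role>
      <capability>read</capability>
    </permission>
  </options>
return
  info:load("C:\xmldocs", (), $delta, "Sample-Database")
```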

Applying Ingestion Policies

The info:load function enables you to name a stored ingestion policy to use for the load operation, and a set of specific options (deltas) that selectively overrides the stored policy. If no ingestion policy is specified for a load operation, the default policy is used. If no default policy is specified, then a policy consisting of the global defaults is applied to the load operation.

For example, the following query loads the documents from the C:\test directory into the Sample-Database database using the default ingestion policy set above:

xquery version "1.0-ml"; 
import module namespace info = 
   "http://marklogic.com/appservices/infostudio" 
       at "/MarkLogic/appservices/infostudio/info.xqy";
info:load("C:\test", (), (), "Sample-Database")

To load the documents under the URI prefix http://docs/mydocs/ instead, you can define a delta that changes the literal value in the URI. The delta applies only to this load operation and leaves the URI in the default ingestion policy unchanged.

xquery version "1.0-ml"; 
import module namespace info = 
   "http://marklogic.com/appservices/infostudio" 
       at "/MarkLogic/appservices/infostudio/info.xqy";
let $delta :=
  <options name="default" 
    xmlns="http://marklogic.com/appservices/infostudio">
    <uri> 
      <literal>http://docs/mydocs/</literal>
      <filename/>
      <literal>.</literal>
      <ext/>
    </uri>
  </options>
return
  info:load("C:\test", (), $delta, "Sample-Database")

Deltas are applied at the level of the root element's children: each child element in a delta replaces the corresponding element of the stored policy in its entirety, rather than being merged with it. So, in the above example, the uri element must be defined in full, including its filename, literal, and ext children, for the filenames and extensions to be included in the URI.
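To make the replacement rule concrete, here is a standard-XQuery model of it. It is only an illustration, not Information Studio's implementation: each top-level child of the delta replaces the same-named child of the policy wholesale, and children the delta does not mention are kept. The function local:apply-delta and the sample elements are hypothetical.

```xquery
xquery version "1.0";
declare namespace info = "http://marklogic.com/appservices/infostudio";

(: Hypothetical model: a top-level child of $delta replaces the whole
   same-named child of $policy; nested children are never merged. :)
declare function local:apply-delta(
  $policy as element(info:options),
  $delta as element(info:options)
) as element(info:options)
{
  let $overridden := for $d in $delta/* return fn:node-name($d)
  return
    element info:options {
      $policy/@*,
      $policy/*[fn:not(fn:node-name(.) = $overridden)],
      $delta/*
    }
};

let $policy :=
  <info:options>
    <info:max-docs-per-transaction>100</info:max-docs-per-transaction>
    <info:uri>
      <info:literal>/content</info:literal>
      <info:filename/>
      <info:literal>.</info:literal>
      <info:ext/>
    </info:uri>
  </info:options>
(: This delta's uri has only a literal, so the merged uri loses
   its filename and ext parts entirely. :)
let $delta :=
  <info:options>
    <info:uri><info:literal>http://docs/mydocs/</info:literal></info:uri>
  </info:options>
return local:apply-delta($policy, $delta)
```

In the result, max-docs-per-transaction is kept from the policy, while uri contains only the literal http://docs/mydocs/.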

Ingestion Policies and Multiple Load Operations

If you are initiating multiple load operations that require changes to the ingestion policy, keep in mind that load operations are asynchronous: each load operation returns its ticket immediately, before the files are loaded into the database. Any change made to an ingestion policy after a load operation is launched may therefore affect that load if it is still in progress.

For example, the following pseudo-query changes the policy between loads, which can produce unexpected results:

let $mypolicy := info:policy-set("mypolicy", set options)
return info:load($dirpath, "mypolicy", (), $database),
let $mypolicy := info:policy-set("mypolicy", change options)
return info:load($dirpath, "mypolicy", (), $database),
let $mypolicy := info:policy-set("mypolicy", change options)
return info:load($dirpath, "mypolicy", (), $database)

To avoid this, leave the stored policy unchanged and express the per-load changes as deltas that you pass to the info:load function, as shown in the pseudo-query below:

let $mypolicy := info:policy-set("mypolicy", set options...)
return info:load($dirpath, "mypolicy", (), $database),
let $delta1 := change options
return info:load($dirpath, "mypolicy", $delta1, $database),
let $delta2 := change options
return info:load($dirpath, "mypolicy", $delta2, $database)

For example, suppose you want to load a set of modules under one URI prefix and your 4.1 and 4.2 scripts under their own URI prefixes. You can do this by defining a policy with the URI for the modules and then defining a delta that changes the URI for each set of scripts:

xquery version "1.0-ml"; 
import module namespace info = 
   "http://marklogic.com/appservices/infostudio"
      at "/MarkLogic/appservices/infostudio/info.xqy";
(: Create a policy with a URI for the modules:)
  let $mypolicy := info:policy-set(
      "mypolicy",
      <options xmlns="http://marklogic.com/appservices/infostudio">
          <collection>http://marklogic.com/appservices/infostudio</collection>
          <error-handling>continue-with-warning</error-handling>
          <fab-retention-duration>P30D</fab-retention-duration>
          <file-filter>^[^\.]</file-filter>
          <max-docs-per-transaction>100</max-docs-per-transaction>
          <overwrite>overwrite</overwrite>
          <ticket-retention-duration>P30D</ticket-retention-duration>
          <uri>
              <literal>http://pubs/modules/actions/</literal>
              <filename/>
              <literal>.</literal>
              <ext/>
          </uri>
      </options>)
(: Define a delta to change the URI for the 4.2 scripts :)
  let $delta1 := 
      <options xmlns="http://marklogic.com/appservices/infostudio">
          <uri>
              <literal>http://pubs/42scripts/</literal>
              <filename/>
              <literal>.</literal>
              <ext/>
          </uri>
      </options>
(: Define a delta to change the URI for the 4.1 scripts :)
  let $delta2 := 
      <options xmlns="http://marklogic.com/appservices/infostudio">
          <uri>
              <literal>http://pubs/41scripts/</literal>
              <filename/>
              <literal>.</literal>
              <ext/>
          </uri>
      </options>
(: Load actions into the database :)
  let $ticket1 := info:load(
    "C:\cvs\latest\myapp\scripts\actions",
    "mypolicy",
    (),
    "Sample-Database")
(: Load 4.2 scripts into the database :)
  let $ticket2 := info:load(
    "C:\cvs\latest\myapp\scripts\4.2scripts",
    "mypolicy",
    $delta1,
    "Sample-Database")
(: Load 4.1 scripts into the database :)
  let $ticket3 := info:load(
    "C:\cvs\latest\myapp\scripts\4.1scripts",
    "mypolicy",
    $delta2,
    "Sample-Database")
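(: Assumes each ticket records its source directory in an
   info:directory element. :)
return (
    "Loaded files from",
    fn:data(info:ticket($ticket1)//info:directory),
    fn:data(info:ticket($ticket2)//info:directory),
    fn:data(info:ticket($ticket3)//info:directory) )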
Powered by MarkLogic Server 7.0-4.1 and rundmc