
Information Studio Developer's Guide — Chapter 1

Introduction to Information Studio

Information Studio is deprecated and will be removed in a future release of MarkLogic Server.

The MarkLogic Server Application Services suite includes Information Studio, a browser-based interface and XQuery API that enables you to quickly create MarkLogic Server databases and load content into them. Information Studio simplifies loading and transforming content: it collects content from different sources, processes it with XSLT and built-in transformation logic, and loads it into a MarkLogic database. You can customize Information Studio to connect to any data source and create your own solutions for transforming content as it is collected and loaded into the database.

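This chapter includes the following sections:

  • Information Studio Components
  • Application Services App Server and Databases
  • Information Studio APIs
  • Configuring Large-Scale Loading Processes
  • Starting Application Services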

Information Studio Components

The Information Studio components are the following:

  • A flow is a content load configuration that defines which documents to load into a database and how to load them. A flow consists of the following:
    • A collector
    • A transform
    • An ingestion policy
  • A collector is an Information Studio plugin that gathers the content to load into a database. Specific collector implementations gather content in different ways. You can also create custom collectors, as described in Creating Custom Collectors. The collectors bundled with Information Studio are the following:
    • A one-shot collector scans and loads files from a filesystem directory and then stops.
    • A long-running collector listens for documents from a source and loads them into the database until it is explicitly stopped.
  • A transform is an Information Studio plugin that modifies content as it is loaded into the database. Specific transform implementations modify content in different ways. You can also create custom transforms, as described in Creating Custom Collectors and Transforms.
  • An ingestion policy is a unit of XML configuration, in the form of a stored <options> node, that specifies how to load content into a database. An Information Studio database can have multiple named policies, as well as a default policy. For more details, see Establishing Ingestion Policies.
  • A ticket is a mechanism for tracking a database load process and recording errors that occur. A ticket has a unique ID that can be used to get status reports. Tickets persist in a database until they reach their expiration date or are explicitly deleted.
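The relationship among these components can be sketched in Python. This is a toy model, not the Information Studio implementation: documents are (uri, content) pairs, the "database" is a dict, and the names Flow, IngestionPolicy, Ticket, one_shot_collector, and upper_transform are hypothetical. It shows a flow pulling documents from a one-shot collector, passing each through a transform, storing it under a URI prefix taken from the policy, and recording successes and errors on a ticket that carries a unique ID.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# A document is modeled as a (uri, content) pair; illustration only.
Document = Tuple[str, str]


@dataclass
class Ticket:
    """Tracks one load: a unique ID, a count of loaded documents, and errors."""
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    loaded: int = 0
    errors: List[str] = field(default_factory=list)


@dataclass
class IngestionPolicy:
    """Hypothetical stand-in for a stored <options> node (a single setting)."""
    uri_prefix: str = "/"


@dataclass
class Flow:
    """A flow bundles a collector, a transform, and an ingestion policy."""
    collector: Callable[[], Iterable[Document]]
    transform: Callable[[str], str]
    policy: IngestionPolicy

    def run(self, database: Dict[str, str]) -> Ticket:
        ticket = Ticket()
        for uri, content in self.collector():
            try:
                database[self.policy.uri_prefix + uri] = self.transform(content)
                ticket.loaded += 1
            except ValueError as exc:
                # Record the failure on the ticket and move on to the next document.
                ticket.errors.append(f"{uri}: {exc}")
        return ticket


def one_shot_collector() -> Iterator[Document]:
    """Mimics a one-shot collector: yields a fixed set of files, then stops."""
    yield ("a.xml", "<a/>")
    yield ("b.xml", "")


def upper_transform(content: str) -> str:
    """Toy transform that rejects empty documents."""
    if not content:
        raise ValueError("empty document")
    return content.upper()
```

Running `Flow(one_shot_collector, upper_transform, IngestionPolicy(uri_prefix="/docs/")).run(db)` against an empty dict stores `/docs/a.xml` and records one error for `b.xml` on the returned ticket.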

Application Services App Server and Databases

Information Studio uses an HTTP App Server named App-Services, on port 8000, which stores its data in the App-Services database. Information Studio also uses a database named Fab internally. Both databases are described below.

The databases and their purposes are the following:

  • App-Services: Stores the Information Studio configuration data for flows, as well as the tickets and log messages generated by load operations. The App-Services database also serves as the triggers database for both the App-Services and Fab databases.
  • Fab: Retains the state information related to the document transformation and distribution processes. Documents that generate errors during a load operation are retained in the Fab database.

If transformation steps are configured for the flow, the collector loads the documents into the Fab database, where they are processed by a Content Processing Framework (CPF) pipeline. The CPF pipeline transforms the content and distributes the resulting documents to the destination database.
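The two paths a document can take can be sketched as follows. This is a hypothetical Python model: dicts stand in for the Fab and destination databases, and plain functions stand in for CPF pipeline steps. The branch for flows without transformation steps, where documents go straight to the destination, is an assumption; the text above only describes the case where steps are configured.

```python
from typing import Callable, Dict, List

Transform = Callable[[str], str]


def load_document(uri: str, content: str, steps: List[Transform],
                  fab: Dict[str, str], destination: Dict[str, str]) -> str:
    """Return where the final document ended up: "destination" or "fab"."""
    if not steps:
        # Assumption: with no transformation steps, the collector writes
        # directly to the destination database, bypassing Fab.
        destination[uri] = content
        return "destination"
    # Transformation steps are configured: stage the document in Fab,
    # where the (simulated) CPF pipeline processes it.
    fab[uri] = content
    try:
        for step in steps:
            content = step(content)
    except ValueError:
        # A document that fails processing is retained in Fab.
        return "fab"
    # The pipeline distributes the transformed result to the destination.
    destination[uri] = content
    return "destination"
```

In this model the staged copy stays in Fab after a successful run, leaving its removal to the scheduled garbage-collection task described next.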

When you create a flow, two scheduled tasks are created to garbage-collect the content in the Information Studio databases as follows:

  • Deleting expired documents from the Fab database: By default, this task is scheduled to run in 30 days at 11:59 pm. The start time can be configured programmatically by means of the fab-retention-duration element, as described in Establishing Ingestion Policies. The task logs a message at the "Debug" level when no documents remain to be removed.
  • Deleting expired tickets from the App-Services database: By default, this task is scheduled to run in 30 days at 11:59 pm. The start time can be configured programmatically by means of the ticket-retention-duration element, as described in Establishing Ingestion Policies. The task logs a message at the "Debug" level for each ticket it deletes and a final message when complete. A default message is logged if the task runs and no tickets are deleted.
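The default first run time works out as the flow's creation date plus the retention period, at 11:59 pm. The helpers below sketch that arithmetic, assuming the retention period is a whole number of days (the actual fab-retention-duration and ticket-retention-duration settings may use a different format). The is_expired helper further assumes that an item expires once its retention period has elapsed since it was created; both function names are hypothetical.

```python
from datetime import datetime, time, timedelta


def first_cleanup_run(created: datetime, retention_days: int = 30) -> datetime:
    """Date of creation plus the retention period, at 11:59 pm that day."""
    run_date = (created + timedelta(days=retention_days)).date()
    return datetime.combine(run_date, time(23, 59))


def is_expired(created: datetime, now: datetime, retention_days: int = 30) -> bool:
    """Assumed rule: an item expires once retention_days have passed."""
    return now >= created + timedelta(days=retention_days)
```

For a flow created at 10:30 am on January 15, 2014, `first_cleanup_run` returns 11:59 pm on February 14, 2014.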

Information Studio APIs

The info and infodev APIs enable you to programmatically configure and use Information Studio and to create custom collector and transform plugins. For reference documentation on each function, see the MarkLogic XQuery and XSLT Function Reference.

The info and infodev APIs provide the following functionality:

  • The info module API enables you to script the Information Studio processes. The use of the info API is described in Scripting Information Studio Tasks. Information Studio processes include the following:
    • Creating, configuring, and deleting databases
    • Loading content
    • Setting policy
    • Getting status information using info:ticket
    • Getting error information using info:ticket-errors
    • Running a flow configured in Information Studio
  • The infodev module API enables you to create custom collector and transform plugins. The functions in this API provide the hooks into the plugin framework. The use of the infodev API is described in Creating Custom Collectors and Transforms.

Configuring Large-Scale Loading Processes

Information Studio uses the App-Services and Fab databases to temporarily store collected documents and to retain configuration and state information for the Information Studio flows.

If you plan to load a large amount of content with an Information Studio flow, consider placing the forests for the App-Services and Fab databases on a different volume from the one that holds the destination database's forests.

If you initially configure all of your MarkLogic Server databases on one volume, you can delete the forests for the App-Services and Fab databases, create new forests on a different volume, and attach the new forests to the App-Services and Fab databases.

If you want to retain the existing data in the App-Services and Fab databases, you can move the forests instead. The following procedure assumes there is no activity on the forests being moved; if the forests are updated while they are being moved, the original and restored forests might end up in different states. Do not perform this procedure on active systems, because there is a short outage between detaching the old forests and attaching the new ones.

To move an existing forest to another volume, use the following steps:

  1. Back up the App-Services and Fab forests to a directory using the forest backup/restore page of the Admin Interface, as described in Making Backups of a Forest in the Administrator's Guide.
  2. On the new destination volume, create new App-Services and Fab forests, as described in Creating a Forest in the Administrator's Guide.
  3. For the new App-Services and Fab forests, restore the forests from the backups made in step 1, as described in Restoring a Forest in the Administrator's Guide.
  4. Detach your original App-Services and Fab forests from their respective databases and attach the newly restored forests to those databases, as described in Attaching and/or Detaching Forests to/from a Database in the Administrator's Guide.
  5. Delete the original private forests.
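Because the order of these steps matters (backups must exist before anything is detached, and the originals are deleted only after the new forests are attached), it can help to expand the procedure into an explicit checklist before starting. The Python sketch below does only that; it calls no MarkLogic API, and the "-moved" suffix for the new forest names is a hypothetical naming choice.

```python
from typing import List

# Hypothetical naming scheme for the replacement forests; choose your own.
NEW_SUFFIX = "-moved"


def forest_move_plan(forests: List[str], new_volume: str) -> List[str]:
    """Expand the five-step move procedure into an ordered checklist.

    Each stage is applied to every forest before the next stage starts,
    mirroring the order of the documented steps.
    """
    stages = [
        lambda f: f"1. Back up forest {f} to a backup directory",
        lambda f: f"2. Create forest {f}{NEW_SUFFIX} on {new_volume}",
        lambda f: f"3. Restore {f}{NEW_SUFFIX} from the backup of {f}",
        lambda f: f"4. Detach {f} and attach {f}{NEW_SUFFIX} to the same database",
        lambda f: f"5. Delete the original forest {f}",
    ]
    return [stage(f) for stage in stages for f in forests]
```

For example, `forest_move_plan(["App-Services", "Fab"], "/space2")` yields ten entries, starting with both backups and ending with both deletions.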

Starting Application Services

Information Studio is bundled as part of the Application Services suite of applications. To start Application Services, open the following URL in a browser window:

http://localhost:8000/appservices

If your instance of MarkLogic Server runs on a different host, or if Information Studio is configured on a different port, substitute the appropriate values for host and port.

To use Information Studio, you need the infostudio-user role assigned to your login account. To use Application Builder, you need the app-builder role. Users with the admin role have access to both applications.
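The URL and role requirements above can be sketched together in Python. Both helpers are hypothetical: appservices_url only assembles the address from the default host and port shown above, and can_use encodes the stated role rules by checking a flat set of role names. Real MarkLogic roles can inherit other roles, which this sketch ignores.

```python
from typing import Set
from urllib.parse import urlunsplit

# Role required by each application, as listed above; "admin" grants both.
REQUIRED_ROLE = {
    "Information Studio": "infostudio-user",
    "Application Builder": "app-builder",
}


def appservices_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the Application Services URL for the given host and port."""
    return urlunsplit(("http", f"{host}:{port}", "/appservices", "", ""))


def can_use(app: str, roles: Set[str]) -> bool:
    """True if the account's (flattened) roles grant access to the application."""
    return "admin" in roles or REQUIRED_ROLE[app] in roles
```

For example, `appservices_url("ml-host.example.com")` yields `http://ml-host.example.com:8000/appservices`.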

Powered by MarkLogic Server 7.0-4.1 and rundmc