
Semantics Developer's Guide — Chapter 2

Getting Started with Semantics in MarkLogic

This chapter includes the following sections:

Setting up MarkLogic Server
Loading Triples
Querying Triples

Setting up MarkLogic Server

When you install MarkLogic Server, a database, REST instance, and XDBC server (for loading content) are created automatically for you. The default Documents database is available on port 8000 as soon as you install MarkLogic Server, with a REST instance attached to it.

The examples in this guide use this pre-configured database, XDBC server, and REST API instance on port 8000. This section focuses on setting up MarkLogic Server to store triples. To do this, you will need to configure the database to store, search, and manage triples.

You must have admin privileges for MarkLogic Server to complete the procedures described in this section.

Install MarkLogic Server on the database server, as described in the Installation Guide for All Platforms, and then perform the following tasks:

Configuring the Database to Work with Triples

To use the default database for triples, you will need to enable the Triples index and the Collection Lexicon.

  1. To configure the database for triples, navigate to the Admin Interface (http://localhost:8001). Click the Documents database, and then click the Configure tab.
  2. Scroll down the Admin Configure page and set triple index to true.

The triple index shown in the Admin Interface is the semantics index.

  3. Scroll down a bit further and set collection lexicon to true.

The Collection Lexicon index is required by the REST API. You will only need the Collection Lexicon if you are querying a named graph.

  4. Click OK when you are done.

This is all you need to do before loading triples into your default database.
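If you prefer to script this step instead of clicking through the Admin Interface, the same two settings can be changed through the Management REST API. This is a sketch under the assumption that the Management API is running on its default port (8002) and that your credentials are admin/password; substitute your own values.

```shell
# Enable the triple index and collection lexicon on the Documents database
# via the Management REST API. Port 8002 and the admin:password credentials
# are assumptions -- adjust for your installation.
curl --anyauth --user admin:password -X PUT \
  -H "Content-Type: application/json" \
  -d '{"triple-index": true, "collection-lexicon": true}' \
  "http://localhost:8002/manage/v2/databases/Documents/properties"
```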

Setting Up Additional Servers

In a production environment, you will want to create additional App Servers, REST instances, and XDBC servers.

Loading Triples

This section covers loading triples into the triples database. It includes the following topics:

Downloading the Dataset

Use the full sample of Persondata from DBpedia (in English and Turtle serialization) for the examples, or use a different subset of Persondata if you prefer.

  1. Download the Persondata example dataset from DBpedia, available at http://wiki.dbpedia.org/Downloads. You will use this dataset for the steps in the rest of this chapter. To manually select it, scroll down the page to find Persondata and select the 'en' (English) and 'ttl' (Turtle) version (persondata_en.ttl.bz2).

    DBpedia datasets are licensed under the terms of the Creative Commons Attribution-ShareAlike License and the GNU Free Documentation License. The data is available in localized languages and in N-Triple and N-Quad serialized formats.

  2. Extract the data from the compressed file to a local directory. For example, to C:\space.
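On Unix, for example, the downloaded archive can be extracted from the command line (this assumes the file was saved to the current directory; on Windows, use a tool such as 7-Zip instead):

```shell
# Decompress the downloaded Persondata archive in place,
# producing persondata_en.ttl alongside it.
bunzip2 persondata_en.ttl.bz2
```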

Importing Triples with mlcp

There are a number of ways to load triples into MarkLogic, including MarkLogic Content Pump (mlcp), REST endpoints, and XQuery. The recommended way to bulk-load triples is with mlcp. These examples use mlcp on a stand-alone Windows environment.

  1. Install and configure MarkLogic Content Pump (mlcp) as described in Installation and Configuration in the mlcp User Guide.
  2. In a Windows command interpreter, cmd.exe, navigate to the mlcp bin directory for your mlcp installation. For example:
    cd C:\space\marklogic-contentpump-1.3-1\bin
  3. Assuming that the Persondata is saved locally under C:\space, enter the following single-line command at the prompt:
    mlcp.bat import -host localhost -port 8000 -username admin ^
    -password password -input_file_path c:\space\persondata_en.ttl ^
    -mode local -input_file_type RDF -output_uri_prefix /people/

    For clarity the long command line is broken into multiple lines using the Windows line continuation character '^'. Remove the line continuation characters to use the command.

    • The modified command for Unix is:
      mlcp.sh import -host localhost -port 8000 -username admin -password\
      password -input_file_path /space/persondata_en.ttl -mode local\
      -input_file_type RDF -output_uri_prefix /people/

      For clarity, the long command line is broken into multiple lines using the Unix line continuation character '\'. Remove the line continuation characters to use the command.

      The triples will be imported and stored in the default Documents database.

  4. The successful triples data import (Unix output) looks like this when it is complete:
    14/09/15 14:35:38 INFO contentpump.ContentPump: Hadoop library version:
    2.0.0-alpha
    14/09/15 14:35:38 INFO contentpump.LocalJobRunner: Content type: XML
    14/09/15 14:35:38 INFO input.FileInputFormat: Total input paths to
    process : 1
    O:file:///home/persondata_en.ttl : persondata_en.ttl
    14/09/15 14:35:40 INFO contentpump.LocalJobRunner:  completed 0%
    14/09/15 14:40:27 INFO contentpump.LocalJobRunner:  completed 100%
    14/09/15 14:40:28 INFO contentpump.LocalJobRunner:
    com.marklogic.contentpump.ContentPumpStats:
    14/09/15 14:40:28 INFO contentpump.LocalJobRunner: ATTEMPTED_INPUT_RECORD_COUNT: 59595
    14/09/15 14:40:28 INFO contentpump.LocalJobRunner: SKIPPED_INPUT_RECORD_COUNT: 0
    14/09/15 14:40:28 INFO contentpump.LocalJobRunner: Total execution time: 289 sec
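mlcp is the recommended bulk-loading path shown above; for small ad hoc loads, triples can also be inserted through the REST API's graph endpoint, one of the other loading options mentioned at the start of this section. A minimal sketch, assuming the REST instance on port 8000, admin/password credentials, and a small hypothetical Turtle file named sample.ttl:

```shell
# Insert the triples in sample.ttl (a hypothetical local file) into the
# default graph over the REST API. POST merges into the graph; PUT would
# replace its contents. Credentials and port are assumptions.
curl --anyauth --user admin:password -X POST \
  -H "Content-Type: text/turtle" \
  --data-binary '@sample.ttl' \
  "http://localhost:8000/v1/graphs?default"
```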

Verifying the Import

To verify that the RDF triples are successfully loaded into the triples database, do the following.

  1. Navigate to the REST Server with a Web browser:
    http://hostname:8000

    where hostname is the name of your MarkLogic Server host machine, and 8000 is the default port number for the REST instance that was created when you installed MarkLogic Server.

  2. Append '/v1/graphs/things' to the end of the web address URL.

    For example:

    http://hostname:8000/v1/graphs/things

    The first one thousand subjects are displayed.

  3. Click on a subject link to view the triples. Subject and object IRIs are presented as links.
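The same check can be scripted rather than done in a browser. A sketch, assuming admin/password credentials on the port 8000 REST instance:

```shell
# Fetch the list of subjects from the REST instance -- the scripted
# equivalent of browsing to /v1/graphs/things. Credentials are assumptions.
curl --anyauth --user admin:password \
  "http://localhost:8000/v1/graphs/things"
```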

Querying Triples

You can run SPARQL queries in Query Console or over HTTP via the /v1/graphs/sparql endpoint (GET /v1/graphs/sparql and POST /v1/graphs/sparql). This section includes the following topics:

Querying with Native SPARQL

You can run queries in Query Console using native SPARQL or the built-in function sem:sparql.

To run queries:

  1. Navigate to Query Console with a Web browser:
    http://hostname:8000/qconsole

    where hostname is the name of your MarkLogic Server host.

  2. From the Content Source drop-down list, select the Documents database.

  3. From the Query Type drop-down list, select SPARQL Query.

  4. In the query window, replace the default query text with this SPARQL query:
    PREFIX db: <http://dbpedia.org/resource/>
    PREFIX onto: <http://dbpedia.org/ontology/>
    
    SELECT * 
    WHERE { ?s onto:birthPlace db:Brooklyn } 
  5. Click Run.

    The results show people born in Brooklyn, as IRIs.
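The same query can also be issued over HTTP against the /v1/graphs/sparql endpoint mentioned at the start of this section. A sketch, assuming admin/password credentials and JSON-formatted results:

```shell
# POST the Brooklyn query to the SPARQL endpoint, asking for
# SPARQL JSON results. Credentials and port are assumptions.
curl --anyauth --user admin:password -X POST \
  -H "Content-Type: application/sparql-query" \
  -H "Accept: application/sparql-results+json" \
  --data-binary 'PREFIX db: <http://dbpedia.org/resource/>
PREFIX onto: <http://dbpedia.org/ontology/>
SELECT * WHERE { ?s onto:birthPlace db:Brooklyn }' \
  "http://localhost:8000/v1/graphs/sparql"
```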

Querying with the sem:sparql Function

Use the built-in XQuery function sem:sparql in Query Console to run the same query.

  1. From the Content Source drop-down list, select the Documents database.
  2. From the Query Type drop-down list, select XQuery.
  3. In the query window, enter this query:
    xquery version "1.0-ml";
    
    sem:sparql('
    PREFIX db: <http://dbpedia.org/resource/>
    PREFIX onto: <http://dbpedia.org/ontology/>
    
    SELECT * 
    WHERE { ?s onto:birthPlace db:Brooklyn } 
    ')
  4. Click Run.

    The results show people born in Brooklyn as IRIs, the same as in Querying with Native SPARQL.

For more information and examples of SPARQL queries, see Semantic Queries.
