This chapter includes the following sections:
When you install MarkLogic Server, a database, REST instance, and XDBC server (for loading content) are created automatically for you. The default Documents database is available on port 8000 as soon as you install MarkLogic Server, with a REST instance attached to it.
The examples in this guide use this pre-configured database, XDBC server, and REST API instance on port 8000. This section describes how to set up MarkLogic Server to store triples: you configure the database to store, search, and manage them.
You must have admin privileges for MarkLogic Server to complete the procedures described in this section.
Install MarkLogic Server on the database server, as described in the Installation Guide for All Platforms, and then perform the following tasks:
The Documents database has the triple index and the collection lexicon enabled by default. These options are also enabled by default for any new databases you create.
If you have an existing database that you want to use for triples, you need to make sure that the triple index and the collection lexicon are both enabled. You can also use these steps to verify that a database is correctly set up. Follow these steps to configure an existing database for triples:
Open the Admin Interface in a browser (http://localhost:8001). Click the Documents database, and then click the Configure tab. Set the triple index to true if it is not already configured. The triple index is required for semantics.
The collection lexicon index is used by the REST endpoint; it is required only if you are querying a named graph.
This is all you need to do before loading triples into your default database (the Documents database).
For all new installations of MarkLogic 9 and later, the triple index and collection lexicon are enabled by default. Any new databases will also have the triple index and collection lexicon enabled.
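You can also check these settings programmatically. The Management API returns a database's configuration as JSON from GET /manage/v2/databases/Documents/properties?format=json (port 8002, digest authentication). The sketch below checks the two relevant flags in such a response; the sample dict is an illustrative fragment, not a live call, and the helper name is ours.

```python
# Sketch: verify that a database's properties enable semantics.
# The dict below mimics a fragment of the JSON returned by the
# Management API at GET /manage/v2/databases/Documents/properties?format=json;
# fetch the real thing with any HTTP client using digest auth on port 8002.

def semantics_ready(props):
    """Return True when both flags needed for triples are enabled."""
    return bool(props.get("triple-index")) and bool(props.get("collection-lexicon"))

# Illustrative fragment of a properties response (not a live call):
documents_props = {
    "database-name": "Documents",
    "triple-index": True,
    "collection-lexicon": True,
}

print(semantics_ready(documents_props))  # True when correctly configured
```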
In a production environment, you will want to create additional app servers, REST instances, and XDBC servers. Use these links to find out more:
This section covers loading triples into the database. It includes the following topics:
Use the full sample of Persondata from DBPedia (in English and Turtle serialization) for the examples, or use a different subset of Persondata if you prefer.
DBPedia datasets are licensed under the terms of the Creative Commons Attribution-ShareAlike License and the GNU Free Documentation License. The data is available in localized languages and in N-Triple and N-Quad serialized formats.
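In the N-Triple serialization, each line holds one triple (subject, predicate, and object, terminated by a period), so a downloaded file is easy to spot-check before loading. A minimal sketch of that structure in Python, deliberately simplified for illustration (it ignores comments, blank nodes, and typed literals):

```python
# Minimal illustration of N-Triple structure: one triple per line,
# subject / predicate / object, terminated by " ."
# (Simplified: handles IRIs and plain tokens, not every RDF detail.)

def split_ntriple(line):
    line = line.strip().rstrip(".").strip()
    subject, predicate, obj = line.split(None, 2)
    return subject, predicate, obj

# Illustrative line in the style of the Persondata dataset:
example = ('<http://dbpedia.org/resource/Abraham_Lincoln> '
           '<http://dbpedia.org/ontology/birthPlace> '
           '<http://dbpedia.org/resource/Kentucky> .')

s, p, o = split_ntriple(example)
print(p)  # <http://dbpedia.org/ontology/birthPlace>
```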
Download the dataset and save it in C:\space.

There are multiple ways to load triples into MarkLogic, including MarkLogic Content Pump (mlcp), REST endpoints, and XQuery. The recommended way to bulk-load triples is with mlcp. These examples use mlcp in a standalone Windows environment.
Open a Windows command interpreter (cmd.exe) and navigate to the bin directory of your mlcp installation. For example:

cd C:\mlcp-11.0\bin
With the dataset saved in C:\space, enter the following single-line command at the prompt:

mlcp.bat import -host localhost -port 8000 -username admin ^
-password password -input_file_path c:\space\persondata_en.ttl ^
-mode local -input_file_type RDF -output_uri_prefix /people/

For clarity, the long command line is broken into multiple lines using the Windows line-continuation character (^). Remove the line-continuation characters to use the command.
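If you drive bulk loads from a script rather than typing commands, the same invocation can be assembled as an argument list and handed to mlcp via subprocess, which sidesteps line-continuation characters entirely. A sketch assuming the paths and credentials shown above (the actual call is left commented out because it needs a live server):

```python
# Sketch: assemble the mlcp import arguments as a list so that no
# shell line-continuation characters (^ or \) are needed.
# Paths and credentials mirror the example above; adjust for your setup.
import subprocess  # used when you actually run the command

mlcp = r"C:\mlcp-11.0\bin\mlcp.bat"  # or mlcp.sh on UNIX
args = [
    mlcp, "import",
    "-host", "localhost",
    "-port", "8000",
    "-username", "admin",
    "-password", "password",
    "-input_file_path", r"c:\space\persondata_en.ttl",
    "-mode", "local",
    "-input_file_type", "RDF",
    "-output_uri_prefix", "/people/",
]

# subprocess.run(args, check=True)  # uncomment to run the load
print(len(args))  # 18
```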
The modified command for UNIX is:
mlcp.sh import -host localhost -port 8000 -username admin -password \
password -input_file_path /space/persondata_en.ttl -mode local \
-input_file_type RDF -output_uri_prefix /people/
For clarity, the long command line is broken into multiple lines using the UNIX line continuation character \. Remove the line continuation characters to use the command.
The triples will be imported and stored in the default Documents database.
14/09/15 14:35:38 INFO contentpump.ContentPump: Hadoop library version: 2.0.0-alpha
14/09/15 14:35:38 INFO contentpump.LocalJobRunner: Content type: XML
14/09/15 14:35:38 INFO input.FileInputFormat: Total input paths to process : 1
O:file:///home/persondata_en.ttl : persondata_en.ttl
14/09/15 14:35:40 INFO contentpump.LocalJobRunner: completed 0%
14/09/15 14:40:27 INFO contentpump.LocalJobRunner: completed 100%
14/09/15 14:40:28 INFO contentpump.LocalJobRunner: com.marklogic.contentpump.ContentPumpStats:
14/09/15 14:40:28 INFO contentpump.LocalJobRunner: ATTEMPTED_INPUT_RECORD_COUNT: 59595
14/09/15 14:40:28 INFO contentpump.LocalJobRunner: SKIPPED_INPUT_RECORD_COUNT: 0
14/09/15 14:40:28 INFO contentpump.LocalJobRunner: Total execution time: 289 sec
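A quick way to confirm a scripted load is to pull the record counters out of the captured log text: SKIPPED_INPUT_RECORD_COUNT should be 0. A small sketch (the sample lines mirror the log output above):

```python
# Sketch: extract the record counters from captured mlcp log output
# to confirm that no input records were skipped.
import re

log = """\
14/09/15 14:40:28 INFO contentpump.LocalJobRunner: ATTEMPTED_INPUT_RECORD_COUNT: 59595
14/09/15 14:40:28 INFO contentpump.LocalJobRunner: SKIPPED_INPUT_RECORD_COUNT: 0
"""

counts = dict(re.findall(r"(\w+_INPUT_RECORD_COUNT):\s+(\d+)", log))
print(counts)
assert counts["SKIPPED_INPUT_RECORD_COUNT"] == "0"
```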
To verify that the RDF triples were successfully loaded into the database, do the following.
In a browser, navigate to http://hostname:8000/v1/graphs/things, where hostname is the name of your MarkLogic Server host machine and 8000 is the default port number for the REST instance that was created when you installed MarkLogic Server.
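The same check works from any HTTP client. Below is a sketch using Python's standard library, assuming the default localhost:8000 instance with digest authentication for user admin; the live request lines are commented out so the sketch runs without a server.

```python
# Sketch: fetch the graph listing from the REST instance.
# Host, port, and credentials are the defaults assumed in this chapter;
# the request itself is commented out so the sketch runs standalone.
from urllib.parse import urlunsplit
import urllib.request

def graphs_url(host, port=8000, path="/v1/graphs/things"):
    return urlunsplit(("http", f"{host}:{port}", path, "", ""))

url = graphs_url("localhost")
print(url)  # http://localhost:8000/v1/graphs/things

# MarkLogic REST instances use digest auth by default:
# mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# mgr.add_password(None, url, "admin", "password")
# opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(mgr))
# print(opener.open(url).read().decode()[:200])
```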
You can run SPARQL queries in Query Console or over HTTP via the /v1/graphs/sparql endpoint (GET /v1/graphs/sparql and POST /v1/graphs/sparql). This section includes the following topics:
This section assumes you loaded the sample dataset as described in Downloading the Dataset.
You can run queries in Query Console using native SPARQL or the built-in function sem:sparql.
Open Query Console in a browser at http://hostname:8000/qconsole and run the following SPARQL query:
PREFIX db: <http://dbpedia.org/resource/>
PREFIX onto: <http://dbpedia.org/ontology/>

SELECT * WHERE { ?s onto:birthPlace db:Brooklyn }
Use the built-in XQuery function sem:sparql in Query Console to run the same query.
xquery version "1.0-ml";

sem:sparql('
  PREFIX db: <http://dbpedia.org/resource/>
  PREFIX onto: <http://dbpedia.org/ontology/>
  SELECT * WHERE { ?s onto:birthPlace db:Brooklyn }
')
For more information and examples of SPARQL queries, see Semantic Queries.
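The POST form of /v1/graphs/sparql takes the query as the request body with Content-type application/sparql-query, and results can be requested as JSON via the Accept header. A sketch that builds such a request for the Brooklyn query above, assuming the default localhost:8000 instance and admin credentials; the send is commented out because it needs a live server.

```python
# Sketch: run the Brooklyn query from this section over HTTP against
# POST /v1/graphs/sparql. Host, port, and credentials mirror the chapter;
# the actual send is commented out so the sketch runs standalone.
import urllib.request

query = """\
PREFIX db: <http://dbpedia.org/resource/>
PREFIX onto: <http://dbpedia.org/ontology/>
SELECT * WHERE { ?s onto:birthPlace db:Brooklyn }
"""

req = urllib.request.Request(
    "http://localhost:8000/v1/graphs/sparql",
    data=query.encode("utf-8"),
    headers={
        "Content-type": "application/sparql-query",
        "Accept": "application/sparql-results+json",
    },
    method="POST",
)
print(req.get_method(), req.full_url)

# Digest-auth send (requires a live MarkLogic REST instance):
# mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# mgr.add_password(None, req.full_url, "admin", "password")
# opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(mgr))
# print(opener.open(req).read().decode())
```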