This chapter introduces MarkLogic Server, lists the product requirements and supported platforms, and describes database compatibility with previous releases.
MarkLogic Server is a powerful NoSQL database for harnessing your digital content base, complete with enterprise features demanded by real-world, mission-critical applications. MarkLogic enables you to build complex applications that interact with large volumes of content in JSON, XML, SGML, HTML, and other popular content formats, as well as binary formats. The unique architecture of MarkLogic ensures that your applications are both scalable and high-performance, delivering query results at search-engine speeds while providing transactional integrity over the underlying content repository. MarkLogic can be configured for a distributed environment, enabling you to scale your infrastructure through hardware expansion.
This installation guide explains the procedures needed to install MarkLogic on your system. It is intended for a technical audience, specifically IT staff with experience in JSON and XML. This document explains only how to install the software, not how to use it. To learn how to get started using the software, see the rest of the MarkLogic documentation (available on docs.marklogic.com).
When MarkLogic installs, it sets memory and other settings based on the size of the computer on which it is running. MarkLogic is a scalable, multi-threaded server product, and as such it assumes it has the entire machine available to it, including the CPU and disk I/O capacity. It is important to follow the guidelines set out in this chapter. Furthermore, MarkLogic assumes there is only one MarkLogic Server process running on any given machine, so running multiple instances of MarkLogic on a single machine is not recommended.
Before installing the software, be sure that your system meets the following requirements:
The first time it runs, MarkLogic Server automatically configures itself to the amount of memory on the system, reserving as much as it can for its own use. If you need to change the default configuration, you can manually override these defaults at a later time using the Admin Interface.
merge max size database merge setting is set to the default of 32 GB. This translates to approximately 1.5 times the disk space of the source content after it is loaded. *For example, if you plan on loading content that will result in a 200 GB database, reserve at least 300 GB of disk space. The disk space reserve is required for merges.
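The 1.5 times guideline above can be turned into a quick sizing calculation. The following is a minimal Python sketch; the function name and the adjustable factor are illustrative assumptions and are not part of MarkLogic.

```python
def required_disk_space_gb(database_size_gb: float, reserve_factor: float = 1.5) -> float:
    """Estimate the disk space to reserve for a loaded database.

    reserve_factor defaults to the 1.5 times guideline, which assumes the
    default merge max size of 32 GB (hypothetical helper, for planning only).
    """
    if database_size_gb < 0:
        raise ValueError("database_size_gb must not be negative")
    return database_size_gb * reserve_factor


# A 200 GB database calls for at least 300 GB of disk space.
print(required_disk_space_gb(200))  # 300.0
```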
If you have Huge Pages set up on a Linux system, your swap space on that machine should be equal to the size of your physical memory minus the size of your Huge Pages (because Linux Huge Pages are not swapped), or 32 GB, whichever is lower. For example, if you have 48 GB of physical memory and Huge Pages set to 18 GB, then you need swap space of 30 GB (48 - 18).
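The swap space rule for Linux systems with Huge Pages can be written as a small calculation. This is a sketch under the assumptions stated in the paragraph above; the function and parameter names are hypothetical.

```python
def recommended_swap_gb(physical_memory_gb: float, huge_pages_gb: float,
                        cap_gb: float = 32.0) -> float:
    """Swap space = physical memory minus Huge Pages, capped at 32 GB.

    Huge Pages are subtracted because Linux does not swap them.
    """
    if not 0 <= huge_pages_gb <= physical_memory_gb:
        raise ValueError("huge_pages_gb must be between 0 and physical_memory_gb")
    return min(physical_memory_gb - huge_pages_gb, cap_gb)


# 48 GB of physical memory with 18 GB of Huge Pages needs 30 GB of swap.
print(recommended_swap_gb(48, 18))  # 30
```

On a larger machine, for example 128 GB of memory with 16 GB of Huge Pages, the same function returns the 32 GB cap rather than the difference.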
At system startup on Linux machines, MarkLogic Server logs a message to the ErrorLog.txt file showing the Huge Page size, and the message indicates if the size is below the recommended level.
If you are using Red Hat Enterprise Linux 6 or 7, you must turn off Transparent Huge Pages (Transparent Huge Pages are configured automatically by the operating system).
For example, if you have a Solaris machine with 32 GB of memory, you should ideally configure the swap space to be 64 GB (and at least 32 GB).
* You need at least 2 times the merge max size of free space per forest, regardless of the forest size. Therefore, with the default merge max size of 32 GB, you need at least 64 GB of free space. Additionally, if your journals are not yet created (that is, the journal space is not yet allocated), you need 2 times the journal size of free disk space. Therefore, to be safe, you need (with the default merge max size and a 2 GB journal size) at least 68 GB of free space for each forest, no matter what size the forest is.
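The per-forest free space arithmetic above can be checked with a short calculation. The following Python sketch uses hypothetical names; the defaults mirror the 32 GB merge max size and 2 GB journal size mentioned in the text.

```python
def min_free_space_per_forest_gb(merge_max_size_gb: float = 32,
                                 journal_size_gb: float = 2,
                                 journals_allocated: bool = False) -> float:
    """Minimum free disk space per forest, regardless of forest size."""
    # Twice the merge max size is always required.
    free_space = 2 * merge_max_size_gb
    # Twice the journal size is also needed until the journals exist.
    if not journals_allocated:
        free_space += 2 * journal_size_gb
    return free_space


print(min_free_space_per_forest_gb())                         # 68
print(min_free_space_per_forest_gb(journals_allocated=True))  # 64
```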
MarkLogic Server is supported on the following platforms:
* Microsoft Windows 7 (x64) or later is supported for development only. If MarkLogic Server fails to start up on Windows with the error 'the application failed to initialize properly (0xc0150002)', then a dependency is missing from your environment and you need to download and install the following DLL for 64-bit versions of Windows: http://www.microsoft.com/downloads/details.aspx?FamilyID=eb4ebe2d-33c0-4a47-9dd4-b9a6d7bd44da&DisplayLang=en. Additionally, if you get an error on startup saying you need MSVCR100.dll, then install the Microsoft Visual C++ 2010 SP1 Redistributable Package (x64): http://www.microsoft.com/en-us/download/details.aspx?id=13523.
** Either the none, mq-deadline, or kyber I/O scheduler is required to ensure efficient disk I/O for MarkLogic Server on Linux. When configuring an I/O scheduler with SSDs in a virtualized environment (including any cloud-based virtual machines), the OS I/O scheduling should be set to none for 4.x kernels or noop/none for older 3.x kernels. For more details, see http://help.marklogic.com/Knowledgebase/Article/View/8/0/notes-on-io-schedulers, https://lonesysadmin.net/2013/12/06/use-elevator-noop-for-linux-virtual-machines/, and https://access.redhat.com/solutions/5427.
*** The redhat-lsb, glibc, gdb, and cyrus-sasl-lib packages are required on Red Hat Enterprise Linux. Additionally, on 64-bit Red Hat Enterprise Linux, both the 32-bit and the 64-bit glibc packages are required. To install MarkLogic 8 on Amazon Linux, use the Red Hat Enterprise Linux/CentOS Version 6 RPM.
**** Red Hat Enterprise Linux 6 (x64) and Red Hat Enterprise Linux 7 (x64) are also supported on VMware ESXi 5.5 and 6.0 (installed on bare metal).
***** Mac OS X is supported for development only. Conversion (Office and PDF) and entity enrichment are not available on Mac OS X. A 64-bit capable processor is required (http://support.apple.com/kb/HT3696).
MarkLogic relies on the operating system for filesystem operations. While any filesystem that works properly (including under heavy load) should work, the following table lists the operating systems along with the filesystems under which they are supported. Other filesystems may work but have not been thoroughly tested by MarkLogic.
Operating System | Supported Filesystems |
---|---|
Linux (all varieties) | XFS (recommended), EXT3, EXT4, as well as the clustered filesystems for shared-disk failover mentioned in Requirements for Shared-Disk Failover in the Scalability, Availability, and Failover Guide. |
Sun Solaris | UFS, as well as the clustered filesystems for shared-disk failover mentioned in Requirements for Shared-Disk Failover in the Scalability, Availability, and Failover Guide. |
Windows | NTFS |
Mac OS | HFS+ |
All | Hadoop HDFS, Amazon S3 (no journaling with S3) |
Additionally, HDFS storage is supported with MarkLogic on the HDFS platforms described in HDFS Storage in the Query Performance and Tuning Guide.
MarkLogic 8 supports upgrades from MarkLogic 5, MarkLogic 6, and MarkLogic 7 or later databases. If you are upgrading from an earlier version of MarkLogic Server, you must first upgrade to 5, 6, or 7 before moving to MarkLogic 8. For the upgrade procedure, see Upgrading from Previous Releases.
During the upgrade, the security database, the schemas database, and the configuration files are automatically upgraded. The security database is upgraded with the latest execute privileges and the schemas database is upgraded with the latest version of the schemas used by MarkLogic Server. The upgrade occurs as part of the installation procedure.
Databases that contain your own content are also upgraded to work with MarkLogic 8; once you upgrade to MarkLogic 8, you will no longer be able to use that database with previous versions of MarkLogic. MarkLogic Corporation strongly recommends performing a backup of your databases before upgrading to MarkLogic 8. Additionally, MarkLogic Corporation recommends that you first upgrade to the latest maintenance release of the major version of MarkLogic you are running before upgrading to MarkLogic 8.
For the procedure for upgrading to MarkLogic 8, see Upgrading from Previous Releases. For details about known incompatibilities between MarkLogic 7 and MarkLogic 8, see Known Incompatibilities with Previous Releases in the Release Notes.
This section contains database compatibility information for various releases.
When upgrading from releases prior to MarkLogic 5 to MarkLogic 8, the upgrade reconfigures the Docs and App Services App Servers, which by default are on port 8000 and port 8002 in older releases. In order for those App Servers to be upgraded, the following conditions must be met:
Docs/.
Apps/ or Apps/appbuilder.
If the above conditions are met, then those App Servers are reconfigured during the MarkLogic 8 upgrade and the resulting configurations have the following settings:
App-Services App Server | |
---|---|
Port | 8000 |
Name | App-Services |
Root | Apps/ |
Error Handler | error-handler.xqy |
URL Rewriter | rewriter.xqy |
Database | App-Services |
Manage App Server | |
---|---|
Port | 8002 |
Name | Manage |
Root | Apps/ |
Error Handler | manage/error-handler.xqy |
URL Rewriter | manage/rewriter.xqy |
Database | App-Services |
Privilege | manage |
If the conditions are not met, then the upgrade logs an error to the ErrorLog.txt file and the Application Services portion of the upgrade is skipped. MarkLogic Server will still operate, but you will not be able to use Query Console, the Management API, and the rest of the Application Services features. To restore the Application Services functionality after a failed upgrade, create two App Servers with the configuration shown above. If you have any problems, contact MarkLogic technical support.
MarkLogic 8 does not require a reindex from MarkLogic 6 or MarkLogic 7 databases. Therefore, if you are upgrading from MarkLogic 6 or MarkLogic 7, the database will not reindex, even if reindex enable is set to true.
To take advantage of index improvements, MarkLogic 8 does require a reindex when upgrading from MarkLogic 5 and earlier databases. When you upgrade to MarkLogic 8, all databases with reindex enable set to true will automatically begin reindexing immediately. If you do not want the databases to reindex, you must set reindex enable to false before upgrading (that is, you must set reindex enable to false in MarkLogic 5). You can always reindex your content later by changing this setting back to true after installing MarkLogic 8.
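The reindexing rules described above reduce to a simple decision. The Python sketch below only restates those rules for planning purposes; the function name is hypothetical and it does not call any MarkLogic API.

```python
def reindexes_on_upgrade(source_major_version: int, reindex_enable: bool) -> bool:
    """Return True if a database starts reindexing after upgrading to MarkLogic 8."""
    if source_major_version in (6, 7):
        # No reindex is required from MarkLogic 6 or 7, whatever the setting.
        return False
    if source_major_version <= 5:
        # From MarkLogic 5 and earlier, reindexing starts only if enabled.
        return reindex_enable
    raise ValueError("source_major_version must be 7 or lower")


print(reindexes_on_upgrade(7, True))   # False
print(reindexes_on_upgrade(5, True))   # True
print(reindexes_on_upgrade(5, False))  # False
```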
Also, you can reindex your content a little bit at a time by enabling reindexing for a while, then disabling it for a while, then enabling it, and so on. You might want to use this technique to reindex your database during non-peak hours, for example, over a period of hours or days or weeks, depending on how large your database is. Always ensure that you have the proper system requirements, as defined in Memory, Disk Space, and Swap Space Requirements.
MarkLogic converters are used to convert Microsoft Office Word, Excel, and PowerPoint documents, as well as Adobe PDF files, to XHTML. MarkLogic filters are used to filter a variety of document formats, extract metadata and text from them, and return XHTML. The following MarkLogic XQuery API functions, described in the MarkLogic XQuery and XSLT Function Reference, provide this functionality:
xdmp:word-convert, xdmp:excel-convert, xdmp:powerpoint-convert, xdmp:pdf-convert, xdmp:document-filter
Converters/filters are also used as part of the conversion pipeline in the Content Processing Framework. For more details, see The Default Conversion Option in the Content Processing Framework Guide.
Prior to MarkLogic release 8.0-8, converters/filters were bundled and automatically installed with MarkLogic Server. In MarkLogic version 8 releases starting at 8.0-8, converters/filters are offered as a separate package, MarkLogic Converters.
This change provides better flexibility and enables you to install/uninstall MarkLogic converters/filters separately from MarkLogic Server.
With this change, MarkLogic Server does not include MarkLogic Converters. To use converters/filters, install both packages: MarkLogic Server and MarkLogic Converters. An XDMP-CVTNOTFOUND error is thrown on any attempt to use converters/filters on a MarkLogic node that has no MarkLogic Converters installed.
The version of MarkLogic Converters is synchronized with the version of MarkLogic Server. For example, MarkLogic Converters 8.0-8 corresponds to MarkLogic Server 8.0-8 and may be installed with it.
You can obtain the version of MarkLogic Converters installed on a node by calling the MarkLogic server-side API function xdmp:host-status and examining the value of the converters-version element in the response. If the converters package is not installed on a node, the converters-version element will be empty.
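If the output of xdmp:host-status has been saved or returned to a client as XML, the check described above could be sketched in Python as follows. The sample document, its namespace, and the function name are hypothetical; only the converters-version element name comes from the text above, and the real response contains many more elements.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def converters_version(host_status_xml: str) -> Optional[str]:
    """Return the converters-version text, or None if it is empty or absent."""
    root = ET.fromstring(host_status_xml)
    for element in root.iter():
        # Compare local names only, so no particular namespace is assumed.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "converters-version":
            text = (element.text or "").strip()
            return text or None
    return None


# Hypothetical, heavily trimmed sample; not an actual xdmp:host-status response.
SAMPLE = """<host-status xmlns="http://example.com/hypothetical-host-status">
  <converters-version>8.0-8</converters-version>
</host-status>"""

print(converters_version(SAMPLE))  # 8.0-8
```

A return value of None corresponds to a node on which the converters package is not installed.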
MarkLogic Converters packages for all supported platforms are available for download at the same location where MarkLogic Server packages are available, namely at http://developer.marklogic.com.
If you want to use the converters package with MarkLogic 8.0-8 or a later version 8 release, you must perform a two-step installation: first install MarkLogic Server and then install MarkLogic Converters.
For details on MarkLogic Server and MarkLogic Converters installation for all supported platforms, see Installing MarkLogic.
If you want to uninstall MarkLogic 8.0-8 or a later version 8 release, and the converters package was previously installed with it, you must perform a two-step uninstall: first uninstall MarkLogic Converters and then uninstall MarkLogic Server.
For details on uninstalling MarkLogic Server and MarkLogic Converters on all supported platforms, see Removing MarkLogic.