MarkLogic Server is a powerful NoSQL database for harnessing your digital content base, complete with Enterprise features demanded by real world, mission-critical applications. MarkLogic enables you to build complex applications that interact with large volumes of content in JSON, XML, SGML, HTML, and other popular content formats, as well as binary formats. The unique architecture of MarkLogic ensures that your applications are both scalable and high-performance, delivering query results at search-engine speeds while providing transactional integrity over the underlying content repository. MarkLogic can be configured for a distributed environment, enabling you to scale your infrastructure through hardware expansion.
This installation guide explains the procedures needed to install MarkLogic on your system. It is intended for a technical audience, specifically IT staff with experience in JSON and XML. This document explains only how to install the software, not how to use it. To learn how to get started using the software, see the rest of the MarkLogic documentation (available on docs.marklogic.com), including the following documents:
When MarkLogic installs, it sets memory and other settings based on the size of the computer on which it is running. MarkLogic is a scalable, multi-threaded server product, and as such it assumes it has the entire machine available to it, including the CPU and disk I/O capacity. It is important to follow the guidelines set out in this chapter. Furthermore, MarkLogic assumes there is only one MarkLogic Server process running on any given machine, so running multiple instances of MarkLogic on a single machine is not recommended.
The first time it runs, MarkLogic Server automatically configures itself to the amount of memory on the system, reserving as much as it can for its own use. If you need to change the default configuration, you can manually override these defaults at a later time using the Admin Interface.
merge max size database merge setting is set to the default of 32GB. This translates to approximately 1.5 times the disk space of the source content after it is loaded. *
If you have Huge Pages set up on a Linux system, the swap space on that machine should equal the size of your physical memory minus the size of your Huge Pages (because Linux Huge Pages are not swapped), or 32GB, whichever is lower. For example, if you have 48GB of physical memory and Huge Pages set to 18GB, then you need 30GB of swap space (48 - 18).
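The swap space rule above can be sketched as a small calculation. This is only an illustration of the arithmetic for the Huge Pages case; the function name and parameters are hypothetical and are not part of MarkLogic.

```python
def recommended_swap_gb(physical_gb, huge_pages_gb, cap_gb=32):
    """Swap space (GB) for a Linux host with Huge Pages configured.

    Rule from the text: physical memory minus Huge Pages, or 32GB,
    whichever is lower. The function name is hypothetical.
    """
    if huge_pages_gb < 0 or huge_pages_gb > physical_gb:
        raise ValueError("Huge Pages size must be between 0 and physical memory")
    return min(physical_gb - huge_pages_gb, cap_gb)


# The example from the text: 48GB of memory with 18GB of Huge Pages.
print(recommended_swap_gb(48, 18))  # 30
```

For larger hosts the 32GB cap takes over; for example, 128GB of memory with 18GB of Huge Pages yields 32 rather than 110.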
* You need at least 2 times the merge max size of free space per forest, regardless of the forest size. Therefore, with the default merge max size of 32GB, you need at least 64GB of free space. Additionally, if the journal space is not yet allocated, you need 2 times the journal size of free disk space. Therefore, to be safe, you need (with the default merge max size and a 2GB journal size) at least 68GB of free space for each forest, no matter what size the forest is.
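The free-space arithmetic in this note can be checked with a short sketch. The function and parameter names are hypothetical; the defaults (32GB merge max size, 2GB journal size) are the values stated in the note.

```python
def min_free_space_per_forest_gb(merge_max_size_gb=32, journal_size_gb=2,
                                 journals_allocated=False):
    """Minimum free disk space (GB) to keep available for each forest.

    Rule from the text: 2 x merge max size, plus 2 x journal size when
    the journal space has not been allocated yet.
    """
    required = 2 * merge_max_size_gb
    if not journals_allocated:
        required += 2 * journal_size_gb
    return required


print(min_free_space_per_forest_gb())                         # 68
print(min_free_space_per_forest_gb(journals_allocated=True))  # 64
```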
* Microsoft Windows 7 and Windows 8 are supported for development only. If MarkLogic Server fails to start up on Windows with the error 'the application failed to initialize properly (0xc0150002)', then a dependency is missing from your environment and you need to download and install the following DLL for 64-bit versions of Windows: http://www.microsoft.com/downloads/details.aspx?FamilyID=eb4ebe2d-33c0-4a47-9dd4-b9a6d7bd44da&DisplayLang=en. Additionally, if you get an error on startup saying you need MSVCR100.dll, then install the Microsoft Visual C++ 2010 SP1 Redistributable Package (x64) from http://www.microsoft.com/en-us/download/details.aspx?id=13523.
noop I/O scheduler is required to ensure efficient disk I/O for MarkLogic Server on Linux. You should not use noop unless your MarkLogic host has intelligent I/O controllers or is only connected to SSDs. For more details, see http://help.marklogic.com/Knowledgebase/Article/View/8/0/notes-on-io-schedulers.
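To see which I/O scheduler a Linux block device is currently using, you can read its scheduler file under /sys/block, where the kernel lists the available schedulers and marks the active one with square brackets. The sketch below is a minimal illustration and is not a MarkLogic tool; the function names and the default device name sda are assumptions.

```python
import re
from pathlib import Path
from typing import Optional


def active_scheduler(scheduler_line: str) -> Optional[str]:
    """Return the scheduler marked with [brackets], or None if none is marked.

    The kernel reports a line such as "noop deadline [cfq]".
    """
    match = re.search(r"\[([^\]]+)\]", scheduler_line)
    return match.group(1) if match else None


def read_active_scheduler(device: str = "sda") -> Optional[str]:
    """Read the active scheduler for a block device (device name is an assumption)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


print(active_scheduler("noop deadline [cfq]"))  # cfq
```

Calling read_active_scheduler() on the host itself, once for each device that backs a forest, shows whether noop is in effect for that device.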
cyrus-sasl-lib packages are required on Red Hat Enterprise Linux. Additionally, on 64-bit Red Hat Enterprise Linux, both the 32-bit and the 64-bit glibc packages are required.
*****Mac OS X is supported for development only. Conversion (Office and PDF) and entity enrichment are not available on Mac OS X. Mac OS X 10.8 or 10.9 (Mountain Lion or Mavericks) on a 64-bit capable processor is required (http://support.apple.com/kb/HT3696).
MarkLogic relies on the operating system for filesystem operations. While any filesystem that works properly (including under heavy load) should work, the following table lists the operating systems along with the filesystems under which they are certified. Other filesystems may work but have not been thoroughly tested by MarkLogic.
|Operating System||Certified Filesystems|
|Linux (all varieties)|
|Sun Solaris||UFS, as well as the clustered filesystems for shared-disk failover mentioned in Requirements for Shared-Disk Failover in the Scalability, Availability, and Failover Guide.|
|All||Hadoop HDFS, Amazon S3 (no journaling with S3)|
MarkLogic 8 supports upgrades from MarkLogic 5, MarkLogic 6, and MarkLogic 7 or later databases. If you are upgrading from an earlier version of MarkLogic Server, you must first upgrade to MarkLogic 5, 6, or 7 before moving to MarkLogic 8. For the upgrade procedure, see Upgrading from Previous Releases.
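As a quick pre-upgrade check, the supported-path rule above can be expressed as a small helper. This is a sketch only; the function name is hypothetical, and it assumes you already know the major version of the installed release.

```python
def can_upgrade_directly_to_8(source_major):
    """True if a database from this major version can go straight to MarkLogic 8.

    Rule from the text: MarkLogic 5, 6, and 7 or later upgrade directly;
    earlier releases must first be upgraded to 5, 6, or 7.
    """
    if source_major < 1:
        raise ValueError("major version must be a positive number")
    return source_major >= 5


for major in (4, 5, 6, 7):
    step = "direct upgrade" if can_upgrade_directly_to_8(major) else "upgrade to 5, 6, or 7 first"
    print(f"MarkLogic {major}: {step}")
```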
During the upgrade, the security database, the schemas database, and the configuration files are automatically upgraded. The security database is upgraded with the latest execute privileges and the schemas database is upgraded with the latest version of the schemas used by MarkLogic Server. The upgrade occurs as part of the installation procedure.
Databases that contain your own content are also upgraded to work with MarkLogic 8; once you upgrade to MarkLogic 8, you will no longer be able to use that database with previous versions of MarkLogic. MarkLogic Corporation strongly recommends performing a backup of your databases before upgrading to MarkLogic 8. Additionally, MarkLogic Corporation recommends that you first upgrade to the latest maintenance release of the major version of MarkLogic you are running before upgrading to MarkLogic 8.
For the procedure for upgrading to MarkLogic 8, see Upgrading from Previous Releases. For details about known incompatibilities between MarkLogic 7 and MarkLogic 8, see Known Incompatibilities with Previous Releases in the Release Notes.
When upgrading from releases prior to MarkLogic 5 to MarkLogic 8, the upgrade reconfigures the Docs and App Services App Servers, which by default are on port 8000 and port 8002 in older releases. In order for those App Servers to be upgraded, the following conditions must be met:
|App-Services App Server|
|Manage App Server|
If the conditions are not met, then the upgrade logs an error to the ErrorLog.txt file and the Application Services portion of the upgrade is skipped. MarkLogic Server will still operate, but you will not be able to use Query Console, the Management API, and the rest of the Application Services features. To restore the Application Services functionality after a failed upgrade, create two App Servers with the configuration shown above. If you have any problems, contact MarkLogic technical support.
MarkLogic 8 does not require a reindex from MarkLogic 6 or MarkLogic 7 databases. Therefore, if you are upgrading from MarkLogic 6 or MarkLogic 7, the database will not reindex, even if reindex enable is set to true.
To take advantage of index improvements, MarkLogic 8 does require a reindex when upgrading from MarkLogic 5 and earlier databases. When you upgrade to MarkLogic 8, all databases with reindex enable set to true will automatically begin reindexing immediately. If you do not want the databases to reindex, you must set reindex enable to false before upgrading (that is, you must set reindex enable to false in MarkLogic 5). You can always reindex your content later by changing this setting back to true after installing MarkLogic 8.
Also, you can reindex your content a little bit at a time by enabling reindexing for a while, then disabling it for a while, then enabling it, and so on. You might want to use this technique to reindex your database during non-peak hours, for example, over a period of hours or days or weeks, depending on how large your database is. Always ensure that you have the proper system requirements, as defined in Memory, Disk Space, and Swap Space Requirements.
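The reindexing rules described in this section reduce to a simple decision, sketched below. The function name is hypothetical, and the reindex_enable argument stands for the database's reindex enable setting at the time of the upgrade.

```python
def will_reindex_on_upgrade(source_major, reindex_enable):
    """Whether a database starts reindexing right after upgrading to MarkLogic 8.

    Rules from the text:
      - From MarkLogic 6 or 7, no reindex is required, so none starts,
        whatever the reindex enable setting is.
      - From MarkLogic 5 and earlier, a reindex is required and starts
        automatically only when reindex enable is true.
    """
    if source_major >= 6:
        return False
    return reindex_enable


print(will_reindex_on_upgrade(7, True))   # False
print(will_reindex_on_upgrade(5, True))   # True
print(will_reindex_on_upgrade(5, False))  # False
```

When the last case applies, the reindex is only deferred: the database still needs it and begins as soon as reindex enable is set back to true.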