Reference Application Architecture Guide — Chapter 3

Samplestack: A Reference Architecture Instantiation

This chapter provides a high-level overview of the Samplestack application. Samplestack is a demo application based on the MarkLogic Reference Application Architecture.

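The following topics are covered:

  • What is Samplestack?
  • Samplestack Implementation Overview
  • Best Practices Demonstrated by Samplestack
  • Technologies Used in Samplestack
  • Exploring Samplestack in Detail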

What is Samplestack?

Samplestack is a full-featured example web application that implements the MarkLogic Reference Application Architecture and the best practices described in this guide. You can deploy Samplestack in your development environment as a learning tool; it is maintained as an open source project.

The technologies selected for use in Samplestack reflect a forward-looking approach to developing enterprise web applications. However, nothing about the architecture, the application, or the best practices requires you to use these specific technologies.

Samplestack models a Question and Answer site using freely available data from stackoverflow.com. A Samplestack user can perform the following tasks:

  • Browse or search questions and answers
  • Submit a question
  • Submit an answer
  • Vote on a question or an answer
  • Accept an answer (submitter only)

Users can browse and search questions and answers anonymously. Submitting questions, answers, or comments; accepting answers; and voting require the user to log into the site. Voting affects the ordering of search results because questions with more votes appear in the search results before questions with fewer votes. Accepting a best answer to a question affects the reputation of the contributor of the accepted answer.

MarkLogic Server provides the database tier of Samplestack. The middle tier is a Java stack, based on Spring Boot and other standard Java libraries. The browser tier is a Single Page Application (SPA) implemented on AngularJS. For details on the libraries and frameworks used by Samplestack, see Technologies Used in Samplestack.

The Samplestack project includes robust project infrastructure that demonstrates the best practices outlined in Recommended Best Practices. For details, see Best Practices Demonstrated by Samplestack.

Samplestack is an open source project maintained on GitHub. You can review, download, and contribute to the project at http://github.com/marklogic/marklogic-samplestack.

Samplestack Implementation Overview

This section discusses how some key features of Samplestack are enabled by MarkLogic Server. Samplestack uses additional MarkLogic features not discussed here. For more details, see Exploring Samplestack in Detail.

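The following topics are covered:

  • UI to MarkLogic Feature Summary
  • Full Text Search
  • Search Result Filtering
  • Users and Roles
  • Document Model
  • Document Insertion and Update
  • Transactions and Data Integrity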

UI to MarkLogic Feature Summary

The following table maps visual elements of Samplestack to MarkLogic capabilities and points to a brief related discussion. The graphics after this table visually tie the same mapping to the Samplestack UI.

Application Feature                         | MarkLogic Capability                                | Where to go for more information
--------------------------------------------|-----------------------------------------------------|---------------------------------
Full-text Search                            | Full-text search, different query styles, indexing  | Full Text Search
Facets                                      | Search constraints, analytics                       | Search Result Filtering
Users and Roles                             | Authentication, security, privileges                | Users and Roles
User Records and Q&A Documents              | JSON data model, POJO data binding                  | Document Model
Submitting Questions, Answers, and Comments | Inserting and updating documents                    | Document Insertion and Update
Accepting Answers, Reputation               | Transaction model, data integrity                   | Transactions and Data Integrity

The following image highlights MarkLogic capabilities that enable key elements of the Samplestack search view. From this view, users can search (with or without filters), review search results, log in, and ask questions (if logged in).

The following image highlights MarkLogic capabilities that enable key elements of a Samplestack Q&A view. From this view, users can review questions, answers, and comments. Logged in users can post answers, post comments, and vote on questions and answers. The submitter of a question can also accept a best answer.

Full Text Search

The full-text search box at the top of each page uses MarkLogic's built-in string search grammar, augmented by the definition of some custom bindings between a search term qualifier and specific data properties such as username or tags.

If you click on Search Tips, you see a summary of the search grammar supported by Samplestack:

All the expressiveness in this summary is built into the MarkLogic default string search grammar, including logical operators, relational operators, grouping, and qualified terms (tag:value).

The application-specific prefixes on qualified terms (tag, user, askedBy, answeredBy, commentedBy, votes) represent a binding between a name (answeredBy) and a slice of the database content backed by a range index. The MarkLogic client APIs include simple hooks for defining such bindings.
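The idea of a binding can be pictured in a few lines of Java. This is only a sketch of the concept: it is not MarkLogic's search parser or client API, and the class name, the property names (tags, asker, voteCount), and the "$text" key are hypothetical choices made for illustration. It shows a prefix such as tag being resolved to a specific data property, while unqualified words remain ordinary full-text terms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: not MarkLogic's search parser or client API. */
final class QualifiedTermSketch {
    // Hypothetical bindings from a search prefix to a data property name.
    static final Map<String, String> BINDINGS = new LinkedHashMap<>();
    static {
        BINDINGS.put("tag", "tags");
        BINDINGS.put("askedBy", "asker");
        BINDINGS.put("votes", "voteCount");
    }

    /**
     * Splits a query string into property constraints (for bound prefixes)
     * and the remaining plain words, stored under the "$text" key.
     */
    static Map<String, String> parse(String query) {
        Map<String, String> result = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            int colon = token.indexOf(':');
            String prefix = colon > 0 ? token.substring(0, colon) : null;
            if (prefix != null && BINDINGS.containsKey(prefix)) {
                // A bound prefix constrains one specific property.
                result.put(BINDINGS.get(prefix), token.substring(colon + 1));
            } else if (!token.isEmpty()) {
                // Anything else is treated as a full-text word.
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(token);
            }
        }
        result.put("$text", text.toString());
        return result;
    }
}
```

In Samplestack itself, the equivalent mapping is declared as configuration and evaluated by MarkLogic against range indexes, so the application does not parse the query string.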

Search Result Filtering

The Filter Results widget on the left side of the page enables users to narrow a search in several ways. The following picture highlights the types of filtering. The search constraint and faceting features that back these filters are built into MarkLogic. The application simply configures the details, such as which content properties to constrain by or generate facets from.

Users and Roles

The user interactions in Samplestack support the following conceptual roles:

  • Guest: Users who are not logged in can explore questions and answers, but they cannot vote or submit questions, comments, or answers. A guest user cannot see questions that do not yet have accepted answers.
  • Contributor: Users who log in gain the ability to vote and to submit questions, comments, and answers. An authenticated contributor can also accept a best answer to his or her submitted question.
  • Administrator: A user with administrative privileges has all the contributor privileges and can also make configuration changes and manage the application. These capabilities are not exposed through the Samplestack UI. However, the project infrastructure uses this role to configure MarkLogic Server and deploy the application.

When you set up Samplestack, one Contributor user, joe@example.com, and one Administrator user, mary@example.com, are predefined. For credential details, see the marklogic-samplestack project on GitHub.

When a user logs into Samplestack, the middle tier authenticates the credentials using LDAP and maps the user to a shared MarkLogic user with appropriate privileges. The following diagram illustrates this flow for joe@example.com, a user with Contributor rights.

The Samplestack setup process creates MarkLogic users and security roles that support the Samplestack user model. For example, setup creates the following roles.

Only users with the samplestack-writer role (or equivalent privileges) can use write operations of the MarkLogic client APIs because only this role includes the rest-writer role. When you install MarkLogic, the rest-writer, rest-reader, rest-admin, and rest-extension-user roles are available for managing access through the MarkLogic client APIs.

The Samplestack setup also creates the following MarkLogic users and assigns them roles appropriate for the access granted to each type of user. The samplestack-guest user has only the samplestack-guest role, while the samplestack-contributor user has both the samplestack-guest and samplestack-writer roles.

The samplestack-admin user has roles that allow use of the REST Management API, as well as the rest-admin role that allows use of the administrative endpoints of the MarkLogic client APIs. This enables the samplestack-admin user to perform operations such as database configuration and installation of any application code that runs inside MarkLogic.
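The guest and contributor mappings described above can be modeled as a small role-inheritance check. The sketch below is an in-memory illustration, not MarkLogic's security API. The user-to-role assignments and the inclusion of rest-writer in samplestack-writer come from the text; the inclusion of rest-reader in samplestack-guest is an assumption, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: an in-memory model, not MarkLogic's security API. */
final class RoleSketch {
    // Role -> roles it includes.
    static final Map<String, List<String>> ROLE_INCLUDES = new HashMap<>();
    // MarkLogic user -> directly assigned roles.
    static final Map<String, List<String>> USER_ROLES = new HashMap<>();
    static {
        // Assumption: the guest role inherits read access from rest-reader.
        ROLE_INCLUDES.put("samplestack-guest", Arrays.asList("rest-reader"));
        ROLE_INCLUDES.put("samplestack-writer", Arrays.asList("rest-writer"));
        USER_ROLES.put("samplestack-guest", Arrays.asList("samplestack-guest"));
        USER_ROLES.put("samplestack-contributor",
                Arrays.asList("samplestack-guest", "samplestack-writer"));
    }

    /** Returns the user's assigned roles plus every role they include. */
    static Set<String> effectiveRoles(String user) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(
                USER_ROLES.getOrDefault(user, Collections.emptyList()));
        while (!todo.isEmpty()) {
            String role = todo.pop();
            if (seen.add(role)) {
                todo.addAll(ROLE_INCLUDES.getOrDefault(role, Collections.emptyList()));
            }
        }
        return seen;
    }

    /** Write operations require the rest-writer role, directly or inherited. */
    static boolean canWrite(String user) {
        return effectiveRoles(user).contains("rest-writer");
    }
}
```

With these mappings, canWrite returns true for samplestack-contributor and false for samplestack-guest, which mirrors the rule that only users holding samplestack-writer can perform write operations.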

Document Model

The Samplestack data consists of two types of JSON documents:

  • Q&A documents. Each document contains a question, its answers, and any comments.
  • Contributor documents. Each user known to the application has a corresponding contributor document that tracks information such as display name, votes, and reputation.

The contributor documents represent Samplestack domain objects. That is, the middle tier uses the POJO Binding feature of the MarkLogic Java Client API to manipulate contributor data directly as the Java class com.marklogic.samplestack.domain.Contributor.

The Q&A documents are larger and more complex than the contributor documents. Therefore, changes to the Q&A documents are made through the document patch feature of the MarkLogic Java Client API. For details, see Document Insertion and Update.

The reputation for each contributor is stored in the contributor document. However, when a question is displayed with all its answers and comments, the contributor's reputation is included in the information displayed about each user. For example, the following contributor information, which is displayed with an answer, reflects a user with a reputation of 588:

Joins should be performed close to the data, so an application-specific transformation installed on MarkLogic joins reputation data from contributor documents to Q&A data when a request is received for a specific Q&A document. A similar join is performed when retrieving search results.

The JSON received from the browser tier when users submit questions, answers, and comments is minimally modified by the middle tier. Those modifications that are made are consumable by the browser tier in subsequent fetches, so the data can be fetched from the database and passed back to the browser tier very efficiently.

Document Insertion and Update

When a contributor submits a question, a Q&A document is inserted into the database. The browser tier submits the contributor and question to the middle tier as JSON, using the application-specific REST API. The middle tier adds additional information, generates a database URI for the document, and then submits the complete Q&A structure to the database tier through the MarkLogic client API.

The following diagram illustrates the question submission flow.

When a user answers a question, the answer is added to the associated Q&A document in the database. Recall that a question, its answers, and related comments are stored in a single document.

In this flow, the browser tier submits the answerer, question id, and answer to the middle tier as JSON, using the application-specific REST API. The middle tier adds additional information to the answer, and then inserts the answer into the Q&A document using the patch feature of the MarkLogic Client API. The Q&A document need not be fetched from the database tier to perform this update. Submitting comments follows a similar flow.
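The benefit of a patch over a full rewrite can be illustrated with an in-memory analogy. The sketch below does not use the MarkLogic Java Client API; the Store class, its methods, and the simplified document shape (property name mapped to a list of strings) are all hypothetical. It shows the caller sending only the new answer, with the store applying the change where the document lives, so no full-document fetch is needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Illustrative only: an in-memory analogy, not the MarkLogic Java Client API. */
final class PatchSketch {
    /** Hypothetical stand-in for the database tier. */
    static final class Store {
        // URI -> document, where a document maps a property name to a list of values.
        private final Map<String, Map<String, List<String>>> docs = new HashMap<>();
        // Counts how often a caller fetched a whole document.
        int fullReads = 0;

        void insert(String uri, Map<String, List<String>> doc) {
            docs.put(uri, doc);
        }

        Map<String, List<String>> read(String uri) {
            fullReads++;
            return docs.get(uri);
        }

        /** Appends one item to an array property; the caller sends only the item. */
        void patchAppend(String uri, String arrayProperty, String item) {
            Map<String, List<String>> doc = docs.get(uri);
            if (doc == null) {
                throw new NoSuchElementException("no document at " + uri);
            }
            doc.computeIfAbsent(arrayProperty, k -> new ArrayList<>()).add(item);
        }
    }
}
```

A caller would invoke patchAppend with a document URI, the name of the answers property, and the new answer; fullReads stays at zero until something actually needs the whole document.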

The following diagram illustrates the answer submission flow.

Transactions and Data Integrity

Accepting an answer requires updating both a Q&A document and a contributor document. When a submitter accepts an answer to his or her question as the best answer, the Q&A document is updated to reflect the acceptance. At the same time, the reputation of the user who contributed the answer is incremented.

The update of the Q&A document and the contributor reputation should be an atomic operation. The answer should not be accepted without updating the reputation, and the reputation should not be incremented without accepting the answer.

Data integrity is preserved by using a MarkLogic multi-statement transaction. When the middle tier receives the answer acceptance from the browser tier, it requests the database client to create a multi-statement transaction on MarkLogic Server. The database client returns a transaction id. By including this transaction id in the update operations for the question and the contributor, the middle tier ensures both operations are part of the same transaction.

When both update operations complete successfully, the middle tier commits the transaction. If either operation fails, the middle tier rolls back the transaction.
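The commit-or-rollback flow above can be sketched with an in-memory stand-in for the database tier. This is not the MarkLogic client API: the Store and Tx classes and the acceptAnswer method are hypothetical, and the reputation increment of 1 is an assumption, since the text does not state the amount. The sketch buffers both updates in one transaction and applies them to the store only on commit.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: an in-memory model, not the MarkLogic client API. */
final class AcceptAnswerSketch {
    /** Hypothetical stand-in for the database tier. */
    static final class Store {
        final Map<String, String> acceptedAnswer = new HashMap<>(); // question id -> answer id
        final Map<String, Integer> reputation = new HashMap<>();    // user -> reputation
    }

    /** Buffers writes and applies them to the store only on commit. */
    static final class Tx {
        private final Store store;
        private final Map<String, String> pendingAccepts = new HashMap<>();
        private final Map<String, Integer> pendingReputation = new HashMap<>();

        Tx(Store store) {
            this.store = store;
        }

        void accept(String questionId, String answerId) {
            pendingAccepts.put(questionId, answerId);
        }

        void addReputation(String user, int delta) {
            if (!store.reputation.containsKey(user)) {
                throw new IllegalArgumentException("unknown contributor: " + user);
            }
            pendingReputation.merge(user, delta, Integer::sum);
        }

        void commit() {
            store.acceptedAnswer.putAll(pendingAccepts);
            pendingReputation.forEach((u, d) -> store.reputation.merge(u, d, Integer::sum));
        }

        void rollback() {
            pendingAccepts.clear();
            pendingReputation.clear();
        }
    }

    /** Performs both updates in one transaction; returns false if it rolled back. */
    static boolean acceptAnswer(Store store, String questionId, String answerId, String answerer) {
        Tx tx = new Tx(store);
        try {
            tx.accept(questionId, answerId);
            tx.addReputation(answerer, 1); // assumed increment
            tx.commit();
            return true;
        } catch (RuntimeException e) {
            tx.rollback();
            return false;
        }
    }
}
```

If the reputation update fails, for example because the contributor is unknown, the buffered acceptance is discarded as well, so the store never holds an accepted answer without the matching reputation change.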

Best Practices Demonstrated by Samplestack

The Samplestack project infrastructure demonstrates the following best practices recommended by the MarkLogic Reference Application Architecture:

Project Organization

If you explore the organization of the marklogic-samplestack project on GitHub, you will see that the top level of the project is divided into a folder for each tier, plus a folder for shared assets:


The folder for each tier contains the source code, tests, configuration files, and automation drivers required to build, test, and deploy that tier. To explore the project organization in detail, go to the following URL:


Source Control and Issue Tracking

The Samplestack project uses GitHub and git for source control management. All the assets required to build, test, and deploy the application are under source control, except the large sample data set.

GitHub includes an issue tracking system, and the Samplestack development team uses it to track bugs, tasks, and open design issues.

Configuration and Dependency Management

The Samplestack project makes extensive use of configuration properties files to drive build, testing, and deployment.

For example, marklogic-samplestack/appserver/java-spring/gradle.properties contains configuration properties required for the Java middle tier to connect to MarkLogic Server, including the MarkLogic usernames, and the host and port information used to connect to MarkLogic. These same properties are used for setting up the database tier and testing. Thus, changing this one file and re-running the setup scripts is all that's needed to use a different MarkLogic instance or connect as different users.
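A minimal sketch of this configuration-driven approach, using the standard java.util.Properties class, might look like the following. The property keys (example.marklogic.host and so on), the default values, and the class name are hypothetical and are not the keys used in the Samplestack gradle.properties file; consult that file for the real names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative only: the property keys below are hypothetical, not Samplestack's. */
final class ConnectionSettingsSketch {
    final String host;
    final int port;
    final String writerUser;

    private ConnectionSettingsSketch(String host, int port, String writerUser) {
        this.host = host;
        this.port = port;
        this.writerUser = writerUser;
    }

    /** Parses connection settings from text in java.util.Properties format. */
    static ConnectionSettingsSketch parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Defaults are assumptions chosen for the example.
        return new ConnectionSettingsSketch(
                props.getProperty("example.marklogic.host", "localhost"),
                Integer.parseInt(props.getProperty("example.marklogic.port", "8000")),
                props.getProperty("example.marklogic.writer.user", "samplestack-contributor"));
    }
}
```

Because every consumer reads the same keys, pointing the application at a different MarkLogic instance is a matter of editing the property values rather than changing code.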

To view this file on GitHub, go to the following URL:


Similarly, marklogic-samplestack/appserver/java-spring/build.gradle contains all the external dependencies required by the Samplestack middle tier, and the browser tier maintains a similar list of dependencies in marklogic-samplestack/browser/bower.json.

In the database tier, the MarkLogic database configuration details are maintained in marklogic-samplestack/database/database-properties.json, in a format suitable for passing directly to MarkLogic using the MarkLogic REST Management API.

Task Automation

The setup, build, deployment, and testing of Samplestack are fully automated. The tiers do not all use the same automation tools because they have different requirements.

The database tier and the Java implementation of the middle tier use the build automation tool gradle. The Samplestack gradle configuration defines a wide variety of tasks necessary to bring up and test the middle tier, including the following:

  • Creating the MarkLogic roles, users, and REST API instance (App Server and database) required by Samplestack.
  • Configuring database indexes and other properties.
  • Loading the database tier Samplestack assets into the modules database on MarkLogic Server. This includes resource service extensions, content transformations, and persistent query options.
  • Loading seed data into the database.
  • Building the Java implementation of the middle tier.
  • Starting the middle tier.
  • Testing the middle tier.
  • Tearing down MarkLogic: removing the REST API instance and database.
  • An umbrella task that combines all of these steps into a single step that takes you from a freshly installed MarkLogic to a running, smoke-tested Samplestack middle tier.

For more details, see the following page:


Since the browser tier is implemented in JavaScript and Node.js, it uses a different set of tools for similar purposes: npm, bower, and gulp. For details, see the following page:



The browser and middle tier each have a set of unit tests that can be run independent of the other tiers. These tests include scaffolding that mocks up adjacent tiers.

For example, the middle tier includes JUnit-driven unit tests for the Java implementation that are run as part of the middle tier start up task (the appserver and assemble gradle tasks). This provides a sanity check each time you deploy a change.

The middle tier also includes integration tests that exercise its interactions with each adjacent tier independently: the interactions between the middle tier and the database tier, and the interactions between the middle tier and the browser tier.

The browser tier tests include end-to-end tests that exercise the whole application stack. These tests use Selenium to drive the browser, and test tools such as gherkin, cucumber, and protractor to define the behaviors and expected results. For details, see the section on testing on the following page:


To further explore the testing in Samplestack, see the following parts of the project:

  • marklogic-samplestack/appserver/java-spring/src/test
  • marklogic-samplestack/browser/test

Technologies Used in Samplestack

This section summarizes some of the technologies used to implement, build, test, and deploy Samplestack. Using these technologies illustrates how easily you can implement the recommended best practices and integrate MarkLogic into your development process using industry-standard technologies.

Your application does not have to use the same tools and frameworks in order to use MarkLogic.

This section covers the following topics:

  • Samplestack Implementation
  • Samplestack Build, Test and Deployment Automation

Samplestack Implementation

The implementation of Samplestack uses key MarkLogic features alongside technologies such as the ones shown in the following diagram:

Samplestack Build, Test and Deployment Automation

The Samplestack project infrastructure demonstrates development best practices such as source code management; automated builds; automated unit, integration, and system testing; and configuration-driven application deployment.

The following tools and technologies are used to configure, build, test, and deploy Samplestack:

  • Source control and issue tracking: GitHub
  • Build, test, and deployment process automation: gradle, gulp, bower, npm
  • Testing: JUnit, Selenium, mocha, PhantomJS, cucumber, protractor, gherkin
  • Application configuration and deployment: MarkLogic REST Management API, mlcp (MarkLogic content pump)

Exploring Samplestack in Detail

To try Samplestack or explore its implementation in more detail, see the marklogic-samplestack project on GitHub:


The project page includes directions for building and setting up the application. The project wiki includes more information on the implementation, tests, and tooling.
