Search Developer's Guide — Chapter 10

Geospatial Search Applications

This chapter describes how to use the geospatial functions and the types of applications that might use them.

Overview of Geospatial Data in MarkLogic Server

In its most basic form, geospatial data is a set of latitude and longitude coordinates. Geospatial data in MarkLogic Server is marked up in XML elements and/or attributes. There are a variety of ways you can represent geospatial data as XML, and MarkLogic Server supports several different representations. This section provides an overview of how geospatial data and queries work in MarkLogic Server.

Terminology

The following terms are used to describe the geospatial features in MarkLogic Server:

  • coordinate system

    A geospatial coordinate system is a set of mappings that map places on Earth to a set of numbers. The vertical axis is represented by a latitude coordinate and the horizontal axis by a longitude coordinate; together they make up a coordinate system that is used to map places on the Earth. For more details, see Latitude and Longitude Coordinates in MarkLogic Server.

  • point

    A geospatial point is the spot in the geospatial coordinate system representing the intersection of a given latitude and longitude. For more details, see Points in MarkLogic Server.

  • proximity

    The proximity of search results is how close the results are to each other in a document. Proximity can apply to any type of search term, including geospatial search terms. For example, you might want to find the search term dog where it occurs within 10 words of a point in a given zip code.

  • distance

    The distance between two geospatial objects is how far apart those objects are geographically (for example, the number of miles between two points).

Coordinate System

MarkLogic Server supports two types of coordinate systems for geospatial data:

  • WGS84
  • Raw

By default, MarkLogic Server uses the World Geodetic System (WGS84) as the basis for geocoding. WGS84 sets out a coordinate system that assumes a single map projection of the earth. WGS84 is widely used for mapping locations on the earth, including by many satellite services (notably the Global Positioning System, GPS) and by Google Maps. Other geocoding systems exist, some with advantages or disadvantages relative to WGS84 (for example, some are more accurate in a given region, and some are less popular); MarkLogic Server uses WGS84 because it is a widely accepted standard for global point representation. For details on WGS84, see http://en.wikipedia.org/wiki/World_Geodetic_System.

You can use the raw coordinate system when you want your points mapped onto a flat plane instead of onto the geometry of the earth.

Types of Geospatial Queries

The following types of geospatial queries are supported in MarkLogic Server:

  • point query: matches a single point
  • box query: matches any point within a rectangular box
  • radius query: matches any point within a specified distance around a point
  • polygon query: matches any point within a specified n-sided polygon

Geospatial cts:query constructors are composable just like any other cts:query constructors. For details on composing cts:query constructors, see Composing cts:query Expressions.

In addition to geospatial query constructors, there are built-in functions to perform operations (such as calculating distance) on geospatial data, as enumerated in Geospatial Operations.

Using the geospatial query constructors requires a valid geospatial license key; without a valid license key, searches that include geospatial queries will throw an exception.

XQuery Primitive Types And Constructors for Geospatial Queries

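To support geospatial queries, MarkLogic Server has the following XQuery primitive types:

  • cts:box
  • cts:circle
  • cts:complex-polygon
  • cts:linestring
  • cts:point
  • cts:polygon
  • cts:region (the base type of all of the above)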

You use these primitive types in geospatial cts:query constructors (for example, cts:element-geospatial-query, cts:element-attribute-pair-geospatial-query, cts:element-pair-geospatial-query, and so on). Each of the cts:box, cts:circle, cts:complex-polygon, cts:linestring, cts:point, and cts:polygon XQuery primitive types is an instance of the cts:region base type. These types define regions, and a geospatial query matches when a region contains matching data in the context of a search.

Additionally, each primitive type has a constructor that attempts to construct a value of that type from the data you pass it; if the data cannot be constructed into the type, an exception is thrown. MarkLogic Server parses the data to try to extract the points from which to construct the type. For example, the following constructs a cts:polygon from a string of space-separated points:

cts:polygon("38,-10 40,-10 39,-15")

The following constructs these coordinates (represented as numbers) into a cts:point:

cts:point(38.7, -10.3)
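To make the parsing step more concrete, the following plain-Python sketch (not MarkLogic's implementation) pulls every number out of a string and pairs consecutive numbers as (latitude, longitude) points, treating everything else as a separator. The helper name parse_points is hypothetical, and sequential pairing is an assumption that agrees with the cts:polygon example above.

```python
import re

# Matches signed decimals with an optional exponent, e.g. "-10", "38.7", "0.38E+02".
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_points(text):
    """Hypothetical helper: pair up every number in `text` as (lat, lon).

    Anything that is not part of a number (spaces, commas, newlines) acts
    as a separator. An odd count raises ValueError, loosely mirroring the
    exception the constructors throw on data they cannot construct.
    """
    numbers = [float(tok) for tok in NUMBER.findall(text)]
    if len(numbers) % 2 != 0:
        raise ValueError("odd number of coordinates: %r" % text)
    return [(numbers[i], numbers[i + 1]) for i in range(0, len(numbers), 2)]


print(parse_points("38,-10 40,-10 39, -15"))
# [(38.0, -10.0), (40.0, -10.0), (39.0, -15.0)]
```

The same pairing also reads scientific-notation data such as "0.383337584506173E+02, -0.121659014798844E+03".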

Well-Known Text (WKT) Markup Language

MarkLogic supports the well-known text (WKT) markup language for representing geospatial data. You can use the following WKT objects in MarkLogic: POINT, POLYGON, LINESTRING, TRIANGLE, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, and GEOMETRYCOLLECTION. To use WKT in a geospatial query, use the cts:parse-wkt function to convert WKT into a sequence of cts:region items, and then use those cts:region items in any of the geospatial cts:query constructors.

For example, the following code returns a cts:complex-polygon that has an outer boundary and an inner boundary:

cts:parse-wkt("
 POLYGON(
  (0 0, 0 10, 10 10, 10 0, 0 0),
  (0 5, 0 7, 5 7, 5 5, 0 5) )" )

The following table shows how the WKT types map to the MarkLogic geospatial types.

WKT Type                    MarkLogic Geospatial Type
POINT                       cts:point
POINT EMPTY                 cts:point (flagged as empty)
POLYGON                     cts:complex-polygon | cts:polygon
POLYGON EMPTY               cts:complex-polygon (flagged as empty)
LINESTRING                  cts:linestring
LINESTRING EMPTY            cts:linestring (flagged as empty)
TRIANGLE                    cts:polygon
TRIANGLE EMPTY              cts:complex-polygon (flagged as empty)
MULTIPOINT                  cts:point*
MULTIPOINT EMPTY            ()
MULTILINESTRING             cts:linestring*
MULTILINESTRING EMPTY       ()
MULTIPOLYGON                (cts:polygon | cts:complex-polygon)*
MULTIPOLYGON EMPTY          ()
GEOMETRYCOLLECTION          cts:region*
GEOMETRYCOLLECTION EMPTY    ()
others                      throws XDMP-BADWKT

Similarly, you can convert a cts:region to WKT with the cts:to-wkt function. For example, the following returns a WKT POINT:

cts:to-wkt(cts:point(1, 2))

You cannot convert a cts:circle or a cts:box to WKT. For more details on WKT, see http://en.wikipedia.org/wiki/Well-known_text.
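One detail to keep in mind when moving between points and WKT: WKT conventionally writes X before Y, which for geographic data means longitude before latitude, whereas cts:point takes latitude first. The plain-Python sketch below handles only the POINT case and uses hypothetical helper names; the exact spacing and number formatting that cts:to-wkt produces may differ.

```python
import re

WKT_POINT = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$", re.IGNORECASE)


def point_to_wkt(lat, lon):
    """Format a (lat, lon) pair as WKT, which puts X (longitude) first."""
    return "POINT(%s %s)" % (format(lon, ".15g"), format(lat, ".15g"))


def wkt_to_point(text):
    """Parse a WKT POINT back into (lat, lon); other WKT types are rejected."""
    match = WKT_POINT.match(text)
    if match is None:
        raise ValueError("not a WKT POINT: %r" % text)
    lon, lat = float(match.group(1)), float(match.group(2))
    return (lat, lon)


print(point_to_wkt(1, 2))          # POINT(2 1)
print(wkt_to_point("POINT(2 1)"))  # (1.0, 2.0)
```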

Understanding Geospatial Coordinates and Regions

This section describes the rules for geospatial coordinates and for the various regions (cts:box, cts:circle, cts:complex-polygon, cts:linestring, cts:point, and cts:polygon).

Understanding the Basics of Coordinates and Points

To understand how geospatial regions are defined in MarkLogic Server, you should first understand the basics of coordinates and points.

Latitude and Longitude Coordinates in MarkLogic Server

Latitudes have north/south coordinates. They start at 0 degrees for the equator and head north to 90 degrees for the north pole and south to -90 degrees for the south pole. If you specify a coordinate that is greater than 90 degrees or less than -90 degrees, the value is truncated to either 90 or -90, respectively.

Longitudes have east/west coordinates. They start at 0 degrees at the Prime Meridian and head east around the Earth to 180 degrees, and head west around the earth (from the Prime Meridian) to -180 degrees. If you travel 360 degrees, it brings you back to the Prime Meridian. If you go west from the Prime Meridian, the numbers go negative. For example, New York City is west of the Prime Meridian, and its longitude is -73.99 degrees. Adding or subtracting any multiple of 360 to a longitude coordinate gives an equivalent coordinate.
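The two rules above (out-of-range latitudes are truncated at the poles, and longitudes repeat every 360 degrees) can be modeled in a few lines of plain Python. This is a sketch of the stated rules, not MarkLogic's code; reporting +180 as -180 is an arbitrary choice of this sketch, since both name the same meridian.

```python
def clamp_latitude(lat):
    """Truncate out-of-range latitudes to the nearest pole (+90 or -90)."""
    return max(-90.0, min(90.0, lat))


def normalize_longitude(lon):
    """Map any longitude into [-180, 180) by adding or subtracting multiples of 360."""
    return (lon + 180.0) % 360.0 - 180.0


print(clamp_latitude(95.0))                 # 90.0
print(normalize_longitude(-73.99 + 360.0))  # approximately -73.99 (New York City)
print(normalize_longitude(190.0))           # -170.0
```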

Points in MarkLogic Server

A point is simply a pair of latitude and longitude coordinates; where the coordinates intersect is a place on the Earth. For example, the coordinates of San Francisco, California are the pair that includes the latitude of 37.655983 and the longitude of -122.425525. The cts:point type is used to define a point in MarkLogic Server. Use the cts:point constructor to construct a point from a set of coordinates. Additionally, points are used to define the other regions in MarkLogic Server (cts:box, cts:polygon, and cts:circle).

Understanding Geospatial Boxes

Geospatial boxes allow you to make a region defined by four coordinates. The four coordinates define a geospatial box which, when projected onto a flat plane, forms a rectangular box. A point is said to be in that geospatial box if it is inside the boundaries of the box. The four coordinates that define a box represent the southern, western, northern, and eastern boundaries of the box. The box is two-dimensional, and is created by taking a projection from the three-dimensional Earth onto a flat surface. On the surface of the Earth, the edges of the box are arcs, but when those arcs are projected into a plane, they become two-dimensional latitude and longitude lines, and the space defined by those lines forms a rectangle (represented by a cts:box).

The following are the assumptions and restrictions associated with geospatial boxes:

  • The four points on a box are south, west, north, and east, in that order.
  • Assuming a projection from the Earth onto a two-dimensional plane, boxes are determined by going from the south western limit to south eastern limit (even if it passes the date line), then north to the north eastern limit (border on the poles), then west to the north western limit, then back south to the south western limit where you started.
  • When determining the west/east boundary of the box, you always start at the western longitude and head east toward the eastern longitude. This means that if your western point is east of the date line, and your eastern point is west of the date line, then you will head east around the Earth until you get back to the eastern point.
  • Similarly, when determining the south/north sides of the box, you always start at the southern latitude and head north to the northern latitude. You cannot cross the pole, however, as it does not make sense to have the northern point south of the southern point. If you do cross a pole, a search that uses that box will throw an XDMP-BADBOX runtime error (because you cannot go north from the north pole). Note that the error will happen at search time, not at box creation time.
  • If the eastern coordinate is equal to the western coordinate, then only that longitude is considered. Similarly, if the northern coordinate is equal to the southern coordinate, only that latitude is considered. The consequences of these facts are the following:
    • If the western and eastern coordinates are the same, the box is a vertical line between the southern and northern coordinates passing through that longitude coordinate.
    • If the southern and northern coordinates are the same, the box is a horizontal line between the western and eastern coordinates passing through that latitude coordinate.
    • If the western and eastern coordinates are the same, and if the southern and northern coordinates are the same, then the box is a point specified by those coordinates.
  • The boundaries on the box are either in or out of the box, depending on query options (there are various boundary options on the geospatial cts:query constructors to control this behavior).
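The west-to-east and south-to-north rules, including the wrap across the date line and the degenerate line and point cases, can be sketched in plain Python as follows. This models the rules listed above rather than MarkLogic's implementation; the include_boundaries flag is a hypothetical stand-in for the boundary options on the cts:query constructors, and longitudes are assumed to be already normalized to [-180, 180].

```python
def in_box(lat, lon, south, west, north, east, include_boundaries=True):
    """Return True if (lat, lon) falls in the box (south, west, north, east)."""
    if south > north:
        # MarkLogic reports this case as XDMP-BADBOX at search time.
        raise ValueError("bad box: southern limit is north of northern limit")

    def between(low, value, high):
        return low <= value <= high if include_boundaries else low < value < high

    lat_ok = between(south, lat, north)
    if west <= east:
        # Ordinary box (or a single meridian when west == east).
        lon_ok = between(west, lon, east)
    elif include_boundaries:
        # West limit lies east of the east limit: heading east from west
        # crosses the date line, covering [west, 180] and [-180, east].
        lon_ok = lon >= west or lon <= east
    else:
        lon_ok = lon > west or lon < east
    return lat_ok and lon_ok


# A box that straddles the date line, from 170 east to 170 west.
print(in_box(0.0, 179.5, -10, 170, 10, -170))  # True
```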

Understanding Geospatial Polygons: Polygons, Complex Polygons, and Linestrings

Geospatial polygons allow you to make a region with n-sided boundaries for your geospatial queries. These boundaries can represent any area on Earth (with the exceptions described below). For example, you might create a polygon to represent a country or a geographical region. There are three ways to construct these types of geospatial regions in MarkLogic: polygons, complex polygons, and linestrings. This section describes some of the characteristics of polygons.

Overview of Polygons

Polygons offer a large degree of flexibility compared with circles or boxes. In exchange for the flexibility, geospatial polygons are not quite as fast and not quite as accurate as geospatial boxes. The cost of evaluating a polygon grows with its number of sides; for example, a typical 10-sided polygon will likely perform faster than a typical 1000-sided polygon. The speed also depends on many other factors, including where the polygon is, the nature of your geospatial data, and so on.

The following are the assumptions and restrictions associated with geospatial polygons:

  • Assumes the Earth is a sphere, divided by great circles running through the center of the Earth: one great circle divides the longitude (running through the Greenwich Meridian, sometimes called the Prime Meridian) and the other divides the latitude (at the equator).
  • Each side of a polygon is the projection of the segment between its endpoints onto the spherical surface of the Earth. Therefore, the sides do not all lie in a single plane, but instead follow the curve of the Earth (approximated as a sphere).
  • A polygon cannot include both poles. Therefore, it cannot have both poles as a boundary (regardless of whether the boundaries are included), which means it cannot encompass the full 180 degrees of latitude.
  • A polygon edge must span less than 180 degrees; that is, two adjacent vertices of a polygon must wrap around less than half of the Earth's longitude or latitude. Adjacent vertices therefore cannot be separated by more than 180 degrees of longitude. If you need a polygon to wrap around more than 180 degrees, you can still do it, but you must add intermediate vertices so that no single edge spans 180 degrees or more. As a result, a polygon cannot include a pole, except along one of its edges. Also as a result, if the two points that make up a polygon edge are more than 180 degrees apart, MarkLogic Server always chooses the direction that is less than 180 degrees.
  • Geospatial queries are constrained to elements and attributes named in the cts:query constructors. To cross multiple formats in a single query, use cts:or-query.
  • Some searches will throw a runtime exception if a polygon is not valid for the coordinate system (the coordinate system is specified at search time, not at cts:polygon creation time).
  • The boundaries on the polygon are either in or out of the polygon, depending on query options (there are various boundary options on the geospatial cts:query constructors to control this behavior).
  • Because of the spherical Earth assumption, and because points are represented by floats, results are not exact; polygons are less accurate than the other region types because they model the Earth as a sphere. Although it may not be intuitive, points are represented by floats rather than doubles because doubles give no practical gain in accuracy at the scale of the Earth.

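To see which way an edge runs when its endpoints straddle the date line, the following plain-Python sketch computes the signed longitude change of an edge under the less-than-180-degrees rule. The helper name is hypothetical, it ignores latitude, and an edge of exactly 180 degrees (which the rule above does not allow) comes out as -180.

```python
def edge_longitude_span(lon1, lon2):
    """Signed longitude change along a polygon edge from lon1 to lon2.

    The edge always takes the shorter way around, so the result lies in
    [-180, 180): positive means the edge runs east, negative means west.
    """
    return (lon2 - lon1 + 180.0) % 360.0 - 180.0


print(edge_longitude_span(170.0, -170.0))  # 20.0: runs east across the date line
print(edge_longitude_span(-170.0, 170.0))  # -20.0: runs west across the date line
print(edge_longitude_span(-60.0, 100.0))   # 160.0: runs east, no date line crossing
```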
Polygons

You can construct a cts:polygon by specifying the points that make up the vertices of the polygon. All points that are bounded by the resulting region are defined to be contained within the region.
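For intuition about what "bounded by" means, the following is a plain-Python even-odd (ray casting) point-in-polygon test. It treats coordinates as points on a flat plane, which is closer to the raw coordinate system; under WGS84, MarkLogic follows great-circle edges on a sphere, as described in the assumptions above, so results near long edges can differ. Points lying exactly on an edge are not handled specially.

```python
def point_in_polygon(x, y, vertices):
    """Even-odd ray casting on a flat plane.

    `vertices` is a list of (x, y) pairs; the ring may or may not repeat its
    first vertex at the end. A horizontal ray is cast toward +x, and each
    edge it crosses flips the inside/outside state.
    """
    inside = False
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % count]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (0, 10), (10, 10), (10, 0)]
print(point_in_polygon(5, 5, square))   # True
print(point_in_polygon(15, 5, square))  # False
```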

Complex Polygons

You can construct a cts:complex-polygon from an outer polygon and zero or more inner polygons; the resulting complex polygon is the area within the outer polygon but not within any of the inner polygons. You can also cast a cts:complex-polygon with no holes (that is, with no inner polygons) to a cts:polygon. If you specify multiple inner polygons, none of them should overlap each other.
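The "inside the outer polygon but not inside any inner polygon" rule can be sketched in plain Python by counting ray crossings over every ring at once: for a point not on any edge, and with non-overlapping holes that lie inside the outer ring, an odd total means the point is in the complex polygon. This is a flat-plane illustration with hypothetical helper names, not MarkLogic's spherical evaluation.

```python
def ray_crossings(x, y, ring):
    """Count how many edges of `ring` a ray from (x, y) toward +x crosses."""
    crossings = 0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                crossings += 1
    return crossings


def in_complex_polygon(x, y, outer, holes):
    """True if (x, y) is inside `outer` and outside every ring in `holes`."""
    total = ray_crossings(x, y, outer) + sum(ray_crossings(x, y, h) for h in holes)
    return total % 2 == 1


# The rings from the cts:parse-wkt example above, read as planar (x, y) pairs.
outer = [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]
hole = [(0, 5), (0, 7), (5, 7), (5, 5), (0, 5)]
print(in_complex_polygon(8, 8, outer, [hole]))  # True
print(in_complex_polygon(2, 6, outer, [hole]))  # False: the point is in the hole
```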

Linestrings

A linestring is a sequence of connected arcs that, unlike a polygon, does not necessarily form a closed loop (although a linestring is permitted to form one). The 'lines' are actually arcs because they are projected onto the Earth's surface. Linestrings support equality and inequality comparisons: two linestrings are equal if all of their vertices are equal (or if they are both empty). You can cast a cts:linestring to a cts:polygon, which results in a 'flat' polygon that traces the same arcs back to the start to close the polygon.
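The cast from cts:linestring to cts:polygon can be pictured with the plain-Python sketch below, which builds the 'flat' ring by appending the vertices in reverse. Whether MarkLogic orders the resulting vertices exactly this way is an assumption; the sketch only illustrates the shape the text describes, and the function name is hypothetical.

```python
def linestring_to_flat_polygon(vertices):
    """Close a linestring by walking back along its own vertices.

    [a, b, c] becomes [a, b, c, b, a]: a degenerate ("flat") ring that
    encloses no area but traces the same arcs in both directions.
    """
    if not vertices:
        return []
    return list(vertices) + list(vertices[-2::-1])


line = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(linestring_to_flat_polygon(line))
# [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
```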

Understanding Geospatial Circles

Geospatial circles allow you to define a region by a center point and a radius specified in miles. Anything inside the circle is within the boundaries of the cts:circle; whether points on the boundary itself are in or out depends on query options (there are various boundary options on the geospatial cts:query constructors to control this behavior).
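As a rough model of circle membership, the sketch below measures great-circle (haversine) distance on a sphere and compares it with the radius in miles. The spherical model and the Earth radius constant are assumptions of this sketch, not values taken from MarkLogic, and include_boundaries is a hypothetical stand-in for the boundary options. The New York latitude is approximate and added for illustration.

```python
import math

# Mean Earth radius in miles: an assumption of this sketch; MarkLogic's own
# Earth model and constants may differ.
EARTH_RADIUS_MILES = 3958.8


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance between two (lat, lon) points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def in_circle(lat, lon, center_lat, center_lon, radius_miles, include_boundaries=True):
    """True if the point lies within `radius_miles` of the center."""
    d = great_circle_miles(lat, lon, center_lat, center_lon)
    return d <= radius_miles if include_boundaries else d < radius_miles


san_francisco = (37.655983, -122.425525)
new_york = (40.71, -73.99)  # approximate, for illustration
print(in_circle(*new_york, *san_francisco, radius_miles=100))  # False
print(round(great_circle_miles(0, 0, 0, 1), 1))  # 69.1: one degree of longitude at the equator
```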

Geospatial Indexes

Because you store geospatial data in XML markup within a document, you can constrain queries on that geospatial markup. You can create geospatial indexes to speed up geospatial queries and to enable geospatial lexicon queries, allowing you to take full advantage of having the geospatial data marked up in your content. This section describes the different kinds of geospatial indexes.

Different Kinds of Geospatial Indexes

Use the Admin Interface to create any of these indexes, under Database > database_name > Geospatial Indexes. The following sections describe how the geospatial data is structured for each type of geospatial index, and also describe the geospatial positions option, which is available for each index.

Geospatial Element Indexes

With a geospatial element index, the geospatial data is the content of an element, with the coordinates separated by whitespace or punctuation (except +, -, or .):

<element-name>37.52  -122.25</element-name>

For point format, the first entry represents the latitude coordinate, and the second entry represents the longitude coordinate. For long-lat-point format, the first entry represents the longitude coordinate and the second entry represents the latitude coordinate. You can also have other entries, but they are ignored (for example, KML has an additional altitude coordinate, which can be present but is ignored).
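A plain-Python sketch of how the two formats read the same kind of element content follows. The function and its point_format argument are hypothetical; the format names mirror the point and long-lat-point formats described above, and entries beyond the first two are ignored as the text describes.

```python
import re

NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_element_point(content, point_format="point"):
    """Return (lat, lon) from element text such as "37.52  -122.25".

    point_format="point" reads latitude first; "long-lat-point" reads
    longitude first. Entries after the first two (for example, a KML
    altitude) are ignored.
    """
    entries = [float(tok) for tok in NUMBER.findall(content)]
    if len(entries) < 2:
        raise ValueError("need at least two coordinates: %r" % content)
    first, second = entries[0], entries[1]
    if point_format == "point":
        return (first, second)
    if point_format == "long-lat-point":
        return (second, first)
    raise ValueError("unknown format: %r" % point_format)


print(parse_element_point("37.52  -122.25"))                     # (37.52, -122.25)
print(parse_element_point("-122.25,37.52,0", "long-lat-point"))  # (37.52, -122.25)
```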

Geospatial Element Child Indexes

With a geospatial element child index, the geospatial data comes from whitespace or punctuation (except +, -, or .) separated element content, but only for elements that are a specific child of a specific element.

<element-name1>
  <element-name2>37.52  -122.25</element-name2>
</element-name1>

For point format, the first entry represents the latitude coordinate, and the second entry represents the longitude coordinate. For long-lat-point format, the first entry represents the longitude coordinate and the second entry represents the latitude coordinate.

Geospatial Element Pair Indexes

With a geospatial element pair index, the geospatial data comes from a specific pair of elements that are a child of another specific element.

<element-name>
  <latitude>37.52</latitude>
  <longitude>-122.25</longitude>
</element-name>

Geospatial Attribute Pair Indexes

With a geospatial attribute pair index, the geospatial data comes from a pair of specific attributes of a specific element.

<element-name latitude="37.52" longitude="-122.25"/>
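To make the difference between the two pair layouts concrete, the sketch below reads the same point from each form with Python's standard xml.etree.ElementTree module. The helper names and the default element and attribute names are illustrative only; in MarkLogic you name the parent, latitude, and longitude nodes when you create the index.

```python
import xml.etree.ElementTree as ET


def point_from_element_pair(element, lat_name="latitude", lon_name="longitude"):
    """Read (lat, lon) from two child elements, as in the element pair layout."""
    return (float(element.findtext(lat_name)), float(element.findtext(lon_name)))


def point_from_attribute_pair(element, lat_name="latitude", lon_name="longitude"):
    """Read (lat, lon) from two attributes, as in the attribute pair layout."""
    return (float(element.get(lat_name)), float(element.get(lon_name)))


pair_doc = ET.fromstring(
    "<element-name><latitude>37.52</latitude><longitude>-122.25</longitude></element-name>"
)
attr_doc = ET.fromstring('<element-name latitude="37.52" longitude="-122.25"/>')
print(point_from_element_pair(pair_doc))    # (37.52, -122.25)
print(point_from_attribute_pair(attr_doc))  # (37.52, -122.25)
```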
Geospatial Path Range Indexes

With a geospatial path range index, the geospatial data is expressed in the same manner as for a geospatial element index, but the element or attribute that holds the data is identified by a path expression.

For example, the data:

<a:data>
  <a:geo>37.52  -122.25</a:geo>
</a:data>

is indexed using the following path expression:

/a:data/a:geo

You might also express the geospatial data as an attribute. For example:

<a:data>
  <a:geo data="37.52  -122.25"/>
</a:data>

is indexed using the following path expression:

/a:data/a:geo/@data

Once you have created a geospatial path range index using the Admin Interface, you cannot change the path expression. Instead, you must remove the existing geospatial path range index and create a new one with the updated path expression.

Geospatial Index Positions

Each geospatial index has a positions option. The positions option speeds up queries that constrain how far apart, within a document, geospatial matches are from other matches (using cts:near-query, for example). Additionally, when element positions are enabled in the database, enabling positions on a geospatial index improves index resolution (more accurate estimates) for element queries that contain geospatial queries on that data.

Geospatial Lexicons

Geospatial indexes enable geospatial lexicon lookups. The lexicon lookups enable very fast retrieval of geospatial values. For details on geospatial lexicons, see Geospatial Lexicons.

Using the API

This section provides an overview of the Geospatial API.

Basic Procedure for Performing a Geospatial Query

Using the geospatial API is just like using any cts:query constructors, where you use the cts:query as the second parameter (or a building block of the second parameter) of cts:search. The basic procedure involves the following steps:

  1. Load geospatial data into a database.
  2. Create geospatial indexes (optional, speeds performance).
  3. Construct primitive types to use in geospatial cts:query constructors.
  4. Construct the geospatial queries using the geospatial primitive types.
  5. Use the geospatial queries in a cts:search operation.
You can also use the geospatial queries in cts:contains operations, regardless of whether the geospatial data is in the database.
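The data flow of the five steps can be pictured with the toy, in-memory Python model below. Nothing in it is a MarkLogic API: the URIs, the second document, and the predicate-style "queries" are hypothetical stand-ins that only show how loaded data, an index, a region, composed queries, and a search fit together.

```python
# 1. "Load" documents: each has some text and a (lat, lon) point (hypothetical data).
documents = {
    "/schools/1.xml": {"text": "Ralph Waldo Emerson Junior High School",
                       "point": (38.5515731, -121.7774624)},
    "/schools/2.xml": {"text": "Example Elementary School", "point": (40.71, -73.99)},
}

# 2. "Index": map each URI to its point so queries need not re-read content.
point_index = {uri: doc["point"] for uri, doc in documents.items()}

# 3. Construct a region: a box as (south, west, north, east), no date-line wrap.
box = (38.0, -122.5, 39.0, -121.0)


# 4. Construct queries as predicates over a URI, composed with an "and".
def box_query(uri):
    lat, lon = point_index[uri]
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


def word_query(word):
    return lambda uri: word.lower() in documents[uri]["text"].lower().split()


def and_query(*queries):
    return lambda uri: all(q(uri) for q in queries)


# 5. "Search": return the URIs whose documents match every sub-query.
query = and_query(word_query("school"), box_query)
print([uri for uri in documents if query(uri)])  # ['/schools/1.xml']
```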

Geospatial Value Constructors for Regions

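The following functions construct geospatial regions. Use them with the geospatial cts:query constructors (for example, cts:element-geospatial-query) to construct cts:query items:

  • cts:box
  • cts:circle
  • cts:complex-polygon
  • cts:linestring
  • cts:point
  • cts:polygon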

For details on these functions, see the MarkLogic XQuery and XSLT Function Reference. These functions are complementary to the type constructors with the same names, which are described in XQuery Primitive Types And Constructors for Geospatial Queries.

Geospatial Format Conversion Functions

There are XQuery library modules to translate Metacarta, GML, KML, and GeoRSS formats into cts:box, cts:circle, cts:point, and cts:polygon types. The functions in these libraries are designed to take geospatial data in these formats and construct cts:region primitive types to pass into the geospatial cts:query constructors, so that you can construct the appropriate queries. For the signatures of these functions, see the XQuery Library Module section of the MarkLogic XQuery and XSLT Function Reference.

Geospatial Operations

MarkLogic Server also provides APIs that perform various operations and calculations on geospatial data, such as calculating distance.

For their signatures and for more details on these functions, see the XQuery Library Module section of the MarkLogic XQuery and XSLT Function Reference.

Simple Geospatial Search Example

This section provides an example showing a cts:search that uses a geospatial query.

Assume a document with the URI /geo/zip_labels.xml with the following form:

<labels>
  <label id="1">96044</label>
  ...
  <label id="589">95616</label>
  <label id="712">95616</label>
  <label id="715">95616</label>
  ...
</labels>

Assume you have polygon data in a document with the URI /geo/zip.xml with the following form:

<polygon id="712">
       0.383337584506173E+02,       -0.121659014798844E+03
       0.383133840000000E+02,       -0.121656011000000E+03
       0.383135090000000E+02,       -0.121666647000000E+03
       0.383135090000000E+02,       -0.121666647000000E+03
       0.383135120000000E+02,       -0.121666875000000E+03
       0.383349030000000E+02,       -0.121667035000000E+03
       0.383353510000000E+02,       -0.121657355000000E+03
       0.383496550000000E+02,       -0.121656811000000E+03
       0.383495590000000E+02,       -0.121646955000000E+03
       0.383494950000000E+02,       -0.121645323000000E+03
       0.383473190000000E+02,       -0.121645691000000E+03
       0.383370790000000E+02,       -0.121650187000000E+03
       0.383133840000000E+02,       -0.121656011000000E+03
</polygon>

You can then take the contents of the polygon element and cast it to a cts:polygon using the cts:polygon constructor. For example, the following returns a cts:polygon for the above data:

cts:polygon(fn:data(fn:doc("/geo/zip.xml")//polygon[@id eq "712"]))

Further assume you have content of the following form:

<feature id="1703188" class="School">
  <name>Ralph Waldo Emerson Junior High School</name>
  <state id="06">CA</state>
  <county id="113">Yolo</county>
  <lat dms="383306N">38.5515731</lat>
  <long dms="1214639W">-121.7774624</long>
  <elevation>17</elevation>
  <map>Merritt</map>
</feature>

Now consider the following XQuery:

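let $searchterms := ("school", "junior")
let $zip := "95616"
(: the label elements for this zip code; their ids identify its polygons :)
let $ziplabel := fn:doc("/geo/zip_labels.xml")//label[contains(.,$zip)]
(: convert each polygon for the zip code into a cts:polygon :)
let $polygons := 
   for $p in fn:doc("/geo/zip.xml")//polygon[@id=$ziplabel/@id]
   return cts:polygon(fn:data($p))
(: match features that contain every search term and whose lat/long
   pair falls inside any of the zip code's polygons :)
let $query := 
   cts:and-query((
       for $term in $searchterms return cts:word-query($term),
       cts:element-pair-geospatial-query(xs:QName("feature"), 
             xs:QName("lat"), xs:QName("long"), $polygons) ))
(: render an HTML heading and the matching features, ordered by name :)
return  (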
<h2>{fn:concat("Places with the term '", 
               fn:string-join($searchterms, "' and the term '"), 
               "' in the zipcode ", fn:data($zip), ":")}</h2>,
  <ol>{for $feature in cts:search(//feature, $query)
  order by $feature/name
  return (
  <li><h3>{fn:data($feature/name)," "}   
  <code>({fn:data($feature/lat)},{fn:data($feature/long)})</code></h3>
  <p>{fn:data($feature/@class)} in {fn:data($feature/county)},   
  {fn:data($feature/state)} from map {fn:data($feature/map)}</p></li> )
  }</ol> )

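This returns an HTML heading followed by an ordered list of the matching features, sorted by name; each list item shows the feature's name and coordinates, followed by its class, county, state, and map.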
