C++ representation of the MarkLogic token.

#include <MarkLogic.h>

Public Types
typedef unsigned char	TokenType
	The kind of token being returned.

typedef unsigned char	PartOfSpeech
	The part of speech of the token being returned. Part of speech is only relevant for word tokens.

Public Member Functions
	Token (unsigned _begin, unsigned _end, TokenType _type, PartOfSpeech _pos)

	Token (unsigned _begin, unsigned _end, TokenType _type)

	Token (const Token &t)

Public Attributes
unsigned	begin

unsigned	end

TokenType	type

PartOfSpeech	pos

Static Public Attributes
static const TokenType	SPACE = 's'

static const TokenType	PUNCT = 'p'

static const TokenType	WORD = 'w'

static const TokenType	SPECIAL = 'x'

static const PartOfSpeech	UNSPECIFIED_POS = '\0'

static const PartOfSpeech	NOUN_POS = 'n'

static const PartOfSpeech	VERB_POS = 'v'

static const PartOfSpeech	ADJECTIVE_POS = 'a'

static const PartOfSpeech	ADVERB_POS = 'r'

static const PartOfSpeech	PRONOUN_POS = 'p'

static const PartOfSpeech	CONJUNCTION_POS = 'c'

static const PartOfSpeech	DETERMINER_POS = 'd'

static const PartOfSpeech	MISC_POS = '?'

Detailed Description

C++ representation of the MarkLogic token.

The offsets are positions in the codepoint array: begin is the position of the first codepoint in the token, and end is the position of the codepoint following the last codepoint in the token.
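The offset convention described above is a half-open range, so a token's length is end minus begin. The following is a minimal sketch of that arithmetic, not part of MarkLogic.h: TokenSpan is a hypothetical stand-in that carries only the two offsets, std::u32string is an assumed way of holding the codepoint array, and tokenLength and tokenText are illustrative helper names.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for marklogic::Token, reduced to its two offsets.
struct TokenSpan {
    unsigned begin;  // position of the first codepoint in the token
    unsigned end;    // position of the codepoint following the last one
};

// Number of codepoints covered by the token (half-open range [begin, end)).
inline unsigned tokenLength(const TokenSpan& t) {
    assert(t.begin <= t.end);
    return t.end - t.begin;
}

// Copy the token's codepoints out of the input.
// Assumption: the codepoint array is held in a std::u32string.
inline std::u32string tokenText(const std::u32string& codepoints,
                                const TokenSpan& t) {
    assert(t.end <= codepoints.size());
    return codepoints.substr(t.begin, tokenLength(t));
}
```

With this convention, adjacent tokens share a boundary: the end of one token equals the begin of the next, and no codepoint is counted twice.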

Member Typedef Documentation

◆ PartOfSpeech

typedef unsigned char marklogic::Token::PartOfSpeech

The part of speech of the token being returned. Part of speech is only relevant for word tokens.

LexerUDF implementations may return tokens with the following parts of speech. Returning UNSPECIFIED_POS for every token reduces storage overhead and processing time. In addition to the values listed here, implementations are free to return other unsigned char values in the range 0x01 to 0x7F. The high bit (0x80) is reserved.

UNSPECIFIED_POS: part of speech is not reported
NOUN_POS: a noun, such as "laptop"
VERB_POS: a verb, such as "ate"
ADJECTIVE_POS: an adjective, such as "green"
ADVERB_POS: an adverb, such as "rapidly"
PRONOUN_POS: a pronoun, such as "her"
CONJUNCTION_POS: a conjunction, such as "and"
DETERMINER_POS: a determiner, such as "the"
MISC_POS: some other miscellaneous part of speech
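The range rule for part-of-speech values can be checked with a single bit test. The sketch below is illustrative only: the typedef and the three constants mirror the documented values so the block compiles without MarkLogic.h, while PREPOSITION_TAG and isReturnablePartOfSpeech are hypothetical names that do not exist in the API.

```cpp
#include <cassert>

// Stand-ins mirroring the documented constants; a real LexerUDF would use
// marklogic::Token::PartOfSpeech and its static members instead.
typedef unsigned char PartOfSpeech;
static const PartOfSpeech UNSPECIFIED_POS = '\0';
static const PartOfSpeech NOUN_POS = 'n';
static const PartOfSpeech MISC_POS = '?';

// Hypothetical implementation-defined tag, chosen inside 0x01..0x7F.
static const PartOfSpeech PREPOSITION_TAG = 'i';

// True when the value is UNSPECIFIED_POS (0x00) or lies in 0x01..0x7F,
// that is, when the reserved high bit is clear.
inline bool isReturnablePartOfSpeech(PartOfSpeech pos) {
    return (pos & 0x80) == 0;
}
```

A custom tag such as PREPOSITION_TAG passes the check because 'i' is 0x69. Any value from 0x80 upward fails it.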

◆ TokenType

typedef unsigned char marklogic::Token::TokenType

The kind of token being returned.

LexerUDF implementations will return tokens of the following types.

SPACE: a whitespace token, will not be indexed
PUNCT: a punctuation token, will not be indexed
WORD: a separate word in the current language, will be indexed
SPECIAL: some other kind of token, will be indexed
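The indexing behaviour listed for each token type can be summarized as a predicate. This is a sketch under stated assumptions rather than MarkLogic code: the constants mirror the documented values, TokenSketch is a hypothetical stand-in for marklogic::Token, and isIndexed and countIndexed are illustrative helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins mirroring the documented token type constants.
typedef unsigned char TokenType;
static const TokenType SPACE = 's';
static const TokenType PUNCT = 'p';
static const TokenType WORD = 'w';
static const TokenType SPECIAL = 'x';

// Hypothetical stand-in for marklogic::Token without the part of speech.
struct TokenSketch {
    unsigned begin;
    unsigned end;
    TokenType type;
};

// WORD and SPECIAL tokens are indexed; SPACE and PUNCT tokens are not.
inline bool isIndexed(TokenType type) {
    return type == WORD || type == SPECIAL;
}

// Count how many tokens in a sequence would be indexed.
inline std::size_t countIndexed(const std::vector<TokenSketch>& tokens) {
    std::size_t n = 0;
    for (const TokenSketch& t : tokens) {
        assert(t.begin <= t.end);
        if (isIndexed(t.type)) ++n;
    }
    return n;
}
```

For the input "Hi, you", a lexer might plausibly emit a WORD, a PUNCT, a SPACE, and a WORD token, of which only the two WORD tokens would be indexed.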

The documentation for this class was generated from the following file:

MarkLogic.h
