
Getting Started with Optic

To Find a Phrase within Documents in a Collection

Using fromSearchDocs(), we could also have found our full-time employees by searching our documents directly.

An Optic query like this one finds a case-, punctuation-, and whitespace-insensitive phrase anywhere within a document of a specified collection:

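// Load the Optic API (needed when running as server-side JavaScript).
const op = require('/MarkLogic/optic');

op.fromSearchDocs(
  // Match documents that are in the employee collection AND contain the phrase.
  cts.andQuery([
    cts.collectionQuery('https://example.com/content/employee'),
    cts.wordQuery('full time')
  ]))
  .offsetLimit(0, 100)  // Skip 0 rows, return at most 100.
  .result();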

We used this query to retrieve a row sequence with one 3-column row for each employee-collection document containing some form of the phrase "full time". Each row holds the document itself, its URI, and its score. The results are limited to 100 rows:

  • The Data Accessor Function fromSearchDocs() builds a row sequence from the documents that match its query parameter (here, the cts.collectionQuery() narrowed down by the cts.wordQuery()). It returns one row of these 3 columns for each matching document:

    • uri: Contains the document URI.

    • doc: Contains the document itself.

    • score: Contains the document’s search score, a measure of how relevant this result is with respect to other results. The higher the score, the higher the relevance.

  • The CTS Function cts.andQuery() returns the intersection of documents matching each of its CTS-type parameters:

    • Its first parameter, cts.collectionQuery(), finds all the data from documents in the specified collection: our employee collection.

    • Its second parameter, cts.wordQuery(), finds the word or phrase provided: full time.

    • So, cts.andQuery() returns all the data from documents from our employee collection that contain the phrase full time.

  • The Operator Function offsetLimit() restricts results returned. The first parameter specifies the number of results to skip; the second, the number of results to return. So, (0, 100) returns the first 100 results.

  • The Executor Function result() executes the query and returns the results as a row sequence.
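
The semantics described in the list above can be modeled in plain JavaScript. The sketch below is only an illustration, not the Optic or CTS API: the helper names (normalize, containsPhrase, andQuery, offsetLimit) and the in-memory docs array are hypothetical, and the normalization rule is a rough approximation of case-, punctuation-, and whitespace-insensitive matching, not MarkLogic's actual tokenizer.

```javascript
// Toy in-memory model. Every name here is a hypothetical stand-in,
// not a MarkLogic function.

// Lowercase the text and turn each run of punctuation or whitespace into one space.
// (ASCII-only approximation, chosen to keep the sketch short.)
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

// True when the normalized phrase occurs on word boundaries in the normalized text.
function containsPhrase(text, phrase) {
  return (' ' + normalize(text) + ' ').includes(' ' + normalize(phrase) + ' ');
}

// Intersection: keep only the documents that satisfy every predicate.
function andQuery(docs, predicates) {
  return docs.filter(d => predicates.every(p => p(d)));
}

// Skip `offset` rows, then return at most `limit` rows.
function offsetLimit(rows, offset, limit) {
  return rows.slice(offset, offset + limit);
}

const employee = 'https://example.com/content/employee';
const docs = [
  { uri: '/a.json', collections: [employee], text: 'Active - Regular Exempt (Full-time)' },
  { uri: '/b.json', collections: [employee], text: 'Active - Part time' },
  { uri: '/c.json', collections: ['https://example.com/content/other'], text: 'FULL   TIME contractor' },
];

const matches = andQuery(docs, [
  d => d.collections.includes(employee),      // models cts.collectionQuery()
  d => containsPhrase(d.text, 'full time'),   // models cts.wordQuery()
]);

const firstPage = offsetLimit(matches, 0, 100);
console.log(firstPage.map(d => d.uri)); // [ '/a.json' ]
```

Only /a.json survives: /b.json is in the collection but lacks the phrase, and /c.json contains the phrase but is in a different collection. For paging in this model, page n of size s would be offsetLimit(rows, (n - 1) * s, s).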

Here is row 1 of the 100-row x 3-column result:

{
 "uri": "/data/employees/5899d871-1261-4057-ab3e-7fea1577ba61.json", 
 "doc": {
  "GUID": "5899d871-1261-4057-ab3e-7fea1577ba61", 
  "Gender": "male", 
  "Title": "Mr.", 
  "GivenName": "Scott", 
  "MiddleInitial": "M", 
  "Surname": "Schaaf", 
  "StreetAddress": "3586 Paradise Lane", 
  "City": "Pomona", 
  "State": "CA", 
  "ZipCode": "91766", 
  "Country": "US", 
  "EmailAddress": "ScottMSchaaf@rhyta.com", 
  "TelephoneNumber": "909-629-3047", 
  "TelephoneCountryCode": "1", 
  "Birthday": "9/25/45", 
  "NationalID": "561-42-6126", 
  "BaseSalary": "79460", 
  "Bonus": "7946", 
  "Department": "Engineering", 
  "Status": "Active - Regular Exempt (Full-time)",               // Found!             
  "ManagerGUID": "3ad0ffbc-3ade-4897-902b-718417a721f5", 
  "point": {
   "lat": 34.014225, 
   "long": -117.843894
  }, 
  "HiredDate": "2021-11-19"
 }, 
 "score": 2048
}
  • This query returned the first 100 results as we specified in offsetLimit().

  • Only one row is returned per document, no matter how many times the phrase occurs within that document.

  • A common practice is to add orderBy(op.desc('score')) to order the results by score, from most to least relevant.
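
When sorting is combined with a limit, the order of the two steps matters, assuming operators are applied in the order they are chained. Sorting first and then limiting keeps the highest-scoring rows. Limiting first sorts only the rows that happened to land in the first page. Here is a plain-JavaScript sketch of that difference, using hypothetical helper names and made-up scores in place of the Optic API:

```javascript
// Hypothetical helpers and made-up scores; these are not Optic functions.

// Return a copy of the rows sorted from highest to lowest score.
function orderByScoreDesc(rows) {
  return [...rows].sort((a, b) => b.score - a.score);
}

// Skip `offset` rows, then return at most `limit` rows.
function page(rows, offset, limit) {
  return rows.slice(offset, offset + limit);
}

const scoredRows = [
  { uri: '/a.json', score: 1024 },
  { uri: '/b.json', score: 4096 },
  { uri: '/c.json', score: 2048 },
];

// Sort, then limit: the two most relevant rows overall.
const sortThenLimit = page(orderByScoreDesc(scoredRows), 0, 2);

// Limit, then sort: the first two rows in arrival order, sorted afterwards.
const limitThenSort = orderByScoreDesc(page(scoredRows, 0, 2));

console.log(sortThenLimit.map(r => r.uri)); // [ '/b.json', '/c.json' ]
console.log(limitThenSort.map(r => r.uri)); // [ '/b.json', '/a.json' ]
```

In terms of the query above, this suggests chaining orderBy(op.desc('score')) before offsetLimit(0, 100) when the goal is the 100 most relevant documents.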