Content Recognizers

Overview

Content Recognizers recognize patterns in answer queries. Their results tell Triggers whether a particular kind of content is present in an answer query, and give Answer Generators the information they need to output answer results.

What they do

Content Recognizers take as input a user's answer query and the Query Context and output recognition results in the form of key-value pairs. For example, suppose that a user selects the words San Diego is sunny on the page http://blah.com.

Input

The following information is available to Content Recognizers:

Name | Example Value | Description
--- | --- | ---
q | San Diego is sunny | The user's answer query, with activation codes and options (like --debug) removed, truncated to 80 characters, and trimmed of leading and trailing whitespace
context | { "user" : { ... }, "pageUri" : "http://blah.com", "location" : { ... }, ... } | See Query Context

Output

Content Recognizers output a dictionary of < recognition_key, list_of_recognition_results > pairs. Recognition keys should identify the type of content being recognized, and are usually written in namespaced format. In our example, we would use the key com.solveforall.recognition.location.UsAddress.

Each recognition result may have the following fields:

Name | Example Value | Description
--- | --- | ---
matchedText | San Diego | The portion of the answer query that the Content Recognizer recognized.
localRecognitionLevel | 1.0 | A number between 0.0 and 1.0 indicating how strongly matchedText matches, with 1.0 meaning the strongest match.
recognitionLevel | 1.0 | A number between 0.0 and 1.0 indicating how strongly the entire answer query matched, with 1.0 meaning the strongest match. The recognition level is used by Simple Triggers to decide whether or not to activate Answer Generators.

In addition, each recognition result may contain other fields. In our example we would have:

Name | Example Value | Description
--- | --- | ---
city | San Diego | The Content Recognizer presumably has the knowledge that San Diego is the name of a city.
state | CA | The Content Recognizer presumably has the knowledge that San Diego is a city in California.
country | US | The Content Recognizer presumably has the knowledge that San Diego is a city in the United States.

For our example above, the Content Recognizer would output the following structure (in JSON format):

{
  "com.solveforall.recognition.location.UsAddress" : [
    {
      "matchedText" : "San Diego",
      "recognitionLevel" : 1.0,
      "city" : "San Diego",
      "state" : "CA",
      "country" : "US"
    }
  ]
}
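
As a rough illustration of how code that consumes this structure might read it, the sketch below looks up a recognition key and picks the result with the highest recognitionLevel. The helper name strongestResult is hypothetical and is not part of Solve for All; it only shows how the dictionary and its lists of results are laid out.

```javascript
// The example output above, written as a JavaScript object.
const output = {
  "com.solveforall.recognition.location.UsAddress": [
    {
      "matchedText": "San Diego",
      "recognitionLevel": 1.0,
      "city": "San Diego",
      "state": "CA",
      "country": "US"
    }
  ]
};

// Hypothetical helper (not a Solve for All API): returns the result with the
// highest recognitionLevel for the given key, or null if there is none.
function strongestResult(recognitionOutput, key) {
  const results = (recognitionOutput && recognitionOutput[key]) || [];
  return results.reduce(function (best, result) {
    if (best === null || result.recognitionLevel > best.recognitionLevel) {
      return result;
    }
    return best;
  }, null);
}

const best = strongestResult(output, "com.solveforall.recognition.location.UsAddress");
console.log(best.city + ", " + best.state); // prints "San Diego, CA"
```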

Kinds of Content Recognizers

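Developers can create Content Recognizers implemented in one of two ways:
  • Regex Content Recognizers, which match answer queries with regular expressions
  • Javascript Content Recognizers, which run a Javascript function on the answer query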

Either kind of Content Recognizer can be created on the Content Recognizer creation form.

Regex Content Recognizers

Regex Content Recognizers use regular expressions to match answer queries. The basics of regular expressions are easy to master, so you may be able to create Regex Content Recognizers even if you don't know how to program! They output a dictionary with a single recognition key mapped to a list of recognition results, each of which contains the matched groups for one match found in the answer query.

In addition to the common plugin properties, Regex Content Recognizers have the following properties:

Name | Example Value | Description
--- | --- | ---
Recognition Key | com.solveforall.recognition.location.UsAddress | The single recognition key in the output dictionary
Local Recognition Level | 0.75 | The local recognition level of the recognition results. The (global) recognition level is computed by multiplying the local recognition level by the factor length(matched text) / length(entire query).
Regex | (\d{5})(?:-(\d{4}))? | The regular expression used to match the answer query. Must conform to the supported syntax of RE2.
Group Names | zipCode,fiveDigitZip,plusFour | A comma-separated list of keys to substitute for group numbers in output keys of recognition results. The first key corresponds to group 0, which is the entire match. The second key corresponds to group 1, which is the contents inside the first matching parentheses. The third key corresponds to group 2, and so on.

Given the answer query 92126-6221 08043, a Regex Content Recognizer configured as above would output a structure like this (in JSON notation):

{
  "com.solveforall.recognition.location.UsAddress" : [
    {
      "matchedText" : "92126-6221",
      "localRecognitionLevel" : 0.75,
      "recognitionLevel" : 0.46875,
      "zipCode" : "92126-6221",
      "fiveDigitZip" : "92126",
      "plusFour" : "6221"
    },
    {
      "matchedText" : "08043",
      "localRecognitionLevel" : 0.75,
      "recognitionLevel" : 0.234375,
      "zipCode" : "08043",
      "fiveDigitZip" : "08043",
      "plusFour" : null
    }
  ]
}
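
The group-name mapping and the recognition level formula from the properties table can be sketched in plain Javascript. This is only an illustration under stated assumptions, not Solve for All's implementation: the function name regexRecognize and the shape of its config argument are hypothetical, and it uses Javascript's built-in RegExp rather than RE2, so the supported regex syntax differs.

```javascript
// Hypothetical sketch of what a Regex Content Recognizer does.
// config fields mirror the properties table:
//   recognitionKey, localRecognitionLevel, regex (string), groupNames (comma-separated)
function regexRecognize(q, config) {
  const names = config.groupNames.split(",").map(function (s) { return s.trim(); });
  const pattern = new RegExp(config.regex, "g"); // assumption: JS RegExp, not RE2
  const results = [];

  for (const match of q.matchAll(pattern)) {
    if (match[0].length === 0) {
      continue; // ignore empty matches
    }
    const result = {
      matchedText: match[0],
      localRecognitionLevel: config.localRecognitionLevel,
      // Formula from the table: local level * length(matched text) / length(entire query)
      recognitionLevel: config.localRecognitionLevel * match[0].length / q.length
    };
    names.forEach(function (name, groupNumber) {
      // Name i stands for group i; groups that did not participate become null.
      const value = match[groupNumber];
      result[name] = (value === undefined) ? null : value;
    });
    results.push(result);
  }

  if (results.length === 0) {
    return null; // nothing recognized
  }
  const output = {};
  output[config.recognitionKey] = results;
  return output;
}

const zipOutput = regexRecognize("92126-6221 08043", {
  recognitionKey: "com.solveforall.recognition.location.UsAddress",
  localRecognitionLevel: 0.75,
  regex: "(\\d{5})(?:-(\\d{4}))?",
  groupNames: "zipCode,fiveDigitZip,plusFour"
});
console.log(JSON.stringify(zipOutput, null, 2));
```

The query is 16 characters long, so the 10-character match 92126-6221 gets a recognition level of 0.75 × 10 / 16 = 0.46875, and the 5-character match 08043 gets 0.75 × 5 / 16 = 0.234375.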

Javascript Content Recognizers

If regular expressions are too limiting for you and you know how to program in Javascript, you can create Content Recognizers backed by Javascript. See Javascript execution for information about the execution environment.

To create a Javascript-backed Content Recognizer, first decide which recognition key to use. Then create a file, say recognizer.js, that implements the following function:

function recognize(q, context) {
  ...
}
in the top-level scope, where
  • q is the answer query as a String
  • context is the Query Context, as a dictionary Object
  • The return value should be an Object (containing no functions, convertible to JSON) that has the same structure as the output in the examples above.
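
Before publishing, it can help to check locally that a recognizer's return value has this shape. The sketch below is one way to do that. The function checkRecognizerOutput is hypothetical and the checks are assumptions drawn from the descriptions on this page; Solve for All may validate output differently.

```javascript
// Hypothetical local check (not a Solve for All API). Returns a list of
// problems found; an empty list means the output looks well-formed.
function checkRecognizerOutput(output) {
  const problems = [];
  if (output === null) {
    return problems; // null means nothing was recognized
  }
  if (typeof output !== "object" || Array.isArray(output)) {
    return ["output must be an object or null"];
  }
  Object.keys(output).forEach(function (key) {
    const results = output[key];
    if (!Array.isArray(results)) {
      problems.push(key + ": value must be a list of recognition results");
      return;
    }
    results.forEach(function (result, i) {
      const where = key + "[" + i + "]";
      if (result === null || typeof result !== "object") {
        problems.push(where + ": must be an object");
        return;
      }
      Object.keys(result).forEach(function (field) {
        if (typeof result[field] === "function") {
          problems.push(where + "." + field + ": functions are not allowed");
        }
      });
      ["localRecognitionLevel", "recognitionLevel"].forEach(function (field) {
        const level = result[field];
        const valid = typeof level === "number" && level >= 0 && level <= 1;
        if (level !== undefined && !valid) {
          problems.push(where + "." + field + ": must be a number between 0.0 and 1.0");
        }
      });
    });
  });
  return problems;
}

const sample = {
  "com.solveforall.recognition.Date": [
    { "matchedText": "1999", "localRecognitionLevel": 1.0, "year": "1999" }
  ]
};
console.log(checkRecognizerOutput(sample)); // prints []
```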

For example, here's an implementation that looks for years that are not in the future:

function recognize(q, context) {
  // No need to handle exceptions: if the script fails, Solve for All will consider
  // that nothing was recognized.
  const year = parseInt(q, 10);
  // NaN fails both comparisons, so non-numeric queries fall through to return null.
  if ((year >= 0) && (year <= new Date().getFullYear())) {
    return {
      "com.solveforall.recognition.Date" : [
        {
          "matchedText" : '' + year, // in case parseInt() ignored trailing text
          "localRecognitionLevel" : 1.0,
          "year" : '' + year
        }
      ]
    };
  }
  // null means nothing was recognized
  return null;
}

If you omit recognitionLevel from a recognition result, it is computed from localRecognitionLevel when that is available. Likewise, if you omit localRecognitionLevel, it is computed from recognitionLevel when that is available. The computation uses the heuristic:

  recognitionLevel = localRecognitionLevel * matchedText.length / entireQuery.length
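
As a worked example of this heuristic, the sketch below fills in whichever level is missing. The helper fillRecognitionLevels is hypothetical. The direction from localRecognitionLevel to recognitionLevel follows the formula above directly; the reverse direction (solving the same formula for localRecognitionLevel and capping the result at 1.0) is an assumption about how the inverse might be computed.

```javascript
// Hypothetical helper mirroring the heuristic above; not Solve for All's code.
function fillRecognitionLevels(result, entireQuery) {
  const ratio = result.matchedText.length / entireQuery.length;
  const filled = Object.assign({}, result); // leave the input untouched

  const hasLocal = filled.localRecognitionLevel !== undefined;
  const hasGlobal = filled.recognitionLevel !== undefined;

  if (!hasGlobal && hasLocal) {
    filled.recognitionLevel = filled.localRecognitionLevel * ratio;
  } else if (!hasLocal && hasGlobal) {
    // Assumption: invert the heuristic and cap at 1.0.
    filled.localRecognitionLevel = Math.min(1.0, filled.recognitionLevel / ratio);
  }
  return filled;
}

// "1999" is 4 of the 8 characters in the query, so the ratio is 0.5.
const filledResult = fillRecognitionLevels(
  { matchedText: "1999", localRecognitionLevel: 1.0 },
  "in 1999!"
);
console.log(filledResult.recognitionLevel); // prints 0.5
```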

Once you have written this file, follow these steps to publish your Javascript Content Recognizer:

Your new Content Recognizer should become available immediately. Enjoy!

Next Steps

The results of the individual Content Recognizers are reduced into Combined Content Recognition Results, which are then available to Triggers and Answer Generators.