Aang is an extensive, scalable, sophisticated natural language understanding (NLU) system built from scratch. It is designed to let developers easily create custom, full-featured, fast, robust, and precise natural language interfaces (e.g., virtual assistants, chatbots, and natural language search engines) and integrate them with their products.

The design and architecture are formidable, the code quality is supreme, and the documentation is fit for publication.

System Overview

Natural Language API

  1. First, a developer parameterizes the types of objects, entities, actions, attributes, relationships, etc., that they want their interface to understand, as well as the names of semantic functions they can recognize in the parser's output.
  2. The system uses a natural language API that allows developers to easily design custom natural language interfaces (NLIs) with these simple parameterizations.
  3. Internally, the API uses a linguistic framework, from which the NLIs are constructed, that models fundamental linguistic components and structures at different levels of abstraction. These components are modeled as integrable building blocks, so the framework can easily support new grammatical structures and components, new phrasings, new forms of grammatical conjugation, and new semantic structures.
  4. The API also integrates a semantic framework that uses lambda calculus to represent meaning within the grammar and parse graphs.
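To make the lambda-calculus semantics concrete, here is a minimal Python sketch, assuming a semantic is simply a named function applied to the semantics of its arguments. The class `Semantic` and the semantic function names (`me`, `repositories-created`, `repositories-liked`, `intersect`) are hypothetical illustrations, not Aang's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Semantic:
    """Hypothetical semantic node: a named function plus the arguments
    bound to it so far. A node with no arguments acts as a constant."""

    name: str
    args: tuple = ()

    def __call__(self, *args: "Semantic") -> "Semantic":
        # Function application (the lambda-calculus step): binding arguments
        # returns a new node and leaves the original function unchanged.
        return Semantic(self.name, self.args + args)

    def __str__(self) -> str:
        if not self.args:
            return self.name
        return f"{self.name}({','.join(str(a) for a in self.args)})"


# Semantic function names a developer might declare (hypothetical).
me = Semantic("me")
repositories_created = Semantic("repositories-created")
repositories_liked = Semantic("repositories-liked")
intersect = Semantic("intersect")

# "repos I created": the rule for "[user] created" applies
# repositories-created to the semantic of its subject.
created_by_me = repositories_created(me)

# "repos I created and liked": a conjunction rule combines both sub-semantics.
created_and_liked = intersect(created_by_me, repositories_liked(me))
print(created_and_liked)  # intersect(repositories-created(me),repositories-liked(me))
```

Because application returns a new node, a grammar rule can hold a function awaiting its argument, and the parser can complete it once the child's semantic is known; the finished semantic tree is just nested applications.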

Grammar Generator

  1. A context-free grammar (CFG) generator integrates the linguistic framework with the natural language API.
  2. The generator outputs a CFG that automatically supports various phrasings according to the parameterization, including grammatical conjugation, associated semantic functions, lexical and morphological analysis, and ill-formed input (i.e., insertions, deletions, substitutions, and transpositions).
  3. The generator also performs extensive checks for errors, ambiguity, illogical semantics, grammatical errors, and more.
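The ill-formed-input support and the generator's checks could work roughly along these lines. This minimal Python sketch covers only deletions (one of the four edit types listed above) and a simple duplicate-rule ambiguity check; `Rule`, `add_deletion_variants`, `find_ambiguities`, and `DELETION_COST` are hypothetical names, and the penalty value is arbitrary.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical penalty for matching input that omits a word.
DELETION_COST = 1.0


@dataclass(frozen=True)
class Rule:
    lhs: str       # nonterminal being defined
    rhs: tuple     # sequence of terminal/nonterminal symbols
    cost: float = 0.0


def add_deletion_variants(rules, deletable):
    """Return the rules plus, for each deletable symbol in a multi-symbol rule,
    a copy that omits that symbol at an extra cost."""
    out = list(rules)
    for rule in rules:
        if len(rule.rhs) < 2:
            continue  # never produce an empty right-hand side
        for i, sym in enumerate(rule.rhs):
            if sym in deletable:
                rhs = rule.rhs[:i] + rule.rhs[i + 1:]
                out.append(Rule(rule.lhs, rhs, rule.cost + DELETION_COST))
    return out


def find_ambiguities(rules):
    """Return (lhs, rhs) pairs defined by more than one rule, i.e. places where
    the same input would match several rules with the same shape."""
    counts = Counter((r.lhs, r.rhs) for r in rules)
    return sorted(key for key, n in counts.items() if n > 1)


rules = [
    Rule("Repos", ("repos", "Person")),          # "repos Danny"
    Rule("Repos", ("repos", "by", "Person")),    # "repos by Danny"
]
expanded = add_deletion_variants(rules, deletable={"by"})
print(find_ambiguities(expanded))  # [('Repos', ('repos', 'Person'))]
```

The extra cost lets a parser prefer exact matches while still accepting input with a missing word. In this example, deleting "by" reproduces a rule that already exists, so the check flags it; a generator could then drop the redundant, more expensive variant.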

Parser

  1. Using the CFGs designed with the API, the parser outputs the k-best parse trees, semantic trees, and grammatically conjugated display-text for the given textual input.
  2. The parser first uses the CFG to generate a (Markov) state-transition table, which it employs as a precompiled LR(k) parsing table.
  3. Upon receiving input, the parser matches terminal symbols/phrases and performs lexical analysis, morphological analysis, and entity recognition.
  4. From the matched terminal symbols, the shift-reduce parser uses the state table as a Markov decision process (MDP) to generate a parsing stack, which references parsing states in the table. The parser then generalizes the stack into a graph, which it outputs as a dense parse forest: a compact graph representation of all possible parse trees for the given input.
  5. An A* graph-search algorithm efficiently traverses the dense parse forests and calculates cost heuristics.
  6. A parse forest search algorithm efficiently finds the k-best unique and semantically valid parse trees.
  7. Each parse tree has an associated semantic tree (which maps to a lambda calculus semantic representation) and grammatically correct display-text (even if the input is ill-formed).
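The k-best search over a parse forest can be sketched as follows. This is a generic A* extraction in Python, not Aang's implementation: it assumes an acyclic forest given as a dict mapping each nonterminal to (cost, children) alternatives, and it folds the A* search and the unique/valid filtering into one hypothetical function, `k_best_trees`. The heuristic is the exact minimum subtree cost, which is consistent, so complete trees come off the queue in nondecreasing cost order.

```python
import heapq
import itertools
from functools import lru_cache


def k_best_trees(forest, root, k, key=lambda tree: tree, accept=lambda tree: True):
    """A* search for up to k lowest-cost parse trees in an acyclic parse forest.

    forest maps each nonterminal to its alternatives, (cost, children) pairs;
    any symbol not in forest is a terminal leaf. Trees are (symbol, children)
    tuples with strings as leaves. Returns (cost, tree) pairs in nondecreasing
    cost order, keeping only trees that accept() allows and whose key() is new.
    """

    @lru_cache(maxsize=None)
    def h(node):
        # Heuristic: exact minimum cost of any subtree under node (0 for leaves).
        if node not in forest:
            return 0.0
        return min(cost + sum(h(c) for c in children) for cost, children in forest[node])

    def build(node, choices):
        # Rebuild a tree from the alternative indices chosen in pre-order.
        if node not in forest:
            return node
        _, children = forest[node][next(choices)]
        return (node, tuple(build(c, choices) for c in children))

    tiebreak = itertools.count()  # FIFO tie-breaker among entries with equal f
    # Heap entry: (f = g + sum of h over pending, tiebreak, g, pending, choices).
    heap = [(h(root), next(tiebreak), 0.0, (root,), ())]
    results, seen = [], set()
    while heap and len(results) < k:
        f, _, g, pending, choices = heapq.heappop(heap)
        if not pending:  # every node expanded: a complete parse tree
            tree = build(root, iter(choices))
            if accept(tree) and key(tree) not in seen:
                seen.add(key(tree))
                results.append((g, tree))
            continue
        node, rest = pending[0], pending[1:]
        if node not in forest:  # leaf: nothing to choose, f is unchanged
            heapq.heappush(heap, (f, next(tiebreak), g, rest, choices))
            continue
        rest_h = sum(h(n) for n in rest)
        for i, (cost, children) in enumerate(forest[node]):
            g2 = g + cost
            f2 = g2 + sum(h(c) for c in children) + rest_h
            heapq.heappush(heap, (f2, next(tiebreak), g2, tuple(children) + rest, choices + (i,)))
    return results


def yield_text(tree):
    # Join a tree's leaves into its display text.
    if isinstance(tree, str):
        return tree
    return " ".join(yield_text(child) for child in tree[1])


# Hypothetical forest: two derivations share the text "repos I created".
forest = {
    "S": [(0.0, ("A",)), (1.0, ("B",))],
    "A": [(0.0, ("repos", "I", "created")), (2.0, ("repos", "I", "create"))],
    "B": [(0.5, ("repos", "I", "created"))],
}
for cost, tree in k_best_trees(forest, "S", k=3, key=yield_text):
    print(cost, yield_text(tree))
# 0.0 repos I created
# 2.0 repos I create
```

Keying uniqueness on display text drops the 1.5-cost derivation, which repeats the text of the best tree; a semantic check would be passed as `accept`.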