Using Rasa NLU

For better language coverage of your DDDs, you may want to enable the machine-learning based Rasa NLU.

This guide is written for version 0.14.6 of Rasa NLU.

Before getting started, make sure you have a hosted Rasa NLU server with the necessary dependencies installed. For instance, if you plan on using a Spacy pipeline, ensure that Spacy itself and the appropriate language models are installed on that server.

For more information, read up on the Rasa NLU documentation. We recommend running it in Docker.

Generate training data

To use Rasa NLU with TDM, a model first needs to be trained. The Tala SDK can generate training data for your DDD. For example, for the DDD my-ddd and the language eng, run tala generate rasa my-ddd eng > training_data.yml, which writes the training data to training_data.yml.
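If the generation step is part of a script, the same command can be wrapped in Python. This is only a minimal sketch: the helper names build_generate_command and generate_training_data are made up for illustration, and it assumes the tala command is installed and is run from a directory where it can find the DDD.

```python
import subprocess
from pathlib import Path


def build_generate_command(ddd_name, language):
    """Return the argument list for `tala generate rasa <ddd> <language>`."""
    return ["tala", "generate", "rasa", ddd_name, language]


def generate_training_data(ddd_name, language, output_path):
    """Run the Tala CLI and write its standard output to output_path.

    Hypothetical helper: assumes `tala` is on the PATH and that the current
    working directory is one where the command can locate the DDD.
    """
    result = subprocess.run(
        build_generate_command(ddd_name, language),
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout instead of using shell redirection
        text=True,
    )
    path = Path(output_path)
    path.write_text(result.stdout, encoding="utf-8")
    return path
```

Calling generate_training_data("my-ddd", "eng", "training_data.yml") would then correspond to the shell command above.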

Configure the pipeline

The generated training data comes with the spacy_sklearn pipeline by default. At the head of the training data we find:

language: "en"

pipeline: "spacy_sklearn"

data: |
...

Here, the pre-configured spacy_sklearn pipeline will be used, but there are others to choose from too.

It's also possible to configure the pipeline oneself, by listing the components explicitly. For instance, this is the spacy_sklearn pipeline:

pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "RegexFeaturizer"
- name: "SpacyFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"

Add pre-trained named entity recognizers (NERs)

Rasa NLU supports pre-trained NERs as part of the pipeline. The NERs from Duckling and Spacy, for instance, can be used together with TDM.

Duckling

In this version of TDM, the following Duckling entities are supported:

  • number: maps to the integer sort.
  • time: maps to the datetime sort.

To enable Duckling, make sure it's available to the Rasa server and add its component to an explicit pipeline:

- name: "DucklingHTTPExtractor"
  url: "http://duckling:8000"

Here, Duckling is available to the Rasa server at http://duckling:8000. The spacy_sklearn pipeline with the addition of Duckling then becomes:

pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "RegexFeaturizer"
- name: "SpacyFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"
- name: "DucklingHTTPExtractor"
  url: "http://duckling:8000"
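When the pipeline is maintained by a script rather than by hand, the explicit component list can be assembled programmatically. The sketch below is an illustration only: build_pipeline and render_pipeline are hypothetical helpers, and the output is rendered with plain string formatting rather than a YAML library, so it only handles simple string values like the ones shown above.

```python
# Component names of the spacy_sklearn pipeline, in the order listed above.
SPACY_SKLEARN_COMPONENTS = [
    "SpacyNLP",
    "SpacyTokenizer",
    "RegexFeaturizer",
    "SpacyFeaturizer",
    "CRFEntityExtractor",
    "EntitySynonymMapper",
    "SklearnIntentClassifier",
]


def build_pipeline(duckling_url=None):
    """Return the pipeline as a list of dicts, optionally with Duckling last."""
    pipeline = [{"name": name} for name in SPACY_SKLEARN_COMPONENTS]
    if duckling_url is not None:
        pipeline.append({"name": "DucklingHTTPExtractor", "url": duckling_url})
    return pipeline


def render_pipeline(pipeline):
    """Render the pipeline in the same layout as the examples above."""
    lines = ["pipeline:"]
    for component in pipeline:
        items = list(component.items())
        first_key, first_value = items[0]
        lines.append(f'- {first_key}: "{first_value}"')
        # Remaining keys of a component are indented under its first key.
        for key, value in items[1:]:
            lines.append(f'  {key}: "{value}"')
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pipeline(build_pipeline("http://duckling:8000")))
```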

Spacy

In this version of TDM, the following Spacy entity types are supported:

  • PERSON and PER: map to the person_name sort.

To enable Spacy, make sure it's available to the Rasa server and use a pipeline that contains SpacyEntityExtractor, e.g.

pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "RegexFeaturizer"
- name: "SpacyFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"
- name: "SpacyEntityExtractor"
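The supported entity types from Duckling and Spacy can be summarized as a lookup table from entity name to sort. The following sketch is not how TDM implements the mapping. It only illustrates which entities would be picked up, and it assumes that each recognized entity is available as a dict with "entity" and "value" keys; the function name sorts_for_entities is hypothetical.

```python
# Entity types supported in this version of TDM, as listed above.
SUPPORTED_ENTITY_SORTS = {
    "number": "integer",      # Duckling
    "time": "datetime",       # Duckling
    "PERSON": "person_name",  # Spacy
    "PER": "person_name",     # Spacy
}


def sorts_for_entities(entities):
    """Pair each supported entity value with its sort; skip unsupported ones.

    Assumption: every entity is a dict with "entity" and "value" keys.
    """
    result = []
    for entity in entities:
        sort = SUPPORTED_ENTITY_SORTS.get(entity.get("entity"))
        if sort is not None:
            result.append((entity.get("value"), sort))
    return result
```

Entity types outside the table, such as other Spacy labels, are simply ignored by this sketch.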

Train the model

Once the training data and pipeline are configured, train your model through the Rasa NLU HTTP API.

For instance with:

curl -XPOST -H 'Content-Type: application/x-yml' 'http://my-rasa-nlu.my-cloud.com:5000/train?project=my-ddd&model=my-model' --data-binary @training_data.yml
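The same request can be sent from Python using only the standard library. This is a sketch that mirrors the curl command above: the function names build_train_request and train are made up for illustration, and the base URL, project and model are the example values from this guide.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_train_request(base_url, project, model, training_data):
    """Build a POST request for the /train endpoint without sending it."""
    query = urlencode({"project": project, "model": model})
    return Request(
        f"{base_url}/train?{query}",
        data=training_data,  # raw bytes of the training data file
        headers={"Content-Type": "application/x-yml"},
        method="POST",
    )


def train(base_url, project, model, training_data_path):
    """Send the training data to the server and return the raw response body."""
    with open(training_data_path, "rb") as handle:
        request = build_train_request(base_url, project, model, handle.read())
    # urlopen raises urllib.error.HTTPError if the server reports an error.
    with urlopen(request) as response:
        return response.read()
```

For the example above, the call would be train("http://my-rasa-nlu.my-cloud.com:5000", "my-ddd", "my-model", "training_data.yml").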

In this case, the URL, project and model also need to be specified in the DDD config in the next step.

Configure the DDD

Configure Rasa NLU in the DDD config, for instance at my_ddd/ddd.config.json, by adding a language-specific object under rasa_nlu. For English:

{
    "rasa_nlu": {
        "eng": {
            "url": "http://my-rasa-nlu.my-cloud.com:5000/parse",
            "config": {
                "project": "my-ddd",
                "model": "my-model"
            }
        }
    }
}

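In the example above, each language-specific rasa_nlu object contains the following fields:

  • url: the URL of the parse endpoint on the Rasa NLU server.
  • config: the project and model that were specified when training.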

If Rasa NLU should not be used for a particular language, remove that language's entry altogether. For instance, with the English entry removed:

{
    "rasa_nlu": {}
}
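Because the URL, project and model in the DDD config have to match the ones used for training, it can help to derive the rasa_nlu object from the training URL instead of typing the values twice. The sketch below is one possible way to do that. The helpers rasa_nlu_entry and ddd_config_fragment are hypothetical, and the result covers only the rasa_nlu part, so any other settings in ddd.config.json have to be kept as they are.

```python
import json
from urllib.parse import parse_qs, urlsplit


def rasa_nlu_entry(train_url):
    """Derive one language's rasa_nlu object from the URL used for training."""
    parts = urlsplit(train_url)
    query = parse_qs(parts.query)
    return {
        # Same server as for training, but pointing at the parse endpoint.
        "url": f"{parts.scheme}://{parts.netloc}/parse",
        "config": {
            "project": query["project"][0],
            "model": query["model"][0],
        },
    }


def ddd_config_fragment(train_urls_by_language):
    """Build the rasa_nlu part of the DDD config for the given languages."""
    return {
        "rasa_nlu": {
            language: rasa_nlu_entry(url)
            for language, url in train_urls_by_language.items()
        }
    }


if __name__ == "__main__":
    fragment = ddd_config_fragment({
        "eng": "http://my-rasa-nlu.my-cloud.com:5000/train?project=my-ddd&model=my-model",
    })
    print(json.dumps(fragment, indent=4))
```

Passing an empty dict yields the empty rasa_nlu object shown above, which corresponds to not using Rasa NLU for any language.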

Last update: June 5, 2020