Behaviour-driven development (BDD) of an Alexa Skill with Cucumber.js – Part 1

In this blog article we will learn how to do behaviour-driven development (BDD) of an Alexa Skill utilizing the Cucumber.js framework.

Why do you even want to do this?

I didn’t even intend to do a blog article on BDD of Alexa Skills. I just wanted to get into voice UIs and learn more about the current state of the art. I must admit: I even tried to skip writing tests altogether. But I quickly learned that there are just too many things that could possibly go wrong, and the turnaround times from writing code to seeing whether it actually works are just too long:

  • define the voice UI
  • write the code locally
  • build the code
  • upload your code
  • start the Alexa Skill in the Alexa Simulator or on your Echo device
  • Alexa reports an error
  • search for the log file
  • realize you forgot to register your new intent handler
  • fix your code and restart the loop

Imagine we are building an Alexa Skill to write down points in a game of dice. What if we could define our voice interaction upfront like this…

 Scenario: A new game can be started, the number of players is stored
    Given the user has opened the skill
    When the user says: Start a new game with 4 players
    Then Alexa replies with: Okay, I started a new game for 4 players
    When the user says: How many players do I have?
    Then Alexa replies with: You are playing with 4 players

…and then be able to actually run this as an automated acceptance test?

This is exactly what we are going to do in this blog entry.

About Cucumber.js

Cucumber.js is the JavaScript version of Cucumber, a framework for running automated acceptance tests. There are already a lot of good introductions to Cucumber, so I will keep this part short. If you are new to Cucumber, I recommend reading their very good introduction first.

Cucumber allows us to define test scenarios in a text file.

Each test scenario starts with the keyword Scenario, followed by the name of the test.

The test itself consists of different steps. Each step has to be started with one of these keywords:

  • Given: Defines the initial context of the system prior to our test. In the example above, we assume that the user has already opened our Alexa Skill.
  • When: Describes an action taken in the test. In our case, this could be the user saying something to their Echo device.
  • Then: Specifies the expected outcome of the prior steps. For us, this would typically be the reply Alexa gave to the user.

There is more to it, but this is a very short description of how a Cucumber test might be structured.

But out of the box Cucumber has no clue how to execute these steps. That’s something we need to do, and that’s what this article is about.

Ingredients of an Alexa Skill

To be able to implement the Cucumber steps, we first need to look at the different parts of an Alexa Skill. To create a new skill, we have to create at least two artifacts:

  • The first part is the Voice Interaction Model. This model defines all commands (intents) a user can trigger in our skill and which speech input (utterances) will trigger which intent.
  • The second part is the code handling these intents. Typically, this code will receive an intent request and will answer with a voice reply (though there are other possible results). It could be any web service, but Amazon makes our life easier if we use an AWS Lambda function for this purpose (which is what we will do here).

So a user interaction is processed as follows:
Processing of a user interaction with an Alexa Skill

  • The user talks to their Echo device.
  • From the device the speech input is sent to the Alexa Voice Service.
  • The Alexa Voice Service uses our Voice Interaction Model to find a fitting intent and (if one is found) creates a proper intent request.
  • Our Lambda function receives the intent request and executes the appropriate intent handler, which generates a response that is sent back to the device.
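
To make the last step a bit more tangible, here is a minimal sketch of what such a Lambda-style entry point could look like. It is not the code from the repository: it assumes the classic Node.js callback signature (event, context, callback), uses hand-written dispatching instead of the Alexa SDK, and the names intentHandlers, buildSpeechResponse and handler as well as the reply texts are made up for illustration.

```javascript
// Hypothetical intent handlers, keyed by the intent names of our model.
const intentHandlers = new Map([
    ['starteSpiel', (request) => {
        const players = request.intent.slots.spieleranzahl.value;
        return `Okay, I started a new game for ${players} players`;
    }],
]);

// Wraps a plain text reply in a response with SSML speech output.
function buildSpeechResponse(text) {
    return {
        version: '1.0',
        response: {
            outputSpeech: { type: 'SSML', ssml: `<speak>${text}</speak>` },
            shouldEndSession: false,
        },
    };
}

// Lambda-style entry point: dispatches an IntentRequest to its handler.
function handler(event, context, callback) {
    const { request } = event;
    if (request.type !== 'IntentRequest') {
        callback(null, buildSpeechResponse('Welcome'));
        return;
    }
    const handle = intentHandlers.get(request.intent.name);
    if (!handle) {
        // This is the "forgot to register your new intent handler" case.
        callback(new Error(`No handler registered for intent ${request.intent.name}`));
        return;
    }
    callback(null, buildSpeechResponse(handle(request)));
}
```

The important part for our tests is the shape of the call: a request object goes in, and the reply comes back through the callback.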

Our voice interaction model

Before we take care of the Alexa Voice Service in our tests, let’s take a look at our interaction model. We will need this locally for the tests, so we first make a local copy:

Open the Alexa Developer Console, navigate to the Alexa Skill and open the JSON editor. From there you can copy the content and create a local version of the voice interaction model.

The JSON editor of the voice interaction model

The code of this blog article is available in this GitLab repository. The example is localized to English and German, so there are actually two voice interaction models: the English one (interactionModel.en.json) and the German one (interactionModel.de.json).

Our skill for the blog entry has two intents: one to start a new game (remember: it’s about a fictional game of dice) and one to ask how many players there are.

The English voice interaction model looks like this:

{
    "interactionModel": {
        "languageModel": {
            "invocationName": "five of a kind",
            "intents": [
                {
                    "name": "AMAZON.FallbackIntent",
                    "samples": []
                },
                {
                    "name": "AMAZON.CancelIntent",
                    "samples": []
                },
                {
                    "name": "AMAZON.HelpIntent",
                    "samples": []
                },
                {
                    "name": "AMAZON.StopIntent",
                    "samples": []
                },
                {
                    "name": "AMAZON.NavigateHomeIntent",
                    "samples": []
                },
                {
                    "name": "starteSpiel",
                    "slots": [
                        {
                            "name": "spieleranzahl",
                            "type": "AMAZON.NUMBER"
                        },
                        {
                            "name": "players",
                            "type": "playersSlot"
                        }
                    ],
                    "samples": [
                        "Start a new game with {spieleranzahl} {players}"
                    ]
                },
                {
                    "name": "wieVieleSpieler",
                    "slots": [],
                    "samples": [
                        "How many players do we have",
                        "How many players do I have"
                    ]
                }
            ],
            "types": [
                {
                    "name": "playersSlot",
                    "values": [
                        {
                            "name": {
                                "value": "persons"
                            }
                        },
                        {
                            "name": {
                                "value": "people"
                            }
                        },
                        {
                            "name": {
                                "value": "players"
                            }
                        }
                    ]
                }
            ]
        }
    }
}

We see some default AMAZON intents (like Cancel or Stop) and our game intents starteSpiel (start game) and wieVieleSpieler (how many players). Intent names and slot names are in German because that is the voice interaction model I started with, and localizing the internal values is probably not a good idea (note to self: start with the English blog next time).

The starteSpiel intent can be triggered with:

Start a new game with {spieleranzahl} {players}

There are two slots (the values in curly braces):

  • spieleranzahl (number of players) of type AMAZON.NUMBER and
  • players (in English, because it is specific to the English voice interaction model) of type playersSlot.

The type playersSlot is a user-defined slot type that accepts the values “players”, “persons” or “people”. We don’t actually care which of these words the user uses to refer to the players; we just want to support a wide variety of voice inputs.
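
Since we will need the values of user-defined slot types later on, here is a small sketch of how they could be read from the model. The function name getSlotTypeValues is hypothetical, the model is a trimmed-down excerpt of the one shown above, and the sketch deliberately ignores synonyms, which the model format also allows.

```javascript
// Returns a map from slot type name to the list of values it accepts.
function getSlotTypeValues(interactionModel) {
    const types = interactionModel.languageModel.types || [];
    const valuesByType = {};
    for (const type of types) {
        valuesByType[type.name] = type.values.map((entry) => entry.name.value);
    }
    return valuesByType;
}

// Trimmed-down excerpt of the English voice interaction model.
const exampleModel = {
    languageModel: {
        invocationName: 'five of a kind',
        intents: [],
        types: [
            {
                name: 'playersSlot',
                values: [
                    { name: { value: 'persons' } },
                    { name: { value: 'people' } },
                    { name: { value: 'players' } },
                ],
            },
        ],
    },
};

const slotTypeValues = getSlotTypeValues(exampleModel);
```

Predefined types such as AMAZON.NUMBER do not appear in the types list at all, which is why they need separate treatment later.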

The mock voice service

To be able to run our acceptance tests locally, we have to replace Amazon’s Alexa Voice Service with a local mock voice service.

Interaction flow in Cucumber tests with our mock voice service

A generic When the user says step

Of course, we could create Cucumber steps specifically for each individual intent. However, we strive for a more generic approach: we will create a generic When the user says step, and the mock voice service will decide which intent should be triggered.

To do this, our mock voice service reads the voice interaction model and stores all possible utterances our Alexa Skill understands, and for each utterance the associated intent.
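
As a rough sketch of this first step (not the implementation from the repository), collecting the utterances could look like the following. The function name collectUtterances and the shape of the returned entries are assumptions made for this example.

```javascript
// Flattens the model into one entry per sample utterance and remembers
// which intent and which slot definitions the utterance belongs to.
function collectUtterances(interactionModel) {
    const utterances = [];
    for (const intent of interactionModel.languageModel.intents) {
        for (const sample of intent.samples || []) {
            utterances.push({
                intentName: intent.name,
                sample,
                slots: intent.slots || [],
            });
        }
    }
    return utterances;
}

// Trimmed-down excerpt of the English voice interaction model.
const exampleModel = {
    languageModel: {
        intents: [
            { name: 'AMAZON.StopIntent', samples: [] },
            {
                name: 'starteSpiel',
                slots: [
                    { name: 'spieleranzahl', type: 'AMAZON.NUMBER' },
                    { name: 'players', type: 'playersSlot' },
                ],
                samples: ['Start a new game with {spieleranzahl} {players}'],
            },
            {
                name: 'wieVieleSpieler',
                slots: [],
                samples: ['How many players do we have', 'How many players do I have'],
            },
        ],
    },
};

const collected = collectUtterances(exampleModel);
```

Intents without sample utterances, like the built-in AMAZON.StopIntent in this excerpt, simply contribute no entries.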

For our Cucumber step, we have to decide whether the given phrase from the test step matches any of our utterances. Therefore, we convert each utterance to a regular expression. For me, regular expressions are always a bit scary, because I seldom use them and therefore find them hard to read. But this is pretty simple, I promise. It’s best described with an example:

The utterance:

Start a new game with {spieleranzahl} {players}

is converted to the following regular expression:

^Start a new game with (.*) (players|persons|people)$

While converting the utterance from the voice interaction model to a regular expression, we did the following:

  • We added a ^ at the beginning and a $ at the end, so the regular expression only matches input with exactly this beginning and end and never matches a mere substring.
  • We replaced all predefined slots (of a type beginning with AMAZON.) with the expression (.*), which matches any input.
  • And finally, we replaced all user-defined slots with an expression matching all possible values (in the given example: “players”, “persons” or “people”).
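
The three rules above can be sketched in a few lines of JavaScript. This is an illustration under assumptions, not the repository code: the names escapeRegExp and utteranceToRegExp are made up, and the sketch additionally escapes regular expression metacharacters in the literal text, which the description above does not mention.

```javascript
// Escapes characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Converts a sample utterance into an anchored regular expression.
// slots: the slot definitions of the intent ({ name, type })
// typeValues: map from user-defined slot type name to its allowed values
function utteranceToRegExp(sample, slots, typeValues) {
    const slotTypes = new Map(slots.map((slot) => [slot.name, slot.type]));
    // Splitting on {slotName} with a capture group puts the literal text
    // at even indices and the slot names at odd indices.
    const parts = sample.split(/\{(\w+)\}/);
    const pattern = parts.map((part, index) => {
        if (index % 2 === 0) return escapeRegExp(part);
        const type = slotTypes.get(part);
        if (!type) throw new Error(`Unknown slot {${part}} in "${sample}"`);
        // Predefined slots match any input.
        if (type.startsWith('AMAZON.')) return '(.*)';
        // User-defined slots match exactly their listed values.
        return `(${typeValues[type].map(escapeRegExp).join('|')})`;
    }).join('');
    return new RegExp(`^${pattern}$`);
}
```

Every slot becomes exactly one capture group, in the order in which the slots appear in the utterance. That makes it easy to map the captured values back to slot names later.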

To execute a test, we only have to match the speech input from the test step against all regular expressions generated from our voice interaction model. If we have a match, we know the intent we have to call and the values used for the different slots. With this information, we can build our own IntentRequest which is then given to the Lambda function that is being tested.
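
A minimal sketch of this matching step could look like this. The name matchIntent and the shape of the compiled utterances (intentName, regExp, slotNames) are assumptions for this example; the regular expressions are written out by hand so the sketch runs on its own.

```javascript
// Compiled utterances as they might come out of the conversion step.
// slotNames lists the slots in the order of the capture groups.
const compiledUtterances = [
    {
        intentName: 'starteSpiel',
        regExp: /^Start a new game with (.*) (players|persons|people)$/,
        slotNames: ['spieleranzahl', 'players'],
    },
    {
        intentName: 'wieVieleSpieler',
        regExp: /^How many players do I have$/,
        slotNames: [],
    },
];

// Returns the first matching intent together with its slot values,
// or null if no utterance matches the spoken text.
function matchIntent(utterances, spokenText) {
    for (const { intentName, regExp, slotNames } of utterances) {
        const match = regExp.exec(spokenText);
        if (match) {
            const slots = slotNames.map((name, index) => ({
                name,
                value: match[index + 1],
            }));
            return { intentName, slots };
        }
    }
    return null;
}
```

Note that this strict sketch would not match a phrase with trailing punctuation, such as the question mark in “How many players do I have?”. Normalizing the input before matching is one way to deal with that.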

To build the IntentRequest (or any other request, like the LaunchRequest), there is a folder with request templates in the GitLab project. These templates are read during test execution and intent names and slot values are replaced with the values we determined before.
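
To give an idea of what filling such a template involves, here is a sketch with a heavily shortened, hypothetical template. The real templates in the project are JSON files and contain more fields; all IDs below are placeholders, and buildIntentRequest is a name made up for this example.

```javascript
// Hypothetical, heavily shortened request template. All IDs are placeholders.
const intentRequestTemplate = {
    version: '1.0',
    session: { new: false, sessionId: 'test-session', attributes: {} },
    request: {
        type: 'IntentRequest',
        requestId: 'test-request',
        locale: 'en',
        intent: { name: 'PLACEHOLDER', confirmationStatus: 'NONE', slots: {} },
    },
};

// Returns a filled-in copy, so the template itself is never mutated.
function buildIntentRequest(template, intentName, slots, locale) {
    const intentRequest = JSON.parse(JSON.stringify(template));
    intentRequest.request.intent.name = intentName;
    intentRequest.request.intent.slots = slots;
    intentRequest.request.locale = locale;
    return intentRequest;
}
```

Working on a copy keeps one test step from leaking values into the next one, which matters once the same template is used for several requests in a scenario.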

The implementation of the When the user says step looks like this:

async function theUserSays(utterance, locale: string) {
    const voiceUiModel = getVoiceUiModel(locale);
 
    const allUtterances = getAllUtterances(voiceUiModel.interactionModel);
 
    const matchingIntent: ?IntentInvocation = findMatchingIntent(allUtterances, utterance);
 
    expect(allUtterances).toHaveMatchingIntentFor(utterance);
    if (!matchingIntent) return;
 
    // Build the slot dictionary in the shape the skill request expects.
    const slots = matchingIntent.slots.reduce((acc, cur: Slot) => ({
        ...acc,
        [cur.name]: {
            name: cur.name,
            value: cur.value,
            confirmationStatus: 'NONE',
            source: 'USER'
        }
    }), {});
 
    const json = fs.readFileSync('src/test/mockVoiceService/requestJsonTemplates/intentRequest.json', 'utf-8');
    const intentRequest = JSON.parse(json);
 
    intentRequest.request.intent.name = matchingIntent.intentName;
    intentRequest.request.intent.slots = slots;
    intentRequest.request.locale = locale;
 
    this.lastRequest = intentRequest;
 
    await executeRequest(this, this.skill, intentRequest);
}
 
When(/^der Anwender sagt[:]? (.*)$/, async function(utterance) {
    await theUserSays.call(this, utterance, 'de');
});
 
When(/^the user says[:]? (.*)$/, async function(utterance) {
    await theUserSays.call(this, utterance, 'en');
});

The function theUserSays does all the heavy lifting, but it is not itself a Cucumber.js step implementation. Instead, two steps are implemented, one for German and one for English. Since the skill is localized, it makes sense to localize our acceptance tests as well. Both step definitions simply call theUserSays and set the locale parameter.

In the theUserSays function, we start with getAllUtterances, which will get us all utterances of our Alexa Skill as regular expressions according to the description above.

The function findMatchingIntent will return the matching intent and all slot values.

expect(allUtterances).toHaveMatchingIntentFor(utterance) is a custom matcher (we use the expect package from Jest here) that makes sure we actually have a match; otherwise the test fails.
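
A custom matcher like this could be sketched as follows. Matchers registered with expect.extend receive the value passed to expect as their first argument and return an object with pass and message. The shape of the utterance entries (a regExp property) is an assumption for this example, so the actual matcher in the repository may look different.

```javascript
// Matcher object in the shape expect.extend() accepts: each matcher returns
// { pass, message }, where message is a function producing the failure text.
const intentMatchers = {
    toHaveMatchingIntentFor(received, utterance) {
        const pass = received.some((entry) => entry.regExp.test(utterance));
        return {
            pass,
            message: () => (pass
                ? `Expected no intent to match "${utterance}"`
                : `Expected an intent to match "${utterance}", `
                    + `but none of the ${received.length} utterances did`),
        };
    },
};

// Registration would happen once before the steps run, for example:
// expect.extend(intentMatchers);
```

The benefit over a plain truthiness check is the failure message: when a scenario contains a phrase the model does not know, the test output says so directly.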

In the following reduce, we build a dictionary object with all given slot values (as expected by the skill request).

Finally we execute the built request with executeRequest.

Checking the response of our Alexa Skill

So far, we can successfully call the Lambda function of our skill from within our test. But we also need to look at the response to be able to check whether it meets our expectations.

Let’s take a look at the executeRequest function first:

async function executeRequest(world, skill, request) {
    return new Promise((resolve) => {
        skill(request, {}, (error, result) => {
            world.lastError = error;
            world.lastResult = result;
            resolve();
        });
    });
}

The executeRequest function returns a Promise which is resolved as soon as we get a reply from our skill. The skill delivers this reply by calling the callback function we provide as the third parameter. The callback function receives two parameters. Depending on the success of the call, either the parameter error or the parameter result is set. Either way, we store both values in the world object. This world object is where we store the current state of our test. Cucumber.js creates a fresh one for each scenario and takes care of providing it to each step implementation.
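
To see both outcomes in action, here is a small usage sketch with two stand-in skills. The functions succeedingSkill, failingSkill and runBoth are hypothetical; executeRequest is repeated only so that the sketch runs on its own.

```javascript
// Same shape as executeRequest above, repeated so this sketch is self-contained.
function executeRequest(world, skill, request) {
    return new Promise((resolve) => {
        skill(request, {}, (error, result) => {
            world.lastError = error;
            world.lastResult = result;
            resolve();
        });
    });
}

// Hypothetical stand-ins for a real skill handler.
const succeedingSkill = (event, context, callback) =>
    callback(null, { response: { outputSpeech: { type: 'SSML', ssml: '<speak>Hi</speak>' } } });
const failingSkill = (event, context, callback) =>
    callback(new Error('No handler registered'));

// Runs both stand-ins and returns the two resulting world objects.
async function runBoth() {
    const okWorld = {};
    await executeRequest(okWorld, succeedingSkill, {});
    const failedWorld = {};
    await executeRequest(failedWorld, failingSkill, {});
    return { okWorld, failedWorld };
}
```

Because the Promise is always resolved and never rejected, a failing skill does not abort the When step. The error only surfaces when a later step inspects lastError or tries to read lastResult.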

So implementing an Alexa replies with step is now quite easy:

function alexaReplies(expectedResponse) {
    expect(this.lastResult.response.outputSpeech.ssml).toEqual(`<speak>${expectedResponse}</speak>`);
}
Then(/^antwortet Alexa mit[:]? (.*)$/, alexaReplies);
Then(/^Alexa replies with[:]? (.*)$/, alexaReplies);

Again, the main implementation is a separate function that is registered as a Cucumber step for each language. The expectation uses the this reference to access the world object, which is Cucumber.js’s way of providing us with the test context. That is also why we don’t use arrow functions for Cucumber steps: arrow functions do not get their own this, so Cucumber.js cannot bind the world object to them.
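
The difference is easy to demonstrate without Cucumber.js at all. In this sketch, the world object and both step functions are stand-ins, and call plays the role of the framework binding the test context.

```javascript
// Stand-in for the world object that gets bound as `this`.
const world = { lastResult: 'hello' };

// A regular function receives whatever `this` the caller binds.
function regularStep() {
    return this === world;
}

// An arrow function ignores call/apply/bind and keeps the `this`
// of the scope it was defined in.
const arrowStep = () => this === world;

const regularSeesWorld = regularStep.call(world); // true
const arrowSeesWorld = arrowStep.call(world);     // false
```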

Since we expect speech output in this step, we look for it in the outputSpeech attribute of the response object, which holds the reply as SSML wrapped in <speak> tags.
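
As a variation, the expectation could also compare plain text instead of building the SSML string in the test. The following sketch assumes a hypothetical helper extractSpeech and handles both speech output types a response can carry, SSML and PlainText.

```javascript
// Returns the spoken text of a skill response without the <speak> wrapper,
// or null if the response contains no speech output.
function extractSpeech(result) {
    const { outputSpeech } = result.response;
    if (!outputSpeech) return null;
    if (outputSpeech.type === 'PlainText') return outputSpeech.text;
    const match = /^<speak>([\s\S]*)<\/speak>$/.exec(outputSpeech.ssml);
    // Fall back to the raw value if the wrapper is missing.
    return match ? match[1] : outputSpeech.ssml;
}

// Example response in the shape our step expects.
const exampleResult = {
    version: '1.0',
    response: {
        outputSpeech: {
            type: 'SSML',
            ssml: '<speak>You are playing with 4 players</speak>',
        },
    },
};

const spokenText = extractSpeech(exampleResult);
```

With such a helper, a failing expectation shows the bare sentence, which is a little easier to read than two SSML strings side by side.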

Conclusion and outlook

With the described technique we already have a good base framework to work with Cucumber.js acceptance tests against our Alexa Skill. Adding more Cucumber steps for slot confirmation or screen output is very straightforward and you can see more implemented steps in the linked GitLab project.

Handling attributes for storing values (per request, session, or permanent) is something we’ll look into in part 2 of this blog series. Until then I’m happy to hear your thoughts and feedback in the comments below.

Stefan Spittank

Stefan joined codecentric in 2016 and works from the office in Solingen.
Creating usable applications and optimizing the user experience is his daily business.
