How We Test Step Functions At Choco

QA Engineer Anaïs on testing Step Functions using AWS Step Functions Local, Docker, Testcontainers & Jest.


In the first part of our blog series, Alex and Oscar explained how we use AWS Step Functions to create a file processing system at Choco. Step Functions is a deployed AWS orchestration service that has dependencies on other AWS services, which presents testing challenges.

What challenges do we face when testing Step Functions? How can AWS Step Functions Local help, and what are its corresponding benefits and limitations? In this article, we’ll be exploring these questions based on our experience testing our file processing service.

The Complexity of Step Functions

To determine an effective and efficient test strategy, we first need to understand how Step Functions work. In part one, we described how AWS Step Functions help to build workflows.

A workflow is a state machine. Each step is a state and can perform a variety of functions using an AWS service, like a Lambda Function. The result of a state defines the next step and, ultimately, the path of the workflow.

If we look at the diagram below of the Step Function created for our file processing system, we see one scenario when the file check fails (follow the green tasks). In addition, we see many other scenarios that the workflow might encounter. This characterizes the complexity of a Step Function.


What to verify with a Step Function test?

There are different types of testing that can be performed on AWS Step Functions. The various outcomes of an individual state (a Lambda function, for example) for a given input are verified with unit tests. This level of testing occurs at the Lambda level, a level ‘deeper’ than the Step Function. A test at the Step Function level verifies the control flow as a whole, ensuring that all the steps work together seamlessly.

If we look at the complexity of a Step Function, it boils down to the following aspects that we want to verify:

  • Branching logic: Verify the different possible paths in the workflow, both happy paths and edge cases. Focus on the outcome of the Step Function in the form of data and state. This also includes parallel execution in a Map state, which can impact the course of the workflow.
  • Side effects of the Step Function flow: The execution of a Step Function might lead to a new record in a DynamoDB table or a message published to an SNS topic or SQS queue.
  • Error handling: Errors can occur at different scopes, either in a single step or at the level of the whole Step Function (a transient Lambda error, for example). Define test cases based on the likelihood and impact of these errors, ensuring that your workflow can handle unexpected events. Also take performance into consideration. For example, what if a part of the workflow is executed ten times in parallel?
  • Retry functionality: Every step in a Step Function workflow can be configured to retry on failures. If, for example, a step is configured with two retries, you’ll need to verify that the execution can still succeed on the second retry.

To understand the control flow and what exactly needs to be tested, it is helpful to become familiar with the Amazon States Language, the JSON-based language in which Step Functions are defined.
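
To make the control flow concrete, here is a minimal sketch of a definition written as a TypeScript object, together with a small helper that lists every path through it. The state names (`FileCheck`, `IsFileValid`, `FileChunk`, `NotifyFailure`, `Failed`) and the Lambda ARNs are hypothetical and only loosely modeled on the file check scenario. The helper is an illustration, not a full interpreter: it only follows `Next`, `Choices`, `Default`, and `Catch`, and assumes the definition has no loops.

```typescript
// Hypothetical, heavily simplified definition. State names and ARNs are
// illustrative assumptions, not a production state machine.
type Transition = { Next: string };
type State = {
  Type: 'Task' | 'Choice' | 'Succeed' | 'Fail';
  Resource?: string;
  Next?: string;
  End?: boolean;
  Default?: string;
  Choices?: Array<Transition & { Variable: string; StringEquals: string }>;
  Retry?: Array<{ ErrorEquals: string[]; MaxAttempts: number }>;
  Catch?: Array<Transition & { ErrorEquals: string[] }>;
};
type Definition = { StartAt: string; States: Record<string, State> };

const lambdaArn = (name: string): string =>
  `arn:aws:lambda:xx-yyyy-1:123456789012:function:${name}`;

export const definition: Definition = {
  StartAt: 'FileCheck',
  States: {
    FileCheck: {
      Type: 'Task',
      Resource: lambdaArn('file-check'),
      Retry: [{ ErrorEquals: ['Lambda.TooManyRequestsException'], MaxAttempts: 2 }],
      Catch: [{ ErrorEquals: ['States.ALL'], Next: 'NotifyFailure' }],
      Next: 'IsFileValid',
    },
    IsFileValid: {
      Type: 'Choice',
      Choices: [{ Variable: '$.result', StringEquals: 'Success', Next: 'FileChunk' }],
      Default: 'NotifyFailure',
    },
    FileChunk: { Type: 'Task', Resource: lambdaArn('file-chunk'), End: true },
    NotifyFailure: { Type: 'Task', Resource: lambdaArn('notify'), Next: 'Failed' },
    Failed: { Type: 'Fail' },
  },
};

// Enumerate every path from StartAt to a terminal state (no cycle detection).
export function listPaths(def: Definition): string[][] {
  const paths: string[][] = [];
  const walk = (name: string, trail: string[]): void => {
    const state = def.States[name];
    const here = [...trail, name];
    const targets = [
      ...(state.Next ? [state.Next] : []),
      ...(state.Choices ?? []).map((choice) => choice.Next),
      ...(state.Default ? [state.Default] : []),
      ...(state.Catch ?? []).map((handler) => handler.Next),
    ];
    if (targets.length === 0) {
      paths.push(here);
      return;
    }
    targets.forEach((target) => walk(target, here));
  };
  walk(def.StartAt, []);
  return paths;
}
```

Even this tiny definition has three distinct paths (the happy path, the failed check, and the caught error), and each of them is a candidate test case.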

Challenges of testing Step Functions

It is difficult to run and test an AWS Step Function locally because it often has many direct and transitive dependencies on other AWS cloud services. This means we first have to deploy the Step Function in order to run or test it.

Before executing the Step Function via the AWS console, we need the right input to verify the desired path in the workflow. It can take quite some effort to set up the right preconditions for a test case, especially if you want to simulate edge cases and service integration failures, which might require some “hacky” adjustments in the configuration.

All of this results in a slow feedback loop and increases the cost of testing and delivery. The impact and importance of these challenges increase as the use and complexity of AWS Step Functions within Choco increase.

Introducing AWS Step Functions Local

AWS Step Functions Local makes it possible to run and test AWS Step Functions locally via a Docker container or a Java executable. Because we don’t work with Java, it made the most sense for us to use the Docker container. During the execution of a Step Function, AWS services are called with every individual step.

There are three ways of invoking these AWS services, which impact the context of the test:

  • Make calls to actual AWS services
  • Make calls to other local AWS services (using AWS SAM Local)
  • Mock responses of AWS services

We chose to isolate the context of the Step Function test by mocking the responses of each involved AWS service. When a mock configuration file is placed at the right file path in the AWS Step Functions Local container, the container automatically uses the mocked responses instead of making actual service integration calls during the test. The mock file contains the mocked responses, which are defined per test case.

How to create a Step Functions test

Jest is our test runner, which means that our Step Function test is structured according to the Jest lifecycle. In order to interact with a local version of the AWS Step Function, we first need to start a Docker container.

This is set up in the beforeAll, which executes before the tests of one Jest test file are run. Then we start a state machine in that container. During the test, we connect to the state machine to execute the Step Function for a given input event. The output and state of the state machine are extracted for our assertions.

The diagram below shows the complete lifecycle of the Step Function test.


Let’s look at how to do this step by step:

1. Run a Docker client

Get an application like Podman or Docker Desktop (check whether you need a business license) to run Docker containers on your local machine.

2. Run the Docker container

There are several ways to run a Docker container. AWS describes how to pull and run the Docker image via the command line. These commands can be put in a script or a Dockerfile that runs before the test.

Another option is to start a Docker container programmatically within the context of the test. This is possible with Testcontainers, a tool originally developed for Java that now also supports other languages, including Node, which is exactly what we need. It spins up the required Docker containers for the duration of the tests and tears them down once the test execution has finished.

We chose this tool because it allows full control over the Docker container within the context of the test. You can start isolated instances of containers with a clean or known state, embedding these dependencies in the self-contained test. This enables parallel test execution and a faster feedback loop.

The configuration of the AWS Step Functions Local Docker container depends on how the dependent AWS services are invoked. Because we chose to mock the responses of all AWS services in the Step Function, we need to provide a dummy AWS account (region, access key ID, secret access key), mount the mock file within the container, and set an environment variable with the path to this mock file.

Testcontainers provides a GenericContainer object to compose a container based on the Docker image, files that can be mounted, environment variables, and other configurable options. It’s also possible to set up a wait strategy, so that testing starts only once the Docker container is ready.

The example below shows the setup of the GenericContainer object with a dummy AWS account, mock file, and wait strategy based on the console output of the Docker container.

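import { resolve } from 'path';
import { GenericContainer, Wait } from 'testcontainers';

export const exposedPort = 8083;
export const region = 'xx-yyyy-1';
export const accessKeyId = 'dummyAccessKey';
export const secretAccessKey = 'dummySecretAccessKey';
export const stepFunctionTestContainerName = 'StepFunctionTestContainer';

// NOTE: the arguments to resolve() are cut off in the original snippet;
// the local path below is a placeholder, point it at your own mock file.
export const mockFileLocalPath = resolve(__dirname, 'path/to/MockConfigFile.json');
// Container path as it appears in the container log quoted in the debugging section.
export const mockFileContainerPath = '/home/StepFunctionsLocal/MockConfigFile.json';

export const stepFunctionContainerFactory = (): GenericContainer => {
  return new GenericContainer('amazon/aws-stepfunctions-local')
    // Expose the port so that getMappedPort(exposedPort) works later on.
    .withExposedPorts(exposedPort)
    // Mount the mock file read-only and tell Step Functions Local where to find it.
    .withBindMount(mockFileLocalPath, mockFileContainerPath, 'ro')
    .withEnv('AWS_DEFAULT_REGION', region)
    .withEnv('AWS_ACCESS_KEY_ID', accessKeyId)
    .withEnv('AWS_SECRET_ACCESS_KEY', secretAccessKey)
    .withEnv('SFN_MOCK_CONFIG', mockFileContainerPath)
    // Consider the container ready once the server reports its port in the logs.
    .withWaitStrategy(
      Wait.forLogMessage(RegExp(`.*Starting server on port ${exposedPort}.*`)),
    );
};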

Although we use a wait strategy, we still need a hard wait when starting the Docker container in order to avoid nondeterministic behavior. This appears to be a generic issue that has not yet been addressed. After the wait, we can retrieve the host and port from the Docker container.

The example below is a simplified part from our testSetup file:

const startedStepFunctionContainer = await stepFunctionContainerFactory().start();

// Hard wait to avoid nondeterministic behavior during container start-up
await new Promise((f) => setTimeout(f, 2000));

const host = startedStepFunctionContainer.getHost();
const port = startedStepFunctionContainer.getMappedPort(exposedPort);
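
A possible alternative to the fixed delay is to poll the endpoint until it answers. The sketch below is a generic retry helper and is not what the setup above does. It rests on the unverified assumption that requests to Step Functions Local fail until the server is really ready. The name `waitUntilReady` and its options are made up for this illustration.

```typescript
// Generic polling helper: calls `probe` until it resolves or the attempts run out.
export async function waitUntilReady<T>(
  probe: () => Promise<T>,
  { attempts = 20, delayMs = 250 }: { attempts?: number; delayMs?: number } = {},
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      // The first successful probe ends the wait.
      return await probe();
    } catch (error) {
      lastError = error;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Not ready after ${attempts} attempts: ${String(lastError)}`);
}
```

With a Step Functions client pointed at the container, the probe could be something as cheap as `() => sfnClient.listStateMachines({})`. Whether this removes the need for the hard wait would have to be confirmed in your own setup.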

3. Create a state machine in the container

Before we can create a state machine in the container, we need an instruction on how this state machine should be structured. This is a state machine definition in JSON format, which can be retrieved in different ways. The first is probably, though regrettably, the easiest.

    • Option 1: Go to AWS Console > Step Function > Search for the relevant step function > Copy the definition from the Definition tab (this can also be done programmatically using AWS-SDK)
    • Option 2: Create an AWS CloudFormation template with CDK synth, which generates a template.json in the directory cdk.out. Use the tool cdk-asl-extractor to generate a state machine definition JSON from the template.json.

    WARNING: If a state machine consists only of simple Pass states, CDK already synthesizes it into a definition string. When this occurs, the cdk-asl-extractor process breaks. Workaround: use this definition string as the state machine definition. Get it from the template.json under the key DefinitionString, and parse this string as JSON.

    Against the endpoint of the Docker container (based on host and port) we create a Step Functions client, using the AWS-SDK library. Because we chose to mock the responses of all AWS services in the Step Function, a dummy role ARN will suffice. Let’s now create the state machine (the example below is also a simplified part of our testSetup file).
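
The workaround can be sketched as a small function that walks a synthesized template and parses every plain-string DefinitionString it finds. The `Template` type is a minimal assumption covering only the fields read here, and `extractDefinitions` is a name made up for this example. Loading the template.json from disk is left out.

```typescript
// Minimal shape of a synthesized CloudFormation template (only what is read here).
type Template = {
  Resources: Record<string, { Type: string; Properties?: Record<string, unknown> }>;
};

// Returns the parsed definition of every state machine resource whose
// DefinitionString was synthesized as a plain string.
export function extractDefinitions(template: Template): Record<string, unknown> {
  const definitions: Record<string, unknown> = {};
  for (const [logicalId, resource] of Object.entries(template.Resources)) {
    if (resource.Type !== 'AWS::StepFunctions::StateMachine') continue;
    const raw = resource.Properties?.DefinitionString;
    if (typeof raw !== 'string') {
      // For example an intrinsic function object; that case needs other tooling.
      throw new Error(`${logicalId}: DefinitionString is not a plain string`);
    }
    definitions[logicalId] = JSON.parse(raw);
  }
  return definitions;
}
```

The result is keyed by logical resource ID, so a template with several state machines yields one definition per machine.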

    After the creation of the state machine, we return the Docker container, step function client, and state machine ARN, in order to interact with the state machine during the test.

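    import { SFN } from '@aws-sdk/client-sfn';
    import stateMachineDefinition from '../assets/state-machine-definition.json';

    export const dummyRoleArn = 'arn:aws:iam::123456789012:role/DummyRole';
    export const stateMachineName = 'MacStepFunctionTest';

    // Assumed shape of the factory props; the type is not shown in the original snippet.
    type SFNClientFactoryProps = { host: string; port: number };

    export const sfnClientFactory = ({ host, port }: SFNClientFactoryProps) => {
      return new SFN({
        // Dummy credentials are enough because every service call is mocked.
        credentials: {
          accessKeyId: accessKeyId,
          secretAccessKey: secretAccessKey,
        },
        region: region,
        // Point the client at the local container instead of AWS.
        endpoint: `http://${host}:${port}`,
      });
    };

    const sfnClient = sfnClientFactory({ host, port });
    const stateMachine = await sfnClient.createStateMachine({
      name: stateMachineName,
      roleArn: dummyRoleArn,
      definition: JSON.stringify(stateMachineDefinition),
    });
    console.log('Container with step function is started 🚀');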

    4. Run the test

    Given-When-Then is a useful way to structure test cases. It acts as a reminder to think about the precondition (Given), and the action (When) which determines the outcome of the test (Then).

    Because the structure of every step function test is the same, we created a parameterized Jest test, looking something like this:

    1. Given: Create an input event in JSON format to trigger the step function. This is the same event you would use to trigger the step function in the AWS console.
    2. When: Execute the step function based on the state machine ARN, mocked test case, step function client, and the input event. By polling and checking the status of the step function, we can wait for it to finish. Then we extract the state and output for evaluation.
    3. Then: Assert the state and output of the step function in JSON format.
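    // `testCases` and `sfnExpectedStatus` are placeholder names; the original snippet omits them.
    test.each(testCases)(
      'should return the correct status and output when %p',
      async ({ mockTestCaseName, sfnInput, sfnExpectedStatus, sfnExpectedOutput }) => {
        // GIVEN
        const event = JSON.stringify(sfnInput);
        // WHEN
        const responseStepFunction = await executeStepFunction({
          sfnClient,
          stateMachineArn,
          test: mockTestCaseName,
          event,
        });
        const { status, output } = await waitForStepFunctionToFinish({
          sfnClient,
          executionArn: responseStepFunction.executionArn!,
        });
        // THEN
        expect(status).toEqual(sfnExpectedStatus);
        expect(JSON.parse(output ?? '')).toEqual(sfnExpectedOutput);
      },
    );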
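
The two helpers used in the test are not shown in this article. A minimal sketch of what they could look like follows. The `SfnLike` interface, the parameter names, and the polling defaults are all assumptions made for this illustration. The interface only mirrors the `startExecution` and `describeExecution` methods of the SFN client so that the sketch does not depend on the SDK types.

```typescript
// Minimal slice of the Step Functions client used below (an assumption for
// this sketch; the SFN class from '@aws-sdk/client-sfn' offers these methods).
interface SfnLike {
  startExecution(input: { stateMachineArn: string; input?: string }): Promise<{ executionArn?: string }>;
  describeExecution(input: { executionArn: string }): Promise<{ status?: string; output?: string }>;
}

export async function executeStepFunction(args: {
  sfnClient: SfnLike;
  stateMachineArn: string;
  test: string;
  event: string;
}): Promise<{ executionArn?: string }> {
  // Step Functions Local picks the mocked test case from the '#<name>' suffix.
  return args.sfnClient.startExecution({
    stateMachineArn: `${args.stateMachineArn}#${args.test}`,
    input: args.event,
  });
}

export async function waitForStepFunctionToFinish(args: {
  sfnClient: SfnLike;
  executionArn: string;
  intervalMs?: number;
  maxPolls?: number;
}): Promise<{ status?: string; output?: string }> {
  const { sfnClient, executionArn, intervalMs = 200, maxPolls = 50 } = args;
  for (let i = 0; i < maxPolls; i++) {
    const { status, output } = await sfnClient.describeExecution({ executionArn });
    // Any status other than RUNNING is treated as final here.
    if (status !== 'RUNNING') return { status, output };
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Execution ${executionArn} still running after ${maxPolls} polls`);
}
```

Capping the number of polls keeps a stuck execution from hanging the whole Jest run; the limits shown are arbitrary.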

    How to mock for a Step Function Test?

    All AWS Step Function tasks that are invoked during the Step Function Flow have to be mocked.

    The diagram above indicates that only the Lambda and SNS/SQS blocks are included. For the green highlighted scenario where the file check fails, we needed to create three mocked responses. This also means that in an error path, only around three tasks are invoked, while in a full happy path, ten or more are invoked.

    💡 Tip: Start testing an error path to minimize the mocks that you have to create in the beginning. Expand your test cases from there.

    Mocking the involved AWS services during a Step Function test has to be very precise. If it is not exactly matched with the test case and structure of the real service, your test will fail.

    For us, it was quite a puzzle to find the matching mocks. Luckily, this came with a lot of learnings that we’re going to share with you.

    Pictured below is our mock file (in JSON) highlighting the matching. It consists of two parts, StateMachines and MockedResponses:

    • StateMachines: This contains the metadata of the mock file, starting with a state machine name, which should exactly match the same-named property used when creating the state machine (step 3). Next, the test cases (HappyPath, ErrorPathFileCheck, and ErrorPathFileChunk) are linked with the corresponding mocks. Note here that the keys of the mocks should exactly match the task names of the mocked AWS services, as mentioned in the state machine definition. The name of the test case is appended to the state machine ARN at runtime, which ensures that the correct mock is loaded.
    • MockedResponses: It won’t come as a surprise that the mocked responses must also match exactly with the expected outcomes of the different tasks, depending on what kind of AWS service is used. AWS Lambda Functions should match the following structure:

    "0": {
      "Return": {
        "StatusCode": 200,
        "Payload": {
          ${AWS Lambda Payload}
        }
      }
    }

    In order to determine what the actual AWS Lambda Payload is, check out the actual output in an AWS console.

    To do this, go to Step Functions, execute the step function for the flow you want to mock, then click on a task you can mock to find the actual output of that task.

    The structure of an AWS SQS/SNS mock is a bit simpler:

    "0": {
      "Return": {
        ${AWS SQS/SNS Message}
      }
    }
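
Putting both parts together, a minimal mock file could look like the sketch below, written as a TypeScript object that serializes to the JSON file. The state machine name and the test case name come from this article. The task names (`FileCheck`, `NotifyFailure`), the mock names, and all payload values are hypothetical, and the real error path needs three mocks rather than two.

```typescript
// Sketch of a complete mock config. Task names, mock names, and payload
// values are illustrative assumptions, not our real configuration.
export const mockConfig = {
  StateMachines: {
    // Must match the name used when creating the state machine.
    MacStepFunctionTest: {
      TestCases: {
        // Test case name, appended to the state machine ARN as '#ErrorPathFileCheck'.
        ErrorPathFileCheck: {
          // <task name in the definition>: <key in MockedResponses>
          FileCheck: 'FileCheckFailedMock',
          NotifyFailure: 'NotifyFailureMock',
        },
      },
    },
  },
  MockedResponses: {
    // Lambda task: response wrapped in StatusCode and Payload.
    FileCheckFailedMock: {
      '0': { Return: { StatusCode: 200, Payload: { result: 'Failed' } } },
    },
    // SNS task: the message fields sit directly under Return.
    NotifyFailureMock: {
      '0': { Return: { MessageId: '00000000-0000-0000-0000-000000000000' } },
    },
  },
};

// Serialized form, ready to be written to the mounted mock file.
export const mockConfigJson = JSON.stringify(mockConfig, null, 2);
```

Keeping the config as an object in the test code is one way to generate the file; maintaining the JSON by hand works just as well.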

    It is also possible to mock Map states, the AWS functionality that enables parallel execution in a Step Function. In that case, you must define as many mocks as there are invocations.

    Imagine an Excel file split up into two chunks of twenty-five rows (fileChunk). We’d then need two mocks to load these chunks in parallel (chunkLoad). The // comments below are for illustration only, since JSON does not allow comments:

    "FileChunkMock": {
      "0": {
        "Return": {
          "StatusCode": 200,
          "Payload": {
            "result": "Success",
            "data": {
              // 2 chunks
              "chunks": [
                { "index": 0 },
                { "index": 1 }
              ]
            }
          }
        }
      }
    },
    "ChunkLoadMock": {
      // mock for first chunk
      "0": {
        "Return": {
          "StatusCode": 200,
          "Payload": {
            "result": "Success"
          }
        }
      },
      // mock for second chunk
      "1": {
        "Return": {
          "StatusCode": 200,
          "Payload": {
            "result": "Success"
          }
        }
      }
    }

    More information and examples of mocks can be found in the official documentation, and this useful blog by Sam Dengler and Dhiraj Mahapatro.
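
The same format also covers failures and retries. As a sketch, the mock below makes the first two invocations of a task throw a transient error and lets the third one succeed, which is the retry scenario from the list of aspects to verify. The mock name, error cause, and payload are hypothetical. The small lookup helper only illustrates how the invocation keys (a single index or an inclusive range such as `0-1`) are understood here; it is not part of Step Functions Local.

```typescript
type MockedResponse =
  | { Return: unknown }
  | { Throw: { Error: string; Cause: string } };

// Hypothetical mock: invocations 0 and 1 fail, invocation 2 succeeds.
export const fileCheckRetryMock: Record<string, MockedResponse> = {
  '0-1': { Throw: { Error: 'Lambda.TimeoutException', Cause: 'Lambda timed out.' } },
  '2': { Return: { StatusCode: 200, Payload: { result: 'Success' } } },
};

// Illustration only: resolve which mocked response applies to the n-th invocation.
export function responseForInvocation(
  mock: Record<string, MockedResponse>,
  invocation: number,
): MockedResponse | undefined {
  for (const [key, response] of Object.entries(mock)) {
    // '2' covers only invocation 2; '0-1' covers invocations 0 through 1.
    const [from, to = from] = key.split('-').map(Number);
    if (invocation >= from && invocation <= to) return response;
  }
  return undefined;
}
```

For the execution to succeed with this mock, the task's Retry configuration has to allow at least two retries for the thrown error.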

    Debugging errors in the mock

    Errors due to mocking are thrown easily and can be hard to debug. Some examples:

    • Error when reading the mock config file: Reading mock config file from /home/StepFunctionsLocal/MockConfigFile.json Error handling request because null [500] StartExecution <= null
      • Possible root causes
        • There is a typo in your MockConfigFile, e.g. the name of the MockedResponse doesn’t match the name in your test case
        • There is a typo in the mocked response
        • You forgot to mock an invocation. NOTE: if you mock 3 invocations and only the first 2 are consumed, NO ERRORS are thrown
    • Error unable to execute HTTP request: {"Type":"ExecutionFailed","PreviousEventId":5,"ExecutionFailedEventDetails":{"Error":"SNS.SdkClientException","Cause":"Unable to execute HTTP request: [](<>)"}}
      • Possible root cause
        • There is a typo in the AWS Step Function task name in mockConfigFile under StateMachines
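
Because most of these errors come down to names that do not line up, a small check run before the container starts can catch them early. The sketch below is one way to do that. `checkMockConfig` and the `MockConfig` type are made up for this example, and the list of task names is assumed to be taken from the state machine definition.

```typescript
type MockConfig = {
  StateMachines: Record<string, { TestCases: Record<string, Record<string, string>> }>;
  MockedResponses: Record<string, unknown>;
};

// Returns one message per problem; an empty array means all names line up.
export function checkMockConfig(
  config: MockConfig,
  stateMachineName: string,
  taskNames: string[],
): string[] {
  const machine = config.StateMachines[stateMachineName];
  if (!machine) return [`No StateMachines entry named "${stateMachineName}"`];
  const problems: string[] = [];
  for (const [testCase, mapping] of Object.entries(machine.TestCases)) {
    for (const [task, mockName] of Object.entries(mapping)) {
      // The key must be a task name from the state machine definition.
      if (!taskNames.includes(task)) {
        problems.push(`${testCase}: unknown task "${task}"`);
      }
      // The value must be a key in MockedResponses.
      if (!(mockName in config.MockedResponses)) {
        problems.push(`${testCase}: missing mocked response "${mockName}"`);
      }
    }
  }
  return problems;
}
```

This does not validate the content of the mocked responses, only the wiring between the two parts of the file and the definition.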

    Looking back at the implementation to test a Step Function locally in isolation, we see both benefits and limitations.

    Benefits of the Step Function Test

    • Full control over the Step Function context. This decreases the amount of effort we put in to set up preconditions of a test case, like test data.
    • Synthesise service integration failures. This again increases our efficiency in setting up preconditions, because we can mock the error we want to test. It even enables us to test edge cases like transient errors, a situation we can’t simulate in the cloud.
    • Fast feedback loop. We don’t have to wait to test our changes until it is deployed in the cloud.
    • Allows us to build the expected outcome before deploying, by running the step function locally and setting up the mocks for the situations we want to implement.
    • Easy to create new mocks and tests, due to a similar structure.

    Limitations of the Step Function Test

    • The initial setup required high effort. Working with the Docker container adds complexity to logging and debugging errors. All moving parts together resulted in unexpected delays and errors to account for, like some typos in the mocked response that took a long time to pinpoint.
    • The developer experience is not optimal, due to some loose ends in the solution. There is no ideal automated solution to update the state machine definition, which now works best when updated manually using the AWS console. Also, the hard wait during the start-up of the Docker container in order to avoid non-deterministic behavior feels a bit wonky.
    • Mocking can be tricky, as it leads to generic errors when you make typos or the mocked responses don’t exactly match the expected outcomes.
    • The number of mocked responses can blow up pretty quickly. For five tests, we already have five hundred lines of code in our mockConfigFile.json.


    AWS Step Functions unlocks the ability to create serverless workflows, but these workflows are difficult to run and test locally. AWS Step Functions Local is a way to speed up the feedback loop, but there are some tradeoffs. Weigh the benefits and limitations to find out whether this solution is suitable for your context.

    What are your experiences with AWS Step Functions Local? Reach out to Anaïs or meet her at one of her upcoming speaking events (stay tuned!).
