Evaluate your app’s performance before deploying it to production. Test your app offline with historical data or a custom dataset, then view the test results in the phospho dashboard.

Overview

The testing framework supports two types of testing:

  • Backtesting: Uses the data already logged to phospho to evaluate your app’s performance. This is a quick way to get representative test data.
  • Dataset: Uses a custom dataset, which gives you more control over the input scenarios. This is also useful when you have little historical data.

The output of a test run is a report in the phospho dashboard. To control what the report contains, specify the metrics you want to evaluate. The two main metrics supported by the module are:

  • Evaluate: Labels the output of the app as a success or a failure based on the provided inputs (this is the same as a phospho evaluation)
  • Compare: Compares the current output of the app with a reference output
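
To make the difference between the two metrics concrete, here is a toy sketch in plain Python. It does not use or reproduce phospho's implementation: `toy_evaluate`, `toy_compare`, the non-empty rule, and the similarity threshold are hypothetical stand-ins chosen only for illustration.

```python
from difflib import SequenceMatcher


def toy_evaluate(user_input: str, output: str) -> str:
    """Label one output as 'success' or 'failure' (hypothetical rule)."""
    # Assumption: an empty answer to a non-empty input counts as a failure.
    if user_input.strip() and not output.strip():
        return "failure"
    return "success"


def toy_compare(new_output: str, reference_output: str, threshold: float = 0.8) -> bool:
    """Return True when the new output is close to the reference (hypothetical rule)."""
    # SequenceMatcher.ratio() is 1.0 for identical strings and 0.0 when nothing matches.
    ratio = SequenceMatcher(None, new_output, reference_output).ratio()
    return ratio >= threshold
```

The point of the sketch is the shape of each metric: an evaluation judges a single output on its own, while a comparison needs a reference output to measure against.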

Getting Started

Testing is meant to feel like running unit tests in your codebase. A good practice is to run the tests before deploying your app to production, for example in the CI/CD pipeline.
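
As a rough sketch of the CI/CD idea, a pipeline step could launch the test file in a subprocess and pass its exit code on. The helper name `run_test_file` is hypothetical, and the default file name assumes the `phospho_testing.py` file created in the steps below. Whether the test file exits with a non-zero code when a metric fails is not stated here, so this sketch should only be relied on to catch crashes.

```python
import subprocess
import sys
from typing import List, Optional


def run_test_file(command: Optional[List[str]] = None) -> int:
    """Run the test file in a subprocess and return its exit code."""
    if command is None:
        # Assumption: the tests live in phospho_testing.py in the working directory.
        command = [sys.executable, "phospho_testing.py"]
    completed = subprocess.run(command)
    return completed.returncode

# In a CI script, propagate the exit code so the job fails if the run crashes:
#     sys.exit(run_test_file())
```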

1. The easiest way to get started is to use the phospho Python module.

pip install --upgrade phospho

2. In your project code repository, create a new file called phospho_testing.py. This file will contain the test cases for your app.

3. The test cases are defined in phospho_testing.py as functions with the @phospho_test.test decorator.

phospho_testing.py
import phospho
from typing import Dict, List

# Create a new test setup
phospho_test = phospho.PhosphoTest()

# Define the test cases
@phospho_test.test(
    source_loader="backtest",
    source_loader_params={"sample_size": 10},
    metrics=["evaluate", "compare"],
)
def test_my_agent(messages: List[Dict[str, str]]):
    # Call your agent here, for example:
    # new_output = call_my_agent(messages)
    # For now, return a placeholder string
    return "The response from the agent"

# Run the test
phospho_test.run(executor_type="parallel")

4. Run the tests with the following command:

python phospho_testing.py

5. View the test report in the Tests tab of the phospho dashboard.

Customize your tests with more options

Read the full guide about testing in Python

View the report

To view the report, open the phospho dashboard and click the Tests tab.