POST /run/backtest

Evaluate the performance of a new system prompt on historical data using the backtest endpoint.

This endpoint lets you call an LLM on multiple messages in parallel, evaluate the results, and get a performance report.

Usage

To run a backtest, you need to:

  1. Log data to a phospho project
  2. Set up analytics
  3. Call the backtest endpoint
  4. Discover the results

Log data to a phospho project

To log data to a phospho project, you can use the phospho API, import a file to the platform, or use one of the phospho SDKs.
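
For example, a minimal logging sketch with the phospho Python SDK (phospho.init and phospho.log as in the SDK docs; the API key, project ID, and messages below are placeholders):

import phospho

# Authenticate against your phospho project
phospho.init(api_key="YOUR_PHOSPHO_API_KEY", project_id="YOUR_PROJECT_ID")

# Log one user input and the assistant's output as a task
phospho.log(
    input="What is the weather like today?",
    output="It is sunny and 25°C in Paris today.",
)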

Set up analytics in the project

In the phospho platform, go to the Analytics tab to set up the analytics you want to use in the backtest.

These analytics can be:

  • Tagger: e.g. topic of discussion
  • Scorer: e.g. sentiment
  • Classifier: e.g. user intent

Use the analytics to define how to evaluate the results.

Learn more about the available analytics in phospho here.

Call the backtest endpoint

The backtest endpoint does the following:

  1. Fetch the phospho project data using filters
  2. Create a system prompt from the system_prompt_template and system_prompt_variables
  3. Call the model specified in provider_and_model, using the OPENAI_API_KEY set in the request's headers
  4. Log back the results to the phospho project using the version_id
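
As a sketch, a request might look like this with the Python requests library; the base URL is an assumption (verify it against the phospho API reference), and all tokens, IDs, and the model slug are placeholders:

import requests

# Assumed base URL; verify against the phospho API reference
BASE_URL = "https://api.phospho.ai/v3"

response = requests.post(
    f"{BASE_URL}/run/backtest",
    headers={
        "Authorization": "Bearer YOUR_PHOSPHO_TOKEN",
        "openai-api-key": "YOUR_OPENAI_API_KEY",
    },
    json={
        "project_id": "YOUR_PROJECT_ID",
        "system_prompt_template": "The user is asking about {topic}. The user is {username}.",
        "system_prompt_variables": {"topic": "the weather", "username": "John"},
        "provider_and_model": "openai:gpt-4o",  # placeholder slug
        "version_id": "candidate-v2",  # nickname for this candidate version
    },
)
response.raise_for_status()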

How does the template work?

The system prompt is created using a template and variables.

For example, the template could be:

system_prompt_template = "The user is asking about {topic}. The user is {username}."
system_prompt_variables = {
    "topic": "the weather",
    "username": "John"
}

With these values, the rendered system prompt is: "The user is asking about the weather. The user is John."

Conceptually, the implementation is as follows:

from typing import Optional

import phospho.lab
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
model = "gpt-4o"  # parsed from the provider_and_model parameter


async def run_model(message: phospho.lab.Message) -> Optional[str]:
    """
    This is what the backtest endpoint does for each message.
    """
    # Create a system prompt from the template and variables
    system_prompt = system_prompt_template.format(**system_prompt_variables)
    # Call the LLM model
    response = await client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": message.role, "content": message.content},
        ],
    )
    # Return the completion so it can be logged back under version_id
    return response.choices[0].message.content
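
As a usage sketch (assuming phospho.lab.Message accepts role and content fields, as the function above suggests):

import asyncio

message = phospho.lab.Message(role="user", content="What will the weather be tomorrow?")
completion = asyncio.run(run_model(message))
print(completion)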

How to filter the data called during backtesting?

Use the optional filters parameter to select which data to use in the backtest.
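
For illustration, a hypothetical filters object (the key names below are assumptions; check the phospho API reference for the supported fields):

# Hypothetical filters payload; key names are illustrative assumptions
filters = {
    "created_at_start": 1704067200,  # only tasks created after this Unix timestamp
    "language": "en",  # only tasks detected as English
}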

Results

Results are available in the AB tests tab of the phospho platform.

This tab lets you compare the analytics of the original version with those of the candidate version generated by the backtest.

Learn more about AB tests in phospho.

Authorizations

  • Authorization (string, header, required): Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

  • openai-api-key (string, required): Your OpenAI API key, used to call the model specified in provider_and_model.

Body

application/json

  • project_id (string, required): The phospho project_id.

  • system_prompt_template (string, required): The system prompt template. Templated variables can be passed in the format {variable}.

  • system_prompt_variables (object | null): The system prompt variables, as a dictionary.

  • provider_and_model (string, required): The provider and model slug that will be used for the backtest.

  • version_id (string): A nickname for the candidate version being tested.

  • filters (object | null): The filters to apply to the project data. If None, no filters are applied.

Response

200 - application/json

The response is of type object.