Backtesting
Evaluate new system prompt on historical data
Evaluate the performance of a new system prompt on historical data using the backtest endpoint.
This endpoint lets you call an LLM on multiple messages in parallel, evaluate the results, and get a performance report.
Usage
To run a backtest, you need to:
- Log data to a phospho project
- Set up analytics
- Call the backtest endpoint
- Discover the results
Log data to a phospho project
To log data to a phospho project, you can use the phospho API, import a file to the platform, or use one of the phospho SDKs.
Set up analytics in the project
In the phospho platform, go to the Analytics tab to set up the analytics you want to use in the backtest.
These analytics can be:
- Tagger: e.g. topic of discussion
- Scorer: e.g. sentiment
- Classifier: e.g. user intent
Use the analytics to define how to evaluate the results.
Learn more about the available analytics in phospho here.
Call the backtest endpoint
The backtest endpoint does the following:
- Fetch the phospho project data using filters
- Create a system prompt from the system_prompt_template and system_prompt_variables
- Call the LLM specified in provider_and_model with the OPENAI_API_KEY set in the request headers
- Log the results back to the phospho project using the version_id
How does the template work?
The system prompt is created using a template and variables.
Variables are written in the template in the format {variable} and are replaced with the matching values from system_prompt_variables.
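As a minimal sketch of this substitution, assuming the {variable} placeholders behave like Python's str.format fields (this page does not show the server-side code, and the template text and variable names below are hypothetical):

```python
def render_system_prompt(template: str, variables: dict[str, str]) -> str:
    """Fill each {variable} placeholder in the template with its value."""
    # Assumption: plain str.format semantics; the real endpoint may differ.
    return template.format(**variables)


# Hypothetical template and variables, for illustration only.
system_prompt_template = (
    "You are a helpful assistant for {company}. Answer in {language}."
)
system_prompt_variables = {"company": "Acme", "language": "French"}

system_prompt = render_system_prompt(
    system_prompt_template, system_prompt_variables
)
print(system_prompt)
# prints "You are a helpful assistant for Acme. Answer in French."
```

With str.format, a placeholder that has no matching key raises a KeyError, so a missing variable shows up before any LLM call is made.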
How to filter the data called during backtesting?
Use the optional filters parameter to select which data to use in the backtest.
Results
Results are available in the AB tests tab of the phospho platform.
This tab lets you compare the analytics of the original version with those of the candidate version generated by the backtest.
AB tests
Learn more about AB tests in phospho
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Headers
Body
- project_id: The phospho project_id.
- system_prompt_template: The system prompt template. Templated variables can be passed in the format {variable}.
- provider_and_model: The provider and model slug that will be used for the backtest.
- system_prompt_variables: The system prompt variables as a dictionary.
- version_id: A nickname for the candidate version currently being tested.
- filters: The filters to be applied to the project data. If None, no filters are applied.
Response
The response is of type object.