AB testing lets you compare different versions of your app to see which one performs better.

What is AB testing

AB testing is a method used to compare two versions of a product to determine which performs better.

Comparing on a single criterion is hard, especially for LLM apps, because the performance of a product can be measured in many ways.

In phospho, an AB test compares the distribution of analytics between two versions: the candidate and the control.
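
To make the idea of comparing distributions concrete, here is a minimal sketch in plain Python. It does not use the phospho API: the helper names and the sample data are hypothetical, and the comparison shown (a per-label difference in share) is only one simple way to look at two distributions, not necessarily what the platform computes.

```python
from collections import Counter


def label_distribution(labels):
    """Return the share of each label as a fraction of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def compare_distributions(control, candidate):
    """Return (candidate share - control share) for every label seen in either version."""
    control_dist = label_distribution(control)
    candidate_dist = label_distribution(candidate)
    labels = sorted(set(control_dist) | set(candidate_dist))
    return {
        label: candidate_dist.get(label, 0.0) - control_dist.get(label, 0.0)
        for label in labels
    }


# Hypothetical classifier outputs (user intent) for each version.
control_intents = ["buy", "ask for help", "complain", "complain"]
candidate_intents = ["buy", "buy", "ask for help", "complain"]

diff = compare_distributions(control_intents, candidate_intents)
for label, delta in diff.items():
    print(f"{label}: {delta:+.2f}")
```

A positive value means the label is more frequent in the candidate than in the control. With real traffic you would want many more messages per version before reading anything into the differences.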

Prerequisites to run an AB test

You need to have set up event detection in your project. Event detection runs the analytics that measure the performance of your app:

  • Tags: e.g. the topic of the conversation
  • Scores: e.g. the sentiment of the conversation (between 1 and 5)
  • Classifiers: e.g. the user intent (“buy”, “ask for help”, “complain”)

Run an AB test from the platform

  1. Click the “Create an AB test” button on the phospho platform. Optionally, customize the version_id, which is the name of the test.

  2. Send data to the platform using an SDK, an integration, a file, or another method. All new incoming messages will be tagged with the version_id.

Alternative: Specify the version_id in your code

Alternatively, you can specify the version_id in your code. This will override the version_id set in the platform.

When logging to phospho, add a version_id field to the metadata, set to the name of your version.
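
As a rough illustration of the shape of such a log, the sketch below builds a payload with version_id placed inside metadata. The helper build_log_payload and the sample values are hypothetical, and the commented-out phospho.log call is an assumption about how the Python SDK is used; check the SDK reference for the exact arguments.

```python
def build_log_payload(user_input, assistant_output, version_id):
    """Build a log payload that carries the version name in its metadata."""
    return {
        "input": user_input,
        "output": assistant_output,
        "metadata": {"version_id": version_id},
    }


# Hypothetical message exchange, tagged as belonging to the candidate version.
payload = build_log_payload(
    user_input="Where is my order?",
    assistant_output="Let me check that for you.",
    version_id="candidate-v2",
)
print(payload)

# Assumed usage with the phospho Python SDK, once it is installed and initialized:
# import phospho
# phospho.log(**payload)
```

Keeping the version name in one place, such as a constant or a config value, makes it less likely that the control and candidate versions get mixed up across deployments.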

Run offline tests

If you want to run offline tests, you can use the phospho command line interface. Results of the offline tests are also available in the AB test tab.

phospho CLI

Learn more about the phospho command line interface