Use the phospho Python package to run custom analytics jobs on your logs.

Setup

Install the package and set your API key and project ID as environment variables.

pip install phospho pandas
export PHOSPHO_API_KEY=your_api_key
export PHOSPHO_PROJECT_ID=your_project_id

Load logs as a DataFrame

The best way to analyze your logs is to load them into a pandas DataFrame. This format is compatible with most analytics libraries.

One row = one (task, event) pair

Phospho provides a tasks_df function that loads your logs into a flattened DataFrame. Note that the pandas package must be installed to use this function.

import phospho 

phospho.init()
phospho.tasks_df(limit=1000) # Load the latest 1000 tasks

This will return a DataFrame where one row is one (task, event) pair.

Example:

| task_id | task_input | task_output | task_metadata | task_eval | task_eval_source | task_eval_at | task_created_at | session_id | session_length | event_name | event_created_at |
|---|---|---|---|---|---|---|---|---|---|---|---|
| b58aacc6102f4a5e9d2364202ce23bf2 | Some input | Some output | {'client_created_at': 1709925970, 'last_update… | success | owner | 2024-03-08 19:27:49 | 2024-03-09 15:09:31 | 71ee278ab2874666ae157c28a69c1679 | 2 | correction by user | 2024-03-08 19:27:43 |
| b58aacc6102f4a5e9d2364202ce23bf2 | Some input | Some output | {'client_created_at': 1709925970, 'last_update… | success | owner | 2024-03-08 19:27:49 | 2024-03-09 15:09:31 | 71ee278ab2874666ae157c28a69c1679 | 2 | user frustration indication | 2024-03-08 19:27:43 |
| b58aacc6102f4a5e9d2364202ce23bf2 | Some input | Some output | {'client_created_at': 1709925970, 'last_update… | success | owner | 2024-03-08 19:27:49 | 2024-03-09 15:09:31 | 71ee278ab2874666ae157c28a69c1679 | 2 | follow-up question | 2024-03-08 19:27:43 |

This means that:

  • If a task has multiple events, there will be multiple rows with the same task_id and different event_name.
  • If a task has no events, it will have one row with event_name as None.
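
With this flattened format, standard pandas operations let you count events per task or filter on a specific event. A minimal sketch, assuming phospho.init() has already been called:

df = phospho.tasks_df(limit=1000)

# Count the detected events for each task
# (a task with no events has event_name set to None and counts as 0)
events_per_task = df.groupby("task_id")["event_name"].count()

# Keep only the rows where a "follow-up question" event was detected
follow_ups = df[df["event_name"] == "follow-up question"]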

One row = one task

If you want one row to be one task, pass the parameter with_events=False.

phospho.tasks_df(limit=1000, with_events=False)

Result:

| task_id | task_input | task_output | task_metadata | task_eval | task_eval_source | task_eval_at | task_created_at | session_id | session_length |
|---|---|---|---|---|---|---|---|---|---|
| 21f3b21e8646402d930f1a02159e942f | Some input | Some output | {'client_created_at': … | failure | owner | 2024-03-08 19:53:59 | 2024-03-09 16:45:18 | a6b1b4224f874608b6037d41d582286a | 2 |
| 64382c6093b04a028a97a14131a4ab32 | Some input | Some output | {'client_created_at': … | success | owner | 2024-03-08 19:27:48 | 2024-03-09 15:51:07 | 9d13562051a84d6c806d4e6f6a58fb37 | 1 |
| b58aacc6102f4a5e9d2364202ce23bf2 | Some input | Some output | {'client_created_at': … | success | owner | 2024-03-08 19:27:49 | 2024-03-09 15:09:31 | 71ee278ab2874666ae157c28a69c1679 | 3 |
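
With one row per task, aggregate metrics are a one-liner away. A minimal sketch computing an overall success rate and the number of tasks per session:

df = phospho.tasks_df(limit=1000, with_events=False)

# Success rate over the tasks that have an eval
evaluated = df.dropna(subset=["task_eval"])
success_rate = (evaluated["task_eval"] == "success").mean()

# Number of tasks logged in each session
tasks_per_session = df.groupby("session_id")["task_id"].nunique()

print(f"Success rate: {success_rate:.1%}")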

Ignore session features

To ignore the session features, pass the parameter with_sessions=False.

phospho.tasks_df(limit=1000, with_sessions=False)

Run custom analytics jobs

To run custom analytics jobs, you can leverage all the power of the Python ecosystem.
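
As a minimal sketch of such a job, the heuristic below flags tasks whose output contains an apology and writes the verdict into the task_eval column, ready to be pushed back to Phospho (see below). The keyword list and the success/failure mapping are illustrative assumptions, not part of the phospho API:

import phospho

phospho.init()
tasks_df = phospho.tasks_df(limit=1000, with_events=False)

# Hypothetical heuristic: treat an apologetic output as a failure
APOLOGY_KEYWORDS = ("sorry", "apologize", "unfortunately")

def eval_output(output: str) -> str:
    text = (output or "").lower()
    return "failure" if any(keyword in text for keyword in APOLOGY_KEYWORDS) else "success"

tasks_df["task_eval"] = tasks_df["task_output"].apply(eval_output)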

If you have many complex ML models to run and LLM calls to make, consider the phospho lab, which streamlines some of that work for you. See Discover the phospho lab to set it up for custom analytics jobs on your logs.

Update logs from a DataFrame

After running your analytics jobs, you might want to update the logs with the results.

You can use the push_tasks_df function to push the updated data back to Phospho. This will override the specified fields in the logs.

# Fetch the 3 latest tasks
tasks_df = phospho.tasks_df(limit=3)

Update columns

Make changes to the columns you want to update. To prevent accidental data loss, not all columns are updatable.

Here is the list of updatable columns:

  • task_eval: Literal["success", "failure"]
  • task_eval_source: str
  • task_eval_at: datetime
  • task_metadata: Dict[str, object] (Note: this will override the whole metadata object, not just the specified keys)

If you need to update more fields, feel free to open an issue on the GitHub repository, submit a PR, or directly reach out.

# Make some changes
tasks_df["task_eval"] = "success"
tasks_df["task_metadata"] = tasks_df["task_metadata"].apply(
    # To avoid overriding the whole metadata object, use **x to unpack the existing metadata
    lambda x: {**x, "new_key": "new_value", "stuff": 44}
)
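
If you also want to record where and when this evaluation came from, fill the task_eval_source and task_eval_at columns as well. A minimal sketch, where the source label is an arbitrary string of your choosing:

from datetime import datetime, timezone

# Label the origin of the new eval and the time it was computed
tasks_df["task_eval_source"] = "custom_analytics_job"  # arbitrary label, not a reserved value
tasks_df["task_eval_at"] = datetime.now(timezone.utc)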

Push updated data

To push the updated data back to Phospho, use the push_tasks_df function.

  • You need to pass the task_id column.
  • As a best practice, pass only the columns you want to update.

# Select only the columns you want to update
phospho.push_tasks_df(tasks_df[["task_id", "task_eval"]])

# To check that the data has been updated
phospho.tasks_df(limit=3)

You’re all set. Your custom analytics are now also available in the Phospho UI.