Plato Python SDK
The Plato Python SDK provides an API for running evaluations on the Plato platform. It lets you configure tasks, manage browser sessions, and compute custom scores.
Overview
- Task: Define what needs to be evaluated and how
- Configure Run: Configure task execution and browser management settings
- Scores: Compute and track evaluation metrics
- Eval Results: Access detailed evaluation results and summaries
Installation
Task Definition
The Task class is the fundamental building block for defining what needs to be evaluated. Here’s the structure:
Field | Type | Description |
---|---|---|
name | str | Unique identifier for the task |
prompt | str | The main instruction or content to be evaluated |
start_url | Optional[str] | Initial URL to navigate to when starting the task |
output_schema | Optional[Any] | Schema definition for expected task output |
extra | dict | Additional task-specific configuration options |
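As a minimal sketch of how these fields fit together, the block below uses a stand-in dataclass that mirrors the table. The SDK's own `Task` class and its import path are not shown on this page, so the default values, the task content, and the `max_steps` option are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Task:
    """Stand-in mirroring the field table above; not the SDK class itself."""
    name: str                             # unique identifier for the task
    prompt: str                           # main instruction to be evaluated
    start_url: Optional[str] = None       # assumed default: no initial URL
    output_schema: Optional[Any] = None   # assumed default: no schema
    extra: dict = field(default_factory=dict)


task = Task(
    name="find-cheapest-flight",
    prompt="Find the cheapest one-way flight from SFO to JFK next Friday.",
    start_url="https://example.com/flights",
    extra={"max_steps": 20},  # hypothetical task-specific option
)
```

Only `name` and `prompt` are passed positionally here because the table marks `start_url` and `output_schema` as optional.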
Getting Started
1. Basic Runner Configuration
Here’s a simple example to get started:
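A minimal sketch of a basic configuration, built only from the fields listed under PlatoRunnerConfig in the API Reference. The `Task` and `PlatoRunnerConfig` classes below are local stand-ins rather than imports from the SDK, and the default values, the `run_task` function, and the example task are assumptions.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List, Optional


@dataclass
class Task:
    """Stand-in for the SDK's Task (see Task Definition)."""
    name: str
    prompt: str
    start_url: Optional[str] = None


@dataclass
class PlatoRunnerConfig:
    """Stand-in mirroring the documented fields; defaults are assumed."""
    name: str
    data: List[Task]
    task: Callable[[Task, Any], Awaitable[Any]]
    trial_count: int = 1
    timeout: int = 60_000  # milliseconds
    max_concurrency: int = 1
    custom_browser: Optional[Callable[[Task], Awaitable[str]]] = None
    custom_scores: List[Callable[[Dict[str, Any]], Awaitable[float]]] = field(
        default_factory=list
    )


async def run_task(task: Task, session: Any) -> Dict[str, Any]:
    # Placeholder logic: a real task function would drive the browser
    # session here and return the agent's output.
    return {"task": task.name, "answer": None}


config = PlatoRunnerConfig(
    name="smoke-test",
    data=[
        Task(
            name="open-homepage",
            prompt="Open the homepage and report its title.",
            start_url="https://example.com",
        )
    ],
    task=run_task,
    trial_count=3,
    timeout=120_000,
    max_concurrency=2,
)
```

The task function is async and takes the task plus a session object, matching the `Callable[[Task, PlatoSession], Awaitable[Any]]` type in the reference table.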
2. Advanced Configuration with Custom Browser
If you need more control over browser management, you can provide a custom_browser function. This is optional but useful when you want to use your own browser automation setup:
3. Running the Evaluation
API Reference
PlatoRunnerConfig
Field | Type | Description |
---|---|---|
name | str | Name of the evaluation run |
data | List[Task] | List of tasks to be evaluated |
task | Callable[[Task, PlatoSession], Awaitable[Any]] | Async function that processes a task |
trial_count | int | Number of trials per task |
timeout | int | Overall timeout in milliseconds |
max_concurrency | int | Maximum concurrent task executions |
custom_browser | Optional[Callable[[Task], Awaitable[str]]] | Function returning a CDP URL |
custom_scores | List[Callable[[Dict[str, Any]], Awaitable[float]]] | Custom scoring functions |
PlatoSession
Method | Description |
---|---|
start(plato: Plato, task: Task) | Starts a new browser session |
terminate(plato: Plato, session_id: str) | Terminates an API-created session |
log(message: str) | Sends a log message |
score() | Sends computed score |
close() | Closes the browser session |
Custom Browser Integration
When using your own browser management (e.g., Playwright), provide a custom_browser function in your configuration. This function should:
- Accept a Task parameter
- Return a CDP URL (Chrome DevTools Protocol WebSocket URL)
- Handle browser lifecycle management
The SDK will automatically use this function during session initialization.
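The requirements above can be sketched with the standard library alone. This example assumes Chrome was started separately with `--remote-debugging-port=9222`, so browser launch and shutdown are left to that external process. It reads the WebSocket URL from Chrome's `/json/version` debugging endpoint. The port, the helper names, and the use of `Any` in place of the SDK's `Task` type are assumptions.

```python
import asyncio
import json
import urllib.request
from typing import Any

# Assumption: a Chrome instance is already listening on this debugging port.
DEBUG_ENDPOINT = "http://127.0.0.1:9222/json/version"


def extract_cdp_url(version_info: dict) -> str:
    """Pull the browser-level WebSocket URL out of the /json/version payload."""
    url = version_info.get("webSocketDebuggerUrl", "")
    if not url.startswith(("ws://", "wss://")):
        raise ValueError(f"not a CDP WebSocket URL: {url!r}")
    return url


def _fetch_version_info() -> dict:
    # Blocking HTTP request; called from a worker thread below.
    with urllib.request.urlopen(DEBUG_ENDPOINT, timeout=5) as response:
        return json.load(response)


async def custom_browser(task: Any) -> str:
    """Accepts a task and returns a CDP URL, as the configuration expects."""
    version_info = await asyncio.to_thread(_fetch_version_info)
    return extract_cdp_url(version_info)
```

Keeping the URL extraction in its own function makes it easy to check without a running browser. A Playwright-based setup would replace `_fetch_version_info` with its own launch logic.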
Custom Scores
Custom scoring functions let you define metrics for evaluating task performance. Each function is async, receives the task output as a dictionary, and returns a float between 0 and 1.
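As a minimal sketch, the two functions below follow the `Callable[[Dict[str, Any]], Awaitable[float]]` signature from the API Reference. The keys read from the output (`success`, `title`, `price`, `url`) are hypothetical, since the actual shape of the output dictionary depends on what your task function returns.

```python
import asyncio
from typing import Any, Dict


async def completion_score(output: Dict[str, Any]) -> float:
    """Return 1.0 when the hypothetical 'success' flag is truthy, else 0.0."""
    return 1.0 if output.get("success") else 0.0


async def field_coverage_score(output: Dict[str, Any]) -> float:
    """Fraction of expected fields that are present and non-empty."""
    expected = ("title", "price", "url")  # hypothetical field names
    found = sum(1 for key in expected if output.get(key) not in (None, ""))
    return found / len(expected)
```

Both functions stay within the 0 to 1 range by construction, so they could be passed together in the custom_scores list.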
Eval Results
The evaluation results contain detailed information about task execution and scoring:
Contributing
We welcome contributions and feedback! Feel free to open issues or submit pull requests on our GitHub repository.
License
This SDK is available under the MIT License.