
ClearTrace Client

Attributes

| Name | Type | Description |
| --- | --- | --- |
| model_id | str \| None | Model ID for the current job. |
| feature_names | list[str] \| None | Column names captured during fit. |
| status | StatusResult | Current job status. |

Methods

outerproduct.client.ClearTrace

Client for the OuterProduct ClearTrace API.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| base_url | str | API server URL. Defaults to the OUTERPRODUCT_API_URL env var, then https://cleartrace-api.onrender.com. | None |
| api_key | str | Bearer token. Defaults to the OUTERPRODUCT_API_KEY env var. | None |
| timeout | float | Max seconds for HTTP requests and async job polling. | 300 |
| poll_interval | float | Initial delay between status polls (exponential backoff). | 2.0 |
| max_retries | int | Retries on transient HTTP errors (502/503/504). | 3 |
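The interplay of poll_interval and timeout can be pictured with a small local sketch. This is only an illustration of the documented exponential backoff, not the client's actual implementation; the growth factor and delay cap are assumptions.

```python
def poll_delays(poll_interval=2.0, timeout=300.0, factor=2.0, max_delay=30.0):
    """Illustrative backoff schedule: start at poll_interval, grow each
    poll, and stop once the timeout budget would be exceeded.
    factor and max_delay are assumed values, not documented ones."""
    delays, elapsed, delay = [], 0.0, poll_interval
    while elapsed + delay <= timeout:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, max_delay)
    return delays

print(poll_delays()[:4])  # with the defaults: [2.0, 4.0, 8.0, 16.0]
```

With the default settings, the first polls come quickly and later ones back off until the 300-second budget is spent.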

Methods:

| Name | Description |
| --- | --- |
| create_upload | Request a presigned upload URL from the API. |
| upload_fileobj | Upload data to S3 using a presigned URL. |
| upload_file | Upload a data file to S3 via a presigned URL. |
| fit | Train a ClearTrace model on labelled data. |
| fit_distill | Distill a black-box model via its predict URL. |
| predict | Batch predictions for X. Returns a numpy array. |
| explain | Batch prediction + AGOP-based explanation. |
| predict_and_explain | Batch predict and explain in one call. |
| interpret | Global feature importance via AGOP. |
| get_schema | Retrieve the persisted schema manifest for the loaded model. |
| scenario | Counterfactual search with constraints. |
| segment | Supervised segmentation (async). |
| get_segments | Retrieve completed segmentation results. |
| narrative | LLM-generated natural language summary (async). |
| get_narrative | Retrieve completed narrative results. |
| health | Check API server health. |
| load | Attach to an existing model by ID. |

create_upload(file_format, *, model_id=None)

Request a presigned upload URL from the API.

This is the first step of a multi-step upload flow. The returned UploadResult contains the upload_url to PUT data to.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_format | str | File format: "csv", "parquet", or "pkl". | required |
| model_id | str | Custom model ID. Auto-generated by the server if omitted. | None |

Returns:

UploadResult

upload_fileobj(upload, fileobj)

Upload data to S3 using a presigned URL.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| upload | UploadResult | The result returned by create_upload. | required |
| fileobj | bytes or binary file-like object | The data to upload. Pass raw bytes or an open file in binary mode. | required |
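The create_upload / upload_fileobj pair composes into a two-step flow. To keep this sketch self-contained and offline, StubClient stands in for a real ClearTrace instance, and the dict it returns is a stand-in for UploadResult.

```python
def upload_bytes(client, data: bytes, file_format: str = "csv"):
    """Two-step flow: request a presigned URL, then upload the bytes to it."""
    upload = client.create_upload(file_format)
    client.upload_fileobj(upload, data)
    return upload

class StubClient:
    """Stand-in for ClearTrace, recording calls (illustration only)."""
    def __init__(self):
        self.calls = []
    def create_upload(self, file_format, *, model_id=None):
        self.calls.append(("create_upload", file_format))
        return {"upload_url": "https://example.invalid/presigned"}
    def upload_fileobj(self, upload, fileobj):
        self.calls.append(("upload_fileobj", len(fileobj)))

stub = StubClient()
upload_bytes(stub, b"a,b\n1,2\n")
print([name for name, _ in stub.calls])  # ['create_upload', 'upload_fileobj']
```

With a real client, the same upload_bytes function would work unchanged.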

upload_file(path)

Upload a data file to S3 via a presigned URL.

Convenience method that combines create_upload and upload_fileobj into a single call.

The file is uploaded as raw bytes — CSV, Parquet, or pickle format is auto-detected from the file extension.

After uploading, call fit or fit_distill with target to specify the label column in the uploaded file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str or PathLike | Path to the file to upload. Accepted formats: .csv, .parquet, .pq, .pkl, .pickle. | required |

Returns:

UploadResult
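The extension-based detection can be mirrored with a small local helper. The mapping below is an assumption derived from the accepted extensions listed above; the client's own detection logic may differ.

```python
from pathlib import Path

# Hypothetical mapping from accepted extensions to the file_format
# values used by create_upload.
_EXT_TO_FORMAT = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".pq": "parquet",
    ".pkl": "pkl",
    ".pickle": "pkl",
}

def detect_file_format(path):
    ext = Path(path).suffix.lower()
    if ext not in _EXT_TO_FORMAT:
        raise ValueError(f"unsupported file extension: {ext!r}")
    return _EXT_TO_FORMAT[ext]

print(detect_file_format("train.pq"))  # parquet
```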

fit(data=None, *, target, feature_fields=None, feature_schema=None, wait=True, **config)

Train a ClearTrace model on labelled data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | DataFrame, ndarray, or None | Dataset containing features (and optionally the target column). Omit when using pre-uploaded data (see upload_file). | None |
| target | str, array-like, or None | If data is a DataFrame, the name of the target column. If data is an ndarray, the target values directly. Omit when using pre-uploaded data. | required |
| feature_fields | list[str] | Column names to use as features. Only applies when data is a DataFrame. If omitted, all columns except target are used. | None |
| wait | bool | Block until training completes. | True |
| \*\*config | Any | Forwarded to the server as flat fields (e.g. n_hyperopt_steps=6, mode="fast", base_model_type="xgboost"). | {} |

Returns:

JobResult
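The interaction between target and feature_fields for DataFrame inputs can be mirrored locally. default_feature_fields below is a hypothetical helper, not part of the client; it just restates the documented default.

```python
def default_feature_fields(columns, target, feature_fields=None):
    """Mirror the documented default: when feature_fields is omitted,
    every column except target is used as a feature."""
    if feature_fields is not None:
        return list(feature_fields)
    return [c for c in columns if c != target]

print(default_feature_fields(["age", "income", "label"], "label"))
# ['age', 'income']
```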

fit_distill(data=None, predict_url=None, *, target=None, predict_headers=None, labels=None, feature_fields=None, feature_schema=None, wait=True, **config)

Distill a black-box model via its predict URL.

The server calls predict_url to obtain teacher predictions, then trains an xRFM student model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | DataFrame, ndarray, or None | Feature matrix. Omit when using pre-uploaded data (see upload_file). | None |
| predict_url | str | URL of the black-box model's predict endpoint. | None |
| target | str | Name of the label column in the uploaded file. Only used with pre-uploaded data. Optional for distill (the teacher predictions can drive training alone). | None |
| predict_headers | dict[str, str] | Headers to include when calling predict_url. | None |
| labels | array-like | Optional ground-truth labels for evaluation. | None |
| feature_fields | list[str] | Column names to use as features. Only applies when data is a DataFrame. If omitted, all columns are used. | None |
| wait | bool | Block until training completes. | True |
| \*\*config | Any | Forwarded to the server (same options as fit). | {} |

Returns:

JobResult

predict(X)

Batch predictions for X. Returns a numpy array.

explain(X, *, feature_names=None, use_sqrt=True, raw_gradient=True)

Batch prediction + AGOP-based explanation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | DataFrame, ndarray, or nested list | 2-D feature matrix (n_samples, n_features). | required |
| feature_names | list[str] | Feature names. Auto-extracted from DataFrame columns. | None |
| use_sqrt | bool | Use sqrt scaling. | True |
| raw_gradient | bool | Return raw gradient. | True |

Returns:

ExplanationResult

predict_and_explain(X, *, feature_names=None, use_sqrt=False, raw_gradient=True, with_persona=False, rule_kwargs=None)

Batch predict and explain in one call.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | DataFrame, ndarray, or nested list | 2-D feature matrix (n_samples, n_features). | required |
| feature_names | list[str] | Feature names. Auto-extracted from DataFrame columns. | None |
| use_sqrt | bool | Use sqrt scaling. | False |
| raw_gradient | bool | Return raw gradient. | True |
| with_persona | bool | Include persona information. | False |
| rule_kwargs | dict[str, Any] | When provided, enables local-rule computation. Pass {} for library defaults, or a populated dict (e.g. {"selector": "lift_threshold", "lift_threshold": 0.9}). Omit or pass None to skip rules. | None |

Returns:

PredictAndExplainResult
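The three rule_kwargs modes described above look like this in practice; the selector name and threshold are the example values from the parameter description, not a complete list of options.

```python
# Omit local-rule computation entirely:
skip_rules = None

# Enable local rules with library defaults:
default_rules = {}

# Enable local rules with explicit settings (example values from the docs):
custom_rules = {"selector": "lift_threshold", "lift_threshold": 0.9}

print(custom_rules["lift_threshold"])  # 0.9
```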

interpret()

Global feature importance via AGOP.

Returns:

InterpretResult

get_schema()

Retrieve the persisted schema manifest for the loaded model.

Returns:

SchemaResult

scenario(queries, *, feature_names=None, desired_class=1, n_walks=500, max_steps=30, epsilon=0.2, random_state=42, constraints=None)

Counterfactual search with constraints.

Finds counterfactual points that flip the model prediction to desired_class while respecting any feature constraints.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| queries | DataFrame, ndarray, list[dict], or nested list | Query records. Accepts a pandas DataFrame, numpy array, list of {feature_name: value} dicts, or a 2-D nested list. When dicts or a DataFrame are passed, feature names are extracted automatically. | required |
| feature_names | list[str] | Feature names. Auto-extracted from DataFrame columns or dict keys. Required when queries is an ndarray or nested list. | None |
| desired_class | int | Target class for the counterfactual. | 1 |
| n_walks | int | Number of random walks. | 500 |
| max_steps | int | Maximum steps per walk. | 30 |
| epsilon | float | Step size. | 0.2 |
| random_state | int or None | Random seed for reproducibility. | 42 |
| constraints | dict[str, dict[str, Any]] | Per-feature constraints. Each value may contain keys: immutable (bool), monotonic ("increase"/"decrease"), value_range ([min, max]), allowed_values (list). | None |

Returns:

ScenarioResult

Supports indexing (result[i]) and iteration to access individual ScenarioResultItem entries.

Examples:

From a DataFrame:

>>> result = ct.scenario(df[["age", "income"]].head(2))

From dicts:

>>> result = ct.scenario(
...     [{"age": 25, "income": 50000}],
...     constraints={"age": {"immutable": True}},
... )
>>> result[0].baseline_prediction
0.3
>>> for candidate in result[0]:
...     print(candidate.changes)
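A constraints mapping that exercises all four documented keys might look like the following; the feature names and values here are hypothetical.

```python
# Hypothetical features; each inner dict uses only the documented keys.
constraints = {
    "age": {"immutable": True},                    # never change this feature
    "income": {"monotonic": "increase",            # may only go up...
               "value_range": [0, 200_000]},       # ...and must stay in range
    "plan": {"allowed_values": ["basic", "pro"]},  # restrict to listed values
}

print(sorted(constraints))  # ['age', 'income', 'plan']
```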

segment(*, data=None, target_values=None, feature_names=None, min_clusters=4, max_clusters=10, n_search_steps=50, use_agent=None, kpi_field=None, problem_context=None, wait=True)

Supervised segmentation (async).

Groups the data into clusters where each cluster has distinct explanation patterns.

If wait=True (default), polls GET /v1/models/{model_id}/segments until complete and returns a SegmentationResult. If wait=False, returns a JobResult immediately.

get_segments()

Retrieve completed segmentation results.

narrative(data, *, feature_names=None, kpi_name, context=None, max_tool_calls=6, wait=True)

LLM-generated natural language summary (async).

If wait=True (default), polls until complete and returns a Narrative. If wait=False, returns a JobResult.

get_narrative()

Retrieve completed narrative results.

health()

Check API server health.

load(model_id)

Attach to an existing model by ID.