
ClearTrace Client

Attributes

| Name | Type | Description |
| --- | --- | --- |
| model_id | str \| None | Model ID for the current job. |
| feature_names | list[str] \| None | Column names captured during fit. |
| status | StatusResult | Current job status. |

Methods

outerproduct.client.ClearTrace

Client for the OuterProduct ClearTrace API.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| base_url | str | API server URL. Defaults to the OUTERPRODUCT_API_URL env var, then https://cleartrace-api.onrender.com. | None |
| api_key | str | Bearer token. Defaults to the OUTERPRODUCT_API_KEY env var. | None |
| timeout | float | Max seconds for HTTP requests and async job polling. | 300 |
| poll_interval | float | Initial delay between status polls (exponential backoff). | 2.0 |
| max_retries | int | Retries on transient HTTP errors (502/503/504). | 3 |
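The interplay of poll_interval and timeout can be pictured with a small local sketch. This is only an illustration of the documented exponential backoff, not the client's actual implementation; the growth factor and delay cap are assumptions.

```python
def poll_delays(poll_interval=2.0, timeout=300.0, factor=2.0, max_delay=30.0):
    """Illustrative backoff schedule: start at poll_interval, grow each
    poll, and stop once the timeout budget would be exceeded.
    factor and max_delay are assumed values, not documented ones."""
    delays, elapsed, delay = [], 0.0, poll_interval
    while elapsed + delay <= timeout:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, max_delay)
    return delays

print(poll_delays()[:4])  # with the defaults: [2.0, 4.0, 8.0, 16.0]
```

With the default settings, the first polls come quickly and later ones back off until the 300-second budget is spent.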

Methods:

| Name | Description |
| --- | --- |
| create_upload | Request a presigned upload URL from the API. |
| upload_fileobj | Upload data to S3 using a presigned URL. |
| upload_file | Upload a data file to S3 via a presigned URL. |
| fit | Train a ClearTrace model on labelled data. |
| fit_distill | Distill a black-box model via its predict URL. |
| predict | Batch predictions for X. Returns a numpy array. |
| explain | Batch prediction + AGOP-based explanation. |
| predict_and_explain | Batch predict and explain in one call. |
| interpret | Global feature importance via AGOP. |
| get_schema | Retrieve the persisted schema manifest for the loaded model. |
| scenario | Counterfactual search with constraints. |
| segment | Supervised segmentation (async). |
| get_segments | Retrieve completed segmentation results. |
| narrative | LLM-generated natural language summary (async). |
| get_narrative | Retrieve completed narrative results. |
| health | Check API server health. |
| load | Attach to an existing model by ID. |

create_upload(file_format, *, model_id=None)

Request a presigned upload URL from the API.

This is the first step of a multi-step upload flow. The returned UploadResult contains the upload_url to PUT data to.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_format | str | File format: "csv", "parquet", or "pkl". | required |
| model_id | str | Custom model ID. Auto-generated by the server if omitted. | None |

Returns:

UploadResult

upload_fileobj(upload, fileobj)

Upload data to S3 using a presigned URL.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| upload | UploadResult | The result returned by create_upload. | required |
| fileobj | bytes or binary file-like object | The data to upload. Pass raw bytes or an open file in binary mode. | required |
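The create_upload / upload_fileobj pair composes into a two-step flow. To keep this sketch self-contained and offline, StubClient stands in for a real ClearTrace instance, and the dict it returns is a stand-in for UploadResult.

```python
def upload_bytes(client, data: bytes, file_format: str = "csv"):
    """Two-step flow: request a presigned URL, then upload the bytes to it."""
    upload = client.create_upload(file_format)
    client.upload_fileobj(upload, data)
    return upload

class StubClient:
    """Stand-in for ClearTrace, recording calls (illustration only)."""
    def __init__(self):
        self.calls = []
    def create_upload(self, file_format, *, model_id=None):
        self.calls.append(("create_upload", file_format))
        return {"upload_url": "https://example.invalid/presigned"}
    def upload_fileobj(self, upload, fileobj):
        self.calls.append(("upload_fileobj", len(fileobj)))

stub = StubClient()
upload_bytes(stub, b"a,b\n1,2\n")
print([name for name, _ in stub.calls])  # ['create_upload', 'upload_fileobj']
```

With a real client, the same upload_bytes function would work unchanged.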

upload_file(path)

Upload a data file to S3 via a presigned URL.

Convenience method that combines create_upload and upload_fileobj into a single call.

The file is uploaded as raw bytes — CSV, Parquet, or pickle format is auto-detected from the file extension.

After uploading, call fit or fit_distill with target to specify the label column in the uploaded file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str or PathLike | Path to the file to upload. Accepted formats: .csv, .parquet, .pq, .pkl, .pickle. | required |

Returns:

UploadResult
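The extension-based detection can be mirrored with a small local helper. The mapping below is an assumption derived from the accepted extensions listed above; the client's own detection logic may differ.

```python
from pathlib import Path

# Hypothetical mapping from accepted extensions to the file_format
# values used by create_upload.
_EXT_TO_FORMAT = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".pq": "parquet",
    ".pkl": "pkl",
    ".pickle": "pkl",
}

def detect_file_format(path):
    ext = Path(path).suffix.lower()
    if ext not in _EXT_TO_FORMAT:
        raise ValueError(f"unsupported file extension: {ext!r}")
    return _EXT_TO_FORMAT[ext]

print(detect_file_format("train.pq"))  # parquet
```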

fit(data=None, *, target, feature_fields=None, feature_schema=None, wait=True, **config)

Train a ClearTrace model on labelled data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | DataFrame, ndarray, or None | Dataset containing features (and optionally the target column). Omit when using pre-uploaded data (see upload_file). | None |
| target | str, array-like, or None | If data is a DataFrame, the name of the target column. If data is an ndarray, the target values directly. Omit when using pre-uploaded data. | required |
| feature_fields | list[str] | Column names to use as features. Only applies when data is a DataFrame. If omitted, all columns except target are used. | None |
| wait | bool | Block until training completes. | True |
| \*\*config | Any | Forwarded to the server as flat fields (e.g. n_hyperopt_steps=6, mode="fast", base_model_type="xgboost"). | {} |

Returns:

JobResult
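The interaction between target and feature_fields for DataFrame inputs can be mirrored locally. default_feature_fields below is a hypothetical helper, not part of the client; it just restates the documented default.

```python
def default_feature_fields(columns, target, feature_fields=None):
    """Mirror the documented default: when feature_fields is omitted,
    every column except target is used as a feature."""
    if feature_fields is not None:
        return list(feature_fields)
    return [c for c in columns if c != target]

print(default_feature_fields(["age", "income", "label"], "label"))
# ['age', 'income']
```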

fit_distill(data=None, predict_url=None, *, target=None, predict_headers=None, labels=None, feature_fields=None, feature_schema=None, wait=True, **config)

Distill a black-box model via its predict URL.

The server calls predict_url to obtain teacher predictions, then trains an xRFM student model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | DataFrame, ndarray, or None | Feature matrix. Omit when using pre-uploaded data (see upload_file). | None |
| predict_url | str | URL of the black-box model's predict endpoint. | None |
| target | str | Name of the label column in the uploaded file. Only used with pre-uploaded data. Optional for distill (the teacher predictions can drive training alone). | None |
| predict_headers | dict[str, str] | Headers to include when calling predict_url. | None |
| labels | array-like | Optional ground-truth labels for evaluation. | None |
| feature_fields | list[str] | Column names to use as features. Only applies when data is a DataFrame. If omitted, all columns are used. | None |
| wait | bool | Block until training completes. | True |
| \*\*config | Any | Forwarded to the server (same options as fit). | {} |

Returns:

JobResult

predict(X)

Batch predictions for X. Returns a numpy array.

explain(X, *, feature_names=None, use_sqrt=True, raw_gradient=True)

Batch prediction + AGOP-based explanation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | DataFrame, ndarray, or nested list | 2-D feature matrix (n_samples, n_features). | required |
| feature_names | list[str] | Feature names. Auto-extracted from DataFrame columns. | None |
| use_sqrt | bool | Use sqrt scaling. | True |
| raw_gradient | bool | Return raw gradient. | True |

Returns:

ExplanationResult

predict_and_explain(X, *, feature_names=None, use_sqrt=False, raw_gradient=True, with_persona=False, rule_kwargs=None)

Batch predict and explain in one call.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | DataFrame, ndarray, or nested list | 2-D feature matrix (n_samples, n_features). | required |
| feature_names | list[str] | Feature names. Auto-extracted from DataFrame columns. | None |
| use_sqrt | bool | Use sqrt scaling. | False |
| raw_gradient | bool | Return raw gradient. | True |
| with_persona | bool | Include persona information. | False |
| rule_kwargs | dict[str, Any] | When provided, enables local-rule computation. Pass {} for library defaults, or a populated dict (e.g. {"selector": "lift_threshold", "lift_threshold": 0.9}). Omit or pass None to skip rules. | None |

Returns:

PredictAndExplainResult
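The three rule_kwargs modes described above look like this in practice; the selector name and threshold are the example values from the parameter description, not a complete list of options.

```python
# Omit local-rule computation entirely:
skip_rules = None

# Enable local rules with library defaults:
default_rules = {}

# Enable local rules with explicit settings (example values from the docs):
custom_rules = {"selector": "lift_threshold", "lift_threshold": 0.9}

print(custom_rules["lift_threshold"])  # 0.9
```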

interpret()

Global feature importance via AGOP.

Returns:

InterpretResult

get_schema()

Retrieve the persisted schema manifest for the loaded model.

Returns:

SchemaResult

scenario(queries, *, feature_names=None, desired_class=1, n_walks=500, max_steps=30, epsilon=0.2, random_state=42, constraints=None)

Counterfactual search with constraints.

Finds counterfactual points that flip the model prediction to desired_class while respecting any feature constraints.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| queries | DataFrame, ndarray, list[dict], or nested list | Query records. Accepts a pandas DataFrame, numpy array, list of {feature_name: value} dicts, or a 2-D nested list. When dicts or a DataFrame are passed, feature names are extracted automatically. | required |
| feature_names | list[str] | Feature names. Auto-extracted from DataFrame columns or dict keys. Required when queries is an ndarray or nested list. | None |
| desired_class | int | Target class for the counterfactual. | 1 |
| n_walks | int | Number of random walks. | 500 |
| max_steps | int | Maximum steps per walk. | 30 |
| epsilon | float | Step size. | 0.2 |
| random_state | int or None | Random seed for reproducibility. | 42 |
| constraints | dict[str, dict[str, Any]] | Per-feature constraints. Each value may contain keys: immutable (bool), monotonic ("increase"/"decrease"), value_range ([min, max]), allowed_values (list). | None |

Returns:

ScenarioResult

Supports indexing (result[i]) and iteration to access individual ScenarioResultItem entries.

Examples:

From a DataFrame:

>>> result = ct.scenario(df[["age", "income"]].head(2))

From dicts:

>>> result = ct.scenario(
...     [{"age": 25, "income": 50000}],
...     constraints={"age": {"immutable": True}},
... )
>>> result[0].baseline_prediction
0.3
>>> for candidate in result[0]:
...     print(candidate.changes)
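A constraints mapping that exercises all four documented keys might look like the following; the feature names and values here are hypothetical.

```python
# Hypothetical features; each inner dict uses only the documented keys.
constraints = {
    "age": {"immutable": True},                    # never change this feature
    "income": {"monotonic": "increase",            # may only go up...
               "value_range": [0, 200_000]},       # ...and must stay in range
    "plan": {"allowed_values": ["basic", "pro"]},  # restrict to listed values
}

print(sorted(constraints))  # ['age', 'income', 'plan']
```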

segment(*, data=None, target_values=None, feature_names=None, min_clusters=4, max_clusters=10, n_search_steps=50, use_agent=None, kpi_field=None, problem_context=None, wait=True)

Supervised segmentation (async).

Groups the data into clusters where each cluster has distinct explanation patterns.

If wait=True (default), polls GET /v1/models/{model_id}/segments until complete and returns a SegmentationResult. If wait=False, returns a JobResult immediately.

get_segments()

Retrieve completed segmentation results.

narrative(data, *, feature_names=None, kpi_name, context=None, max_tool_calls=6, wait=True)

LLM-generated natural language summary (async).

If wait=True (default), polls until complete and returns a Narrative. If wait=False, returns a JobResult.

get_narrative()

Retrieve completed narrative results.

health()

Check API server health.

load(model_id)

Attach to an existing model by ID.