The outerproduct.reasoning module is the entry point for training reasoning models and analysing their behaviour across populations. op.reasoning.fit() trains a ReasoningModel that produces both predictions and feature-level attributions. op.reasoning.pattern_tracker.fit() takes a fitted ReasoningModel and condenses its explanations over a target prediction band into a small set of named, executable filter patterns you can apply to any new data.

op.reasoning.fit()

op.reasoning.fit(
    dataset: Dataset,
    task: Task | None = None,
    teacher: Model | Predictor | None = None,
    model_types: list[str] | None = None,
    metric: Metric | list[Metric] | None = None,
    n_hyperopt_steps: int = 5,
    random_state: int = 42,
) -> ReasoningFitJob
Submits a training job and returns a non-blocking Job handle immediately. Call .wait() on the handle to block until training completes and receive the trained ReasoningModel.

Parameters

dataset
Dataset
required
Training data wrapped in an op.Dataset. Build one with op.LocalDataset.from_pandas(df).upload(), op.LocalDataset.from_csv(...).upload(), or any other LocalDataset constructor, or reference a connector. The column you name in task’s label_column (or the labels supplied by teacher) is used as the target; all remaining columns are treated as features.
task
Task | None
The supervised learning task, which carries the target label_column. Pass the config that matches your problem:
Config | Use when
op.Binclass(label_column=...) | Binary classification (e.g. churn, fraud, approval)
op.Multiclass(label_column=...) | Multi-class classification (three or more discrete classes)
op.Regression(label_column=...) | Continuous numerical target
op.Forecasting(label_column=..., id_column=..., timestamp_column=..., horizon=..., lookback=...) | Time-series forecasting
op.SequenceBinclass(label_column=..., id_column=..., timestamp_column=...) | Per-entity binary classification over each entity’s sequence
op.SequenceMulticlass(label_column=..., id_column=..., timestamp_column=...) | Per-entity multi-class classification over each entity’s sequence
op.SequenceRegression(label_column=..., id_column=..., timestamp_column=...) | Per-entity regression over each entity’s sequence (accepted, not yet executable)
Required unless teacher is provided. When a teacher is set, OuterProduct queries it to generate training labels and task becomes optional.
model_types
list[str] | None
Restrict the model family search to a specific list, e.g. ["tabm", "xgboost"]. When None, OuterProduct selects from its full portfolio of model families. Providing this list is useful when you have a latency or interpretability constraint that rules out certain architectures.
metric
Metric | list[Metric] | None
The optimization target(s). Pass a single op.Metric to optimize directly, or a list to trigger a Pareto sweep across their weightings. When None, OuterProduct picks a sensible default for the task.
n_hyperopt_steps
int
default:"5"
Number of hyperparameter optimisation trials to run. More trials can improve model quality at the cost of longer training time. Defaults to 5.
random_state
int
default:"42"
Seed for the training search, for reproducible runs. Defaults to 42.
teacher
Model | Predictor | None
A teacher model for knowledge distillation. Accepts either a previously trained OuterProduct Model / ReasoningModel or a Predictor wrapping an external HTTP scoring endpoint. When set, OuterProduct trains the new ReasoningModel to mimic the teacher’s output, adding full reasoning to a black-box predictor.

Return value

returns
Job[ReasoningModel]
A non-blocking job handle. The job runs on OuterProduct’s hosted infrastructure. Call .wait() to block until training completes and receive the ReasoningModel, or poll progress with .status().

Examples

import outerproduct as op
import pandas as pd

op.init(api_key="your-api-key")

dataset = op.LocalDataset.from_csv("customers.csv").upload()

model = op.reasoning.fit(
    dataset,
    task=op.Binclass(label_column="churn"),
).wait()  # ReasoningModel

# Upload the new rows once and reuse the same Dataset for both calls.
X_new = op.LocalDataset.from_pandas(pd.read_csv("new_customers.csv")).upload()

predictions = model.predict(X_new)
reasoning = model.explain(X_new)
For large datasets or long hyperopt runs, use the non-blocking pattern and poll job.status() so your script can do other work while training proceeds.
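The non-blocking pattern can be sketched as a small polling helper. This is only a sketch under stated assumptions: the reference confirms that the job handle exposes .status() and .wait(), but the status strings used here ("running", "succeeded", "failed"), the poll interval, and the FakeJob stand-in are hypothetical, chosen so the snippet runs without contacting the service.

```python
import time


class FakeJob:
    """Hypothetical stand-in for a job handle: 'running' a few times, then 'succeeded'."""

    def __init__(self, polls_until_done: int = 3) -> None:
        self._remaining = polls_until_done

    def status(self) -> str:
        if self._remaining > 0:
            self._remaining -= 1
            return "running"
        return "succeeded"

    def wait(self) -> str:
        # A real handle would return the trained model here.
        return "trained-model"


def poll_until_done(job, interval_s: float = 0.01, timeout_s: float = 5.0,
                    terminal=("succeeded", "failed")):
    """Poll job.status() until a terminal state (assumed names), then return job.wait()."""
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        state = job.status()
        polls += 1
        if state in terminal:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{state}' after {timeout_s}s")
        # Other work can happen here while the job runs remotely.
        time.sleep(interval_s)
    if state == "failed":
        raise RuntimeError("job reported failure")
    return job.wait(), polls


result, n_polls = poll_until_done(FakeJob(polls_until_done=3))
print(result, n_polls)
```

With a real handle, the object returned by op.reasoning.fit(...) would take the place of FakeJob, once you have checked which status values the library actually reports.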

op.reasoning.pattern_tracker.fit()

op.reasoning.pattern_tracker.fit(
    model: ReasoningModel,
    dataset: Dataset,
    target_range: tuple[float | None, float | None],
) -> Job[PatternTracker]
Distils a ReasoningModel’s explanation behaviour on a specific prediction band into a compact, portable set of named filter patterns. The fitted PatternTracker can then be applied to any schema-compatible dataset to score rows against those patterns.

Parameters

model
ReasoningModel
required
A trained ReasoningModel (produced by op.reasoning.fit().wait()). The tracker learns from this model’s explanations over the supplied dataset.
dataset
Dataset
required
The dataset used to fit the tracker. The tracker analyses the model’s predictions and attributions over these rows to extract recurring patterns within the target_range.
target_range
tuple[float | None, float | None]
required
An inclusive prediction band that defines which rows are considered “positive” examples for pattern extraction. Either bound may be None for an open-ended range; at least one bound must be set.
target_range | Selects
(0.5, None) | pred >= 0.5, likely-positive cohort
(None, 0.5) | pred <= 0.5, likely-negative cohort
(0.4, 0.6) | 0.4 <= pred <= 0.6, borderline / uncertain band
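The band semantics in the table above can be written out as a short membership test. This is an illustrative sketch in plain Python, not the library's implementation; in_target_range is a hypothetical helper, and the prediction values are made up.

```python
from typing import Optional, Tuple

Band = Tuple[Optional[float], Optional[float]]


def in_target_range(pred: float, target_range: Band) -> bool:
    """Return True if pred lies in the inclusive band; None means 'no bound on this side'."""
    low, high = target_range
    if low is None and high is None:
        raise ValueError("at least one bound must be set")
    if low is not None and pred < low:
        return False
    if high is not None and pred > high:
        return False
    return True


preds = [0.10, 0.40, 0.50, 0.60, 0.90]

likely_positive = [p for p in preds if in_target_range(p, (0.5, None))]
likely_negative = [p for p in preds if in_target_range(p, (None, 0.5))]
borderline = [p for p in preds if in_target_range(p, (0.4, 0.6))]

print(likely_positive)  # [0.5, 0.6, 0.9]
print(likely_negative)  # [0.1, 0.4, 0.5]
print(borderline)       # [0.4, 0.5, 0.6]
```

Because both bounds are inclusive, a prediction of exactly 0.5 falls in both (0.5, None) and (None, 0.5).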

Return value

returns
Job[PatternTracker]
A non-blocking job handle. Call .wait() to block until fitting completes and receive the PatternTracker. You can also poll with .status() or access the raw payload with .results().

Example

import outerproduct as op

op.init(api_key="your-api-key")

dataset = op.LocalDataset.from_csv("customers.csv").upload()

model = op.reasoning.fit(
    dataset, task=op.Binclass(label_column="churn")
).wait()

pt = op.reasoning.pattern_tracker.fit(
    model,
    dataset,
    target_range=(0.5, None),  # analyse rows where pred >= 0.5
).wait()

print(f"{len(pt.patterns)} patterns; coverage={pt.coverage_fit:.0%}")

PatternTracker

PatternTracker is produced by op.reasoning.pattern_tracker.fit().wait(). It holds a set of named filter patterns and can score any schema-compatible dataset against them.

Attributes

patterns
list[FilterPattern]
The patterns discovered during fitting. Each entry is a FilterPattern with a human-readable label and quality metrics. See FilterPattern below.
coverage_fit
float
Fraction of rows in the fitting dataset that are matched by at least one pattern. A value of 0.82 means 82% of the fitting set falls under at least one named pattern.
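To make the coverage definition concrete, here is a minimal sketch that computes it by hand from a toy match matrix. The matrix values and the coverage helper are hypothetical; the sketch only mirrors the definition given above and does not claim to be how the library computes coverage_fit.

```python
# Toy match matrix: one inner list per fitting-set row, one boolean per pattern.
matches = [
    [True, False],   # matches pattern 0 only
    [False, False],  # matches no pattern
    [True, True],    # matches both patterns
    [False, True],   # matches pattern 1 only
    [False, False],  # matches no pattern
]


def coverage(match_rows):
    """Fraction of rows matched by at least one pattern."""
    if not match_rows:
        return 0.0
    covered = sum(1 for row in match_rows if any(row))
    return covered / len(match_rows)


coverage_fit = coverage(matches)  # 3 of the 5 rows match at least one pattern
print(f"coverage={coverage_fit:.0%}")
```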

Methods

pt.transform()

pt.transform(X: Dataset) -> pd.DataFrame
Returns a boolean DataFrame of shape (n_rows, n_patterns). Each column corresponds to one pattern (named by FilterPattern.label); a cell is True if that row matches that pattern. Rows can match multiple patterns simultaneously.
X
Dataset
required
New data to score. Must be schema-compatible with the dataset used to fit the tracker.
returns
pd.DataFrame
Boolean DataFrame of shape (n_rows, n_patterns). Column names match the FilterPattern.label values in pt.patterns.

pt.distribution()

pt.distribution(X: Dataset) -> pd.Series
Returns match rates (the fraction of rows in X that match each pattern) as a pd.Series indexed by pattern label.
X
Dataset
required
New data to score.
returns
pd.Series
Float Series of shape (n_patterns,), indexed by FilterPattern.label. Values are in [0, 1].

pt.partition()

pt.partition(X: Dataset) -> dict[str, np.ndarray]
Returns a dict mapping each pattern label to the integer row indices in X that match it. Useful when you want to extract the actual rows belonging to each pattern segment.
X
Dataset
required
New data to score.
returns
dict[str, np.ndarray]
Mapping from pattern label to a 1-D integer array of matching row indices. A row may appear under multiple pattern labels.
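The three methods can be read as different views of the same row-by-pattern matches. The sketch below shows that relationship in plain Python, assuming distribution() and partition() are consistent with the boolean matrix from transform(); the reference describes each output but not how it is computed. The labels and match values are made up, and the two helper functions are hypothetical stand-ins.

```python
# Column-oriented toy match matrix: pattern label -> one boolean per row of X.
match_matrix = {
    "high_credit_low_income": [True, False, True, False],
    "recent_late_payments": [False, False, True, True],
}


def distribution(matrix):
    """Match rate per pattern: the fraction of rows that are True in each column."""
    return {
        label: (sum(col) / len(col) if col else 0.0)
        for label, col in matrix.items()
    }


def partition(matrix):
    """Row indices per pattern: the positions where each column is True."""
    return {
        label: [i for i, hit in enumerate(col) if hit]
        for label, col in matrix.items()
    }


rates = distribution(match_matrix)
segments = partition(match_matrix)

print(rates)
print(segments)
```

In this toy data, row 2 appears under both labels, which is the overlap that both transform() and partition() allow.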

Full usage example

import outerproduct as op
import pandas as pd

op.init(api_key="your-api-key")

# --- Fit ---
dataset = op.LocalDataset.from_csv("customers.csv").upload()

model = op.reasoning.fit(
    dataset, task=op.Binclass(label_column="churn")
).wait()

pt = op.reasoning.pattern_tracker.fit(
    model,
    dataset,
    target_range=(0.5, None),
).wait()

# Inspect the discovered patterns
print(f"{len(pt.patterns)} patterns; coverage={pt.coverage_fit:.0%}")
for fp in pt.patterns:
    print(f"  {fp.label}: precision={fp.precision:.2f}, lift={fp.lift:.2f}")

# --- Apply to new data ---
X_new = op.LocalDataset.from_pandas(pd.read_csv("new_customers.csv")).upload()

# Boolean match matrix
match_matrix = pt.transform(X_new)
print(match_matrix.head())

# Match rates per pattern
print(pt.distribution(X_new))

# Row indices per pattern
segments = pt.partition(X_new)
for label, indices in segments.items():
    print(f"{label}: {len(indices)} matching rows")
transform(), distribution(), and partition() require X to be schema-compatible with the dataset used during pattern_tracker.fit(). Mismatched column names will raise a local validation error before any network call is made.
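If you want to catch column mismatches yourself before calling these methods, a pre-check along the following lines is one option. This is a hypothetical helper, not part of the library: the reference does not specify what "schema-compatible" requires beyond column names, so the sketch compares names only and ignores column order and dtypes (both assumptions). The column names are invented.

```python
def check_schema(fit_columns, new_columns):
    """Raise ValueError if the two column-name sets differ (order is ignored)."""
    fit_set, new_set = set(fit_columns), set(new_columns)
    missing = sorted(fit_set - new_set)
    unexpected = sorted(new_set - fit_set)
    if missing or unexpected:
        raise ValueError(
            f"schema mismatch: missing={missing}, unexpected={unexpected}"
        )


fit_columns = ["tenure", "plan", "monthly_spend"]

# Same names in a different order: passes under the name-only assumption.
check_schema(fit_columns, ["plan", "tenure", "monthly_spend"])

# A renamed column is reported as one missing and one unexpected name.
try:
    check_schema(fit_columns, ["tenure", "plan", "spend"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```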

FilterPattern

A single named pattern discovered by the PatternTracker.
label
str
A human-readable name for the pattern, generated by OuterProduct to summarise the feature conditions that define it (e.g. "high_credit_low_income").
precision
float
Fraction of rows matching this pattern whose prediction also falls within target_range. Higher precision means the pattern is a more reliable indicator of the target cohort. Values are in [0, 1].
lift
float
Ratio of the pattern’s precision to the base rate of target_range in the fitting dataset. A lift of 2.0 means rows matching this pattern are twice as likely to fall in the target band as a randomly chosen row.
for fp in pt.patterns:
    print(f"{fp.label}")
    print(f"  precision : {fp.precision:.2%}")
    print(f"  lift      : {fp.lift:.2f}x")
# high_credit_low_income
#   precision : 91.30%
#   lift      : 2.41x
# recent_late_payments
#   precision : 87.50%
#   lift      : 2.31x
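The precision and lift definitions can be checked by hand on a small example. The sketch below follows the definitions given for FilterPattern; the row flags are invented, and precision_and_lift is a hypothetical helper rather than a library function.

```python
def precision_and_lift(matched, in_band):
    """Compute a pattern's precision and lift from per-row boolean flags.

    matched[i] : row i matches the pattern
    in_band[i] : row i's prediction falls within target_range
    """
    if len(matched) != len(in_band):
        raise ValueError("matched and in_band must have the same length")
    n_rows = len(matched)
    n_matched = sum(matched)
    n_in_band = sum(in_band)
    if n_matched == 0 or n_in_band == 0:
        raise ValueError("need at least one matching row and one in-band row")
    hits = sum(1 for m, b in zip(matched, in_band) if m and b)
    precision = hits / n_matched      # share of matching rows that are in the band
    base_rate = n_in_band / n_rows    # share of all rows that are in the band
    return precision, precision / base_rate


# 10 rows: 4 match the pattern, 4 are in the target band, 3 are both.
matched = [True, True, True, True, False, False, False, False, False, False]
in_band = [True, True, True, False, True, False, False, False, False, False]

precision, lift = precision_and_lift(matched, in_band)
print(f"precision={precision:.2%}, lift={lift:.2f}x")
```

Here three of the four matching rows are in the band (precision 0.75) against a base rate of 0.4, so matching rows are a little under twice as likely to be in the band as a randomly chosen row.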