Training in OuterProduct is a server-side search over model families, hyperparameters, and metrics. Our infrastructure compiles and executes this configuration, then retrieves the best model the search found.
op.reasoning.fit() is the fastest path to a model that produces both predictions and feature-level explanations. It runs the same model-family × hyperparameter × metric search as the Trainer.
import outerproduct as op

op.init()

model = op.reasoning.fit(
    dataset, task=op.Binclass(label_column="churn")
).wait()  # ReasoningModel
Use this path when you need model.explain(), model.get_global_drivers(), or model.scenario().

How the Trainer works

Trainer is an orchestrator that coordinates a search over model families × hyperparameters × metrics. When you call run():
  • All candidate model families are tuned in parallel, not one at a time.
  • Each family’s hyperparameters are searched with the strategy you select.
  • When you pass multiple metrics, the search explores the weighting between them rather than treating one as primary, so you don’t have to guess the right trade-off up front.
  • Parallelism and per-trial timeouts are managed server-side; there are no SDK-level knobs.
The result is a single trained model representing the best configuration found. op.reasoning.fit() runs this same search and returns a ReasoningModel.
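The search described above can be pictured with a small local toy. This is only an illustrative sketch, not OuterProduct's implementation: the family names, search spaces, and scoring function are all hypothetical stand-ins, and the real search runs server-side. It shows the two ideas from the list: every family is tuned concurrently, and only the single best configuration is kept.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search spaces: each family maps to a sampler for its hyperparameters.
SEARCH_SPACES = {
    "family_a": lambda rng: {"depth": rng.randint(2, 8)},
    "family_b": lambda rng: {"width": rng.choice([64, 128, 256])},
}


def toy_score(family, params):
    """Stand-in for a validation metric; higher is better."""
    if family == "family_a":
        return 1.0 - abs(params["depth"] - 5) / 10
    return 1.0 - abs(params["width"] - 128) / 512


def tune_family(family, n_trials, seed):
    """Random search over one family's hyperparameters; returns its best trial."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = SEARCH_SPACES[family](rng)
        trials.append((toy_score(family, params), family, params))
    return max(trials, key=lambda t: t[0])


def search(n_trials=20, seed=0):
    """Tune every family concurrently, then keep the single best configuration."""
    families = sorted(SEARCH_SPACES)
    with ThreadPoolExecutor() as pool:
        per_family = list(pool.map(lambda f: tune_family(f, n_trials, seed), families))
    return max(per_family, key=lambda t: t[0])


best_score, best_family, best_params = search()
```

With the SDK you never write this loop yourself; the sketch is only meant to make the shape of the search concrete.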

Building a Dataset for training

Use op.LocalDataset.upload() to bring datasets that live on your local machine onto our platform. The method sends your file to OuterProduct via a presigned URL and returns a server-backed Dataset that the trainer reads at training time.
# From a pandas DataFrame
dataset = op.LocalDataset.from_pandas(df).upload()

# From a CSV (loaded locally)
dataset = op.LocalDataset.from_csv("customers.csv").upload()

# From NumPy arrays
dataset = op.LocalDataset.from_numpy(X, y, feature_names=["sqft", "bedrooms"]).upload()
Each column carries a Column schema. The schema is inferred automatically by default; override it via the columns= argument when needed. The target column is always specified at fit time via the task config (e.g. op.Binclass(label_column=...)), not on the dataset itself.

Default training call

The smallest valid call takes a dataset and a task config that names the target column:
# ReasoningModel (predictions + explanations)
model = op.reasoning.fit(
    dataset, task=op.Binclass(label_column="churn")
).wait()

# Plain Model (predictions only)
trainer = op.Trainer.configure(dataset, task=op.Binclass(label_column="churn"))
model = trainer.run().wait()

Choosing model types

Pass model_types as a list of string identifiers resolved server-side. If you omit it, OuterProduct selects a curated set based on your dataset. All listed families are tuned together in the same search.
trainer = op.Trainer.configure(
    dataset,
    task=op.Binclass(label_column="churn"),
    model_types=["tabm", "xgboost"],
)
model = trainer.run().wait()

Metrics

Pass a single op.Metric() or a list. With a single metric, training optimizes it directly. With multiple metrics, the search explores weightings between them and returns the best model found along the resulting Pareto frontier.
trainer = op.Trainer.configure(
    dataset,
    task=op.Binclass(label_column="churn"),
    metric=[op.Metric("auc"), op.Metric("accuracy")],
)
model = trainer.run().wait()
Multi-metric sweeps are especially useful when you care about both calibration and ranking quality: you get the best achievable trade-off without having to pick a fixed weighting up front.
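To build intuition for what a weighting sweep does, here is a minimal local sketch for two metrics. It assumes a simple linear weighting w * auc + (1 - w) * accuracy over an evenly spaced grid. The step parameter, the candidate scores, and the helper names are hypothetical, and the server-side search may combine metrics differently.

```python
def weight_grid(step):
    """Weights w in [0, 1] spaced `step` apart, for a two-metric trade-off."""
    steps = round(1 / step)
    return [i / steps for i in range(steps + 1)]


def best_per_weight(candidates, step):
    """For each weight w, pick the candidate maximizing w*auc + (1-w)*accuracy."""
    winners = {}
    for w in weight_grid(step):
        winners[w] = max(
            candidates,
            key=lambda c: w * c["auc"] + (1 - w) * c["accuracy"],
        )["name"]
    return winners


# Made-up validation scores for three candidate models.
candidates = [
    {"name": "a", "auc": 0.90, "accuracy": 0.80},
    {"name": "b", "auc": 0.85, "accuracy": 0.88},
    {"name": "c", "auc": 0.80, "accuracy": 0.79},
]

winners = best_per_weight(candidates, step=0.5)
# winners == {0.0: "b", 0.5: "b", 1.0: "a"}
```

Candidate "c" never wins at any weight because both "a" and "b" beat it on both metrics; only models on the Pareto frontier can be selected by a sweep like this.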

Search strategy

Trainer.run() accepts a strategy string. Use "random" for fast exploration, where trials run in parallel, or "optuna" for adaptive tuning, where trials run serially.
model = trainer.run(strategy="optuna", n_trials=50, grid_size=0.1).wait()
  • n_trials: per-family trial budget.
  • grid_size: resolution of the metric-weighting sweep when you pass multiple metrics.
For op.reasoning.fit(), control search depth via n_hyperopt_steps:
model = op.reasoning.fit(
    dataset,
    task=op.Binclass(label_column="churn"),
    model_types=["tabm", "xgboost"],
    n_hyperopt_steps=10,
).wait()

Jobs API

run() and fit() submit work and return a job handle immediately; neither call blocks. Retrieve the trained model when you need it.
job = trainer.run()

job.status()    # "pending" | "running" | "completed" | "failed"
model = job.wait()   # block until done, return the trained model
job.results()        # raw result payload (ids, URIs) once completed
.wait() is the blocking convenience used throughout these examples. Use .status() to poll without blocking your process, and .results() to fetch the raw result payload once the job has completed.
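If you prefer to poll rather than block, a small helper along these lines may be useful. It is a sketch under assumptions: poll_job and StubJob are hypothetical names that are not part of the SDK, and the only thing the helper relies on is a status() method returning one of the four strings listed above. The stub stands in for a real job handle so the example runs without a server.

```python
import time

# Statuses after which the job will not change again (from the list above).
TERMINAL = {"completed", "failed"}


def poll_job(job, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll job.status() until it is terminal; return the final status string."""
    waited = 0.0
    while True:
        status = job.status()
        if status in TERMINAL:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval


class StubJob:
    """Hypothetical stand-in for a job handle; replays a scripted status list."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def status(self):
        # Return the next scripted status; repeat the last one once exhausted.
        if len(self._statuses) > 1:
            return self._statuses.pop(0)
        return self._statuses[0]


job = StubJob(["pending", "running", "completed"])
final = poll_job(job, interval=0.0, sleep=lambda _: None)
```

With a real handle you would pass the object returned by trainer.run() and, when the returned status is "completed", call job.results() or job.wait() to retrieve the outcome.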

Distillation

Pass a teacher to train a student model that mimics an external model’s predictions. The teacher can be a trained Model or a Predictor wrapping an HTTP endpoint.
teacher = op.model.Predictor(
    "https://api.example.com/predict",
    headers={"Authorization": "Bearer ..."},
)

# Plain distilled Model (label_column is optional when a teacher is set):
trainer = op.Trainer.configure(dataset, teacher=teacher)
model = trainer.run().wait()
To get a distilled student that also explains the teacher’s behavior, use op.reasoning.fit():
reasoning_model = op.reasoning.fit(dataset, teacher=teacher).wait()
The teacher endpoint must return predictions in the same format the trainer expects. Mismatched output shapes will cause the distillation job to fail.
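Because a shape mismatch only surfaces when the job fails, it can help to sanity-check a sample of teacher predictions locally before submitting. The sketch below is an assumption-heavy illustration: the exact format the trainer expects is not specified here, so it only checks two generic properties (one prediction per input row, and a consistent width across rows). check_teacher_output is a hypothetical helper, not an SDK function.

```python
def check_teacher_output(predictions, n_rows):
    """Raise ValueError if predictions do not line up one-to-one with input rows.

    Returns the common width of each prediction (1 for scalar predictions).
    """
    if len(predictions) != n_rows:
        raise ValueError(
            f"expected {n_rows} predictions, got {len(predictions)}"
        )
    # Treat a list/tuple as a vector prediction and anything else as a scalar.
    widths = {
        len(p) if isinstance(p, (list, tuple)) else 1 for p in predictions
    }
    if len(widths) > 1:
        raise ValueError(f"inconsistent prediction widths: {sorted(widths)}")
    return widths.pop() if widths else 0


# Two rows, each with a two-value prediction (for example, class probabilities).
width = check_teacher_output([[0.9, 0.1], [0.2, 0.8]], n_rows=2)
```

Passing this check does not guarantee the job will succeed; it only catches the most obvious row-count and width mismatches early.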