outerproduct.Dataset is the typed, server-backed data handle that all OuterProduct training and inference functions accept. You don’t construct it from raw data directly. You stage local data in an op.LocalDataset (from a file, DataFrame, or array) and call .upload(), or you reference a connector (S3, Snowflake, Databricks). The target column is not marked on the dataset itself; you name it later via the task config (e.g. op.Binclass(label_column=...)) when you configure a trainer or call reasoning.fit().
You cannot build a Dataset from a raw pandas.DataFrame or numpy.ndarray directly (op.Dataset(df) is not supported). Stage the data through op.LocalDataset and call .upload().
Building a dataset
Stage local data in an op.LocalDataset and call .upload(). The call serializes the rows, uploads them via a presigned URL, and returns a server-backed Dataset.
LocalDataset constructors
Each constructor returns a LocalDataset holding your rows. Call .upload() (or .aupload() for async) on it to get a Dataset.
LocalDataset.from_csv()
Stage a local CSV file.
Path to the CSV file on your local filesystem.
LocalDataset.from_pandas()
Stage an existing pandas DataFrame.
A pandas DataFrame. The DataFrame index is ignored; all columns become features.
LocalDataset.from_polars()
Stage an existing polars DataFrame.
A polars DataFrame. All columns become features.
LocalDataset.from_parquet()
Stage a local Parquet file.
Path to the Parquet file on your local filesystem.
LocalDataset.from_numpy()
Stage NumPy arrays. Useful when your data is already in memory as arrays rather than a DataFrame.
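The column layout documented for from_numpy() (features named x0, x1, and so on unless names are supplied, with an optional label appended as one more column) can be sketched in plain Python. This is only an illustration of that rule, not the library's implementation: the helper layout_columns, the parameter names y and columns, and the label column name "y" are assumptions made for the example, since the reference does not state them.

```python
from typing import Dict, List, Optional, Sequence


def layout_columns(
    X: Sequence[Sequence[float]],
    y: Optional[Sequence[float]] = None,
    columns: Optional[List[str]] = None,
    label_name: str = "y",  # assumption: the reference does not name the label column
) -> Dict[str, list]:
    """Hypothetical helper: map a feature matrix to {column_name: values}."""
    n_features = len(X[0]) if X else 0

    if columns is None:
        # Default naming described in the reference: x0, x1, ...
        columns = [f"x{i}" for i in range(n_features)]
    elif len(columns) != n_features:
        # Names must match the number of feature columns (X.shape[1]).
        raise ValueError(
            f"expected {n_features} column names, got {len(columns)}"
        )

    if y is not None and len(y) != len(X):
        # The label array has one entry per sample.
        raise ValueError(f"expected {len(X)} labels, got {len(y)}")

    # One list of values per feature column.
    table = {name: [row[i] for row in X] for i, name in enumerate(columns)}

    if y is not None:
        # The label is appended as an additional column alongside the features.
        table[label_name] = list(y)
    return table


X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]]
y = [0, 0, 1]

default_layout = layout_columns(X, y)
named_layout = layout_columns(X, y, columns=["sepal_length", "sepal_width"])

print(list(default_layout))  # ['x0', 'x1', 'y']
print(list(named_layout))    # ['sepal_length', 'sepal_width', 'y']
```

Because the label ends up as an ordinary column, it still has to be named in the task config (for example op.Binclass(label_column=...)) when you train.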
Feature matrix of shape (n_samples, n_features).
Optional label array of shape (n_samples,). When provided, it is appended to the dataset as an additional column alongside the features.
Optional list of column names for the feature matrix. Must have the same length as X.shape[1]. If omitted, columns are named x0, x1, etc.
op.LocalDataset also accepts a torch.Tensor, a 2-D numpy.ndarray, and nested lists, all normalized the same way.
Datasets from connectors
A Dataset can also reference data in a cloud source without uploading it. Configure the credential once in the Console, then build a connector-backed Dataset with .table(...).
Column schema
Every column in a dataset carries a Column schema object that captures its data type. By default the schema is inferred automatically from the data, so you rarely need to set it manually. Inspect it via dataset.columns.
The target (label) column is not declared on the dataset. You specify it later using the task config (e.g. op.Binclass(label_column=...)) in Trainer.configure() or op.reasoning.fit().