outerproduct.Dataset is the typed, server-backed data handle that all OuterProduct training and inference functions accept. You don't construct it from raw data directly: either stage local data in an op.LocalDataset (from a file, DataFrame, or array) and call .upload(), or reference a connector (S3, Snowflake, Databricks). The target column is not marked on the dataset itself; you name it later via the task config (e.g. op.Binclass(label_column=...)) when you configure a trainer or call op.reasoning.fit().
You cannot build a Dataset from a raw pandas.DataFrame or numpy.ndarray directly (op.Dataset(df) is not supported). Stage the data through op.LocalDataset and call .upload().

Building a dataset

Stage local data in an op.LocalDataset and call .upload(). It serializes the rows, uploads them via a presigned URL, and returns a server-backed Dataset:
import pandas as pd
import outerproduct as op

df = pd.DataFrame({"age": [25, 32, 47], "income": [40000, 75000, 120000]})
dataset = op.LocalDataset.from_pandas(df).upload()

LocalDataset constructors

Each constructor returns a LocalDataset holding your rows. Call .upload() (or .aupload() for async) on it to get a Dataset.
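
When several staged datasets need uploading, the async variant can run them concurrently. The sketch below is illustrative only: it assumes .aupload() is a coroutine that resolves to the server-backed Dataset, and it uses a hypothetical FakeStaged stand-in (not part of the SDK) so it runs without credentials. The upload_all helper is likewise a name introduced here.

```python
import asyncio


async def upload_all(staged_datasets):
    """Upload several staged datasets concurrently.

    Assumption: each item exposes an awaitable ``aupload()``, as
    op.LocalDataset is described to, returning the uploaded handle.
    Results come back in the same order as the inputs.
    """
    return await asyncio.gather(*(staged.aupload() for staged in staged_datasets))


class FakeStaged:
    """Hypothetical stand-in for op.LocalDataset so the sketch runs offline."""

    def __init__(self, name):
        self.name = name

    async def aupload(self):
        await asyncio.sleep(0)  # placeholder for the real network round trip
        return f"dataset:{self.name}"


handles = asyncio.run(upload_all([FakeStaged("loans"), FakeStaged("customers")]))
print(handles)
```

With the real SDK, the list passed to upload_all would hold objects such as op.LocalDataset.from_csv("loans.csv") instead of the stand-ins.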

LocalDataset.from_csv()

Stage a local CSV file.
op.LocalDataset.from_csv(path)
path (string, required): Path to the CSV file on your local filesystem.
import outerproduct as op

dataset = op.LocalDataset.from_csv("customers.csv").upload()

LocalDataset.from_pandas()

Stage an existing pandas DataFrame.
op.LocalDataset.from_pandas(df)
df (pd.DataFrame, required): A pandas DataFrame. The DataFrame index is ignored; all columns become features.
import pandas as pd
import outerproduct as op

df = pd.read_csv("loans.csv")
dataset = op.LocalDataset.from_pandas(df).upload()

LocalDataset.from_polars()

Stage an existing polars DataFrame.
op.LocalDataset.from_polars(df)
df (pl.DataFrame, required): A polars DataFrame. All columns become features.
import polars as pl
import outerproduct as op

df = pl.read_csv("loans.csv")
dataset = op.LocalDataset.from_polars(df).upload()

LocalDataset.from_parquet()

Stage a local Parquet file.
op.LocalDataset.from_parquet(path)
path (string, required): Path to the Parquet file on your local filesystem.
import outerproduct as op

dataset = op.LocalDataset.from_parquet("loans.parquet").upload()
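
When file paths arrive at runtime, the choice between from_csv() and from_parquet() can be made from the file extension. This is a minimal sketch, not an SDK feature: constructor_name_for and stage_file are hypothetical helpers, and stage_file takes the class as an argument so it can be passed op.LocalDataset (or a test double).

```python
from pathlib import Path

# Maps a file suffix to the name of the documented LocalDataset constructor.
CONSTRUCTOR_BY_SUFFIX = {".csv": "from_csv", ".parquet": "from_parquet"}


def constructor_name_for(path):
    """Return the LocalDataset constructor name that matches ``path``."""
    suffix = Path(path).suffix.lower()
    try:
        return CONSTRUCTOR_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}") from None


def stage_file(local_dataset_cls, path):
    """Stage ``path`` using the matching constructor on ``local_dataset_cls``.

    Assumption: ``local_dataset_cls`` is op.LocalDataset or an object with
    the same from_csv / from_parquet constructors.
    """
    constructor = getattr(local_dataset_cls, constructor_name_for(path))
    return constructor(str(path))


print(constructor_name_for("loans.parquet"))
```

Typical use would be stage_file(op.LocalDataset, "loans.parquet").upload(); unsupported extensions raise ValueError before any upload is attempted.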

LocalDataset.from_numpy()

Stage NumPy arrays. Useful when your data is already in memory as arrays rather than a DataFrame.
op.LocalDataset.from_numpy(X, y=None, feature_names=None)
X (np.ndarray, required): Feature matrix of shape (n_samples, n_features).
y (np.ndarray): Optional label array of shape (n_samples,). When provided, it is appended to the dataset as an additional column alongside the features.
feature_names (list[str]): Optional list of column names for the feature matrix. Must contain exactly X.shape[1] names. If omitted, columns are named x0, x1, etc.
import numpy as np
import outerproduct as op

X = np.array([[1200, 3, 2], [850, 2, 1], [2100, 4, 3]])
y = np.array([0, 0, 1])

dataset = op.LocalDataset.from_numpy(
    X,
    y,
    feature_names=["sqft", "bedrooms", "bathrooms"],
).upload()
op.LocalDataset also accepts a torch.Tensor, a 2-D numpy.ndarray, and nested lists, all normalized the same way.
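
The feature_names rule for from_numpy() can be checked locally before staging. The sketch below mirrors the documented behavior (one name per feature column, defaults of x0, x1, and so on); resolve_feature_names is a hypothetical helper, and how the SDK itself reports a mismatch is not specified here.

```python
def resolve_feature_names(n_features, feature_names=None):
    """Return the column names from_numpy() is documented to use.

    ``n_features`` is X.shape[1]. When ``feature_names`` is omitted, the
    documented defaults x0, x1, ... are generated. Raising ValueError on a
    length mismatch is this sketch's choice, not confirmed SDK behavior.
    """
    if feature_names is None:
        return [f"x{i}" for i in range(n_features)]
    names = list(feature_names)
    if len(names) != n_features:
        raise ValueError(
            f"expected {n_features} feature names, got {len(names)}"
        )
    return names


print(resolve_feature_names(3))  # ['x0', 'x1', 'x2']
print(resolve_feature_names(3, ["sqft", "bedrooms", "bathrooms"]))
```

Calling resolve_feature_names(X.shape[1], names) before from_numpy() surfaces a mismatched list early, without a round trip to the server.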

Datasets from connectors

A Dataset can also reference data in a cloud source without uploading. Configure the credential once in the Console, then build a connector-backed Dataset with .table(...):
import outerproduct as op

connector = op.S3Connector(connector_credential_name="prod-aws", region="us-east-1")
dataset = connector.table("s3://my-bucket/data/customers.parquet")
See the Connectors reference for S3, Snowflake, and Databricks.

Column schema

Every column in a dataset carries a Column schema object that captures its data type. By default the schema is inferred automatically from the data, so you rarely need to set it manually. Inspect it via dataset.columns.
The target (label) column is not declared on the dataset. You specify it later using the task config (e.g. op.Binclass(label_column=...)) in Trainer.configure() or op.reasoning.fit().

Complete example

import outerproduct as op

op.init()

dataset = op.LocalDataset.from_csv("loans.csv").upload()
model = op.reasoning.fit(dataset, task=op.Binclass(label_column="approved")).wait()