outerproduct.Dataset - OuterProduct

outerproduct.Dataset is the typed, server-backed data handle that all OuterProduct training and inference functions accept. There are two ways to obtain one: stage local data in an op.LocalDataset (from a file, DataFrame, or array) and call .upload(), or reference a connector (S3, Snowflake, Databricks). The target column is not marked on the dataset itself; you name it later through the task config (e.g. op.Binclass(label_column=...)) when you configure a trainer or call op.reasoning.fit().

You cannot build a Dataset from a raw pandas.DataFrame or numpy.ndarray directly (op.Dataset(df) is not supported). Stage the data through op.LocalDataset and call .upload().

Building a dataset

Stage local data in an op.LocalDataset and call .upload(). The upload serializes the rows, sends them via a presigned URL, and returns a server-backed Dataset:

import pandas as pd
import outerproduct as op

df = pd.DataFrame({"age": [25, 32, 47], "income": [40000, 75000, 120000]})
dataset = op.LocalDataset.from_pandas(df).upload()

LocalDataset constructors

Each constructor returns a LocalDataset holding your rows. Call .upload() (or .aupload() for async) on it to get a Dataset.
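Because .aupload() is the async counterpart of .upload(), several staged datasets can in principle be uploaded concurrently. The sketch below assumes .aupload() is awaitable and resolves to the Dataset. StubLocalDataset is a hypothetical stand-in, not part of outerproduct, used only so the example runs without the SDK or network access.

```python
import asyncio


class StubLocalDataset:
    """Hypothetical stand-in for op.LocalDataset (not part of the SDK)."""

    def __init__(self, name):
        self.name = name

    async def aupload(self):
        # Assumption: the real .aupload() is awaitable and returns a Dataset.
        await asyncio.sleep(0)  # placeholder for the network round trip
        return f"dataset:{self.name}"


async def upload_all(staged):
    """Upload every staged dataset concurrently, preserving input order."""
    return await asyncio.gather(*(local.aupload() for local in staged))


staged = [StubLocalDataset("train"), StubLocalDataset("test")]
datasets = asyncio.run(upload_all(staged))
print(datasets)
```

With the SDK installed, the stubs would be replaced by objects returned from the LocalDataset constructors documented on this page.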

`LocalDataset.from_csv()`

Stage a local CSV file.

op.LocalDataset.from_csv(path)

path

string

required

Path to the CSV file on your local filesystem.

import outerproduct as op

dataset = op.LocalDataset.from_csv("customers.csv").upload()

`LocalDataset.from_pandas()`

Stage an existing pandas DataFrame.

op.LocalDataset.from_pandas(df)

df

pd.DataFrame

required

A pandas DataFrame. The DataFrame index is ignored; all columns become features.

import pandas as pd
import outerproduct as op

df = pd.read_csv("loans.csv")
dataset = op.LocalDataset.from_pandas(df).upload()
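Because the DataFrame index is ignored, anything stored there is dropped unless you move it into a regular column first. A minimal sketch using pandas' reset_index(); the column names and values are illustrative, and the staging call is left as a comment so the snippet runs without the SDK.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [25, 32, 47], "income": [40000, 75000, 120000]},
    index=pd.Index(["north", "south", "east"], name="region"),
)

# Promote the named index to an ordinary column so it survives staging.
prepared = df.reset_index()
print(list(prepared.columns))

# With the SDK installed, stage the prepared frame as documented above:
# dataset = op.LocalDataset.from_pandas(prepared).upload()
```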

`LocalDataset.from_polars()`

Stage an existing polars DataFrame.

op.LocalDataset.from_polars(df)

df

pl.DataFrame

required

A polars DataFrame. All columns become features.

import polars as pl
import outerproduct as op

df = pl.read_csv("loans.csv")
dataset = op.LocalDataset.from_polars(df).upload()

`LocalDataset.from_parquet()`

Stage a local Parquet file.

op.LocalDataset.from_parquet(path)

path

string

required

Path to the Parquet file on your local filesystem.

import outerproduct as op

dataset = op.LocalDataset.from_parquet("loans.parquet").upload()

`LocalDataset.from_numpy()`

Stage NumPy arrays. Useful when your data is already in memory as arrays rather than a DataFrame.

op.LocalDataset.from_numpy(X, y=None, feature_names=None)

X

np.ndarray

required

Feature matrix of shape (n_samples, n_features).

y

np.ndarray

Optional label array of shape (n_samples,). When provided, it is appended to the dataset as an additional column alongside the features.

feature_names

list[str]

Optional list of column names for the feature matrix. Its length must equal X.shape[1]. If omitted, columns are named x0, x1, and so on.

import numpy as np
import outerproduct as op

X = np.array([[1200, 3, 2], [850, 2, 1], [2100, 4, 3]])
y = np.array([0, 0, 1])

dataset = op.LocalDataset.from_numpy(
    X,
    y,
    feature_names=["sqft", "bedrooms", "bathrooms"],
).upload()
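The shape and naming rules above can be checked locally before staging. The helper below is a hypothetical pre-flight check written for illustration, not the SDK's implementation. It only mirrors the documented rules: a 2-D X, a y of shape (n_samples,), feature_names of length X.shape[1], and default names x0, x1, and so on.

```python
import numpy as np


def resolve_feature_names(X, y=None, feature_names=None):
    """Validate inputs against the documented rules and return column names."""
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D, got {X.ndim} dimension(s)")
    n_samples, n_features = X.shape

    if y is not None and np.asarray(y).shape != (n_samples,):
        raise ValueError(f"y must have shape ({n_samples},)")

    if feature_names is None:
        # Documented default: x0, x1, ...
        return [f"x{i}" for i in range(n_features)]
    if len(feature_names) != n_features:
        raise ValueError(
            f"expected {n_features} feature names, got {len(feature_names)}"
        )
    return list(feature_names)


X = np.array([[1200, 3, 2], [850, 2, 1], [2100, 4, 3]])
y = np.array([0, 0, 1])
default_names = resolve_feature_names(X, y)
print(default_names)
```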

op.LocalDataset also accepts a torch.Tensor, a 2-D numpy.ndarray, and nested lists, all normalized the same way.

Datasets from connectors

A Dataset can also reference data in a cloud source without uploading. Configure the credential once in the Console, then build a connector-backed Dataset with .table(...):

import outerproduct as op

connector = op.S3Connector(connector_credential_name="prod-aws", region="us-east-1")
dataset = connector.table("s3://my-bucket/data/customers.parquet")

See the Connectors reference for S3, Snowflake, and Databricks.
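A malformed path is easier to catch before it reaches the connector. The helper below is a hypothetical local check, not part of outerproduct, that uses the standard library to split an s3:// URI into bucket and key. Whether .table() performs similar validation itself is not stated here.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Return (bucket, key) for an s3:// URI, or raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    key = parsed.path.lstrip("/")
    if not key:
        raise ValueError(f"URI has no object key: {uri!r}")
    return parsed.netloc, key


bucket, key = split_s3_uri("s3://my-bucket/data/customers.parquet")
print(bucket, key)
```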

Column schema

Every column in a dataset carries a Column schema object that captures its data type. By default the schema is inferred automatically from the data, so you rarely need to set it manually. Inspect it via dataset.columns.

The target (label) column is not declared on the dataset. You specify it later using the task config (e.g. op.Binclass(label_column=...)) in Trainer.configure() or op.reasoning.fit().

Complete example

import outerproduct as op

op.init()

dataset = op.LocalDataset.from_csv("loans.csv").upload()
model = op.reasoning.fit(dataset, task=op.Binclass(label_column="approved")).wait()
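Since the label is only named at fit time, a misspelled label_column would surface late. A quick local look at the CSV header can catch it before uploading. This is an illustrative standard-library sketch: has_column is a hypothetical helper, and the temporary file and its values are made up for the demonstration.

```python
import csv
import os
import tempfile


def has_column(csv_path, column):
    """Return True if `column` appears in the CSV's header row."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f), [])
    return column in header


# Write a tiny throwaway CSV so the check can run anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as f:
    csv.writer(f).writerows([["age", "income", "approved"], [25, 40000, 0]])
    path = f.name

label_found = has_column(path, "approved")
typo_found = has_column(path, "aproved")
os.remove(path)
print(label_found, typo_found)
```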

