Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
signet::forge::ColumnToTensor Class Reference

Provides static methods to convert Parquet column data into tensor form.

#include <tensor_bridge.hpp>

Static Public Member Functions

static expected<TensorView> wrap_column(const void *column_data, int64_t num_values, PhysicalType physical_type, int32_t type_length = -1)
 Wrap a contiguous numeric Parquet column as a 1D TensorView.
 
static expected<TensorView> wrap_vectors(const void *column_data, int64_t num_vectors, uint32_t dimension)
 Wrap a contiguous FLOAT32_VECTOR column as a 2D TensorView.
 
static expected<OwnedTensor> copy_column(const void *column_data, int64_t num_values, PhysicalType physical_type, TensorDataType target_dtype, int32_t type_length = -1)
 Read column data and produce an OwnedTensor with the requested type.
 
static expected<OwnedTensor> cast(const TensorView &src, TensorDataType target_dtype)
 Cast a tensor view to a different element type, producing an OwnedTensor.
 
static expected<TensorDataType> parquet_to_tensor_dtype(PhysicalType pt)
 Map a Parquet physical type to the natural TensorDataType.
 

Detailed Description

Provides static methods to convert Parquet column data into tensor form.

Two primary paths:

  1. Zero-copy (wrap_column / wrap_vectors): constructs a TensorView pointing directly into the Parquet page buffer. No allocation, no memcpy. The caller must ensure the page buffer outlives the view.
  2. Copy (copy_column / cast): allocates a new OwnedTensor and converts data into the requested type. Use when the Parquet physical type does not match the desired tensor type, or when the data must outlive the source buffer.

Definition at line 662 of file tensor_bridge.hpp.

Member Function Documentation

◆ cast()

static expected<OwnedTensor> signet::forge::ColumnToTensor::cast(const TensorView &src, TensorDataType target_dtype)
inline static

Cast a tensor view to a different element type, producing an OwnedTensor.

Uses a type-dispatched inner loop. Supported source and target types: FLOAT32, FLOAT64, INT32, INT64, INT8, UINT8, INT16, BOOL. FLOAT16 as a source or target is not currently supported by cast().

Parameters
src: Source tensor view.
target_dtype: Desired output element type.
Returns
OwnedTensor with the cast data.

Definition at line 885 of file tensor_bridge.hpp.
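A type-dispatched inner loop of the kind described for cast() can be sketched as a two-level switch that picks one templated conversion loop per (source, target) pair. This is an illustrative assumption, not the library's code: DType, Owned, cast_loop, dispatch_dst and cast_buffer are hypothetical names, and only three of the supported element types are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical dtype enum covering a small subset of the types cast() supports.
enum class DType { Float32, Float64, Int32 };

std::size_t dtype_size(DType t) {
    switch (t) {
        case DType::Float32: return 4;
        case DType::Float64: return 8;
        case DType::Int32:   return 4;
    }
    return 0;
}

// Inner loop for one (Src, Dst) pair. memcpy avoids alignment assumptions.
// Note: float -> int conversion truncates toward zero; out-of-range values
// are undefined behavior in C++, which a real implementation must guard.
template <typename Src, typename Dst>
void cast_loop(const std::byte* src, std::byte* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        Src s;
        std::memcpy(&s, src + i * sizeof(Src), sizeof(Src));
        Dst d = static_cast<Dst>(s);
        std::memcpy(dst + i * sizeof(Dst), &d, sizeof(Dst));
    }
}

// Second-level dispatch: the source type is fixed, choose the destination.
template <typename Src>
void dispatch_dst(const std::byte* src, std::byte* dst, std::size_t n, DType to) {
    switch (to) {
        case DType::Float32: cast_loop<Src, float>(src, dst, n); break;
        case DType::Float64: cast_loop<Src, double>(src, dst, n); break;
        case DType::Int32:   cast_loop<Src, std::int32_t>(src, dst, n); break;
    }
}

// Owning result: raw bytes plus their element type.
struct Owned {
    DType dtype;
    std::vector<std::byte> bytes;
};

Owned cast_buffer(const void* src, std::size_t n, DType from, DType to) {
    Owned out{to, std::vector<std::byte>(n * dtype_size(to))};
    const auto* s = static_cast<const std::byte*>(src);
    switch (from) {
        case DType::Float32: dispatch_dst<float>(s, out.bytes.data(), n, to); break;
        case DType::Float64: dispatch_dst<double>(s, out.bytes.data(), n, to); break;
        case DType::Int32:   dispatch_dst<std::int32_t>(s, out.bytes.data(), n, to); break;
    }
    return out;
}
```

Resolving both types before the loop keeps each inner loop monomorphic, so the per-element work is a plain static_cast with no runtime type check.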

◆ copy_column()

static expected<OwnedTensor> signet::forge::ColumnToTensor::copy_column(const void *column_data, int64_t num_values, PhysicalType physical_type, TensorDataType target_dtype, int32_t type_length = -1)
inline static

Read column data and produce an OwnedTensor with the requested type.

Supports all numeric Parquet physical types. BYTE_ARRAY (variable-length strings) cannot be represented as a dense tensor, so it returns an error.

Parameters
column_data: Source data pointer.
num_values: Number of values.
physical_type: Parquet physical type.
target_dtype: Desired tensor element type.
type_length: Byte length per value (only for FIXED_LEN_BYTE_ARRAY).
Returns
OwnedTensor on success, Error otherwise.

Definition at line 795 of file tensor_bridge.hpp.
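The copy path can be pictured as the sketch below, which widens each fixed-width numeric value to double and rejects BYTE_ARRAY. Phys, CopyResult, widen and copy_to_double are hypothetical names; std::variant stands in for the library's expected type, and only four physical types are handled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <variant>
#include <vector>

// Parquet physical type names from the format spec; local stand-in enum.
enum class Phys { BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY };

// Either the copied values or an error message (stand-in for expected<>).
using CopyResult = std::variant<std::vector<double>, std::string>;

// Read n values of type T from an untyped buffer and widen them to double.
// INT64 values beyond 2^53 lose precision in this conversion.
template <typename T>
std::vector<double> widen(const void* data, std::int64_t n) {
    std::vector<double> out(static_cast<std::size_t>(n));
    const auto* p = static_cast<const unsigned char*>(data);
    for (std::int64_t i = 0; i < n; ++i) {
        T v;
        std::memcpy(&v, p + static_cast<std::size_t>(i) * sizeof(T), sizeof(T));
        out[static_cast<std::size_t>(i)] = static_cast<double>(v);
    }
    return out;
}

CopyResult copy_to_double(const void* data, std::int64_t n, Phys pt) {
    if (n < 0) return std::string("negative value count");
    switch (pt) {
        case Phys::INT32:  return widen<std::int32_t>(data, n);
        case Phys::INT64:  return widen<std::int64_t>(data, n);
        case Phys::FLOAT:  return widen<float>(data, n);
        case Phys::DOUBLE: return widen<double>(data, n);
        case Phys::BYTE_ARRAY:
            return std::string("BYTE_ARRAY is variable-length; no dense tensor form");
        default:
            return std::string("physical type not handled in this sketch");
    }
}
```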

◆ parquet_to_tensor_dtype()

static expected<TensorDataType> signet::forge::ColumnToTensor::parquet_to_tensor_dtype(PhysicalType pt)
inline static

Map a Parquet physical type to the natural TensorDataType.

Definition at line 923 of file tensor_bridge.hpp.
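The mapping table itself is not listed here. The sketch below assumes the width-preserving pairs for the four fixed-width numeric types (INT32 to INT32, INT64 to INT64, FLOAT to FLOAT32, DOUBLE to FLOAT64) and reports no mapping for everything else; the enums and natural_dtype are hypothetical stand-ins, and tensor_bridge.hpp remains the authority on the actual table.

```cpp
#include <cassert>
#include <optional>

// Local stand-ins for the library's PhysicalType and TensorDataType.
enum class Phys { BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY };
enum class TDType { FLOAT32, FLOAT64, INT32, INT64, INT8, UINT8, INT16, BOOL, FLOAT16 };

// Assumed mapping: Parquet storage width equals tensor element width.
// Other physical types return nullopt in this sketch only.
std::optional<TDType> natural_dtype(Phys pt) {
    switch (pt) {
        case Phys::INT32:  return TDType::INT32;
        case Phys::INT64:  return TDType::INT64;
        case Phys::FLOAT:  return TDType::FLOAT32;
        case Phys::DOUBLE: return TDType::FLOAT64;
        default:           return std::nullopt;
    }
}
```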

◆ wrap_column()

static expected<TensorView> signet::forge::ColumnToTensor::wrap_column(const void *column_data, int64_t num_values, PhysicalType physical_type, int32_t type_length = -1)
inline static

Wrap a contiguous numeric Parquet column as a 1D TensorView.

Supported physical types: INT32, INT64, FLOAT, DOUBLE, FIXED_LEN_BYTE_ARRAY (returned as a 2D view of raw bytes or typed data when type_length aligns to a primitive size).

No data is copied. The returned view points directly into column_data.

Parameters
column_data: Pointer to the column's contiguous value buffer.
num_values: Number of values in the column.
physical_type: Parquet physical type of the column.
type_length: Byte length per value (only for FIXED_LEN_BYTE_ARRAY).
Returns
TensorView on success, Error on unsupported type.

Definition at line 681 of file tensor_bridge.hpp.
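Zero-copy wrapping amounts to recording a pointer, a shape and an element size without touching the data. In the minimal sketch below, View and wrap are hypothetical names, and FIXED_LEN_BYTE_ARRAY is modeled only as the raw-byte case with shape {num_values, type_length}; the typed case is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

enum class Phys { BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY };

// Hypothetical non-owning view: a pointer into the caller's buffer plus shape.
struct View {
    const void* data;
    std::vector<std::int64_t> shape;
    std::size_t elem_size;  // bytes per element
};

std::optional<View> wrap(const void* data, std::int64_t n, Phys pt, std::int32_t type_length = -1) {
    switch (pt) {
        case Phys::INT32:
        case Phys::FLOAT:
            return View{data, {n}, 4};   // 1D, 4-byte elements
        case Phys::INT64:
        case Phys::DOUBLE:
            return View{data, {n}, 8};   // 1D, 8-byte elements
        case Phys::FIXED_LEN_BYTE_ARRAY:
            if (type_length <= 0) return std::nullopt;
            // Raw-byte case: one row of type_length bytes per value.
            return View{data, {n, type_length}, 1};
        default:
            return std::nullopt;         // e.g. BYTE_ARRAY is not contiguous
    }
}
```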

◆ wrap_vectors()

static expected<TensorView> signet::forge::ColumnToTensor::wrap_vectors(const void *column_data, int64_t num_vectors, uint32_t dimension)
inline static

Wrap a contiguous FLOAT32_VECTOR column as a 2D TensorView.

The data is assumed to be densely packed float32 vectors, each of the given dimension. The returned shape is {num_vectors, dimension}.

Parameters
column_data: Pointer to contiguous float data.
num_vectors: Number of vectors.
dimension: Elements per vector.
Returns
2D TensorView of shape {num_vectors, dimension}.

Definition at line 760 of file tensor_bridge.hpp.


The documentation for this class was generated from the following file: