Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
onnx_bridge.hpp
// SPDX-License-Identifier: AGPL-3.0-or-later
// Copyright 2026 Johnson Ogundeji
#pragma once

// ...
#include "signet/error.hpp"

#include <cstdint>
#include <string>
#include <utility>
#include <vector>

namespace signet::forge {

enum class OnnxTensorType : int32_t {
    UNDEFINED = 0,
    FLOAT = 1,
    UINT8 = 2,
    INT8 = 3,
    UINT16 = 4,
    INT16 = 5,
    INT32 = 6,
    INT64 = 7,
    STRING = 8,
    BOOL = 9,
    FLOAT16 = 10,
    DOUBLE = 11,
    UINT32 = 12,
    UINT64 = 13,
    BFLOAT16 = 16
};

// ...

struct OnnxTensorInfo {
    void* data = nullptr;
    std::vector<int64_t> shape;
    // ...
    size_t byte_size = 0;
    bool is_owner = false;

    [[nodiscard]] bool is_valid() const {
        return data != nullptr
            && byte_size > 0
            && !shape.empty()
            /* ... */;
    }
};

inline expected<OnnxTensorInfo> prepare_for_onnx(const TensorView& tensor) {
    if (!tensor.is_valid()) {
        return Error{/* ... */,
                     "cannot prepare invalid tensor for ONNX"};
    }

    if (!tensor.is_contiguous()) {
        return Error{/* ... */,
                     "non-contiguous tensors cannot be exported to ONNX; "
                     "call clone() first to produce a contiguous copy"};
    }

    // CWE-20 (Improper Input Validation): all ONNX dimensions must be positive
    for (auto d : tensor.shape().dims) {
        if (d <= 0) {
            return Error{/* ... */,
                         "ONNX tensor dimensions must be positive"};
        }
    }

    OnnxTensorInfo info;
    // M28 WARNING: ONNX Runtime requires non-const void*. The caller MUST ensure
    // the source tensor data is not backed by read-only memory (e.g., mmap PROT_READ).
    // If the tensor originates from an mmap'd file, copy it first via OwnedTensor.
    info.data = const_cast<void*>(tensor.data());
    info.shape = tensor.shape().dims;
    info.element_type = to_onnx_type(tensor.dtype());
    info.byte_size = tensor.byte_size();
    info.is_owner = false; // zero-copy: the bridge borrows the buffer behind the TensorView and does not own it

    if (info.element_type == OnnxTensorType::UNDEFINED) {
        return Error{/* ... */,
                     "tensor dtype maps to UNDEFINED ONNX type"};
    }

    return info;
}

// ...
    return prepare_for_onnx(tensor.view());
}

struct OnnxInputSet {
    std::vector<std::string> names;
    std::vector<OnnxTensorInfo> tensors;

    [[nodiscard]] bool is_valid() const {
        if (names.empty() || names.size() != tensors.size()) return false;
        for (const auto& t : tensors) {
            if (!t.is_valid()) return false;
        }
        return true;
    }
};

inline expected<OnnxInputSet> prepare_inputs_for_onnx(
    const std::vector<std::pair<std::string, TensorView>>& inputs)
{
    if (inputs.empty()) {
        return Error{/* ... */,
                     "cannot prepare empty input set for ONNX"};
    }

    OnnxInputSet result;
    result.names.reserve(inputs.size());
    result.tensors.reserve(inputs.size());

    for (const auto& [name, tensor] : inputs) {
        auto info = prepare_for_onnx(tensor);
        if (!info) {
            return Error{info.error().code,
                         "failed to prepare ONNX input '" + name + "': "
                         + info.error().message};
        }
        result.names.push_back(name);
        result.tensors.push_back(std::move(*info));
    }

    return result;
}

inline const char* onnx_type_name(OnnxTensorType t) {
    switch (t) {
        case OnnxTensorType::UNDEFINED: return "UNDEFINED";
        case OnnxTensorType::FLOAT:     return "FLOAT";
        case OnnxTensorType::UINT8:     return "UINT8";
        case OnnxTensorType::INT8:      return "INT8";
        case OnnxTensorType::UINT16:    return "UINT16";
        case OnnxTensorType::INT16:     return "INT16";
        case OnnxTensorType::INT32:     return "INT32";
        case OnnxTensorType::INT64:     return "INT64";
        case OnnxTensorType::STRING:    return "STRING";
        case OnnxTensorType::BOOL:      return "BOOL";
        case OnnxTensorType::FLOAT16:   return "FLOAT16";
        case OnnxTensorType::DOUBLE:    return "DOUBLE";
        case OnnxTensorType::UINT32:    return "UINT32";
        case OnnxTensorType::UINT64:    return "UINT64";
        case OnnxTensorType::BFLOAT16:  return "BFLOAT16";
        default:                        return "UNKNOWN";
    }
}

} // namespace signet::forge
An owning tensor that manages its own memory via a std::vector<uint8_t> buffer.
TensorView view()
Get a mutable non-owning view.
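The M28 warning in the listing says that data backed by read-only memory should be copied into owned storage before it is handed to ONNX Runtime through a non-const pointer. The sketch below illustrates that copy step only. It uses a plain std::vector<uint8_t> as a stand-in for the buffer that OwnedTensor is described as managing; copy_to_owned is a hypothetical helper name, not part of Signet Forge, and the real OwnedTensor construction API is not shown on this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not a Signet Forge API): copies a read-only byte
// range into a freshly allocated, writable std::vector<uint8_t>.
inline std::vector<std::uint8_t> copy_to_owned(const void* src,
                                               std::size_t byte_size) {
    std::vector<std::uint8_t> owned(byte_size);
    // memcpy with a null source is undefined behaviour, so guard it.
    if (src != nullptr && byte_size > 0) {
        std::memcpy(owned.data(), src, byte_size);
    }
    return owned;
}
```

The returned vector's data() pointer is writable and stays valid for as long as the vector is alive and not resized, so it could back a tensor that is safe to expose as void*.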
A lightweight, non-owning view into a contiguous block of typed memory, interpreted as a multi-dimens...
bool is_valid() const noexcept
True if the view points to valid data.
bool is_contiguous() const noexcept
True if the data is densely packed (no stride gaps).
size_t byte_size() const noexcept
Total byte size of the tensor data (num_elements * element_size).
const TensorShape & shape() const noexcept
The shape of this tensor view.
TensorDataType dtype() const noexcept
The element data type.
void * data() noexcept
Raw mutable pointer to the underlying data buffer.
A lightweight result type that holds either a success value of type T or an Error.
Definition error.hpp:145
expected< OnnxTensorInfo > prepare_for_onnx(const TensorView &tensor)
Prepare a TensorView for ONNX Runtime consumption (zero-copy).
const char * onnx_type_name(OnnxTensorType t)
Return a human-readable string for an OnnxTensorType value.
OnnxTensorType
ONNX tensor element data types, mirroring OrtTensorElementDataType.
@ UNDEFINED
No type (invalid / uninitialized)
@ UINT32
32-bit unsigned integer
@ UINT16
16-bit unsigned integer
@ INT64
64-bit signed integer
@ INT16
16-bit signed integer
@ STRING
Variable-length string.
@ INT32
32-bit signed integer
@ UINT64
64-bit unsigned integer
@ BFLOAT16
Brain floating-point (bfloat16)
@ FLOAT16
16-bit IEEE float (float16)
@ FLOAT
32-bit IEEE float (float32)
@ UINT8
8-bit unsigned integer
@ INT8
8-bit signed integer
@ DOUBLE
64-bit IEEE float (float64)
OnnxTensorType to_onnx_type(TensorDataType dtype)
Convert a Signet TensorDataType to the corresponding OnnxTensorType.
expected< TensorDataType > from_onnx_type(OnnxTensorType ort_type)
Convert an OnnxTensorType back to a Signet TensorDataType.
@ UNSUPPORTED_TYPE
The file contains a Parquet physical or logical type that is not implemented.
@ INTERNAL_ERROR
An unexpected internal error that does not fit any other category.
@ INVALID_ARGUMENT
A caller-supplied argument is outside the valid range or violates a precondition.
expected< OnnxInputSet > prepare_inputs_for_onnx(const std::vector< std::pair< std::string, TensorView > > &inputs)
Prepare a batch of named TensorViews for ONNX Runtime inference.
TensorDataType
Element data type for tensor storage, mapping to ONNX/PyTorch/TF type enums.
@ FLOAT64
IEEE 754 double-precision (8 bytes)
@ INT64
Signed 64-bit integer.
@ INT16
Signed 16-bit integer.
@ INT32
Signed 32-bit integer.
@ FLOAT32
IEEE 754 single-precision (4 bytes)
@ FLOAT16
IEEE 754 half-precision (2 bytes)
@ UINT8
Unsigned 8-bit integer.
@ INT8
Signed 8-bit integer.
Lightweight error value carrying an ErrorCode and a human-readable message.
Definition error.hpp:101
ErrorCode code
The machine-readable error category.
Definition error.hpp:103
A set of named ONNX tensors for multi-input model inference.
std::vector< OnnxTensorInfo > tensors
Prepared tensor infos (parallel with names)
bool is_valid() const
Check whether all tensors are valid and the set is non-empty.
std::vector< std::string > names
Model input names (parallel with tensors)
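OnnxInputSet keeps names as std::string values, whereas the ONNX Runtime C API's Run call takes input names as an array of C strings (const char* const*). A minimal sketch of that conversion follows. borrow_c_names is a hypothetical helper name introduced here for illustration; the bridge may or may not offer something equivalent.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Signet Forge API): builds an array of C-string
// pointers that borrow from `names`. The pointers are valid only while
// `names` is alive and unmodified.
inline std::vector<const char*> borrow_c_names(
    const std::vector<std::string>& names) {
    std::vector<const char*> out;
    out.reserve(names.size());
    for (const auto& n : names) {
        out.push_back(n.c_str());
    }
    return out;
}
```

Because the result only borrows, the OnnxInputSet (or whatever owns the strings) has to outlive the inference call that receives the pointer array.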
Contains all information needed to create an OrtValue externally.
bool is_valid() const
Check whether this info is ready to be used with OrtApi::CreateTensorWithDataAsOrtValue.
OnnxTensorType element_type
ONNX element data type.
size_t byte_size
Total data size in bytes (product of the shape dimensions times the element size)
bool is_owner
If true, the data was allocated by the bridge and the caller must free it.
void * data
Pointer to contiguous tensor data (non-owning unless is_owner)
std::vector< int64_t > shape
ONNX shape dimensions (e.g. {batch, features})
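The byte_size field is described as the product of the shape dimensions times the element size. The sketch below works through that arithmetic with an overflow guard, rejecting empty shapes and non-positive dimensions in the same spirit as the checks in prepare_for_onnx. expected_byte_size is a hypothetical helper, not part of the library, and the element size is passed in by the caller rather than derived from OnnxTensorType.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical helper (not a Signet Forge API): returns
// product(shape) * element_size, or std::nullopt if the shape is empty,
// a dimension is not positive, element_size is zero, or the product
// would overflow std::size_t.
inline std::optional<std::size_t> expected_byte_size(
    const std::vector<std::int64_t>& shape, std::size_t element_size) {
    if (shape.empty() || element_size == 0) return std::nullopt;
    std::size_t total = element_size;
    for (std::int64_t d : shape) {
        if (d <= 0) return std::nullopt;
        const auto dim = static_cast<std::uint64_t>(d);
        // total * dim would exceed SIZE_MAX exactly when dim > SIZE_MAX / total.
        if (dim > std::numeric_limits<std::size_t>::max() / total) {
            return std::nullopt;
        }
        total *= static_cast<std::size_t>(dim);
    }
    return total;
}
```

For the {32, 768} shape used as an example on this page and a 4-byte FLOAT32 element, the helper yields 32 * 768 * 4 = 98304 bytes.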
std::vector< int64_t > dims
Dimension sizes (e.g. {32, 768} for a 32x768 matrix)
Zero-copy tensor bridge: maps Parquet column data directly into ML-framework-compatible tensor views ...