Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
signet::forge::FeatureReader Class Reference

Point-in-time correct ML feature store reader over Parquet files.

#include <feature_reader.hpp>

Public Types

using Options = FeatureReaderOptions
 Alias for the options struct.
 

Public Member Functions

 FeatureReader ()=default
 Default-construct an empty reader (use open() factory instead).
 
 FeatureReader (FeatureReader &&o) noexcept
 
FeatureReader & operator= (FeatureReader &&o) noexcept
 
 FeatureReader (const FeatureReader &)=delete
 
FeatureReader & operator= (const FeatureReader &)=delete
 
expected< std::optional< FeatureVector > > as_of (const std::string &entity_id, int64_t timestamp_ns, const std::vector< std::string > &project={}) const
 Retrieve the latest version of an entity at or before the given timestamp.
 
expected< std::optional< FeatureVector > > get (const std::string &entity_id, const std::vector< std::string > &project={}) const
 Retrieve the latest version of an entity regardless of timestamp.
 
expected< std::vector< FeatureVector > > history (const std::string &entity_id, int64_t start_ns, int64_t end_ns, const std::vector< std::string > &project={}) const
 Retrieve all versions of an entity in the inclusive timestamp range.
 
expected< std::vector< FeatureVector > > as_of_batch (const std::vector< std::string > &entity_ids, int64_t timestamp_ns, const std::vector< std::string > &project={}) const
 Batch as_of query: retrieve the latest version of multiple entities at one timestamp.
 
const std::vector< std::string > & feature_names () const
 Return the ordered feature column names discovered from the first readable file.
 
size_t num_features () const
 Number of feature columns in the schema.
 
size_t num_entities () const
 Number of distinct entities in the index.
 
size_t total_rows () const
 Total number of rows indexed across all files and row groups.
 
size_t failed_file_count () const
 Number of files that failed to open during build_index().
 

Static Public Member Functions

static expected< FeatureReader > open (Options opts)
 Open all Parquet files and build the in-memory entity/timestamp index.
 

Detailed Description

Point-in-time correct ML feature store reader over Parquet files.

Opens one or more Parquet files written by FeatureWriter, builds an in-memory index at open() time, then serves O(log N) queries with no additional disk I/O.

Usage:

auto r = FeatureReader::open({.parquet_files = fw.output_files()});
auto fv = r->as_of("BTCUSDT", now_ns); // latest version <= now_ns
auto hist = r->history("BTCUSDT", t0, t1); // all versions in [t0, t1]
Note
Movable but not copyable. Internally holds mutable ParquetReader instances because read_column() updates a decompression cache.
See also
FeatureReaderOptions, FeatureWriter, FeatureVector

Definition at line 71 of file feature_reader.hpp.

Member Typedef Documentation

◆ Options

Alias for the options struct.

Definition at line 74 of file feature_reader.hpp.

Constructor & Destructor Documentation

◆ FeatureReader() [1/3]

signet::forge::FeatureReader::FeatureReader ( )
default

Default-construct an empty reader (use open() factory instead).

◆ FeatureReader() [2/3]

signet::forge::FeatureReader::FeatureReader ( FeatureReader &&  o)
inline noexcept

Definition at line 104 of file feature_reader.hpp.

◆ FeatureReader() [3/3]

signet::forge::FeatureReader::FeatureReader ( const FeatureReader & )
delete

Member Function Documentation

◆ as_of()

expected< std::optional< FeatureVector > > signet::forge::FeatureReader::as_of ( const std::string &  entity_id,
int64_t  timestamp_ns,
const std::vector< std::string > &  project = {} 
) const
inline

Retrieve the latest version of an entity at or before the given timestamp.

Uses binary search over the sorted index. O(log N) per entity.

Parameters
entity_id     The entity key to look up (e.g. "BTCUSDT").
timestamp_ns  Upper bound timestamp (inclusive) in nanoseconds.
project       Optional subset of feature names to return; empty means all.
Returns
The matching FeatureVector, or nullopt if the entity is unknown or all its entries have timestamps after the query time.

Definition at line 148 of file feature_reader.hpp.
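The lookup described above can be sketched as a binary search over one entity's versions, sorted by ascending timestamp. This is a minimal illustration, not the library's implementation: `VersionEntry`, `EntityIndex`, and `as_of_sketch` are hypothetical names, and the real index layout is not shown in this reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical index entry: one stored version of an entity.
struct VersionEntry {
    int64_t timestamp_ns;  // timestamp of this version
    std::size_t row;       // stand-in for a (file, row group, row) location
};

// Hypothetical index: entity id -> versions sorted by ascending timestamp.
using EntityIndex = std::unordered_map<std::string, std::vector<VersionEntry>>;

// Return the latest entry with timestamp_ns <= query_ns, or nullopt if the
// entity is unknown or every entry is later than query_ns.
std::optional<VersionEntry> as_of_sketch(const EntityIndex& index,
                                         const std::string& entity_id,
                                         int64_t query_ns) {
    auto it = index.find(entity_id);
    if (it == index.end()) return std::nullopt;  // unknown entity

    const auto& versions = it->second;
    // Find the first entry strictly after query_ns; its predecessor, if any,
    // is the latest entry at or before the query time.
    auto after = std::upper_bound(
        versions.begin(), versions.end(), query_ns,
        [](int64_t t, const VersionEntry& e) { return t < e.timestamp_ns; });
    if (after == versions.begin()) return std::nullopt;  // all entries are later
    return *std::prev(after);
}
```

Using `std::upper_bound` rather than `std::lower_bound` is what makes the bound inclusive: an entry whose timestamp equals the query time is returned.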

◆ as_of_batch()

expected< std::vector< FeatureVector > > signet::forge::FeatureReader::as_of_batch ( const std::vector< std::string > &  entity_ids,
int64_t  timestamp_ns,
const std::vector< std::string > &  project = {} 
) const
inline

Batch as_of query: retrieve the latest version of multiple entities at one timestamp.

Entities not found are silently omitted from the result.

Parameters
entity_ids    Vector of entity keys to look up.
timestamp_ns  Upper bound timestamp (inclusive) in nanoseconds.
project       Optional subset of feature names; empty means all.
Returns
A vector of FeatureVectors for all found entities.

Definition at line 244 of file feature_reader.hpp.
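Because missing entities are omitted, the result can be shorter than the input and positions in the two vectors need not line up. The sketch below illustrates that behaviour over a simplified, hypothetical index (`TimestampIndex`, `Hit`, and `as_of_batch_sketch` are illustrative names). It assumes that an entity whose entries are all later than the query time is omitted in the same way as an unknown entity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result item: carries its own entity id so callers can match
// results back to requests without relying on position.
struct Hit {
    std::string entity_id;
    int64_t timestamp_ns;
};

// Hypothetical index: entity id -> ascending version timestamps.
using TimestampIndex = std::unordered_map<std::string, std::vector<int64_t>>;

std::vector<Hit> as_of_batch_sketch(const TimestampIndex& index,
                                    const std::vector<std::string>& entity_ids,
                                    int64_t query_ns) {
    std::vector<Hit> hits;
    hits.reserve(entity_ids.size());
    for (const auto& id : entity_ids) {
        auto it = index.find(id);
        if (it == index.end()) continue;  // unknown entity: omitted
        const auto& ts = it->second;
        auto after = std::upper_bound(ts.begin(), ts.end(), query_ns);
        if (after == ts.begin()) continue;  // nothing at or before query_ns: omitted
        hits.push_back(Hit{id, *std::prev(after)});
    }
    return hits;
}
```

A caller that needs to know which entities were missing can compare the ids in the result against the requested ids.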

◆ failed_file_count()

size_t signet::forge::FeatureReader::failed_file_count ( ) const
inline

Number of files that failed to open during build_index().

Exposed for error observability — callers can detect partial index builds caused by missing/corrupt files and alert or retry accordingly.

Definition at line 279 of file feature_reader.hpp.
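One way a caller might act on this counter is sketched below. It relies only on the documented accessors failed_file_count() and total_rows(); the `IndexHealth` enum, the `classify_index` helper, and the `FakeReader` stand-in are hypothetical and exist only to make the example self-contained.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical caller-side classification of an opened reader.
enum class IndexHealth { Complete, Partial, Empty };

// Works with any type exposing failed_file_count() and total_rows().
template <class Reader>
IndexHealth classify_index(const Reader& reader) {
    if (reader.total_rows() == 0) return IndexHealth::Empty;          // nothing indexed
    if (reader.failed_file_count() > 0) return IndexHealth::Partial;  // some files skipped
    return IndexHealth::Complete;
}

// Stand-in with the same two accessors, used only to exercise the sketch.
struct FakeReader {
    std::size_t failed;
    std::size_t rows;
    std::size_t failed_file_count() const { return failed; }
    std::size_t total_rows() const { return rows; }
};
```

A `Partial` result is a natural point to log the count and decide whether to retry or to serve queries from the incomplete index.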

◆ feature_names()

const std::vector< std::string > & signet::forge::FeatureReader::feature_names ( ) const
inline

Return the ordered feature column names discovered from the first readable file.

Definition at line 267 of file feature_reader.hpp.

◆ get()

expected< std::optional< FeatureVector > > signet::forge::FeatureReader::get ( const std::string &  entity_id,
const std::vector< std::string > &  project = {} 
) const
inline

Retrieve the latest version of an entity regardless of timestamp.

Equivalent to as_of(entity_id, INT64_MAX, project).

Parameters
entity_id  The entity key to look up.
project    Optional subset of feature names; empty means all.
Returns
The latest FeatureVector, or nullopt if the entity is unknown.

Definition at line 187 of file feature_reader.hpp.
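The stated equivalence can be illustrated with a small sketch in which the "latest" query is an inclusive upper-bound lookup at the largest representable timestamp. The functions `latest_at_or_before` and `latest` are hypothetical and operate on a plain sorted vector of timestamps, not on the library's index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <limits>
#include <optional>
#include <vector>

// Latest timestamp <= query_ns in an ascending vector, or nullopt.
std::optional<int64_t> latest_at_or_before(const std::vector<int64_t>& sorted_ts,
                                           int64_t query_ns) {
    auto after = std::upper_bound(sorted_ts.begin(), sorted_ts.end(), query_ns);
    if (after == sorted_ts.begin()) return std::nullopt;
    return *std::prev(after);
}

// "Regardless of timestamp" expressed as a query at INT64_MAX.
std::optional<int64_t> latest(const std::vector<int64_t>& sorted_ts) {
    return latest_at_or_before(sorted_ts, std::numeric_limits<int64_t>::max());
}
```

One consequence worth noting: a version stamped later than the current wall-clock time would still be returned by a lookup of this kind, whereas an as-of query at the current time would skip it.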

◆ history()

expected< std::vector< FeatureVector > > signet::forge::FeatureReader::history ( const std::string &  entity_id,
int64_t  start_ns,
int64_t  end_ns,
const std::vector< std::string > &  project = {} 
) const
inline

Retrieve all versions of an entity in the inclusive timestamp range.

Parameters
entity_id  The entity key to look up.
start_ns   Lower bound timestamp (inclusive) in nanoseconds.
end_ns     Upper bound timestamp (inclusive) in nanoseconds.
project    Optional subset of feature names; empty means all.
Returns
A vector of FeatureVectors sorted by ascending timestamp, or an empty vector if the entity is unknown.

Definition at line 207 of file feature_reader.hpp.
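An inclusive range over sorted timestamps can be selected with one lower-bound and one upper-bound search, as in the sketch below. `history_sketch` is a hypothetical helper over a plain vector of timestamps; returning an empty result when start_ns is greater than end_ns is a choice made for this sketch, since the reference does not say how the library treats an inverted range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Return all timestamps t with start_ns <= t <= end_ns, in ascending order.
std::vector<int64_t> history_sketch(const std::vector<int64_t>& sorted_ts,
                                    int64_t start_ns, int64_t end_ns) {
    if (start_ns > end_ns) return {};  // sketch's choice for an inverted range
    // First element >= start_ns (inclusive lower bound).
    auto first = std::lower_bound(sorted_ts.begin(), sorted_ts.end(), start_ns);
    // First element > end_ns, so everything before it is <= end_ns.
    auto last = std::upper_bound(first, sorted_ts.end(), end_ns);
    return std::vector<int64_t>(first, last);
}
```

Because the input is already sorted, the selected slice is in ascending timestamp order without any further sorting.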

◆ num_entities()

size_t signet::forge::FeatureReader::num_entities ( ) const
inline

Number of distinct entities in the index.

Definition at line 273 of file feature_reader.hpp.

◆ num_features()

size_t signet::forge::FeatureReader::num_features ( ) const
inline

Number of feature columns in the schema.

Definition at line 271 of file feature_reader.hpp.

◆ open()

static expected< FeatureReader > signet::forge::FeatureReader::open ( Options  opts)
inline static

Open all Parquet files and build the in-memory entity/timestamp index.

Parameters
opts  Reader options specifying which files to open.
Returns
A fully-indexed FeatureReader, or an Error on failure.

Definition at line 84 of file feature_reader.hpp.

◆ operator=() [1/2]

FeatureReader & signet::forge::FeatureReader::operator= ( const FeatureReader & )
delete

◆ operator=() [2/2]

FeatureReader & signet::forge::FeatureReader::operator= ( FeatureReader &&  o)
inline noexcept

Definition at line 115 of file feature_reader.hpp.

◆ total_rows()

size_t signet::forge::FeatureReader::total_rows ( ) const
inline

Total number of rows indexed across all files and row groups.

Definition at line 275 of file feature_reader.hpp.


The documentation for this class was generated from the following file: