Point-in-time correct ML feature store reader over Parquet files.
#include <feature_reader.hpp>
| | FeatureReader ()=default | Default-construct an empty reader (use open() factory instead). |
| | FeatureReader (FeatureReader &&o) noexcept | |
| FeatureReader & | operator= (FeatureReader &&o) noexcept | |
| | FeatureReader (const FeatureReader &)=delete | |
| FeatureReader & | operator= (const FeatureReader &)=delete | |
| expected< std::optional< FeatureVector > > | as_of (const std::string &entity_id, int64_t timestamp_ns, const std::vector< std::string > &project={}) const | Retrieve the latest version of an entity at or before the given timestamp. |
| expected< std::optional< FeatureVector > > | get (const std::string &entity_id, const std::vector< std::string > &project={}) const | Retrieve the latest version of an entity regardless of timestamp. |
| expected< std::vector< FeatureVector > > | history (const std::string &entity_id, int64_t start_ns, int64_t end_ns, const std::vector< std::string > &project={}) const | Retrieve all versions of an entity in the inclusive timestamp range. |
| expected< std::vector< FeatureVector > > | as_of_batch (const std::vector< std::string > &entity_ids, int64_t timestamp_ns, const std::vector< std::string > &project={}) const | Batch as_of query: retrieve the latest version of multiple entities at one timestamp. |
| const std::vector< std::string > & | feature_names () const | Return the ordered feature column names discovered from the first readable file. |
| size_t | num_features () const | Number of feature columns in the schema. |
| size_t | num_entities () const | Number of distinct entities in the index. |
| size_t | total_rows () const | Total number of rows indexed across all files and row groups. |
| size_t | failed_file_count () const | L22: Number of files that failed to open during build_index(). |
Opens one or more Parquet files written by FeatureWriter, builds an in-memory index at open() time, then serves O(log N) queries with no additional disk I/O.
Usage:
auto fv = r->as_of("BTCUSDT", now_ns);
auto hist = r->history("BTCUSDT", t0, t1);
Referenced functions:
| static expected< FeatureReader > open(Options opts) | Open all Parquet files and build the in-memory entity/timestamp index. |
| int64_t now_ns() | Return the current time as nanoseconds since the Unix epoch (UTC). |
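A helper with the contract described for now_ns() can be written with the standard library alone. The following is only a sketch under that assumption, not the library's implementation; the name now_ns_sketch is hypothetical. std::chrono::system_clock is specified to measure Unix time from C++20 onward, and mainstream implementations behave the same way in earlier standards.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a now_ns()-style helper (not the library's code).
// Returns the wall-clock time as nanoseconds since the Unix epoch.
inline int64_t now_ns_sketch() {
    using namespace std::chrono;
    // Convert the clock's native tick period to nanoseconds explicitly,
    // because system_clock's resolution differs between platforms.
    return duration_cast<nanoseconds>(
               system_clock::now().time_since_epoch())
        .count();
}
```

A value produced this way can be passed as the timestamp_ns argument of a point-in-time query.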
- Note
- Movable but not copyable. Internally holds mutable ParquetReader instances because read_column() updates a decompression cache.
- See also
- FeatureReaderOptions, FeatureWriter, FeatureVector
Definition at line 71 of file feature_reader.hpp.
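The note about being movable but not copyable, with const queries that still update a cache, follows a common C++ pattern. The sketch below illustrates that pattern with a hypothetical class; CachedReaderSketch, its members, and the fake "decoding" step are all assumptions for illustration and do not mirror ParquetReader internals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical reader: const from the caller's view, but it memoizes results.
class CachedReaderSketch {
public:
    CachedReaderSketch() = default;
    CachedReaderSketch(CachedReaderSketch&&) noexcept = default;
    CachedReaderSketch& operator=(CachedReaderSketch&&) noexcept = default;
    CachedReaderSketch(const CachedReaderSketch&) = delete;
    CachedReaderSketch& operator=(const CachedReaderSketch&) = delete;

    // Const method that fills the cache on a miss; `mutable` makes this legal.
    std::size_t read_column(const std::string& name) const {
        for (const auto& entry : cache_) {
            if (entry.first == name) return entry.second;  // cache hit
        }
        std::size_t decoded = name.size();  // stand-in for real decompression
        cache_.emplace_back(name, decoded);
        return decoded;
    }

    std::size_t cached_columns() const { return cache_.size(); }

private:
    mutable std::vector<std::pair<std::string, std::size_t>> cache_;
};

static_assert(std::is_move_constructible<CachedReaderSketch>::value,
              "sketch is movable");
static_assert(!std::is_copy_constructible<CachedReaderSketch>::value,
              "sketch is not copyable");
```

In general, a const method that mutates hidden state is not safe to call from several threads without synchronization. Whether that applies to FeatureReader is not stated here and should be checked against the header.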
◆ Options
◆ FeatureReader() [1/3]
signet::forge::FeatureReader::FeatureReader ()
default
Default-construct an empty reader (use open() factory instead).
◆ FeatureReader() [2/3]
signet::forge::FeatureReader::FeatureReader (FeatureReader &&o)
inline noexcept
◆ FeatureReader() [3/3]
signet::forge::FeatureReader::FeatureReader (const FeatureReader &)
delete
◆ as_of()
expected< std::optional< FeatureVector > > signet::forge::FeatureReader::as_of (const std::string &entity_id, int64_t timestamp_ns, const std::vector< std::string > &project = {}) const
inline
Retrieve the latest version of an entity at or before the given timestamp.
Uses binary search over the sorted index. O(log N) per entity.
- Parameters
-
| entity_id | The entity key to look up (e.g. "BTCUSDT"). |
| timestamp_ns | Upper bound timestamp (inclusive) in nanoseconds. |
| project | Optional subset of feature names to return; empty means all. |
- Returns
- The matching FeatureVector, or nullopt if the entity is unknown or all its entries have timestamps after the query time.
Definition at line 148 of file feature_reader.hpp.
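The "at or before" lookup described above can be expressed as a binary search with std::upper_bound. This is a minimal sketch, assuming each entity's versions are stored sorted by ascending timestamp; IndexEntry and as_of_sketch are hypothetical names, and the single double field stands in for a real feature payload.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical index entry: one version of one entity (not the library's type).
struct IndexEntry {
    int64_t timestamp_ns;  // version timestamp
    double value;          // stand-in for the feature payload
};

// Return the latest entry with timestamp_ns <= query_ns, or nullopt if none.
// Precondition: `entries` is sorted by ascending timestamp_ns.
inline std::optional<IndexEntry> as_of_sketch(
    const std::vector<IndexEntry>& entries, int64_t query_ns) {
    // upper_bound yields the first entry strictly after query_ns ...
    auto it = std::upper_bound(
        entries.begin(), entries.end(), query_ns,
        [](int64_t t, const IndexEntry& e) { return t < e.timestamp_ns; });
    // ... so the entry just before it, if there is one, is the answer.
    if (it == entries.begin()) return std::nullopt;
    return *std::prev(it);
}
```

Passing INT64_MAX as the query time selects the newest entry, which matches how get() is described in terms of as_of().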
◆ as_of_batch()
expected< std::vector< FeatureVector > > signet::forge::FeatureReader::as_of_batch (const std::vector< std::string > &entity_ids, int64_t timestamp_ns, const std::vector< std::string > &project = {}) const
inline
Batch as_of query: retrieve the latest version of multiple entities at one timestamp.
Entities not found are silently omitted from the result.
- Parameters
-
| entity_ids | Vector of entity keys to look up. |
| timestamp_ns | Upper bound timestamp (inclusive) in nanoseconds. |
| project | Optional subset of feature names; empty means all. |
- Returns
- A vector of FeatureVectors for all found entities.
Definition at line 244 of file feature_reader.hpp.
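Because entities that are not found are omitted, the result can be shorter than entity_ids and should not be matched to the input by position. The sketch below illustrates that behavior over a hypothetical in-memory index; Index and as_of_batch_sketch are assumed names, and skipping entities whose versions are all later than the query time is an assumption of this sketch, not something the text above confirms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index: entity id -> version timestamps sorted ascending.
using Index = std::map<std::string, std::vector<int64_t>>;

// For each requested id, emit (id, latest timestamp <= query_ns).
// Ids with no qualifying version are skipped rather than reported as errors.
inline std::vector<std::pair<std::string, int64_t>> as_of_batch_sketch(
    const Index& index, const std::vector<std::string>& ids, int64_t query_ns) {
    std::vector<std::pair<std::string, int64_t>> out;
    out.reserve(ids.size());
    for (const auto& id : ids) {
        auto found = index.find(id);
        if (found == index.end()) continue;  // unknown entity: omitted
        const auto& ts = found->second;
        auto it = std::upper_bound(ts.begin(), ts.end(), query_ns);
        if (it == ts.begin()) continue;      // every version is later: omitted
        out.emplace_back(id, *std::prev(it));
    }
    return out;
}
```

Returning the id alongside each hit is one way to let callers detect which inputs were dropped.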
◆ failed_file_count()
size_t signet::forge::FeatureReader::failed_file_count () const
inline
L22: Number of files that failed to open during build_index().
Exposed for error observability — callers can detect partial index builds caused by missing/corrupt files and alert or retry accordingly.
Definition at line 279 of file feature_reader.hpp.
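The observability idea can be illustrated without the library: attempt to open every file, keep going on failure, and report how many were skipped so the caller can decide whether a partial result is acceptable. Everything in this sketch (BuildReportSketch, try_open_all, the use of std::ifstream) is hypothetical; how build_index() actually opens files, and whether open() fails when no file is readable, is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical summary of an index build attempt.
struct BuildReportSketch {
    std::size_t opened = 0;  // files that could be opened
    std::size_t failed = 0;  // files that could not be opened
    // True when some files loaded and some did not.
    bool partial() const { return failed > 0 && opened > 0; }
};

// Try every path, counting failures instead of aborting on the first one.
inline BuildReportSketch try_open_all(const std::vector<std::string>& paths) {
    BuildReportSketch report;
    for (const auto& path : paths) {
        std::ifstream in(path, std::ios::binary);
        if (in.is_open()) {
            ++report.opened;
        } else {
            ++report.failed;
        }
    }
    return report;
}
```

A caller of the real reader would apply the same check to failed_file_count() after open() succeeds, alerting or retrying when it is non-zero.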
◆ feature_names()
const std::vector< std::string > & signet::forge::FeatureReader::feature_names () const
inline
Return the ordered feature column names discovered from the first readable file.
Definition at line 267 of file feature_reader.hpp.
◆ get()
expected< std::optional< FeatureVector > > signet::forge::FeatureReader::get (const std::string &entity_id, const std::vector< std::string > &project = {}) const
inline
Retrieve the latest version of an entity regardless of timestamp.
Equivalent to as_of(entity_id, INT64_MAX, project).
- Parameters
-
| entity_id | The entity key to look up. |
| project | Optional subset of feature names; empty means all. |
- Returns
- The latest FeatureVector, or nullopt if the entity is unknown.
Definition at line 187 of file feature_reader.hpp.
◆ history()
expected< std::vector< FeatureVector > > signet::forge::FeatureReader::history (const std::string &entity_id, int64_t start_ns, int64_t end_ns, const std::vector< std::string > &project = {}) const
inline
Retrieve all versions of an entity in the inclusive timestamp range.
- Parameters
-
| entity_id | The entity key to look up. |
| start_ns | Lower bound timestamp (inclusive) in nanoseconds. |
| end_ns | Upper bound timestamp (inclusive) in nanoseconds. |
| project | Optional subset of feature names; empty means all. |
- Returns
- A vector of FeatureVectors sorted by ascending timestamp, or an empty vector if the entity is unknown.
Definition at line 207 of file feature_reader.hpp.
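An inclusive range over sorted timestamps maps directly onto std::lower_bound for the start and std::upper_bound for the end. This is a sketch under the assumption that versions are kept sorted ascending; history_sketch is a hypothetical name, and returning an empty result when start_ns is greater than end_ns is a choice made here, not documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Return every timestamp t with start_ns <= t <= end_ns, in ascending order.
// Precondition: `sorted_ts` is sorted ascending.
inline std::vector<int64_t> history_sketch(const std::vector<int64_t>& sorted_ts,
                                           int64_t start_ns, int64_t end_ns) {
    if (start_ns > end_ns) return {};  // assumed handling of an inverted range
    // First element >= start_ns (inclusive lower bound).
    auto first = std::lower_bound(sorted_ts.begin(), sorted_ts.end(), start_ns);
    // First element > end_ns, so end_ns itself stays inside the range.
    auto last = std::upper_bound(first, sorted_ts.end(), end_ns);
    return std::vector<int64_t>(first, last);
}
```

Both bounds are found in logarithmic time; copying the matched span is linear in the number of versions returned.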
◆ num_entities()
size_t signet::forge::FeatureReader::num_entities () const
inline
◆ num_features()
size_t signet::forge::FeatureReader::num_features () const
inline
◆ open()
Open all Parquet files and build the in-memory entity/timestamp index.
- Parameters
-
| opts | Reader options specifying which files to open. |
- Returns
- A fully-indexed FeatureReader, or an Error on failure.
Definition at line 84 of file feature_reader.hpp.
◆ operator=() [1/2]
◆ operator=() [2/2]
◆ total_rows()
size_t signet::forge::FeatureReader::total_rows () const
inline
Total number of rows indexed across all files and row groups.
Definition at line 275 of file feature_reader.hpp.
The documentation for this class was generated from the following file: