Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
signet::forge::ColumnStatistics Class Reference

Per-column-chunk statistics tracker.

#include <statistics.hpp>

Public Member Functions

 ColumnStatistics ()
 Default constructor – initializes all counters to zero.
 
template<typename T >
void update (const T &value)
 Update statistics with a non-null typed value.
 
void update_null ()
 Record a null value (increments null count only, no min/max update).
 
void reset ()
 Reset all statistics to initial state.
 
int64_t null_count () const
 Number of null values recorded.
 
int64_t num_values () const
 Number of non-null values recorded.
 
std::optional< int64_t > distinct_count () const
 Optional distinct-value count (invalidated on merge).
 
bool has_min_max () const
 Whether at least one non-null value has been recorded (min/max valid).
 
const std::vector< uint8_t > & min_bytes () const
 Raw little-endian bytes of the minimum value.
 
const std::vector< uint8_t > & max_bytes () const
 Raw little-endian bytes of the maximum value.
 
template<typename T >
T min_as () const
 Reconstruct the typed minimum value from stored bytes.
 
template<typename T >
T max_as () const
 Reconstruct the typed maximum value from stored bytes.
 
void set_distinct_count (int64_t count)
 Set the distinct-value count (e.g. from a dictionary encoder).
 
void set_type (PhysicalType t)
 Set the physical type for type-aware min/max comparison during merge.
 
PhysicalType type () const
 Get the physical type associated with these statistics.
 
void merge (const ColumnStatistics &other)
 Merge another ColumnStatistics into this one.
 

Detailed Description

Per-column-chunk statistics tracker.

Tracks min/max values (stored as little-endian byte vectors), null count, value count, and an optional distinct count. Values of any supported Parquet physical type can be fed to update(); min/max are maintained via typed comparison (not raw byte comparison for numeric types).

ColumnStatistics instances can be merged to combine page-level statistics into chunk-level statistics.
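
The tracking behaviour described above can be illustrated with a minimal sketch. SketchStats below is a hypothetical stand-in written for this page, not the library's class: it handles a single arithmetic type, stores min/max as bytes in host order (little-endian only on little-endian hosts), and compares decoded values rather than raw bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for illustration; NOT signet::forge::ColumnStatistics.
template <typename T>
class SketchStats {
public:
    // Record a non-null value: bump the value count and widen min/max.
    void update(const T& value) {
        ++num_values_;
        if (!has_min_max_) {
            store(min_bytes_, value);
            store(max_bytes_, value);
            has_min_max_ = true;
            return;
        }
        // Typed comparison: decode to T, compare, re-encode if needed.
        if (value < load(min_bytes_)) store(min_bytes_, value);
        if (load(max_bytes_) < value) store(max_bytes_, value);
    }

    // Record a null: only the null counter changes.
    void update_null() { ++null_count_; }

    std::int64_t null_count() const { return null_count_; }
    std::int64_t num_values() const { return num_values_; }
    bool has_min_max() const { return has_min_max_; }

    // Both accessors require at least one recorded non-null value.
    T min() const { assert(has_min_max_); return load(min_bytes_); }
    T max() const { assert(has_min_max_); return load(max_bytes_); }

    const std::vector<std::uint8_t>& min_bytes() const { return min_bytes_; }
    const std::vector<std::uint8_t>& max_bytes() const { return max_bytes_; }

private:
    // Copies the host representation (assumption: a little-endian host).
    static void store(std::vector<std::uint8_t>& out, const T& v) {
        out.resize(sizeof(T));
        std::memcpy(out.data(), &v, sizeof(T));
    }
    static T load(const std::vector<std::uint8_t>& in) {
        T v{};
        std::memcpy(&v, in.data(), sizeof(T));
        return v;
    }

    std::int64_t null_count_ = 0;
    std::int64_t num_values_ = 0;
    bool has_min_max_ = false;
    std::vector<std::uint8_t> min_bytes_;
    std::vector<std::uint8_t> max_bytes_;
};
```

Feeding 1 and then -1 into a SketchStats<std::int32_t> yields a minimum of -1 and a maximum of 1. A lexicographic comparison of the stored bytes would order them the other way round, because the two's-complement encoding of -1 is all 0xFF bytes. This is why numeric types need typed comparison.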

See also
ColumnWriter (primary producer of statistics)
ColumnIndex (consumes min/max per page for predicate pushdown)

Definition at line 94 of file statistics.hpp.

Constructor & Destructor Documentation

◆ ColumnStatistics()

signet::forge::ColumnStatistics::ColumnStatistics ( )
inline

Default constructor – initializes all counters to zero.

Definition at line 97 of file statistics.hpp.

Member Function Documentation

◆ distinct_count()

std::optional< int64_t > signet::forge::ColumnStatistics::distinct_count ( ) const
inline

Optional distinct-value count (invalidated on merge).

Definition at line 156 of file statistics.hpp.

◆ has_min_max()

bool signet::forge::ColumnStatistics::has_min_max ( ) const
inline

Whether at least one non-null value has been recorded (min/max valid).

Definition at line 158 of file statistics.hpp.

◆ max_as()

template<typename T >
T signet::forge::ColumnStatistics::max_as ( ) const
inline

Reconstruct the typed maximum value from stored bytes.

Template Parameters
T: The original value type used during update().
Note
Caller is responsible for matching T to the PhysicalType tracked by type_. Mismatched types produce undefined byte-reinterpretation (CWE-843). M25: See min_as() note on runtime type-safety verification.

Definition at line 190 of file statistics.hpp.

◆ max_bytes()

const std::vector< uint8_t > & signet::forge::ColumnStatistics::max_bytes ( ) const
inline

Raw little-endian bytes of the maximum value.

Definition at line 163 of file statistics.hpp.

◆ merge()

void signet::forge::ColumnStatistics::merge ( const ColumnStatistics & other)
inline

Merge another ColumnStatistics into this one.

Null counts and value counts are summed. Min/max are updated via byte-level comparison (the caller must ensure both instances track the same column/type). Distinct count is invalidated on merge because the union cannot be computed without the full distinct set.

Parameters
other: The statistics to merge into this instance.

Definition at line 223 of file statistics.hpp.
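
The merge rules above (sum the counts, widen min/max, drop the distinct count) can be sketched as follows. PageStats and merge_into are hypothetical names used only for illustration, and the sketch keeps min/max as typed int64_t values for brevity instead of the byte vectors the class stores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical simplified statistics record; NOT the library's layout.
struct PageStats {
    std::int64_t null_count = 0;
    std::int64_t num_values = 0;
    std::optional<std::int64_t> min;
    std::optional<std::int64_t> max;
    std::optional<std::int64_t> distinct_count;
};

// Fold page-level statistics into chunk-level statistics.
inline void merge_into(PageStats& chunk, const PageStats& page) {
    // Counts are additive.
    chunk.null_count += page.null_count;
    chunk.num_values += page.num_values;

    // Widen min/max only if the page recorded at least one non-null value.
    if (page.min && page.max) {
        chunk.min = chunk.min ? std::min(*chunk.min, *page.min) : *page.min;
        chunk.max = chunk.max ? std::max(*chunk.max, *page.max) : *page.max;
    }

    // The size of the union of two distinct sets is unknown without the
    // sets themselves, so the distinct count is cleared.
    chunk.distinct_count.reset();
}
```

Merging a page that holds only nulls leaves the chunk's min/max untouched but still adds to null_count and clears the distinct count.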

◆ min_as()

template<typename T >
T signet::forge::ColumnStatistics::min_as ( ) const
inline

Reconstruct the typed minimum value from stored bytes.

Template Parameters
T: The original value type used during update().
Note
Caller is responsible for matching T to the PhysicalType tracked by type_. Mismatched types produce undefined byte-reinterpretation (CWE-843). M25: There is no runtime type check because the stored bytes are type-erased. If a runtime check is needed, compare sizeof(T) against min_bytes().size() or use the type() accessor to verify the PhysicalType before calling this method.

Definition at line 174 of file statistics.hpp.
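
The size check suggested in the note can be wrapped in a small helper. The sketch below is an assumption about how a caller might guard the reconstruction; checked_decode and encode are hypothetical helpers, not part of the library. In real code the byte vector would come from min_bytes() or max_bytes().

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>
#include <vector>

// Hypothetical helper: produce the host-order bytes of a value
// (little-endian on little-endian hosts).
template <typename T>
std::vector<std::uint8_t> encode(const T& value) {
    static_assert(std::is_trivially_copyable_v<T>, "T must be trivially copyable");
    std::vector<std::uint8_t> out(sizeof(T));
    std::memcpy(out.data(), &value, sizeof(T));
    return out;
}

// Hypothetical helper: decode only when the stored size matches sizeof(T).
template <typename T>
std::optional<T> checked_decode(const std::vector<std::uint8_t>& bytes) {
    static_assert(std::is_trivially_copyable_v<T>, "T must be trivially copyable");
    if (bytes.size() != sizeof(T)) {
        return std::nullopt;  // size mismatch: refuse to reinterpret
    }
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}
```

A size check alone cannot tell apart types of equal width: bytes written from an int64_t also pass the check for double. Comparing type() against the expected PhysicalType, as the note suggests, closes that gap.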

◆ min_bytes()

const std::vector< uint8_t > & signet::forge::ColumnStatistics::min_bytes ( ) const
inline

Raw little-endian bytes of the minimum value.

Definition at line 161 of file statistics.hpp.

◆ null_count()

int64_t signet::forge::ColumnStatistics::null_count ( ) const
inline

Number of null values recorded.

Definition at line 152 of file statistics.hpp.

◆ num_values()

int64_t signet::forge::ColumnStatistics::num_values ( ) const
inline

Number of non-null values recorded.

Definition at line 154 of file statistics.hpp.

◆ reset()

void signet::forge::ColumnStatistics::reset ( )
inline

Reset all statistics to initial state.

Definition at line 140 of file statistics.hpp.

◆ set_distinct_count()

void signet::forge::ColumnStatistics::set_distinct_count ( int64_t  count)
inline

Set the distinct-value count (e.g. from a dictionary encoder).

Parameters
count: The number of distinct values.

Definition at line 204 of file statistics.hpp.

◆ set_type()

void signet::forge::ColumnStatistics::set_type ( PhysicalType  t)
inline

Set the physical type for type-aware min/max comparison during merge.

Parameters
t: The Parquet physical type of the column.

Definition at line 208 of file statistics.hpp.

◆ type()

PhysicalType signet::forge::ColumnStatistics::type ( ) const
inline

Get the physical type associated with these statistics.

Definition at line 211 of file statistics.hpp.

◆ update()

template<typename T >
void signet::forge::ColumnStatistics::update ( const T &  value)
inline

Update statistics with a non-null typed value.

Dispatches to the appropriate internal updater based on T. Supported types: bool, float, double, std::string, and all other arithmetic types (int32_t, int64_t, etc.).

Template Parameters
T: The value type (must be arithmetic or std::string).
Parameters
value: The non-null value to incorporate.

Definition at line 110 of file statistics.hpp.
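
Compile-time dispatch on T of the kind described above is commonly written with if constexpr. The sketch below is one plausible shape, not the library's implementation; UpdatePath and update_path are hypothetical names that only report which branch a given T would take.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical labels for the internal updaters.
enum class UpdatePath { Bool, Float, Double, String, OtherArithmetic };

// Report which branch a value of type T would be routed to.
template <typename T>
constexpr UpdatePath update_path() {
    // bool is itself an arithmetic type, so it must be tested first.
    if constexpr (std::is_same_v<T, bool>) {
        return UpdatePath::Bool;
    } else if constexpr (std::is_same_v<T, float>) {
        return UpdatePath::Float;
    } else if constexpr (std::is_same_v<T, double>) {
        return UpdatePath::Double;
    } else if constexpr (std::is_same_v<T, std::string>) {
        return UpdatePath::String;
    } else {
        // Anything else must be arithmetic (int32_t, int64_t, ...).
        static_assert(std::is_arithmetic_v<T>,
                      "T must be arithmetic or std::string");
        return UpdatePath::OtherArithmetic;
    }
}
```

With this shape an unsupported type, such as a user-defined struct, fails at compile time instead of being silently reinterpreted.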

◆ update_null()

void signet::forge::ColumnStatistics::update_null ( )
inline

Record a null value (increments null count only, no min/max update).

Definition at line 135 of file statistics.hpp.


The documentation for this class was generated from the file statistics.hpp.