Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
Per-column-chunk statistics tracker.
#include <statistics.hpp>
Public Member Functions | |
| ColumnStatistics () | |
| Default constructor – initializes all counters to zero. | |
| template<typename T > | |
| void | update (const T &value) |
| Update statistics with a non-null typed value. | |
| void | update_null () |
| Record a null value (increments null count only, no min/max update). | |
| void | reset () |
| Reset all statistics to initial state. | |
| int64_t | null_count () const |
| Number of null values recorded. | |
| int64_t | num_values () const |
| Number of non-null values recorded. | |
| std::optional< int64_t > | distinct_count () const |
| Optional distinct-value count (invalidated on merge). | |
| bool | has_min_max () const |
| Whether at least one non-null value has been recorded (min/max valid). | |
| const std::vector< uint8_t > & | min_bytes () const |
| Raw little-endian bytes of the minimum value. | |
| const std::vector< uint8_t > & | max_bytes () const |
| Raw little-endian bytes of the maximum value. | |
| template<typename T > | |
| T | min_as () const |
| Reconstruct the typed minimum value from stored bytes. | |
| template<typename T > | |
| T | max_as () const |
| Reconstruct the typed maximum value from stored bytes. | |
| void | set_distinct_count (int64_t count) |
| Set the distinct-value count (e.g. from a dictionary encoder). | |
| void | set_type (PhysicalType t) |
| Set the physical type for type-aware min/max comparison during merge. | |
| PhysicalType | type () const |
| Get the physical type associated with these statistics. | |
| void | merge (const ColumnStatistics &other) |
| Merge another ColumnStatistics into this one. | |
Per-column-chunk statistics tracker.
Tracks min/max values (stored as little-endian byte vectors), null count, value count, and an optional distinct count. Values of any supported Parquet physical type can be fed to update(); min/max are maintained via typed comparison (not raw byte comparison for numeric types).
ColumnStatistics instances can be merged to combine page-level statistics into chunk-level statistics.
Definition at line 94 of file statistics.hpp.
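The distinction between typed comparison and raw byte comparison can be illustrated with a small sketch. `Int32MinMax`, `encode`, and `decode` below are hypothetical names for a stand-in written for this illustration; they are not the Signet Forge implementation. With little-endian storage, a lexicographic byte comparison would rank 256 (`00 01 00 00`) below 1 (`01 00 00 00`) and -1 (`FF FF FF FF`) above both, which is why the sketch decodes before comparing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in (not the Signet Forge class): min/max tracking for
// int32_t values that stores little-endian bytes but compares typed values.
struct Int32MinMax {
    std::vector<uint8_t> min_bytes;
    std::vector<uint8_t> max_bytes;
    int64_t num_values = 0;

    // Encode as 4 little-endian bytes, independent of host byte order.
    static std::vector<uint8_t> encode(int32_t v) {
        const auto u = static_cast<uint32_t>(v);
        std::vector<uint8_t> out(4);
        for (std::size_t i = 0; i < 4; ++i) {
            out[i] = static_cast<uint8_t>((u >> (8 * i)) & 0xFFu);
        }
        return out;
    }

    // Decode 4 little-endian bytes back into an int32_t.
    static int32_t decode(const std::vector<uint8_t>& b) {
        assert(b.size() == 4);
        uint32_t u = 0;
        for (std::size_t i = 0; i < 4; ++i) {
            u |= static_cast<uint32_t>(b[i]) << (8 * i);
        }
        return static_cast<int32_t>(u);
    }

    void update(int32_t v) {
        // Compare decoded values, never the raw bytes.
        if (num_values == 0 || v < decode(min_bytes)) min_bytes = encode(v);
        if (num_values == 0 || v > decode(max_bytes)) max_bytes = encode(v);
        ++num_values;
    }
};
```

Feeding 1, -1, and 256 into this sketch leaves -1 as the minimum and 256 as the maximum, whereas a byte-wise ordering would have picked different values.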
| ColumnStatistics::ColumnStatistics () | inline |
Default constructor – initializes all counters to zero.
Definition at line 97 of file statistics.hpp.
| std::optional< int64_t > ColumnStatistics::distinct_count () const | inline |
Optional distinct-value count (invalidated on merge).
Definition at line 156 of file statistics.hpp.
| bool ColumnStatistics::has_min_max () const | inline |
Whether at least one non-null value has been recorded (min/max valid).
Definition at line 158 of file statistics.hpp.
| template<typename T > T ColumnStatistics::max_as () const | inline |
Reconstruct the typed maximum value from stored bytes.
| T | The original value type used during update(). |
Definition at line 190 of file statistics.hpp.
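Reconstructing a typed value from stored bytes is commonly done with `std::memcpy`. The following is a minimal sketch under that assumption; `to_stat_bytes` and `from_stat_bytes` are hypothetical helpers and are not part of statistics.hpp. It also shows why `T` should match the type used during update(): a `T` of a different width cannot be rebuilt from the stored bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helpers, not part of statistics.hpp. They assume the bytes
// were produced on a host with the same byte order as the reader.
template <typename T>
std::vector<uint8_t> to_stat_bytes(const T& value) {
    static_assert(std::is_trivially_copyable_v<T>, "T must be trivially copyable");
    std::vector<uint8_t> bytes(sizeof(T));
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}

template <typename T>
T from_stat_bytes(const std::vector<uint8_t>& bytes) {
    static_assert(std::is_trivially_copyable_v<T>, "T must be trivially copyable");
    assert(bytes.size() == sizeof(T));  // catches a mismatched T of different width
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}
```

Using `std::memcpy` rather than a pointer cast avoids alignment and strict-aliasing problems. A mismatched `T` of the same width (for example `float` versus `int32_t`) would pass the size check and silently yield a meaningless value.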
| const std::vector< uint8_t > & ColumnStatistics::max_bytes () const | inline |
Raw little-endian bytes of the maximum value.
Definition at line 163 of file statistics.hpp.
| void ColumnStatistics::merge (const ColumnStatistics &other) | inline |
Merge another ColumnStatistics into this one.
Null counts and value counts are summed. Min/max are updated via byte-level comparison (the caller must ensure both instances track the same column/type). Distinct count is invalidated on merge because the union cannot be computed without the full distinct set.
| other | The statistics to merge into this instance. |
Definition at line 223 of file statistics.hpp.
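The merge rules described above (sum the counts, widen min/max, drop the distinct count) can be sketched as follows. `SimpleStats` is a hypothetical, simplified record for a single `int64_t` column. For brevity it keeps min/max as typed values instead of byte vectors, so it is not a copy of the library's merge logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical simplified statistics record for an int64 column.
struct SimpleStats {
    int64_t null_count = 0;
    int64_t num_values = 0;
    std::optional<int64_t> distinct_count;
    bool has_min_max = false;
    int64_t min = 0;
    int64_t max = 0;

    void merge(const SimpleStats& other) {
        // Counts are additive across pages.
        null_count += other.null_count;
        num_values += other.num_values;

        // Widen the range only if the other side recorded any value.
        if (other.has_min_max) {
            min = has_min_max ? std::min(min, other.min) : other.min;
            max = has_min_max ? std::max(max, other.max) : other.max;
            has_min_max = true;
        }

        // The distinct count of a union is unknown without the value sets.
        distinct_count.reset();
    }
};
```

Two pages with 3 and 4 distinct values could have anywhere between 4 and 7 distinct values combined, which is why the sketch clears the count instead of adding the two.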
| template<typename T > T ColumnStatistics::min_as () const | inline |
Reconstruct the typed minimum value from stored bytes.
| T | The original value type used during update(). |
Definition at line 174 of file statistics.hpp.
| const std::vector< uint8_t > & ColumnStatistics::min_bytes () const | inline |
Raw little-endian bytes of the minimum value.
Definition at line 161 of file statistics.hpp.
| int64_t ColumnStatistics::null_count () const | inline |
Number of null values recorded.
Definition at line 152 of file statistics.hpp.
| int64_t ColumnStatistics::num_values () const | inline |
Number of non-null values recorded.
Definition at line 154 of file statistics.hpp.
| void ColumnStatistics::reset () | inline |
Reset all statistics to initial state.
Definition at line 140 of file statistics.hpp.
| void ColumnStatistics::set_distinct_count (int64_t count) | inline |
Set the distinct-value count (e.g. from a dictionary encoder).
| count | The number of distinct values. |
Definition at line 204 of file statistics.hpp.
| void ColumnStatistics::set_type (PhysicalType t) | inline |
Set the physical type for type-aware min/max comparison during merge.
| t | The Parquet physical type of the column. |
Definition at line 208 of file statistics.hpp.
| PhysicalType ColumnStatistics::type () const | inline |
Get the physical type associated with these statistics.
Definition at line 211 of file statistics.hpp.
| template<typename T > void ColumnStatistics::update (const T &value) | inline |
Update statistics with a non-null typed value.
Dispatches to the appropriate internal updater based on T. Supported types: bool, float, double, std::string, and all other arithmetic types (int32_t, int64_t, etc.).
| T | The value type (must be arithmetic or std::string). |
| value | The non-null value to incorporate. |
Definition at line 110 of file statistics.hpp.
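Dispatch on `T` of the kind described above is often written with `if constexpr`. The sketch below shows one way to do it; the `Route` enum and `route_for` function are hypothetical names introduced for illustration, and the actual internal updaters in statistics.hpp may be organized differently. Note that `bool` has to be tested before the generic arithmetic case, because `bool` is itself an arithmetic type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical tag naming the updater a value type would be routed to.
enum class Route { Bool, Float, Double, String, OtherArithmetic };

template <typename T>
constexpr Route route_for() {
    if constexpr (std::is_same_v<T, bool>) {
        return Route::Bool;  // checked first: bool is also arithmetic
    } else if constexpr (std::is_same_v<T, float>) {
        return Route::Float;
    } else if constexpr (std::is_same_v<T, double>) {
        return Route::Double;
    } else if constexpr (std::is_same_v<T, std::string>) {
        return Route::String;
    } else {
        // Everything else must be an arithmetic type such as int32_t or int64_t.
        static_assert(std::is_arithmetic_v<T>,
                      "T must be arithmetic or std::string");
        return Route::OtherArithmetic;
    }
}
```

Because the branches are resolved at compile time, an unsupported type such as `const char*` fails to compile with the `static_assert` message instead of misbehaving at run time.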
| void ColumnStatistics::update_null () | inline |
Record a null value (increments null count only, no min/max update).
Definition at line 135 of file statistics.hpp.