Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
signet::forge::ColumnIndex Struct Reference

Per-page min/max statistics for predicate pushdown.

#include <column_index.hpp>

Public Types

enum class  BoundaryOrder : int32_t { UNORDERED = 0 , ASCENDING = 1 , DESCENDING = 2 }
 Ordering of min values across pages, used to short-circuit filtering.
 

Public Member Functions

bool valid () const
 Check if deserialization was successful.
 
void serialize (thrift::CompactEncoder &enc) const
 Serialize this ColumnIndex to a Thrift compact encoder.
 
void deserialize (thrift::CompactDecoder &dec)
 Deserialize this ColumnIndex from a Thrift compact decoder.
 
std::vector< size_t > filter_pages (const std::string &min_val, const std::string &max_val, PhysicalType physical_type=PhysicalType::BYTE_ARRAY) const
 Filter pages by a value range for predicate pushdown.
 

Public Attributes

bool valid_ = true
 False if deserialization failed (M-V7).
 
std::vector< bool > null_pages
 True if the corresponding page is all nulls.
 
std::vector< std::string > min_values
 Binary-encoded minimum value per page.
 
std::vector< std::string > max_values
 Binary-encoded maximum value per page.
 
BoundaryOrder boundary_order = BoundaryOrder::UNORDERED
 Boundary order of min values.
 
std::vector< int64_t > null_counts
 Null count per page (optional).
 

Detailed Description

Per-page min/max statistics for predicate pushdown.

Stores binary-encoded min/max values for each data page in a column chunk, along with null-page flags, boundary ordering, and optional null counts. Readers use filter_pages() to eliminate pages whose value ranges do not overlap the query predicate.
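
The byte layout of the encoded values is not spelled out on this page. The following is a minimal sketch, assuming Signet Forge follows the Parquet convention of storing INT32 statistics as 4 little-endian bytes. The names `encode_i32_le`, `decode_i32_le`, `PageStatsSketch`, and `add_page` are hypothetical and are not part of the library; the stand-in struct only mirrors the public attributes listed above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode an int32 as 4 little-endian bytes
// (assumed encoding, not confirmed by this page).
std::string encode_i32_le(int32_t v) {
    const uint32_t u = static_cast<uint32_t>(v);
    std::string out(4, '\0');
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<char>((u >> (8 * i)) & 0xFF);
    }
    return out;
}

// Hypothetical helper: decode 4 little-endian bytes back into an int32.
int32_t decode_i32_le(const std::string& s) {
    assert(s.size() == 4);  // precondition: exactly one encoded INT32
    uint32_t u = 0;
    for (int i = 0; i < 4; ++i) {
        u |= static_cast<uint32_t>(static_cast<unsigned char>(s[i])) << (8 * i);
    }
    return static_cast<int32_t>(u);
}

// Stand-in that mirrors the documented public attributes of ColumnIndex.
struct PageStatsSketch {
    std::vector<bool> null_pages;
    std::vector<std::string> min_values;
    std::vector<std::string> max_values;
    std::vector<int64_t> null_counts;
};

// Append statistics for one non-null page, keeping all vectors aligned.
void add_page(PageStatsSketch& idx, int32_t mn, int32_t mx, int64_t nulls) {
    idx.null_pages.push_back(false);
    idx.min_values.push_back(encode_i32_le(mn));
    idx.max_values.push_back(encode_i32_le(mx));
    idx.null_counts.push_back(nulls);
}
```

Under this assumed encoding, `encode_i32_le(256) < encode_i32_le(1)` holds when the two results are compared as raw strings, which suggests why filter_pages() accepts a physical type for typed comparison rather than always comparing bytes.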

See also
OffsetIndex (companion for page offsets)
ColumnIndexBuilder (builder pattern for constructing during writes)

Definition at line 147 of file column_index.hpp.

Member Enumeration Documentation

◆ BoundaryOrder

enum class signet::forge::ColumnIndex::BoundaryOrder : int32_t
strong

Ordering of min values across pages, used to short-circuit filtering.

Enumerator
UNORDERED 

Min values have no particular order.

ASCENDING 

Min values are non-decreasing across pages.

DESCENDING 

Min values are non-increasing across pages.
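
How a reader might exploit the ordering can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it uses already-decoded `int64_t` bounds instead of binary-encoded strings, and `OrderSketch`, `candidate_pages`, and the `examined` counter are hypothetical names. Only the ASCENDING case is short-circuited; other orderings fall back to a full scan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for ColumnIndex::BoundaryOrder with the same enumerator values.
enum class OrderSketch : int32_t { UNORDERED = 0, ASCENDING = 1, DESCENDING = 2 };

// Return indices of pages whose [min, max] range overlaps [lo, hi].
// If `examined` is non-null, it receives the number of pages fully tested.
std::vector<std::size_t> candidate_pages(const std::vector<int64_t>& mins,
                                         const std::vector<int64_t>& maxs,
                                         OrderSketch order,
                                         int64_t lo, int64_t hi,
                                         std::size_t* examined = nullptr) {
    assert(mins.size() == maxs.size());
    if (examined) *examined = 0;
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < mins.size(); ++i) {
        // With non-decreasing mins, once one page starts above the query
        // range, every later page does too, so the scan can stop early.
        if (order == OrderSketch::ASCENDING && mins[i] > hi) break;
        if (examined) ++*examined;
        if (maxs[i] < lo || mins[i] > hi) continue;  // no overlap
        out.push_back(i);
    }
    return out;
}
```

For four pages with mins `{0, 10, 20, 30}`, maxs `{9, 19, 29, 39}`, and the query range `[12, 15]`, both orderings select page 1, but the ASCENDING scan tests only two pages before stopping while the UNORDERED scan tests all four.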

Definition at line 154 of file column_index.hpp.

Member Function Documentation

◆ deserialize()

void signet::forge::ColumnIndex::deserialize ( thrift::CompactDecoder &  dec )
inline

Deserialize this ColumnIndex from a Thrift compact decoder.

Parameters
dec  The decoder to read from.

Definition at line 215 of file column_index.hpp.

◆ filter_pages()

std::vector< size_t > signet::forge::ColumnIndex::filter_pages ( const std::string &  min_val,
const std::string &  max_val,
PhysicalType  physical_type = PhysicalType::BYTE_ARRAY 
) const
inline

Filter pages by a value range for predicate pushdown.

Given a range [min_val, max_val] (binary-encoded, same encoding as min_values/max_values), returns page indices that might contain matching data. A page is excluded only if its max is strictly less than min_val or its min is strictly greater than max_val. All-null pages are always excluded.

Parameters
min_val  Lower bound of the query range (binary-encoded).
max_val  Upper bound of the query range (binary-encoded).
physical_type  Physical type for typed comparison (default: BYTE_ARRAY = lexicographic).
Returns
A vector of page indices (0-based) that may contain matches.
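
The exclusion rule described above can be sketched for the default BYTE_ARRAY case. This is a simplified stand-in, not the code in column_index.hpp: `filter_pages_sketch` is a hypothetical free function that takes the per-page vectors as arguments and always compares lexicographically, ignoring the typed comparisons selected by physical_type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the 0-based indices of pages that may contain values in
// [min_val, max_val], using lexicographic byte comparison only.
std::vector<std::size_t> filter_pages_sketch(
        const std::vector<bool>& null_pages,
        const std::vector<std::string>& min_values,
        const std::vector<std::string>& max_values,
        const std::string& min_val,
        const std::string& max_val) {
    assert(null_pages.size() == min_values.size());
    assert(min_values.size() == max_values.size());
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < min_values.size(); ++i) {
        if (null_pages[i]) continue;            // all-null pages never match
        if (max_values[i] < min_val) continue;  // page lies entirely below the range
        if (min_values[i] > max_val) continue;  // page lies entirely above the range
        out.push_back(i);                       // ranges overlap, keep the page
    }
    return out;
}
```

Because both exclusion tests are strict, a page whose min or max equals a query bound is kept. For example, with page ranges `["apple", "kiwi"]`, an all-null page, `["mango", "orange"]`, and `["pear", "zebra"]`, the query `["lemon", "mango"]` returns only index 2.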

Definition at line 294 of file column_index.hpp.

◆ serialize()

void signet::forge::ColumnIndex::serialize ( thrift::CompactEncoder &  enc ) const
inline

Serialize this ColumnIndex to a Thrift compact encoder.

Parameters
enc  The encoder to write to.

Definition at line 168 of file column_index.hpp.

◆ valid()

bool signet::forge::ColumnIndex::valid ( ) const
inline

Check if deserialization was successful.

Definition at line 164 of file column_index.hpp.

Member Data Documentation

◆ boundary_order

BoundaryOrder signet::forge::ColumnIndex::boundary_order = BoundaryOrder::UNORDERED

Boundary order of min values.

Definition at line 159 of file column_index.hpp.

◆ max_values

std::vector<std::string> signet::forge::ColumnIndex::max_values

Binary-encoded maximum value per page.

Definition at line 151 of file column_index.hpp.

◆ min_values

std::vector<std::string> signet::forge::ColumnIndex::min_values

Binary-encoded minimum value per page.

Definition at line 150 of file column_index.hpp.

◆ null_counts

std::vector<int64_t> signet::forge::ColumnIndex::null_counts

Null count per page (optional).

Definition at line 161 of file column_index.hpp.

◆ null_pages

std::vector<bool> signet::forge::ColumnIndex::null_pages

True if the corresponding page is all nulls.

Definition at line 149 of file column_index.hpp.

◆ valid_

bool signet::forge::ColumnIndex::valid_ = true

False if deserialization failed (M-V7).

Definition at line 148 of file column_index.hpp.


The documentation for this struct was generated from the following file: