Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
column_index.hpp File Reference

ColumnIndex, OffsetIndex, and ColumnIndexBuilder for predicate pushdown.

#include "signet/thrift/compact.hpp"
#include "signet/statistics.hpp"
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>


Classes

struct  signet::forge::PageLocation
 File offset and size descriptor for a single data page.
 
struct  signet::forge::OffsetIndex
 Page locations for random access within a column chunk.
 
struct  signet::forge::ColumnIndex
 Per-page min/max statistics for predicate pushdown.
 
class  signet::forge::ColumnIndexBuilder
 Builder that accumulates per-page statistics during column writing.
 

Namespaces

namespace  signet
 
namespace  signet::forge
 

Detailed Description

ColumnIndex, OffsetIndex, and ColumnIndexBuilder for predicate pushdown.

Per-column-chunk structures for predicate pushdown and random page access. They are written after the row groups but before the footer in the Parquet file.

ColumnIndex stores per-page min/max statistics, enabling readers to skip pages that cannot contain matching data. OffsetIndex stores page locations for efficient random access into column chunks.
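The page-skipping idea can be illustrated with a minimal sketch. The type `Int64PageIndex` and the function `candidate_pages` below are hypothetical names invented for this example and are not part of the signet::forge API. The sketch also assumes an INT64 column with already-decoded min/max values, whereas a Parquet ColumnIndex stores them as encoded byte strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a ColumnIndex over an INT64 column.
// Typed values are used only to keep the sketch short.
struct Int64PageIndex {
    std::vector<bool>         null_pages;  // true if the page holds only nulls
    std::vector<std::int64_t> min_values;  // per-page minimum
    std::vector<std::int64_t> max_values;  // per-page maximum
};

// Returns the indices of pages that may contain a value in [lo, hi].
// A page is skipped when it is all-null or when its [min, max] range
// does not overlap the query range.
inline std::vector<std::size_t> candidate_pages(const Int64PageIndex& idx,
                                                std::int64_t lo,
                                                std::int64_t hi) {
    // The three per-page lists must have one entry per page.
    assert(idx.null_pages.size() == idx.min_values.size());
    assert(idx.min_values.size() == idx.max_values.size());

    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < idx.min_values.size(); ++i) {
        if (idx.null_pages[i]) continue;                                 // nothing to match
        if (idx.max_values[i] < lo || idx.min_values[i] > hi) continue;  // no overlap
        out.push_back(i);
    }
    return out;
}
```

A reader would then look up only the returned page indices in the OffsetIndex and fetch those pages. Note that the result is conservative: a page whose range overlaps the query may still contain no matching rows.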

Thrift field IDs follow the canonical parquet.thrift specification:

  • ColumnIndex: 1=null_pages, 2=min_values, 3=max_values, 4=boundary_order, 5=null_counts
  • OffsetIndex: 1=page_locations (list of PageLocation)
  • PageLocation: 1=offset, 2=compressed_page_size, 3=first_row_index
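As a rough illustration of how the OffsetIndex fields listed above are used, the sketch below mirrors PageLocation and OffsetIndex as plain structs and derives the row range covered by a page. The names `PageLocationSketch`, `OffsetIndexSketch`, `RowRange`, and `page_row_range` are assumptions made for this example; the actual signet::forge definitions may differ in types and members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative mirrors of the Thrift structs; not the signet::forge definitions.
struct PageLocationSketch {
    std::int64_t offset = 0;                // field 1: byte offset of the page in the file
    std::int32_t compressed_page_size = 0;  // field 2: page size in bytes, including its header
    std::int64_t first_row_index = 0;       // field 3: first row of the page within the row group
};

struct OffsetIndexSketch {
    std::vector<PageLocationSketch> page_locations;  // field 1
};

struct RowRange {
    std::int64_t first;  // index of the first row in the page
    std::int64_t count;  // number of rows in the page
};

// Computes the rows covered by one page. A page ends where the next page
// begins; the last page ends at the row group's total row count.
inline RowRange page_row_range(const OffsetIndexSketch& oi,
                               std::size_t page,
                               std::int64_t rows_in_group) {
    assert(page < oi.page_locations.size());
    const std::int64_t first = oi.page_locations[page].first_row_index;
    const std::int64_t next =
        (page + 1 < oi.page_locations.size())
            ? oi.page_locations[page + 1].first_row_index
            : rows_in_group;
    return {first, next - first};
}
```

Because the OffsetIndex does not store a per-page row count, the count has to be derived from neighbouring `first_row_index` values, which is why the row group's total row count is passed in separately.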

Definition in file column_index.hpp.