Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
signet::forge::WriterOptions Struct Reference

Configuration options for ParquetWriter.

#include <writer.hpp>

Public Attributes

int64_t row_group_size = 64 * 1024
 Target number of rows per row group.
 
std::string created_by = SIGNET_CREATED_BY
 Value written into the Parquet footer's "created_by" field.
 
std::vector< thrift::KeyValue > file_metadata
 Custom key-value metadata pairs embedded in the Parquet footer.
 
Encoding default_encoding = Encoding::PLAIN
 Default encoding applied to every column that does not have a per-column override in column_encodings.
 
Compression compression = Compression::UNCOMPRESSED
 Compression codec applied to every data and dictionary page.
 
std::unordered_map< std::string, Encoding > column_encodings
 Per-column encoding overrides keyed by column name.
 
bool auto_encoding = false
 When true, the writer automatically selects the best encoding for each column based on its physical type.
 
bool auto_compression = false
 When true, the writer samples page data and selects the most effective compression codec automatically.
 
bool enable_page_index = false
 When true, a ColumnIndex and OffsetIndex are written for each column chunk, enabling predicate pushdown during reads.
 
bool enable_bloom_filter = false
 When true, a Split Block Bloom Filter is written for each column (or for the subset named in bloom_filter_columns).
 
double bloom_filter_fpr = 0.01
 Target false-positive rate for bloom filters. Default: 1 %.
 
std::unordered_set< std::string > bloom_filter_columns
 Column names for which bloom filters should be generated.
 

Detailed Description

Configuration options for ParquetWriter.

Controls row group sizing, encoding, compression, bloom filters, page indexes, file-level metadata, and (optionally) Parquet Modular Encryption. An instance of this struct is passed to ParquetWriter::open(). All fields have sensible defaults, so a default-constructed WriterOptions produces uncompressed, PLAIN-encoded Parquet files.

See also
ParquetWriter::open

Definition at line 188 of file writer.hpp.
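
To illustrate the defaults described above, the following sketch mirrors a subset of the documented fields in a stand-in struct and then opts in to a few features. WriterOptionsSketch, the reduced enums, make_lookup_options(), and the column names are hypothetical, introduced only so the example compiles on its own; real code would include writer.hpp and use signet::forge::WriterOptions directly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins: only enumerators named on this page are listed.
enum class Encoding { PLAIN, RLE, DELTA_BINARY_PACKED, BYTE_STREAM_SPLIT };
enum class Compression { UNCOMPRESSED };

// Mirror of a subset of the documented fields with their documented defaults.
struct WriterOptionsSketch {
    std::int64_t row_group_size = 64 * 1024;
    Encoding default_encoding = Encoding::PLAIN;
    Compression compression = Compression::UNCOMPRESSED;
    std::unordered_map<std::string, Encoding> column_encodings;
    bool auto_encoding = false;
    bool auto_compression = false;
    bool enable_page_index = false;
    bool enable_bloom_filter = false;
    double bloom_filter_fpr = 0.01;
    std::unordered_set<std::string> bloom_filter_columns;
};

// Start from the defaults and enable features a lookup-heavy workload might want.
inline WriterOptionsSketch make_lookup_options() {
    WriterOptionsSketch opts;                  // uncompressed, PLAIN, no indexes
    opts.row_group_size = 128 * 1024;          // larger row groups than the default
    opts.enable_page_index = true;             // ColumnIndex and OffsetIndex
    opts.enable_bloom_filter = true;
    opts.bloom_filter_columns.insert("user_id");                  // hypothetical column
    opts.column_encodings["ts"] = Encoding::DELTA_BINARY_PACKED;  // hypothetical column
    // Sanity check: a false-positive rate is a probability strictly between 0 and 1.
    assert(opts.bloom_filter_fpr > 0.0 && opts.bloom_filter_fpr < 1.0);
    return opts;
}
```

Every field left untouched keeps its documented default, so only the settings that differ from an uncompressed, PLAIN-encoded file need to be spelled out.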

Member Data Documentation

◆ auto_compression

bool signet::forge::WriterOptions::auto_compression = false

When true, the writer samples page data and selects the most effective compression codec automatically.

Definition at line 222 of file writer.hpp.

◆ auto_encoding

bool signet::forge::WriterOptions::auto_encoding = false

When true, the writer automatically selects the best encoding for each column based on its physical type (e.g. DELTA_BINARY_PACKED for INT32/INT64, BYTE_STREAM_SPLIT for FLOAT/DOUBLE, RLE for BOOLEAN). Per-column overrides still take priority.

Definition at line 218 of file writer.hpp.
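
The type-to-encoding examples given for auto_encoding can be written out as a small lookup. This is a sketch, not the library's selection logic: the enums, auto_encoding_for(), and the choice to return no preference for types the description does not mention are all assumptions.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for the library's enums.
enum class Encoding { PLAIN, RLE, DELTA_BINARY_PACKED, BYTE_STREAM_SPLIT };
enum class PhysicalType { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BYTE_ARRAY };

// Returns the encoding listed in the description for a physical type, or
// std::nullopt for types the description does not mention (an assumption).
inline std::optional<Encoding> auto_encoding_for(PhysicalType type) {
    switch (type) {
        case PhysicalType::INT32:
        case PhysicalType::INT64:      return Encoding::DELTA_BINARY_PACKED;
        case PhysicalType::FLOAT:
        case PhysicalType::DOUBLE:     return Encoding::BYTE_STREAM_SPLIT;
        case PhysicalType::BOOLEAN:    return Encoding::RLE;
        case PhysicalType::BYTE_ARRAY: break;  // not covered by the description
    }
    return std::nullopt;
}

// Usage: the three documented examples, plus one type with no stated preference.
inline void auto_encoding_demo() {
    assert(auto_encoding_for(PhysicalType::INT64) == Encoding::DELTA_BINARY_PACKED);
    assert(auto_encoding_for(PhysicalType::FLOAT) == Encoding::BYTE_STREAM_SPLIT);
    assert(auto_encoding_for(PhysicalType::BOOLEAN) == Encoding::RLE);
    assert(!auto_encoding_for(PhysicalType::BYTE_ARRAY).has_value());
}
```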

◆ bloom_filter_columns

std::unordered_set<std::string> signet::forge::WriterOptions::bloom_filter_columns

Column names for which bloom filters should be generated.

An empty set means all columns get a bloom filter when enable_bloom_filter is true.

Definition at line 242 of file writer.hpp.
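
The interaction between enable_bloom_filter and bloom_filter_columns can be expressed as a single predicate. The sketch below is one reading of the description, assuming that no filter is written when enable_bloom_filter is false regardless of the set's contents; BloomOptionsSketch, wants_bloom_filter(), and the column names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical mirror of the two documented fields.
struct BloomOptionsSketch {
    bool enable_bloom_filter = false;
    std::unordered_set<std::string> bloom_filter_columns;
};

// True when a bloom filter would be generated for `column`.
inline bool wants_bloom_filter(const BloomOptionsSketch& opts,
                               const std::string& column) {
    if (!opts.enable_bloom_filter) return false;         // feature off (assumed: no filters at all)
    if (opts.bloom_filter_columns.empty()) return true;  // empty set means all columns
    return opts.bloom_filter_columns.count(column) > 0;  // otherwise only the named subset
}

// Usage: an empty set selects every column; a non-empty set restricts the choice.
inline void wants_bloom_filter_demo() {
    BloomOptionsSketch opts;
    assert(!wants_bloom_filter(opts, "user_id"));  // disabled by default
    opts.enable_bloom_filter = true;
    assert(wants_bloom_filter(opts, "user_id"));   // empty set: all columns
    opts.bloom_filter_columns.insert("user_id");
    assert(wants_bloom_filter(opts, "user_id"));
    assert(!wants_bloom_filter(opts, "price"));    // not in the subset
}
```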

◆ bloom_filter_fpr

double signet::forge::WriterOptions::bloom_filter_fpr = 0.01

Target false-positive rate for bloom filters. Default: 1 %.

Definition at line 237 of file writer.hpp.
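
To give a feel for what a target false-positive rate costs in space, the sketch below applies the textbook estimate for a classic Bloom filter, m = -n ln(p) / (ln 2)^2 bits for n distinct values at rate p. This is not the library's sizing rule, which this page does not show; a Split Block Bloom Filter works in 256-bit blocks and will generally differ from this estimate. The function name classic_bloom_bits() is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Textbook estimate of the number of bits a classic Bloom filter needs to hold
// `distinct_values` items at false-positive rate `fpr`.
inline std::uint64_t classic_bloom_bits(std::uint64_t distinct_values, double fpr) {
    assert(distinct_values > 0);
    assert(fpr > 0.0 && fpr < 1.0);
    const double ln2 = std::log(2.0);
    const double bits =
        -static_cast<double>(distinct_values) * std::log(fpr) / (ln2 * ln2);
    return static_cast<std::uint64_t>(std::ceil(bits));
}
```

At the default rate of 0.01 the estimate works out to roughly 9.6 bits per distinct value, and lowering the rate increases the filter size.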

◆ column_encodings

std::unordered_map<std::string, Encoding> signet::forge::WriterOptions::column_encodings

Per-column encoding overrides keyed by column name.

Entries here take priority over default_encoding and auto_encoding.

Definition at line 212 of file writer.hpp.
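
The precedence among column_encodings, auto_encoding, and default_encoding can be sketched as a three-step lookup. The description states only that per-column overrides win; that automatic selection ranks above default_encoding is an assumption here. EncodingOptionsSketch, resolve_encoding(), the auto_choice parameter (standing in for whatever the automatic selection would pick), and the column names are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the library's enum.
enum class Encoding { PLAIN, RLE, DELTA_BINARY_PACKED, BYTE_STREAM_SPLIT };

// Hypothetical mirror of the three documented encoding-related fields.
struct EncodingOptionsSketch {
    Encoding default_encoding = Encoding::PLAIN;
    std::unordered_map<std::string, Encoding> column_encodings;
    bool auto_encoding = false;
};

inline Encoding resolve_encoding(const EncodingOptionsSketch& opts,
                                 const std::string& column,
                                 std::optional<Encoding> auto_choice) {
    // 1. Documented: an explicit per-column override always wins.
    const auto it = opts.column_encodings.find(column);
    if (it != opts.column_encodings.end()) return it->second;
    // 2. Assumed: automatic selection applies next, when enabled and it has a preference.
    if (opts.auto_encoding && auto_choice.has_value()) return *auto_choice;
    // 3. Otherwise fall back to the default encoding.
    return opts.default_encoding;
}

// Usage: the override beats auto selection; other columns follow steps 2 and 3.
inline void resolve_encoding_demo() {
    EncodingOptionsSketch opts;
    opts.auto_encoding = true;
    opts.column_encodings["id"] = Encoding::PLAIN;  // hypothetical column
    assert(resolve_encoding(opts, "id", Encoding::DELTA_BINARY_PACKED) == Encoding::PLAIN);
    assert(resolve_encoding(opts, "price", Encoding::BYTE_STREAM_SPLIT) == Encoding::BYTE_STREAM_SPLIT);
    assert(resolve_encoding(opts, "name", std::nullopt) == Encoding::PLAIN);
}
```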

◆ compression

Compression signet::forge::WriterOptions::compression = Compression::UNCOMPRESSED

Compression codec applied to every data and dictionary page.

Default: UNCOMPRESSED.

Definition at line 208 of file writer.hpp.

◆ created_by

std::string signet::forge::WriterOptions::created_by = SIGNET_CREATED_BY

Value written into the Parquet footer's "created_by" field.

Definition at line 195 of file writer.hpp.

◆ default_encoding

Encoding signet::forge::WriterOptions::default_encoding = Encoding::PLAIN

Default encoding applied to every column that does not have a per-column override in column_encodings.

Default: PLAIN.

Definition at line 204 of file writer.hpp.

◆ enable_bloom_filter

bool signet::forge::WriterOptions::enable_bloom_filter = false

When true, a Split Block Bloom Filter is written for each column (or for the subset named in bloom_filter_columns).

Definition at line 234 of file writer.hpp.

◆ enable_page_index

bool signet::forge::WriterOptions::enable_page_index = false

When true, a ColumnIndex and OffsetIndex are written for each column chunk, enabling predicate pushdown during reads.

Definition at line 228 of file writer.hpp.

◆ file_metadata

std::vector<thrift::KeyValue> signet::forge::WriterOptions::file_metadata

Custom key-value metadata pairs embedded in the Parquet footer.

Definition at line 198 of file writer.hpp.

◆ row_group_size

int64_t signet::forge::WriterOptions::row_group_size = 64 * 1024

Target number of rows per row group.

When the row-based API accumulates this many pending rows, flush_row_group() is called automatically. Default: 65 536.

Definition at line 192 of file writer.hpp.
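
The automatic flush described for row_group_size can be pictured with a small counter. This is a minimal sketch of the threshold behavior only, not ParquetWriter itself: RowBufferSketch and its members are hypothetical, and no data is actually buffered or written.

```cpp
#include <cassert>
#include <cstdint>

// Counts pending rows and "flushes" once the row-group threshold is reached.
class RowBufferSketch {
public:
    explicit RowBufferSketch(std::int64_t row_group_size)
        : row_group_size_(row_group_size) {
        assert(row_group_size > 0);
    }

    // Append one row; flush automatically when the threshold is reached.
    void write_row() {
        ++pending_rows_;
        if (pending_rows_ >= row_group_size_) flush_row_group();
    }

    // Convenience wrapper for appending several rows.
    void write_rows(std::int64_t count) {
        for (std::int64_t i = 0; i < count; ++i) write_row();
    }

    // Close the current row group, if it holds any rows.
    void flush_row_group() {
        if (pending_rows_ == 0) return;
        ++flushed_groups_;
        pending_rows_ = 0;
    }

    std::int64_t pending_rows() const { return pending_rows_; }
    std::int64_t flushed_groups() const { return flushed_groups_; }

private:
    std::int64_t row_group_size_;
    std::int64_t pending_rows_ = 0;
    std::int64_t flushed_groups_ = 0;
};
```

With the default threshold of 65 536, writing 150 000 rows through this sketch triggers two automatic flushes and leaves 18 928 rows pending until the next explicit flush.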


The documentation for this struct was generated from the following file: