Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
Configuration options for ParquetWriter.
#include <writer.hpp>
Public Attributes
| Type | Member | Description |
| --- | --- | --- |
| int64_t | row_group_size = 64 * 1024 | Target number of rows per row group. |
| std::string | created_by = SIGNET_CREATED_BY | Value written into the Parquet footer's "created_by" field. |
| std::vector< thrift::KeyValue > | file_metadata | Custom key-value metadata pairs embedded in the Parquet footer. |
| Encoding | default_encoding = Encoding::PLAIN | Default encoding applied to every column that does not have a per-column override in column_encodings. |
| Compression | compression = Compression::UNCOMPRESSED | Compression codec applied to every data and dictionary page. |
| std::unordered_map< std::string, Encoding > | column_encodings | Per-column encoding overrides keyed by column name. |
| bool | auto_encoding = false | When true, the writer automatically selects the best encoding for each column based on its physical type (e.g. DELTA_BINARY_PACKED for INT32/INT64, BYTE_STREAM_SPLIT for FLOAT/DOUBLE, RLE for BOOLEAN). |
| bool | auto_compression = false | When true, the writer samples page data and selects the most effective compression codec automatically. |
| bool | enable_page_index = false | When true, a ColumnIndex and OffsetIndex are written for each column chunk, enabling predicate pushdown during reads. |
| bool | enable_bloom_filter = false | When true, a Split Block Bloom Filter is written for each column (or for the subset named in bloom_filter_columns). |
| double | bloom_filter_fpr = 0.01 | Target false-positive rate for bloom filters. Default: 1 %. |
| std::unordered_set< std::string > | bloom_filter_columns | Column names for which bloom filters should be generated. |
Controls row group sizing, encoding, compression, bloom filters, page indexes, file-level metadata, and (optionally) Parquet Modular Encryption. An instance of this struct is passed to ParquetWriter::open(). All fields have sensible defaults so a default-constructed WriterOptions produces uncompressed, PLAIN-encoded Parquet files.
Definition at line 188 of file writer.hpp.
| bool signet::forge::WriterOptions::auto_compression = false |
When true, the writer samples page data and selects the most effective compression codec automatically.
Definition at line 222 of file writer.hpp.
| bool signet::forge::WriterOptions::auto_encoding = false |
When true, the writer automatically selects the best encoding for each column based on its physical type (e.g. DELTA_BINARY_PACKED for INT32/INT64, BYTE_STREAM_SPLIT for FLOAT/DOUBLE, RLE for BOOLEAN). Per-column overrides still take priority.
Definition at line 218 of file writer.hpp.
| std::unordered_set<std::string> signet::forge::WriterOptions::bloom_filter_columns |
Column names for which bloom filters should be generated.
An empty set means all columns get a bloom filter when enable_bloom_filter is true.
Definition at line 242 of file writer.hpp.
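The interaction between enable_bloom_filter and bloom_filter_columns can be sketched as a small predicate. This is an illustration only: BloomOptions and wants_bloom_filter are hypothetical stand-ins that mirror the documented fields, not the library's own types or functions.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical mirror of the bloom-filter fields of WriterOptions.
struct BloomOptions {
    bool enable_bloom_filter = false;
    double bloom_filter_fpr = 0.01;
    std::unordered_set<std::string> bloom_filter_columns;
};

// Returns true when a column should receive a bloom filter under the
// documented rule: filters are off unless enabled, and an empty set
// selects every column.
inline bool wants_bloom_filter(const BloomOptions& opts, const std::string& column) {
    if (!opts.enable_bloom_filter) {
        return false;
    }
    return opts.bloom_filter_columns.empty() ||
           opts.bloom_filter_columns.count(column) > 0;
}
```

With the flag enabled and the set left empty, every column qualifies; naming one or more columns restricts filters to exactly those names.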
| double signet::forge::WriterOptions::bloom_filter_fpr = 0.01 |
Target false-positive rate for bloom filters. Default: 1 %.
Definition at line 237 of file writer.hpp.
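To give a feel for how the target rate affects filter size, the sketch below estimates the number of bits a split block Bloom filter needs. The formula follows the sizing rule used in Apache Parquet's Java implementation; that Signet Forge sizes its filters the same way is an assumption, and sbbf_optimal_bits is a hypothetical helper name.

```cpp
#include <cmath>
#include <cstdint>

// Estimated filter size in bits for `ndv` distinct values at target
// false-positive rate `fpr` (0 < fpr < 1). Each value sets 8 bits in one
// block, hence the factor 8 and the 1/8 exponent.
// Assumption: same sizing rule as Apache Parquet's Java implementation.
inline double sbbf_optimal_bits(std::uint64_t ndv, double fpr) {
    return -8.0 * static_cast<double>(ndv) /
           std::log(1.0 - std::pow(fpr, 1.0 / 8.0));
}
```

Under this rule the default rate of 0.01 works out to a little under 10 bits per distinct value, and lowering the rate increases the filter size.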
| std::unordered_map<std::string, Encoding> signet::forge::WriterOptions::column_encodings |
Per-column encoding overrides keyed by column name.
Entries here take priority over default_encoding and auto_encoding.
Definition at line 212 of file writer.hpp.
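The priority order described for column_encodings, auto_encoding, and default_encoding can be sketched as a lookup. The types below are hypothetical stand-ins that reuse the documented names; the real Encoding enum and the writer's internal selection logic may differ. Falling back to default_encoding for types that have no listed automatic choice is also an assumption.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-ins; only the enumerator names come from the docs.
enum class Encoding { PLAIN, RLE, DELTA_BINARY_PACKED, BYTE_STREAM_SPLIT };
enum class PhysicalType { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BYTE_ARRAY };

struct EncodingOptions {
    Encoding default_encoding = Encoding::PLAIN;
    std::unordered_map<std::string, Encoding> column_encodings;
    bool auto_encoding = false;
};

// Type-based choice following the examples listed for auto_encoding.
inline Encoding auto_pick(PhysicalType type, Encoding fallback) {
    switch (type) {
        case PhysicalType::INT32:
        case PhysicalType::INT64:   return Encoding::DELTA_BINARY_PACKED;
        case PhysicalType::FLOAT:
        case PhysicalType::DOUBLE:  return Encoding::BYTE_STREAM_SPLIT;
        case PhysicalType::BOOLEAN: return Encoding::RLE;
        default:                    return fallback;  // assumption
    }
}

// 1. per-column override, 2. automatic choice, 3. default encoding.
inline Encoding resolve_encoding(const EncodingOptions& opts,
                                 const std::string& column,
                                 PhysicalType type) {
    auto it = opts.column_encodings.find(column);
    if (it != opts.column_encodings.end()) {
        return it->second;
    }
    if (opts.auto_encoding) {
        return auto_pick(type, opts.default_encoding);
    }
    return opts.default_encoding;
}
```

An explicit entry in column_encodings wins even when auto_encoding is true, which matches the statement that per-column overrides take priority.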
| Compression signet::forge::WriterOptions::compression = Compression::UNCOMPRESSED |
Compression codec applied to every data and dictionary page.
Default: UNCOMPRESSED.
Definition at line 208 of file writer.hpp.
| std::string signet::forge::WriterOptions::created_by = SIGNET_CREATED_BY |
Value written into the Parquet footer's "created_by" field.
Definition at line 195 of file writer.hpp.
| Encoding signet::forge::WriterOptions::default_encoding = Encoding::PLAIN |
Default encoding applied to every column that does not have a per-column override in column_encodings.
Default: PLAIN.
Definition at line 204 of file writer.hpp.
| bool signet::forge::WriterOptions::enable_bloom_filter = false |
When true, a Split Block Bloom Filter is written for each column (or for the subset named in bloom_filter_columns).
Definition at line 234 of file writer.hpp.
| bool signet::forge::WriterOptions::enable_page_index = false |
When true, a ColumnIndex and OffsetIndex are written for each column chunk, enabling predicate pushdown during reads.
Definition at line 228 of file writer.hpp.
| std::vector<thrift::KeyValue> signet::forge::WriterOptions::file_metadata |
Custom key-value metadata pairs embedded in the Parquet footer.
Definition at line 198 of file writer.hpp.
| int64_t signet::forge::WriterOptions::row_group_size = 64 * 1024 |
Target number of rows per row group.
When the row-based API accumulates this many pending rows, flush_row_group() is called automatically. Default: 65 536.
Definition at line 192 of file writer.hpp.
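The automatic flush described for row_group_size can be modeled with a simple counter. RowBuffer is a hypothetical toy type that only counts rows; it is not the library's writer, and the real flush_row_group() presumably also serializes the buffered data.

```cpp
#include <cstdint>

// Toy model of the row-based API's pending-row accounting.
struct RowBuffer {
    std::int64_t row_group_size = 64 * 1024;  // same default as WriterOptions
    std::int64_t pending = 0;                 // rows buffered but not flushed
    std::int64_t flushed_groups = 0;          // row groups emitted so far

    // Emits the pending rows as one row group (here: just counts it).
    void flush_row_group() {
        if (pending > 0) {
            ++flushed_groups;
            pending = 0;
        }
    }

    // Buffers one row and flushes once the target size is reached.
    void write_row() {
        ++pending;
        if (pending >= row_group_size) {
            flush_row_group();
        }
    }
};
```

With row_group_size set to 4, writing 10 rows yields two full row groups and leaves two rows pending until the next flush.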