Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
Streaming Parquet file writer with row-based and column-based APIs.
#include <writer.hpp>
Public Types | |
| using | Options = WriterOptions |
Alias for WriterOptions, usable as ParquetWriter::Options. | |
Public Member Functions | |
| expected< void > | write_row (const std::vector< std::string > &values) |
| Write a single row as a vector of string values. | |
| size_t | num_columns () const noexcept |
| Returns the number of columns in the writer's schema. | |
| template<typename T > | |
| expected< void > | write_column (size_t col_index, const T *values, size_t count) |
| Write a batch of typed values to a single column. | |
| expected< void > | write_column (size_t col_index, const std::string *values, size_t count) |
| Write a batch of string values to a BYTE_ARRAY column. | |
| expected< void > | flush_row_group () |
| Flush the current row group to disk. | |
| expected< WriteStats > | close () |
| Close the file and finalize the Parquet footer. | |
| ~ParquetWriter () | |
| Destructor. | |
| ParquetWriter (const ParquetWriter &)=delete | |
| Deleted copy constructor. ParquetWriter is move-only. | |
| ParquetWriter & | operator= (const ParquetWriter &)=delete |
| Deleted copy-assignment operator. ParquetWriter is move-only. | |
| ParquetWriter (ParquetWriter &&other) noexcept | |
| Move constructor. | |
| ParquetWriter & | operator= (ParquetWriter &&other) noexcept |
| Move-assignment operator. | |
| int64_t | rows_written () const |
| Returns the total number of rows written so far. | |
| int64_t | row_groups_written () const |
| Returns the number of row groups that have been flushed to disk. | |
| bool | is_open () const |
| Returns whether the writer is open and accepting data. | |
Static Public Member Functions | |
| static expected< ParquetWriter > | open (const std::filesystem::path &path, const Schema &schema, const Options &options=Options{}) |
| Open a new Parquet file for writing. | |
| static expected< void > | csv_to_parquet (const std::filesystem::path &csv_input, const std::filesystem::path &parquet_output, const Options &options=Options{}) |
| Convert a CSV file to a Parquet file. | |
ParquetWriter is the primary write-path class in Signet Forge. It produces spec-compliant Apache Parquet files with configurable encoding (PLAIN, DELTA_BINARY_PACKED, BYTE_STREAM_SPLIT, RLE_DICTIONARY, RLE), compression (Snappy, ZSTD, LZ4, Gzip), optional bloom filters, page indexes, and Parquet Modular Encryption (commercial tier).
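Lifecycle:
1. open() creates the file and writes the PAR1 magic header.
2. write_row() or write_column() buffers data for the current row group.
3. flush_row_group() writes the row group to disk, either automatically from write_row() when the row group is full or through an explicit call.
4. close() flushes any remaining data and finalizes the footer.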
The class is move-only (non-copyable). If the user forgets to call close(), the destructor performs a best-effort close.
Definition at line 280 of file writer.hpp.
Alias for WriterOptions, usable as ParquetWriter::Options.
Definition at line 283 of file writer.hpp.
| ParquetWriter::~ParquetWriter () | inline |
Destructor.
Performs a best-effort close() if the file is still open.
Any errors during the implicit close are silently discarded. Prefer calling close() explicitly so that errors and WriteStats can be inspected.
Definition at line 1024 of file writer.hpp.
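The relationship between the destructor and close() can be illustrated with a small stand-alone sketch. `SketchWriter` below is a hypothetical stand-in, not the Signet Forge class: it writes to a caller-supplied `std::ostream` and appends an `END` marker in place of a real Parquet footer. It only demonstrates the pattern described above, where the destructor calls close() and discards the result, so an explicit close() is the only way to observe a failure.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a move-only writer; NOT the Signet Forge class.
class SketchWriter {
public:
    explicit SketchWriter(std::ostream& out) : out_(&out) {}

    SketchWriter(const SketchWriter&) = delete;
    SketchWriter& operator=(const SketchWriter&) = delete;

    // Moving transfers the stream; the source is left in a closed state.
    SketchWriter(SketchWriter&& other) noexcept
        : out_(std::exchange(other.out_, nullptr)) {}

    SketchWriter& operator=(SketchWriter&& other) noexcept {
        if (this != &other) {
            (void)close();  // finish the current output before taking over
            out_ = std::exchange(other.out_, nullptr);
        }
        return *this;
    }

    // Best-effort close: the result is deliberately ignored.
    ~SketchWriter() { (void)close(); }

    bool is_open() const noexcept { return out_ != nullptr; }

    void write(const std::string& data) {
        if (out_) *out_ << data;
    }

    // Writes the trailer and reports whether the stream is still healthy.
    // Closing an already-closed writer is a no-op that reports success.
    bool close() noexcept {
        if (!out_) return true;
        std::ostream* out = std::exchange(out_, nullptr);
        try {
            *out << "END";
            out->flush();
            return out->good();
        } catch (...) {
            return false;
        }
    }

private:
    std::ostream* out_ = nullptr;
};
```

In this sketch a caller that lets the object go out of scope still gets the trailer, but never sees the `bool` that close() would have returned.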
| ParquetWriter::ParquetWriter (const ParquetWriter &) | delete |
Deleted copy constructor. ParquetWriter is move-only.
| ParquetWriter::ParquetWriter (ParquetWriter &&other) | inline noexcept |
Move constructor.
Transfers ownership of the open file and all internal state from other. After the move, other is in a closed, empty state.
Definition at line 1041 of file writer.hpp.
| expected< WriteStats > ParquetWriter::close () | inline |
Close the file and finalize the Parquet footer.
Flushes any remaining row data via flush_row_group(), serializes the Thrift FileMetaData (schema, row group metadata, statistics, custom key-value pairs), writes the footer length as a 4-byte LE integer, and appends the closing PAR1 magic (or PARE for encrypted footers).
After close() returns, the file on disk is a complete, spec-valid Parquet file. Calling close() on an already-closed writer is safe and returns an empty WriteStats.
Definition at line 869 of file writer.hpp.
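The byte layout at the end of the file can be sketched as follows. This is a minimal illustration, not the library's code: `append_trailer` is a hypothetical helper, and the footer is treated as an opaque byte vector rather than a real Thrift-serialized FileMetaData.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends footer bytes, their length as a 4-byte
// little-endian integer, and the 4-byte closing magic to `file`.
inline void append_trailer(std::vector<std::uint8_t>& file,
                           const std::vector<std::uint8_t>& footer,
                           bool encrypted_footer = false) {
    // 1. The serialized footer itself (opaque bytes in this sketch).
    file.insert(file.end(), footer.begin(), footer.end());

    // 2. Footer length, least significant byte first.
    const auto len = static_cast<std::uint32_t>(footer.size());
    for (int shift = 0; shift < 32; shift += 8) {
        file.push_back(static_cast<std::uint8_t>((len >> shift) & 0xFFu));
    }

    // 3. Closing magic: PAR1 for plaintext footers, PARE for encrypted ones.
    const char* magic = encrypted_footer ? "PARE" : "PAR1";
    file.insert(file.end(), magic, magic + 4);
}
```

A reader locates the footer by reading the last 8 bytes of the file: the magic confirms the format, and the length tells it how far to seek back.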
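| static expected< void > ParquetWriter::csv_to_parquet (const std::filesystem::path &csv_input, const std::filesystem::path &parquet_output, const Options &options=Options{}) | inline static |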
Convert a CSV file to a Parquet file.
Reads the entire CSV into memory, auto-detects column types by scanning every value in each column (priority: INT64 > DOUBLE > BOOLEAN > STRING), builds a Schema, writes all rows through a ParquetWriter, and closes the output file.
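The type-detection priority can be sketched as a scan that keeps the most specific type every value in a column satisfies. This is an illustration under stated assumptions, not the library's implementation: the names `ColType` and `infer_type` are hypothetical, booleans are assumed to be spelled exactly `true` or `false`, and numbers are accepted according to `std::from_chars` and `std::strtod`, which may differ from the rules Signet Forge applies.

```cpp
#include <charconv>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <system_error>
#include <vector>

enum class ColType { Int64, Double, Boolean, String };  // hypothetical enum

// True if the whole string parses as a signed 64-bit integer.
inline bool is_int64(const std::string& s) {
    std::int64_t v = 0;
    const char* end = s.data() + s.size();
    const auto res = std::from_chars(s.data(), end, v);
    return !s.empty() && res.ec == std::errc{} && res.ptr == end;
}

// True if the whole string parses as a floating-point number (strtod rules).
inline bool is_double(const std::string& s) {
    if (s.empty()) return false;
    char* end = nullptr;
    std::strtod(s.c_str(), &end);
    return end == s.c_str() + s.size();
}

// Assumption: only the lowercase literals count as booleans.
inline bool is_bool(const std::string& s) {
    return s == "true" || s == "false";
}

// Scans every value and returns the highest-priority type that fits all of them.
inline ColType infer_type(const std::vector<std::string>& column) {
    if (column.empty()) return ColType::String;
    bool all_int = true, all_double = true, all_bool = true;
    for (const auto& v : column) {
        all_int = all_int && is_int64(v);
        all_double = all_double && is_double(v);
        all_bool = all_bool && is_bool(v);
    }
    if (all_int) return ColType::Int64;
    if (all_double) return ColType::Double;
    if (all_bool) return ColType::Boolean;
    return ColType::String;
}
```

Because every value is checked, a single non-numeric cell is enough to demote an otherwise numeric column to STRING.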
The first line of the CSV is treated as the header (column names). Quoted fields with embedded commas and escaped double-quotes ("") are supported.
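The quoting rules can be illustrated with a small field splitter. This is a minimal sketch rather than the library's parser: `split_csv_line` is a hypothetical function, and it assumes a record occupies a single line (no newlines inside quoted fields).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line into fields, honoring quoted
// fields with embedded commas and "" as an escaped double-quote.
inline std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    current += '"';  // "" inside quotes is a literal quote
                    ++i;
                } else {
                    in_quotes = false;  // closing quote
                }
            } else {
                current += c;  // commas are ordinary characters here
            }
        } else if (c == '"') {
            in_quotes = true;  // opening quote
        } else if (c == ',') {
            fields.push_back(current);  // field separator
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // last field has no trailing comma
    return fields;
}
```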
Parameters
| csv_input | Path to the input CSV file. |
| parquet_output | Path for the output Parquet file (created or truncated). |
| options | Writer options forwarded to ParquetWriter::open(). |
Returns
expected<void>: an error on I/O failure, empty CSV, or any write/close error.
Definition at line 1144 of file writer.hpp.
| expected< void > ParquetWriter::flush_row_group () | inline |
Flush the current row group to disk.
Encodes any pending string rows (row-based API), verifies that all columns have the same value count, writes column chunks with the selected encoding and compression, emits bloom filters and page indexes if enabled, and records the row group metadata for the footer.
This method is called automatically by write_row() when the pending row count reaches WriterOptions::row_group_size, and by close() to drain any remaining data. It may also be called explicitly to control row group boundaries.
Returns
expected<void>: an error on I/O failure, schema mismatch (column value counts differ), or compression/encryption error.
Definition at line 520 of file writer.hpp.
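The value-count check performed at flush time can be sketched in isolation. `check_column_counts` is a hypothetical helper, not part of the Signet Forge API, and the error is modeled as an optional message rather than the library's `expected` type.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns std::nullopt when every column holds the same
// number of values, otherwise a message naming the first mismatching column.
inline std::optional<std::string> check_column_counts(
        const std::vector<std::size_t>& counts) {
    for (std::size_t i = 1; i < counts.size(); ++i) {
        if (counts[i] != counts[0]) {
            return "column " + std::to_string(i) + " has " +
                   std::to_string(counts[i]) + " values, expected " +
                   std::to_string(counts[0]);
        }
    }
    return std::nullopt;
}
```

Deferring this check to flush time is what allows callers of the column-based API to fill columns in any order.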
| bool ParquetWriter::is_open () const | inline |
Returns whether the writer is open and accepting data.
Returns
true if the writer is open; false after close() or after the writer has been moved from.
Definition at line 1118 of file writer.hpp.
| size_t ParquetWriter::num_columns () const | inline noexcept |
Returns the number of columns in the writer's schema.
Definition at line 393 of file writer.hpp.
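| static expected< ParquetWriter > ParquetWriter::open (const std::filesystem::path &path, const Schema &schema, const Options &options=Options{}) | inline static |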
Open a new Parquet file for writing.
Creates (or truncates) the file at path, writes the 4-byte PAR1 magic header, and initializes internal column writers, bloom filters, and page-index builders according to options. Parent directories are created automatically if they do not exist.
Parameters
| path | Filesystem path for the output Parquet file. |
| schema | Column schema describing names, physical types, and logical types. |
| options | Writer configuration (encoding, compression, bloom filters, encryption, etc.). Defaults to plain, uncompressed output. |
Definition at line 303 of file writer.hpp.
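The file-creation steps (create parent directories, create or truncate the file, write the magic header) can be sketched with the standard library alone. `open_with_magic` is a hypothetical function that models failure with `std::optional`; it does not set up column writers, bloom filters, or page-index builders.

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <system_error>
#include <utility>

// Hypothetical helper: returns an open stream positioned just after the
// 4-byte PAR1 header, or std::nullopt if any step fails.
inline std::optional<std::ofstream> open_with_magic(
        const std::filesystem::path& path) {
    // Create missing parent directories; an existing directory is not an error.
    if (path.has_parent_path()) {
        std::error_code ec;
        std::filesystem::create_directories(path.parent_path(), ec);
        if (ec) return std::nullopt;
    }

    // Create the file, or truncate it if it already exists.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return std::nullopt;

    // Every Parquet file starts with the 4-byte magic.
    out.write("PAR1", 4);
    if (!out) return std::nullopt;

    return std::optional<std::ofstream>{std::move(out)};
}
```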
| ParquetWriter & ParquetWriter::operator= (const ParquetWriter &) | delete |
Deleted copy-assignment operator. ParquetWriter is move-only.
| ParquetWriter & ParquetWriter::operator= (ParquetWriter &&other) | inline noexcept |
Move-assignment operator.
Closes the current file (if open) before transferring ownership from other.
Definition at line 1066 of file writer.hpp.
| int64_t ParquetWriter::row_groups_written () const | inline |
Returns the number of row groups that have been flushed to disk.
Definition at line 1112 of file writer.hpp.
| int64_t ParquetWriter::rows_written () const | inline |
Returns the total number of rows written so far.
Includes both rows already flushed to completed row groups and rows buffered in memory awaiting the next flush_row_group() call.
Definition at line 1105 of file writer.hpp.
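| expected< void > ParquetWriter::write_column (size_t col_index, const std::string *values, size_t count) | inline |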
Write a batch of string values to a BYTE_ARRAY column.
This overload handles variable-length binary / UTF-8 data. Each string is stored with a 4-byte little-endian length prefix in the PLAIN encoding buffer, matching the Parquet BYTE_ARRAY wire format.
Parameters
| col_index | Zero-based column index in the schema. |
| values | Pointer to a contiguous array of count strings. |
| count | Number of string values to write. |
Returns
expected<void>: an error if the writer is closed or col_index is out of range.
Definition at line 467 of file writer.hpp.
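The length-prefixed layout can be illustrated with a short encoder. This is a sketch of the PLAIN BYTE_ARRAY layout only; `encode_plain_byte_array` is a hypothetical name, and the sketch does not cover page headers, compression, or statistics.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encodes `count` strings as PLAIN BYTE_ARRAY values,
// each as a 4-byte little-endian length followed by the raw bytes.
inline std::vector<std::uint8_t> encode_plain_byte_array(
        const std::string* values, std::size_t count) {
    std::vector<std::uint8_t> buf;
    for (std::size_t i = 0; i < count; ++i) {
        const std::string& s = values[i];
        const auto len = static_cast<std::uint32_t>(s.size());
        // Length prefix, least significant byte first.
        for (int shift = 0; shift < 32; shift += 8) {
            buf.push_back(static_cast<std::uint8_t>((len >> shift) & 0xFFu));
        }
        // Payload bytes, with no terminator or padding.
        buf.insert(buf.end(), s.begin(), s.end());
    }
    return buf;
}
```

An empty string still occupies four bytes, because its zero length is written explicitly.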
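| template<typename T > expected< void > ParquetWriter::write_column (size_t col_index, const T *values, size_t count) | inline |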
Write a batch of typed values to a single column.
The caller writes each column independently and then calls flush_row_group(). All columns within a row group must receive the same number of values; a mismatch is detected at flush time.
Supported template types map to Parquet physical types:
- bool -> BOOLEAN
- int32_t -> INT32
- int64_t -> INT64
- float -> FLOAT
- double -> DOUBLE
- std::string -> BYTE_ARRAY (use the string overload instead)

Template Parameters
| T | C++ type matching the column's physical type. |
Parameters
| col_index | Zero-based column index in the schema. |
| values | Pointer to a contiguous array of count values. |
| count | Number of values to write. |
Returns
expected<void>: an error if the writer is closed or col_index is out of range.
Definition at line 419 of file writer.hpp.
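One way to express the C++-to-Parquet type mapping at compile time is a constexpr function over `if constexpr`. This is a sketch of the idea only: `PhysicalType`, `always_false`, and `physical_type_of` are hypothetical names, and the library may implement the dispatch differently.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical enum mirroring the Parquet physical types listed above.
enum class PhysicalType { Boolean, Int32, Int64, Float, Double, ByteArray };

// Dependent false, so the static_assert fires only for unsupported types.
template <typename>
inline constexpr bool always_false = false;

// Maps a C++ value type to the Parquet physical type it would be stored as.
template <typename T>
constexpr PhysicalType physical_type_of() {
    if constexpr (std::is_same_v<T, bool>) {
        return PhysicalType::Boolean;
    } else if constexpr (std::is_same_v<T, std::int32_t>) {
        return PhysicalType::Int32;
    } else if constexpr (std::is_same_v<T, std::int64_t>) {
        return PhysicalType::Int64;
    } else if constexpr (std::is_same_v<T, float>) {
        return PhysicalType::Float;
    } else if constexpr (std::is_same_v<T, double>) {
        return PhysicalType::Double;
    } else if constexpr (std::is_same_v<T, std::string>) {
        return PhysicalType::ByteArray;
    } else {
        static_assert(always_false<T>, "unsupported column value type");
    }
}
```

With a mapping like this, passing an unsupported type such as `char` fails at compile time instead of producing a mis-typed column.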
| expected< void > ParquetWriter::write_row (const std::vector< std::string > &values) | inline |
Write a single row as a vector of string values.
Each string is parsed and converted to its column's physical type when the row group is flushed (either automatically when WriterOptions::row_group_size is reached, or explicitly via flush_row_group()). The number of values must exactly match the schema's column count.
Parameters
| values | One string per column, in schema order. |
Returns
expected<void>: an error if the writer is closed or if values.size() does not match the schema.
Definition at line 368 of file writer.hpp.
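The buffering behavior of the row-based API (validate the column count, buffer the row, flush automatically once the row group is full) can be sketched with a small stand-in class. `RowBuffer` is hypothetical and does no parsing, encoding, or I/O; a flush here only moves counters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the row-buffering part of a writer.
class RowBuffer {
public:
    RowBuffer(std::size_t num_columns, std::size_t row_group_size)
        : num_columns_(num_columns), row_group_size_(row_group_size) {}

    // Rejects rows whose width differs from the schema; otherwise buffers the
    // row and flushes automatically when the row group is full.
    bool write_row(const std::vector<std::string>& values) {
        if (values.size() != num_columns_) return false;
        pending_.push_back(values);
        if (pending_.size() >= row_group_size_) flush_row_group();
        return true;
    }

    // Completes the current row group; a real writer would encode and write
    // the buffered rows here. Flushing an empty buffer does nothing.
    void flush_row_group() {
        if (pending_.empty()) return;
        flushed_rows_ += pending_.size();
        ++row_groups_;
        pending_.clear();
    }

    // Counts flushed rows plus rows still buffered in memory.
    std::size_t rows_written() const { return flushed_rows_ + pending_.size(); }
    std::size_t row_groups_written() const { return row_groups_; }

private:
    std::size_t num_columns_;
    std::size_t row_group_size_;
    std::vector<std::vector<std::string>> pending_;
    std::size_t flushed_rows_ = 0;
    std::size_t row_groups_ = 0;
};
```

Calling `flush_row_group()` explicitly in this sketch ends a row group early, which mirrors how the explicit call controls row group boundaries.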