template <PhysicalType PT>
constexpr const char * SIGNET_CREATED_BY
Default "created_by" string embedded in every Parquet footer.
PhysicalType
Parquet physical (storage) types as defined in parquet.thrift.
@ INT96
96-bit value (deprecated — legacy Impala timestamps).
@ FIXED_LEN_BYTE_ARRAY
Fixed-length byte array (UUID, vectors, decimals).
@ INT64
64-bit signed integer (little-endian).
@ INT32
32-bit signed integer (little-endian).
@ BOOLEAN
1-bit boolean, bit-packed in pages.
@ BYTE_ARRAY
Variable-length byte sequence (strings, binary).
@ FLOAT
IEEE 754 single-precision float.
@ DOUBLE
IEEE 754 double-precision float.
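To make "little-endian" concrete for these physical types, here is a minimal sketch of PLAIN encoding for INT32 and BYTE_ARRAY values, based on the Parquet format spec. INT32 values are written as 4 little-endian bytes, and each BYTE_ARRAY value is prefixed with a 4-byte little-endian length. The function names are hypothetical and are not part of Signet's API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append `v` as 4 little-endian bytes, independent of host byte order.
inline void put_le32(std::vector<uint8_t>& out, uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<uint8_t>((v >> shift) & 0xFFu));
    }
}

// PLAIN INT32: each value is its two's-complement bit pattern, 4 bytes LE.
std::vector<uint8_t> plain_encode_int32(const std::vector<int32_t>& values) {
    std::vector<uint8_t> out;
    out.reserve(values.size() * 4);
    for (int32_t v : values) {
        put_le32(out, static_cast<uint32_t>(v));
    }
    return out;
}

// PLAIN BYTE_ARRAY: 4-byte LE length prefix, then the raw bytes.
std::vector<uint8_t> plain_encode_byte_array(const std::vector<std::string>& values) {
    std::vector<uint8_t> out;
    for (const std::string& s : values) {
        put_le32(out, static_cast<uint32_t>(s.size()));
        out.insert(out.end(), s.begin(), s.end());
    }
    return out;
}
```

An empty string still costs 4 bytes (its zero length prefix), which is one reason dictionary or delta encodings often win for short strings.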
constexpr int32_t PARQUET_VERSION
Parquet format version written to the file footer.
constexpr uint32_t PARQUET_MAGIC_ENCRYPTED
"PARE" magic bytes (little-endian uint32) — marks a Parquet file with an encrypted footer.
@ STRING
Variable-length string.
@ FLOAT
32-bit IEEE 754 float (float32).
@ DOUBLE
64-bit IEEE 754 float (float64).
Compression
Parquet compression codecs.
@ BROTLI
Brotli compression (not currently supported).
@ SNAPPY
Snappy compression (bundled, header-only).
@ LZ4_RAW
LZ4 raw (unframed) block compression.
@ LZO
LZO compression (not currently supported).
@ UNCOMPRESSED
No compression.
@ ZSTD
Zstandard compression (requires SIGNET_ENABLE_ZSTD).
@ LZ4
LZ4 block compression (requires SIGNET_ENABLE_LZ4).
@ GZIP
Gzip/deflate compression (requires SIGNET_ENABLE_GZIP).
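Because ZSTD, LZ4 and GZIP depend on build flags while Snappy is always bundled, code that picks a codec may want to check availability at compile time. The sketch below gates each codec on the documented SIGNET_ENABLE_* macros. The enum mirror, its numeric values (taken from CompressionCodec in parquet.thrift) and the function name are assumptions for illustration, and so is gating LZ4_RAW on SIGNET_ENABLE_LZ4, since the description above does not state a flag for it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the Compression enum; numeric values follow
// CompressionCodec in parquet.thrift and are not confirmed for Signet.
enum class Compression : int32_t {
    UNCOMPRESSED = 0, SNAPPY = 1, GZIP = 2, LZO = 3,
    BROTLI = 4, LZ4 = 5, ZSTD = 6, LZ4_RAW = 7
};

// True if this build could encode pages with codec `c`.
// Assumption: LZ4_RAW is gated by SIGNET_ENABLE_LZ4, like LZ4.
constexpr bool codec_available(Compression c) {
    switch (c) {
        case Compression::UNCOMPRESSED:
        case Compression::SNAPPY:  // bundled, header-only: always present
            return true;
        case Compression::GZIP:
#ifdef SIGNET_ENABLE_GZIP
            return true;
#else
            return false;
#endif
        case Compression::ZSTD:
#ifdef SIGNET_ENABLE_ZSTD
            return true;
#else
            return false;
#endif
        case Compression::LZ4:
        case Compression::LZ4_RAW:
#ifdef SIGNET_ENABLE_LZ4
            return true;
#else
            return false;
#endif
        case Compression::LZO:
        case Compression::BROTLI:  // listed but not currently supported
            return false;
    }
    return false;  // unreachable for valid enumerators
}
```

Since the function is constexpr, a project that requires Zstandard could reject a misconfigured build with `static_assert(codec_available(Compression::ZSTD), "build with SIGNET_ENABLE_ZSTD");`.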
@ JSON
Pretty-printed JSON object (default).
ConvertedType
Legacy Parquet converted types for backward compatibility with older readers.
@ TIMESTAMP_MILLIS
Timestamp in milliseconds.
@ MAP_KEY_VALUE
Map key-value pair.
@ LIST
List (nested group).
@ INT_8
Signed 8-bit integer.
@ UINT_32
Unsigned 32-bit integer.
@ TIMESTAMP_MICROS
Timestamp in microseconds.
@ UINT_16
Unsigned 16-bit integer.
@ INT_16
Signed 16-bit integer.
@ UINT_8
Unsigned 8-bit integer.
@ TIME_MILLIS
Time in milliseconds.
@ UINT_64
Unsigned 64-bit integer.
@ INT_32
Signed 32-bit integer.
@ TIME_MICROS
Time in microseconds.
@ INT_64
Signed 64-bit integer.
@ UTF8
UTF-8 encoded string.
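Writers commonly emit both a LogicalType and its legacy ConvertedType so that older readers can still interpret a column. The following is a minimal sketch of that mapping for the string, time and timestamp annotations, based on the Parquet LogicalTypes spec. The trimmed enums and the function name are hypothetical. The sketch assumes UTC-adjusted times and timestamps, because the spec ties TIME_MILLIS/TIMESTAMP_MILLIS (and the MICROS variants) to that case.

```cpp
#include <cassert>
#include <optional>

// Trimmed, hypothetical mirrors of the two enums (only members used here).
enum class LogicalType { NONE, STRING, TIME_MS, TIME_US, TIME_NS,
                         TIMESTAMP_MS, TIMESTAMP_US, TIMESTAMP_NS };
enum class ConvertedType { UTF8, TIME_MILLIS, TIME_MICROS,
                           TIMESTAMP_MILLIS, TIMESTAMP_MICROS };

// Legacy annotation to write alongside `t`, if one exists.
// Nanosecond units have no ConvertedType equivalent.
constexpr std::optional<ConvertedType> legacy_converted_type(LogicalType t) {
    switch (t) {
        case LogicalType::STRING:       return ConvertedType::UTF8;
        case LogicalType::TIME_MS:      return ConvertedType::TIME_MILLIS;
        case LogicalType::TIME_US:      return ConvertedType::TIME_MICROS;
        case LogicalType::TIMESTAMP_MS: return ConvertedType::TIMESTAMP_MILLIS;
        case LogicalType::TIMESTAMP_US: return ConvertedType::TIMESTAMP_MICROS;
        default:                        return std::nullopt;  // NONE, *_NS
    }
}
```

A reader that only understands ConvertedType will see a TIMESTAMP_NS column as a plain INT64, which is why nanosecond timestamps can look like raw integers in older tools.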
typename native_type_of< PT >::type native_type_of_t
Convenience alias: native_type_of_t<PhysicalType::INT64> == int64_t.
LogicalType
Parquet logical types (from parquet.thrift LogicalType union).
@ DECIMAL
Fixed-point decimal (INT32/INT64/FIXED_LEN_BYTE_ARRAY).
@ TIMESTAMP_NS
Timestamp — INT64, nanoseconds since Unix epoch.
@ UUID
RFC 4122 UUID (stored as FIXED_LEN_BYTE_ARRAY(16)).
@ DATE
Calendar date — INT32, days since 1970-01-01.
@ ENUM
Enum string (stored as BYTE_ARRAY).
@ TIME_MS
Time of day — INT32, milliseconds since midnight.
@ TIME_NS
Time of day — INT64, nanoseconds since midnight.
@ NONE
No logical annotation — raw physical type.
@ TIME_US
Time of day — INT64, microseconds since midnight.
@ BSON
BSON document (stored as BYTE_ARRAY).
@ TIMESTAMP_MS
Timestamp — INT64, milliseconds since Unix epoch.
@ FLOAT32_VECTOR
ML embedding vector — FIXED_LEN_BYTE_ARRAY(dim*4).
@ TIMESTAMP_US
Timestamp — INT64, microseconds since Unix epoch.
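The epoch-based encodings above reduce to integer arithmetic. As a worked sketch, DATE stores days since 1970-01-01 in an INT32, and TIMESTAMP_US stores microseconds since the Unix epoch in an INT64. The day count uses Howard Hinnant's public-domain days_from_civil algorithm (proleptic Gregorian calendar). The helper names are hypothetical, and all inputs are assumed to be UTC.

```cpp
#include <cassert>
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
constexpr int32_t days_from_civil(int32_t y, uint32_t m, uint32_t d) {
    y -= (m <= 2) ? 1 : 0;  // treat Jan/Feb as months 13/14 of the prior year
    const int32_t era = (y >= 0 ? y : y - 399) / 400;
    const uint32_t yoe = static_cast<uint32_t>(y - era * 400);            // [0, 399]
    const uint32_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const uint32_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<int32_t>(doe) - 719468;
}

// DATE: INT32 days since the Unix epoch.
constexpr int32_t encode_date(int32_t y, uint32_t m, uint32_t d) {
    return days_from_civil(y, m, d);
}

// TIMESTAMP_US: INT64 microseconds since the Unix epoch (UTC).
constexpr int64_t encode_timestamp_us(int32_t y, uint32_t m, uint32_t d,
                                      int64_t hh, int64_t mm, int64_t ss,
                                      int64_t us) {
    const int64_t secs = int64_t{days_from_civil(y, m, d)} * 86400
                       + hh * 3600 + mm * 60 + ss;
    return secs * 1000000 + us;
}
```

For example, 2024-01-15 encodes as DATE 19737, and TIMESTAMP_MS or TIMESTAMP_NS differ only in the final multiplier (1000 or 1000000000).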
constexpr uint32_t PARQUET_MAGIC
"PAR1" magic bytes (little-endian uint32) — marks a standard Parquet file.
constexpr PhysicalType parquet_type_of_v
Convenience variable template: parquet_type_of_v<double> == PhysicalType::DOUBLE.
Encoding
Parquet page encoding types.
@ DELTA_BINARY_PACKED
Delta encoding for INT32/INT64 (compact for sorted/sequential data).
@ RLE
Run-length / bit-packed hybrid (used for booleans and def/rep levels).
@ RLE_DICTIONARY
Modern dictionary encoding (Parquet 2.0) — dict page + RLE indices.
@ BIT_PACKED
Deprecated — superseded by RLE.
@ DELTA_BYTE_ARRAY
Incremental/prefix encoding for byte arrays.
@ DELTA_LENGTH_BYTE_ARRAY
Delta-encoded lengths + concatenated byte arrays.
@ PLAIN_DICTIONARY
Legacy dictionary encoding (Parquet 1.0).
@ PLAIN
Values stored back-to-back in their native binary layout.
@ BYTE_STREAM_SPLIT
Byte-stream split for FLOAT/DOUBLE (transposes byte lanes for better compression).
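BYTE_STREAM_SPLIT does not shrink data on its own. It regroups bytes so that the similar sign/exponent bytes of neighbouring values sit next to each other, which the page codec then compresses better. A minimal FLOAT sketch follows the Parquet spec: byte k of value i is written to stream k at position i. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte IEEE 754 floats");

// Scatter: out[k * n + i] = little-endian byte k of values[i].
std::vector<uint8_t> bss_encode(const std::vector<float>& values) {
    const std::size_t n = values.size();
    std::vector<uint8_t> out(4 * n);
    for (std::size_t i = 0; i < n; ++i) {
        uint32_t bits;
        std::memcpy(&bits, &values[i], sizeof bits);  // copy the float's bit pattern
        for (std::size_t k = 0; k < 4; ++k) {
            out[k * n + i] = static_cast<uint8_t>(bits >> (8 * k));
        }
    }
    return out;
}

// Gather: rebuild each value from byte k of every stream.
std::vector<float> bss_decode(const std::vector<uint8_t>& bytes) {
    const std::size_t n = bytes.size() / 4;
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        uint32_t bits = 0;
        for (std::size_t k = 0; k < 4; ++k) {
            bits |= static_cast<uint32_t>(bytes[k * n + i]) << (8 * k);
        }
        std::memcpy(&out[i], &bits, sizeof bits);
    }
    return out;
}
```

For {1.0f, 2.0f} (bit patterns 0x3F800000 and 0x40000000), the high-byte stream is {0x3F, 0x40}, and the two low-byte streams are all zeros.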
@ INT64
Signed 64-bit integer.
@ INT32
Signed 32-bit integer.
@ FLOAT16
IEEE 754 half-precision float (2 bytes).
PageType
Parquet page types within a column chunk.
@ DATA_PAGE_V2
Data page v2 (Parquet 2.0 format with separate rep/def level sections).
@ INDEX_PAGE
Index page (reserved, not used by Signet).
@ DICTIONARY_PAGE
Dictionary page — contains the value dictionary for RLE_DICTIONARY columns.
@ DATA_PAGE
Data page (Parquet 1.0 format).
Repetition
Parquet field repetition types (nullability / cardinality).
@ REPEATED
Zero or more values per row (list).
@ OPTIONAL
Zero or one value per row (nullable).
@ REQUIRED
Exactly one value per row (non-nullable).
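In Parquet, nullability is recorded with definition levels rather than sentinel values. For a flat (non-nested) OPTIONAL column, the maximum definition level is 1: level 1 means the value is present, level 0 means null, and only non-null values reach the data page. REQUIRED columns store no definition levels. REPEATED columns also need repetition levels, which this sketch does not cover. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Shredded form of a flat OPTIONAL INT64 column.
struct LeveledColumn {
    std::vector<int16_t> def_levels;  // one entry per row
    std::vector<int64_t> values;      // non-null values only
};

LeveledColumn shred_optional(const std::vector<std::optional<int64_t>>& rows) {
    LeveledColumn out;
    out.def_levels.reserve(rows.size());
    for (const auto& r : rows) {
        if (r) {
            out.def_levels.push_back(1);  // defined: value is present
            out.values.push_back(*r);
        } else {
            out.def_levels.push_back(0);  // undefined: null, nothing stored
        }
    }
    return out;
}

// Null count is the number of rows below the max definition level (1).
int64_t null_count(const LeveledColumn& c) {
    return std::count(c.def_levels.begin(), c.def_levels.end(), int16_t{0});
}
```

The definition levels themselves are what the RLE encoding above compresses, which is why a mostly-null or mostly-present column costs very little in level data.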
Descriptor for a single column in a Parquet schema.
int32_t type_length
Byte length for FIXED_LEN_BYTE_ARRAY columns (-1 = N/A).
LogicalType logical_type
Semantic annotation (STRING, TIMESTAMP_NS, etc.).
Repetition repetition
Nullability / cardinality.
std::string name
Column name (unique within a schema).
int32_t scale
Decimal scale (-1 = N/A).
PhysicalType physical_type
On-disk storage type.
int32_t precision
Decimal precision (-1 = N/A).
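Several descriptor fields only make sense together: type_length only applies to FIXED_LEN_BYTE_ARRAY, and precision/scale only apply to DECIMAL. The sketch below checks these combinations using rules from the Parquet format spec (INT32 decimals hold at most 9 digits, INT64 at most 18, and an n-byte FIXED_LEN_BYTE_ARRAY at most floor(log10(2^(8n-1) - 1))), plus the UUID and FLOAT32_VECTOR widths listed above. The struct name ColumnSpec, the trimmed enums, and the function are hypothetical, and Signet's own validation may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>

// Trimmed, hypothetical mirrors of the relevant enums.
enum class PhysicalType { INT32, INT64, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY };
enum class LogicalType { NONE, STRING, DECIMAL, UUID, FLOAT32_VECTOR };

// Hypothetical struct mirroring the descriptor fields above.
struct ColumnSpec {
    std::string name;
    PhysicalType physical_type;
    LogicalType logical_type;
    int32_t type_length = -1;  // -1 = N/A
    int32_t precision = -1;    // -1 = N/A
    int32_t scale = -1;        // -1 = N/A
};

// floor(log10(2^(8n-1) - 1)), computed as floor((8n-1) * log10(2)).
inline int32_t max_flba_precision(int32_t n) {
    return static_cast<int32_t>(std::floor((8.0 * n - 1.0) * std::log10(2.0)));
}

bool is_consistent(const ColumnSpec& c) {
    const bool flba = c.physical_type == PhysicalType::FIXED_LEN_BYTE_ARRAY;
    if (flba ? c.type_length <= 0 : c.type_length != -1) return false;
    if (c.logical_type != LogicalType::DECIMAL &&
        (c.precision != -1 || c.scale != -1)) return false;
    switch (c.logical_type) {
        case LogicalType::UUID:
            return flba && c.type_length == 16;
        case LogicalType::FLOAT32_VECTOR:
            return flba && c.type_length % 4 == 0;  // dim * 4 bytes
        case LogicalType::DECIMAL:
            if (c.precision < 1 || c.scale < 0 || c.scale > c.precision) return false;
            if (c.physical_type == PhysicalType::INT32) return c.precision <= 9;
            if (c.physical_type == PhysicalType::INT64) return c.precision <= 18;
            return flba && c.precision <= max_flba_precision(c.type_length);
        default:
            return true;
    }
}
```

For instance, a 16-byte FIXED_LEN_BYTE_ARRAY decimal allows up to 38 digits, matching the common DECIMAL(38, s) limit.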
Per-column statistics from ParquetReader::file_stats().
Compression compression
Compression codec.
bool has_page_index
Whether column/offset index is present.
std::string column_name
Column name.
PhysicalType physical_type
Storage type.
int64_t uncompressed_bytes
Total uncompressed size.
int64_t num_values
Total value count.
bool has_bloom_filter
Whether a bloom filter is present.
int64_t null_count
Total null count.
int64_t compressed_bytes
Total compressed size.
LogicalType logical_type
Logical annotation.
Per-column statistics produced by ParquetWriter::close().
PhysicalType physical_type
Storage type used on disk.
std::string column_name
Column name from the schema.
int64_t null_count
Number of null values.
int64_t uncompressed_bytes
Total uncompressed data size (bytes).
int64_t compressed_bytes
Total compressed data size (bytes).
Encoding encoding
Encoding applied to data pages.
int64_t num_values
Number of values written.
Compression compression
Compression codec applied.
Aggregate file-level statistics returned by ParquetReader::file_stats().
int64_t total_rows
Total rows in the file.
double compression_ratio
Overall uncompressed / compressed ratio.
std::string created_by
"created_by" string from the footer.
std::vector< ColumnFileStats > columns
Per-column statistics.
int64_t num_columns
Number of columns.
int64_t num_row_groups
Number of row groups.
double bytes_per_row
Average file bytes per row.
int64_t file_size_bytes
Total file size on disk (bytes).
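The file-level compression_ratio and bytes_per_row can be derived from the per-column sizes and the file totals. The sketch below shows one plausible derivation. It assumes the ratio is sum(uncompressed_bytes) / sum(compressed_bytes) over all columns and that bytes_per_row uses the full on-disk size (footer included); neither assumption is confirmed for Signet. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical subset of ColumnFileStats (field names as documented above).
struct ColumnSizes {
    std::string column_name;
    int64_t compressed_bytes;
    int64_t uncompressed_bytes;
};

struct DerivedFileStats {
    double compression_ratio;  // sum(uncompressed) / sum(compressed)
    double bytes_per_row;      // file_size_bytes / total_rows
};

DerivedFileStats derive_stats(const std::vector<ColumnSizes>& columns,
                              int64_t file_size_bytes, int64_t total_rows) {
    int64_t comp = 0, uncomp = 0;
    for (const auto& c : columns) {
        comp += c.compressed_bytes;
        uncomp += c.uncompressed_bytes;
    }
    DerivedFileStats s{};
    // Empty files report 0.0 instead of dividing by zero.
    s.compression_ratio = comp > 0 ? static_cast<double>(uncomp) / comp : 0.0;
    s.bytes_per_row = total_rows > 0
        ? static_cast<double>(file_size_bytes) / total_rows : 0.0;
    return s;
}
```

Because file_size_bytes includes the footer and magic bytes, bytes_per_row is slightly higher than the compressed column data divided by the row count, and the gap matters most for small files.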
File-level write statistics returned by ParquetWriter::close().
int64_t file_size_bytes
Total on-disk file size (bytes).
int64_t total_compressed_bytes
Sum of compressed page sizes.
double bytes_per_row
Average file bytes per row.
std::vector< ColumnWriteStats > columns
Per-column statistics.
double compression_ratio
Ratio of uncompressed / compressed bytes (typically >= 1.0; can fall below 1.0 if a codec expands incompressible data).
int64_t total_uncompressed_bytes
Sum of uncompressed page sizes.
int64_t total_rows
Total rows written across all row groups.
int64_t total_row_groups
Number of row groups in the file.
Maps a Parquet PhysicalType back to its corresponding C++ native type.
Maps a C++ type to its corresponding Parquet PhysicalType at compile time.
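The two traits above, together with the native_type_of_t and parquet_type_of_v shorthands, form a compile-time map in both directions. The sketch below shows how such paired traits are commonly written. It is not Signet's source. It covers only the fixed-width scalar types whose C++ counterparts are obvious; BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY and INT96 are omitted because their native types are not stated here. The enum values follow the Type enum in parquet.thrift.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

enum class PhysicalType : int32_t {
    BOOLEAN = 0, INT32 = 1, INT64 = 2, INT96 = 3,
    FLOAT = 4, DOUBLE = 5, BYTE_ARRAY = 6, FIXED_LEN_BYTE_ARRAY = 7
};

// PhysicalType -> C++ type. The primary template is declared but never
// defined, so asking for an unmapped type is a compile-time error.
template <PhysicalType PT> struct native_type_of;
template <> struct native_type_of<PhysicalType::BOOLEAN> { using type = bool; };
template <> struct native_type_of<PhysicalType::INT32>   { using type = int32_t; };
template <> struct native_type_of<PhysicalType::INT64>   { using type = int64_t; };
template <> struct native_type_of<PhysicalType::FLOAT>   { using type = float; };
template <> struct native_type_of<PhysicalType::DOUBLE>  { using type = double; };

template <PhysicalType PT>
using native_type_of_t = typename native_type_of<PT>::type;

// C++ type -> PhysicalType, the reverse direction.
template <typename T> struct parquet_type_of;
template <> struct parquet_type_of<bool>    { static constexpr PhysicalType value = PhysicalType::BOOLEAN; };
template <> struct parquet_type_of<int32_t> { static constexpr PhysicalType value = PhysicalType::INT32; };
template <> struct parquet_type_of<int64_t> { static constexpr PhysicalType value = PhysicalType::INT64; };
template <> struct parquet_type_of<float>   { static constexpr PhysicalType value = PhysicalType::FLOAT; };
template <> struct parquet_type_of<double>  { static constexpr PhysicalType value = PhysicalType::DOUBLE; };

template <typename T>
inline constexpr PhysicalType parquet_type_of_v = parquet_type_of<T>::value;

// The two maps round-trip for every covered type.
static_assert(std::is_same_v<native_type_of_t<parquet_type_of_v<double>>, double>);
static_assert(parquet_type_of_v<native_type_of_t<PhysicalType::INT32>> == PhysicalType::INT32);
```

Leaving the primary templates undefined turns an unsupported type, such as `parquet_type_of_v<char>`, into a compile error instead of a silent fallback.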