Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
signet::forge::MmapParquetReader Class Reference

#include <mmap_reader.hpp>

Classes

struct  RowGroupInfo
 Summary information for a single row group. More...
 

Public Member Functions

const Schema & schema () const
 The file's column schema.
 
int64_t num_rows () const
 Total number of rows across all row groups.
 
int64_t num_row_groups () const
 Number of row groups in the file.
 
const std::string & created_by () const
 The "created by" string from the file footer (may be empty).
 
const std::vector< thrift::KeyValue > & key_value_metadata () const
 User-defined key-value metadata from the file footer.
 
RowGroupInfo row_group (size_t index) const
 Retrieve summary information for a specific row group.
 
const thrift::Statistics * column_statistics (size_t row_group_index, size_t column_index) const
 Retrieve the Thrift Statistics for a column chunk.
 
template<typename T >
expected< std::vector< T > > read_column (size_t row_group_index, size_t column_index)
 Read an entire column from a row group as a typed vector.
 
expected< std::vector< std::string > > read_column_as_strings (size_t row_group_index, size_t column_index)
 Read a column and convert all values to their string representations.
 
expected< std::vector< std::vector< std::string > > > read_all ()
 Read all rows from all row groups as a vector of string rows.
 
const MmapReader & mmap () const
 Direct access to the memory-mapped file data.
 
 ~MmapParquetReader ()=default
 Default destructor.
 
 MmapParquetReader (MmapParquetReader &&) noexcept=default
 Move-constructible.
 
MmapParquetReader & operator= (MmapParquetReader &&) noexcept=default
 Move-assignable.
 

Static Public Member Functions

static expected< MmapParquetReader > open (const std::filesystem::path &path)
 Open a Parquet file with memory-mapped I/O.
 

Detailed Description

Definition at line 266 of file mmap_reader.hpp.

Constructor & Destructor Documentation

◆ ~MmapParquetReader()

signet::forge::MmapParquetReader::~MmapParquetReader ( )
default

Default destructor.

◆ MmapParquetReader()

signet::forge::MmapParquetReader::MmapParquetReader ( MmapParquetReader &&  )
default noexcept

Move-constructible.

Member Function Documentation

◆ column_statistics()

const thrift::Statistics * signet::forge::MmapParquetReader::column_statistics ( size_t  row_group_index,
size_t  column_index 
) const
inline

Retrieve the Thrift Statistics for a column chunk.

Parameters
row_group_index    Zero-based row group index.
column_index    Zero-based column index.
Returns
Pointer to the Statistics struct, or nullptr if unavailable.

Definition at line 442 of file mmap_reader.hpp.

◆ created_by()

const std::string & signet::forge::MmapParquetReader::created_by ( ) const
inline

The "created by" string from the file footer (may be empty).

Definition at line 399 of file mmap_reader.hpp.

◆ key_value_metadata()

const std::vector< thrift::KeyValue > & signet::forge::MmapParquetReader::key_value_metadata ( ) const
inline

User-defined key-value metadata from the file footer.

Returns
A reference to the metadata vector, which is empty if the footer carries no key-value pairs.

Definition at line 403 of file mmap_reader.hpp.

◆ mmap()

const MmapReader & signet::forge::MmapParquetReader::mmap ( ) const
inline

Direct access to the memory-mapped file data.

Definition at line 708 of file mmap_reader.hpp.

◆ num_row_groups()

int64_t signet::forge::MmapParquetReader::num_row_groups ( ) const
inline

Number of row groups in the file.

Definition at line 394 of file mmap_reader.hpp.

◆ num_rows()

int64_t signet::forge::MmapParquetReader::num_rows ( ) const
inline

Total number of rows across all row groups.

Definition at line 391 of file mmap_reader.hpp.

◆ open()

static expected< MmapParquetReader > signet::forge::MmapParquetReader::open ( const std::filesystem::path &  path)
inline static

Open a Parquet file with memory-mapped I/O.

Validates the file structure (PAR1 magic, footer length), deserializes the FileMetaData, and reconstructs the column schema.

Parameters
path    Filesystem path to the Parquet file.
Returns
An MmapParquetReader on success, or an Error.

Definition at line 275 of file mmap_reader.hpp.
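
The structural checks described above can be illustrated with a minimal sketch. It is not the library's implementation: footer_length() is a hypothetical helper that works on an in-memory copy of the file, and it relies only on the published Parquet layout (the 4-byte magic "PAR1" at both ends, with a 4-byte little-endian footer length just before the trailing magic). Deserializing the FileMetaData itself is out of scope here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical helper, NOT part of Signet Forge: validates the Parquet framing
// of an in-memory file image and returns the footer length on success.
inline std::optional<std::uint32_t> footer_length(const std::vector<std::uint8_t>& file) {
    constexpr char kMagic[4] = {'P', 'A', 'R', '1'};
    constexpr std::size_t kMinSize = 12;  // leading magic + length field + trailing magic
    if (file.size() < kMinSize) return std::nullopt;

    // The magic bytes must appear at both the start and the end of the file.
    if (std::memcmp(file.data(), kMagic, 4) != 0) return std::nullopt;
    if (std::memcmp(file.data() + file.size() - 4, kMagic, 4) != 0) return std::nullopt;

    // The footer length is a 4-byte little-endian integer before the trailing magic.
    const std::uint8_t* p = file.data() + file.size() - 8;
    const std::uint32_t len = std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
                              (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);

    // The footer must fit between the leading magic and the length field.
    if (len > file.size() - kMinSize) return std::nullopt;
    return len;
}
```

Rejecting an oversized footer length before touching the metadata matters for a memory-mapped reader, since an unchecked length would otherwise lead to reads outside the mapping.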

◆ operator=()

MmapParquetReader & signet::forge::MmapParquetReader::operator= ( MmapParquetReader &&  )
default noexcept

Move-assignable.

◆ read_all()

expected< std::vector< std::vector< std::string > > > signet::forge::MmapParquetReader::read_all ( )
inline

Read all rows from all row groups as a vector of string rows.

Each inner vector represents one row, with columns converted to strings via read_column_as_strings(). Useful for inspection and debugging, but not recommended for high-performance paths.

Returns
A 2D vector [row][column] of string values, or an Error.

Definition at line 666 of file mmap_reader.hpp.
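
To make the [row][column] layout concrete, here is a small sketch of the transposition step for a single row group. It assumes the per-column string vectors have already been obtained (for example from read_column_as_strings()); columns_to_rows() is a hypothetical helper and is not claimed to match how the library assembles its result.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of Signet Forge: converts per-column string
// vectors into [row][column] order. Every column is expected to have the same
// length as the first one.
inline std::vector<std::vector<std::string>>
columns_to_rows(const std::vector<std::vector<std::string>>& columns) {
    std::vector<std::vector<std::string>> rows;
    if (columns.empty()) return rows;

    const std::size_t num_rows = columns.front().size();
    rows.assign(num_rows, std::vector<std::string>(columns.size()));
    for (std::size_t c = 0; c < columns.size(); ++c) {
        for (std::size_t r = 0; r < num_rows; ++r) {
            rows[r][c] = columns[c].at(r);  // at() throws if a column is too short
        }
    }
    return rows;
}
```

For a file with several row groups, the rows produced for each group would be appended in row-group order. Every value is copied into a std::string, which is consistent with the note above that this path suits inspection rather than high-throughput reads.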

◆ read_column()

template<typename T >
expected< std::vector< T > > signet::forge::MmapParquetReader::read_column ( size_t  row_group_index,
size_t  column_index 
)
inline

Read an entire column from a row group as a typed vector.

Automatically detects the encoding strategy (PLAIN, dictionary, DELTA_BINARY_PACKED, BYTE_STREAM_SPLIT, or RLE boolean) and dispatches to the appropriate decoder.

Template Parameters
T    The C++ type to decode into (bool, int32_t, int64_t, float, double, std::string).
Parameters
row_group_index    Zero-based row group index.
column_index    Zero-based column index.
Returns
A vector of decoded values, or an Error.

Definition at line 467 of file mmap_reader.hpp.
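
As an illustration of what one of the listed decoders has to do, the sketch below decodes BYTE_STREAM_SPLIT data for float values as defined by the Parquet format: byte k of every value is stored contiguously in stream k, and the streams are concatenated. decode_byte_stream_split() is a hypothetical name, and the code is a simplified stand-in that ignores pages, definition levels, and the library's own dispatch logic.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, NOT part of Signet Forge: decodes BYTE_STREAM_SPLIT
// data into floats. Trailing bytes that do not form a whole value are ignored.
inline std::vector<float> decode_byte_stream_split(const std::vector<std::uint8_t>& data) {
    static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE 754 floats");
    constexpr std::size_t kWidth = 4;
    const std::size_t n = data.size() / kWidth;  // number of encoded values

    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Gather byte k of value i from stream k; Parquet values are little-endian.
        std::uint32_t bits = 0;
        for (std::size_t k = 0; k < kWidth; ++k) {
            bits |= std::uint32_t(data[k * n + i]) << (8 * k);
        }
        std::memcpy(&out[i], &bits, sizeof(float));
    }
    return out;
}
```

Assembling the bit pattern with shifts rather than copying raw bytes keeps the sketch independent of host byte order, as long as float and integer endianness agree.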

◆ read_column_as_strings()

expected< std::vector< std::string > > signet::forge::MmapParquetReader::read_column_as_strings ( size_t  row_group_index,
size_t  column_index 
)
inline

Read a column and convert all values to their string representations.

Booleans become "true"/"false", numerics use std::to_string(), BYTE_ARRAY values are returned as-is, and FIXED_LEN_BYTE_ARRAY values are hex-encoded.

Parameters
row_group_index    Zero-based row group index.
column_index    Zero-based column index.
Returns
A vector of string-converted values, or an Error.

Definition at line 597 of file mmap_reader.hpp.
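
The conversion rules above can be sketched with standard-library calls only. bool_to_string() and hex_encode() are hypothetical helpers written for this example. The use of lowercase hex digits is an assumption, since the text says only that FIXED_LEN_BYTE_ARRAY values are hex-encoded.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of Signet Forge: booleans become "true"/"false".
inline std::string bool_to_string(bool v) { return v ? "true" : "false"; }

// Hypothetical helper, NOT part of Signet Forge: two hex digits per byte.
// Lowercase digits are an assumption; the library may format differently.
inline std::string hex_encode(const std::vector<std::uint8_t>& bytes) {
    static constexpr char kDigits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        out.push_back(kDigits[b >> 4]);    // high nibble
        out.push_back(kDigits[b & 0x0F]);  // low nibble
    }
    return out;
}
```

Numeric columns go through std::to_string(), which formats floating-point values with a fixed six digits after the decimal point, so a double 1.5 is rendered as "1.500000". Callers that need round-trippable floating-point text may prefer read_column<T>() and their own formatting.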

◆ row_group()

RowGroupInfo signet::forge::MmapParquetReader::row_group ( size_t  index) const
inline

Retrieve summary information for a specific row group.

Parameters
index    Zero-based row group index.
Returns
A RowGroupInfo struct.
Exceptions
std::out_of_range    if index >= num_row_groups().

Definition at line 423 of file mmap_reader.hpp.

◆ schema()

const Schema & signet::forge::MmapParquetReader::schema ( ) const
inline

The file's column schema.

Definition at line 388 of file mmap_reader.hpp.


The documentation for this class was generated from the following file: