Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
signet::forge::ColumnReader Class Reference

PLAIN-encoded Parquet column decoder.

#include <column_reader.hpp>

Public Member Functions

 ColumnReader (PhysicalType type, const uint8_t *data, size_t size, int64_t num_values, int32_t type_length=-1)
 Construct a reader over raw PLAIN-encoded page data.
 
expected< bool > read_bool ()
 Read a single BOOLEAN value (bit-packed, LSB first).
 
expected< int32_t > read_int32 ()
 Read a single INT32 value (4 bytes little-endian).
 
expected< int64_t > read_int64 ()
 Read a single INT64 value (8 bytes little-endian).
 
expected< float > read_float ()
 Read a single FLOAT value (4 bytes little-endian, IEEE 754).
 
expected< double > read_double ()
 Read a single DOUBLE value (8 bytes little-endian, IEEE 754).
 
expected< std::string > read_string ()
 Read a single BYTE_ARRAY value as a std::string.
 
expected< std::string_view > read_string_view ()
 Read a single BYTE_ARRAY value as a non-owning std::string_view.
 
expected< std::vector< uint8_t > > read_bytes ()
 Read a single BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY value as raw bytes.
 
expected< void > read_batch_bool (bool *out, size_t count)
 Read a batch of BOOLEAN values into out.
 
expected< void > read_batch_int32 (int32_t *out, size_t count)
 Read a batch of INT32 values via bulk memcpy.
 
expected< void > read_batch_int64 (int64_t *out, size_t count)
 Read a batch of INT64 values via bulk memcpy.
 
expected< void > read_batch_float (float *out, size_t count)
 Read a batch of FLOAT values via bulk memcpy.
 
expected< void > read_batch_double (double *out, size_t count)
 Read a batch of DOUBLE values via bulk memcpy.
 
expected< void > read_batch_string (std::string *out, size_t count)
 Read a batch of BYTE_ARRAY values as strings.
 
template<typename T >
expected< T > read ()
 Read a single value of type T, dispatching to the correct typed reader.
 
template<typename T >
expected< void > read_batch (T *out, size_t count)
 Read a batch of count values of type T into out.
 
int64_t values_remaining () const
 Number of values not yet read from this page.
 
bool has_next () const
 Whether there is at least one more value to read.
 
PhysicalType type () const
 The Parquet physical type of this column.
 
size_t position () const
 Current byte offset within the page data buffer.
 

Detailed Description

PLAIN-encoded Parquet column decoder.

Wraps a raw data page buffer and decodes values one at a time or in batches. The reader maintains a cursor position and a count of values read, returning an error on type mismatch, buffer overrun, or exhaustion.

Note
ColumnReader does not own the data buffer. The caller must ensure the buffer remains valid for the reader's lifetime.
See also
ColumnWriter (the encoding counterpart)
MmapParquetReader (constructs ColumnReaders over mmap'd pages)

Definition at line 46 of file column_reader.hpp.
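
The cursor-and-count behaviour described above can be illustrated with a small stand-alone sketch. It is not the library's implementation: `MiniInt32Reader` is a hypothetical class handling INT32 only, and it reports failure with `std::optional` rather than the library's `expected<T>`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical stand-in for the reader pattern; not signet::forge::ColumnReader.
class MiniInt32Reader {
public:
    // Non-owning: the caller keeps `data` alive for the reader's lifetime.
    MiniInt32Reader(const std::uint8_t* data, std::size_t size, std::int64_t num_values)
        : data_(data), size_(size), remaining_(num_values) {}

    // Returns std::nullopt on exhaustion or buffer overrun.
    std::optional<std::int32_t> read_int32() {
        if (remaining_ <= 0) return std::nullopt;          // no values left
        if (size_ - pos_ < 4) return std::nullopt;         // would read past the buffer
        std::uint32_t bits = 0;
        for (std::size_t i = 0; i < 4; ++i)                // little-endian assembly
            bits |= static_cast<std::uint32_t>(data_[pos_ + i]) << (8 * i);
        pos_ += 4;
        --remaining_;
        std::int32_t value;
        std::memcpy(&value, &bits, sizeof(value));
        return value;
    }

    bool has_next() const { return remaining_ > 0; }
    std::int64_t values_remaining() const { return remaining_; }
    std::size_t position() const { return pos_; }

private:
    const std::uint8_t* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
    std::int64_t remaining_;
};
```

A caller would typically loop while `has_next()` is true and stop at the first failed read.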

Constructor & Destructor Documentation

◆ ColumnReader()

signet::forge::ColumnReader::ColumnReader ( PhysicalType  type,
const uint8_t *  data,
size_t  size,
int64_t  num_values,
int32_t  type_length = -1 
)
inline

Construct a reader over raw PLAIN-encoded page data.

Parameters
type: The physical type of the column.
data: Pointer to the start of the page data buffer.
size: Size of the page data in bytes.
num_values: Number of values encoded in this page.
type_length: For FIXED_LEN_BYTE_ARRAY columns, the fixed byte length per value (ignored for other types).

Definition at line 56 of file column_reader.hpp.

Member Function Documentation

◆ has_next()

bool signet::forge::ColumnReader::has_next ( ) const
inline

Whether there is at least one more value to read.

Definition at line 566 of file column_reader.hpp.

◆ position()

size_t signet::forge::ColumnReader::position ( ) const
inline

Current byte offset within the page data buffer.

Definition at line 576 of file column_reader.hpp.

◆ read()

template<typename T >
expected< T > signet::forge::ColumnReader::read ( )
inline

Read a single value of type T, dispatching to the correct typed reader.

Supported types: bool, int32_t, int64_t, float, double, std::string, std::string_view, std::vector<uint8_t>.

Template Parameters
T: The value type to decode.
Returns
The decoded value, or an error.

Definition at line 504 of file column_reader.hpp.
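
One common way to implement this kind of type-based dispatch is `if constexpr` over `std::is_same_v`. The sketch below shows that technique under stated assumptions: `TypedSource` and its fixed return values are hypothetical, and the page does not confirm that the library dispatches this way.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical typed readers standing in for read_int32(), read_double(), read_string().
struct TypedSource {
    std::int32_t read_int32() { return 7; }
    double read_double() { return 2.5; }
    std::string read_string() { return "abc"; }

    // Selects the typed reader at compile time; unsupported types fail to compile.
    template <typename T>
    T read() {
        if constexpr (std::is_same_v<T, std::int32_t>) return read_int32();
        else if constexpr (std::is_same_v<T, double>) return read_double();
        else if constexpr (std::is_same_v<T, std::string>) return read_string();
        else static_assert(sizeof(T) == 0, "unsupported value type");
    }
};
```

Because the branch is chosen at compile time, a call such as `src.read<double>()` costs no more than calling the typed reader directly.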

◆ read_batch()

template<typename T >
expected< void > signet::forge::ColumnReader::read_batch ( T *  out,
size_t  count 
)
inline

Read a batch of count values of type T into out.

Dispatches to the correct typed batch reader. Supported types: bool, int32_t, int64_t, float, double, std::string.

Template Parameters
T: The value type to decode.
Parameters
out: Pre-allocated buffer of at least count elements.
count: Number of values to read.
Returns
Void on success, or an error.

Definition at line 537 of file column_reader.hpp.

◆ read_batch_bool()

expected< void > signet::forge::ColumnReader::read_batch_bool ( bool *  out,
size_t  count 
)
inline

Read a batch of BOOLEAN values into out.

Parameters
out: Pre-allocated buffer of at least count elements.
count: Number of values to read.
Returns
Void on success, or an error.

Definition at line 319 of file column_reader.hpp.

◆ read_batch_double()

expected< void > signet::forge::ColumnReader::read_batch_double ( double *  out,
size_t  count 
)
inline

Read a batch of DOUBLE values via bulk memcpy.

Parameters
out: Pre-allocated buffer of at least count elements.
count: Number of values to read.
Returns
Void on success, or an error.

Definition at line 433 of file column_reader.hpp.

◆ read_batch_float()

expected< void > signet::forge::ColumnReader::read_batch_float ( float *  out,
size_t  count 
)
inline

Read a batch of FLOAT values via bulk memcpy.

Parameters
out: Pre-allocated buffer of at least count elements.
count: Number of values to read.
Returns
Void on success, or an error.

Definition at line 405 of file column_reader.hpp.

◆ read_batch_int32()

expected< void > signet::forge::ColumnReader::read_batch_int32 ( int32_t *  out,
size_t  count 
)
inline

Read a batch of INT32 values via bulk memcpy.

Parameters
out: Pre-allocated buffer of at least count elements.
count: Number of values to read.
Returns
Void on success, or an error.

Definition at line 349 of file column_reader.hpp.
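
A bulk memcpy works for fixed-width types because, on a little-endian host, PLAIN-encoded bytes already match the in-memory layout. The following is a minimal sketch of that idea, not the library's code: `copy_int32_batch` is a hypothetical helper, and it assumes a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper. Assumes a little-endian host, so page bytes can be
// copied straight into an int32_t array.
// Returns false (leaving `out` untouched) if fewer than `count` values fit.
inline bool copy_int32_batch(const std::uint8_t* data, std::size_t size,
                             std::size_t pos, std::int32_t* out, std::size_t count) {
    if (pos > size) return false;
    // Dividing instead of multiplying keeps the bounds check overflow-safe.
    if (count > (size - pos) / sizeof(std::int32_t)) return false;
    std::memcpy(out, data + pos, count * sizeof(std::int32_t));
    return true;
}
```

On a big-endian host each value would need a byte swap after the copy.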

◆ read_batch_int64()

expected< void > signet::forge::ColumnReader::read_batch_int64 ( int64_t *  out,
size_t  count 
)
inline

Read a batch of INT64 values via bulk memcpy.

Parameters
out: Pre-allocated buffer of at least count elements.
count: Number of values to read.
Returns
Void on success, or an error.

Definition at line 377 of file column_reader.hpp.

◆ read_batch_string()

expected< void > signet::forge::ColumnReader::read_batch_string ( std::string *  out,
size_t  count 
)
inline

Read a batch of BYTE_ARRAY values as strings.

Parameters
out: Pre-allocated buffer of at least count std::string elements.
count: Number of values to read.
Returns
Void on success, or an error.

Definition at line 461 of file column_reader.hpp.

◆ read_bool()

expected< bool > signet::forge::ColumnReader::read_bool ( )
inline

Read a single BOOLEAN value (bit-packed, LSB first).

Returns
The decoded boolean, or an error on type mismatch / exhaustion.

Definition at line 76 of file column_reader.hpp.
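
"Bit-packed, LSB first" means that value i is stored in byte i / 8, at bit i % 8 counted from the least significant bit. A minimal sketch of that lookup follows; `unpack_bool` is a hypothetical helper and performs no bounds checking.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: extracts the index-th boolean from an LSB-first
// bit-packed buffer. The caller must ensure index / 8 is within the buffer.
inline bool unpack_bool(const std::uint8_t* data, std::size_t index) {
    return ((data[index / 8] >> (index % 8)) & 1u) != 0;
}
```

For example, the byte 0x05 (binary 00000101) holds true, false, true as its first three values.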

◆ read_bytes()

expected< std::vector< uint8_t > > signet::forge::ColumnReader::read_bytes ( )
inline

Read a single BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY value as raw bytes.

For BYTE_ARRAY, reads a 4-byte LE length prefix then the payload. For FIXED_LEN_BYTE_ARRAY, reads exactly type_length bytes.

Returns
A vector of the raw bytes, or an error.

Definition at line 264 of file column_reader.hpp.
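
The two layouts can be contrasted in a short stand-alone sketch. `take_bytes` is a hypothetical helper: it uses a negative `fixed_len` to select the length-prefixed BYTE_ARRAY layout, whereas the real class presumably branches on the column's physical type.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper, not the library's code.
// fixed_len < 0  : BYTE_ARRAY, 4-byte little-endian length prefix then payload.
// fixed_len >= 0 : FIXED_LEN_BYTE_ARRAY, exactly fixed_len bytes, no prefix.
// On success, advances `pos` past the value; on failure, leaves it unchanged.
inline std::optional<std::vector<std::uint8_t>>
take_bytes(const std::uint8_t* data, std::size_t size, std::size_t& pos,
           std::int32_t fixed_len) {
    if (pos > size) return std::nullopt;
    std::size_t start = pos;
    std::size_t len = 0;
    if (fixed_len < 0) {
        if (size - start < 4) return std::nullopt;       // prefix does not fit
        std::uint32_t n = 0;
        for (std::size_t i = 0; i < 4; ++i)
            n |= static_cast<std::uint32_t>(data[start + i]) << (8 * i);
        len = n;
        start += 4;
    } else {
        len = static_cast<std::size_t>(fixed_len);
    }
    if (size - start < len) return std::nullopt;         // payload does not fit
    std::vector<std::uint8_t> out(data + start, data + start + len);
    pos = start + len;
    return out;
}
```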

◆ read_double()

expected< double > signet::forge::ColumnReader::read_double ( )
inline

Read a single DOUBLE value (8 bytes little-endian, IEEE 754).

Returns
The decoded double, or an error on type mismatch / exhaustion.

Definition at line 167 of file column_reader.hpp.
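
Decoding a little-endian IEEE 754 value is a two-step process: assemble the bytes into an integer bit pattern, then reinterpret that pattern as a floating-point value. The sketch below shows one portable way to do this; `decode_double_le` is a hypothetical helper, and it assumes the host's `double` is IEEE 754 binary64.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: builds the 64-bit pattern from 8 little-endian bytes,
// then reinterprets it as a double. Assumes an IEEE 754 binary64 `double`.
inline double decode_double_le(const std::uint8_t* p) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    double value;
    std::memcpy(&value, &bits, sizeof(value)); // well-defined type pun
    return value;
}
```

In C++20, `std::bit_cast<double>(bits)` from `<bit>` could replace the `memcpy` step.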

◆ read_float()

expected< float > signet::forge::ColumnReader::read_float ( )
inline

Read a single FLOAT value (4 bytes little-endian, IEEE 754).

Returns
The decoded float, or an error on type mismatch / exhaustion.

Definition at line 145 of file column_reader.hpp.

◆ read_int32()

expected< int32_t > signet::forge::ColumnReader::read_int32 ( )
inline

Read a single INT32 value (4 bytes little-endian).

Returns
The decoded int32_t, or an error on type mismatch / exhaustion.

Definition at line 101 of file column_reader.hpp.

◆ read_int64()

expected< int64_t > signet::forge::ColumnReader::read_int64 ( )
inline

Read a single INT64 value (8 bytes little-endian).

Returns
The decoded int64_t, or an error on type mismatch / exhaustion.

Definition at line 123 of file column_reader.hpp.

◆ read_string()

expected< std::string > signet::forge::ColumnReader::read_string ( )
inline

Read a single BYTE_ARRAY value as a std::string.

PLAIN encoding: 4-byte LE length prefix followed by raw bytes.

Returns
The decoded string, or an error on type mismatch / exhaustion.

Definition at line 192 of file column_reader.hpp.

◆ read_string_view()

expected< std::string_view > signet::forge::ColumnReader::read_string_view ( )
inline

Read a single BYTE_ARRAY value as a non-owning std::string_view.

The returned view points directly into the page data buffer, so it is only valid as long as the underlying buffer is alive.

Returns
A string_view into the page data, or an error.

Definition at line 228 of file column_reader.hpp.
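
The zero-copy behaviour can be seen in a small stand-alone sketch: the returned view's data pointer lies inside the page buffer itself. `view_byte_array` is a hypothetical helper, not the library's function, and it uses `std::optional` in place of `expected<T>`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: returns a view into `data` (no copy) and advances
// `pos` past the 4-byte little-endian length prefix and the payload.
inline std::optional<std::string_view>
view_byte_array(const std::uint8_t* data, std::size_t size, std::size_t& pos) {
    if (pos > size || size - pos < 4) return std::nullopt;
    std::uint32_t len = 0;
    for (std::size_t i = 0; i < 4; ++i)
        len |= static_cast<std::uint32_t>(data[pos + i]) << (8 * i);
    if (size - pos - 4 < len) return std::nullopt;
    std::string_view sv(reinterpret_cast<const char*>(data + pos + 4), len);
    pos += 4 + len;
    return sv;
}
```

If a value must outlive the page buffer, copy it into a `std::string` before the buffer is released.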

◆ type()

PhysicalType signet::forge::ColumnReader::type ( ) const
inline

The Parquet physical type of this column.

Definition at line 571 of file column_reader.hpp.

◆ values_remaining()

int64_t signet::forge::ColumnReader::values_remaining ( ) const
inline

Number of values not yet read from this page.

Definition at line 561 of file column_reader.hpp.


The documentation for this class was generated from the following file: