Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
signet::forge::RleEncoder Class Reference

Streaming encoder for the Parquet RLE/Bit-Packing Hybrid scheme.

#include <rle.hpp>

Public Member Functions

 RleEncoder (int bit_width)
 Construct an encoder for values of the given bit width.
 
void put (uint64_t value)
 Add a single value to the encoding stream.
 
void flush ()
 Flush any pending values to the output buffer.
 
const std::vector< uint8_t > & data () const
 Returns a reference to the encoded byte buffer (without length prefix).
 
size_t encoded_size () const
 Returns the size of the encoded data in bytes.
 
void reset ()
 Reset the encoder to its initial state, preserving the bit width.
 

Static Public Member Functions

static std::vector< uint8_t > encode (const uint32_t *values, size_t count, int bit_width)
 Encode an array of uint32 values using the RLE/Bit-Pack Hybrid scheme.
 
static std::vector< uint8_t > encode_with_length (const uint32_t *values, size_t count, int bit_width)
 Encode with a 4-byte little-endian length prefix.
 

Detailed Description

Streaming encoder for the Parquet RLE/Bit-Packing Hybrid scheme.

Accepts a stream of unsigned integer values via put() and decides per-group whether to emit an RLE run (for repeated values) or a bit-packed group of 8. Call flush() after all values are written, then retrieve the encoded bytes from data().

Note
The encoded output does NOT include a length prefix. Use encode_with_length() for definition and repetition level encoding, which requires one.
See also
RleDecoder
https://parquet.apache.org/documentation/latest/

Definition at line 211 of file rle.hpp.
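
The two run kinds described above can be illustrated with a small standalone sketch. It follows the run layout given in the public Parquet encoding specification (an RLE run is a varint header `run_length << 1` followed by the value in ceil(bit_width / 8) bytes; a bit-packed run header is the varint `(groups << 1) | 1`). The helper names `append_varint`, `sketch_rle_run`, and `sketch_bit_packed_header` are hypothetical and are not part of Signet Forge; how RleEncoder chooses between run kinds internally is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append an unsigned LEB128 varint
// (7 payload bits per byte, least significant group first).
inline void append_varint(std::vector<uint8_t>& out, uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<uint8_t>(v | 0x80));  // low 7 bits + continuation bit
        v >>= 7;
    }
    out.push_back(static_cast<uint8_t>(v));
}

// Hypothetical helper: bytes of one RLE run per the Parquet spec.
// Header is varint(run_length << 1); the repeated value follows in
// ceil(bit_width / 8) little-endian bytes.
inline std::vector<uint8_t> sketch_rle_run(uint64_t value, uint64_t run_length,
                                           int bit_width) {
    assert(bit_width >= 1 && bit_width <= 64);
    std::vector<uint8_t> out;
    append_varint(out, run_length << 1);
    const int value_bytes = (bit_width + 7) / 8;
    for (int i = 0; i < value_bytes; ++i) {
        out.push_back(static_cast<uint8_t>(value >> (8 * i)));
    }
    return out;
}

// Hypothetical helper: header of a bit-packed run covering `groups`
// groups of 8 values. The low bit set to 1 marks the run as bit-packed.
inline std::vector<uint8_t> sketch_bit_packed_header(uint64_t groups) {
    std::vector<uint8_t> out;
    append_varint(out, (groups << 1) | 1);
    return out;
}
```

For instance, a run of 100 copies of the value 5 at bit width 3 would take three bytes under this layout: a two-byte varint header followed by one value byte.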

Constructor & Destructor Documentation

◆ RleEncoder()

signet::forge::RleEncoder::RleEncoder (int bit_width)
inline explicit

Construct an encoder for values of the given bit width.

Parameters
bit_width: Bits per value (0–64). Values outside this range are clamped to 0 (no encoding).

Definition at line 217 of file rle.hpp.

Member Function Documentation

◆ data()

const std::vector< uint8_t > & signet::forge::RleEncoder::data ( ) const
inline

Returns a reference to the encoded byte buffer (without length prefix).

Returns
Const reference to the internal encoded output.
Note
Call flush() before accessing this to ensure all data is emitted.

Definition at line 308 of file rle.hpp.

◆ encode()

static std::vector< uint8_t > signet::forge::RleEncoder::encode (const uint32_t *values, size_t count, int bit_width)
inline static

Encode an array of uint32 values using the RLE/Bit-Pack Hybrid scheme.

Convenience static method that constructs an encoder, feeds all values, flushes, and returns the resulting byte buffer without a length prefix.

Parameters
values: Pointer to the input values.
count: Number of values to encode.
bit_width: Bits per value (0–64). An empty buffer is returned for invalid widths.
Returns
Encoded byte buffer (empty on error or bit_width == 0).
See also
encode_with_length

Definition at line 342 of file rle.hpp.

◆ encode_with_length()

static std::vector< uint8_t > signet::forge::RleEncoder::encode_with_length (const uint32_t *values, size_t count, int bit_width)
inline static

Encode with a 4-byte little-endian length prefix.

Produces the same output as encode(), but prepends a 4-byte LE uint32 length prefix containing the payload size. This format is required by Parquet for definition and repetition level encoding.

Parameters
values: Pointer to the input values.
count: Number of values to encode.
bit_width: Bits per value (0–64).
Returns
Length-prefixed encoded byte buffer.
See also
encode, RleDecoder::decode_with_length

Definition at line 371 of file rle.hpp.
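
The length-prefixed framing described above can be sketched independently of the library. The function below takes an already encoded payload and prepends its size as a 4-byte little-endian uint32. The name `sketch_prepend_length` is hypothetical; this is an illustration of the framing, not the body of encode_with_length().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: frame `payload` with a 4-byte little-endian
// uint32 length prefix holding the payload size in bytes.
inline std::vector<uint8_t> sketch_prepend_length(const std::vector<uint8_t>& payload) {
    // The prefix is only 32 bits wide, so larger payloads cannot be framed.
    assert(payload.size() <= UINT32_MAX);
    const uint32_t n = static_cast<uint32_t>(payload.size());

    std::vector<uint8_t> out;
    out.reserve(payload.size() + 4);
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<uint8_t>(n >> (8 * i)));  // least significant byte first
    }
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```

A reader of such a buffer would consume the first four bytes to learn how many payload bytes follow, which is what allows level data to sit in front of other page content.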

◆ encoded_size()

size_t signet::forge::RleEncoder::encoded_size ( ) const
inline

Returns the size of the encoded data in bytes.

Returns
Number of bytes in the encoded output.

Definition at line 313 of file rle.hpp.

◆ flush()

void signet::forge::RleEncoder::flush ( )
inline

Flush any pending values to the output buffer.

Must be called after all put() calls to finalize the encoding. Any partial bit-packed group (fewer than 8 values) is zero-padded to 8 before emission.

Definition at line 272 of file rle.hpp.
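
The zero-padding of a partial group can be pictured with the following standalone sketch. It packs up to 8 pending values into exactly bit_width bytes, filling missing slots with zeros, using the bit order from the Parquet specification (values are laid out starting at the least significant bit of each byte). The name `sketch_pack_group` is hypothetical, and the sketch covers only the packed bytes, not the run header or the library's internal buffering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bit-pack one group of 8 values at `bit_width`
// bits each. If fewer than 8 values are pending, the remaining slots
// are treated as zero, mirroring the padding flush() is documented to do.
inline std::vector<uint8_t> sketch_pack_group(const std::vector<uint64_t>& pending,
                                              int bit_width) {
    assert(bit_width >= 1 && bit_width <= 64);
    assert(pending.size() <= 8);

    // 8 values * bit_width bits is always exactly bit_width bytes.
    std::vector<uint8_t> out(static_cast<std::size_t>(bit_width), 0);
    std::size_t bit_pos = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        const uint64_t v = i < pending.size() ? pending[i] : 0;  // zero padding
        for (int b = 0; b < bit_width; ++b, ++bit_pos) {
            if ((v >> b) & 1u) {
                out[bit_pos / 8] |= static_cast<uint8_t>(1u << (bit_pos % 8));
            }
        }
    }
    return out;
}
```

Because padded slots are indistinguishable from real zeros in the byte stream, a decoder needs the value count from elsewhere (for example the page header) to know where the real data ends.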

◆ put()

void signet::forge::RleEncoder::put ( uint64_t  value)
inline

Add a single value to the encoding stream.

Values are buffered internally and flushed as RLE runs or bit-packed groups. If bit_width is 0, this is a no-op (all values are implicitly zero).

Parameters
value: The unsigned integer value to encode (must fit in bit_width bits).

Definition at line 228 of file rle.hpp.
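
Callers who want to validate inputs before calling put() could use a check along these lines. This is a minimal sketch of what "fits in bit_width bits" means; `sketch_fits` is a hypothetical helper, and the documentation does not state how the library itself treats values that are too wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `value` is representable in `bit_width` bits.
// A width of 0 can only represent the value 0; a width of 64 accepts any
// uint64_t (and is special-cased because shifting by 64 is undefined).
inline bool sketch_fits(uint64_t value, int bit_width) {
    assert(bit_width >= 0 && bit_width <= 64);
    if (bit_width == 0) return value == 0;
    if (bit_width == 64) return true;
    return (value >> bit_width) == 0;
}
```

With a width of 3, for example, the values 0 through 7 pass the check and 8 does not.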

◆ reset()

void signet::forge::RleEncoder::reset ( )
inline

Reset the encoder to its initial state, preserving the bit width.

Clears all internal buffers and accumulators so the encoder can be reused for a new encoding session.

Definition at line 319 of file rle.hpp.


The documentation for this class was generated from the following file: