Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
signet::forge::delta Namespace Reference

Functions

uint64_t zigzag_encode (int64_t n)
 Zigzag-encode a signed 64-bit integer to an unsigned representation.
 
uint32_t zigzag_encode32 (int32_t n)
 Zigzag-encode a signed 32-bit integer to an unsigned representation.
 
int64_t zigzag_decode (uint64_t v)
 Zigzag-decode an unsigned 64-bit integer back to its signed representation.
 
int32_t zigzag_decode32 (uint32_t v)
 Zigzag-decode an unsigned 32-bit integer back to its signed representation.
 
size_t encode_uvarint (std::vector< uint8_t > &buf, uint64_t value)
 Encode an unsigned varint (LEB128) into a byte buffer.
 
uint64_t decode_uvarint (const uint8_t *data, size_t &pos, size_t size)
 Decode an unsigned varint (LEB128) from a byte buffer.
 
int bit_width_for (uint64_t value)
 Compute the minimum number of bits required to represent an unsigned value.
 
void bit_pack_values (std::vector< uint8_t > &out, const uint64_t *values, size_t count, int bit_width)
 Bit-pack an arbitrary number of values at a given bit width.
 
void bit_unpack_values (const uint8_t *src, uint64_t *values, size_t count, int bit_width)
 Unpack an arbitrary number of values at a given bit width from packed data.
 
std::vector< uint8_t > encode_int64 (const int64_t *values, size_t count)
 Encode int64 values using the DELTA_BINARY_PACKED algorithm.
 
std::vector< uint8_t > encode_int32 (const int32_t *values, size_t count)
 Encode int32 values using the DELTA_BINARY_PACKED algorithm.
 
std::vector< int64_t > decode_int64 (const uint8_t *data, size_t size, size_t num_values)
 Decode DELTA_BINARY_PACKED data back to int64 values.
 
std::vector< int32_t > decode_int32 (const uint8_t *data, size_t size, size_t num_values)
 Decode DELTA_BINARY_PACKED data back to int32 values.
 

Variables

constexpr size_t DEFAULT_BLOCK_SIZE = 128
 Default number of delta values per block (must be a multiple of 128).
 
constexpr size_t DEFAULT_MINIBLOCK_COUNT = 4
 Default number of miniblocks within each block.
 
constexpr size_t VALUES_PER_MINIBLOCK = DEFAULT_BLOCK_SIZE / DEFAULT_MINIBLOCK_COUNT
 Number of delta values per miniblock (DEFAULT_BLOCK_SIZE / DEFAULT_MINIBLOCK_COUNT).
 

Function Documentation

◆ bit_pack_values()

void signet::forge::delta::bit_pack_values ( std::vector< uint8_t > &  out,
const uint64_t *  values,
size_t  count,
int  bit_width 
)
inline

Bit-pack an arbitrary number of values at a given bit width.

Appends ceil(count * bit_width / 8) bytes to out, packing each value LSB-first. Typically used to pack one miniblock of VALUES_PER_MINIBLOCK (32) delta-adjusted values. If bit_width is 0, no bytes are emitted.

Parameters
out: Output byte buffer to append packed data to.
values: Pointer to count unsigned values to pack.
count: Number of values (typically 32 for a miniblock).
bit_width: Bits per value (0–64).
See also
bit_unpack_values

Definition at line 204 of file delta.hpp.
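
The LSB-first layout described above can be illustrated with a minimal sketch. The function name sketch_bit_pack is hypothetical and the body is an assumption about one straightforward way to pack bits, not the code in delta.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for bit_pack_values(); not the library's code.
// Appends ceil(count * bit_width / 8) bytes to `out`, filling each byte
// from its least significant bit upward.
inline void sketch_bit_pack(std::vector<uint8_t>& out, const uint64_t* values,
                            std::size_t count, int bit_width) {
    assert(bit_width >= 0 && bit_width <= 64);
    if (bit_width == 0 || count == 0) return;  // nothing to emit

    const std::size_t total_bits = count * static_cast<std::size_t>(bit_width);
    const std::size_t start = out.size();
    out.resize(start + (total_bits + 7) / 8, 0);

    std::size_t bit_pos = 0;  // bit offset within the newly appended bytes
    for (std::size_t i = 0; i < count; ++i) {
        for (int b = 0; b < bit_width; ++b, ++bit_pos) {
            if ((values[i] >> b) & 1u) {
                out[start + bit_pos / 8] |=
                    static_cast<uint8_t>(1u << (bit_pos % 8));
            }
        }
    }
}
```

With this sketch, packing the values 1, 2, 3, 4 at a bit width of 3 yields the two bytes 0xD1 0x08 (12 bits rounded up to 2 bytes).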

◆ bit_unpack_values()

void signet::forge::delta::bit_unpack_values ( const uint8_t *  src,
uint64_t *  values,
size_t  count,
int  bit_width 
)
inline

Unpack an arbitrary number of values at a given bit width from packed data.

Reverses bit_pack_values(). Reads ceil(count * bit_width / 8) bytes from src and unpacks count values. If bit_width is 0, all output values are set to zero.

Parameters
src: Pointer to packed byte data (at least ceil(count * bit_width / 8) bytes).
values: Output array for count unpacked unsigned values.
count: Number of values to unpack.
bit_width: Bits per value (0–64).
See also
bit_pack_values

Definition at line 248 of file delta.hpp.
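
A minimal sketch of the reverse operation, assuming the same LSB-first layout. The name sketch_bit_unpack is hypothetical; the real function may use a faster word-at-a-time strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for bit_unpack_values(); not the library's code.
// Reads count * bit_width bits from `src`, least significant bit first.
inline void sketch_bit_unpack(const uint8_t* src, uint64_t* values,
                              std::size_t count, int bit_width) {
    assert(bit_width >= 0 && bit_width <= 64);
    std::size_t bit_pos = 0;  // bit offset within src
    for (std::size_t i = 0; i < count; ++i) {
        uint64_t v = 0;
        for (int b = 0; b < bit_width; ++b, ++bit_pos) {
            const uint64_t bit = (src[bit_pos / 8] >> (bit_pos % 8)) & 1u;
            v |= bit << b;
        }
        values[i] = v;  // remains 0 when bit_width == 0
    }
}
```

Given the bytes 0xD1 0x08 with count = 4 and bit_width = 3, the sketch recovers 1, 2, 3, 4. With bit_width = 0 it never dereferences src.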

◆ bit_width_for()

int signet::forge::delta::bit_width_for ( uint64_t  value)
inline

Compute the minimum number of bits required to represent an unsigned value.

Equivalent to ceil(log2(value + 1)). Returns 0 for value == 0.

Parameters
value: The unsigned integer whose bit width to compute.
Returns
Minimum bits needed (0–64).

Definition at line 179 of file delta.hpp.
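
The ceil(log2(value + 1)) rule can be checked with a shift-counting sketch. The name sketch_bit_width is hypothetical; the library may instead use a count-leading-zeros intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for bit_width_for(); counts how many right shifts
// it takes to reduce the value to zero.
constexpr int sketch_bit_width(uint64_t value) {
    int bits = 0;
    while (value != 0) {
        ++bits;
        value >>= 1;
    }
    return bits;
}

static_assert(sketch_bit_width(0) == 0);
static_assert(sketch_bit_width(1) == 1);
static_assert(sketch_bit_width(3) == 2);
static_assert(sketch_bit_width(4) == 3);
static_assert(sketch_bit_width(255) == 8);
static_assert(sketch_bit_width(256) == 9);
```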

◆ decode_int32()

std::vector< int32_t > signet::forge::delta::decode_int32 ( const uint8_t *  data,
size_t  size,
size_t  num_values 
)
inline

Decode DELTA_BINARY_PACKED data back to int32 values.

Convenience wrapper that decodes via decode_int64() and narrows the results to int32. Values outside the int32 range are truncated.

Parameters
data: Pointer to the encoded DELTA_BINARY_PACKED payload.
size: Size of the encoded data in bytes.
num_values: Number of values to decode (from the Parquet page header).
Returns
Decoded int32 values.
See also
encode_int32, decode_int64

Definition at line 564 of file delta.hpp.
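
Because out-of-range values are truncated rather than reported, a caller that needs to detect them could decode with decode_int64() and narrow explicitly. The helper below is a hypothetical sketch of such a check and is not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical helper: narrows int64 values to int32, returning
// std::nullopt if any value falls outside the int32 range.
inline std::optional<std::vector<int32_t>> sketch_narrow_checked(
    const std::vector<int64_t>& wide) {
    std::vector<int32_t> out;
    out.reserve(wide.size());
    for (int64_t v : wide) {
        if (v < std::numeric_limits<int32_t>::min() ||
            v > std::numeric_limits<int32_t>::max()) {
            return std::nullopt;  // would not survive narrowing
        }
        out.push_back(static_cast<int32_t>(v));
    }
    return out;
}
```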

◆ decode_int64()

std::vector< int64_t > signet::forge::delta::decode_int64 ( const uint8_t *  data,
size_t  size,
size_t  num_values 
)
inline

Decode DELTA_BINARY_PACKED data back to int64 values.

Parses the block header (block_size, miniblock_count, total_count, first_value), then iterates through blocks and miniblocks, unpacking bit-packed adjusted deltas and reconstructing original values. Includes overflow protection: returns a partial result on integer overflow or corrupt bit widths.

Parameters
data: Pointer to the encoded DELTA_BINARY_PACKED payload.
size: Size of the encoded data in bytes.
num_values: Number of values to decode (from the Parquet page header).
Returns
Decoded int64 values (may be fewer than num_values on truncated or corrupt input).
Note
The total_count field in the header is ignored; the caller-supplied num_values is authoritative.
See also
encode_int64, decode_int32

Definition at line 438 of file delta.hpp.
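
The reconstruction step can be sketched as follows, assuming the adjusted deltas of one block have already been unpacked. The function name is hypothetical, and the overflow rule shown (stop and return what has been decoded so far when the running sum would overflow int64) is an assumption about what "partial result" means here, not the library's exact check.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sketch: rebuilds values from first_value, a block's
// min_delta, and its unpacked adjusted deltas (delta - min_delta).
inline std::vector<int64_t> sketch_reconstruct(
    int64_t first_value, int64_t min_delta,
    const std::vector<uint64_t>& adjusted) {
    constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
    constexpr int64_t kMin = std::numeric_limits<int64_t>::min();

    std::vector<int64_t> out{first_value};
    int64_t prev = first_value;
    for (uint64_t a : adjusted) {
        // delta = min_delta + adjusted, computed modulo 2^64.
        const int64_t delta =
            static_cast<int64_t>(static_cast<uint64_t>(min_delta) + a);
        const bool overflow = (delta > 0 && prev > kMax - delta) ||
                              (delta < 0 && prev < kMin - delta);
        if (overflow) break;  // assumed behaviour: return a partial result
        prev += delta;
        out.push_back(prev);
    }
    return out;
}
```

For first_value = 100, min_delta = -1 and adjusted deltas 0, 3, 1, the deltas are -1, 2, 0 and the sketch returns 100, 99, 101, 101.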

◆ decode_uvarint()

uint64_t signet::forge::delta::decode_uvarint ( const uint8_t *  data,
size_t &  pos,
size_t  size 
)
inline

Decode an unsigned varint (LEB128) from a byte buffer.

Reads a variable-length encoded unsigned integer starting at data[pos]. Advances pos past the consumed bytes. Returns 0 if the buffer is exhausted or the shift exceeds 63 bits (overflow protection).

Parameters
data: Pointer to the encoded byte stream.
pos: Current read position (updated on return).
size: Total size of the byte stream.
Returns
The decoded unsigned integer, or 0 on failure.
See also
encode_uvarint

Definition at line 152 of file delta.hpp.
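
A minimal sketch of the decoding loop described above. The name sketch_decode_uvarint is hypothetical, and how far pos advances on failure is an assumption. Since 0 is also a valid decoded value, a caller cannot tell failure from an encoded zero by the return value alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for decode_uvarint(); not the library's code.
inline uint64_t sketch_decode_uvarint(const uint8_t* data, std::size_t& pos,
                                      std::size_t size) {
    uint64_t result = 0;
    int shift = 0;
    while (pos < size) {
        if (shift > 63) return 0;  // more than 10 bytes: overflow guard
        const uint8_t byte = data[pos++];
        result |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;  // continuation bit clear
        shift += 7;
    }
    return 0;  // buffer exhausted before the final byte
}
```

The two bytes 0xAC 0x02 decode to 300 and leave pos at 2.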

◆ encode_int32()

std::vector< uint8_t > signet::forge::delta::encode_int32 ( const int32_t *  values,
size_t  count 
)
inline

Encode int32 values using the DELTA_BINARY_PACKED algorithm.

Convenience wrapper that widens int32 values to int64 and delegates to encode_int64(). The encoded wire format is identical to the one encode_int64() produces.

Parameters
values: Pointer to the input int32 values.
count: Number of values to encode.
Returns
Encoded byte buffer containing the DELTA_BINARY_PACKED payload.
See also
decode_int32, encode_int64

Definition at line 408 of file delta.hpp.

◆ encode_int64()

std::vector< uint8_t > signet::forge::delta::encode_int64 ( const int64_t *  values,
size_t  count 
)
inline

Encode int64 values using the DELTA_BINARY_PACKED algorithm.

Computes successive deltas, partitions them into blocks of DEFAULT_BLOCK_SIZE, each subdivided into DEFAULT_MINIBLOCK_COUNT miniblocks, and bit-packs the adjusted deltas (delta - min_delta) per miniblock. Compresses well on sorted or near-monotonic sequences (timestamps, IDs), where the deltas are small and close to one another.

Parameters
values: Pointer to the input int64 values.
count: Number of values to encode.
Returns
Encoded byte buffer containing the DELTA_BINARY_PACKED payload.
Note
For count == 0, returns a valid header with total_count = 0.
See also
decode_int64, encode_int32

Definition at line 298 of file delta.hpp.
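
The delta, min_delta, and bit-width steps can be illustrated with a simplified sketch that treats the whole input as a single miniblock. MiniblockPlan and sketch_plan are hypothetical names; the real encoder also writes the header and splits deltas into blocks and miniblocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type; not part of signet::forge::delta.
struct MiniblockPlan {
    int64_t min_delta;               // smallest successive delta
    std::vector<uint64_t> adjusted;  // delta - min_delta, never negative
    int bit_width;                   // bits for the largest adjusted delta
};

inline MiniblockPlan sketch_plan(const std::vector<int64_t>& values) {
    assert(values.size() >= 2);
    std::vector<int64_t> deltas;
    for (std::size_t i = 1; i < values.size(); ++i) {
        // Assumes the difference fits in int64 (no overflow handling here).
        deltas.push_back(values[i] - values[i - 1]);
    }
    MiniblockPlan plan{*std::min_element(deltas.begin(), deltas.end()), {}, 0};
    uint64_t max_adjusted = 0;
    for (int64_t d : deltas) {
        const uint64_t a =
            static_cast<uint64_t>(d) - static_cast<uint64_t>(plan.min_delta);
        plan.adjusted.push_back(a);
        max_adjusted = std::max(max_adjusted, a);
    }
    while (max_adjusted != 0) {  // count bits of the largest adjusted delta
        ++plan.bit_width;
        max_adjusted >>= 1;
    }
    return plan;
}
```

For the input 100, 101, 103, 106 the deltas are 1, 2, 3, so min_delta is 1, the adjusted deltas are 0, 1, 2, and 2 bits per value suffice.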

◆ encode_uvarint()

size_t signet::forge::delta::encode_uvarint ( std::vector< uint8_t > &  buf,
uint64_t  value 
)
inline

Encode an unsigned varint (LEB128) into a byte buffer.

Appends the variable-length encoding of value to buf, using the same unsigned LEB128 format as the Thrift compact protocol. Each byte uses 7 data bits and 1 continuation bit (MSB).

Parameters
buf: Output byte buffer to append to.
value: The unsigned integer to encode.
Returns
Number of bytes written (1–10).
See also
decode_uvarint

Definition at line 131 of file delta.hpp.
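
A minimal sketch of unsigned LEB128 encoding as described above: 7 data bits per byte, with the MSB set on every byte except the last. The name sketch_encode_uvarint is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for encode_uvarint(); not the library's code.
// Returns the number of bytes appended to `buf`.
inline std::size_t sketch_encode_uvarint(std::vector<uint8_t>& buf,
                                         uint64_t value) {
    std::size_t written = 0;
    while (value >= 0x80) {
        // Low 7 bits plus the continuation flag.
        buf.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
        ++written;
    }
    buf.push_back(static_cast<uint8_t>(value));  // final byte, MSB clear
    return written + 1;
}
```

The value 300 encodes to the two bytes 0xAC 0x02; the largest uint64_t value takes 10 bytes, which matches the documented 1–10 range.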

◆ zigzag_decode()

int64_t signet::forge::delta::zigzag_decode ( uint64_t  v)
inline

Zigzag-decode an unsigned 64-bit integer back to its signed representation.

Reverses zigzag_encode(): (v >> 1) ^ (~(v & 1) + 1).

Parameters
v: The zigzag-encoded unsigned 64-bit value.
Returns
The original signed 64-bit integer.
See also
zigzag_encode

Definition at line 102 of file delta.hpp.
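
In the formula above, ~(v & 1) + 1 is the two's-complement negation of the low bit: all ones when v is odd (a negative original) and zero when v is even. A sketch with the hypothetical name sketch_zigzag_decode:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for zigzag_decode(), written directly from the
// documented formula.
constexpr int64_t sketch_zigzag_decode(uint64_t v) {
    return static_cast<int64_t>((v >> 1) ^ (~(v & 1) + 1));
}

static_assert(sketch_zigzag_decode(0) == 0);
static_assert(sketch_zigzag_decode(1) == -1);
static_assert(sketch_zigzag_decode(2) == 1);
static_assert(sketch_zigzag_decode(3) == -2);
static_assert(sketch_zigzag_decode(4) == 2);
```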

◆ zigzag_decode32()

int32_t signet::forge::delta::zigzag_decode32 ( uint32_t  v)
inline

Zigzag-decode an unsigned 32-bit integer back to its signed representation.

32-bit variant of zigzag_decode(). Uses (v >> 1) ^ (~(v & 1) + 1).

Parameters
v: The zigzag-encoded unsigned 32-bit value.
Returns
The original signed 32-bit integer.
See also
zigzag_encode32

Definition at line 113 of file delta.hpp.

◆ zigzag_encode()

uint64_t signet::forge::delta::zigzag_encode ( int64_t  n)
inline

Zigzag-encode a signed 64-bit integer to an unsigned representation.

Maps signed integers to unsigned using the formula (n << 1) ^ (n >> 63), so that small-magnitude values (positive or negative) map to small unsigned values. This keeps varint encodings short, because a varint's length grows with the magnitude of the unsigned value.

Parameters
n: The signed 64-bit integer to encode.
Returns
The zigzag-encoded unsigned 64-bit value.
See also
zigzag_decode

Definition at line 79 of file delta.hpp.
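
A sketch of the mapping, using the hypothetical name sketch_zigzag_encode. It relies on n >> 63 being an arithmetic shift (guaranteed since C++20) and performs the left shift on the unsigned representation, which is a choice of this sketch rather than something the source states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for zigzag_encode(), following the documented
// formula (n << 1) ^ (n >> 63).
constexpr uint64_t sketch_zigzag_encode(int64_t n) {
    return (static_cast<uint64_t>(n) << 1) ^ static_cast<uint64_t>(n >> 63);
}

// Small magnitudes of either sign map to small unsigned values.
static_assert(sketch_zigzag_encode(0) == 0);
static_assert(sketch_zigzag_encode(-1) == 1);
static_assert(sketch_zigzag_encode(1) == 2);
static_assert(sketch_zigzag_encode(-2) == 3);
static_assert(sketch_zigzag_encode(2) == 4);
```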

◆ zigzag_encode32()

uint32_t signet::forge::delta::zigzag_encode32 ( int32_t  n)
inline

Zigzag-encode a signed 32-bit integer to an unsigned representation.

32-bit variant of zigzag_encode(). Uses the formula (n << 1) ^ (n >> 31).

Parameters
n: The signed 32-bit integer to encode.
Returns
The zigzag-encoded unsigned 32-bit value.
See also
zigzag_decode32

Definition at line 91 of file delta.hpp.

Variable Documentation

◆ DEFAULT_BLOCK_SIZE

constexpr size_t signet::forge::delta::DEFAULT_BLOCK_SIZE = 128
inline constexpr

Default number of delta values per block (must be a multiple of 128).

Definition at line 58 of file delta.hpp.

◆ DEFAULT_MINIBLOCK_COUNT

constexpr size_t signet::forge::delta::DEFAULT_MINIBLOCK_COUNT = 4
inline constexpr

Default number of miniblocks within each block.

Definition at line 61 of file delta.hpp.

◆ VALUES_PER_MINIBLOCK

constexpr size_t signet::forge::delta::VALUES_PER_MINIBLOCK = DEFAULT_BLOCK_SIZE / DEFAULT_MINIBLOCK_COUNT
inline constexpr

Number of delta values per miniblock (DEFAULT_BLOCK_SIZE / DEFAULT_MINIBLOCK_COUNT).

Definition at line 64 of file delta.hpp.