Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
delta.hpp File Reference

DELTA_BINARY_PACKED encoding and decoding (Parquet encoding type 5).

#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>


Namespaces

namespace  signet
 
namespace  signet::forge
 
namespace  signet::forge::delta
 

Functions

uint64_t signet::forge::delta::zigzag_encode (int64_t n)
 Zigzag-encode a signed 64-bit integer to an unsigned representation.
 
uint32_t signet::forge::delta::zigzag_encode32 (int32_t n)
 Zigzag-encode a signed 32-bit integer to an unsigned representation.
 
int64_t signet::forge::delta::zigzag_decode (uint64_t v)
 Zigzag-decode an unsigned 64-bit integer back to its signed representation.
 
int32_t signet::forge::delta::zigzag_decode32 (uint32_t v)
 Zigzag-decode an unsigned 32-bit integer back to its signed representation.
 
size_t signet::forge::delta::encode_uvarint (std::vector< uint8_t > &buf, uint64_t value)
 Encode an unsigned varint (LEB128) into a byte buffer.
 
uint64_t signet::forge::delta::decode_uvarint (const uint8_t *data, size_t &pos, size_t size)
 Decode an unsigned varint (LEB128) from a byte buffer.
 
int signet::forge::delta::bit_width_for (uint64_t value)
 Compute the minimum number of bits required to represent an unsigned value.
 
void signet::forge::delta::bit_pack_values (std::vector< uint8_t > &out, const uint64_t *values, size_t count, int bit_width)
 Bit-pack an arbitrary number of values at a given bit width.
 
void signet::forge::delta::bit_unpack_values (const uint8_t *src, uint64_t *values, size_t count, int bit_width)
 Unpack an arbitrary number of values at a given bit width from packed data.
 
std::vector< uint8_t > signet::forge::delta::encode_int64 (const int64_t *values, size_t count)
 Encode int64 values using the DELTA_BINARY_PACKED algorithm.
 
std::vector< uint8_t > signet::forge::delta::encode_int32 (const int32_t *values, size_t count)
 Encode int32 values using the DELTA_BINARY_PACKED algorithm.
 
std::vector< int64_t > signet::forge::delta::decode_int64 (const uint8_t *data, size_t size, size_t num_values)
 Decode DELTA_BINARY_PACKED data back to int64 values.
 
std::vector< int32_t > signet::forge::delta::decode_int32 (const uint8_t *data, size_t size, size_t num_values)
 Decode DELTA_BINARY_PACKED data back to int32 values.
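In this format, the header fields and each block's minimum delta are stored as ULEB128 varints, and the signed quantities (the first value and the minimum delta) are zigzag-encoded before that. The sketch below shows the conventional definitions of these two primitives, assuming the functions listed above follow the usual Parquet / Protocol Buffers conventions. The `sketch` namespace, the exception-based error handling, and the meaning of `encode_uvarint`'s return value (bytes written) are assumptions, not documented behaviour of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace sketch {  // hypothetical namespace; not signet::forge::delta

// Zigzag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... so that
// small magnitudes of either sign become small unsigned values.
inline uint64_t zigzag_encode(int64_t n) {
    // n >> 63 is an arithmetic shift in C++20: all ones for negative n.
    return (static_cast<uint64_t>(n) << 1) ^ static_cast<uint64_t>(n >> 63);
}

inline int64_t zigzag_decode(uint64_t v) {
    // The low bit holds the sign; XOR with -1 (all ones) restores negative values.
    return static_cast<int64_t>(v >> 1) ^ -static_cast<int64_t>(v & 1);
}

// ULEB128: 7 payload bits per byte, least significant group first;
// the high bit is set on every byte except the last. Returns bytes written
// (assumed meaning of the size_t return value).
inline size_t encode_uvarint(std::vector<uint8_t>& buf, uint64_t value) {
    size_t written = 1;
    for (; value >= 0x80; value >>= 7, ++written)
        buf.push_back(static_cast<uint8_t>(value | 0x80));
    buf.push_back(static_cast<uint8_t>(value));
    return written;
}

// Reads one ULEB128 value starting at data[pos] and advances pos past it.
inline uint64_t decode_uvarint(const uint8_t* data, size_t& pos, size_t size) {
    assert(pos <= size);
    uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= size) throw std::runtime_error("truncated varint");
        const uint8_t byte = data[pos++];
        result |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    throw std::runtime_error("varint longer than 10 bytes");
}

}  // namespace sketch
```

For example, `zigzag_encode(-1)` is 1 and `zigzag_encode(1)` is 2, while `encode_uvarint` turns 300 into the two bytes `0xAC 0x02`. A 64-bit value never needs more than 10 varint bytes, which is why the decoder stops after ten.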
 

Variables

constexpr size_t signet::forge::delta::DEFAULT_BLOCK_SIZE = 128
 Default number of delta values per block (must be a multiple of 128).
 
constexpr size_t signet::forge::delta::DEFAULT_MINIBLOCK_COUNT = 4
 Default number of miniblocks within each block.
 
constexpr size_t signet::forge::delta::VALUES_PER_MINIBLOCK = DEFAULT_BLOCK_SIZE / DEFAULT_MINIBLOCK_COUNT
 Number of delta values per miniblock (DEFAULT_BLOCK_SIZE / DEFAULT_MINIBLOCK_COUNT).
 

Detailed Description

DELTA_BINARY_PACKED encoding and decoding (Parquet encoding type 5).

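Delta-encodes int32/int64 values. The encoding compresses sorted or otherwise monotonic sequences (timestamps, sequence IDs, etc.) well because consecutive differences are small and can be bit-packed at a narrow width; on sorted time-series data it can reduce the encoded size by 90% or more. All functions and constants reside in the signet::forge::delta namespace.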

See also
https://parquet.apache.org/documentation/latest/ (DELTA_BINARY_PACKED)

Definition in file delta.hpp.