Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
dictionary.hpp File Reference

Dictionary encoding and decoding for Parquet (PLAIN_DICTIONARY / RLE_DICTIONARY).

#include "signet/encoding/rle.hpp"
#include "signet/error.hpp"
#include "signet/types.hpp"
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>


Classes

class  signet::forge::DictionaryEncoder< T >
 Dictionary encoder for Parquet PLAIN_DICTIONARY / RLE_DICTIONARY encoding.
 
class  signet::forge::DictionaryDecoder< T >
 Dictionary decoder for Parquet PLAIN_DICTIONARY / RLE_DICTIONARY encoding.
 

Namespaces

namespace  signet
 
namespace  signet::forge
 
namespace  signet::forge::detail
 Internal implementation details for dictionary encoding.
 

Functions

int signet::forge::detail::dict_bit_width (size_t dict_size)
 Compute the minimum bit width needed to represent dictionary indices.
 
void signet::forge::detail::plain_encode_value (std::vector< uint8_t > &buf, const std::string &val)
 Append a string value in PLAIN BYTE_ARRAY format (4-byte LE length prefix + raw bytes).
 
void signet::forge::detail::plain_encode_value (std::vector< uint8_t > &buf, int32_t val)
 Append an int32_t in PLAIN format (4-byte little-endian).
 
void signet::forge::detail::plain_encode_value (std::vector< uint8_t > &buf, int64_t val)
 Append an int64_t in PLAIN format (8-byte little-endian).
 
void signet::forge::detail::plain_encode_value (std::vector< uint8_t > &buf, float val)
 Append a float in PLAIN format (4-byte little-endian, IEEE 754).
 
void signet::forge::detail::plain_encode_value (std::vector< uint8_t > &buf, double val)
 Append a double in PLAIN format (8-byte little-endian, IEEE 754).
 
std::string signet::forge::detail::plain_decode_value (const uint8_t *data, size_t &pos, size_t size, std::string *)
 Decode a string from PLAIN BYTE_ARRAY format at data[pos].
 
int32_t signet::forge::detail::plain_decode_value (const uint8_t *data, size_t &pos, size_t size, int32_t *)
 Decode an int32_t from PLAIN format at data[pos].
 
int64_t signet::forge::detail::plain_decode_value (const uint8_t *data, size_t &pos, size_t size, int64_t *)
 Decode an int64_t from PLAIN format at data[pos].
 
float signet::forge::detail::plain_decode_value (const uint8_t *data, size_t &pos, size_t size, float *)
 Decode a float from PLAIN format at data[pos].
 
double signet::forge::detail::plain_decode_value (const uint8_t *data, size_t &pos, size_t size, double *)
 Decode a double from PLAIN format at data[pos].
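The PLAIN layouts summarized above can be illustrated with a minimal round-trip sketch. The namespace plain_sketch and its encode, decode, put_le and get_le helpers are hypothetical and are not the signet::forge::detail API. Only int32_t, double and BYTE_ARRAY strings are shown; int64_t and float follow the same pattern with 8 and 4 bytes. The unused pointer parameter mirrors the tag-dispatch style that the plain_decode_value signatures appear to use. Throwing std::out_of_range on truncated input is an assumed error policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

namespace plain_sketch {  // hypothetical names, not the signet API

// Append the low n bytes of v, least significant byte first (little-endian).
inline void put_le(std::vector<uint8_t>& buf, uint64_t v, int n) {
    for (int i = 0; i < n; ++i) buf.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Read n little-endian bytes at data[pos] and advance pos.
// Throwing on truncated input is an assumed policy, not the library's.
inline uint64_t get_le(const uint8_t* data, size_t& pos, size_t size, int n) {
    if (pos > size || size - pos < static_cast<size_t>(n))
        throw std::out_of_range("truncated PLAIN value");
    uint64_t v = 0;
    for (int i = 0; i < n; ++i) v |= static_cast<uint64_t>(data[pos + i]) << (8 * i);
    pos += static_cast<size_t>(n);
    return v;
}

inline void encode(std::vector<uint8_t>& buf, int32_t v) {
    put_le(buf, static_cast<uint32_t>(v), 4);  // two's-complement bits, 4 bytes
}

inline void encode(std::vector<uint8_t>& buf, double v) {
    uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);  // reinterpret the IEEE 754 bit pattern
    put_le(buf, bits, 8);
}

// BYTE_ARRAY: 4-byte little-endian length prefix, then the raw bytes.
inline void encode(std::vector<uint8_t>& buf, const std::string& v) {
    put_le(buf, static_cast<uint32_t>(v.size()), 4);
    buf.insert(buf.end(), v.begin(), v.end());
}

// The unused pointer parameter only selects the overload (tag dispatch).
inline int32_t decode(const uint8_t* d, size_t& pos, size_t size, int32_t*) {
    return static_cast<int32_t>(static_cast<uint32_t>(get_le(d, pos, size, 4)));
}

inline double decode(const uint8_t* d, size_t& pos, size_t size, double*) {
    uint64_t bits = get_le(d, pos, size, 8);
    double v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

inline std::string decode(const uint8_t* d, size_t& pos, size_t size, std::string*) {
    size_t len = static_cast<size_t>(get_le(d, pos, size, 4));
    if (size - pos < len) throw std::out_of_range("truncated BYTE_ARRAY");
    std::string s(reinterpret_cast<const char*>(d + pos), len);
    pos += len;
    return s;
}

}  // namespace plain_sketch
```

A call looks like decode(data, pos, size, static_cast<int32_t*>(nullptr)); passing a bare nullptr would be ambiguous between the overloads. Emitting one byte at a time keeps the output little-endian regardless of host byte order.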
 

Detailed Description

Implements PLAIN_DICTIONARY (encoding type 2) for dictionary pages and RLE_DICTIONARY (encoding type 8) for data pages. Dictionary encoding is especially valuable for low-cardinality columns (for example symbols, sides and exchanges), where 10–50x compression is typical. The encoder (DictionaryEncoder) and decoder (DictionaryDecoder) are both templated on the value type.
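The core idea behind the templated encoder and decoder can be shown with a minimal sketch that assumes nothing about the real DictionaryEncoder<T> / DictionaryDecoder<T> interfaces; SketchDictEncoder and sketch_dict_decode are hypothetical names. Each distinct value receives a dense index in first-seen order. The dictionary holds each distinct value once (this is what would go to the dictionary page in PLAIN format), and the column stores only indices (which RLE_DICTIONARY would then pack with the RLE/bit-packing hybrid, omitted here).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified dictionary encoder (not the signet API).
template <typename T>
class SketchDictEncoder {
public:
    // Look up the value; assign the next free index if it is new.
    void put(const T& value) {
        auto [it, inserted] =
            index_.try_emplace(value, static_cast<uint32_t>(dict_.size()));
        if (inserted) dict_.push_back(value);  // new distinct value
        indices_.push_back(it->second);        // one index per input row
    }
    const std::vector<T>& dictionary() const { return dict_; }
    const std::vector<uint32_t>& indices() const { return indices_; }

private:
    std::unordered_map<T, uint32_t> index_;  // value -> dictionary index
    std::vector<T> dict_;                    // distinct values, first-seen order
    std::vector<uint32_t> indices_;          // encoded column
};

// Decoding is a lookup of each index in the dictionary.
// .at() rejects out-of-range indices from corrupt input.
template <typename T>
std::vector<T> sketch_dict_decode(const std::vector<T>& dict,
                                  const std::vector<uint32_t>& indices) {
    std::vector<T> out;
    out.reserve(indices.size());
    for (uint32_t i : indices) out.push_back(dict.at(i));
    return out;
}
```

Encoding the rows AAPL, MSFT, AAPL, AAPL, MSFT yields a two-entry dictionary and the indices 0, 1, 0, 0, 1. The fewer distinct values a column has, the smaller the dictionary and the narrower each index, which is why low-cardinality columns benefit most.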

See also
rle.hpp for the RLE/Bit-Packing Hybrid used to encode dictionary indices
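In the Parquet specification, an RLE_DICTIONARY data page body is one byte giving the index bit width, followed by the indices in the RLE/bit-packing hybrid. An RLE run is a ULEB128 varint of (run length << 1), followed by the repeated value stored little-endian in ceil(bit_width / 8) bytes. The sketch below builds the simplest such body: a single RLE run in which every row has the same index. sketch_bit_width and sketch_single_run_page are hypothetical names. sketch_bit_width illustrates what dict_bit_width is documented to compute (the bits needed for the largest index, dict_size - 1); its minimum of 1 for 0- or 1-entry dictionaries is an assumption. A real encoder, such as the one in rle.hpp, mixes RLE and bit-packed runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bits needed for the largest index (dict_size - 1).
// The minimum of 1 for tiny dictionaries is an assumption.
int sketch_bit_width(std::size_t dict_size) {
    int width = 0;
    for (std::size_t max_index = dict_size > 0 ? dict_size - 1 : 0;
         max_index > 0; max_index >>= 1)
        ++width;
    return width > 0 ? width : 1;
}

// Hypothetical helper: RLE_DICTIONARY data page body in which all `count`
// rows refer to dictionary entry `index`, encoded as one RLE run.
std::vector<uint8_t> sketch_single_run_page(std::size_t dict_size,
                                            uint32_t index, uint32_t count) {
    assert(index < dict_size && count > 0);
    const int bit_width = sketch_bit_width(dict_size);
    std::vector<uint8_t> out;
    out.push_back(static_cast<uint8_t>(bit_width));  // 1-byte bit width prefix

    // RLE run header: ULEB128 varint of (run length << 1); low bit 0 = RLE run.
    uint64_t header = static_cast<uint64_t>(count) << 1;
    do {
        uint8_t byte = static_cast<uint8_t>(header & 0x7F);
        header >>= 7;
        if (header != 0) byte = static_cast<uint8_t>(byte | 0x80);  // continuation bit
        out.push_back(byte);
    } while (header != 0);

    // Repeated value: little-endian, ceil(bit_width / 8) bytes.
    const int value_bytes = (bit_width + 7) / 8;
    for (int i = 0; i < value_bytes; ++i)
        out.push_back(static_cast<uint8_t>(index >> (8 * i)));
    return out;
}
```

For a 4-entry dictionary and 100 rows that all point at index 3, the body is the four bytes 0x02 0xC8 0x01 0x03: bit width 2, the varint encoding of 200 (100 << 1), then the index in a single byte.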

Definition in file dictionary.hpp.