Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
Dictionary encoding and decoding for Parquet (PLAIN_DICTIONARY / RLE_DICTIONARY).
#include "signet/encoding/rle.hpp"
#include "signet/error.hpp"
#include "signet/types.hpp"
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>
Classes

| Kind | Name | Description |
|---|---|---|
| class | signet::forge::DictionaryEncoder< T > | Dictionary encoder for Parquet PLAIN_DICTIONARY / RLE_DICTIONARY encoding. |
| class | signet::forge::DictionaryDecoder< T > | Dictionary decoder for Parquet PLAIN_DICTIONARY / RLE_DICTIONARY encoding. |
Namespaces

| Kind | Name | Description |
|---|---|---|
| namespace | signet | |
| namespace | signet::forge | |
| namespace | signet::forge::detail | Internal implementation details for dictionary encoding. |
Functions

| Return type | Function | Description |
|---|---|---|
| int | signet::forge::detail::dict_bit_width (size_t dict_size) | Compute the minimum bit width needed to represent dictionary indices. |
| void | signet::forge::detail::plain_encode_value (std::vector< uint8_t > &buf, const std::string &val) | Append a string value in PLAIN BYTE_ARRAY format (4-byte LE length prefix + raw bytes). |
| void | signet::forge::detail::plain_encode_value (std::vector< uint8_t > &buf, int32_t val) | Append an int32_t in PLAIN format (4-byte little-endian). |
| void | signet::forge::detail::plain_encode_value (std::vector< uint8_t > &buf, int64_t val) | Append an int64_t in PLAIN format (8-byte little-endian). |
| void | signet::forge::detail::plain_encode_value (std::vector< uint8_t > &buf, float val) | Append a float in PLAIN format (4-byte little-endian, IEEE 754). |
| void | signet::forge::detail::plain_encode_value (std::vector< uint8_t > &buf, double val) | Append a double in PLAIN format (8-byte little-endian, IEEE 754). |
| std::string | signet::forge::detail::plain_decode_value (const uint8_t *data, size_t &pos, size_t size, std::string *) | Decode a string from PLAIN BYTE_ARRAY format at data[pos]. |
| int32_t | signet::forge::detail::plain_decode_value (const uint8_t *data, size_t &pos, size_t size, int32_t *) | Decode an int32_t from PLAIN format at data[pos]. |
| int64_t | signet::forge::detail::plain_decode_value (const uint8_t *data, size_t &pos, size_t size, int64_t *) | Decode an int64_t from PLAIN format at data[pos]. |
| float | signet::forge::detail::plain_decode_value (const uint8_t *data, size_t &pos, size_t size, float *) | Decode a float from PLAIN format at data[pos]. |
| double | signet::forge::detail::plain_decode_value (const uint8_t *data, size_t &pos, size_t size, double *) | Decode a double from PLAIN format at data[pos]. |
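
The PLAIN layouts listed above are simple enough to sketch directly. The following is a minimal illustration, not the Signet Forge implementation: the `plain_sketch` namespace and every function name in it are hypothetical, and the library's own `plain_decode_value` overloads select the type through a trailing tag pointer rather than through distinct names. The sketch writes a BYTE_ARRAY as a 4-byte little-endian length followed by the raw bytes, writes an int32_t as 4 little-endian bytes, and bounds-checks both on decode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

namespace plain_sketch {  // hypothetical namespace, not part of Signet Forge

// Append a 32-bit value as 4 little-endian bytes, independent of host endianness.
inline void put_u32_le(std::vector<uint8_t>& buf, uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        buf.push_back(static_cast<uint8_t>((v >> (8 * i)) & 0xFF));
    }
}

// Read 4 little-endian bytes at data[pos] and advance pos.
// Throws if fewer than 4 bytes remain in the buffer.
inline uint32_t get_u32_le(const uint8_t* data, std::size_t& pos, std::size_t size) {
    if (pos > size || size - pos < 4) {
        throw std::out_of_range("truncated 4-byte value");
    }
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<uint32_t>(data[pos + i]) << (8 * i);
    }
    pos += 4;
    return v;
}

// PLAIN INT32: the two's-complement bit pattern, little-endian.
inline void encode_int32(std::vector<uint8_t>& buf, int32_t val) {
    put_u32_le(buf, static_cast<uint32_t>(val));
}

inline int32_t decode_int32(const uint8_t* data, std::size_t& pos, std::size_t size) {
    const uint32_t bits = get_u32_le(data, pos, size);
    int32_t val;
    std::memcpy(&val, &bits, sizeof val);  // reinterpret the bits without UB
    return val;
}

// PLAIN BYTE_ARRAY: 4-byte little-endian length prefix, then the raw bytes.
inline void encode_string(std::vector<uint8_t>& buf, const std::string& val) {
    if (val.size() > std::numeric_limits<uint32_t>::max()) {
        throw std::length_error("string too long for a 4-byte length prefix");
    }
    put_u32_le(buf, static_cast<uint32_t>(val.size()));
    buf.insert(buf.end(), val.begin(), val.end());
}

inline std::string decode_string(const uint8_t* data, std::size_t& pos, std::size_t size) {
    const uint32_t len = get_u32_le(data, pos, size);
    if (size - pos < len) {
        throw std::out_of_range("truncated BYTE_ARRAY payload");
    }
    std::string out(reinterpret_cast<const char*>(data + pos), len);
    pos += len;
    return out;
}

}  // namespace plain_sketch
```

Under these assumptions, encoding the string "BUY" followed by the int32_t value -2 yields 11 bytes: `03 00 00 00 42 55 59 FE FF FF FF`. Decoding from position 0 returns "BUY" and leaves the position at 7, ready for the next value.
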
Dictionary encoding and decoding for Parquet (PLAIN_DICTIONARY / RLE_DICTIONARY).
Implements PLAIN_DICTIONARY (encoding type 2) for dictionary pages and RLE_DICTIONARY (encoding type 8) for data pages. Critical for low-cardinality columns (symbols, sides, exchanges) where 10–50x compression is typical. The encoder (DictionaryEncoder) and decoder (DictionaryDecoder) are both templated on the value type.
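
To make the idea concrete, here is a minimal sketch of dictionary encoding under stated assumptions. It is not the `DictionaryEncoder` API: `dict_sketch`, `DictSketch`, `put`, `decode`, and `index_bit_width` are hypothetical names, and returning 0 bits for an empty or single-entry dictionary is an assumption that `dict_bit_width` may not share. The sketch collects unique values in first-seen order, records one index per input value, and computes the bit width those indices need. In Parquet, the data page would then store the indices with the RLE/bit-packing hybrid at that width, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

namespace dict_sketch {  // hypothetical namespace, not part of Signet Forge

// Smallest number of bits that can hold every index in [0, dict_size).
// Assumption: 0 bits for an empty or single-entry dictionary.
inline int index_bit_width(std::size_t dict_size) {
    if (dict_size <= 1) return 0;
    int bits = 0;
    for (std::size_t max_index = dict_size - 1; max_index > 0; max_index >>= 1) {
        ++bits;
    }
    return bits;
}

template <typename T>
class DictSketch {
public:
    std::vector<T> dictionary;      // unique values, in first-seen order
    std::vector<uint32_t> indices;  // one dictionary index per input value

    // Record a value: reuse its index if already seen, otherwise add it.
    void put(const T& value) {
        const auto next = static_cast<uint32_t>(dictionary.size());
        auto [it, inserted] = lookup_.try_emplace(value, next);
        if (inserted) dictionary.push_back(value);
        indices.push_back(it->second);
    }

    // Rebuild the original value sequence from dictionary + indices.
    std::vector<T> decode() const {
        std::vector<T> out;
        out.reserve(indices.size());
        for (uint32_t idx : indices) out.push_back(dictionary.at(idx));
        return out;
    }

private:
    std::unordered_map<T, uint32_t> lookup_;  // value -> dictionary index
};

}  // namespace dict_sketch
```

For a side column holding BUY, SELL, BUY, BUY, SELL, the sketch produces the dictionary {BUY, SELL}, the indices 0, 1, 0, 0, 1, and a bit width of 1. Each row then costs one bit plus a share of the dictionary instead of a length-prefixed string, which is where the savings on low-cardinality columns come from.
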
Definition in file dictionary.hpp.