Signet Forge 0.1.0
C++20 Parquet library with AI-native extensions
snappy.hpp File Reference

Bundled, zero-dependency, header-only Snappy compression codec.

#include "signet/compression/codec.hpp"
#include <array>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>


Classes

class  signet::forge::SnappyCodec
 Bundled Snappy compression codec (header-only, no external dependency).
 

Namespaces

namespace  signet
 
namespace  signet::forge
 
namespace  signet::forge::detail
 Internal implementation details for dictionary encoding.
 
namespace  signet::forge::detail::snappy
 

Functions

size_t signet::forge::detail::snappy::encode_varint32 (uint8_t *dst, uint32_t value)
 Encode a 32-bit unsigned integer as a Snappy varint (1-5 bytes).
 
bool signet::forge::detail::snappy::decode_varint32 (const uint8_t *data, size_t size, size_t &pos, uint32_t &out)
 Decode a Snappy varint from the input stream.
 
uint32_t signet::forge::detail::snappy::load_le32 (const uint8_t *p)
 Read a 32-bit little-endian value from a potentially unaligned pointer.
 
uint16_t signet::forge::detail::snappy::load_le16 (const uint8_t *p)
 Read a 16-bit little-endian value from a potentially unaligned pointer.
 
void signet::forge::detail::snappy::store_le16 (uint8_t *p, uint16_t v)
 Write a 16-bit little-endian value.
 
void signet::forge::detail::snappy::store_le32 (uint8_t *p, uint32_t v)
 Write a 32-bit little-endian value.
 
uint32_t signet::forge::detail::snappy::hash4 (const uint8_t *p)
 14-bit hash of 4 bytes read as a little-endian uint32.
 
void signet::forge::detail::snappy::emit_literal (std::vector< uint8_t > &out, const uint8_t *data, size_t length)
 Emit a literal element.
 
void signet::forge::detail::snappy::emit_copy (std::vector< uint8_t > &out, uint32_t offset, uint32_t length)
 Emit a copy element.
 
uint32_t signet::forge::detail::snappy::match_length (const uint8_t *src, size_t s1, size_t s2, size_t src_end)
 Find the match length between src[s1..] and src[s2..], bounded by src_end.
 
void signet::forge::register_snappy_codec ()
 Register the bundled Snappy codec with the global CodecRegistry.
 

Detailed Description

Bundled, zero-dependency, header-only Snappy compression codec.

Implements the raw Snappy block format (without the optional framing layer), as specified at https://github.com/google/snappy/blob/main/format_description.txt

This is a clean-room implementation that provides correct Snappy compression and decompression for use in Parquet files, where Snappy is the most commonly used compression codec. The compressor is deliberately simple (single-pass, greedy hash-chain matching) and favors correctness over speed.
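As an illustration of the two building blocks a greedy matcher of this kind relies on, the sketch below hashes four bytes down to a 14-bit table index and then extends a candidate match byte by byte. It is not taken from snappy.hpp: the `sketch` namespace is hypothetical, and the multiplier `0x1e35a7bd` is an assumption borrowed from the reference Snappy implementation. Only the function shapes follow the signatures listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {  // hypothetical namespace, not part of Signet Forge

// Read 4 bytes as a little-endian uint32 without assuming alignment.
inline std::uint32_t load_le32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// 14-bit multiplicative hash of 4 bytes (result is in [0, 16384)).
// Assumption: the multiplier is the one used by reference Snappy.
inline std::uint32_t hash4(const std::uint8_t* p) {
    constexpr std::uint32_t kMul = 0x1e35a7bdu;
    return (load_le32(p) * kMul) >> (32 - 14);
}

// Count how many bytes of src[s1..] equal src[s2..], stopping at src_end.
inline std::uint32_t match_length(const std::uint8_t* src, std::size_t s1,
                                  std::size_t s2, std::size_t src_end) {
    assert(s1 < s2 && s2 <= src_end);
    std::uint32_t len = 0;
    while (s2 + len < src_end && src[s1 + len] == src[s2 + len]) {
        ++len;
    }
    return len;
}

}  // namespace sketch
```

In a greedy compressor, the hash indexes a table holding the most recent position of each 4-byte sequence. A table hit gives a candidate `s1`, and `match_length` decides whether the candidate is long enough to emit as a copy element.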

Wire format summary:

[varint: uncompressed_length] [element]...
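The length preamble is a little-endian base-128 varint: each byte carries 7 payload bits, and the high bit marks that another byte follows. The following is a minimal sketch of such an encoder and decoder, written to match the `encode_varint32` and `decode_varint32` signatures listed above. The `sketch` namespace and the function bodies are illustrative assumptions, not the library's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {  // hypothetical namespace, not part of Signet Forge

// Write `value` as a varint (1-5 bytes). `dst` must have room for 5 bytes.
// Returns the number of bytes written.
inline std::size_t encode_varint32(std::uint8_t* dst, std::uint32_t value) {
    assert(dst != nullptr);
    std::size_t n = 0;
    while (value >= 0x80) {
        // Low 7 bits of the value, with the continuation bit set.
        dst[n++] = static_cast<std::uint8_t>(value | 0x80);
        value >>= 7;
    }
    dst[n++] = static_cast<std::uint8_t>(value);  // final byte, high bit clear
    return n;
}

// Read a varint starting at data[pos]. On success, stores the value in `out`,
// advances `pos` past the varint, and returns true. Returns false on
// truncated input or on a value that does not fit in 32 bits.
inline bool decode_varint32(const std::uint8_t* data, std::size_t size,
                            std::size_t& pos, std::uint32_t& out) {
    std::uint32_t result = 0;
    std::size_t p = pos;
    for (unsigned shift = 0; shift <= 28; shift += 7) {
        if (p >= size) return false;               // ran out of input
        const std::uint8_t b = data[p++];
        if (shift == 28 && b > 0x0F) return false; // overflow or a 6th byte
        result |= static_cast<std::uint32_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) {
            pos = p;
            out = result;
            return true;
        }
    }
    return false;
}

}  // namespace sketch
```

For example, the value 300 encodes to the two bytes `0xAC 0x02`, and `0xFFFFFFFF` takes the full five bytes.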

Element types (low 2 bits of tag byte):

  • 00 = Literal – Copy raw bytes into output
  • 01 = Copy-1 – Short back-reference (offset up to 2047, length 4-11)
  • 10 = Copy-2 – Medium back-reference (offset up to 65535, length 1-64)
  • 11 = Copy-4 – Long back-reference (offset up to 2^32-1, length 1-64)
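To make the tag layout concrete, the sketch below classifies a tag byte by its low two bits and unpacks the Copy-1 and Copy-2 fields as laid out in the Snappy format description. The names `ElementType`, `Copy`, `decode_copy1`, and `decode_copy2` are hypothetical and chosen for illustration. They are not identifiers from snappy.hpp.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {  // hypothetical namespace, not part of Signet Forge

enum class ElementType : std::uint8_t {
    Literal = 0,  // binary 00
    Copy1 = 1,    // binary 01
    Copy2 = 2,    // binary 10
    Copy4 = 3     // binary 11
};

// The element type is the low 2 bits of the tag byte.
inline ElementType element_type(std::uint8_t tag) {
    return static_cast<ElementType>(tag & 0x03);
}

struct Copy {
    std::uint32_t offset;
    std::uint32_t length;
};

// Copy-1: (length - 4) is in tag bits 2-4; the 11-bit offset has its top
// 3 bits in tag bits 5-7 and its low 8 bits in the byte after the tag.
inline Copy decode_copy1(std::uint8_t tag, std::uint8_t next) {
    assert(element_type(tag) == ElementType::Copy1);
    const std::uint32_t offset =
        (static_cast<std::uint32_t>(tag >> 5) << 8) | next;
    const std::uint32_t length = ((tag >> 2) & 0x07u) + 4u;
    return Copy{offset, length};
}

// Copy-2: (length - 1) is in tag bits 2-7; the offset is a little-endian
// 16-bit integer in the two bytes after the tag.
inline Copy decode_copy2(std::uint8_t tag, std::uint8_t lo, std::uint8_t hi) {
    assert(element_type(tag) == ElementType::Copy2);
    const std::uint32_t offset =
        static_cast<std::uint32_t>(lo) | (static_cast<std::uint32_t>(hi) << 8);
    const std::uint32_t length = static_cast<std::uint32_t>(tag >> 2) + 1u;
    return Copy{offset, length};
}

}  // namespace sketch
```

A Copy-1 element with offset 300 and length 7 is the byte pair `0x2D 0x2C`. The largest Copy-2 element (offset 65535, length 64) is `0xFE 0xFF 0xFF`.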
See also
CompressionCodec, CodecRegistry

Definition in file snappy.hpp.