# OpenZL JNI – Java API documentation

Java bindings for the Meta OpenZL compressor.
## Getting started

Add the portable Java artifact and the classifier artifact that matches your platform. Replace the classifier with `macos_arm64` or `windows_amd64` when needed.
```xml
<dependency>
    <groupId>io.github.hybledav</groupId>
    <artifactId>openzl-jni</artifactId>
    <version>VERSION</version>
</dependency>
<dependency>
    <groupId>io.github.hybledav</groupId>
    <artifactId>openzl-jni</artifactId>
    <version>VERSION</version>
    <classifier>linux_amd64</classifier>
</dependency>
```
The Java façade is compatible with Java 21+; classes are compiled with `--release 11`. `OpenZLNative.load()` extracts the bundled `libopenzl_jni` library automatically.

Published coordinates: `io.github.hybledav:openzl-jni` on Maven Central. Replace `VERSION` with the release listed there.
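`OpenZLNative.load()` handles the extraction for you; calling it explicitly at startup is one way to fail fast if the platform artifact is missing. A minimal sketch:

```java
import io.github.hybledav.OpenZLNative;

// Extracts and loads the bundled libopenzl_jni eagerly, so a missing or
// mismatched platform artifact surfaces at startup rather than on first use.
OpenZLNative.load();
```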
## Quick start: byte array round trip
```java
import java.nio.charset.StandardCharsets;

import io.github.hybledav.OpenZLCompressor;
import io.github.hybledav.OpenZLCompressionInfo;

byte[] payload = "openzl-jni quick start".getBytes(StandardCharsets.UTF_8);
try (OpenZLCompressor compressor = new OpenZLCompressor()) {
    byte[] compressed = compressor.compress(payload);
    byte[] restored = compressor.decompress(compressed);
    OpenZLCompressionInfo info = compressor.inspect(compressed);
    System.out.printf("restored=%s, compressed=%d bytes%n",
            java.util.Arrays.equals(payload, restored),
            info.compressedSize());
}
```
By default `OpenZLCompressor` uses the `ZSTD` graph. The `inspect` call returns size information, the detected graph, data flavour, element count, and format version.
## Direct buffers and pooling
```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import io.github.hybledav.OpenZLBufferManager;
import io.github.hybledav.OpenZLCompressor;

byte[] payload = "direct buffers keep JNI zero-copy".getBytes(StandardCharsets.UTF_8);
try (OpenZLBufferManager buffers = OpenZLBufferManager.builder()
         .minimumCapacity(1 << 12)
         .alignment(256)
         .build();
     OpenZLCompressor compressor = new OpenZLCompressor()) {
    ByteBuffer src = buffers.acquire(payload.length);
    src.put(payload).flip();
    ByteBuffer compressed = compressor.compress(src, buffers);
    ByteBuffer restored = compressor.decompress(compressed, buffers);
    byte[] roundTrip = new byte[restored.remaining()];
    restored.get(roundTrip);
    System.out.println("round-trip ok? " + java.util.Arrays.equals(payload, roundTrip));
    buffers.release(src);
    buffers.release(compressed);
    buffers.release(restored);
}
```
Use `compress(src, dst)` when you manage the destination buffer yourself. The static helper `OpenZLCompressor.maxCompressedSize(int)` returns the upper bound used by `acquireForCompression`.
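For example, sizing the destination with the worst-case bound before calling the two-argument overload might look like this. A sketch only: it assumes `compress(src, dst)` fills `dst` and returns the number of compressed bytes; check the Javadoc for the exact contract.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import io.github.hybledav.OpenZLCompressor;

byte[] payload = "manage the destination yourself".getBytes(StandardCharsets.UTF_8);
try (OpenZLCompressor compressor = new OpenZLCompressor()) {
    ByteBuffer src = ByteBuffer.allocateDirect(payload.length);
    src.put(payload).flip();
    // Worst-case output size for an input of this length.
    ByteBuffer dst = ByteBuffer.allocateDirect(
            OpenZLCompressor.maxCompressedSize(payload.length));
    // Assumption: the overload returns the compressed byte count.
    int written = compressor.compress(src, dst);
    System.out.println("compressed " + written + " bytes");
}
```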
## Numeric helpers
```java
import io.github.hybledav.OpenZLCompressor;
import io.github.hybledav.OpenZLCompressionInfo;
import io.github.hybledav.OpenZLGraph;

int[] readings = new int[1024];
for (int i = 0; i < readings.length; i++) {
    readings[i] = (i % 17) * 42;
}
try (OpenZLCompressor compressor = new OpenZLCompressor(OpenZLGraph.NUMERIC)) {
    byte[] compressed = compressor.compressInts(readings);
    int[] restored = compressor.decompressInts(compressed);
    OpenZLCompressionInfo info = compressor.inspect(compressed);
    System.out.println("graph=" + info.graph() + ", flavor=" + info.flavor());
}
```
Array helpers exist for `long`, `float`, and `double`; empty arrays return `byte[0]`. These methods avoid extra array-to-byte conversions when the data is already typed.
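Assuming the `long` helpers mirror the `int` naming (`compressLongs`/`decompressLongs` — verify against the actual API), a round trip looks the same:

```java
import io.github.hybledav.OpenZLCompressor;
import io.github.hybledav.OpenZLGraph;

long[] timestamps = new long[512];
for (int i = 0; i < timestamps.length; i++) {
    timestamps[i] = 1_700_000_000_000L + i * 250L; // monotonically increasing millis
}
try (OpenZLCompressor compressor = new OpenZLCompressor(OpenZLGraph.NUMERIC)) {
    // Assumed names, mirroring compressInts/decompressInts.
    byte[] compressed = compressor.compressLongs(timestamps);
    long[] restored = compressor.decompressLongs(compressed);
    System.out.println("round-trip ok? " + java.util.Arrays.equals(timestamps, restored));
}
```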
## Structured data with SDDL
```java
import java.nio.charset.StandardCharsets;

import io.github.hybledav.OpenZLCompressor;
import io.github.hybledav.OpenZLProfile;
import io.github.hybledav.OpenZLSddl;

String rowStreamSddl = String.join("\n",
        "field_width = 4;",
        "Field1 = Byte[field_width];",
        "Field2 = Byte[field_width];",
        "Row = {",
        "    Field1;",
        "    Field2;",
        "};",
        "row_width = sizeof Row;",
        "input_size = _rem;",
        "row_count = input_size / row_width;",
        "expect input_size % row_width == 0;",
        "RowArray = Row[row_count];",
        ": RowArray;");
byte[] compiled = OpenZLSddl.compile(rowStreamSddl, true, 0);
byte[] payload = "12345678".repeat(128).getBytes(StandardCharsets.US_ASCII);
try (OpenZLCompressor compressor = new OpenZLCompressor()) {
    compressor.configureProfile(OpenZLProfile.SERIAL, java.util.Map.of());
    byte[] serial = compressor.compress(payload);
    compressor.reset();
    compressor.configureSddl(compiled);
    byte[] structured = compressor.compress(payload);
    System.out.printf("serial=%d B, sddl=%d B%n", serial.length, structured.length);
}
```
Reset between experiments to clear the previous graph state. SDDL can improve compression whenever the input layout matches the declared structure.
## Inspecting frame metadata
```java
import java.nio.charset.StandardCharsets;

import io.github.hybledav.OpenZLCompressor;
import io.github.hybledav.OpenZLCompressionInfo;
import io.github.hybledav.OpenZLCompressionLevel;

byte[] payload = "inspect me for metadata".getBytes(StandardCharsets.UTF_8);
try (OpenZLCompressor compressor = new OpenZLCompressor()) {
    compressor.setCompressionLevel(OpenZLCompressionLevel.LEVEL_5);
    byte[] compressed = compressor.compress(payload);
    OpenZLCompressionInfo info = compressor.inspect(compressed);
    System.out.printf("original=%d, compressed=%d, flavor=%s%n",
            info.originalSize(),
            info.compressedSize(),
            info.flavor());
}
```
`OpenZLCompressionInfo` also exposes `compressionRatio()`, the inferred `OpenZLGraph`, and optional element counts for structured frames. Use `serialize()` or `serializeToJson()` to persist the compressor state.
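A short sketch of persisting that state, assuming `serialize()` lives on the compressor and returns a `byte[]` snapshot while `serializeToJson()` returns a `String`; verify both shapes against the Javadoc:

```java
import java.nio.file.Files;
import java.nio.file.Path;

import io.github.hybledav.OpenZLCompressor;

try (OpenZLCompressor compressor = new OpenZLCompressor()) {
    // Assumed shapes: serialize() -> byte[], serializeToJson() -> String.
    byte[] snapshot = compressor.serialize();
    Files.write(Path.of("compressor.bin"), snapshot);
    System.out.println(compressor.serializeToJson());
}
```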
## Graphs
| Graph | ID | Description |
|---|---|---|
| AUTO | -1 | Select the default graph (currently ZSTD). |
| ZSTD | 0 | General-purpose LZ77 + entropy coding. |
| GENERIC | 1 | Lightweight transforms before entropy coding. |
| NUMERIC | 2 | Optimised for numeric primitives. |
| STORE | 3 | No compression (passthrough). |
| BITPACK | 4 | Bitpacking for small value ranges. |
| FSE | 5 | Finite State Entropy. |
| HUFFMAN | 6 | Huffman-only entropy stage. |
| ENTROPY | 7 | Generic entropy coding. |
| CONSTANT | 8 | Fast path for constant or near-constant inputs. |
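Any of these can be passed to the constructor, as in the numeric example above. A quick sketch comparing two graphs on the same payload:

```java
import java.nio.charset.StandardCharsets;

import io.github.hybledav.OpenZLCompressor;
import io.github.hybledav.OpenZLGraph;

byte[] payload = "abcabcabc".repeat(200).getBytes(StandardCharsets.UTF_8);
try (OpenZLCompressor zstd = new OpenZLCompressor(OpenZLGraph.ZSTD);
     OpenZLCompressor store = new OpenZLCompressor(OpenZLGraph.STORE)) {
    // STORE is a passthrough, so it approximates the uncompressed baseline
    // (plus frame overhead); ZSTD should shrink the repetitive input.
    System.out.printf("zstd=%d B, store=%d B%n",
            zstd.compress(payload).length,
            store.compress(payload).length);
}
```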
## Profiles
| Profile | Key | Use case |
|---|---|---|
| SERIAL | serial | General byte streams. |
| PYTORCH | pytorch | Tensors exported from PyTorch. |
| CSV | csv | Comma-separated rows. |
| LE I16 | le-i16 | Little-endian signed 16-bit sequences. |
| LE U16 | le-u16 | Little-endian unsigned 16-bit sequences. |
| LE I32 | le-i32 | Little-endian signed 32-bit values. |
| LE U32 | le-u32 | Little-endian unsigned 32-bit values. |
| LE I64 | le-i64 | Little-endian signed 64-bit values. |
| LE U64 | le-u64 | Little-endian unsigned 64-bit values. |
| PARQUET | parquet | Columnar data exported from Parquet. |
| SDDL | sddl | Precompiled SDDL programs. |
| SAO | sao | SAO star-catalog records (Smithsonian Astrophysical Observatory). |
List available profiles at runtime with `OpenZLCompressor.listProfiles()`. Use `configureProfile(profile, Map<String, String> args)` to pass profile-specific parameters.
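A short sketch, assuming `listProfiles()` returns an iterable or array of profile key strings; the `"separator"` argument below is purely illustrative, not a documented key:

```java
import io.github.hybledav.OpenZLCompressor;
import io.github.hybledav.OpenZLProfile;

try (OpenZLCompressor compressor = new OpenZLCompressor()) {
    // Enumerate what the loaded native library actually supports.
    for (String profile : OpenZLCompressor.listProfiles()) {
        System.out.println(profile);
    }
    // Profile-specific arguments travel in the map; "separator" is a
    // hypothetical key used here only for illustration.
    compressor.configureProfile(OpenZLProfile.CSV, java.util.Map.of("separator", ","));
}
```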
## Training
Two convenience methods expose the native trainer:

- `OpenZLCompressor.train(String profile, byte[][] inputs, TrainOptions opts)`
- `OpenZLCompressor.trainFromDirectory(String profile, String dir, TrainOptions opts)`

`TrainOptions` lets you control the maximum time, parallelism, requested sample count, and whether to compute the Pareto frontier. The helper writes samples to a temporary directory, hands it to the native trainer, and returns serialized compressors that you can feed back into `configureProfile`.
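A hedged sketch of a training run. The `TrainOptions` construction below (the builder and the `maxTimeSeconds`/`parallelism` names) is an illustrative guess, as is the `byte[][]` return shape; only the `train(String, byte[][], TrainOptions)` signature comes from the listing above:

```java
import java.nio.charset.StandardCharsets;

import io.github.hybledav.OpenZLCompressor;
import io.github.hybledav.TrainOptions;

byte[][] samples = {
    "sample payload one".getBytes(StandardCharsets.UTF_8),
    "sample payload two".getBytes(StandardCharsets.UTF_8),
};
// Hypothetical builder and option names; check TrainOptions for the real API.
TrainOptions opts = TrainOptions.builder()
        .maxTimeSeconds(60)
        .parallelism(4)
        .build();
// Returns serialized compressors (assumed byte[][]) to feed back into
// configureProfile, per the description above.
byte[][] trained = OpenZLCompressor.train("serial", samples, opts);
System.out.println("trained " + trained.length + " compressor(s)");
```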
Streaming APIs are tracked upstream in OpenZL issue #128; follow `TODO.md` for progress in this repository.