NNCP: Lossless Data Compression with Neural Networks

NNCP is an experiment in building a practical lossless data compressor with neural networks. The best-performing variant uses an LSTM model. A model based on self-attention (Transformer) is also evaluated.

The algorithms and results are described in this paper.

NNCP is based on the LibNC library, which allows fast and deterministic evaluation and training of neural networks on x86 CPUs. It is optimized for small batch sizes and low latency. LibNC has no dependencies on other libraries and has a C API.

Compression ratio

Results for enwik8:

Program	Compressed size (bytes)	Ratio (bpb)
gzip	36 445 248	2.92
xz	24 865 244	1.99
NNCP (2019-06-29)	16 571 476	1.33
CMIX (v17)	14 877 373	1.19

Results for enwik9:

Program	Compressed size (bytes)	Ratio (bpb)	Program size (zip, bytes)	Total (bytes)
gzip	322 591 995	2.58	38 801	322 630 796
xz	197 331 816	1.58	36 752	197 368 568
NNCP (2019-06-29)	123 050 014	0.98	166 690	123 216 704
CMIX (v17)	116 394 271	0.93	208 263	116 602 534

* The results for the other programs are from the Large Text Compression Benchmark.
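
The Ratio column is given in bits per byte (bpb): the compressed size in bits divided by the original size in bytes. As a minimal sketch of how the figures in the tables can be checked, the Python snippet below recomputes the ratio and the enwik9 total for the NNCP rows. It assumes that enwik8 is 10^8 bytes and enwik9 is 10^9 bytes; the function names are illustrative and are not part of NNCP or LibNC.

```python
# Sketch for checking the table figures. The input sizes are assumptions:
# enwik8 = 10**8 bytes, enwik9 = 10**9 bytes.
ENWIK8_BYTES = 10**8
ENWIK9_BYTES = 10**9


def bits_per_byte(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed size in bits divided by original size in bytes."""
    return 8 * compressed_bytes / original_bytes


def total_size(compressed_bytes: int, program_zip_bytes: int) -> int:
    """Compressed data plus the zipped decompressor program (enwik9 table)."""
    return compressed_bytes + program_zip_bytes


if __name__ == "__main__":
    # NNCP (2019-06-29) rows from the tables above.
    print(f"enwik8: {bits_per_byte(16_571_476, ENWIK8_BYTES):.2f} bpb")
    print(f"enwik9: {bits_per_byte(123_050_014, ENWIK9_BYTES):.2f} bpb")
    print(f"enwik9 total: {total_size(123_050_014, 166_690)} bytes")
```

The Total column of the enwik9 table adds the size of the zipped program to the compressed size, so a compressor cannot gain an advantage by embedding a large model or dictionary in its executable.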

Download

Linux version: nncp-2019-06-29.tar.gz. LibNC is currently only provided as object code.

Precompiled Windows version: nncp-2019-06-29-win64.zip.
