NNCP: Lossless Data Compression with Neural Networks

NNCP is an experiment to build a practical lossless data compressor with neural networks. The best performer uses an LSTM model. A model based on self-attention (Transformer) is also evaluated.

The algorithms and results are described in this paper.

NNCP is based on the LibNC library, which allows fast and deterministic evaluation and training of neural networks on x86 CPUs. It is optimized for small batch sizes and low latency. LibNC has no dependencies on other libraries and has a C API.
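
As a rough illustration of why a predictive model can serve as a compressor, and why deterministic evaluation matters: an ideal entropy coder (for example an arithmetic coder) spends about -log2(p) bits on a symbol to which the model assigned probability p, and a decompressor has to recompute exactly the same probabilities to decode. The sketch below is not NNCP's model. It substitutes a trivial adaptive byte-frequency model for the neural network and only measures the ideal code length, without implementing the coder itself. The function name ideal_code_length_bits is hypothetical.

```python
import math


def ideal_code_length_bits(data: bytes) -> float:
    """Ideal code length of `data` under an adaptive order-0 byte model.

    Stand-in model, NOT the LSTM or Transformer used by NNCP: each byte's
    probability is its running frequency with add-one smoothing.
    """
    counts = [1] * 256  # add-one smoothing: every byte value starts at 1
    total = 256
    bits = 0.0
    for b in data:
        p = counts[b] / total   # probability predicted before seeing b
        bits += -math.log2(p)   # cost an ideal entropy coder would pay
        counts[b] += 1          # update the model after coding the byte;
        total += 1              # a decoder would apply the identical update
    return bits


if __name__ == "__main__":
    sample = b"abababababababab" * 64
    bits = ideal_code_length_bits(sample)
    print(f"{bits / len(sample):.2f} bits per byte")
```

A better predictor assigns higher probabilities to the bytes that actually occur, so the total shrinks. If the decoder's probabilities differed even slightly from the encoder's, decoding would diverge, which is why bit-exact, deterministic evaluation is needed for this kind of compressor.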

Compression ratio

Result for enwik8:

Program             Compr. size (bytes)   Ratio (bpb)
gzip                36 445 248            2.92
xz                  24 865 244            1.99
NNCP (2019-06-29)   16 571 476            1.33
CMIX (v17)          14 877 373            1.19
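
The ratio column is in bits per byte (bpb): eight times the compressed size divided by the original size. The short check below recomputes it from the enwik8 sizes listed above. It assumes the original size is 100 000 000 bytes (enwik8 is the first 10^8 bytes of an English Wikipedia dump), a figure not stated on this page. The helper name bits_per_byte is hypothetical.

```python
# Assumption (not stated on this page): enwik8 is exactly 10^8 bytes long.
ENWIK8_SIZE = 100_000_000


def bits_per_byte(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed bits spent per byte of the original file."""
    return 8 * compressed_bytes / original_bytes


# Compressed sizes in bytes, copied from the enwik8 results.
enwik8_results = {
    "gzip": 36_445_248,
    "xz": 24_865_244,
    "NNCP (2019-06-29)": 16_571_476,
    "CMIX (v17)": 14_877_373,
}

for program, size in enwik8_results.items():
    print(f"{program:<18} {bits_per_byte(size, ENWIK8_SIZE):.2f} bpb")
```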

Result for enwik9:

Program             Compr. size (bytes)   Ratio (bpb)   Program size (zip, bytes)   Total (bytes)
gzip                322 591 995           2.58          38 801                      322 630 796
xz                  197 331 816           1.58          36 752                      197 368 568
NNCP (2019-06-29)   123 050 014           0.98          166 690                     123 216 704
CMIX (v17)          116 394 271           0.93          208 263                     116 602 534

* The results for the other programs are from the Large Text Compression Benchmark.
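
In the enwik9 results, the total is the compressed size plus the size of the zipped program, so a compressor cannot gain by shipping a large program. A minimal check of that arithmetic, using the figures above (the helper name total_size is hypothetical):

```python
def total_size(compressed_bytes: int, program_zip_bytes: int) -> int:
    """Size charged to a compressor: its output plus its own zipped program."""
    return compressed_bytes + program_zip_bytes


# (compressed size, zipped program size, listed total), all in bytes,
# copied from the enwik9 results.
enwik9_results = {
    "gzip": (322_591_995, 38_801, 322_630_796),
    "xz": (197_331_816, 36_752, 197_368_568),
    "NNCP (2019-06-29)": (123_050_014, 166_690, 123_216_704),
    "CMIX (v17)": (116_394_271, 208_263, 116_602_534),
}

for program, (compressed, program_zip, listed) in enwik9_results.items():
    # Each listed total should equal the sum of the two size columns.
    assert total_size(compressed, program_zip) == listed, program
    print(f"{program:<18} {total_size(compressed, program_zip):>11,d} bytes")
```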

Download

Linux version: nncp-2019-06-29.tar.gz. LibNC is currently only provided as object code.

Precompiled Windows version: nncp-2019-06-29-win64.zip.

Related Links


Fabrice Bellard - https://bellard.org/