In this blog post, we explain Zstandard (ZSTD), a fast data compression algorithm that offers best-in-kind performance, in a way that is super simple to understand. If you prefer to learn by watching or listening, check out a video about this open source project on our Facebook Open Source YouTube channel.
Zstandard (ZSTD) is a fast, lossless compression algorithm. It provides high compression ratios as well as great compression and decompression speeds, offering best-in-kind performance in many conventional situations. In addition, ZSTD now has a number of features that make possible many real-world scenarios that were previously difficult for compressors to handle.
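There are three standard metrics for comparing compression algorithms and implementations: compression ratio (the original size divided by the compressed size), compression speed, and decompression speed.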
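To make the three metrics (compression ratio, compression speed, and decompression speed) concrete, here is a minimal sketch of how you might measure them. It is only an illustration under a few assumptions: it uses zlib from Python's standard library as a stand-in compressor rather than ZSTD itself, and the `measure` helper and the sample input are hypothetical names made up for this example.

```python
import time
import zlib


def measure(data: bytes, level: int = 6) -> dict:
    """Measure the three metrics for one compressor setting.

    zlib is used here only as a stand-in; this is not the ZSTD API.
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_seconds = time.perf_counter() - start

    # Lossless compression: the round trip must reproduce the input exactly.
    assert restored == data

    megabytes = len(data) / 1_000_000
    return {
        # Original size divided by compressed size (higher is better).
        "ratio": len(data) / len(compressed),
        # MB of input consumed per second while compressing.
        "compress_mb_s": megabytes / max(compress_seconds, 1e-9),
        # MB of output produced per second while decompressing.
        "decompress_mb_s": megabytes / max(decompress_seconds, 1e-9),
    }


# Hypothetical sample input: one sentence repeated many times.
sample = b"Zstandard is a fast, lossless compression algorithm. " * 20_000
metrics = measure(sample)
print(metrics)
```

The speeds you see depend heavily on your machine and on the input, so treat the numbers as relative rather than absolute.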
Many of the algorithms commonly used today focus on one of these metrics or try to strike a balance between them. Several fast compression algorithms were tested and compared, and as shown in the figure below, there are often drastic compromises between speed and size (source).
The fastest algorithm, lz4 1.9.2, results in lower compression ratios; the one with the highest compression ratio (other than ZSTD), zlib 1.2.11-1, suffers from a slow compression speed. ZSTD, however, shows substantial improvements in both compression speed and decompression speed while maintaining a high compression ratio. Note that the negative compression levels, specified with --fast=X, offer faster compression and decompression speeds in exchange for some loss in compression ratio compared to level 1.
As shown in the chart below, ZSTD offers a very wide range of speed/compression trade-offs, which lets you trade compression speed for a better compression ratio and vice versa. ZSTD can provide these speeds because it is backed by an extremely fast decoder (source).
However, most of these results apply to typical file and stream scenarios, where the data is usually several MB in size. Data smaller than this is handled in a slightly different manner.
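If you want to get a feel for this kind of speed/compression trade-off yourself, a rough sketch is to compress the same input at several levels and compare the results. Again, this uses zlib from Python's standard library as a stand-in (its levels run from 1 to 9, which is not ZSTD's level range), and the log-like input is a hypothetical example.

```python
import time
import zlib

# Hypothetical sample input: semi-repetitive, log-like lines.
lines = [
    f"user={i % 97} action=view item={i * 7919 % 1013} status=ok\n"
    for i in range(50_000)
]
data = "".join(lines).encode()

results = {}
for level in (1, 6, 9):  # zlib levels, used as a stand-in for ZSTD levels
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start

    # Every level must still round-trip losslessly.
    assert zlib.decompress(compressed) == data

    ratio = len(data) / len(compressed)
    results[level] = (ratio, elapsed)
    print(f"level {level}: ratio {ratio:.2f}, {elapsed * 1000:.1f} ms")
```

Higher levels generally spend more time searching for matches in exchange for smaller output, which is the same dial the chart describes for ZSTD.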
Generally speaking, the smaller the amount of data to compress, the more difficult it is to compress. Compression algorithms learn from past data how to compress future data. At the beginning of a new data set, there is no past data to build upon, which makes compression more challenging. To solve this problem, ZSTD offers a special training mode, which can be used to tune the algorithm for a selected type of data. Training on a set of sample data produces a dictionary that captures the common patterns in that data. The same dictionary must be loaded before both compression and decompression. The compressor then starts from the assumption that new data will resemble the samples, so it can reuse those patterns right from the first byte. By using this dictionary, the compression ratio on small data improves drastically, as shown in the graph below (source).
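Here is a small sketch of the dictionary idea. To keep it runnable with only Python's standard library, it uses zlib's preset-dictionary support (the `zdict` parameter) as a stand-in for ZSTD's dictionary mode, and the "dictionary" is simply a few sample records concatenated by hand rather than the output of ZSTD's training mode. The records and helper names are hypothetical.

```python
import zlib
from typing import Optional

# Hypothetical small records that share a lot of structure.
samples = [
    b'{"id": 1, "name": "alice", "role": "admin", "active": true}',
    b'{"id": 2, "name": "bob", "role": "viewer", "active": false}',
    b'{"id": 3, "name": "carol", "role": "editor", "active": true}',
]

# Hand-made stand-in dictionary: the samples concatenated.
# ZSTD's training mode builds a more selective dictionary from samples.
dictionary = b"".join(samples)


def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    # The same dictionary must be supplied on the decompression side.
    dec = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return dec.decompress(blob) + dec.flush()


new_record = b'{"id": 4, "name": "dave", "role": "viewer", "active": true}'

plain = compress(new_record)
with_dict = compress(new_record, dictionary)

# Both variants are lossless.
assert decompress(plain) == new_record
assert decompress(with_dict, dictionary) == new_record

print("original:", len(new_record), "bytes")
print("no dictionary:", len(plain), "bytes")
print("with dictionary:", len(with_dict), "bytes")
```

Without the dictionary, a record this small has almost no repeated content to exploit; with it, most of the record can be encoded as references to patterns the dictionary already contains.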
The type of data being compressed can also affect these metrics. Many algorithms are tuned for specific types of data, such as English text, genetic sequences, or rasterized images; however, ZSTD is meant for general-purpose compression for a variety of data types.
ZSTD was open-sourced in 2016 and is used continuously to compress large amounts of data in multiple formats in Facebook’s development servers, data warehouse, databases, and compressed file systems as a powerful and flexible compression engine. To get a better understanding of where ZSTD is used, check out this Facebook Engineering blog that explains how Facebook improved compression at scale with ZSTD.
ZSTD is used by Linux, FreeBSD, Amazon Web Services, and many more. For a detailed list of industries where ZSTD is being used, check out the project's website.
ZSTD has a rich collection of APIs and supports a number of popular programming languages. To learn more about ZSTD, visit its website, which contains great information on the benchmarks and the various languages that are supported. If you’d like to learn how to use the algorithm, or to find build and testing instructions, make sure to visit the project’s GitHub page. For a detailed API reference, check out its documentation.
If you have any further questions about ZSTD, please let us know on our YouTube channel, or by tweeting at us. We always want to hear from you and hope you will find this open source project and the ELI5 series useful.
In a series of short videos, one of our Developer Advocates on the Facebook Open Source team explains a Facebook open source project in a way that is easy to understand and use.
We will write an accompanying blog post (like the one you're reading right now) for each of these videos, which you can find on our YouTube channel.
Interested in working with open source at Facebook? Check out our open source-related job postings on our career page by taking this quick survey.