May 2, 2018
At the beginning of the recent deep learning revolution, researchers had only a handful of tools (such as Torch, Theano, and Caffe) to work with, but today there is a robust ecosystem of deep learning frameworks and hardware runtimes. While this growing toolbox is extremely useful, each framework risks becoming an island unto itself without interoperability. Interoperability, however, requires custom integration work for each possible framework/runtime pair, and reimplementing models to move them between frameworks is typically difficult and can slow development by weeks or months.
Facebook helped develop the Open Neural Network Exchange (ONNX) format to allow AI engineers to more easily move models between frameworks without having to do resource-intensive custom engineering. Today, we're sharing that ONNX is adding support for additional AI tools, including Baidu's PaddlePaddle platform and Qualcomm SNPE. ONNX is also adding a production-ready converter for Apple Core ML technology. With these additions, ONNX now works with the vast majority of model types and can be deployed to millions of mobile devices. With ONNX, AI engineers can develop their models using any number of supported frameworks, export models to another framework tooled for production serving, or export to hardware runtimes for optimized inference on specific devices. As a result, engineers can now develop and implement their latest research much more quickly and flexibly, while taking advantage of a broad range of tools. In practice, accelerating the "research to production" pipeline has the potential to more quickly bring powerful AI capabilities to real-world applications and drive new experiences.
Major technology companies have fueled AI development by open-sourcing or actively backing various deep learning frameworks. These include Amazon Web Services (Apache MXNet), Facebook (Caffe2 and PyTorch, and now PyTorch 1.0, which is under development), Google (TensorFlow), and Microsoft (Cognitive Toolkit). There is also a growing ecosystem of hardware runtimes, such as NVIDIA's TensorRT and Intel's nGraph, that ease and optimize "last mile" deployment onto devices, with support for techniques such as quantization and layer fusion.
We began a collaboration with Microsoft in September 2017 to launch the ONNX specification with the purpose of making the deep learning ecosystem interoperable. Since then, Amazon Web Services, AMD, ARM, Huawei, IBM, Intel, NVIDIA, and Qualcomm have joined the effort; more recently, Baidu, Bitmain, MediaTek, and Preferred Networks have joined as well.
ONNX already allows AI engineers to use model types such as convolutional neural networks (CNN) and long short-term memory (LSTM) units freely within a broad ecosystem of frameworks, converters, and runtimes. This flexibility lets engineers focus more on the problem they are trying to solve and less on which tools to use.
As new capabilities are added to ONNX, developers will be able to deploy more types of models, use quantized data formats, and go beyond inference to support model training. Furthermore, this kind of cross-system interoperability will allow the rapidly growing AI community to collaborate more closely and benefit from advances across the field.
While it is exciting to see such a broad ecosystem form so quickly, we are particularly encouraged to see how the AI community has already built and launched solutions and libraries that are production-ready. These advances show ONNX is moving past the initial development phase and into something that can be used in large-scale environments supporting a variety of use cases. Some of the recently launched products and community projects include:
Official Core ML support - Core ML enables developers to quickly build apps with intelligent new features across Apple products. The ONNX community now has access to a production-ready Core ML converter, which lets developers convert ONNX-formatted models to Core ML and integrate them directly into iOS apps.
Snapdragon Neural Processing Engine production support - The Qualcomm Snapdragon Neural Processing Engine SDK is designed to help developers run one or more neural network models trained and exported in the ONNX format on Snapdragon mobile platforms, whether that is the CPU, GPU, or DSP.
Baidu's PaddlePaddle - PaddlePaddle (PArallel Distributed Deep LEarning) is a deep learning platform originally developed by Baidu scientists and engineers to use on the company's own products. Today, the Paddle team released the first version of their ONNX-formatted model exporter publicly, allowing AI developers to leverage a variety of datacenter, mobile, and embedded inference runtimes.
NVIDIA TensorRT 4 - TensorRT is a deep learning inference optimizer and runtime. The native ONNX parser in TensorRT 4 provides an easy path to import ONNX models from frameworks such as Caffe2, Chainer, Microsoft Cognitive Toolkit, Apache MXNet, and PyTorch into TensorRT.
While ONNX is making strides in adoption and ecosystem expansion, there is still a lot to do. We continue to make progress in a number of areas, all of which are open to community participation and contributions:
NLP support - Modern NLP (natural language processing) is an important application for deep learning. Modern NLP networks are often nontrivial to implement, even more difficult to transfer between frameworks, and rarely handled uniformly across the landscape of tools, so ONNX's ability to connect them is a very compelling feature. NLP networks, including recurrent networks, are built on dynamic control structures; standardizing the handling of these structures can lead to better collaboration with backends to expose network semantics and achieve better performance. The computer vision field has developed a tradition of optimizing hardware backends for canonical vision models, such as ResNet-50. There has been no similar tradition in NLP, but by standardizing the representation of NLP networks, we can give vendors a common target and push forward the performance of NLP models.
A common framework interface proposal - Leading hardware and systems vendors offer highly optimized software to run neural network graphs. This software can deliver order-of-magnitude speedups compared with generic implementations, but its integration with deep learning frameworks and applications is complicated by a wide variety of vendor-specific interfaces and subtle incompatibilities with the software stacks of high-level applications. So far, the ONNX format has targeted the offline conversion of neural network models between high-level frameworks and vendor-specific libraries. In this proposal, we analyze how the ONNX ecosystem could be enriched to enable runtime discovery and selection of high-performance graph execution backends, and online conversion of ONNX graphs to the internal representations of these implementations.
Community working groups - In addition to work on the ONNX core and interface, there are efforts to bring the ONNX paradigm to areas like model training (in addition to inference), as well as to build support for model quantization and compression, to create a test and compliance tool set, and to continue the expansion of a model zoo containing pretrained ONNX models. If you'd like to participate, please reach out to the individual working group leads on GitHub.
Beyond these important additions to the ONNX ecosystem, we are also adapting it for use as an internal intermediate representation in PyTorch 1.0, our new flexible AI framework for both development and production. We are confident ONNX will continue to grow and find new uses to drive AI development and implementation.
Engineers looking to find out more about ONNX can use these resources:
We believe ONNX is off to a great start and can be even better with your help. We are actively looking for partners to participate in working groups, evangelize usage, and contribute directly to the project. Come join us if you'd like to contribute to ONNX's development.
Visit the Facebook Engineering Blog at code.fb.com for more news.