When it's time to show your trained models to the world, you'll need to choose one or more deployment options. Fortunately, TensorFlow offers the tools and frameworks to deploy your models for a wide range of use cases.
TensorFlow Serving
Machine Learning (ML) serving systems need to support model versioning (for model updates with a rollback option) and multiple models (for A/B testing), while ensuring that concurrent models achieve high throughput and low latency on hardware accelerators (GPUs and TPUs). TensorFlow Serving currently handles tens of millions of inferences per second for more than 1,100 Google projects, including Google’s Cloud ML Prediction.
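To make the versioning and rollback idea concrete, here is a minimal sketch in plain Python. It only mimics the directory convention that TensorFlow Serving uses by default, where each model version lives in a numbered subdirectory and the highest number is the one served. It is not TensorFlow Serving's own code: the helper names `latest_version` and `roll_back` and the `my_model` directory are hypothetical, chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def latest_version(model_base: Path) -> Optional[int]:
    """Return the highest numbered version directory under model_base, or None."""
    versions = [
        int(entry.name)
        for entry in model_base.iterdir()
        if entry.is_dir() and entry.name.isdecimal()
    ]
    return max(versions, default=None)


def roll_back(model_base: Path) -> Optional[int]:
    """Delete the newest version directory and return the version now selected."""
    newest = latest_version(model_base)
    if newest is not None:
        shutil.rmtree(model_base / str(newest))
    return latest_version(model_base)


# Build a throwaway layout: <tmp>/my_model/1 and <tmp>/my_model/2
# ("my_model" is a hypothetical model name).
base = Path(tempfile.mkdtemp()) / "my_model"
for version in (1, 2):
    (base / str(version)).mkdir(parents=True)

print(latest_version(base))  # 2
print(roll_back(base))       # 1
```

In a real deployment the numbered directories would each hold an exported SavedModel, and the serving process, not a helper like this, would decide which versions to load.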
TensorFlow Extended (TFX)
When you’re ready to go beyond training a single model, or ready to put your amazing model to work and move it to production, TFX is there to help you build a complete ML pipeline.
TFX pipelines can be deployed to on-premises infrastructure, to a hybrid of on-premises and cloud, or to a pure cloud deployment on Google Cloud Platform. Your models can be deployed to be served online, or included in a mobile app, or both. You can run TFX on a Flink or Spark cluster to distribute processing across your resources, and take advantage of Kubernetes for task management.
With a TFX pipeline you can continuously retrain and update your models, manage versioning and life cycle, monitor performance and validate new data, and perform A/B testing. With TFX, your models are ready for production.
TensorFlow Lite
TensorFlow Lite is the official solution for running machine learning models on mobile and embedded devices. It enables on-device machine learning inference with low latency and a small binary size on Android, iOS, and other operating systems. The typical workflow has three steps: build a new model or retrain an existing one (for example, with transfer learning); convert the TensorFlow model into a compressed FlatBuffer with the TensorFlow Lite Converter; then load the resulting .tflite file onto a mobile or embedded device.
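The convert-and-load steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a recommended production setup: the `Doubler` module is a hypothetical toy stand-in for a trained model, and the Python `tf.lite.Interpreter` stands in for the on-device runtime so the example can run on a workstation with TensorFlow 2.x and NumPy installed.

```python
import numpy as np
import tensorflow as tf


class Doubler(tf.Module):
    """Hypothetical toy model: multiplies a [1, 4] float input by 2."""

    @tf.function(input_signature=[tf.TensorSpec(shape=[1, 4], dtype=tf.float32)])
    def __call__(self, x):
        return x * 2.0


# Step 1: "build" the model (here there is nothing to train).
model = Doubler()
concrete_fn = model.__call__.get_concrete_function()

# Step 2: convert the TensorFlow model to the TensorFlow Lite FlatBuffer format.
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_fn], model)
tflite_bytes = converter.convert()  # these bytes are what a .tflite file contains

# Step 3: load the converted model and run inference. On a phone or
# microcontroller this would use the platform's TensorFlow Lite runtime.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

sample = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
interpreter.set_tensor(input_index, sample)
interpreter.invoke()
result = interpreter.get_tensor(output_index)
print(result)
```

To ship the model, the bytes in `tflite_bytes` would be written to a file with a `.tflite` extension and bundled with the application.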