Core ML – inference on iOS

19.8.2019 | 7 minutes of reading time

In machine learning, we train a model for a particular task, e.g. distinguishing dogs from cats in pictures. Inference refers to applying the trained model. Most inference workloads are served through a client-server API or run in batch mode. In contrast, some applications (such as Apple’s FaceID) run inference directly on the mobile device. On-device inference has the benefit of low latency, which creates an excellent user experience. As a result, inference on mobile devices is gaining more and more attention. In addition to Apple, Google is exploring various hardware options for deploying resource-intensive models on mobile devices. The goal of this article is to show the possibilities of inference on an iOS device. In addition to the advantages and disadvantages of mobile inference, we take a look at the Core ML framework, the Neural Engine and hardware innovation in the mobile area.

Inference on-device

Let’s look at the pros and cons of putting a model on a mobile device.

Advantages

  • Latency: On-device inference generates no network traffic. The prediction is computed directly on the hardware, which has the side benefit that the application also works offline.
  • Data security: No data has to be moved to compute the prediction. The data does not have to leave the device, which provides a certain level of data safety.

Disadvantages

  • Updating the models: When we want to publish a newly trained model, we have to release a new version of the application itself. In my experience with mobile development, it usually takes some time until all users have updated their app. Furthermore, the models take up a lot of storage space, which is also a downside for some users.
  • Speed of the hardware: Depending on the device, the computing time of the models may vary significantly. While newer devices such as the iPhone XS include specialized machine learning hardware, performance can degrade on older devices.

Both low latency and data security are compelling arguments for taking a closer look at mobile machine learning. One of the most critical factors in applying the models successfully is the speed of the hardware. Apple has equipped the A11 and A12 Bionic chips with specialized hardware to run neural networks efficiently on the iPhone. For these reasons, we want to dive deeper into the subject of machine learning on the iPhone.

Core ML

Core ML is a machine learning framework developed by Apple. Compared to PyTorch and TensorFlow, which are used to train models, Core ML focuses on deploying and running models. Since Core ML 3, on-device training is also possible. In the workflow described here, however, the developer must have already trained a model before it can be executed with Core ML or integrated into an iOS app. Before we can integrate the model into the application with Core ML, it has to be converted to the Core ML format. Importantly, Core ML can only be used within the Apple ecosystem and not for Android applications.

Core ML stack (https://developer.apple.com/documentation/coreml)

Core ML builds on the Accelerate, Basic Neural Network Subroutines (BNNS) and Metal Performance Shaders (MPS) libraries, which primarily cover low-level neural network inference operations on the CPU and GPU. These libraries greatly facilitate access to machine learning on iOS. Furthermore, Apple has developed the frameworks Vision and Natural Language to perform feature extraction on image and text data. For example, existing models of the Vision framework can recognize faces, text and barcodes in images. This information can then serve as features for your own models.

Alternatives to Core ML

In addition to Core ML, there are of course other ways to run a model on an iOS device, for example TensorFlow Lite. Its significant advantage is that the same model can be used directly on other platforms such as Android. However, it also has some disadvantages compared to Core ML. Xcode provides access to Core ML out of the box, so we don’t need to set up complex environments to start developing. Furthermore, Core ML is optimized to run on iOS. As a result, Core ML’s performance is significantly better than that of TensorFlow Lite. It is worth taking a look at the article by Andrey Logvinenko, who has studied the performance differences in detail.

Neural Engine

After taking a look at the software aspects of the iOS ecosystem, let’s look at the hardware features of the iPhone XS introduced last year. The A12 Bionic chip consists of a CPU (six cores), a GPU (four cores) and a neural processing unit (also known as the Neural Engine). The Neural Engine is the centrepiece for running models. For example, FaceID and Siri use the Neural Engine to make predictions. Although these applications could also run on the CPU, this would significantly increase the computing time and energy consumption. The Neural Engine consists of eight cores and can theoretically perform up to 5 trillion operations per second. As developers, we have access to the Neural Engine and can run our models on it.

Looking at the evolution of the A-series Bionic chips, it becomes clear that Apple puts a lot of effort into further development. On the A12 Bionic chip, Core ML runs up to 9 times faster than on its predecessor, the A11 Bionic. Various experiments from the community, such as Yolo and Core ML, show that a similar improvement is achieved in practice.

Image recognition with Core ML

To use a trained model with Core ML, we need to convert it with the Python package coremltools. Besides coremltools, there are also third-party conversion tools, such as the TensorFlow converter.

We are using a Keras model that can distinguish between dogs and cats. In the app, the picture is either taken with the camera or selected from the photo library.

Keras model to Core ML model with coremltools

Before the conversion, the Keras model must be saved with model.save("model.h5"). Then the model can be converted using the method coremltools.converters.keras.convert. You must specify metadata such as the classes. Furthermore, additional preprocessing steps such as a normalization of the data can be specified. In our case, we have the two classes Cat (0) and Dog (1). The image_scale, green_bias, red_bias, and blue_bias fields specify the preprocessing values. In this example, we use MobileNet preprocessing. After conversion, the model must be saved as an “.mlmodel” file. Core ML can then read this file in an app.
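The conversion steps described above can be sketched roughly as follows. This is a minimal sketch, not the exact script behind the app: it assumes coremltools 3.x (which still ships the Keras converter) and a Keras 2.x model. The input name "image", the output file name and the helper function names are illustrative assumptions. The preprocessing values assume the usual MobileNet normalization of pixel / 127.5 - 1.

```python
# Sketch only: assumes coremltools 3.x and a Keras 2.x model saved as "model.h5".
# The feature name "image", the output path and the helper names are illustrative.

CLASS_LABELS = ["Cat", "Dog"]  # index 0 -> Cat, index 1 -> Dog


def mobilenet_preprocessing():
    """Return converter arguments that map pixels from [0, 255] to [-1, 1].

    Core ML applies `image_scale * pixel + bias` per channel, which matches
    MobileNet's `pixel / 127.5 - 1` for scale = 1/127.5 and bias = -1.
    """
    return {
        "image_scale": 1.0 / 127.5,
        "red_bias": -1.0,
        "green_bias": -1.0,
        "blue_bias": -1.0,
    }


def convert_keras_to_coreml(keras_path="model.h5", out_path="CatsVsDogs.mlmodel"):
    """Convert the saved Keras classifier and write it as an .mlmodel file."""
    # Imported lazily so the helper above also works without coremltools installed.
    import coremltools

    mlmodel = coremltools.converters.keras.convert(
        keras_path,
        input_names="image",
        image_input_names="image",  # treat the input as an image, not a raw array
        class_labels=CLASS_LABELS,
        **mobilenet_preprocessing()
    )
    mlmodel.short_description = "Cat vs. dog classifier"
    mlmodel.save(out_path)
    return mlmodel
```

Calling convert_keras_to_coreml() in the directory that contains model.h5 would write the .mlmodel file. Keeping the preprocessing values in a separate helper makes it easy to check that they match the normalization used during training.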

For integration into an app, the file must be added to the Xcode project. In Xcode, you can see which parameters the model expects as input and returns as output. In our case, the input is an RGB image with 224×224 pixels. The output of the model is the highest-probability label and a dictionary that maps each label to its probability.
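To illustrate how the two outputs relate, here is a small sketch in plain Python. The probability values are made up, and the function name top_label is a hypothetical helper, not part of Core ML. It only shows that the returned label is the key with the highest value in the probability dictionary.

```python
# Hypothetical label-to-probability dictionary for one image.
# The values are invented for illustration.
probabilities = {"Cat": 0.12, "Dog": 0.88}


def top_label(probs):
    """Return the label with the highest probability and that probability."""
    label = max(probs, key=probs.get)
    return label, probs[label]


label, confidence = top_label(probabilities)
print(f"{label} ({confidence:.0%})")  # prints "Dog (88%)"
```

In the app, the dictionary is useful when the prediction should only be shown above a certain confidence threshold.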

Integration of Core ML model into Xcode

The prediction is made with the model.prediction(image: features) method. For this, the model must first be loaded. The image data can be processed with the class UIImage. In addition, we have extended this class with the methods resize and pixelBuffer. The resize method scales images to 224×224 pixels to prepare them for the prediction. The pixel buffer serves as the input for the model.

Application to predict cats and dogs


Conclusion

This article introduced Core ML and Apple’s hardware innovations that enable inference on iOS. While frameworks such as TensorFlow are used both to train models and to run inference, Core ML focuses on inference. A model trained with TensorFlow or another third-party library needs to be converted to the Core ML format using the Python library coremltools. The converted model can then be integrated into an app and run through Core ML. In addition to Core ML, there are other frameworks such as TensorFlow Lite to perform inference on iOS. One of the core strengths of Core ML compared to the other frameworks is its performance, which benefits from the hardware optimizations. In addition to software development, Apple is investing in hardware innovations. With the Neural Engine, Apple has created a vital component that provides iOS devices with sufficient resources for inference on the end device. This keeps the data private without compromising the performance of the models. In conclusion, Apple has created an ecosystem through Core ML and hardware innovation that makes it easy to use machine learning in apps.
