On-Device AI — Edge AI vs End-Point AI

This is the first in a series of articles that demystifies Edge AI.


Shyam Vedantam

11/8/2021

Edge AI has become a buzzword, and several companies are building AI and ML technology for the Edge. However, not all on-device AI is really Edge AI. This article describes the spectrum of AI, from Cloud to Edge to End-point, and establishes the key differences between AI on the Edge and AI on End-points.

Cloud to Edge transition

Traditionally, most AI workloads, or more specifically ML workloads, have been processed on the Cloud. The advantage of the Cloud for such workloads is obvious: virtually unlimited compute, memory and storage. While a majority of AI workloads shall continue to be processed on the Cloud, devices are also becoming increasingly powerful. At the same time, the need for a higher degree of intelligence and processing closer to the source of data, for reasons such as latency, availability of connectivity, security or data privacy, has driven significant advancement in on-device computing over the last decade. Such AI workloads are now extending not just down to the Edge but also to End-points, also increasingly referred to as the Micro Edge.

The architecture shown in the figure above is similar to the currently prevalent IoT architecture. However, infusing AI right down to the IoT end-points makes these erstwhile dumb nodes smarter.

AI — Cloud to Edge to End-Point

Edge AI

Any AI where data processing, i.e. inference, is done not on the Cloud but on a device closer to the source of data is generally defined as Edge AI. This edge could be a Device Edge (such as an IoT Edge Gateway that typically bridges IT and OT systems) or a Network Edge (such as a device that provides an entry point into an enterprise's or a 4G/5G service provider's core network). Edge AI moves ML and AI processing from the Cloud to intelligent compute HW platforms that typically reside on-premise, i.e. in offices, factories, telecom base stations etc. Moving compute closer to the source of data largely eliminates latency and data privacy issues. Many IoT use cases can be realized more effectively with Edge AI.

Edge AI applications extend across Automotive, Industrial, Retail, Telecommunications, Consumer Devices and other domains. Typical use cases include computer vision, NLP, 4G/5G networking, security etc., with diverse data sets spanning image, audio, video, machine and enterprise data. Another Edge AI use case is sensor fusion, where data from multiple sensors is combined and the collective intelligence is used for critical decision-making right at the Edge.
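To make the sensor fusion idea concrete, here is a minimal sketch, not taken from any specific product, of a common baseline technique: inverse-variance weighting, where readings from more reliable sensors get proportionally more influence on the fused estimate. The sensor names and variance figures are illustrative assumptions.

```python
# Toy sensor fusion: combine noisy (value, variance) readings by
# inverse-variance weighting. A less noisy sensor (lower variance)
# contributes more to the fused estimate.

def fuse(readings):
    """Fuse a list of (value, variance) pairs into one estimate."""
    weights = [1.0 / var for _, var in readings]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, readings)) / total
    return value, 1.0 / total  # fused variance is lower than any input's

# Hypothetical camera-based and ultrasonic distance estimates (metres)
camera = (2.10, 0.04)      # noisier sensor
ultrasonic = (2.00, 0.01)  # more precise sensor
dist, var = fuse([camera, ultrasonic])
```

Note how the fused distance lands closer to the ultrasonic reading, and the fused variance is smaller than either sensor's alone, which is exactly the "collective intelligence" benefit described above.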

Most Edge AI today is processed on GPU- or CPU-class devices from semiconductor majors such as Nvidia, Intel or Qualcomm. Typically, these are general-purpose processors that can be used for graphics, gaming or signal-processing workloads and are not specially designed for neural network processing. They offer reasonably high performance, in the tens of TOPS, but are constrained to some extent in terms of memory, model size or power efficiency. Over the last few years, several ASIC- or FPGA-based AI accelerator alternatives have emerged. These come from established players like Xilinx or from a newer crop of 50+ semiconductor startups vying for a share of the rapidly expanding AI HW pie. Some of these offer benefits in both performance and power efficiency.

On the AI software side, several AI developer tools and frameworks have transitioned from the Cloud to the Edge. Deep learning (DL) frameworks such as TensorFlow, Keras, PyTorch, Caffe2, MXNet and Microsoft Cognitive Toolkit, along with the ONNX exchange format, support most GPU, CPU and FPGA platforms and can build CNN and RNN models for vision and NLP applications.
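The core operation behind the CNN models these frameworks build is the 2-D convolution, which slides a small kernel over an input and produces a feature map. A minimal pure-Python sketch (single channel, no padding, stride 1) of the idea, without any framework dependency:

```python
# "Valid" 2-D convolution: the kernel is applied at every position
# where it fully overlaps the image, producing a smaller feature map.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Multiply-accumulate over the kernel window
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            )
    return out

# A 3x3 vertical-edge kernel applied to a 4x4 image gives a 2x2 map
image = [[1, 1, 0, 0]] * 2 + [[0, 0, 1, 1]] * 2
kernel = [[1, 0, -1]] * 3
feature_map = conv2d(image, kernel)
```

Frameworks and accelerators spend most of their effort making exactly this multiply-accumulate loop fast, which is also why "TOPS" (tera-operations per second) is the headline metric for AI hardware.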

In addition, there are several DL compilers, such as TVM, Glow and XLA, with front-end and back-end components that take a DL model and transform it into a low-level, hardware-optimized Intermediate Representation (IR). Established players like Nvidia have built mature software development tools such as CUDA, with support for TensorRT and the DeepStream SDK for computer vision use cases.
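A toy illustration of one optimization such compilers perform on their IR: operator fusion, where adjacent ops are merged so the backend emits one hardware kernel instead of several, saving memory round-trips. This is only a sketch of the concept; the op names below are made up and do not reflect any real compiler's IR.

```python
# Fuse adjacent ("bias_add", "relu") pairs in a linear op sequence
# into a single fused op, mimicking what DL compilers such as TVM,
# Glow or XLA do on their graph-level IRs.

def fuse_ops(graph):
    """Replace adjacent ('bias_add', 'relu') pairs with one fused op."""
    fused, i = [], 0
    while i < len(graph):
        if i + 1 < len(graph) and graph[i] == "bias_add" and graph[i + 1] == "relu":
            fused.append("bias_add_relu")  # one kernel instead of two
            i += 2
        else:
            fused.append(graph[i])
            i += 1
    return fused

model_ir = ["conv2d", "bias_add", "relu", "conv2d", "bias_add", "softmax"]
optimized = fuse_ops(model_ir)
```

Real compilers apply many such rewrites before lowering the IR to hardware-specific code, which is where much of their performance benefit comes from.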

End-point or Micro Edge AI

End-point or Micro Edge devices are physical devices that sit below Edge devices in the hierarchy and are typically right at the source of data. In most cases, these are sensor nodes and are battery-operated. Bringing autonomous intelligence to these end-points further reduces latency and enhances real-time decision-making capabilities.

This is true of both safety- or mission-critical applications in automotive and industrial control and non-critical applications in retail, homes or offices. For example, an occupancy monitoring solution in an office can be far more efficient if an intelligent end-point can not just detect but act on occupancy data, instead of sending camera or sensor data from multiple end-points to a central Edge device. On-device End-point AI enables this kind of distributed decision-making across the entire hierarchy of devices. As on-device AI becomes more widely adopted over the next decade, End-point AI could well constitute 70% of all on-device AI.
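The occupancy example can be sketched in a few lines. This is a hypothetical illustration of the act-locally idea, with made-up thresholds and message format: the end-point runs its classifier locally, acts on the result, and uplinks only a one-byte decision instead of streaming raw camera or sensor frames.

```python
# Toy end-point logic: decide and act locally on occupancy, and
# transmit only a tiny decision byte upstream. The confidence
# threshold and one-byte message format are illustrative assumptions.

OCCUPIED_THRESHOLD = 0.6  # hypothetical model-confidence cutoff

def endpoint_decision(confidence):
    """Return the one-byte message an end-point would uplink."""
    return b"\x01" if confidence >= OCCUPIED_THRESHOLD else b"\x00"

def act_locally(confidence):
    """Act on the local decision, e.g. switch lighting or HVAC."""
    return "lights_on" if endpoint_decision(confidence) == b"\x01" else "lights_off"
```

The bandwidth contrast is the point: one byte per decision versus kilobytes per camera frame, multiplied across every end-point in the building.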

Given that end-points typically perform a single task (i.e. sensing, monitoring or control of one attribute), such devices are built to hit a specific price point. They have significant constraints in terms of power, compute and memory. Most of the Edge AI hardware and software technology discussed above is not suitable for End-point AI; a new breed of AI technology is being built to address this gap.

In terms of AI or DL hardware, a new range of MCU-class Neural Processing Units (NPUs) is being built on Arm or ASIC architectures. These are purpose-built, ultra-low-power SoCs for tensor operations that can deliver up to 5 TOPS at under 100 µW of power consumption for inference tasks. These NPUs can support CNN and RNN models for image, audio and NLP use cases right at the end-point. A few examples are Arm Ethos and Synaptics Katana. In some cases, an MCU and an NPU work in tandem to process both AI and non-AI workloads, thereby increasing the performance of the entire system.

Regarding AI software for End-points, most DL frameworks and compilers are too compute-heavy and power-hungry for such applications. TensorFlow Lite, and in particular TensorFlow Lite for Microcontrollers from the TinyML movement, has emerged as the DL framework of choice for such small-footprint DL applications. In addition, most NPU and AI accelerator chip manufacturers are investing time and effort in building AI SDKs that can process NN models efficiently, without a performance penalty, on small-footprint HW platforms.
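A key technique behind these small-footprint deployments is integer quantization: storing weights as 8-bit integers instead of 32-bit floats, cutting model size roughly 4x and enabling cheap integer math on MCU-class hardware. The sketch below shows the general idea of symmetric per-tensor int8 quantization; it is a simplified illustration, not TensorFlow Lite's exact scheme.

```python
# Toy symmetric int8 quantization: map float weights to [-128, 127]
# with a single scale factor, then recover approximate floats.

def quantize_int8(weights):
    """Map float weights to int8 values plus one scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.0, 0.8]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)  # close to the originals, 4x smaller storage
```

The small rounding error introduced here is the accuracy/size trade-off that small-footprint frameworks and NPU SDKs are engineered to keep negligible.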

There are also NPU HW options from mobile silicon manufacturers such as Samsung and Qualcomm, along with the Android NNAPI framework, that can truly make a mobile device an all-pervasive AI end-point.

Summary

For on-device AI to be adopted more widely, it is critical that DL hardware and software work in tandem to enable complex AI workloads with CNN and RNN models to run efficiently on low-footprint HW.

As on-device AI becomes widely adopted, the entire spectrum of AI capability, from Cloud to Edge to End-point, shall truly realize the paradigm of AIoT, the 'Internet of Intelligent Things'.

In stories to follow, the full technology stack for Edge AI and End-point AI shall be explored in detail.

Nuvos Advisory Services LLP

connect@nuvosadvisory.com


Copyright © 2020 Nuvos Advisory Services LLP. All Rights Reserved