AI in the Eyes of Semiconductor
Abstract:
Semiconductor chips enable Artificial Intelligence products and systems. Advanced technology nodes allow higher levels of integration and hence provide a road-map of improved products and systems over time. Over the next couple of decades, technological developments in storage and processing power will enable further innovative products, building on those we know and love today, such as Netflix's recommendation engine or self-driving cars. In general, an AI system comprises two major functions, namely machine learning (training) and inference. Both functions have specific computation and storage needs in order to serve the target application effectively. This blog discusses machine learning and inference concepts, the scope of semiconductor chips in AI systems, and the different types of chips that enable various applications.
What is Machine Learning?
Machine Learning is a major field within data science. In simple words, data science encompasses preparing data for analysis, including cleansing, aggregating, and manipulating the data, in order to perform advanced data analysis. The amount of data to be analyzed is growing by leaps and bounds in current applications, which necessitates machine intelligence in data analysis. Just to visualize the amount of data, take the example of how e-commerce and streaming platforms like Amazon, Flipkart or Netflix come up with items a user is likely to be interested in when they log in. This is done by tracking the items the user searches for or orders (learning) and then trying to infer what items they may be interested in (inference).
Machine
learning can also be treated as a branch of artificial intelligence
(AI) and computer science which focuses on the use of data and algorithms
to imitate the way that humans learn, gradually improving its accuracy. Machine
learning is an important component of the growing field of data science. Through
the use of statistical methods, algorithms are trained to make classifications
or predictions, uncovering key insights within data mining projects. These
insights subsequently drive decision making within applications and businesses.
Let us briefly look at how machine learning works, or in other words, the basic functions involved in machine learning. The process is often described in seven basic steps, but it rests on three major building blocks.
The three major building blocks of a Machine Learning system are the model, the parameters, and the learner (a minimal sketch follows the list below):
- Model is the system which makes predictions
- The parameters are the factors which are considered by the model to make predictions
- The learner makes the adjustments in the parameters and the model to align the predictions with the actual results
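To make these three building blocks concrete, here is a minimal sketch in Python using made-up data; the tiny linear model, the learning rate and the number of steps are illustrative assumptions, not a prescription:

    import numpy as np

    # Made-up data: inputs x and actual results y (roughly y = 3x + 1)
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 3.9, 7.2, 9.8, 13.1])

    # Parameters: the factors the model considers to make predictions
    w, b = 0.0, 0.0

    def model(x, w, b):
        # Model: the system which makes predictions
        return w * x + b

    learning_rate = 0.01  # illustrative value
    for step in range(2000):
        predictions = model(x, w, b)
        error = predictions - y
        # Learner: adjusts the parameters so predictions align with actual results
        w -= learning_rate * (2 * error * x).mean()
        b -= learning_rate * (2 * error).mean()

    print(f"learned parameters: w={w:.2f}, b={b:.2f}")  # ends up close to w=3, b=1

Even a system this small follows the same pattern as a large neural network: a model, its parameters, and a learner that nudges the parameters using the error between predictions and actual results.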
To put the same thing another way, closer to how it is implemented in practice …
An ML
lifecycle can be broken up into two main, distinct parts. The first is the training
phase, in which an ML model is created or “trained” by running a specified
subset of data into the model. ML inference is the second phase, in
which the model is put into action on live data to produce actionable output.
The data processing by the ML model is often referred to as “scoring,” so one
can say that the ML model scores the data, and the output is a score.
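As a small illustration of the two phases, here is a hedged sketch using scikit-learn on a synthetic dataset; the dataset, the model choice and the split are purely for illustration:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Phase 1: training - an ML model is created by running a subset of data through it
    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_train, X_live, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Phase 2: inference - the trained model "scores" live data to produce actionable output
    scores = model.predict_proba(X_live)[:, 1]
    print(scores[:5])  # one score per incoming data point

In a real deployment the second phase runs continuously on live data, while the first phase is repeated only when the model needs to be retrained.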
Training and Inference in AI:
Artificial
intelligence is essentially the simulation of the human brain using artificial
neural networks, which are meant to act as substitutes for the biological
neural networks in our brains. A neural network is made up of a bunch of nodes
which work together, and can be called upon to execute a model.
This is
where AI chips come into play. They are particularly good at dealing with these
artificial neural networks, and are designed to do two things with them:
training and inference.
Chips
designed for training essentially act as teachers for the network, like a kid
in school. A raw neural network is initially under-developed and taught, or
trained, by inputting masses of data. Training is very compute-intensive, so we
need AI chips focused on training that are designed to be able to process this
data quickly and efficiently. The more powerful the chip, the faster the network
learns.
Once a network has been trained, it needs chips designed for inference in order to use the data in the real world, for things like facial recognition, gesture recognition, natural language processing, image searching, spam filtering, etc. Think of inference as the aspect of AI systems that you're most likely to see in action, unless you work in AI development on the training side.
You can
think of training as building a dictionary, while inference is akin to looking
up words and understanding how to use them. Both go hand in hand.
It's worth noting that chips designed for training can also perform inference, but inference chips cannot do training. Training and inference chips are the two eyes of AI, steering the future of mankind.
AI system at a glance:
The two functions described above are generic examples of how ML systems look and function. Putting them into a complete system involves data sources, a host system for the ML model, and data destinations.
The data sources are typically systems that capture live data from the mechanism that generates the data. For example, a data source might be a server cluster (e.g., Apache Kafka) that stores data created by an Internet of Things (IoT) device, a web application log file, or a point-of-sale (POS) machine. Or a data source might simply be a web application that collects user clicks and sends data to the system that hosts the ML model.
The host system for the ML model accepts data from the data sources and inputs the data into the ML model. It is the host system that provides the infrastructure to turn the code in the ML model into a fully operational application. After an output is generated from the ML model, the host system then sends that output to the data destinations. The host system can be, for example, a web application that accepts data input via a REST interface, or a stream processing application that takes an incoming feed of data from a server cluster (e.g., Apache Kafka) to process many data points per second.
The data
destinations are where the host system should deliver the output score from the
ML model for the final inference. A destination can be any type of data
repository like Apache Kafka or a database, and from there, downstream
applications take further action on the scores. For example, if the ML model
calculates a fraud score on purchase data, then the applications associated
with the data destinations might send an “approve” or “decline” message back to
the purchase site.
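A hypothetical sketch of that flow in Python is shown below; the event fields, the 0.8 threshold and the DummyFraudModel and PrintDestination stand-ins are assumptions made purely to illustrate the source-to-destination path, not a real API:

    class DummyFraudModel:
        def score(self, features):
            # Stand-in for a trained ML model; always returns the same fraud score
            return 0.25

    class PrintDestination:
        def send(self, message):
            # Stand-in for a data destination such as a Kafka topic or a database
            print(message)

    def handle_purchase_event(event, fraud_model, destination):
        # Host system: accepts an event from a data source, scores it with the
        # ML model, and delivers the score to the data destination
        features = [event["amount"], event["num_items"], event["account_age_days"]]
        score = fraud_model.score(features)                  # the model "scores" the data
        decision = "decline" if score > 0.8 else "approve"   # downstream action on the score
        destination.send({"order_id": event["order_id"], "score": score, "decision": decision})

    # A single purchase event arriving from a data source such as a web application
    event = {"order_id": 42, "amount": 199.0, "num_items": 3, "account_age_days": 12}
    handle_purchase_event(event, DummyFraudModel(), PrintDestination())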
AI – Enabled by Semiconductor chips (Training & Inference chips, 2 Eyes of AI systems)
Looking at various ML systems and the requirements to build smart, efficient systems with present-day technology, the semiconductor industry plays a significant role in enabling them through well-defined chips. Given the mammoth computational requirements along with real-time processing, the architecture of ML chips has evolved into a couplet: training chips and inference chips.
As the industry and technology evolve, the semiconductor chip ecosystem is also evolving to enable ML systems. This is clearly seen in how existing chips have been put to use for ML applications, and also in how start-ups are emerging with custom-defined ASICs for efficient ML systems targeted at specific applications.
Let us look at how the GPU (Graphics Processing Unit) and the generic CPU (Central Processing Unit) fare in ML applications, and also how ASICs and FPGAs are making their way into use.
GPUs and CPUs:
GPU chips were originally developed for rendering 3D graphics onscreen. Nevertheless, GPUs have proved well suited to specialized computational tasks due to their ability to perform parallel computation in a way that CPUs cannot. CPUs perform serial tasks very fast but with very little parallelism. A mid-range CPU may have a handful of cores, while a mid-range GPU will have several thousand. Individual GPU cores are much slower and less powerful, but they run in parallel. This parallelism is a good fit for neural networks because of the kind of math that is performed: large matrix multiplications.
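A rough way to see that difference, assuming PyTorch is installed and a CUDA-capable GPU is available (actual timings will vary widely with hardware):

    import time
    import torch

    a = torch.randn(4096, 4096)
    b = torch.randn(4096, 4096)

    # Matrix multiplication on a handful of CPU cores
    t0 = time.time()
    c_cpu = a @ b
    print(f"CPU matmul: {time.time() - t0:.3f} s")

    # The same multiplication spread across thousands of GPU cores
    if torch.cuda.is_available():
        a_gpu, b_gpu = a.cuda(), b.cuda()
        torch.cuda.synchronize()
        t0 = time.time()
        c_gpu = a_gpu @ b_gpu
        torch.cuda.synchronize()   # wait for the asynchronous GPU kernel to finish
        print(f"GPU matmul: {time.time() - t0:.3f} s")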
GPUs were popularized in the ML
community after discoveries in 2009 and 2012 during which researchers co-opted
NVIDIA GPUs and an NVIDIA library called CUDA to train an image recognition
model orders of magnitude faster than was previously possible.
For performance reasons, CPUs are not optimal for training models. That said, CPUs are often used to perform inference, where GPUs can be overkill for the task.
Custom ASICs/FPGAs:
While typically GPUs are better
than CPUs when it comes to AI processing, they’re not perfect. The industry
needs specialised processors to enable efficient processing of AI applications,
modelling and inference. As a result, chip designers are now working to create
processing units optimized for executing these algorithms. These come under
many names, such as NPU, TPU, DPU, SPU etc., but a catchall term can be the AI
processing unit (AI PU).
The AI PU was created to execute machine
learning algorithms, typically by operating on predictive models such as
artificial neural networks. They are usually classified as either training or
inference as these processes are generally performed independently.
As a result, several purpose-built AI chips are currently under development by tech giants and start-ups alike:
- FPGAs (field-programmable gate arrays) are purpose-built but generic enough to accommodate multiple types of tasks, from encryption to encoding. Example: Microsoft Brainwave.
- ASICs (application-specific integrated circuits) are typically designed for a single, specific task. Example: Google TPU.
Other examples include Intel Nervana, Cerebras, Graphcore, SambaNova, Wave Computing, Groq, etc.
Examples of AI systems architecture:
Cloud + Training
The purpose of this pairing is to develop AI models used for inference. These models are eventually refined into AI applications that are specific to a use case. These chips are powerful and expensive to run, and are designed to train as quickly as possible.
Example systems include NVIDIA’s
DGX-2 system, which totals 2 petaFLOPS of processing power. It is made up of 16
NVIDIA V100 Tensor Core GPUs. Another example is Intel Habana’s Gaudi chip.
Examples of applications that people interact with every day and that require a lot of training include Facebook photos or Google Translate.
As the complexity of these models increases every few months, the market for cloud training chips will continue to be needed and relevant.
Cloud + Inference
The purpose of this pairing is for times when inference needs significant processing power, to the point where it would not be possible to do this inference on-device. This is because the application utilizes bigger models and processes a significant amount of data.
Sample chips here include Qualcomm's Cloud AI 100, a large chip used for AI in massive cloud datacentres. Other examples are Alibaba's Hanguang 800 and Graphcore's Colossus MK2 GC200 IPU.
Where training chips were used to
train Facebook’s photos or Google Translate, cloud inference chips are used to
process the data you input using the models these companies created. Other
examples include AI chatbots or most AI-powered services run by large
technology companies.
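As a sketch of that pattern, an application might hand its input to a cloud-hosted model over a REST call; the endpoint URL and payload below are hypothetical, made up only to illustrate the round trip:

    import requests

    # Hypothetical cloud inference endpoint hosting a large trained model
    ENDPOINT = "https://api.example.com/v1/translate"

    payload = {"text": "Bonjour le monde", "target_language": "en"}
    # The heavy lifting happens on cloud inference chips in the datacentre,
    # not on the device that sends the request
    response = requests.post(ENDPOINT, json=payload, timeout=5)
    print(response.json())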
Edge + Inference
Using on-device edge chips for inference removes any issues with network instability or latency, and is better for preserving privacy of data used, as well as security. There are no associated costs for using the bandwidth required to upload a lot of data, particularly visual data like images or video, so as long as cost and power-efficiency are balanced it can be cheaper and more efficient than cloud inference.
Examples here include the KL520 and KL720 chips from Kneron, which are lower-power, cost-efficient chips designed specifically for on-device AI. Other examples include Intel Movidius and Google's Coral TPU.
Use cases include facial recognition surveillance cameras, cameras used in vehicles for pedestrian and hazard detection or driver awareness detection, and natural language processing for voice assistants.
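As a hedged sketch of what on-device inference can look like in software, here is a minimal example using the tflite_runtime package; the model file name, the uint8 input type and the use of a random array in place of a camera frame are assumptions for illustration:

    import numpy as np
    from tflite_runtime.interpreter import Interpreter

    # Load a (hypothetical) quantized model onto the edge device - no network round trip needed
    interpreter = Interpreter(model_path="person_detection.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # A camera frame would normally arrive here; a random array stands in for it
    frame = np.random.randint(0, 256, size=input_details[0]["shape"], dtype=np.uint8)

    interpreter.set_tensor(input_details[0]["index"], frame)
    interpreter.invoke()   # inference runs entirely on the edge device
    result = interpreter.get_tensor(output_details[0]["index"])
    print(result)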
Summary:
All of these different types of
chips and their different implementations, models, and use cases are essential
for the development of the Artificial Intelligence of Things (AIoT) future.
When supported by other nascent technologies like 5G, the possibilities only
grow. AI is fast becoming a big part of our lives, both at home and at work,
and development in the AI chip space will be rapid in order to accommodate our
increasing reliance on the technology.
Challenges in AI implementation:
Even though artificial intelligence is developing and gaining popularity in both business and society, the field still faces significant hurdles. There are many challenges that must be overcome before AI implementations can achieve their maximum potential. To list a few of the challenges: compute performance, data privacy and security, speed of communication, and finally bias and the acceptance of AI results. Each of these challenges is being treated as an opportunity, and a tremendous amount of research and development work is happening in various MNCs and start-ups. AI will ultimately prove to be cheaper, more efficient, and potentially more impartial in its actions than human beings.
To conclude …
Man has long feared the rise of the machine – his own creation becoming smarter and more intelligent than he is. But while artificial intelligence and machine learning are rapidly changing our world and powering the Fourth Industrial Revolution, humanity does not need to be afraid!