AI in the Eyes of Semiconductor
Abstract:
Semiconductor chips enable Artificial Intelligence products and systems. Advanced technology nodes allow higher levels of integration and hence provide a road-map of improved products and systems over time. Over the next couple of decades, technological developments in storage and processing power will enable further innovative products, building on those we know and love today, such as Netflix's recommendation engine or self-driving cars. In general, an AI system comprises two major functions, namely machine learning (training) and inference. Both functions have specific computation and storage needs in order to serve the target application effectively. This blog discusses machine learning and inference concepts, the scope of semiconductor chips in AI systems, and the different types of chips that enable various applications.
What is Machine Learning?
Machine Learning is a major field within data science. In simple words, data science encompasses preparing data for analysis, including cleansing, aggregating, and manipulating the data, in order to perform advanced data analysis. The amount of data to be analyzed is growing by leaps and bounds in current applications, which necessitates machine intelligence in data analysis. Just to visualize the amount of data, take the example of how e-commerce and streaming platforms like Amazon, Flipkart or Netflix come up with items a user is likely to be interested in when they log in. This is done by tracking the items the user searches for or orders (learning) and then trying to infer what items they may be interested in (inference).
Machine
learning can also be treated as a branch of artificial intelligence
(AI) and computer science which focuses on the use of data and algorithms
to imitate the way that humans learn, gradually improving its accuracy. Machine
learning is an important component of the growing field of data science. Through
the use of statistical methods, algorithms are trained to make classifications
or predictions, uncovering key insights within data mining projects. These
insights subsequently drive decision making within applications and businesses.
Let us briefly look at how machine learning works, or in other words, the basic functions involved in machine learning. The process is often described in seven basic steps, but it rests on three major building blocks.
The three major building blocks of a Machine Learning system are the model, the parameters, and the learner (a minimal sketch follows the list below):
- Model is the system which makes predictions
- The parameters are the factors which are considered by the model to make predictions
- The learner makes the adjustments in the parameters and the model to align the predictions with the actual results
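To make these three building blocks concrete, here is a minimal sketch in Python using made-up data; the tiny linear model, the learning rate and the number of steps are illustrative assumptions, not a prescription:

    import numpy as np

    # Made-up data: inputs x and actual results y (roughly y = 3x + 1)
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 3.9, 7.2, 9.8, 13.1])

    # Parameters: the factors the model considers to make predictions
    w, b = 0.0, 0.0

    def model(x, w, b):
        # Model: the system which makes predictions
        return w * x + b

    learning_rate = 0.01  # illustrative value
    for step in range(2000):
        predictions = model(x, w, b)
        error = predictions - y
        # Learner: adjusts the parameters so predictions align with actual results
        w -= learning_rate * (2 * error * x).mean()
        b -= learning_rate * (2 * error).mean()

    print(f"learned parameters: w={w:.2f}, b={b:.2f}")  # ends up close to w=3, b=1

Even a system this small follows the same pattern as a large neural network: a model, its parameters, and a learner that nudges the parameters using the error between predictions and actual results.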
To put the same thing another way, closer to how it is implemented in practice …
An ML
lifecycle can be broken up into two main, distinct parts. The first is the training
phase, in which an ML model is created or “trained” by running a specified
subset of data into the model. ML inference is the second phase, in
which the model is put into action on live data to produce actionable output.
The data processing by the ML model is often referred to as “scoring,” so one
can say that the ML model scores the data, and the output is a score.
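As a small illustration of the two phases, here is a hedged sketch using scikit-learn on a synthetic dataset; the dataset, the model choice and the split are purely for illustration:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Phase 1: training - an ML model is created by running a subset of data through it
    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_train, X_live, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Phase 2: inference - the trained model "scores" live data to produce actionable output
    scores = model.predict_proba(X_live)[:, 1]
    print(scores[:5])  # one score per incoming data point

In a real deployment the second phase runs continuously on live data, while the first phase is repeated only when the model needs to be retrained.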
Training and Inference in AI:
Artificial
intelligence is essentially the simulation of the human brain using artificial
neural networks, which are meant to act as substitutes for the biological
neural networks in our brains. A neural network is made up of a bunch of nodes
which work together, and can be called upon to execute a model.
This is
where AI chips come into play. They are particularly good at dealing with these
artificial neural networks, and are designed to do two things with them:
training and inference.
Chips
designed for training essentially act as teachers for the network, like a kid
in school. A raw neural network is initially under-developed and taught, or
trained, by inputting masses of data. Training is very compute-intensive, so we
need AI chips focused on training that are designed to be able to process this
data quickly and efficiently. The more powerful the chip, the faster the network
learns.
Once a network has been trained, it needs chips designed for inference in order to use the data in the real world, for things like facial recognition, gesture recognition, natural language processing, image searching, spam filtering, etc. Think of inference as the aspect of AI systems that you're most likely to see in action, unless you work in AI development on the training side.
You can
think of training as building a dictionary, while inference is akin to looking
up words and understanding how to use them. Both go hand in hand.
It's worth noting that chips designed for training can also perform inference, but inference chips cannot do training. Training and inference chips are the two eyes of AI, steering the future of mankind.
AI system at a glance:
The two functions described above are generic examples of how ML systems look and function. Putting them into a complete system involves data sources, a host system for the ML model, and data destinations.
The data sources are typically systems that capture live data from the mechanism that generates the data. For example, a data source might be a server cluster (e.g., Apache Kafka) that stores data created by an Internet of Things (IoT) device, a web application log file, or a point-of-sale (POS) machine. Or a data source might simply be a web application that collects user clicks and sends data to the system that hosts the ML model.
The host system for the ML model accepts data from the data sources and inputs the data into the ML model. It is the host system that provides the infrastructure to turn the code in the ML model into a fully operational application. After an output is generated from the ML model, the host system then sends that output to the data destinations. The host system can be, for example, a web application that accepts data input via a REST interface, or a stream processing application that takes an incoming feed of data from a server cluster (e.g., Apache Kafka) to process many data points per second.
The data
destinations are where the host system should deliver the output score from the
ML model for the final inference. A destination can be any type of data
repository like Apache Kafka or a database, and from there, downstream
applications take further action on the scores. For example, if the ML model
calculates a fraud score on purchase data, then the applications associated
with the data destinations might send an “approve” or “decline” message back to
the purchase site.
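A hypothetical sketch of that flow in Python is shown below; the event fields, the 0.8 threshold and the DummyFraudModel and PrintDestination stand-ins are assumptions made purely to illustrate the source-to-destination path, not a real API:

    class DummyFraudModel:
        def score(self, features):
            # Stand-in for a trained ML model; always returns the same fraud score
            return 0.25

    class PrintDestination:
        def send(self, message):
            # Stand-in for a data destination such as a Kafka topic or a database
            print(message)

    def handle_purchase_event(event, fraud_model, destination):
        # Host system: accepts an event from a data source, scores it with the
        # ML model, and delivers the score to the data destination
        features = [event["amount"], event["num_items"], event["account_age_days"]]
        score = fraud_model.score(features)                  # the model "scores" the data
        decision = "decline" if score > 0.8 else "approve"   # downstream action on the score
        destination.send({"order_id": event["order_id"], "score": score, "decision": decision})

    # A single purchase event arriving from a data source such as a web application
    event = {"order_id": 42, "amount": 199.0, "num_items": 3, "account_age_days": 12}
    handle_purchase_event(event, DummyFraudModel(), PrintDestination())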
AI – Enabled by Semiconductor chips (Training & Inference chips, 2 Eyes of AI systems)
Looking at various ML systems and the requirements to build smart, efficient systems with present-day technology, the semiconductor industry plays a significant role in enabling them through well-defined chips. Given the mammoth computational requirements along with real-time processing, the architecture of ML chips has evolved into a couplet: training chips and inference chips.
As the industry and technology evolve, the semiconductor chip ecosystem is also evolving to enable ML systems. This is clearly seen in how existing chips have been put to use for ML applications, and also in how start-ups are emerging with custom-defined ASICs for efficient ML systems targeted at specific applications.
Let us look at how the GPU (Graphics Processing Unit) and the generic CPU (Central Processing Unit) fare in ML applications, and also how ASICs and FPGAs are making their way into use.
GPUs and CPUs:
GPU chips were originally developed for rendering 3D graphics onscreen. Nevertheless, GPUs have proved well suited to specialized computational tasks due to their ability to perform parallel computation in a way that CPUs cannot. CPUs perform serial tasks very fast but with very little parallelism. A mid-range CPU may have a handful of cores, while a mid-range GPU will have several thousand. Individual GPU cores are much slower and less powerful, but they run in parallel. This parallelism is a good fit for neural networks because of the kind of math that is performed: large matrix multiplications.
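A rough way to see that difference, assuming PyTorch is installed and a CUDA-capable GPU is available (actual timings will vary widely with hardware):

    import time
    import torch

    a = torch.randn(4096, 4096)
    b = torch.randn(4096, 4096)

    # Matrix multiplication on a handful of CPU cores
    t0 = time.time()
    c_cpu = a @ b
    print(f"CPU matmul: {time.time() - t0:.3f} s")

    # The same multiplication spread across thousands of GPU cores
    if torch.cuda.is_available():
        a_gpu, b_gpu = a.cuda(), b.cuda()
        torch.cuda.synchronize()
        t0 = time.time()
        c_gpu = a_gpu @ b_gpu
        torch.cuda.synchronize()   # wait for the asynchronous GPU kernel to finish
        print(f"GPU matmul: {time.time() - t0:.3f} s")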
GPUs were popularized in the ML
community after discoveries in 2009 and 2012 during which researchers co-opted
NVIDIA GPUs and an NVIDIA library called CUDA to train an image recognition
model orders of magnitude faster than was previously possible.
For performance reasons, CPUs are not optimal for training models. That said, CPUs are often used to perform inference, where GPUs can be overkill for the task.
Custom ASICs/FPGAs:
While typically GPUs are better
than CPUs when it comes to AI processing, they’re not perfect. The industry
needs specialised processors to enable efficient processing of AI applications,
modelling and inference. As a result, chip designers are now working to create
processing units optimized for executing these algorithms. These come under
many names, such as NPU, TPU, DPU, SPU etc., but a catchall term can be the AI
processing unit (AI PU).
The AI PU was created to execute machine
learning algorithms, typically by operating on predictive models such as
artificial neural networks. They are usually classified as either training or
inference as these processes are generally performed independently.
As a result, several purpose-built AI chips are currently under development by tech giants and start-ups alike:
- FPGAs (field-programmable gate arrays) are purpose-built but generic enough to accommodate multiple types of tasks, from encryption to encoding. Example: Microsoft Brainwave.
- ASICs (application-specific integrated circuits) are typically designed for a single, specific task. Example: Google TPU.
Other examples include Intel Nervana, Cerebras, Graphcore, SambaNova, Wave Computing, Groq, etc.
Examples of AI systems architecture:
Cloud + Training
The purpose of this pairing is to develop AI models used for inference. These models are eventually refined into AI applications that are specific to a use case. These chips are powerful and expensive to run, and are designed to train as quickly as possible.
Example systems include NVIDIA’s
DGX-2 system, which totals 2 petaFLOPS of processing power. It is made up of 16
NVIDIA V100 Tensor Core GPUs. Another example is Intel Habana’s Gaudi chip.
Examples of applications that people interact with every day and that require a lot of training include Facebook photos or Google Translate.
As the complexity of these models increases every few months, the market for cloud training chips will continue to be needed and relevant.
Cloud + Inference
The purpose of this pairing is for times when inference needs significant processing power, to the point where it would not be possible to do this inference on-device. This is because the application utilizes bigger models and processes a significant amount of data.
Sample chips here include Qualcomm's Cloud AI 100, a large chip used for AI in massive cloud datacentres. Other examples are Alibaba's Hanguang 800 and Graphcore's Colossus MK2 GC200 IPU.
Where training chips were used to
train Facebook’s photos or Google Translate, cloud inference chips are used to
process the data you input using the models these companies created. Other
examples include AI chatbots or most AI-powered services run by large
technology companies.
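As a sketch of that pattern, an application might hand its input to a cloud-hosted model over a REST call; the endpoint URL and payload below are hypothetical, made up only to illustrate the round trip:

    import requests

    # Hypothetical cloud inference endpoint hosting a large trained model
    ENDPOINT = "https://api.example.com/v1/translate"

    payload = {"text": "Bonjour le monde", "target_language": "en"}
    # The heavy lifting happens on cloud inference chips in the datacentre,
    # not on the device that sends the request
    response = requests.post(ENDPOINT, json=payload, timeout=5)
    print(response.json())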
Edge + Inference
Using on-device edge chips for inference removes any issues with network instability or latency, and is better for preserving privacy of data used, as well as security. There are no associated costs for using the bandwidth required to upload a lot of data, particularly visual data like images or video, so as long as cost and power-efficiency are balanced it can be cheaper and more efficient than cloud inference.
Examples here include the KL520 and KL720 chips from Kneron, which are lower-power, cost-efficient chips designed specifically for on-device AI. Other examples include Intel Movidius and Google's Coral TPU.
Use cases include facial recognition surveillance cameras, cameras used in vehicles for pedestrian and hazard detection or driver awareness detection, and natural language processing for voice assistants.
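As a hedged sketch of what on-device inference can look like in software, here is a minimal example using the tflite_runtime package; the model file name, the uint8 input type and the use of a random array in place of a camera frame are assumptions for illustration:

    import numpy as np
    from tflite_runtime.interpreter import Interpreter

    # Load a (hypothetical) quantized model onto the edge device - no network round trip needed
    interpreter = Interpreter(model_path="person_detection.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # A camera frame would normally arrive here; a random array stands in for it
    frame = np.random.randint(0, 256, size=input_details[0]["shape"], dtype=np.uint8)

    interpreter.set_tensor(input_details[0]["index"], frame)
    interpreter.invoke()   # inference runs entirely on the edge device
    result = interpreter.get_tensor(output_details[0]["index"])
    print(result)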
Summary:
All of these different types of
chips and their different implementations, models, and use cases are essential
for the development of the Artificial Intelligence of Things (AIoT) future.
When supported by other nascent technologies like 5G, the possibilities only
grow. AI is fast becoming a big part of our lives, both at home and at work,
and development in the AI chip space will be rapid in order to accommodate our
increasing reliance on the technology.
Challenges in AI implementation:
Even though artificial intelligence is developing and gaining popularity in both business and society, the field still faces significant hurdles. There are many challenges that must be overcome before AI implementations can achieve their maximum potential. To list a few of the challenges: compute performance, data privacy and security, speed of communication, and finally bias and the acceptance of AI results. Each of these challenges is being treated as an opportunity, and a tremendous amount of research and development work is happening in various MNCs and start-ups. AI will ultimately prove to be cheaper, more efficient, and potentially more impartial in its actions than human beings.
To conclude …
Man has long feared the rise of the machine – his own creation becoming smarter and more intelligent than he is. But while artificial intelligence and machine learning are rapidly changing our world and powering the Fourth Industrial Revolution, humanity does not need to be afraid!