Artificial intelligence and machine learning (AI and ML) are key technologies that help organizations develop new ways to increase sales, reduce costs, streamline business processes, and better understand their customers. AWS helps customers accelerate AI/ML adoption by providing powerful on-demand computing, high-speed networking, and scalable high-performance storage options for any machine learning project. This lowers the barrier to entry for organizations looking to adopt the cloud to scale their ML applications.
Developers and data scientists are pushing the boundaries of technology and increasingly embracing deep learning, a type of machine learning based on neural network algorithms. These deep learning models are larger and more complex, resulting in increased costs to run the underlying infrastructure to train and deploy these models.
AWS is building high-performance, low-cost machine learning chips to enable customers to accelerate their AI/ML transformation. AWS Inferentia is the first machine learning chip built from the ground up by AWS for the lowest cost machine learning inference in the cloud. In fact, Amazon EC2 Inf1 instances powered by Inferentia offer 2.3x higher performance and up to 70% lower cost for machine learning inference than current-generation GPU-based EC2 instances. AWS Trainium, AWS’ second machine learning chip, is purpose-built for training deep learning models and will be available in late 2021.
Customers across industries have deployed ML applications in production on Inferentia, achieving significant performance improvements and cost savings. For example, Airbnb’s customer support platform delivers smart, scalable, and exceptional service experiences to its community of millions of hosts and guests worldwide. Airbnb used Inferentia-based EC2 Inf1 instances to deploy the natural language processing (NLP) models that power its chatbots, achieving a 2x performance improvement out of the box over GPU-based instances.
With these innovations in silicon, AWS enables customers to easily train and run deep learning models in production at significantly lower costs with higher performance and throughput.
Machine learning accelerates the transition to cloud-based infrastructure
Machine learning is an iterative process that requires teams to quickly build, train, and deploy applications, as well as frequently train, retrain, and experiment with models to improve predictive accuracy. When deploying trained models to business applications, organizations also need to scale their applications to serve new users worldwide. To provide a superior user experience, they must be able to serve multiple simultaneous requests with near real-time latency.
Emerging use cases such as object detection, natural language processing (NLP), image classification, speech-based AI, and time series forecasting are based on deep learning technology. Deep learning models are growing exponentially in size and complexity, going from millions of parameters to billions in just a few years.
Training and deploying these large, complex models entails significant infrastructure costs. And costs can skyrocket as organizations scale their applications to deliver near real-time experiences to their users and customers.
This is where cloud-based machine learning infrastructure services can help. The cloud provides on-demand access to computing, high-performance networking, and big data storage that are seamlessly combined with ML operations and higher-level AI services to enable organizations to get started right away and scale their AI/ML initiatives.
How is AWS helping customers accelerate their AI/ML transformation?
AWS Inferentia and AWS Trainium aim to democratize machine learning and make it accessible to developers regardless of experience and organization size. Inferentia’s design is optimized for high performance, high throughput, and low latency, making it ideal for deploying ML inference at scale.
Each AWS Inferentia chip contains four NeuronCores that implement a high-performance systolic array matrix multiplication engine, which greatly accelerates typical deep learning operations such as convolutions and transformers. NeuronCores are also equipped with a large on-chip cache, which helps reduce external memory accesses, cutting latency and increasing throughput.
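To illustrate why that data reuse matters, here is a minimal NumPy sketch of the tiling idea behind a systolic-array-style engine; it mimics only the memory-access pattern, not Inferentia’s actual implementation, and all names in it are invented for the example:

```python
import numpy as np

def tiled_matmul(a, b, tile=128):
    """Tiled matrix multiply, mimicking how a systolic array reuses data.

    In hardware, each output tile's partial sums stay in on-chip memory
    while operand tiles stream through the multiply-accumulate grid,
    instead of every element round-tripping to external DRAM.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):          # rows of the output tile grid
        for j in range(0, n, tile):      # columns of the output tile grid
            for p in range(0, k, tile):  # accumulate along the inner dim
                out[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return out

# Sanity check against NumPy's reference result
a = np.random.rand(256, 384).astype(np.float32)
b = np.random.rand(384, 512).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)
```

Keeping each output tile’s partial sums local while operands stream past is the same reuse principle that the large on-chip cache enables at the hardware level.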
AWS Neuron, the software development kit for Inferentia, natively supports leading ML frameworks such as TensorFlow and PyTorch. Developers can continue to use the same frameworks and lifecycle development tools they know and love. For most trained models, compiling and deploying on Inferentia is as simple as changing a single line of code, with no additional application code changes.
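With PyTorch, for example, that one-line change is the compile/trace call from the torch-neuron package. The sketch below is illustrative only; the ResNet-50 stand-in and the file name are placeholder choices, not a prescribed workflow:

```python
import torch
import torch_neuron  # the AWS Neuron SDK's PyTorch integration
from torchvision import models

# Load and prepare a trained model exactly as you normally would
model = models.resnet50(pretrained=True)
model.eval()

# An example input with the shape the model will see in production
example = torch.rand(1, 3, 224, 224)

# The single-line change: compile the model for Inferentia
model_neuron = torch.neuron.trace(model, example_inputs=[example])

# Save the compiled TorchScript artifact for serving on an Inf1 instance
model_neuron.save("resnet50_neuron.pt")
```

On the Inf1 instance, the saved artifact is loaded with torch.jit.load and invoked like any other TorchScript model, so the surrounding application code stays the same.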
The result is a high-performance inference deployment that scales easily while keeping costs under control.
Sprinklr, a software-as-a-service company, has a unified AI-driven customer experience management platform that enables companies to collect real-time customer feedback across multiple channels and turn it into actionable insights. This results in proactive issue resolution, enhanced product development, improved content marketing and better customer service. Sprinklr used Inferentia to deploy its NLP and some computer vision models and saw significant performance improvements.
Several Amazon services also deploy their machine learning models on Inferentia.
Amazon Prime Video uses computer vision ML models to analyze the video quality of live events and deliver the best viewer experience for Prime Video members. It deployed its image classification ML models on EC2 Inf1 instances and saw a 4x performance improvement and up to 40% cost savings compared to GPU-based instances.
Another example is Amazon Alexa’s AI- and ML-based intelligence, powered by Amazon Web Services and available on more than 100 million devices today. Alexa’s promise to customers is that it is always getting smarter, more conversational, more proactive, and even more delightful. Delivering on that promise requires continuous improvements in response times and machine learning infrastructure costs. By deploying Alexa’s text-to-speech ML models on Inf1 instances, Amazon was able to reduce inference latency by 25% and cost-per-inference by 30%, improving the service experience for the tens of millions of customers who use Alexa each month.
Unlocking new machine learning capabilities in the cloud
As companies race to future-proof their businesses by delivering the best digital products and services, no organization can afford to forgo sophisticated machine learning models that help reinvent its customer experiences. Over the last few years, there has been a huge increase in the applicability of machine learning to use cases ranging from personalization and churn prediction to fraud detection and supply chain forecasting.
Fortunately, cloud-based machine learning infrastructure is unlocking capabilities that were not possible before, making machine learning far more accessible to non-expert practitioners. That’s why AWS customers already use Inferentia-powered Amazon EC2 Inf1 instances to provide the intelligence behind recommendation engines and chatbots and to extract actionable insights from customer feedback.
With AWS cloud-based machine learning infrastructure options for a variety of skill levels, it’s clear that any organization can accelerate innovation and adopt the entire machine learning lifecycle at scale. As machine learning continues to become more pervasive, organizations can now fundamentally transform the customer experience and the way they do business with affordable, high-performance cloud-based machine learning infrastructure.
Learn more about how AWS’ machine learning platform can help your company innovate here.
This content was produced by AWS. It was not written by the editorial staff of MIT Technology Review.