All About Spark AI Summit 2020
Introduction
The Spark + AI Summit is the world's largest conference for data and machine learning. Collaborating at the intersection of data and ML makes it a unique experience for developers, data engineers, data scientists, and decision-makers.
Participants will hear about new developments in Apache Spark and ML technologies such as TensorFlow, MLflow, and PyTorch, as well as best practices for applying AI to real-world business problems.
Spark 3.0 Optimizations for Spark SQL
In the opening keynote, Databricks CTO Matei Zaharia noted that 90 percent of Spark API calls run through the Spark SQL engine, which is why 46 percent of the Apache Spark community's patches go into enhancing Spark SQL.
Spark 3.0 is around 2x faster than Spark 2.4 (measured on TPC-DS), enabled by adaptive query execution, dynamic partition pruning, and other enhancements.
Spark SQL Adaptive Query Execution
Spark 2.2 added cost-based optimization to the SQL optimizer, which had originally been rule-based. Spark 3.0 now adds runtime Adaptive Query Execution (AQE).
AQE uses runtime statistics collected from completed stages of a query to re-optimize the execution plan for the remaining stages. In Databricks tests using AQE, speed-ups ranged from 1.1x to 8x.
Source: itnext
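As a minimal sketch in PySpark: AQE ships with Spark 3.0 but is off by default, so it has to be switched on explicitly. The configuration keys below are Spark 3.0's documented AQE settings; the app name is illustrative.

```python
from pyspark.sql import SparkSession

# Minimal sketch: enable Adaptive Query Execution in Spark 3.0,
# where it is off by default.
spark = (
    SparkSession.builder
    .appName("aqe-sketch")
    .config("spark.sql.adaptive.enabled", "true")
    # Optional AQE features: coalesce small shuffle partitions and
    # mitigate skewed joins using runtime statistics.
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .getOrCreate()
)
```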
Spark SQL Dynamic Partition Pruning
Static predicate pushdown and partition pruning in Spark 2.x are performance enhancements that limit the number of files and partitions Spark reads when querying.
Once records are partitioned, queries that filter on the partition columns gain efficiency because Spark reads only a subset of the directories and files.
Source: Databricks
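A hedged sketch of what dynamic partition pruning does at runtime: a filter derived from the dimension side of a join is used to skip partitions of a partitioned fact table. The table names, column names, and paths below are hypothetical; the feature itself is controlled by spark.sql.optimizer.dynamicPartitionPruning.enabled, which defaults to true in Spark 3.0.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dpp-sketch").getOrCreate()

# Hypothetical tables: `sales` is a fact table partitioned by `day`;
# `dates` is a small dimension table. Spark 3.0 derives a runtime filter
# from the `dates` side of the join and prunes `sales` partitions with it.
sales = spark.read.parquet("/warehouse/sales")  # hypothetical path, partitioned by `day`
dates = spark.read.parquet("/warehouse/dates")  # hypothetical path

holiday_sales = (
    sales.join(dates, sales.day == dates.day)
         .where(dates.is_holiday == True)
)
holiday_sales.explain()  # the `sales` scan should show a dynamic pruning expression
```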
Spark 3.0 GPU Acceleration
In the Deep Dive into GPU Support in Apache Spark 3.x session, Robert Evans and Jason Lowe gave an overview of accelerator-aware scheduling and the RAPIDS Accelerator for Apache Spark, which enable GPU-accelerated SQL/DataFrame operations and Spark shuffles without code changes.
Source: Medium.com
Accelerator-aware scheduling
Accelerator-aware scheduling allows Spark to schedule executors with a specified number of GPUs, and users can specify the number of GPUs each task requires. Spark passes these resource requests on to the underlying cluster manager: Kubernetes, YARN, or Standalone.
Users can also supply a discovery script that detects which GPUs the cluster manager has allocated.
Source: pixabay
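A minimal sketch of these resource requests in PySpark, assuming a GPU-equipped cluster. The configuration keys are Spark 3.0's resource-scheduling settings; the discovery-script path is hypothetical.

```python
from pyspark.sql import SparkSession

# Minimal sketch: request one GPU per executor and one GPU per task,
# and point Spark at a discovery script that reports which GPUs the
# cluster manager allocated. The script path here is hypothetical.
spark = (
    SparkSession.builder
    .appName("gpu-scheduling-sketch")
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "1")
    .config("spark.executor.resource.gpu.discoveryScript",
            "/opt/spark/scripts/getGpusResources.sh")
    .getOrCreate()
)

# Within a running task, the assigned GPU addresses can be inspected via:
#   from pyspark import TaskContext
#   TaskContext.get().resources()["gpu"].addresses
```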
Accelerated SQL/DataFrame
Spark 3.0 supports SQL optimizer plugins that process data in columnar batches rather than rows. Columnar data is GPU-friendly, and the RAPIDS Accelerator plugs into this functionality to accelerate SQL and DataFrame operators.
Source: pixabay
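A hedged sketch of enabling the plugin, assuming the RAPIDS Accelerator and cuDF jars are already on the classpath and the executors have GPUs; the configuration keys follow NVIDIA's documentation for the accelerator.

```python
from pyspark.sql import SparkSession

# Minimal sketch: load the RAPIDS Accelerator as a Spark plugin so that
# supported SQL/DataFrame operators run on the GPU in columnar batches.
spark = (
    SparkSession.builder
    .appName("rapids-sql-sketch")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .config("spark.rapids.sql.enabled", "true")
    .getOrCreate()
)

# Unmodified DataFrame code; supported operators are translated to GPU ops.
spark.range(10_000_000).selectExpr("id % 7 AS k").groupBy("k").count().show()
```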
Accelerated Shuffle
Spark operations that sort, group, or join data by value must move data between partitions when creating a new DataFrame from an existing one between stages. This process, called a shuffle, involves disk I/O, data serialization, and network I/O.
Source: spark.ai (2018)
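For illustration, a small PySpark sketch of an operation that forces a shuffle:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("shuffle-sketch").getOrCreate()

# groupBy repartitions rows by key between stages: a shuffle involving
# disk I/O, data serialization, and network I/O.
df = spark.range(1_000_000).withColumn("key", F.col("id") % 10)
counts = df.groupBy("key").count()
counts.explain()  # the plan shows an Exchange node where the shuffle happens
```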
Accelerated End-to-End ML and DL
Horovod's Spark integration allows TensorFlow and PyTorch models to be trained directly on Spark DataFrames, exploiting Horovod's ability to scale to hundreds of GPUs in parallel, without any specialized code for distributed processing.
With the latest Apache Spark 3.0 accelerator-aware scheduling and columnar processing APIs, an ETL job can hand data within the same pipeline to Horovod, which runs distributed deep learning training on GPUs.
Source: humancentered.ai
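A hedged sketch of the Horovod-on-Spark entry point, assuming horovod[spark] and TensorFlow are installed on the cluster; the training body is deliberately elided and the worker count is illustrative.

```python
import horovod.spark

def train():
    # Runs on each Spark executor; one Horovod worker per process.
    import horovod.tensorflow.keras as hvd
    hvd.init()
    # ... build a Keras model, scale the learning rate by hvd.size(),
    # wrap the optimizer with hvd.DistributedOptimizer(), and fit ...

horovod.spark.run(train, num_proc=4)  # num_proc: number of parallel workers
```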