What are the new features in Apache Spark 2.4, due for release in 2018?

This article is from the Apache Spark Meetup held at Adobe Systems Inc on September 19, 2018.

The upcoming Apache Spark 2.4 release is the fifth in the 2.x series. This article provides an overview of the key features and enhancements in Apache Spark 2.4.

  • A new scheduling model (barrier scheduling) lets users embed distributed deep learning training in Spark stages, simplifying the distributed training workflow.
  • Added 35 built-in functions for array/map operations in Spark SQL, including higher-order functions.
  • Added a new native Avro data source based on Databricks' spark-avro module.
  • PySpark introduces an eager evaluation mode, intended for teaching and debuggability.
  • Spark on Kubernetes (K8s) now supports PySpark and R applications, as well as client mode.
  • Various enhancements to Structured Streaming, for example stateful operators in continuous processing.
  • Various performance improvements to built-in data sources, for example nested schema pruning for Parquet.
  • Support for Scala 2.12.
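
To make the barrier scheduling item in the list above more concrete, here is a minimal PySpark sketch. It assumes Spark 2.4 with `pyspark` installed. The names `train_partition` and `run_barrier_stage` are hypothetical, and the training step is only a placeholder, not the way any particular deep learning framework is wired in.

```python
def train_partition(iterator):
    """Runs inside each barrier task; the training step is a placeholder."""
    # Imported inside the function so it resolves where the task runs.
    from pyspark import BarrierTaskContext

    context = BarrierTaskContext.get()
    # Addresses of all tasks in this stage, e.g. for wiring workers together.
    addresses = [info.address for info in context.getTaskInfos()]
    # Block until every task in the stage has reached this point.
    context.barrier()
    rows = list(iterator)
    # Hypothetical: a real job would start the framework's training here.
    yield (context.partitionId(), len(rows), len(addresses))


def run_barrier_stage(sc, num_tasks=2):
    """Run one barrier stage over a small made-up dataset."""
    rdd = sc.parallelize(range(100), num_tasks)
    return rdd.barrier().mapPartitions(train_partition).collect()
```

From a `pyspark` shell, `run_barrier_stage(sc)` would return one tuple per task. A barrier stage launches all of its tasks together, so the cluster needs at least `num_tasks` free slots.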
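
The higher-order functions mentioned in the list above take a lambda as an argument. A small sketch, assuming a Spark 2.4 session is available as `spark`; the helper name `run_higher_order_demo` and the literal arrays are illustrative only.

```python
# Spark SQL query using three of the higher-order functions added in 2.4.
# The lambda syntax is `x -> expr`, or `(acc, x) -> expr` for two arguments.
QUERY = """
SELECT
  transform(array(1, 2, 3), x -> x + 1)             AS plus_one,
  filter(array(1, 2, 3), x -> x % 2 = 1)            AS odds,
  aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x) AS total
"""


def run_higher_order_demo(spark):
    """Run the query on an existing SparkSession and return the DataFrame."""
    return spark.sql(QUERY)
```

Calling `run_higher_order_demo(spark).show()` prints a single row. For the literal array used here, `plus_one` should be `[2, 3, 4]`, `odds` should be `[1, 3]` and `total` should be `6`.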
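
For the Avro item above, here is a hedged sketch of writing and reading Avro files from PySpark. It assumes the Avro module is on the classpath: in Spark 2.4 it ships as a separate artifact, so something like `--packages org.apache.spark:spark-avro_2.11:2.4.0` is needed when launching. The helper `avro_round_trip` and the path passed to it are made up for illustration.

```python
# Maven coordinate to pass via --packages (Scala 2.11 build of Spark 2.4.0).
AVRO_PACKAGE = "org.apache.spark:spark-avro_2.11:2.4.0"


def avro_round_trip(spark, path):
    """Write a tiny DataFrame as Avro and read it back."""
    df = spark.range(5).withColumnRenamed("id", "n")
    # "avro" is the short format name registered by the Avro module.
    df.write.format("avro").mode("overwrite").save(path)
    return spark.read.format("avro").load(path)
```

In a shell started with the package above, `avro_round_trip(spark, "/tmp/demo_avro").show()` should display the five rows that were written.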
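
Eager evaluation, mentioned in the list above, is controlled through SQL configuration. A sketch, assuming a Spark 2.4 session; `enable_eager_eval` is a hypothetical helper, and the row limit is an arbitrary choice.

```python
# Configuration keys introduced in Spark 2.4 for REPL/notebook eager evaluation.
EAGER_EVAL_CONF = {
    "spark.sql.repl.eagerEval.enabled": "true",
    # Maximum number of rows rendered when a DataFrame is displayed.
    "spark.sql.repl.eagerEval.maxNumRows": "20",
}


def enable_eager_eval(spark, conf=EAGER_EVAL_CONF):
    """Apply the eager-evaluation settings to an existing SparkSession."""
    for key, value in conf.items():
        spark.conf.set(key, value)
    return spark
```

With these settings, a DataFrame evaluated at a notebook or shell prompt is rendered as a table without an explicit `show()` call. In practice this mostly affects how a DataFrame is displayed.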
