Yash Srivastava
Yash Srivastava's Blog

Spark Stages, Tasks, and Jobs

Jan 24, 2023 · 2 min read

There are three main components in the Spark UI. Jobs: A Spark application can have multiple jobs, based on the number of actions (#jobs = #actions) in the...

Basic Spark RDD transformations

Jan 18, 2023 · 4 min read

RDDs (resilient distributed datasets) are the basic data abstraction in Spark. You can think of an RDD as a collection distributed over multiple...

Spark on YARN architecture

Jan 9, 2023 · 2 min read

When we talk about Spark on top of Hadoop, it's generally Hadoop core with the Spark compute engine instead of MapReduce, i.e. (HDFS, Spark, YARN). Spark...

Shared variables in Spark

Jan 9, 2023 · 2 min read

Sometimes in a Spark application, we need to share a small amount of data across all the machines for processing. For example, if you want to filter some set of...

What is Apache Spark?

Jan 4, 2023 · 2 min read

In simple terms, Apache Spark is an in-memory, unified, parallel compute engine. In memory: most of the operations in Apache Spark happen in memory and...

Introduction to Hive

Dec 30, 2022 · 4 min read

We cannot use an analytical storage system for transactional requirements, and vice versa. But have you ever wondered why that is so? Transactional vs...
