Yash Srivastava's Blog

Tag

#dataengineering

Articles with this tag

Spark Stages, Tasks, and Jobs

Jan 24, 2023 · 2 min read

There are mainly 3 components in the Spark UI. Jobs: A Spark application can have multiple jobs based on the number of actions (#jobs = #actions) in the...

Basic Spark RDD transformations

Jan 18, 2023 · 4 min read

RDDs (resilient distributed datasets) are the basic unit of storage in Spark. You can think of an RDD as a collection distributed over multiple...

What is Apache Spark?

Jan 4, 2023 · 2 min read

In simple terms, Apache Spark is an in-memory, unified, parallel compute engine. In memory: most of the operations in Apache Spark happen in memory and...

Introduction to Hive

Dec 30, 2022 · 4 min read

We cannot use an analytical storage system for transactional requirements, and vice versa. But have you ever wondered why that is so? Transactional vs...

Introduction to SQOOP in Hadoop

Dec 27, 2022 · 4 min read

Data ingestion is one of the crucial steps in the data lifecycle, and when the source is a relational database, Sqoop can be a very easy and simple...

MapReduce in Hadoop

Dec 25, 2022 · 4 min read

Although MapReduce is not used much for Big Data problems nowadays because of its poor performance compared to Spark, it's still a very...
