There are mainly 3 components in the Spark UI. Jobs: a Spark application can have multiple jobs based on the number of actions (#jobs = #actions) in the...
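As a rough illustration of the #jobs = #actions rule, here is a minimal sketch (the app name, `local[*]` master, and sample data are placeholders of mine, not from the post): running it and opening the Spark UI at http://localhost:4040 should show one job per action.

```scala
import org.apache.spark.sql.SparkSession

object JobsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jobs-equal-actions")
      .master("local[*]")
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(1 to 1000)

    // Transformations are lazy; no job is triggered here.
    val doubled = rdd.map(_ * 2)

    // Each action triggers one job, so the Spark UI shows
    // two jobs for this run.
    println(doubled.count()) // action #1 -> job #1
    println(doubled.first()) // action #2 -> job #2

    spark.stop()
  }
}
```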
RDDs (Resilient Distributed Datasets) are the basic unit of storage in Spark. You can think of an RDD as a collection distributed over multiple...
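A small sketch of that mental model, assuming a local master and made-up data: a plain Scala collection becomes an RDD split into partitions that could each live on a different machine.

```scala
import org.apache.spark.sql.SparkSession

object RddDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-as-distributed-collection")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // A local Scala collection...
    val names = Seq("alice", "bob", "carol", "dave")

    // ...becomes an RDD: the same collection, split into
    // partitions that can be processed on different machines.
    val namesRdd = sc.parallelize(names, numSlices = 2)

    println(namesRdd.getNumPartitions) // 2
    namesRdd.map(_.toUpperCase).collect().foreach(println)

    spark.stop()
  }
}
```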
When we talk about Spark on top of Hadoop, it's generally Hadoop core with the Spark compute engine instead of MapReduce, i.e. (HDFS, Spark, YARN). Spark...
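To make the swap concrete, here is a sketch of the classic MapReduce word count expressed in Spark, reading from and writing to HDFS (the `namenode` host and paths are hypothetical; in this setup the app would typically be launched with `spark-submit --master yarn`, letting YARN schedule the executors):

```scala
import org.apache.spark.sql.SparkSession

object HdfsOnYarnDemo {
  def main(args: Array[String]): Unit = {
    // HDFS stays the storage layer, YARN the resource manager;
    // Spark simply replaces MapReduce as the compute engine.
    val spark = SparkSession.builder()
      .appName("spark-instead-of-mapreduce")
      .getOrCreate()
    import spark.implicits._

    // Placeholder HDFS paths.
    val lines = spark.read.textFile("hdfs://namenode:8020/logs/input")

    // Word count: the canonical MapReduce job, in Spark.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()

    counts.write.csv("hdfs://namenode:8020/logs/output")
    spark.stop()
  }
}
```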
Sometimes in a Spark application, we need to share small data across all the machines for processing. For example, if you want to filter some set of...
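This is what broadcast variables are for. A minimal sketch of the filtering example (the blocklist and event names are invented for illustration): the small set is shipped to each executor once, rather than with every task.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-filter")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Small lookup set we want on every executor exactly once.
    val blockedUsers = Set("spammer1", "spammer2")
    val blocked = sc.broadcast(blockedUsers)

    val events = sc.parallelize(Seq("alice", "spammer1", "bob", "spammer2"))

    // Each task reads the broadcast value locally.
    val clean = events.filter(user => !blocked.value.contains(user))
    clean.collect().foreach(println) // alice, bob

    spark.stop()
  }
}
```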
In simple terms, Apache Spark is an in-memory unified parallel compute engine. In-memory: most of the operations in Apache Spark happen in memory and...
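One way to see the in-memory part in action, sketched with placeholder data: persist a computed RDD in RAM, and subsequent actions reuse the cached copy instead of recomputing the lineage from scratch.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object InMemoryDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("in-memory-compute")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val squares = sc.parallelize(1 to 1000000)
      .map(n => n.toLong * n)
      .persist(StorageLevel.MEMORY_ONLY) // keep results in RAM

    // The first action computes and caches; the second reuses
    // the in-memory copy instead of recomputing.
    println(squares.count())
    println(squares.sum())

    spark.stop()
  }
}
```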
We cannot use an analytical storage system for transactional requirements and vice versa. But have you ever wondered why that is? Transactional vs...
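A toy sketch of the underlying layout difference (the `Order` record and sample values are invented): row-oriented storage keeps each record's fields together, which suits fetching or updating one record, while column-oriented storage keeps each column together, which suits scanning one column across all records.

```scala
object RowVsColumn {
  // Row-oriented: each record is contiguous, so a transactional
  // lookup or update touches one record in one place.
  case class Order(id: Int, customer: String, amount: Double)
  val rowStore: Array[Order] = Array(
    Order(1, "alice", 10.0),
    Order(2, "bob", 25.0),
    Order(3, "carol", 40.0)
  )

  // Column-oriented: each column is contiguous, so an aggregate
  // like SUM(amount) scans one tight array and skips the rest.
  val ids: Array[Int] = Array(1, 2, 3)
  val customers: Array[String] = Array("alice", "bob", "carol")
  val amounts: Array[Double] = Array(10.0, 25.0, 40.0)

  def main(args: Array[String]): Unit = {
    // Transactional access: one full record by key.
    println(rowStore.find(_.id == 2))
    // Analytical access: one column across all records.
    println(amounts.sum)
  }
}
```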