Spark Streams

A streaming job is typically a long-running Spark job, but running time is not the only factor used to classify a job as streaming. You or your system administrator can specify whether a job is streaming when creating an application.

The Spark Streams page provides information about Spark Streams categorized by name of the job, users running the job, queue in which the job is run, status of the job, and other criteria.

Go to Spark > Streams to view the Spark Streams screen.

The default time is Last 24 hrs. To change the timeline, click the down arrow in the time selection menu. The default grouping is Queue. To change the grouping, click the down arrow in the group selection menu.

Searching for Spark Streams

You can search for Spark Streams using the Search bar. The search criteria include:

  • user
  • name of the job
  • the current status of the job
  • the queue in which the job is being run

To view the list of all streams, sorted by start time, select the Un-group option in the grouping menu.

Details of a Spark Stream

The Streams panel displays detailed information about Spark Streams categorized by name of the stream, users running the stream, queue in which the stream is run, status of the stream, and other criteria. To view the Streams running in a particular queue, click the queue name to expand the list of Streams. The Streams are sorted by Start Time. Select a stream to view its details in the Spark Stream Details page.
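The grouping and sorting behavior described above can be illustrated with a minimal sketch. This is not the product's code: the `Stream` record, its field names, the sample values, and the ascending sort order are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Stream:
    # Hypothetical record; field names are assumptions, not the product's schema.
    name: str
    user: str
    queue: str
    status: str
    start_time: datetime


def group_by_queue(streams):
    """Group streams by queue; each group is sorted by start time (ascending, assumed)."""
    groups = defaultdict(list)
    for stream in sorted(streams, key=lambda s: s.start_time):
        groups[stream.queue].append(stream)
    return dict(groups)


def ungrouped(streams):
    """Return all streams in one list sorted by start time, as with the Un-group option."""
    return sorted(streams, key=lambda s: s.start_time)


# Made-up sample data for illustration only.
streams = [
    Stream("clicks", "alice", "default", "RUNNING", datetime(2024, 1, 1, 9, 30)),
    Stream("orders", "bob", "etl", "RUNNING", datetime(2024, 1, 1, 8, 0)),
    Stream("alerts", "alice", "default", "RUNNING", datetime(2024, 1, 1, 7, 15)),
]
grouped = group_by_queue(streams)
```

Here `grouped["default"]` holds the two streams in the `default` queue ordered by start time, which mirrors expanding a queue name in the Streams panel.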

The Spark Stream Details page contains the following charts.

Chart Name | Details
Input rate | The chart describes the rate at which records were put into the system over the duration of the job.
Stream Time | The chart describes the processing time for the task. The chart also shows the processing delay time, which is the difference between the scheduled time of the task and the actual time at which processing started.
Schedule Information | The chart describes the number of tasks running at a particular time and the number of tasks that were yet to be executed.
I/O | The chart describes the number of input bytes read and the number of output bytes written over the duration of the task.
Storage Memory | The chart describes the consumption of the following types of memory: Block Disk Space Used, Block Off Heap Memory Used, Block On Heap Memory Used, Block Max Off Heap Memory, and Block Max On Heap Memory.
Memory Usage | The chart describes Total Executor Memory used and Total Heap Memory used by the job.
Shuffle Information | The chart describes the following shuffle information: Shuffle Bytes Written, Shuffle Local Bytes Read, Shuffle Remote Bytes Read, and Shuffle Remote Bytes Read to Disk.
HDFS Information | The chart describes the number of HDFS bytes read and written.
Core Usage by Locality | The chart describes the amount of core capacity used at the process-local and node-local locality levels, and the amount of core capacity wasted. The chart also shows the percentage of core capacity used and wasted.
Batches | The Batches panel lists the batches for the application, sorted by Batch Time. Click a batch to view its details in the side panel.
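
The processing delay shown in the Stream Time chart is defined as the difference between the scheduled time and the actual processing start time. The following minimal sketch illustrates that arithmetic; the `Batch` record, its field names, and the sample timestamps are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Batch:
    # Hypothetical record; field names are assumptions for illustration only.
    scheduled_time: datetime  # when the batch was scheduled to start
    start_time: datetime      # when processing actually started
    end_time: datetime        # when processing finished


def processing_delay(batch: Batch) -> timedelta:
    """Time between the scheduled time and the actual start of processing."""
    return batch.start_time - batch.scheduled_time


def processing_time(batch: Batch) -> timedelta:
    """Time spent processing the batch once it started."""
    return batch.end_time - batch.start_time


# Made-up sample: scheduled at 12:00:00, started 2 seconds late, ran for 5 seconds.
batch = Batch(
    scheduled_time=datetime(2024, 1, 1, 12, 0, 0),
    start_time=datetime(2024, 1, 1, 12, 0, 2),
    end_time=datetime(2024, 1, 1, 12, 0, 7),
)
delay = processing_delay(batch)
duration = processing_time(batch)
```

For this sample, `delay` is 2 seconds and `duration` is 5 seconds. A processing delay that grows from batch to batch usually indicates that batches are taking longer to process than the interval at which they are scheduled.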

Viewing Batch Details

The Batches panel lists the batches for the application, sorted by Batch Time. Click a batch to view the Batch details panel.