Spark Job Details

The Spark Job Details page contains the following panels:

  • Job Trends
  • Configurations
  • Spark Stages
  • Timeseries Information
  • Reports
  • Application Logs

The top panel displays the following information (see the example after the list).

  • User: The name of the user that ran the job.
  • Final Status: The status of the job. The state can be one of the following: Succeeded, Failed, Finished, Finishing, Killed, Running, or Scheduled.
  • Start Time: The time at which the application started running.
  • Duration: The time taken to run the jobs in the application.
  • # of Jobs: The number of jobs in the application.
  • # of Stages: The number of stages of the Spark job.
  • # of Tasks: The total number of tasks the stages are broken into.
  • Avg Memory: The average memory used by the application of the selected user.
  • Avg VCore: The average VCore used by the application of the selected user.
  • Scheduling Delay: The time taken to start a task.
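
Most of these details are also available from the Hadoop ResourceManager REST API. The sketch below is an illustration only, not part of the dashboard itself; the ResourceManager address and application ID are hypothetical placeholders, and panel values such as Avg Memory and Avg VCore are aggregates computed over the selected user's applications rather than raw YARN fields.

    # Minimal sketch: fetch top-panel style details for one YARN application
    # from the ResourceManager REST API. Host, port, and application ID are
    # hypothetical placeholders.
    import requests

    RM_URL = "http://resourcemanager.example.com:8088"
    APP_ID = "application_1700000000000_0001"

    resp = requests.get(f"{RM_URL}/ws/v1/cluster/apps/{APP_ID}")
    resp.raise_for_status()
    app = resp.json()["app"]

    print("User:         ", app["user"])
    print("Final Status: ", app["finalStatus"])
    print("Start Time:   ", app["startedTime"])   # epoch milliseconds
    print("Duration (ms):", app["elapsedTime"])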

Job Trends

The Job Trends panel displays a chart showing the pattern of jobs running at a particular time, based on the following factors.

Note: The x-axis denotes the time at which the user executed a job.

  • Elapsed Time: The time taken to run the jobs at a particular time.
  • VCores: The number of VCores used to run the job.
  • Memory: The amount of memory used to run the job.
  • Input Read: The size of the input dataset.
  • Output Written: The size of the output written to a file format.

Switching Job Trends View

You can display the job trend as either a bar chart or a line chart. To switch views, click the Switch view icon in the top left corner of the Job Trends tile and choose the view you want.

Configurations

In the Configurations panel, you can view Job Configurations and Anomalous hosts.

Job Configurations

Job Configurations displays the Current Value and Recommended Value for the following parameters (see the sketch after this list).

  • #Cores: Number of cores in the current job.
  • #Executors: Number of executors in the current job.
  • Executor Memory: Amount of memory used by a job executor.
  • Driver #Cores: Number of driver cores.
  • Driver Memory: Amount of memory used by the driver.
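
These parameters correspond to standard Spark configuration properties. The following sketch shows where they would typically be set when building a Spark session; the values are hypothetical examples, not recommendations, and #Cores is interpreted here as cores per executor.

    # Minimal sketch: mapping the Job Configurations parameters to standard
    # Spark properties. All values below are hypothetical examples.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("example-job")                    # hypothetical application name
        .config("spark.executor.cores", "4")       # #Cores (per executor)
        .config("spark.executor.instances", "10")  # #Executors
        .config("spark.executor.memory", "8g")     # Executor Memory
        .config("spark.driver.cores", "2")         # Driver #Cores
        .config("spark.driver.memory", "4g")       # Driver Memory
        .getOrCreate()
    )

The same properties can also be supplied to spark-submit through the --executor-cores, --num-executors, --executor-memory, --driver-cores, and --driver-memory options.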

Anomalies

The Anomalies board displays system metrics for the host used by the Spark job, over the duration of that job. The host can be impacted by CPU, Memory, Network, or Disk usage.

With anomaly data, you can monitor host performance and make predictions about memory, CPU, network I/O, and disk usage.

To view more details about Anomalous hosts, click the host link in the Anomalies tab.

Anomalies

You can detect anomalies based on the following metrics (see the sampling sketch after the list).

Note: If an anomaly exists, the associated chart is highlighted with the number of anomalies detected.

  • CPU Usage: The processor capacity usage of the job on the host.
  • Memory Usage: The RAM usage of the job on the host.
  • Network I/O: The network status of the job on the host, displayed as Sent Bytes and Received Bytes.
  • Disk Usage: The host storage currently in use by the Spark job, displayed as Write Bytes and Read Bytes.
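
The same four host metrics can be sampled independently of the dashboard, for example with the psutil library. The sketch below only illustrates what each metric measures; it is not the mechanism the Anomalies board uses to collect data.

    # Minimal sketch: sampling CPU, memory, network I/O, and disk I/O on a host
    # with psutil. Illustration only; not the dashboard's collection mechanism.
    import psutil

    cpu_percent = psutil.cpu_percent(interval=1)   # CPU Usage (%)
    mem = psutil.virtual_memory()                  # Memory Usage
    net = psutil.net_io_counters()                 # Network I/O counters
    disk = psutil.disk_io_counters()               # Disk I/O counters

    print(f"CPU Usage:      {cpu_percent}%")
    print(f"Memory Usage:   {mem.percent}% of {mem.total} bytes")
    print(f"Sent Bytes:     {net.bytes_sent}")
    print(f"Received Bytes: {net.bytes_recv}")
    print(f"Write Bytes:    {disk.write_bytes}")
    print(f"Read Bytes:     {disk.read_bytes}")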

Spark Stages

Stages are the units into which a job is divided, and each stage is broken into smaller tasks. You can view Spark Stages in the form of a List or a Timeline. Click More Details to view the details of a particular stage.

List

In the List view, Spark Stages shows the following fields (see the skew example after the list).

  • Stage Id: The ID of the stage.
  • Task Count: The number of tasks in the stage.
  • Timeline: A graphical representation of the duration of the tasks.
  • Duration: The time taken to complete the tasks in that stage.
  • Max Task Memory: The maximum memory occupied by tasks.
  • IO Percentage: The rate of input/output operations (in %).
  • Shuffle Write: The amount of shuffle data written.
  • Shuffle Read: The amount of shuffle data read.
  • PRatio: The ratio of parallelism in the stage. A higher PRatio is better.
  • Task Skew: The skewness of the tasks in the stage; a value less than -1 or greater than +1 indicates skew (refer to the dashboard).
  • Failure Rate: The rate at which the tasks in the stage fail.
  • Status: The status of the stage.
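
The exact formula the dashboard uses for Task Skew is not described here; a common way to quantify it is the sample skewness of per-task durations, with the -1/+1 threshold above marking a stage as skewed. The sketch below assumes that definition and uses hypothetical task durations.

    # Minimal sketch: computing a skewness statistic over per-task durations
    # and applying the -1/+1 threshold described above. The dashboard's exact
    # formula may differ; this uses the standard third standardized moment.
    from statistics import mean, pstdev

    def task_skewness(durations_ms):
        mu = mean(durations_ms)
        sigma = pstdev(durations_ms)
        if sigma == 0:
            return 0.0
        return mean(((d - mu) / sigma) ** 3 for d in durations_ms)

    # Hypothetical per-task durations (ms) for one stage with a straggler task.
    durations = [1200, 1300, 1250, 1280, 9800]
    skew = task_skewness(durations)
    print(f"Task skew: {skew:.2f}",
          "-> skewed" if abs(skew) > 1 else "-> balanced")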

Timeline

The Timeline view shows the timeframe in which the tasks in the stage executed, including the driver execution time. In addition to the default view, you can sort the tasks by Duration or Start Time.

Timeseries Information

The Timeseries Information panel displays timeseries metrics for the application you are currently viewing. Within the time duration, the time spent by the driver is denoted by a red box. The driver coordinates the Spark application, which runs as a set of processes on the cluster.

Note: You can see the name of the application you are currently viewing, above the user name in the top panel.

Other Timeseries Charts

  • Schedule Information: The number of tasks running at a particular time and the number of tasks yet to be executed.
  • IO: The number of input bytes read and output bytes written during the duration of the task.
  • Driver Memory Usage: The amount of memory consumed by the driver.
  • Executor Memory Usage: The amount of memory used by the executor.
  • GC and CPU Distribution: The amount of garbage collection (in %) and the amount of CPU used (in %) to execute jobs.
  • Shuffle Information: Shuffle Bytes Written, Shuffle Local Bytes Read, Shuffle Remote Bytes Read, and Shuffle Remote Bytes Read to Disk.
  • Storage Memory: The amount of each of the following types of memory: Block Disk Space Used, Block Off Heap Memory Used, Block On Heap Memory Used, Block Max Off Heap Memory, and Block Max On Heap Memory.
  • HDFS Information: The number of HDFS bytes read and written.

Reports

  • Efficiency Statistics: The driver versus executor time spent determines how well the Spark program has been written and whether the right amount of parallelism is achieved (see the example after this list).
  • Simulation: Determines the ideal number of executors for the Spark program and the effect that changing the number of executors would have on overall time and utilization.
  • YARN Diagnostics: Shows details of the YARN application that was running in that duration, as executed by the user.
  • Aggregate Metrics: The aggregated usage of different metrics in that application.
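
As a rough illustration of the driver-versus-executor comparison behind Efficiency Statistics, consider hypothetical timings: if most of the elapsed time is spent in the driver rather than in executor tasks, little of the work is running in parallel.

    # Minimal sketch: the driver-versus-executor split behind the Efficiency
    # Statistics report, with hypothetical timings. A large driver share
    # suggests work that is not parallelised across executors.
    driver_time_s = 120      # hypothetical time spent in the driver
    executor_time_s = 480    # hypothetical time spent in executor tasks

    total = driver_time_s + executor_time_s
    print(f"Driver share:   {driver_time_s / total:.0%}")    # 20%
    print(f"Executor share: {executor_time_s / total:.0%}")  # 80%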

Application Logs

The Application Logs section displays the application logs for failed Spark jobs, which lets you identify the exact reason for the failure.