The Spark Dashboard provides an overview of Spark jobs, their status, and other metrics such as memory and core utilization.
To view the Spark Dashboard, click Spark > Dashboard. The dashboard consists of summary panels, a Sankey Diagram with various metrics, and charts that display information about jobs based on other criteria such as memory and core utilization. The default time range is Last 24 hrs. To change the time range, click the down arrow in the time selection menu.
Key Performance Indicators (KPIs)
KPIs are user-defined criteria for analyzing the performance of jobs.
To create a KPI for a module, do the following:
1. Click the icon next to KPIs.
   The Create window appears.
2. Complete the following fields:
   - In Metric, click the drop-down and choose a metric from the available list.
   - (Optional) For time-related or memory-related metrics, you can choose the unit of the resulting KPI value. To do so, click the drop-down that appears on the right and select a unit.
   - In Operator, click the drop-down and choose the operator with which to measure the selected metric.
   - In the Metric * 2.5 box, click to choose from the available metrics, or type a value.
   Note: The default text Metric * 2.5 indicates that, after choosing a metric, you can apply an operation to it to quantify the performance measure.
3. In Name, type a name of your choice for the KPI you are creating.
4. (Optional) Click the + sign at the far right to add another metric to the same KPI.
The KPI is created.
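The KPI form fields above map to a simple comparison: one metric, an operator, and either a constant or a second metric scaled by a factor (the Metric * 2.5 pattern). The following sketch is illustrative only; the metric names and the `evaluate_kpi` helper are hypothetical, not product code:

```python
import operator

# Map the operator chosen in the KPI form to a Python comparison.
OPERATORS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
             "<=": operator.le, "==": operator.eq}

def evaluate_kpi(job_metrics, metric, op, rhs_metric=None, factor=1.0, constant=None):
    """Return True if the job satisfies the KPI.

    The right-hand side is either a constant value or another
    metric scaled by a factor (the "Metric * 2.5" pattern).
    """
    lhs = job_metrics[metric]
    rhs = constant if constant is not None else job_metrics[rhs_metric] * factor
    return OPERATORS[op](lhs, rhs)

# Hypothetical job metrics (times in ms, memory in MB); names are illustrative.
job = {"gc_time_ms": 4200, "duration_ms": 30000, "memory_mb": 2048}

# KPI: flag jobs whose GC time exceeds 10% of total duration.
print(evaluate_kpi(job, "gc_time_ms", ">", rhs_metric="duration_ms", factor=0.1))
```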
The second panel of the dashboard displays the summary of jobs grouped by the following criteria.
- Number of users
- Number of jobs
- Average CPU time
- Average Memory Allocated
The third panel of the dashboard displays the following information about the jobs for the selected timeline:
- Number of jobs that succeeded.
- Number of jobs that are running.
- Number of jobs that failed.
- Number of jobs that are finishing.
- Number of jobs that were killed.
- Number of application-level exceptions.
- Number of YARN exceptions.
Click a number to view details of the jobs in that category. The Spark Jobs page is displayed with the applicable filter applied.
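The per-category counts above are a simple grouping of jobs by status. As an illustration only (the job records and status labels below are hypothetical, not the product's data model), such a tally could look like this:

```python
from collections import Counter

# Hypothetical job records; the statuses mirror the dashboard categories.
jobs = [
    {"id": "app_001", "status": "SUCCEEDED"},
    {"id": "app_002", "status": "RUNNING"},
    {"id": "app_003", "status": "FAILED"},
    {"id": "app_004", "status": "KILLED"},
    {"id": "app_005", "status": "SUCCEEDED"},
]

# Count how many jobs fall into each status category.
counts = Counter(job["status"] for job in jobs)
print(counts["SUCCEEDED"])  # 2
```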
The Metric distributions panel displays the summary of jobs as a Sankey diagram. By default, the chart displays the distribution by Duration. You can instead display the distribution by VCore, VCore Time, Memory, Memory Time, Used Containers, or GC Time.
Sample of a Sankey Diagram and how to read it
You can gather the following information from this diagram:
- The distribution is displayed by GC Time.
- All jobs are running on the queue named spark.
- The GC Time for 151 jobs (94.38% of the jobs) is between 37 ms and 50.26 seconds.
- 94.38% of the jobs are being run by 9 users.
- A very small percentage of the jobs is time-consuming. (0.63% of the jobs took about 6.37 mins of GC time.)

For more information on Sankey diagrams, see https://en.wikipedia.org/wiki/Sankey_diagram.
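The percentages shown on each Sankey band are shares of the total job count. In the sketch below, the total of 160 jobs is an assumption chosen so that the arithmetic matches the sample figures (the diagram itself reports only percentages); rounding is half-up to two decimals:

```python
from decimal import Decimal, ROUND_HALF_UP

def share(count, total):
    """Percentage of jobs in a band, rounded half-up to two decimal places."""
    pct = Decimal(count) / Decimal(total) * 100
    return float(pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

# Assumed total of 160 jobs: 151 in the 37 ms - 50.26 s GC-time band,
# and 1 long-running job.
print(share(151, 160))  # 94.38
print(share(1, 160))    # 0.63
```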
Other Spark Charts
The following charts are also displayed on the Spark Dashboard.
By default, you can see the information from the last 24 hours.
| Chart | Description |
| --- | --- |
| VCore Usage | The number of virtual cores (vCores) used by a queue in the cluster. |
| Memory Usage | The amount of memory used by a queue in the cluster in a particular timeframe. |
| Error Categories | The number of YARN exceptions and application exceptions within a timeframe. |
| Average Job Time | The average and total time taken to execute a job within a timeframe. The time can range from seconds to days. |
| Top 20 Users (By Query) | The top 20 users who ran the highest number of queries within the selected timeframe. By default, you can see the top 20 users from the last 24 hours. |
| Job Execution Count | The number of Spark jobs executed within a timeframe. |