Spark Dashboard

The Spark Dashboard provides an overview of the Spark jobs, job status, and other metrics such as memory and core utilization.

To view the Spark Dashboard, click Spark > Dashboard. The dashboard consists of summary panels, a Sankey Diagram with various metrics, and charts that display information about jobs based on other criteria such as memory and core utilization. The default time range is Last 24 hrs. To change the time range, click the down arrow in the time selection menu.

Key Performance Indicators (KPI)

KPIs are user-defined criteria for analyzing the performance of jobs.

Creating KPIs

To create a KPI for a module, do the following.

  1. Click the icon next to KPIs.
    A Create window appears.

  2. Do the following in the available fields.

    1. In Metric, click the drop-down and choose one metric from the available list.
    2. (Optional) For time-related or memory-related metrics, you can choose the unit of the resultant KPI value. To do so, click the drop-down that appears on the right and select a unit.
    3. In Operator, click the drop-down and choose the operator to measure the metric you selected.
    4. In Metric * 2.5, click the text box to choose from the available metrics, or type a value in the text box.

      Note: The default text Metric * 2.5 indicates that you can apply an arithmetic operation to the metric you chose to quantify the performance measure.

  3. In Name, type a name of your choice for the KPI you are creating.

  4. (Optional) Click the + sign at the far right to add another metric to the same KPI.

  5. Click Save.

The KPI is created.
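
Conceptually, a KPI definition such as "GC Time > Metric * 2.5" is a comparison evaluated against each job's collected metrics. The dashboard's actual evaluation logic is not documented here; the following Python sketch only illustrates the idea, and the field names, operator set, and sample data are hypothetical.

    # Hypothetical sketch of KPI evaluation; not the dashboard's real implementation.
    import operator

    OPERATORS = {">": operator.gt, ">=": operator.ge,
                 "<": operator.lt, "<=": operator.le, "==": operator.eq}

    def evaluate_kpi(jobs, metric, op, threshold):
        """Return the jobs whose `metric` value satisfies `metric <op> threshold`."""
        compare = OPERATORS[op]
        return [job for job in jobs if compare(job.get(metric, 0), threshold)]

    # Example: flag jobs whose GC time exceeds 2.5x the average GC time.
    jobs = [{"id": "app_1", "gc_time_ms": 1200},
            {"id": "app_2", "gc_time_ms": 90},
            {"id": "app_3", "gc_time_ms": 9000}]
    avg_gc = sum(j["gc_time_ms"] for j in jobs) / len(jobs)
    flagged = evaluate_kpi(jobs, "gc_time_ms", ">", avg_gc * 2.5)
    print(flagged)  # jobs that breach the KPI threshold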

Summary Panels

The second panel of the dashboard displays the summary of jobs grouped by the following criteria.

  • Number of users
  • Number of jobs
  • Average CPU time
  • Average Memory Allocated

The third panel of the dashboard displays the following information about the jobs for the selected time range:

  • Number of jobs that succeeded.
  • Number of jobs that are running.
  • Number of jobs that failed.
  • Number of jobs that are in the process of finishing.
  • Number of jobs that were killed.
  • Number of application-level exceptions.
  • Number of YARN exceptions.

Click a number to view more details about the jobs in that category. The Spark Jobs page is displayed with the applicable filter.
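
The dashboard aggregates these counts for you. If you want to reproduce comparable numbers outside the UI, one option (an assumption, not the dashboard's documented data source) is the YARN ResourceManager REST API, which reports the state and final status of each application. A minimal Python sketch, with a placeholder ResourceManager host:

    # Minimal sketch: count Spark applications by final status using the
    # YARN ResourceManager REST API. The host/port is a placeholder.
    import collections
    import json
    import urllib.request

    RM_URL = "http://resourcemanager.example.com:8088"  # hypothetical host

    url = RM_URL + "/ws/v1/cluster/apps?applicationTypes=SPARK"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    apps = (data.get("apps") or {}).get("app") or []

    # finalStatus is SUCCEEDED, FAILED, KILLED, or UNDEFINED (not finished yet);
    # state reports RUNNING, FINISHED, FAILED, KILLED, and so on.
    by_final_status = collections.Counter(app["finalStatus"] for app in apps)
    running = sum(1 for app in apps if app["state"] == "RUNNING")
    print(dict(by_final_status), "running:", running)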

Metric Distributions

The Metric distributions panel displays the summary of jobs as a Sankey diagram. By default, the chart displays the distribution by Duration. You can choose to display the distribution by VCore, VCore Time, Memory, Memory Time, Used Containers, and GC Time.

Sample of a Sankey Diagram and how to read it

Sample Sankey Diagram

You can gather the following information from this diagram:

  • The distribution is displayed by GC Time.
  • All jobs are running on the queue named spark.
  • The GC Time for 151 jobs (94.38% of the jobs) is between 37 ms and 50.26 seconds.
  • 94.38% of the jobs are being run by 9 users.
  • A very small percentage of the jobs is time-consuming. (0.63% of the jobs took about 6.37 mins of GC time.)

For more information on Sankey diagrams, see https://en.wikipedia.org/wiki/Sankey_diagram.
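
The percentages shown in the Sankey diagram are job counts bucketed by the selected metric. The following sketch shows one way such a distribution could be computed from per-job GC times; the bucket boundaries and sample data are illustrative, not the dashboard's actual binning logic.

    # Illustrative sketch: bucket jobs by GC time and report the share of jobs
    # in each bucket, similar to what the Sankey diagram visualizes.
    def gc_time_distribution(gc_times_ms, boundaries_ms):
        """gc_times_ms: GC time per job; boundaries_ms: sorted bucket upper bounds."""
        buckets = {"<= %d ms" % b: 0 for b in boundaries_ms}
        buckets["above"] = 0
        for t in gc_times_ms:
            for b in boundaries_ms:
                if t <= b:
                    buckets["<= %d ms" % b] += 1
                    break
            else:
                buckets["above"] += 1
        total = len(gc_times_ms)
        return {label: round(100 * n / total, 2) for label, n in buckets.items()}

    # Example with made-up GC times (ms) and boundaries at 50.26 s and 6.37 min.
    sample = [40, 120, 900, 15_000, 48_000, 51_000, 382_000]
    print(gc_time_distribution(sample, [50_260, 382_200]))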

Other Spark Charts

The following charts are also displayed on the Spark Dashboard.

Note: By default, you can see the information from the last 24 hours.

  • VCore Usage: The number of virtual cores used by a queue in the cluster.
  • Memory Usage: The amount of memory used by a queue in the cluster in a particular timeframe.
  • Error Categories: The number of YARN exceptions and App exceptions within a timeframe.
  • Average Job Time: The average and total time taken to execute a job within a timeframe. The time can range from seconds to days.
  • Top 20 Users (By Query): The top 20 users who ran the highest number of queries within the selected timeframe. By default, you can see the top 20 users from the last 24 hours.
  • Job Execution Count: The number of Spark jobs executed within a timeframe.
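
If you need the raw cluster-level numbers behind charts such as VCore Usage and Memory Usage, the YARN ResourceManager metrics endpoint exposes allocated and total VCores and memory. This is an assumption about where comparable data can be obtained, not a statement of what the dashboard itself queries. A minimal sketch with a placeholder host:

    # Minimal sketch: read allocated vs. total VCores and memory from the
    # YARN ResourceManager cluster metrics endpoint. Host/port is a placeholder.
    import json
    import urllib.request

    RM_URL = "http://resourcemanager.example.com:8088"  # hypothetical host

    with urllib.request.urlopen(RM_URL + "/ws/v1/cluster/metrics") as resp:
        metrics = json.load(resp)["clusterMetrics"]

    print("VCores allocated: %d of %d" % (metrics["allocatedVirtualCores"],
                                          metrics["totalVirtualCores"]))
    print("Memory allocated: %d of %d MB" % (metrics["allocatedMB"], metrics["totalMB"]))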