The Spark Thrift Dashboard provides an overview of the Spark Thrift application service, which enables JDBC and ODBC clients to execute Spark SQL queries.
To view the Spark Thrift Dashboard, click Spark Thrift > Dashboard. The dashboard consists of summary panels, a Sankey diagram with various metrics, and charts that display information about jobs based on criteria such as memory and core utilization.
The default time range is Last 24 hrs. To view statistics for a custom date range, click the icon and select a time frame and time zone of your choice.
The Spark SQL panel on the left displays a list of thrift servers with their state, either connected or disconnected.
Key Performance Indicators (KPI)
KPIs are user-driven analytics criteria to analyze the performance of jobs.
To create a KPI for a module, do the following.
Click Create KPI.
The Create window is displayed.
Do the following in the available fields.
- In Metric, click the drop-down and choose one metric from the available list.
- (Optional) For time-related or memory-related metrics, you can choose the unit of the resulting KPI value. To do so, click the drop-down that appears on the right and select a unit.
- In Operator, click the drop-down and choose the operator to measure the metric you selected.
- In Metric * 2.5, click the text box for available metrics or type a value in that text box.
The default text Metric * 2.5 indicates that you can perform an operation after choosing a metric to quantify the performance measure.
In Name, type a name for the KPI you are creating.
(Optional) Click the + sign to add another metric to the same KPI.
The KPI is created.
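One way to read the fields above is as a small expression: a metric, an operator, and an operand that is either a typed value or another metric scaled by a factor (the Metric * 2.5 placeholder). The following Python sketch only illustrates that reading. The `Kpi` class, the assumption that operators are arithmetic, the metric names, and the sample values are all hypothetical and are not taken from the product.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Assumed operator set; the actual list offered in the Operator drop-down may differ.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


@dataclass
class Kpi:
    """Hypothetical model of one KPI row: <metric> <op> (<operand_metric> * <factor>)."""
    name: str
    metric: str                           # left-hand metric chosen in the Metric field
    op: str                               # key into OPERATORS
    operand_metric: Optional[str] = None  # right-hand metric, if one was picked
    factor: float = 1.0                   # multiplier, or the literal value when no metric is picked

    def evaluate(self, job: dict) -> float:
        left = job[self.metric]
        if self.operand_metric is None:
            right = self.factor                              # plain typed value
        else:
            right = job[self.operand_metric] * self.factor   # "Metric * 2.5" form
        return OPERATORS[self.op](left, right)


# Illustrative job record with made-up metric names and values.
job = {"duration_s": 120.0, "input_mb": 16.0}

scaled = Kpi("duration_per_scaled_input", "duration_s", "/", "input_mb", 2.5)
doubled = Kpi("doubled_duration", "duration_s", "*", factor=2.0)

print(scaled.evaluate(job))   # 120 / (16 * 2.5)
print(doubled.evaluate(job))  # 120 * 2
```

Adding another metric to the same KPI with the + sign would, under this reading, chain a further operator and operand onto the expression.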
The dashboard displays the summary of jobs grouped by the following criteria.
- Number of Users
- Number of Applications
- Average CPU Allocated
- Average Memory Allocated (MB)
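As a rough illustration of how these four summary values relate to individual job records, the sketch below aggregates a small in-memory list. The record fields (`user`, `app_id`, `vcores`, `memory_mb`) and the sample data are assumptions made for this example; the dashboard's internal data model is not described here.

```python
from statistics import mean

# Hypothetical job records for the selected time range.
jobs = [
    {"user": "alice", "app_id": "app-1", "vcores": 4, "memory_mb": 8192},
    {"user": "bob", "app_id": "app-2", "vcores": 2, "memory_mb": 4096},
    {"user": "alice", "app_id": "app-3", "vcores": 6, "memory_mb": 12288},
]


def summarize(records):
    """Return the four summary-panel values for a list of job records."""
    return {
        "number_of_users": len({r["user"] for r in records}),           # distinct users
        "number_of_applications": len({r["app_id"] for r in records}),  # distinct applications
        "average_cpu_allocated": mean(r["vcores"] for r in records),
        "average_memory_allocated_mb": mean(r["memory_mb"] for r in records),
    }


summary = summarize(jobs)
print(summary)
```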
Context Metric Distributions
The Context Metric Distributions panel displays the summary of jobs as a Sankey diagram. By default, the chart displays the distribution by Duration. You can choose to display the distribution by Input Data, Output Data, Shuffle Reads, or Shuffle Writes.
Core Usage by Locality
The Core Usage by Locality chart displays the core usage by the following locality types. The chart also displays Core Used and Core Wasted values (in %).
- Process Local: The tasks in this locality are run within the same process as the source data.
- Node Local: The tasks in this locality are run on the same machine as the source data.
- Rack Local: The tasks in this locality are run in the same rack as the source data.
- Any: The tasks in this locality are run anywhere else, that is, not on the same node or rack as the source data.
- No pref: The tasks in this locality have no locality preference.
- Idle: The tasks in this locality are idle.
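The percentages in this chart can be thought of as each category's share of the total core time. The sketch below shows one possible calculation in Python. It assumes that Core Wasted corresponds to the Idle share and Core Used to everything else; that mapping, the `core_seconds` input, and the sample numbers are assumptions for illustration, not the product's documented formula.

```python
# Hypothetical core-seconds attributed to each locality type in a time range.
core_seconds = {
    "Process Local": 600,
    "Node Local": 200,
    "Rack Local": 100,
    "Any": 50,
    "No pref": 50,
    "Idle": 1000,
}


def locality_shares(usage):
    """Return each category's share of total core time, in percent."""
    total = sum(usage.values())
    if total == 0:
        # Avoid division by zero when nothing was recorded.
        return {name: 0.0 for name in usage}
    return {name: 100.0 * value / total for name, value in usage.items()}


shares = locality_shares(core_seconds)

# Assumption: idle core time is what the chart reports as wasted.
core_wasted_pct = shares["Idle"]
core_used_pct = 100.0 - core_wasted_pct

print(shares)
print(core_used_pct, core_wasted_pct)
```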
Zooming In on Core Usage
You can take a closer look at the core usage by zooming in to any timeline on the graph.
To zoom in, drag the mouse pointer across the section or timeline you want to zoom in on. The second graph shows a closer view of the section or timeline you selected.
The following charts are also displayed on the Business Intelligence Dashboard.
|Chart|Description|
|---|---|
|VCore Usage|The number of virtual cores (VCores) used by a queue in the cluster.|
|Memory Usage|The amount of memory used by a queue in the cluster in a particular timeframe.|
|Query Duration Distribution|The number of queries grouped by duration.|
|Query Execution Count|The number of queries executed within a timeframe.|
|Average Query Time|The average time taken to execute queries. This metric also displays the Total Execution Time.|
|Top 20 Users (By Query)|The 20 users that executed the highest number of queries.|
|Top 20 Tables (By Query)|The 20 tables against which the highest number of queries were executed.|
|Storage Memory|The amount of storage memory used by the Spark Thrift application, including Used Memory and Total Memory.|
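To make the query-related charts more concrete, the following Python sketch derives a duration distribution, the average and total execution time, and a top-users ranking from a list of query records. The record fields, the duration bucket boundaries, and the sample data are hypothetical; the dashboard may group and rank queries differently.

```python
from collections import Counter

# Hypothetical query records for the selected time range.
queries = [
    {"user": "alice", "duration_s": 0.5},
    {"user": "bob", "duration_s": 4.0},
    {"user": "alice", "duration_s": 12.0},
    {"user": "carol", "duration_s": 75.5},
    {"user": "alice", "duration_s": 8.0},
]

# Assumed buckets as (exclusive upper bound in seconds, label).
BUCKETS = [(1, "< 1 s"), (10, "1-10 s"), (60, "10-60 s")]


def bucket_for(duration_s):
    """Return the label of the first bucket whose upper bound exceeds the duration."""
    for upper, label in BUCKETS:
        if duration_s < upper:
            return label
    return ">= 60 s"


# Query Duration Distribution: number of queries per duration bucket.
distribution = Counter(bucket_for(q["duration_s"]) for q in queries)

# Average Query Time and Total Execution Time.
total_time_s = sum(q["duration_s"] for q in queries)
average_time_s = total_time_s / len(queries)

# Top 20 Users (By Query): users ranked by number of queries executed.
top_users = Counter(q["user"] for q in queries).most_common(20)

print(dict(distribution))
print(average_time_s, total_time_s)
print(top_users)
```

The same `Counter` approach would apply to Top 20 Tables (By Query) if each record also carried the tables it touched.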