Tez Query Details

The Tez Query Details page contains the following panels:

  • Summary
  • Query Trends
  • Recommendations
  • Query
  • YARN Diagnostics
  • MapReduce Stats
  • Query Execution Metrics
  • Query Plan and DAG

Summary

The Summary panel displays the following information.

Field Name | Description
User | The name of the user that executed the job.
State | The state of the job, which can be one of the following: Created, Initialized, Compiled, Running, Finished, Exception, or Unknown.
Duration | The duration of the query execution.
Start Time | The time at which the query execution started.
End Time | The time at which the query execution ended.
No. of Vertices | The number of vertices in the query.
HDFS Read | The amount of data read from HDFS.
HDFS Written | The amount of data written to HDFS.
Application ID | The ID of the application that you are currently viewing.

Query Trends

The Query Trends panel displays a chart showing the pattern of jobs running at a particular time, based on the following factors.

Metric | Description
Elapsed Time | The time taken to run the jobs at a particular time.
VCores | The number of VCores consumed to execute the query within a timeframe.
Memory | The amount of memory used to execute the query within a timeframe.

Comparing Runs

Click Compare Runs to compare different runs of the query, then select the runs that you want to compare. You can choose from up to 10 previous runs of the query. Metrics that differ between the runs are highlighted and displayed at the top of the comparison result.
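
The comparison described above can be pictured as a per-metric diff between two runs. The following is a minimal sketch in Python, not the product's actual comparison logic. The function name `differing_metrics`, the metric keys, and the sample values are all hypothetical.

```python
def differing_metrics(run_a, run_b):
    """Return the names of metrics present in both runs whose values differ."""
    shared = run_a.keys() & run_b.keys()
    return sorted(name for name in shared if run_a[name] != run_b[name])


# Hypothetical sample values, not taken from a real cluster.
run_1 = {"elapsed_time_s": 120, "vcores": 8, "memory_mb": 4096}
run_2 = {"elapsed_time_s": 95, "vcores": 8, "memory_mb": 6144}

# Metrics that differ would be the ones highlighted at the top of the result.
changed = differing_metrics(run_1, run_2)
print(changed)
```

Here `vcores` is identical in both runs, so only the other two metrics would be reported as different.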

Recommendations

The Recommendations panel displays recommendations that you can use to improve the performance of the SQL query.

Query

The Query panel displays the query, its join details, and the following details for each table used in the query.

Column | Description
Table Name | The table used in the query.
Filter expression | The expression used in query filtering.
Total Rows | The total number of rows in the table.
Output Rows | The number of rows returned on executing the query.

YARN Diagnostics

This panel displays the following YARN diagnostic metrics.

Column Name | Description
Start Time | The time at which the YARN application started.
End Time | The time at which the YARN application ended.
State | The state of the YARN application. The state can be one of the following: Created, Initialized, Compiled, Running, Finished, Exception, or Unknown.
Message | The diagnostic message in the YARN application.
Message Count | The number of diagnostic messages.

The following details are displayed for jobs in a YARN container.

Note: Each row contains data for one minute of the selected duration.

Column Name | Description
Time | The minute at which the job is executed.
Preempted MB | The amount of memory (in MB) preempted from the job's containers.
Preempted VCores | The number of VCores preempted from the job's containers.
Allocated MB | The amount of memory allocated to the query (in MB).
Avg Memory | The average amount of memory used.
Avg VCore | The average number of VCores used.
Running Containers | The number of containers running for the query.
Queue Usage % | The queue usage (in %).
Cluster Usage % | The cluster usage (in %).
State | The state of the query's YARN application. The state can be one of the following: Created, Initialized, Compiled, Running, Finished, Exception, or Unknown.
Message | The diagnostic message.
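
Because each row covers one minute, the per-minute table lends itself to simple summaries such as peak allocation or the minutes in which preemption occurred. The following Python sketch illustrates the idea under stated assumptions: the `MinuteRow` class, its field names, and the sample rows are hypothetical and only mirror a few of the columns listed above.

```python
from dataclasses import dataclass


@dataclass
class MinuteRow:
    """One row of the per-minute table (hypothetical field names)."""
    time: str
    preempted_mb: int
    allocated_mb: int
    running_containers: int


def summarize(rows):
    """Summarize per-minute rows: peak allocation and minutes with preemption."""
    return {
        "peak_allocated_mb": max(r.allocated_mb for r in rows),
        "peak_running_containers": max(r.running_containers for r in rows),
        "minutes_with_preemption": [r.time for r in rows if r.preempted_mb > 0],
    }


# Hypothetical sample rows, one per minute of the selected duration.
rows = [
    MinuteRow("10:00", 0, 2048, 2),
    MinuteRow("10:01", 512, 4096, 4),
    MinuteRow("10:02", 0, 3072, 3),
]

summary = summarize(rows)
print(summary)
```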

MapReduce Stats

This panel displays statistics for the processing of large data sets on a worker node. You can monitor the statistics of the following processes by elapsed time.

  • Mappers
  • Reducers

These statistics can be sorted by Duration and Start Time.

Query Execution Metrics

The Query Execution Metrics panel displays the following set of metrics.

Metric Type | Metric Name | Description
Task | Committed Heap Bytes | The amount of heap memory (in bytes) committed for use by the JVM.
Task | CPU Milliseconds | The CPU time (in ms).
Task | GC Time | The time spent by the JVM in garbage collection while executing a query.
Task | Input Records Processed | The number of input records processed.
Task | Merge Phase Time | The time spent in the merge phase of the query.
Task | Merged Map Outputs | The number of map outputs that were merged.
Task | Output Bytes | The number of output bytes written while executing the query.
Task | Output Records | The number of output records.
Task | Physical Memory Bytes | The amount of physical memory the query uses.
Task | Shuffle Bytes To Disk | The number of shuffle bytes written to disk.
Task | Shuffle Bytes To Mem | The number of shuffle bytes written to memory.
Task | Shuffle Bytes | The number of shuffle bytes the query uses.
Task | Shuffle Phase Time | The time spent in the shuffle phase of the query.
Task | Spilled Records | The number of spilled records.
Task | Virtual Memory Bytes | The amount of virtual memory used by the query.
File System Metrics | HDFS Bytes Written | The number of HDFS bytes written.
File System Metrics | File Bytes Written | The number of file bytes written.
File System Metrics | HDFS Bytes Read | The number of HDFS bytes read.
File System Metrics | File Bytes Read | The number of file bytes read.
DAG Metrics | Total Launched Tasks | The number of tasks launched in the DAG.
DAG Metrics | Rack Local Tasks | The number of tasks that are local to the rack.
DAG Metrics | Data Local Tasks | The number of tasks that are local to the data.
DAG Metrics | # of Succeeded Tasks | The number of tasks that completed successfully.
DAG Metrics | Application Master Cpu Time | The CPU time used by the application master.
DAG Metrics | Application Master GC Time | The time spent in garbage collection by the application master.
HIVE Metrics | Hive Files Created | The number of Hive files that were created.
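
The DAG metrics above can be combined into simple ratios, for example the share of launched tasks that ran data-local or rack-local. The following Python sketch shows one way to derive such ratios. The function name `locality_share` and the sample numbers are hypothetical, and the panel itself is not documented as showing these ratios.

```python
def locality_share(total_launched, data_local, rack_local):
    """Return data-local and rack-local tasks as fractions of launched tasks."""
    if total_launched <= 0:
        raise ValueError("total_launched must be positive")
    return {
        "data_local": data_local / total_launched,
        "rack_local": rack_local / total_launched,
    }


# Hypothetical values for Total Launched Tasks, Data Local Tasks, Rack Local Tasks.
share = locality_share(total_launched=40, data_local=30, rack_local=8)
print(share)
```

A low data-local share can be a hint that tasks are reading much of their input over the network, although the right threshold depends on the cluster.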

Query Plan and DAG

The panel displays the distribution of query logic in the form of a DAG and a physical execution plan.

DAG

The Directed Acyclic Graph (DAG) is an execution graph that displays a flow diagram of the compiled Hive SQL queries. It is a work-scheduling graph made up of a finite set of vertices connected by edges. The direction of the edges specifies the order in which the jobs in the DAG are executed. The graph is acyclic because it has no loops or cycles.
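
To illustrate how edge directions determine execution order, the following Python sketch builds a small DAG and derives a valid order with the standard library's `graphlib` module (Python 3.9 or later). The vertex names and dependencies are hypothetical, and this is an illustration of the concept rather than how Tez schedules work.

```python
from graphlib import CycleError, TopologicalSorter

# Each vertex maps to the set of vertices it depends on (hypothetical names).
dag = {
    "Map 1": set(),
    "Map 2": set(),
    "Reducer 3": {"Map 1", "Map 2"},
    "Reducer 4": {"Reducer 3"},
}

# A topological order runs every vertex only after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)


def has_cycle(graph):
    """Return True if the graph contains a cycle and so is not a valid DAG."""
    try:
        list(TopologicalSorter(graph).static_order())
        return False
    except CycleError:
        return True
```

The relative order of "Map 1" and "Map 2" is not fixed, because neither depends on the other. Both always precede "Reducer 3".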

Plan

The Plan is a logical representation of how Tez executes the query, in which the query is broken into different logical plans.