MapReduce Dashboard

MapReduce lets you process very large volumes of data stored across clusters of machines in a reliable, parallel manner. A MapReduce job consists of map tasks and reduce tasks. Map tasks process the input data set and convert its elements into intermediate key-value pairs. Reduce tasks take the output of the map tasks as their input and combine the values that share the same key. The reduce phase therefore runs after the map phase has produced its output.
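To make the two phases concrete, here is a minimal single-process sketch of the classic word-count job in Python. It only imitates the data flow (map, group by key, reduce). It is not the Hadoop API, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: turn one input record (a line of text) into (key, value) pairs.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group all intermediate values by key, ordered by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: aggregate every value that shares the same key.
    return key, sum(values)


def run_job(lines: list[str]) -> dict[str, int]:
    # The reduce step consumes the (grouped) output of the map step.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(["the quick fox", "the lazy dog"]))
```

In a real cluster, many map and reduce tasks run in parallel on different nodes, and the framework performs the grouping step across the network.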

Using Acceldata MapReduce, you can monitor the queries that run as MapReduce operations.

Click MapReduce > Dashboard in the left pane to open the MapReduce dashboard. The dashboard consists of summary panels, a Sankey diagram, and charts that display information about queries and related metrics.

Key Performance Indicators (KPI)

KPIs are user-defined criteria for analyzing the performance of jobs.

Creating KPIs

To create a KPI for a module, do the following.

  1. Click Create KPI.
    The Create window is displayed.

  2. In the available fields, do the following.

    1. In Metric, click the drop-down and choose a metric from the list.
    2. (Optional) For time-related or memory-related metrics, you can choose the unit of the resulting KPI value. To do so, click the drop-down on the right and select a unit.
    3. In Operator, click the drop-down and choose the operator used to evaluate the selected metric.
    4. In the Metric * 2.5 text box, select one of the available metrics or type a value.
      note

      The placeholder text Metric * 2.5 indicates that, after choosing a metric, you can apply an operation to it (in this case, multiplying it by 2.5) to quantify the performance measure.

  3. In Name, type a name for the KPI you are creating.

  4. (Optional) Click the + sign to add another metric to the same KPI.

  5. Click Save.

The KPI is created.
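The following Python sketch shows one way to read a KPI definition like the one above: a metric, an operator, and a right-hand side that is either a constant or another metric multiplied by a factor (as in Metric * 2.5). It is a hypothetical model, not Acceldata's implementation. The operator set, the metric field names, and the rule that conditions added with + are combined with a logical AND are all assumptions.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical operator table; the Operator drop-down may offer a different set.
OPERATORS: dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass
class KpiCondition:
    metric: str                       # left-hand metric (hypothetical field name)
    op: str                           # key into OPERATORS
    value: float                      # constant, or multiplier when rhs_metric is set
    rhs_metric: Optional[str] = None  # "Metric * 2.5" -> rhs_metric=<metric>, value=2.5

    def evaluate(self, job: dict[str, float]) -> bool:
        # Right-hand side is either "rhs_metric * value" or the constant value.
        if self.rhs_metric is not None:
            rhs = job[self.rhs_metric] * self.value
        else:
            rhs = self.value
        return OPERATORS[self.op](job[self.metric], rhs)


@dataclass
class Kpi:
    name: str
    conditions: list[KpiCondition] = field(default_factory=list)

    def matches(self, job: dict[str, float]) -> bool:
        # Assumption: every condition added with "+" must hold (logical AND).
        return all(c.evaluate(job) for c in self.conditions)


# Hypothetical KPI: flag jobs whose slowest reducer takes more than 2.5x the average.
skewed = Kpi("Skewed reducers", [
    KpiCondition("reducer_time_max_s", ">", 2.5, rhs_metric="reducer_time_avg_s"),
])
print(skewed.matches({"reducer_time_max_s": 30.0, "reducer_time_avg_s": 10.0}))  # True
```

Here 30.0 > 10.0 * 2.5 = 25.0, so the job matches the KPI.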

Summary Panel

The summary panel displays tiles with aggregated values. Click the number on a tile to view detailed information about that metric.

note

The default time range is Last 24 hrs. To view statistics from a custom date range, click the icon and select a time frame and timezone of your choice.

Metric Name | Description
Users | The total number of users.
# of Queries | The number of queries run during the selected timeframe.
Avg CPU Allocated | The average CPU time across all queries.
Avg Memory Allocated | The average amount of memory allocated across all queries.
Succeeded | The number of queries that executed successfully.
Running | The number of queries that are in progress.
Failed | The number of queries that failed to execute.
Killed | The number of queries that were killed.
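As a rough illustration of how tiles like these aggregate per-query data, the Python sketch below computes each summary value from a list of query records. The record shape, the state strings, and the units (CPU seconds, memory in MB) are hypothetical. Counting Users as distinct users among the queries in the time range is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class QueryRecord:
    # Hypothetical record shape; field names and units are illustrative only.
    user: str
    state: str          # assumed values: "SUCCEEDED", "RUNNING", "FAILED", "KILLED"
    cpu_seconds: float
    memory_mb: float


def summarize(queries: list[QueryRecord]) -> dict[str, float]:
    states = Counter(q.state for q in queries)
    return {
        # Assumption: "Users" counts distinct users in the selected time range.
        "Users": len({q.user for q in queries}),
        "# of Queries": len(queries),
        "Avg CPU Allocated": mean(q.cpu_seconds for q in queries) if queries else 0.0,
        "Avg Memory Allocated": mean(q.memory_mb for q in queries) if queries else 0.0,
        "Succeeded": states["SUCCEEDED"],
        "Running": states["RUNNING"],
        "Failed": states["FAILED"],
        "Killed": states["KILLED"],
    }
```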

Context Metric Distributions

The Context Metric Distributions panel summarizes jobs as a Sankey diagram. It shows how queries flow from the selected queue to users and then to query buckets.

The following screenshot shows an example Context Metric Distributions Sankey chart for the last 24 hours, displayed by Duration.

Sankey Diagram

You can gather the following information from the chart.

note

To see the distribution in numbers, hover over the Sankey chart.

  • From the Queues category, you can see that 100% of queries run in the default queue.
  • From the Users category, you can gather the following.
    • 71.43% of queries are run by 3 users.
    • 14.29% of queries are run by 4 users.
    • 8.57% of queries are run by 3 users.
    • 5.71% of queries are run by 2 users.
  • From the Queries category, you can gather the following.
    • 25 queries (71.43%) take between 6.19 and 10.76 seconds to execute.
    • 5 queries (14.29%) take between 17.95 and 22.12 seconds to execute.
    • 3 queries (8.57%) take between 23.12 and 25.83 seconds to execute.
    • 2 queries (5.71%) take between 11.34 and 12.13 seconds to execute.
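The percentages in the Queries category are each bucket's share of the 35 queries in this example (25 + 5 + 3 + 2). The following Python check reproduces that arithmetic. The bucket labels are copied from the example, and the function name is illustrative.

```python
def share_percentages(counts: dict[str, int]) -> dict[str, float]:
    # Each bucket's share of all queries, as a percentage rounded to 2 decimals.
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


buckets = {
    "6.19 s - 10.76 s": 25,
    "17.95 s - 22.12 s": 5,
    "23.12 s - 25.83 s": 3,
    "11.34 s - 12.13 s": 2,
}
print(share_percentages(buckets))
# {'6.19 s - 10.76 s': 71.43, '17.95 s - 22.12 s': 14.29, '23.12 s - 25.83 s': 8.57, '11.34 s - 12.13 s': 5.71}
```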

Viewing Sankey chart by distribution

You can view the Sankey chart by the following distributions.

Distribution Metric | Description
Duration | The duration of the queries executed by users.
Mappers | The map tasks, which form the first phase of processing and convert input data into key-value pairs.
Reducers | The reduce tasks, which process the mapper output and produce key-value pairs.
GC Time | The time the JVM spends in garbage collection while executing a query.
Reducer Time Avg | The average time taken to complete a reduce task.
Reducer Time Max | The maximum time taken to complete a reduce task.
Shuffle Time Avg | The average time taken to transfer the map output from the mappers to the reducers.
Shuffle Time Max | The maximum time taken to transfer the map output from the mappers to the reducers.
Sort Time Avg | The average time taken to sort the mapper output by key.
Sort Time Max | The maximum time taken to sort the mapper output by key.
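The Avg and Max variants are simple aggregates over a job's reduce tasks. The Python sketch below computes them, assuming each reduce task records separate shuffle, sort, and reduce durations in seconds. The ReduceTaskTiming record and its field names are hypothetical and do not correspond to an Acceldata or Hadoop API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ReduceTaskTiming:
    # Hypothetical per-reduce-task timings, in seconds.
    shuffle_s: float
    sort_s: float
    reduce_s: float


def phase_stats(tasks: list[ReduceTaskTiming]) -> dict[str, float]:
    # Average and maximum of each phase across all reduce tasks of one job.
    if not tasks:
        raise ValueError("a job needs at least one reduce task")
    shuffle = [t.shuffle_s for t in tasks]
    sort = [t.sort_s for t in tasks]
    reduce_ = [t.reduce_s for t in tasks]
    return {
        "Reducer Time Avg": mean(reduce_),
        "Reducer Time Max": max(reduce_),
        "Shuffle Time Avg": mean(shuffle),
        "Shuffle Time Max": max(shuffle),
        "Sort Time Avg": mean(sort),
        "Sort Time Max": max(sort),
    }
```

A large gap between a Max value and its Avg value usually points to one slow task rather than a uniformly slow phase.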

Other MapReduce Charts

The following charts are also displayed on the MapReduce Dashboard.

Chart Name | Description
VCore Usage | The number of virtual cores (vcores) used by a queue in the cluster.
Memory Usage | The amount of memory used by a queue in the cluster.
Query Execution Count | The number of queries executed within the selected timeframe.
Average Query Time | The average time taken to execute queries. This chart also displays the Total Execution Time.
Top 20 Users (By Query) | The 20 users who executed the most queries.
Top 20 Tables (By Query) | The 20 tables accessed by the most queries.
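The two Top 20 charts amount to counting and ranking. The Python sketch below shows that idea with collections.Counter. It assumes one user entry per executed query and, for tables, counts each table once per query that accesses it. The input shapes and function names are hypothetical.

```python
from collections import Counter


def top_users(query_users: list[str], n: int = 20) -> list[tuple[str, int]]:
    # One entry per executed query; count queries per user, highest first.
    return Counter(query_users).most_common(n)


def top_tables(query_tables: list[set[str]], n: int = 20) -> list[tuple[str, int]]:
    # Each query may access several tables; a table counts once per query.
    return Counter(t for tables in query_tables for t in tables).most_common(n)


print(top_users(["alice", "bob", "alice", "carol", "alice", "bob"], n=2))
# [('alice', 3), ('bob', 2)]
```

Counter.most_common orders entries by count, and entries with equal counts keep the order in which they were first seen.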

Queues

In the Queues tab, you can see the root queue, the default queue, and any custom queues defined by the cluster administrator.

root: A predefined queue that is the parent of all other queues in your cluster. This queue is allocated 100% of the cluster's resources.

default: A designated queue defined by the administrator. Jobs that are not assigned to a queue run in this queue.

note

To view the memory capacity allocated to or used by a queue, click the queue in the Queues tab.
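The Python sketch below models the queue hierarchy described above. It assumes YARN's Capacity Scheduler in percentage mode, where root holds 100% of the cluster and the capacities of sibling queues under each parent must add up to 100%. This page does not say which scheduler your cluster uses. The queue names analytics, etl, and adhoc are hypothetical, and resolve_queue is an illustrative helper that sends jobs without a queue to default.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Queue:
    name: str
    capacity_pct: float                       # share of the parent queue's capacity
    children: list["Queue"] = field(default_factory=list)


def absolute_capacities(queue: Queue, parent_abs: float = 100.0) -> dict[str, float]:
    # A queue's absolute share is its percentage of the parent's absolute share.
    own = parent_abs * queue.capacity_pct / 100.0
    result = {queue.name: own}
    if queue.children:
        total = sum(c.capacity_pct for c in queue.children)
        # Assumed Capacity Scheduler rule: siblings must sum to 100% of the parent.
        if abs(total - 100.0) > 1e-6:
            raise ValueError(f"children of {queue.name} sum to {total}%, not 100%")
        for child in queue.children:
            result.update(absolute_capacities(child, own))
    return result


def resolve_queue(requested: Optional[str], known: set[str]) -> str:
    # Jobs submitted without a queue run in "default".
    if requested is None:
        return "default"
    if requested not in known:
        raise ValueError(f"unknown queue: {requested}")
    return requested


root = Queue("root", 100.0, [
    Queue("default", 40.0),
    Queue("analytics", 60.0, [Queue("etl", 50.0), Queue("adhoc", 50.0)]),
])
print(absolute_capacities(root))
# {'root': 100.0, 'default': 40.0, 'analytics': 60.0, 'etl': 30.0, 'adhoc': 30.0}
```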