Job/Query Recommendation

Data Access Applications

Hive Query Recommendation

Queries run slowly for several reasons, some of which are often overlooked. The following recommendations are generated automatically:

hive.auto.convert.join=true;
hive.cbo.enable=true;
hive.vectorized.execution.enabled=true;
hive.groupby.skewindata=true;
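
For reference, these are standard Hive properties: hive.auto.convert.join lets Hive convert eligible joins to map joins, hive.cbo.enable turns on the cost-based optimizer, hive.vectorized.execution.enabled turns on vectorized query execution, and hive.groupby.skewindata adds handling for skewed GROUP BY keys. As a minimal sketch, the recommended values could be rendered as session-level SET statements. The helper name render_set_statements is hypothetical and is not part of Hive or Acceldata.

```python
# Recommended Hive settings, copied from the list above.
RECOMMENDED_SETTINGS = {
    "hive.auto.convert.join": "true",
    "hive.cbo.enable": "true",
    "hive.vectorized.execution.enabled": "true",
    "hive.groupby.skewindata": "true",
}


def render_set_statements(settings):
    """Render each property as a Hive session-level SET statement.

    Hypothetical helper for illustration only.
    """
    return [f"SET {name}={value};" for name, value in settings.items()]


if __name__ == "__main__":
    for statement in render_set_statements(RECOMMENDED_SETTINGS):
        print(statement)
```

The statements can be pasted at the top of a Hive session or script, so the settings apply only to that session rather than cluster-wide.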

Container Size Recommendations

Container sizing is one of the most difficult aspects of ongoing operations. The following recommendations allow system administrators to identify jobs that are wasting resources and to correct the container size at:

  • The queue level
  • The individual level
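
The idea behind such a recommendation can be sketched as follows. This is not Acceldata's algorithm: the JobMemorySample record, the 20% headroom, and the 256 MB rounding step are all assumptions chosen for illustration, and "individual level" is interpreted here as a single job.

```python
from dataclasses import dataclass


@dataclass
class JobMemorySample:
    """Hypothetical record of one job's container memory usage."""
    job_id: str
    queue: str
    allocated_mb: int  # container memory the job requested
    peak_used_mb: int  # highest memory usage observed


def recommend_container_mb(peak_used_mb, headroom_pct=20, step_mb=256):
    """Add headroom to the observed peak, then round up to a multiple of step_mb."""
    target_times_100 = peak_used_mb * (100 + headroom_pct)
    steps = -(-target_times_100 // (100 * step_mb))  # integer ceiling division
    return steps * step_mb


def recommend_per_queue(samples):
    """Size each queue for the largest peak seen among its jobs."""
    peak_by_queue = {}
    for s in samples:
        peak_by_queue[s.queue] = max(peak_by_queue.get(s.queue, 0), s.peak_used_mb)
    return {q: recommend_container_mb(p) for q, p in peak_by_queue.items()}


if __name__ == "__main__":
    samples = [
        JobMemorySample("job_1", "etl", 4096, 1000),
        JobMemorySample("job_2", "etl", 8192, 3000),
    ]
    for s in samples:
        print(s.job_id, s.allocated_mb, "->", recommend_container_mb(s.peak_used_mb))
    print(recommend_per_queue(samples))
```

A queue-level value has to cover the most demanding job in the queue, so it is usually less tight than a per-job value.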


Stats Generation Recommendation

After observing the system for a certain amount of time, Acceldata can predict when resources will be available and recommend when statistics gathering should be run on Hive tables.

Up-to-date statistics enable the cost-based optimizer (CBO) to act, which allows queries to perform better.
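
As a rough sketch of the two halves of this recommendation, the snippet below picks the hour with the lowest observed utilization and builds the Hive ANALYZE TABLE statements to run then. The function names and the utilization data are hypothetical, and choosing the minimum of hourly averages is a stand-in assumption, not Acceldata's prediction method. The statements shown are for a non-partitioned table.

```python
def quietest_hour(avg_utilization_by_hour):
    """Return the hour (0-23) with the lowest average cluster utilization.

    avg_utilization_by_hour maps hour -> average utilization (0.0 to 1.0).
    """
    if not avg_utilization_by_hour:
        raise ValueError("no utilization samples")
    return min(avg_utilization_by_hour, key=avg_utilization_by_hour.get)


def analyze_statements(table):
    """Build Hive statements that gather table-level and column-level stats."""
    return [
        f"ANALYZE TABLE {table} COMPUTE STATISTICS;",
        f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS;",
    ]


if __name__ == "__main__":
    utilization = {1: 0.35, 2: 0.20, 14: 0.90}  # made-up sample data
    print("run stats at hour", quietest_hour(utilization))
    for statement in analyze_statements("sales.orders"):  # hypothetical table
        print(statement)
```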

Users often unknowingly waste a large share of their executor resources. Identifying these scenarios in real time is very difficult with OEM/Community tools.

Here is a brief demo that explores the execution of Spark jobs and identifies how much executor capacity is wasted. Such users and applications also qualify as rogue users.

Spark Executor Wastage

After this identification, the following courses of action are possible:

  • Reviewing the Spark program to shift compute to the executors rather than the driver
  • Highlighting the areas of code that are either I/O intensive or have taken a long time
  • Along with the above two, showing the YARN diagnostics messages for additional guidance
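
To make the notion of executor wastage concrete, here is one simple way such a figure could be estimated: compare the memory allocated to each executor with the peak memory it actually used. The formula and the function name are assumptions for illustration and are not necessarily how Acceldata computes the value.

```python
def executor_memory_wastage(allocated_mb_per_executor, peak_used_mb_by_executor):
    """Return (wasted_mb, wasted_fraction) across all executors of one job.

    Usage is capped at the allocation so an over-reporting executor
    cannot produce negative waste.
    """
    count = len(peak_used_mb_by_executor)
    if count == 0 or allocated_mb_per_executor <= 0:
        return 0, 0.0
    total_allocated = allocated_mb_per_executor * count
    total_used = sum(
        min(peak, allocated_mb_per_executor) for peak in peak_used_mb_by_executor
    )
    wasted = total_allocated - total_used
    return wasted, wasted / total_allocated


if __name__ == "__main__":
    # Made-up example: two executors with 8 GB each.
    wasted_mb, fraction = executor_memory_wastage(8192, [2048, 4096])
    print(f"wasted {wasted_mb} MB ({fraction:.1%} of allocation)")
```

A job whose fraction stays high across many runs is a candidate for smaller executors or for the program review described above.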

HBase Table Hotspotting Fixes

Row keys that are not designed correctly cause region and table hotspotting. Fixing this requires a redesign of the row key, which is discussed in detail here
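
One widely used redesign is salting: a short, deterministic prefix derived from the key spreads otherwise sequential keys across regions. The sketch below illustrates that general technique only; it is not necessarily the design the linked discussion recommends, and the function name, bucket count, and key format are assumptions.

```python
import hashlib


def salted_row_key(natural_key, buckets=16):
    """Prefix the key with a two-digit bucket derived from its hash.

    The same natural key always maps to the same bucket, so point
    lookups can recompute the prefix.
    """
    if not 1 <= buckets <= 100:
        raise ValueError("buckets must fit in two digits")
    key_bytes = natural_key.encode("utf-8")
    bucket = hashlib.sha256(key_bytes).digest()[0] % buckets
    return f"{bucket:02d}|".encode("ascii") + key_bytes


if __name__ == "__main__":
    # Sequential keys would otherwise land in the same region.
    for n in range(5):
        print(salted_row_key(f"event-{n:06d}"))
```

The trade-off is that a range scan over the natural key must now be issued once per bucket and the results merged.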