This is the documentation for Cloudera Manager 5.1.0.
Documentation for other versions is available at Cloudera Documentation.

Impala Queries

The Impala Queries page displays information about the Impala queries that are running and have run in your cluster. You can filter the queries by time period and by specifying simple filtering expressions.

  Note: The Impala query monitoring feature requires Cloudera Impala 1.0.1 and higher.

Viewing Queries

  1. Do one of the following:
    • Select Clusters > Cluster XXX > Impala service name Queries.
    • Select Home > Impala service name and click Queries in the Activities section in the left pane.
    The Impala queries run during the last day are listed in the Queries list.

Configuring Impala Query Monitoring

You can configure the visibility of the Impala query results and the size of the storage allocated to Impala query results.

For information on how to configure whether admin and non-admin users can view all queries, only that user's queries, or no queries, see Configuring Query Visibility.

Query information is stored in-memory in a ring buffer. This has two consequences: if you restart Service Monitor, all queries are lost and older queries eventually will be dropped. For information on how to configure the query store, see Configuring Impala Query Store and Runtime Profile Maximum Size.

Impala Best Practices

To open the Impala Best Practices page, click the Best Practices tab on the Impala service page. The page contains charts to help identify whether Impala best practices are being followed. See the individual charts for a descriptions of each best practice and how to determine if it is being followed. Consult the Impala documentation for further detail on each best practice and for additional best practices.

Adjust the time range to see data on queries run at different times. Click on the charts to get more detail on individual queries. Use the filter box at the top right of the Best Practices page to adjust what data is shown on the page. For example, to see just the queries that took more than ten seconds, make the filter query_duration > 10s.

Create a trigger based on any best practice by choosing Create Trigger from the individual chart drop down menu.

Queries List

Queries appear in the list with the most recent at the top. Each query has summary and detail information. A query summary includes the following default attributes: start and end timestamps, statement, duration, rows produced, user, coordinator, database, and query type. For example:

You can add additional attributes to the summary with the Attribute Selector. In each query summary, the query statement is truncated if it is too long to display. To display the entire statement, hover over a query. The query entry will expand to display the entire query string. To collapse the query display, move the mouse cursor. To display information about query attributes and possible values, hover over a field in a query. For example:

A running query displays under the start timestamp. If an error occurred while processing the query, displays under the complete timestamp.

To kill a running query, select Actions > Cancel. Only an administrator can cancel queries and killing a query creates an audit event. When you cancel a query, replaces the label. Once the page is refreshed, the entry is removed from the list.

When a job fails, asdf replaces the label.

To display query details, select Actions > Details.

To display all the queries run by the same user, select Actions > User's Impala queries.

To display all the queries that used the same resource pool, select Actions > Queries in the same YARN pool.

Filtering Queries

You filter queries by selecting a time range and specifying a filter expression in the text box.

You can use the Time Range Selector or a duration link () to set the time range. (See Time Line for details).

Filter Expressions

Filter expressions specify which entries should display when you run the filter. The simplest expression is made up of three components:
  • Attribute - the query language name of the attribute.
  • Operator - the type of comparison between the attribute and the attribute value. Cloudera Manager supports the standard comparator operators: =, !=, >, <, >=, <=, and RLIKE, which does regular expression matching as specified in the Java Pattern class documentation. Numeric values can be compared with all operators. String values can be compared with =, !=, and RLIKE. Boolean values can be compare with = and !=.
  • Value - the value of the attribute. The value depends on the type of the attribute. For a Boolean value, specify either true or false. When specifying a string value, enclose the value in double quotes.

You create compound filter expressions using the AND and OR operators. When more than one operator is used in an expression, AND is evaluated first, then OR. To change the order of evaluation, surround subexpressions with parentheses.

Compound Expressions

To find all the queries issued by the root user that produced over 100 rows, use the expression:
user = "root" AND rowsProduced > 100
To find all the executing queries issued by users Jack or Jill, use the expression:
executing = true AND (user = "Jack" OR user = "Jill")

Choosing and Running a Filter

  1. Do one of the following:
    • Select a Suggested or Recently Run Filter
      1. Click the to the right of the Search button to display a list of sample and recently run filters, and select a filter. The filter text displays in the text box.
    • Construct a Filter from Attribute Histograms
      1. Optionally, click the Select Attributes link to display a dialog where you can chose which attributes to display in histograms. Check the checkbox next to one or more attributes, and click Close.
      2. Click the Enhance Filter link. Histograms of the selected attributes display with the number of results that match each value of the selected attributes.
      3. Click a histogram bar that represents the range of attribute values to filter on. The color of the histogram bar gets lighter (on the right below)

        and a filter with the attribute value set to the range of the histogram bucket is added to the text box. The range includes the lower bound of the bucket and excludes the upper bound of the bucket. For example:
        (<x>_duration >= 17600.0 AND <x>_duration < 18000.0)
        where <x> is query or application.

        If you click the same histogram bar again, the color reverts to the darker blue and the filter is removed from the text box.

        If you click another histogram bar, another filter is OR'd with the existing filter:
        (<x>_duration >= 17600.0 AND <x>_duration < 18000.0 OR <x>_duration >= 16000.0 AND <x>_duration < 16400.0)
    • Type a Filter
      1. Start typing or press Spacebar in the text box. As you type, filter attributes matching the letter you type display. If you press Spacebar, standard filter attributes display. These suggestions are part of typeahead, which helps build valid queries. For information about the attribute name and supported values for each field, hover over the field in an existing query.
      2. Select an attribute and press Enter.
      3. Press Spacebar to display a drop-down list of operators.
      4. Select an operator and press Enter.
      5. Specify an attribute value in one of the following ways:
        • For attribute values that support typeahead, press Spacebar to display a drop-down list of values and press Enter.
        • Type a value.
  2. Put the cursor on the text box and press Enter or click Search. The list displays the results that match the specified filter. If the histograms are showing, they are redrawn to show only the values for the selected filter. The filter is added to the Recently Run list.

Example: Drilling into Query Results

Suppose we have a set of results with the following duration distribution :

The 0-20.00s bucket has 7 results, but with the current distribution we cannot discriminate between the results in that bucket.

Selecting the left-most histogram bar adds the following filter to the text box:
<x>_duration >= 0.0 AND <x>_duration < 20000.0
where <x> is query or application. After clicking Search again, the histogram appears as follows:

Selecting the histogram with 5 results again refines the filter to:
<x>_duration >= 16000.0 AND <x>_duration < 18000.0
After clicking Search again, the histogram appears as follows:

Filter Attributes

The available filter attributes, their names as they are displayed in Cloudera Manager, their types, and descriptions, are enumerated below.

Attribute

Display Name

Value Type

Description

bytes_streamed

Bytes Streamed

BYTES

The total number of bytes sent between Impala daemons while processing the query.

coordinator_host_id

Coordinator

STRING

The host coordinating the query.

database

Database

STRING

The database on which the query was run.

ddl_type DDL Type STRING The type of DDL query.

executing

Executing

BOOLEAN

Whether the query is currently executing.

file_formats

File Formats

STRING

The file formats used in the query. A file format is a string of the form: File Type/Compression Type, where File Type can take the values: TEXT, PARQUET, SEQUENCE_FILE, and RC_FILE, and Compression Type can take the values: NONE, DEFAULT, BZIP2. For further information, see How Impala Works with Hadoop File Formats.

hbase_bytes_read

HBase Bytes Read

BYTES

The total number of bytes read from HBase by the query.

hbase_bytes_read_per_second

HBase Read Throughput

BYTES_PER_SECOND

The overall HBase read throughput (in B/s) of the query.

hdfs_bytes_read

HDFS Bytes Read

BYTES

The total number of bytes (in GiB) read from HDFS by the query.

hdfs_bytes_read_local

HDFS Local Bytes Read

BYTES

The total number of local bytes read (in GiB) from HDFS by the query.

hdfs_bytes_read_local_percentage

HDFS Local Bytes Read Percentage

NUMBER

The percentage of all bytes read from HDFS by the query that were local.

hdfs_bytes_read_per_second

HDFS Read Throughput

BYTES_PER_SECOND

The overall HDFS read throughput (in B/s) of the query.

hdfs_bytes_read_remote HDFS Remote Bytes Read BYTES The total number of remote bytes read from HDFS by this query.
hdfs_bytes_read_remote_percentage HDFS Remote Bytes Read Percentage NUMBER The percentage of all bytes read from HDFS by this query that were remote.

hdfs_bytes_read_short_circuit

HDFS Short Circuit Bytes Read

BYTES

The total number of bytes (in GiB) read from HDFS by the query that used short-circuit reads.

hdfs_bytes_read_short_circuit_percentage

HDFS Short Circuit Bytes Read Percentage

NUMBER

The percentage of all bytes (in GiB) read from HDFS by the query that used short-circuit reads.

hdfs_bytes_skipped

HDFS Bytes Skipped

BYTES

The total number of bytes that had to be skipped by the query while reading from HDFS. Any number above zero may indicate a problem.

memory_accrual Memory Accrual BYTE_SECONDS The total accrued memory usage by the query. This is computed by multiplying the average aggregate memory usage of the query by the query's duration.
memory_aggregate_peak Aggregate Peak Memory Usage BYTES The highest amount of memory allocated by this query at a particular time across all nodes.
memory_per_node_peak Per Node Peak Memory Usage BYTES The highest amount of memory allocated by any single node that participated in this query. See Node With Peak Memory Usage for the name of the peak node.
memory_per_node_peak_node Node With Peak Memory Usage STRING The node with the highest peak memory usage for this query.
network_address Network Address STRING The network address that issued this query.
pool Pool STRING The name of the YARN pool to which this query was issued. Within YARN, a pool is referred to as a queue.
pool_wait_time Pool Wait Time MILLISECONDS The total amount of time the query spent waiting for pool resources to become available.

query_duration

Duration

MILLISECONDS

The duration of the query in milliseconds.

query_id

Query ID

STRING

The ID of the query.

query_state

Query State

STRING

The current state of the query: CREATED, INITIALIZED, COMPILED, RUNNING, FINISHED, UNKNOWN, EXCEPTION. If the query has failed or been canceled, queryState will be EXCEPTION.

query_status

Query Status

STRING

The status of the query. If the query failed, queryStatus will contain diagnostic info such as Memory limit exceeded, Failed to write row .... If canceled, queryStatus is Canceled. Otherwise, queryStatus is OK.

query_type

Query Type

STRING

The type of the query's SQL statement: DML, DDL, QUERY, UNKNOWN.

rows_produced

Rows Produced

NUMBER

The number of rows returned by the query.

service_name Service Name STRING The name of the Impala service.
session_id Session ID STRING The ID of the session that issued this query.
session_type Session Type STRING The type of the session that issued this query.
stats_missing Stats Mission BOOLEAN Whether the query was flagged with a missing table or column statistics warning during the planning process.

statement

Statement

STRING

The query's SQL statement.

thread_cpu_time Threads: CPU Time MILLISECONDS The sum of the CPU time used by all threads of the query.
thread_cpu_time_percentage Threads: CPU Time Percentage NUMBER The sum of the CPU time used by all threads of the query divided by the total thread time.
thread_network_wait_time Threads: Network Wait Time MILLISECONDS The sum of the time spent waiting for the network by all threads of the query.
thread_network_wait_time_percentage Threads: Network Wait Time Percentage NUMBER The sum of the time spent waiting for the network by all threads of the query divided by the total thread time.
thread_storage_wait_time Threads: Storage Wait Time MILLISECONDS The sum of the time spent waiting for storage by all threads of the query.
thread_storage_wait_time_percentage Threads: Storage Wait Time Percentage NUMBER The sum of the time spent waiting for storage by all threads of the query divided by the total thread time.
thread_total_time Threads: Total Time MILLISECONDS The sum of thread CPU, storage wait, and network wait times used by all threads of the query.

user

User

STRING

The user who issued the query.

Examples

Consider the following filter expressions: user = "root", rowsProduced > 0, fileFormats RLIKE ".TEXT.*", and executing = true. In the examples:

  • The filter attributes are user, rowsProduced, fileFormats, and executing.
  • The operators are =, >, and RLIKE.
  • The filter values are root, 0, .TEXT.*, and true.