View Query Execution Log
Control Messages
(Description valid for ICE/IEE 4.0.4)
Infobright currently does not support the MySQL EXPLAIN command. As a columnar database, the Infobright engine generates an execution plan which is not easily convertible to MySQL-like EXPLAIN notions. Moreover, the actual execution plan is generated dynamically, based on Knowledge Nodes and previous execution steps, thus it is not possible to generate it without the actual execution of a query.
Instead, the Query Execution Log is used to track the execution plan, provide rough statistics, and provide insights into performance bottlenecks.
The query execution log is located:
- in bh.err file in your database directory (configurable as log-error value in the —config file provided at installation, /etc/my-ib.cnf by default),
- on console, if the server is run with —console option
ControlMessages parameter – switching the log on
The parameter (set in brighthouse.ini) is used to define the level of details displayed in Query Execution Log, at the start of every line. Note that the more details are displayed, the larger (and potentially less clear) log is created. Restart the Infobright server to execute the change of ControlMessages parameter.
The execution log for a query may be quite large (several KBs), and the bh.err log may grow substantially. For a server executing a lot of small queries it is recommended either to switch the logging off (ControlMessages = 0), or to implement an automatic backup scheme. A shell script (Unix) which copies the error log to a backup file and re-initializes the bh.err file is available here: http://www.infobright.org/Downloads/Contributed-Software/.
ControlMessages = 0
Query Execution Log is disabled. Only startup information is logged in bh.err.
ControlMessages = 1
Logging of actions without timestamps etc.
ControlMessages = 2
This setting will add the timestamp and a session number in square brackets. The session number is maintained by a querying tool – some tools use different numbers for each subsequent query, some other will keep the number for the next queries. The number in the log is useful to determine which message is generated by which query (running in parallel). For example:
2009-05-11 14:30:10 [6] (t0): 25 all packrows, 25 to open (including 25 full)
2009-05-11 14:30:11 [2] Displaying result: 1000 rows.
These two log messages are connected to two different queries (session 6 and 2) and should not be understood as a sequence of events.
ControlMessages = 3 (Added on 3.4.0 svn rev 7877)
This setting will add the following four values:
[ Total Mem/Free Mem (M), Total CPU cores/Free CPU cores ]
Example:
2010-04-08 15:02:01 [ 3894M/2467M, 4/0.44 ] Total data packs actually loaded (approx.): 0
Total Mem = total memory in system (3894M),
Free Mem = RES + system cache (2467M),
Total CPU cores (4),
Free CPU cores = Free CPU over last minute.
If user specifies query throttle parameters in the .infobright file, then it will add more logs such as:
[ Total/Free Mem(M), Total/Free CPU coress, Waiting queries/Running queries/Throttle/Average query wait time in queue ]
.infobright can have:
<brighthouse>
…
<throttle>
<limit > 4 </limit>
</throttle>
</brighthouse>
Example:
2010-04-08 15:02:01 [ 3894M/2467M, 4/0.44, 0/1/4/0.00 ] Total data packs actually loaded
(approx.): 0
Note that for a large number of short queries (i.e. generating many log lines based on system info) a slowdown may be noticed for ControlMessages 3 and higher.
ControlMessages = 4 (added on 3.4.2)
The same as 3, additionally outputs .infobright configuration to the brighthouse.log file.
Query compilation
When a query starts, its compiled version is displayed in the log, as a sequence of internal Infobright execution steps:
2010-06-03 15:16:50 [1] T:-1 = TABLE_ALIAS(T:0,"dim_dates")
T:-2 = TMP_TABLE(T:-1)
A:-1 = T:-2.ADD_COLUMN(<null>,COUNT,"count(*)","ALL")
RESULT(T:-2)
The compilation steps may be helpful for resolving real table names (as above: table t0 is “dim_dates”). Expressions, constants and column names in the query are transformed into internal Virtual Column objects, which definition may be found in the query compilation log. ADD_COLUMN is used to define and name (alias) output columns.
Query execution steps
The query log is a useful tool for identifying bottlenecks and pinpointing errors. The following messages may indicate problems:
Table ./mydb/clients (41) accessed by MySQL engine.
The Infobright table was accessed by MySQL engine, instead of Infobright one. Possible reasons:
- Unsupported syntax is used.
- Join with non-Infobright table is performed.
- MySQL format is used in export, instead of Infobright text format.
- MySQL is gathering statistics of Infobright tables either for ETL tool, or to optimize query (e.g. outer joins) before sending it back to Infobright engine.
All cases except the last one mean that the query will be very slow.
If the query is not switched to MySQL, then every part of it (WHERE part, aggregations etc.) will produce specific logs.
WHERE conditions
The first phase of query execution is calculating WHERE conditions (except joins, i.e. conditions mixing columns from different tables). The following example comes from TPC-H benchmark database:
select count(*) from lineitem join orders on l_orderkey = o_orderkey
where o_orderkey = 3004704 and l_tax > 0.02 and o_orderpriority = '3-MEDIUM' and l_tax > 0;
2010-06-18 13:54:09 [1] Merging conditions...
2010-06-18 13:54:09 [1] Initial execution plan (non-join):
2010-06-18 13:54:09 [1] Cnd(0): VC:0(t1a0) BET. VC:8(3004704) AND VC:9(3004704) (0)
2010-06-18 13:54:09 [1] Cnd(1): VC:7(t0a0) BET. VC:21(3004704) AND VC:22(3004704) (0)
2010-06-18 13:54:09 [1] Cnd(2): VC:4(t1a5) BET. VC:18(4) AND VC:18(4) (17.21)
2010-06-18 13:54:09 [1] Cnd(3): VC:2(t0a7) BET. VC:10(0.03) AND VC:11(+inf) (19.8)
2010-06-18 13:54:09 [1] Packs/packrows after KN evaluation:
2010-06-18 13:54:09 [1] (t0) Pckrows: 9156, susp. 1 (9155 empty 0 full). Conditions: 2
2010-06-18 13:54:09 [1] (t1) Pckrows: 2289, susp. 1 (2288 empty 0 full). Conditions: 2
2010-06-18 13:54:09 [1] Packrows after exact evaluation (WHERE):
2010-06-18 13:54:09 [1] (t0): 9156 all packrows, 1 to open (including 0 full)
2010-06-18 13:54:09 [1] (t1): 2289 all packrows, 1 to open (including 0 full)
The log reflects main phases of one-dimensional (non-join) condition evaluation.
Merging conditions is a message indicating nontrivial transformations of conditions before execution. Nearly all conditions are somehow transformed (e.g. encoded to an internal value representation), but in this case two steps are important for performance:
join ... on l_orderkey = o_orderkey where o_orderkey = 3004704
TRANSFORMED TO:
[no join condition] where o_orderkey = 3004704 and l_orderkey = 3004704
One-dimensional conditions are usually much faster than joins, so we are trying to make such transformation whenever possible. Additionally, one of the remaining conditions is superfluous:
l_tax > 0.02 and l_tax > 0
TRANSFORMED TO:
l_tax > 0.02
Other transformations include changing all equalities and inequalities into unified BETWEEN form, and changing a text condition into a numerical one, as the column was declared as LOOKUP. We can inspect all the changes by analyzing the final list of transformed conditions (Initial execution plan section of log). The list of conditions contain indication of virtual columns used. VC:n means a virtual column number n (defined in the query compilation section), and its short description is given in the parentheses. E.g. t1a0 means column 0 of table 1 (numbers start with 0), the number of table reflects table order in the FROM part of query; column number reflects column order in table definition. Constants (like 3004704 above) are also defined as virtual columns. Each condition is augmented with its weight (calculated basing on table sizes and Knowledge Nodes), and the conditions with smaller weight are scheduled to be calculated first.
Rough phase of conditions execution starts just after the plan is displayed and finishes with “Packs/packrows after KN evaluation” message. This part of log summarizes the result of rough evaluation (based on Knowledge Grid) of conditions, table by table. E.g.:
2010-06-18 13:54:09 [1] (t0) Pckrows: 9156, susp. 1 (9155 empty 0 full). Conditions: 2
This summary for table t0 contains the following information:
- 9156 – the total size of table (in data packs), which gives about 600 mln rows,
- susp. 1 – the number of suspected packs, i.e. data packs for which we have to uncompress data and check conditions,
- 9155 empty – all these data packs were excluded basing on Knowledge Grid statistics,
- 0 full – these data packs are resolved as fully relevant, i.e. we does not need to uncompress them and check conditions (as they will be met), but we may need the packs for some columns to create output,
- Conditions: 2 – a number of non-join conditions for the given table.
Exact execution will actually open data packs (the suspected ones) and check conditions. Sometimes the number of opened data packs may be lower than expected; e.g. if we manage to exclude the whole data pack with one condition, then we don’t need to open it for the other column. The summary message in “Packrows after exact evaluation” part is constructed in the way similar to the rough part.
In IEE a mechanism of parallel execution of conditions is available. The number of used threads depends on data size, server load and IEE settings. When the parallel execution of a given condition is actually used, a message is displayed:
2011-03-02 12:09:01 [1] Cnd(1) on 8 threads
Joins
Joining conditions are organized in groups regarding the same pairs of tables, and each group is executed by one of the following join algorithms:
- Hash join (hash block algorithm). There are two sides of join: the traversed one (which is cached in a memory hash buffer part by part – usually this is the smaller table) and the matched one (which is then matched against the cached values as a whole, for every contents of the hash buffer). This algorithm is used in cases when at least one join condition is an equality of columns from different tables, or functions of them (e.g. f(t1.a) = g(t2.b)). Several equality conditions can be combined into one for hashing.
- Map join. It is a fast algorithm designed for easy, numerical (equality condition) cases. A smaller table is scanned and a mapping formula (piecewise linear function represented by offsets) is found between a key value and row number. Then this formula is used to transform the key values of the larger table into join result (output tuples).
- Sorter join. In this algorithm two sorters are created and then values are matched in the order of key values. Inequality conditions (like t1.a < t2.b, but except not equal operator) are executed this way.
- Joiner loop is the least efficient joining algorithm in Infobright. It is a quadratic loop through all pairs of rows, for each pair the condition is checked out. This type of join is used when none of the other algorithms can be used, e.g. for an implicit join conditions (e.g. f(t1.a, t2.b) = const).
- If there is more than one condition for the same pair of tables (a conjunction), the algorithm will try to find all equality conditions and execute hash join; if not possible, an inequality will be localized to use sorter join, otherwise all conditions will be executed by loop join. All conditions not executed by the main algorithm (e.g. the second inequality in t1.a between t2.b and t2.c) will be checked against output tuples of the algorithm, and no additional passes are needed.
The messages produced during the join execution depend on the algorithm used.
2010-06-14 14:47:01 [1] Join execution plan:
2010-06-14 14:47:01 [1] Cnd(0): VC:2(t0a0) = VC:3(t1a0) (34.2)
These messages indicate the beginning of joining phase. A list of all join conditions, ordered by theirs weight (a number in parentheses, approximated by table sizes and Knowledge Grid statistics) is displayed. Names of tables joined in each step may be inferred from their numbers (t0, t1 in the example above); they are numbered in order of appearance in query, starting with 0.
Hash join execution is described by messages similar to these:
2010-06-14 14:47:19 [1] Traversed 60398160/150000000 rows.
2010-06-14 14:48:26 [1] Produced 241604746 tuples.
2010-06-14 14:48:44 [1] Traversed 120796320/150000000 rows.
2010-06-14 14:49:56 [1] Produced 483209597 tuples.
2010-06-14 14:50:02 [1] Traversed all 150000000 rows.
2010-06-14 14:50:28 [1] Produced 600037902 tuples.
2010-06-14 14:50:28 [1] Roughly omitted 66.65% packrows.
“Traversed” step is finished when the next chunk of rows (potentially all of them) is decompressed, transformed and stored in memory hash buffer. “Produced x tuples” appears when all rows from the “matched” table (usually this is the larger one) are decompressed and matched against the contents of hash buffer. Resulting tuples are stored in another memory buffer, cached on disk if needed.
For larger joins (like the above example) joining needs many passes through the “traversed” table. Each pass will take similar time, so we can approximate the total joining time basing on the reported progress. The last message indicates the number of data packs we were able to omit in “matching” phase of joining thanks to statistics of hash buffer and KNs.
Map join messages may look like that:
2011-08-15 15:31:14 [1] Join mapping (offset map) created on 399994 rows.
This message reports the size of the smaller table, on which the join mapping formula is found.
Sorter join messages may look like that:
2010-06-14 15:20:36 [1] Joining sorters created for 150000000 and 600037902 tuples.
This message appears when both joined columns are already scanned and turned into sorter buffers. It may take about a half of joining time, depending on data types and value sizes. The next step is to sort the buffers and produce join result by matching keys from both sorter buffers. The buffers are cached on disk if too big for memory.
The following message indicates joiner loop algorithm, which may result in a slow response time:
2009-11-06 18:52:20 [1] Starting joiner loop (4017000000000 rows).
The number in message indicates the number of pairs to be checked. If this number is large (as above), it is probably better to stop query and rethink its formulation, to avoid joins declared in implicit way, e.g.:
...WHERE t1.x + t2.y > 5 ... ---> ...WHERE t1.x > 5 - t2.y ...
The latter formulation will use much faster sorter join algorithm.
2009-05-29 07:41:35 [7] Warning: a big intermediate object created (~268 GB)
The above message means that a substantial disk space will be used to execute the query. In most cases it is not intended and may indicate a mistake in query formulation – e.g. a full Cartesian join of tables with millions of rows.
When a joining (using any of the algorithms) is finished, the final report displays the number of rows in result, numbers of joined tables, information about outer joins etc.:
2010-06-14 14:50:28 [1] Tuples after inner join 0-1 [hash]: 600037902 [11]
A name of algorithm used is also displayed. The last information (like [11]) is the structure of the result: each position indicates one dimension(table), and the symbol on this position means:
- “1” (or other number) when the table is represented as IndexTable object, i.e. it is a vector of tuple numbers, stored in a memory buffer or cached on disk. Dimensions marked as a number create a group of commonly described intermediate result (a kind of virtual subtable); more than one such result may be represented in the same time. E.g. [1122] means that the join result is a cross product of two subtables (one created as a join of first two dimensions, the other as a join of the remaining two ones).*
- “f” when the table is represented as zero/one filter (the row may be present or not). The result of join is interpreted as a cross product of all dimensions described by “f” and all groups of dimensions described as numbers or otherwise.
- “F” and “v” is an indication of Virtual Dimensions Column representation. It is a representation of join result as a filter (“F”) on the “fact” table, and a special version of a vector of tuple numbers (“v”) for a “dimension” table. This is a representation more convenient than the IndexTable one (indicated by numbers).
- “-” when the dimension (table) is not present in query any longer and no description is needed. Only a number of tuples may be inferred from such “forgotten” dimensions.
Initial state is described as “all f”, and each step of joining usually will change the representation. Note that results described as numbers are the most “heavy”, as explicit row numbers must be stored. On the other hand, usually n:m joins cannot avoid such representation.
For many tables and join conditions, messages will describe the whole evolution of result. The following example also comes from TPC-H benchmark database:
select n_name, sum(l_extendedprice), sum(l_discount) as revenue
from customer, orders, lineitem, supplier, nation, region
where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey
and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey
and r_name = 'EUROPE' and o_orderdate >= '1997-01-01' and o_orderdate < '1998-01-01'
group by n_name order by revenue desc;
The join execution log:
2011-03-02 12:18:30 [1] Join execution plan:
2011-03-02 12:18:30 [1] Cnd(1): VC:13(t4a2) = VC:14(t5a0) (16.89)
2011-03-02 12:18:30 [1] Cnd(3): VC:6(t2a0) = VC:7(t1a0) (27.19)
2011-03-02 12:18:30 [1] Cnd(4): VC:11(t3a3) = VC:12(t4a0) (28.11)
2011-03-02 12:18:30 [1] Cnd(5): VC:4(t0a0) = VC:5(t1a1) (29.54)
2011-03-02 12:18:30 [1] Cnd(6): VC:8(t2a2) = VC:9(t3a0) (35.15)
2011-03-02 12:18:30 [1] Cnd(7): VC:10(t0a3) = VC:11(t3a3) (47.17)
2011-03-02 12:18:30 [1] Hash join buffer initialized for up to 277 rows, 4+12 bytes.
2011-03-02 12:18:30 [1] Traversed all 1 rows.
2011-03-02 12:18:30 [1] Produced 5 tuples.
2011-03-02 12:18:30 [1] Tuples after inner join 4-5 [hash]: 5 [fffff-]
2011-03-02 12:18:30 [1] Hash join buffer initialized for up to 33555503 rows, 4+12 bytes.
2011-03-02 12:18:36 [1] Traversed all 22760815 rows.
2011-03-02 12:20:29 [1] Produced 91050840 tuples.
2011-03-02 12:20:29 [1] Tuples after inner join 1-2 [hash]: 91050840 [f22ff-]
2011-03-02 12:20:29 [1] Hash join buffer initialized for up to 277 rows, 4+12 bytes.
2011-03-02 12:20:29 [1] Traversed all 5 rows.
2011-03-02 12:20:29 [1] Produced 200535 tuples.
2011-03-02 12:20:29 [1] Tuples after inner join 3-4 [hash]: 200535 [f22Fv-]
2011-03-02 12:20:30 [1] Hash join buffer initialized for up to 28125749 rows, 4+12 bytes.
2011-03-02 12:20:36 [1] Traversed all 15000000 rows.
2011-03-02 12:21:45 [1] Produced 91050840 tuples.
2011-03-02 12:21:45 [1] Tuples after inner join 0-1 [hash]: 91050840 [2-2Fv-]
2011-03-02 12:21:45 [1] Hash join buffer initialized for up to 376997 rows, 4+16 bytes.
2011-03-02 12:21:49 [1] Traversed all 200535 rows.
2011-03-02 12:23:30 [1] Produced 730648 tuples.
2011-03-02 12:23:30 [1] Tuples after 2 cond. inner join 0-2-3 [hash]: 730648 [--2-2-]
The above join of 6 tables starts with hash joining table 4 and 5 (numbers start with 0). The next steps should be interpreted as follows:
- [fffff-] – Note that some tables are excluded completely, as the engine found that they are used neither by the subsequent joins nor by resulting columns. In this case the last table is omitted.
- [f22ff-] – Hash join of tables 1 and 2 was chosen to be executed as the next one. A result is represented by an IndexTable (an explicit list of tuples), indicated by “2”.
- [f22Fv-] – Tables 3 and 4 are joined by a “large-small” hash join mechanism. At this stage the intermediate result is a Cartesian product of table 0, table 1-2 (already joined), and table 3 (represented as a filter, augmented by a dimension information coming from table 4).
- [2-2Fv-] – Hash join of table 0 and 1-2. The result is also an explicit IndexTable (a vector of tuples numbers), but at this stage we may omit information for table 1 as no longer used.
- [—2-2-] – Note that the last hash join uses 2 conditions joining table 0-2 with 3-4 in one step. The result is a list of tuples stored as an IndexTable object and only two tables are actually represented (needed for the following aggregation).
A big intermediate result reported in a series of joins may suggest that order of joins is not optimal (however, it is virtually impossible to change it by hand, except using subqueries).
Aggregations
Aggregations (GROUP BY, or SELECT DISTINCT) are based on large hash buffers (for determining groups) containing aggregation counters or other buffers (like a buffer for current MAX value). Values are binary encoded (a kind of compression) before sending them to the hash table. An example of log messages (TPC-H benchmark database):
select o_orderstatus, o_orderpriority, count(*), max(o_orderdate) from orders group by 1,2;
2010-06-17 15:26:45 [1] GroupTable initialized for up to 1280 groups, 2+14 bytes (0.061 MB)
2010-06-17 15:26:46 [1] Aggregating: 150000000 tuples left, 0 gr. found so far
2010-06-17 15:27:25 [1] Aggregated (15 gr). Omitted packrows: 0 + 0 partially, out of 2289 total.
A hash-based table for handling groups and aggregation buffers is initialized basing on an upper approximation of a number of groups (“positions” in the log above) and sizes of all values we need to store. The first number (2) is the total size of grouping columns (which may be internally compressed), the second one (14) is a total number of aggregation buffers (counters). SELECT DISTINCT queries are equivalent to a grouping without the aggregation part.
A number of rows (tuples) to be aggregated is reported at the beginning of aggregation loop. Calculations end with a report about a number of groups found, as well as a number of data packrows omitted by rough part; “n + m partially” means that n packrows were omitted completely (both grouping and aggregated columns), and for m packrows some columns were omitted, and some other had to be decompressed. High values of n and m means that the Knowledge Grid is helpful for this query.
For some long-lasting (multipass) aggregations it is possible to forecast approximate total time basing on logged timestamps:
2009-11-06 19:05:57 [1] Aggregating: 1000000000 tuples left, 0 gr. found so far
2009-11-06 19:17:59 [1] Aggregating: 864292787 tuples left, 3677031 gr. found so far
...
Using timestamps we can conclude that 722 sec. was spent on 136 mln. rows (13.6%), which means that the query will probably run for another 1 h 16 minutes.
Distinct modifier inside an aggregating functions needs additional hash-based mechanism, which may potentially need many passes through data. Before aggregation, column statistics are checked to determine an upper approximation of distinct values. An example of count(distinct …) query messages:
2010-06-07 18:07:18 [1] Adding dist. column, min = -inf, max = +inf, dist = 1000000000
2010-06-07 18:07:18 [1] GroupDistinctTable initialized as Hash(63161283), 1+16 bytes.
2010-06-07 18:07:19 [1] Aggregating: 1000000000 tuples left.
2010-06-07 18:29:04 [1] Next distinct pass: 908767724 rows left
2010-06-07 18:39:22 [1] Next distinct pass: 817534159 rows left
...
2010-06-07 19:24:08 [1] Next distinct pass: 5769 rows left
2010-06-07 19:24:09 [1] Aggregated (1 gr). Omitted packrows: 0 + 0 partially, out of 15259 total.
The meaning of messages:
- A column in count(distinct …) is either string or numerical with a wide range of values (thus infinity in the statistics). For a numerical columns with more narrow domain we can use more optimizations. Distinct values are in this case approximated by a number of tuples.
- Distinct values are found by a hashing mechanism. The size of values to be counted is 16 bytes (plus 1 technical). Wide strings are represented by MD5 codes (20 bytes), but in this case the original values appeared to be shorter. The hashing buffer contains a space for 63 mln. values (17 bytes each), which makes 1 GB.
- The buffer is used several times, because the data itself are bigger than 1 GB. Each pass is reported, and the total time may be approximated basing on the time of one pass and the number of rows left.
- The last message displays the number of groups (1 group, because it is just count(distinct…) without GROUP BY part). If any pack was omitted because of rough checks, it is reported as “omitted packrow”. Partially omitted packrow may occur if there is more than one column involved (more aggregations or grouping columns), and it is the situation of a data pack omitted just for one of the aggregations.
If the column statistics indicate that there is a limited number of values, a 0/1 filter is used instead of hash table (it is faster and always one-pass):
2010-06-16 17:16:06 [1] Adding dist. column, min = 0, max = 10, dist = 11
2010-06-16 17:16:06 [1] GroupDistinctTable initialized as Filter(11)
If there is no “GroupDistinctTable initialized” message in the log in spite of using distinct modifier in a query, then probably the modifier was omitted by optimizer basing on column statistics (e.g. all values are distinct anyway).
Sorting
The sorting algorithm works as follows:
- Encode key columns and some of the output columns into memory buffers.
- Encode the rest of output columns as row numbers, if there is only one data pack to be opened.
- Sort the encoded values, rearranging also the stored output values/output row numbers. Three algorithms are used for in-memory sorting of values:
-
o Quick Sort (non-recurrent implementation),
o Counting Sort for sorting small integer domains,
o Heap Sort for sorting with small LIMIT (i.e. TOP n operation). - Quick Sort stores the sorted data on disk in large (64 MB – 2 GB) cache buffers, which are then processed by Merge Sort to create output. The other sorting algorithms are not cached.
- Create output by reading values directly from sorter, or from data pack.
- Send the result to output (MySQL client, export etc.) once a portion is ready.
In case of sorting, the main advantage of log messages is reporting of buffer sizes used. After the sorted columns are analyzed (basing on statistics), an appropriate encoding is determined and buffers are initialized. Consider the following query (TPC-H benchmark database):
select * from nation order by n_name;
There are 4 columns in the table, and we are sorting by char(25), binary collation. Statistics say that the longest string occupy 14 bytes (plus 2 bytes for its length), and an internal compression mechanism may be applied. The reported buffer size says:
2010-06-16 14:33:24 [1] Quick Sort initialized for 25 rows, 6+0+1 bytes each.
2010-06-16 14:33:24 [1] Sorted rows retrieved.
The meaning of the reported sizes is:
- 6 – a size of values to be compared (n_name column, internally compressed),
- 0 – a size of values to be stored together with the sorted ones (the rest of columns, as it is select *),
- 1 – a size of row numbers to be stored instead of original values (thus the previous value might be 0).
Total: 7 bytes x 25 rows = 175 bytes. Apparently no disk caching is needed.
Note that we needed just 1 byte to store a row number of 25-row table.
An example of sorting the join result:
select s_suppkey, s_address, n_name from supplier join nation on n_nationkey=s_nationkey
order by s_suppkey;
2010-06-16 14:47:58 [1] Quick Sort initialized for 1000000 rows, 3+42+1 bytes each.
2010-06-16 15:05:14 [1] Sorted rows retrieved.
Now we have 3 bytes for sorted integers, 42 bytes for address, 1 byte for a row number in table nation. The address is not represented as a row number, because there is a lot of data packs used, and data pack shuffling may occur (degrading the performance radically). The most of the 17 minutes the query took was used to display the result on client console. Note that the result started to be displayed just after “Quick Sort initialized…” message.
The message may not contain the “row number” part if it is not used at all:
2010-06-16 15:19:48 [1] Quick Sort initialized for 1000000 rows, 1+188 bytes each.
For large tables to be sorted, a message is displayed every 100 mln of rows:
2010-06-16 15:14:49 [1] Merge Sort initialized for 600037902 rows, 7+0+4 bytes each.
2010-06-16 15:15:15 [1] Preparing values to sort (16% done).
2010-06-16 15:15:40 [1] Preparing values to sort (33% done).
2010-06-16 15:16:06 [1] Preparing values to sort (49% done).
2010-06-16 15:16:32 [1] Preparing values to sort (66% done).
2010-06-16 15:16:58 [1] Preparing values to sort (83% done).
2010-06-16 15:17:23 [1] Preparing values to sort (99% done).
2010-06-16 15:17:26 [1] Sorted rows retrieved.
Sorting by non-binary collations may make the buffers larger than one may expect. For example, a varchar(100) column containing UTF-8 characters may be sorted as:
2010-06-16 15:36:55 [1] Quick Sort initialized for 80 rows, 2570+109 bytes each.
Sorter needs 2570 bytes to encode one wide string with less than 100-character into a comparable form. Apart of that, 109 bytes are needed to store the original value, as we need it to be displayed as a query result. Even for a large memory settings, when the RAM sorter buffer is 1 GB, we will have to cache the result on disk after passing just 0.3 mln of rows. The same remarks apply to UTF-8 (and other non-binary collations) used in GROUP BY / DISTINCT.
Union
Execution of UNION / UNION ALL also will produce log messages, indicating the algorithm used and result size:
2011-03-02 13:54:18 [1] UNION: materializing components.
2011-03-02 13:54:18 [1] UNION: generating result (84 rows).
“Materializing components” means that a temporary table must be created to store intermediate results. This step is not always needed. For UNION a message “excluding repetitions” indicates that a nontrivial operation of detecting and omitting repeated rows must be performed.
Subqueries
Independent (non-parameterized) subqueries or views generate their own messages when they are evaluated. If more than one “Initial execution plan” lines can be found in a log, the most probable reason is calculating a subquery.
Parameterized subqueries does not display messages, as they are usually executed many times (for each value of a parameter).
End of query
The following messages are displayed on the end of query:
2010-06-16 15:36:55 [1] Result: 80 rows.
2010-06-16 15:36:55 [1] Total data packs actually loaded (approx.): 1
2010-06-16 15:36:55 [1] ----------------------------------------------------------------------------
The number of data packs loaded during the whole query is a good approximation of the overall computing time, as decompression usually takes the most of processing time.