This article outlines some reasons why a particular query may be slower on
one machine than another.
The article assumes that software versions are the
same on the two machines being compared.
Computer systems are deterministic: the same inputs should produce the same
outputs. Unless the systems under comparison are exactly the same, their
behavior cannot be expected to be exactly the same. Factors such as CPU speed,
memory size, memory allocation and usage, and other concurrent database and
non-database activity can all have an effect on the relative performance of
different systems. Additionally, any differences in data, including volume and
distribution, can have an effect. Some of these factors feed into the
optimizer's decisions and may even cause a difference in execution plan.
If all the inputs to a query running on a particular version on different
machines are the same, but the performance is different, then this implies that
the differences are introduced by something outside of Oracle. Different
hardware and settings can have a significant effect on the relative performance
of different systems.
NOTE: If the comparison of
different systems is part of a benchmarking exercise then support for this is
provided via the relevant Oracle Account Manager and not through normal Support
Services. Please contact your Account Manager for assistance with benchmarking
queries.
Potential
issues:
Issues can be divided into two main categories -
those that influence the optimizer's plan choice resulting in different access
paths for queries, and those that do not change the execution plan but affect
the performance of all or part of the query execution.
Plan Differences -
plan changes can be caused by a number of different issues:
- Is the Query really the same? - Any changes in a SQL statement can have an
impact on the selection of an access path for a particular query. However
similar the statements may be, there may be unexpected differences that result
in reduced (or increased) performance. If the select list of a query is changed
to add or modify columns, or a where clause has altered predicate values, then
this can cause the optimizer to choose a different execution plan. Changes in
execution plans are likely to cause different response times when the query is
executed. Additionally, changes in data volumes in the underlying objects can
make a large difference. Remember that changes in the predicates (i.e. the
where clause) can change the proportion of rows returned from an object -
different values may return a larger or smaller proportion of the rows, and
this in turn can have a significant effect on the performance of queries.
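A quick way to confirm whether two apparently identical statements really get
the same plan on both systems is to generate and compare the plans directly.
A minimal sketch (the table name EMP and the predicate are illustrative only):

  EXPLAIN PLAN FOR
    SELECT ename FROM emp WHERE deptno = 10;

  -- Display the plan the optimizer would use on this system
  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Running the same two steps on each system and comparing the output shows
whether the access path itself differs before any timing comparison is made.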
- Different statistics - The Cost Based Optimizer (CBO) uses statistics to
determine the execution plan for a particular query. Differences in statistics
can cause the CBO to generate different execution plans, which in turn can lead
to differences in performance of the same query. Assuming that the same
statistics levels are gathered on the different systems and that the systems
have the same parameter settings, statistics differences are attributable to
differences in data volume and data distribution. Potentially, with reduced
sample sizes, sampling could produce different statistics due to chance
groupings of data that may be the result of differing loading methods etc. On
10g and above it is recommended to use an estimate sample size of 100% (if it
is possible for this to fit within the maintenance window), even if that means
that statistics are gathered at a reduced frequency. If 100% is not feasible,
try using at least an estimate of 30%. Generally, the accuracy of the
statistics overall outweighs the day-to-day changes in most applications.
Additionally, care should be taken that column statistics (histograms) are
gathered in the same way on the systems (a sketch of gathering statistics with
explicit options follows this item). Note that the defaults for statistics
gathering are not necessarily the same across versions, for example:
- ESTIMATE_PERCENT: defaults:
- 9i : 100%
- 10g : DBMS_STATS.AUTO_SAMPLE_SIZE (using a very small estimate percentage)
- 11g : DBMS_STATS.AUTO_SAMPLE_SIZE (using a larger estimate percentage)
- METHOD_OPT: defaults:
- 9i : "FOR ALL COLUMNS SIZE 1" effectively no detailed column statistics.
- 10g and 11g : AUTO - DBMS_STATS decides in which columns a histogram may
help to produce a better plan.
- Note that features such as Dynamic Sampling
and System Statistics can also have an effect
on query access paths even if parameters and gathered statistics on user objects
are the same. See the information in the following sections for details.
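As mentioned above, one way to reduce statistics-related differences is to
gather statistics with explicit, matching options on both systems rather than
relying on version-specific defaults. A minimal sketch using DBMS_STATS (the
schema and table names are illustrative):

  BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname          => 'SCOTT',
      tabname          => 'EMP',
      estimate_percent => 100,
      method_opt       => 'FOR ALL COLUMNS SIZE 1',
      cascade          => TRUE);
  END;
  /

The resulting statistics can then be compared across systems using DBA_TABLES
(NUM_ROWS, BLOCKS, LAST_ANALYZED) and DBA_TAB_COL_STATISTICS to confirm that
the optimizer is working from similar inputs on both sides.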
- Different volume and distribution of data - although data differences may
appear trivial, they may be enough to change a plan from one to another if the
choice of one access path over another is close in cost terms. Even small
differences in the number of rows can be magnified many times over depending on
the selected execution plan. A smaller data set is not necessarily
representative of a larger one, so performance may not scale as expected. It is
not only the volume of data which can be an issue, but also the method used to
load or insert the data in the first place. For example, data with a different
distribution (such as pre-sorted data) may produce different statistics that
make certain access paths appear more attractive than they actually are.
- Different Block Sizes - The way that the data itself is stored can have an
impact on the access path chosen for a query. If the block size selected at
database or tablespace level is different on the two systems, then this can
result in different access paths being chosen. This is because the objects in
question may occupy fewer data blocks overall with a larger block size, making
certain operations seem more (or less) efficient in terms of I/O than on the
comparative system. Note that block size changes can even affect cases where no
statistics have been gathered, since the default statistics used by the Cost
Based Optimizer are derived in part from the block size and the number of
blocks. Any storage difference that could affect the number of blocks taken up
by an object could affect plans. Even the same data in two different databases
(or in the same database) with different block sizes, pctused/pctfree or
initial/next settings (leading to the use of a different number of blocks) can
lead to different default statistics.
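To check whether an object occupies a different number of blocks on the two
systems, and which block size it is stored in, the data dictionary can be
queried on each side. A minimal sketch (the owner and table name are
illustrative, and BLOCKS/NUM_ROWS are only populated once statistics have been
gathered):

  SELECT t.table_name, t.blocks, t.num_rows, ts.block_size
  FROM   dba_tables t, dba_tablespaces ts
  WHERE  t.tablespace_name = ts.tablespace_name
  AND    t.owner      = 'SCOTT'
  AND    t.table_name = 'EMP';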
- Different Parameter settings - there are a large number of parameters that
can have an effect on the optimizer. If parameter settings are the same, then
the optimizer should produce the same execution plan given the same SQL and
statistics input. Specific things to look out for here are the settings for the
following (a sketch of how to compare these settings appears after this list):
- Dynamic Sampling: If dynamic sampling is
enabled then the optimizer may sample information at runtime to determine a
better plan for a query. If data distribution or volume is different then this
sampling could return different statistics and this could clearly make a
difference to the chosen execution plan. See:
- Optimizer Features Enable: "OPTIMIZER_FEATURES_ENABLE" is an 'umbrella'
parameter that covers a large variety of version-specific optimizer changes.
Changing this parameter can enable or disable features, or change the defaults
they use, and the result of this can be execution plan changes. See:
- Note:62337.1 Init.ora
Parameter "OPTIMIZER_FEATURES_ENABLE" Reference Note
- Parallelism : If the parallel degree available on a system is different, then
the optimizer may select different execution plans for queries that rely on
parallel execution for their access paths to perform well. See:
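As mentioned above, a quick way to check and, for testing, align the
optimizer-related settings on the two systems is to compare the relevant
parameter values and set them explicitly at session level. A minimal sketch
(the parameter values shown are examples only, not recommendations):

  -- Compare the settings that most commonly explain plan differences
  SELECT name, value
  FROM   v$parameter
  WHERE  name IN ('optimizer_features_enable',
                  'optimizer_dynamic_sampling',
                  'parallel_max_servers');

  -- Temporarily align the settings in a test session
  ALTER SESSION SET optimizer_features_enable = '10.2.0.4';
  ALTER SESSION SET optimizer_dynamic_sampling = 2;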
- Schema Differences - Differences in indexes, table partitioning and storage
clauses can also have a big impact on plan choice. Use SQLTXPLAIN.SQL to verify
whether there are such differences. See:
- Note:215187.1
SQLTXPLAIN.SQL - Enhanced Explain Plan and related diagnostic info for one SQL
statement
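In addition to SQLTXPLAIN, a simple dictionary comparison of the objects
referenced by the query can highlight missing or differing indexes. A sketch
(the owner and table name are illustrative):

  SELECT index_name, uniqueness, partitioned
  FROM   dba_indexes
  WHERE  table_owner = 'SCOTT'
  AND    table_name  = 'EMP'
  ORDER BY index_name;

  SELECT index_name, column_name, column_position
  FROM   dba_ind_columns
  WHERE  table_owner = 'SCOTT'
  AND    table_name  = 'EMP'
  ORDER BY index_name, column_position;

Running these on both systems makes it easy to spot an index that exists on one
side only, or that has a different column order.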
- PGA and sort area size differences - the availability of large amounts of
PGA can influence plan choices, since sorts or hash joins are less likely to
spill over to disk. With a large PGA, plans that would otherwise have involved
the high I/O associated with this spill-over may be selected, since the reduced
I/O makes them much more attractive to the optimizer.
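To see whether the two systems differ in the memory available for sorts and
hash joins, the relevant parameters and work area statistics can be compared.
A minimal sketch:

  SELECT name, value
  FROM   v$parameter
  WHERE  name IN ('pga_aggregate_target',
                  'workarea_size_policy',
                  'sort_area_size',
                  'hash_area_size');

  -- How many work areas completed in memory (optimal) versus spilling
  -- to disk (onepass/multipass)
  SELECT name, value
  FROM   v$sysstat
  WHERE  name LIKE 'workarea executions%';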
- Different NLS settings may affect sort orders and mean that certain indexes
are not available to avoid sorts or would be inefficient to use. See:
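To check whether differing NLS settings could be influencing sorting and index
usage, the session-level settings can be compared on both systems. A minimal
sketch:

  SELECT parameter, value
  FROM   nls_session_parameters
  WHERE  parameter IN ('NLS_SORT', 'NLS_COMP', 'NLS_LANGUAGE');

If NLS_SORT is anything other than BINARY, a standard b-tree index on a
character column generally cannot be used to avoid a sort unless a suitable
linguistic index exists.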
- Different Bind Variable Peeking - The default value for
_OPTIM_PEEK_USER_BINDS is TRUE in 9.0 and above. This means that the first time
a SQL statement with bind variables is parsed, the optimizer will examine the
value of the bind variable and use that value when creating the execution plan.
If that bind value is not representative of the majority of the supplied
values, then it may cause the optimizer to choose a different execution plan
than it would for one of the more representative values. If that is the case,
then the chosen plan may perform in a suboptimal manner. See:
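On 10g and above, one way to see which bind value was peeked when the plan was
built is to display the cursor's plan with the peeked binds included. A sketch
(the statement, table and bind value are illustrative):

  VARIABLE dept NUMBER
  EXEC :dept := 10

  SELECT ename FROM emp WHERE deptno = :dept;

  -- Show the plan for the last statement together with the bind value
  -- that was peeked at hard parse time
  SELECT *
  FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'TYPICAL +PEEKED_BINDS'));

Comparing the peeked values on the two systems can explain why the same
statement was optimized differently on each.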
- Different System Statistics - system
statistics gathered using the DBMS_STATS.GATHER_SYSTEM_STATS procedure and
stored in the SYS.AUX_STATS$ table will be used by default when the
"_OPTIMIZER_COST_MODEL" (hidden) parameter (which defaults to CHOOSE) is set to
CHOOSE or CPU. These statistics can cause access paths for queries to change
dependent upon the recorded capability of a system. Systems with different
capabilities may use different access paths for the same query. See:
- Note:470316.1 Using
Actual System Statistics (Collected CPU and IO information)
- Note:149560.1 Collect
and Display System Statistics (CPU and IO) for CBO usage
- Note:153761.1 Scaling
the System to Improve CBO optimizer
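To compare the system statistics the optimizer is using on each system, the
stored values can be queried directly and, if required, gathered in the same
way on both sides. A minimal sketch (the one-hour gathering interval is only an
example):

  -- What the optimizer currently knows about this system's CPU and I/O
  SELECT pname, pval1
  FROM   sys.aux_stats$
  WHERE  sname = 'SYSSTATS_MAIN';

  -- Gather workload system statistics over a representative 60-minute period
  EXEC DBMS_STATS.GATHER_SYSTEM_STATS(gathering_mode => 'INTERVAL', interval => 60)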
Query Performance Differences with Unchanged Execution Plan
- Is the Query really the same? - As with a changed execution plan, any changes
in a SQL statement can have an impact on performance. Adding or changing
columns in a select list, especially the addition of functions or scalar
sub-queries, may add significantly to the CPU usage of a query since these will
be executed for every row returned.
- Platform - Differences in any part of the technology stack underneath the
database can cause differences in performance. CPU specification, memory, I/O
subsystems, Operating System version and patch level can all have an impact.
Price, physical size and branding are irrelevant. Platform choices are made for
a variety of reasons; price and performance are just two of them. Similar or
even dissimilar price is no guarantee of similar or dissimilar performance.
- Network - Any operation that requires pulling data across a network, for
example across a dblink, will be affected by network performance. Even
networks of the same specification can perform differently depending on the
load on the
particular network segment. The larger the amount of data being sent across a
network, the greater impact such variances in network performance are likely to
have.
- CPU - CPU intensive operations will take longer when executed on slower
CPUs. If an application is CPU intensive, then no matter how 'powerful' a
system is in terms of concurrent users and throughput, unless the processing is
scalable and can be spread over the available resources, the performance may
come down to raw CPU speed.
- Memory - It is possible that memory constraints may mean that more sort or
join information spills to disk, incurring the inevitable I/O penalties and
reducing the performance of the queries. Additionally, operating system memory
usage and constraints due to concurrent usage by Oracle and non-Oracle
processes should be monitored to avoid swapping and paging.
- Disk - I/O throughput can make a large difference to I/O dependent queries.
Different I/O subsystems may be tuned for massive throughput as opposed to raw
speed. Additionally, network-based storage may depend on the speed of, and load
on, the network itself. These problems can be alleviated to a degree by the
buffer cache and O/S caching. See:
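A simple way to compare how hard the I/O subsystem is being worked, and how
quickly it is responding, on the two systems is to look at the cumulative I/O
wait events. A minimal sketch (times are in centiseconds):

  SELECT event, total_waits, time_waited, average_wait
  FROM   v$system_event
  WHERE  event LIKE 'db file%'
  ORDER BY time_waited DESC;

A markedly higher average wait for events such as 'db file sequential read' on
one system points to a difference in I/O response time rather than in the query
itself.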
- Different Block Sizes - Just as in the section above regarding plan
differences, using different block sizes can have an effect even if the access
path has not changed. Storing data in a different block size can change the
number of I/Os required and thus change the relative level of stress on the I/O
subsystem.
- Concurrency - If the concurrent usage of a system is not the same as that of
the system it is being compared with, then its relative performance may appear
to be different. Ensure that the load profiles of the systems being compared
are the same in order to make valid comparisons.
- Schema Differences - Differences in indexes, table partitioning and storage
clauses can have a big impact on execution time. If the data is stored in a
different way then more blocks may be required to be read to return the same
information. Use SQLTXPLAIN.SQL to verify whether there are such differences.
See:
- Note:215187.1
SQLTXPLAIN.SQL - Enhanced Explain Plan and related diagnostic info for one SQL
statement
- Parallelism - When the Optimizer selects an execution plan that it expects to
use Parallel Execution, it has to assume that sufficient resources will be
available at execution time to run the query at the desired degree of
parallelism. If those resources are unavailable at execution time, then the
query may be executed at a reduced degree of parallelism or in serial, which
can have a significant effect on the overall performance of the query and of
the system as a whole. See:
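To check whether parallel operations are actually being run at the requested
degree or are being downgraded at execution time, the downgrade statistics can
be compared on the two systems. A minimal sketch:

  SELECT name, value
  FROM   v$sysstat
  WHERE  name LIKE 'Parallel operations%';

A high count of downgraded operations on one system, relative to the other,
suggests that the slower machine may simply not have had the parallel server
processes available when the query ran; V$PX_SESSION shows the actual degree
for statements that are currently executing.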