What is the secret behind Exadata's higher throughput?
Exadata ships less data through the pipes between the storage servers and the database nodes in the RAC cluster. Its ability to achieve massive parallelism by running parallel processes across all the nodes in the cluster also provides a much higher level of throughput. In addition, it has much bigger pipes in the cluster, using an InfiniBand interconnect for inter-instance data block transfers that is as much as 5x faster than Fibre Channel networks.
What are the different Exadata configurations?
The Exadata appliance comes as a Full Rack, Half Rack, Quarter Rack, or Eighth Rack.
What are the key hardware components?
- DB servers
- Storage server cells
- High-speed InfiniBand switches
- Cisco switch
What are the key software features?
- Smart Scan
- Smart Flash Cache
- Storage Index
- Exadata Hybrid Columnar Compression (EHCC)
- IORM (I/O Resource Manager)
What is a Cell and Grid Disk?
Cell and Grid Disks are logical components of the physical Exadata storage. A cell, or Exadata Storage Server cell, is a combination of disk drives put together to store user data. Each Cell Disk corresponds to a LUN (Logical Unit) that has been formatted by the Exadata Storage Server software. Typically, each cell has 12 disk drives mapped to it.
Grid Disks are created on top of Cell Disks and are presented to Oracle ASM as ASM disks. Space is allocated in chunks starting from the outer tracks of the Cell Disk and moving inwards. One can have multiple Grid Disks per Cell Disk.
What are the parallelism instance parameters used in Exadata?
The parameter PARALLEL_FORCE_LOCAL can be set at the session level to restrict a particular job's parallel processes to the local instance.
How do you test the performance of Exadata?
You can use the "calibrate" command at the CellCLI command line.
What are the ways to migrate onto Exadata?
Depending on the downtime allowed, there are several options:
- Oracle Data Guard
- Traditional export/import
- Tablespace transportation
- GoldenGate replication after a data restore onto Exadata
What types of operations does Exadata "offload"?
Some of the operations that are offloaded from the database host to the cell servers are:
- Predicate filtering
- Column projection filtering
- Join processing
- Backups
What is cellcli?
This is the command-line utility used to manage the cell storage.
How do you obtain info on the Cell Disks?
At the CellCLI command line, you can issue the "list celldisk" command.
How would you create a grid disk?
At the CellCLI command line, you would issue the "create griddisk all .." command.
What are the cellinit.ora and the cellip.ora files used for?
These files identify the storage network: cellinit.ora holds the InfiniBand IP address of the local node, and cellip.ora lists the IP addresses and ports of all the storage cells. The database servers use them to locate and communicate with the cell servers.
Example:
$ cat /etc/oracle/cell/network-config/cellinit.ora
ipaddress1=192.168.47.21/24
$ cat /etc/oracle/cell/network-config/cellip.ora
cell="192.168.47.21:5042"
cell="192.168.47.22:5042"
cell="192.168.47.23:5042"
What operating systems does Exadata support?
Exadata has traditionally run Oracle Linux. More recently, Solaris has also been made available on this engineered system.
How can I tell the current cell software version and the available versions to roll back to?
Useful commands:
imagehistory
imageinfo
I've heard I should keep an eye on the temperature of the cells. How can I do that?
The inlet ambient temperature across all cells can be determined with the following command:
dcli -g cell_group -l root "ipmitool sensor | grep 'Inlet Amb Temp'"
I see STORAGE in my explain plan, but it doesn't appear that my query is offloading, based on runtime and I/O throughput rates. Why?
This can happen if the query is run serially and direct reads aren't chosen:
- Offloading only happens for direct reads (reads into the PGA).
- PQ processes generally always use direct reads, and therefore offload.
- Foreground processes running serially may choose direct or buffered reads.
- Note that in some cases buffered reads are actually better.
My offload performance was great, but all of a sudden it degraded by over 20%. What may be causing this?
Check for a cell reboot and resulting offline disks. Also check for issues above the storage layer, such as a change in execution plan, which can cause this.
One of my cells rebooted. What should I do?
Confirm that the cell is back up and that its grid disks are online in ASM. Check that each cell has 34 or 36 grid disks online:
SQL> select failgroup, mode_status, count(*) from v$asm_disk group by failgroup, mode_status;
Also consider these sources of information:
- ASMCMD> lsdg (see the offline_disks column in the output)
- The ASM instance alert log
- v$asm_disk.repair_timer (seconds remaining until drop)
- v$asm_disk.mode_status
My ASM instance is hung trying to discover disks. What should I check?
Confirm it is hung on discovery (rather than hung for a different reason) via the ASM instance alert log. Once confirmed, try kfod to see if it hangs as well.
If kfod hangs, check that libskgxp11.so has the same size and checksum as libskgxpr.so in all ORACLE_HOMEs (including the CRS_HOME). If not, relink the oracle binary with RDS.
Check the cellinit.ora and cellip.ora files for correctness.
Check both the master and slave diskmon trace files.
I suspect network problems. How do I know my InfiniBand fabric is okay?
Refer to MOS note 1053498.1 and try the following utilities. These return fabric-wide info, so run them from one node, not all:
- /usr/bin/ibdiagnet -c 1000
  Check the errors/warnings summary at the end of the output, and search for details in the log files (/tmp/ibdiagnet*).
- /opt/oracle.cellos/SupportTools/ibdiagtools/infinicheck
  This script generates a lot of IB traffic to measure throughput, so it will impact performance while it is running.
- /usr/sbin/iblinkinfo.pl
  Shows link status on all ports.
What is IORM and what is its role in Exadata?
IORM stands for I/O Resource Manager; it manages the I/O of multiple databases on the storage cells.
How can we check whether Oracle best practices have been configured on Exadata?
Run Exachk and verify the best-practice setup on the Exadata machine.
How many networks are required in Exadata?
- Public/client network: for application connectivity
- Management network: for Exadata hardware management
- Private network: for cluster interconnect and storage connectivity
Which ASM parameters are responsible for auto disk management in Exadata?
- _AUTO_MANAGE_MAX_ONLINE_TRIES: controls the maximum number of attempts to bring a disk online
- _AUTO_MANAGE_EXADATA_DISKS: controls the auto disk management feature
- _AUTO_MANAGE_NUM_TRIES: controls the maximum number of attempts to perform an automatic operation
Smart Scan:
What is Smart Scan?
The data search and retrieval processing can be offloaded to the Exadata Storage Servers. This feature is called Smart Scan. Using Smart Scan, Oracle Database can optimize the performance of operations that perform table and index scans by performing the scans inside the Exadata Storage Server, rather than transporting all the data to the database server.
Smart Scan capabilities include:
1) Predicate filtering
2) Column filtering
3) Join processing
What is predicate filtering, and which conditional operators are supported by predicate filtering?
Exadata Storage Server enables predicate filtering for table scans. Rather than returning all the rows for the database to evaluate, Exadata Storage Server returns only the rows that match the filter condition.
In addition, many common SQL functions can be evaluated by Exadata Storage Server during predicate filtering. The definitive list of which functions are offloadable for your particular version is contained in V$SQLFN_METADATA:
SQL> select * from v$sqlfn_metadata where offloadable = 'YES';
The conditional operators supported by predicate filtering include =, !=, <, >, <=, >=, IS [NOT] NULL, LIKE, [NOT] BETWEEN, [NOT] IN, EXISTS, IS OF type, NOT, AND, OR.
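The effect of storage-side predicate filtering can be sketched conceptually. This is illustrative Python, not Oracle code; the row layout and predicate are invented for the example:

```python
# Conceptual sketch (not Oracle code): storage-side predicate filtering.
# Without offload, every row travels to the database layer before the
# WHERE clause is applied; with Smart Scan, the storage layer applies
# the predicate and ships only the matching rows.

rows = [{"id": i, "amount": i * 10} for i in range(1000)]

def storage_scan(rows, predicate=None):
    """Return the rows a storage cell would ship to the database layer."""
    if predicate is None:                       # traditional scan: ship everything
        return rows
    return [r for r in rows if predicate(r)]    # Smart Scan: filter at storage

shipped_plain = storage_scan(rows)
shipped_smart = storage_scan(rows, lambda r: r["amount"] > 9900)

print(len(shipped_plain), len(shipped_smart))   # 1000 rows vs only 9 rows shipped
```

The reduction in shipped rows is exactly the "less data through the pipes" benefit described earlier.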
How to cache an object in the smart flash cache?
A DBA can cache an object by setting its CELL_FLASH_CACHE attribute to KEEP. The default value of this attribute is DEFAULT.
What are the Exadata cell node unique features?
- Smart Scans
- Hybrid Columnar Compression
- Storage Indexes
- Flash Cache
- Exadata I/O Resource Management in a multi-database environment
What is the location of cell node logs and traces?
$ADR_BASE/diag/asm/cell/`hostname`/trace/alert.log
$ADR_BASE/diag/asm/cell/`hostname`/trace/ms-odl.*
$ADR_BASE/diag/asm/cell/`hostname`/trace/svtrc__0.trc  -- ps -ef | grep "cellsrv 100"
$ADR_BASE/diag/asm/cell/`hostname`/incident/*
/var/log/messages*, dmesg
/var/log/sa/*
/var/log/cellos/*
How to disable Smart Scan?
Set cell_offload_processing=false (the default is true).
What are the general cell-related database views?
SQL> select * from sys.GV_$CELL_STATE;
SQL> select * from sys.GV_$CELL;
SQL> select * from sys.GV_$CELL_THREAD_HISTORY;
SQL> select * from sys.GV_$CELL_REQUEST_TOTALS;
SQL> select * from sys.GV_$CELL_CONFIG;
What is a Bloom filter?
Offloaded joins are accomplished by creating what is called a Bloom filter.
A Bloom filter, conceived by Burton Howard Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. The properties of a Bloom filter make it a very efficient way of determining which values are not in a set. This is very useful for processing join conditions where a significant proportion of the data does not fulfill the join criteria.
Oracle Database 10g Release 2 first used Bloom filters to optimize parallel
join operations. When two tables are joined via a hash join, the first table
(typically the smaller table) is scanned and the rows that satisfy the WHERE
clause predicates (for that table) are used to create a hash table. During the
hash table creation, a Bloom filter bit string is also created based on the
join column. The bit string is then sent as an additional predicate to the
second table scan. After the WHERE clause predicates relating to the second
table are applied, the resulting rows are tested using the Bloom filter. Any
rows rejected by the Bloom filter must fail the join criteria and are
discarded. Any rows that match using the Bloom filter are sent to the hash
join.
With Exadata, the Bloom filter is passed to the storage servers as an
additional predicate. Processing the Bloom filter inside Exadata Storage Server
can reduce the amount of data transported to the database server to process a
join, which in turn can speed up query performance.
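A minimal Bloom filter can be sketched in a few lines of Python. This is a generic illustration of the data structure, not Oracle's implementation; the bitmap size, hash count, and join keys are arbitrary choices for the example:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: no false negatives, possible false positives."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, value):
        # Derive several independent bit positions from one value.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all(self.bits >> pos & 1 for pos in self._positions(value))

# Build side of a hash join: record the join keys that survived the WHERE clause.
# Probe-side rows that fail the filter cannot possibly join and are discarded early.
bf = BloomFilter()
for key in [10, 20, 30]:
    bf.add(key)

print(bf.might_contain(20))    # members always pass
print(bf.might_contain(999))   # non-members almost always fail (false positives possible)
```

The asymmetry is the point: a "no" from the filter is definitive, which is why rows it rejects can be safely discarded before the join.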
Will a Bloom filter work with all types of joins?
No, Bloom filters work only with hash joins.
How to identify a Bloom filter in an execution plan?
You can identify a Bloom filter in a plan when you see :BF0000 in the Name column of the execution plan.
How to disable Bloom filters?
To disable the feature, set the initialization parameter _bloom_pruning_enabled to FALSE.
Does Smart Scan work with encrypted and compressed data?
Yes, Smart Scan works with encrypted and compressed data.
Exadata Storage Server performs Smart Scans on encrypted tablespaces and
encrypted columns. For encrypted tablespaces, Exadata Storage Server can
decrypt blocks and return the decrypted blocks to Oracle Database, or it can
perform row and column filtering on encrypted data. Significant CPU savings can
be made within the database server by offloading the CPU-intensive decryption
task to Exadata cells.
Smart Scan works in conjunction with Exadata Hybrid Columnar Compression so
that column projection and row filtering can be executed along with
decompression at the storage level to save CPU cycles on the database servers.
What are the prerequisites for Smart Scan to occur?
There are 4 basic requirements that must be met for Smart Scans to occur:
1) There must be a full scan of an object; that is, full table scans, fast full index scans, and fast full bitmap index scans.
2) The scan must use Oracle's direct path read mechanism. Direct path reads are generally used by Oracle when reading directly into PGA memory (as opposed to into the buffer cache).
- Direct-path reads are automatically used for parallel queries.
- Direct-path reads may be used for serial queries. They are not used by default for serial small table scans; use _serial_direct_read=TRUE to force direct path reads.
3) Each segment being scanned must be on a disk group that is completely stored on Exadata cells. The disk group must also have the following disk group attribute settings:
'compatible.rdbms' = '11.2.0.0.0' (or later)
'compatible.asm' = '11.2.0.0.0' (or later)
'cell.smart_scan_capable' = 'TRUE'
4) The CELL_OFFLOAD_PROCESSING initialization parameter enables or disables Smart Scan. The default value of the parameter is TRUE, meaning that Smart Scan is enabled by default. If it is set to FALSE, Smart Scan is disabled and the database uses Exadata storage to serve data blocks like traditional storage.
What are the situations that prevent Smart Scan from happening?
- Scan on a clustered table
- Scan on an index-organized table
- Fast full scan on a compressed index
- Fast full scan on a reverse key index
- The table has row-level dependency tracking enabled
- The ORA_ROWSCN pseudocolumn is being fetched
- The optimizer wants the scan to return rows in ROWID order
- The command is CREATE INDEX using NOSORT
- A LOB or LONG column is being selected or queried
- A SELECT .. VERSIONS flashback query is being executed
- A predicate based on a virtual column must be evaluated
- More than 255 columns are referenced in the query
- The data is encrypted and cell-based decryption is disabled
- The table has the CACHE property
- _serial_direct_read is turned off (NEVER)
- The partitioned object's size is less than _small_table_threshold
- Offloading is not done for serial DMLs
- Serial direct read is not applied (and thus no Smart Scan) for SQL from a table function, SQL from DBMS_SQL, or SQL from a PL/SQL trigger
- Smart Scan is not enabled for the SQL inside PL/SQL when the PL/SQL package is invoked via 'CALL plsql'
- It does not work for serial queries issued from shared servers
- The SQL in question must not be quarantined
- High CPU usage on the storage cells
What does it mean that a SQL is quarantined?
When a cellsrv process (on an Exadata storage server cell) terminates and the thread that caused (or rather encountered) the crash was performing a Smart Scan, the SQL plan step that the thread was executing is determined and noted in a file on the cell. When cellsrv restarts, that plan step is considered quarantined. After 3 quarantines, Smart Scan for the database gets quarantined. When Smart Scan is quarantined, cellsrv will not perform any predicate evaluation or filtering; it will issue parallel I/Os and return the blocks as is.
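The escalation rule described above can be sketched conceptually. This is illustrative Python modeling the described behavior, not cellsrv code; the threshold of 3 comes from the text, and the plan-step names are invented:

```python
# Conceptual sketch of the quarantine escalation described above (not cellsrv code):
# each cellsrv crash during a Smart Scan quarantines the offending plan step; once
# 3 quarantines accumulate, Smart Scan itself is quarantined for the database and
# the cell falls back to returning plain blocks.

class CellQuarantine:
    THRESHOLD = 3    # per the text: 3 quarantines escalate to a database quarantine

    def __init__(self):
        self.quarantined_steps = set()

    def record_crash(self, plan_step):
        self.quarantined_steps.add(plan_step)

    def smart_scan_allowed(self, plan_step):
        if len(self.quarantined_steps) >= self.THRESHOLD:
            return False                        # Smart Scan quarantined database-wide
        return plan_step not in self.quarantined_steps

q = CellQuarantine()
q.record_crash("TABLE ACCESS STORAGE FULL SALES")
print(q.smart_scan_allowed("TABLE ACCESS STORAGE FULL SALES"))   # blocked step
print(q.smart_scan_allowed("TABLE ACCESS STORAGE FULL ORDERS"))  # other steps still offload
```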
How to verify from the explain plan that Smart Scan is being used?
Unfortunately, the normal execution plan output produced by the DBMS_XPLAN package will not show you whether a Smart Scan is used or not.
If your explain plan output shows a "TABLE ACCESS STORAGE FULL" operation and the predicate section shows a "storage()" predicate associated with the plan, both of these characteristics indicate that a Smart Scan was possible, but neither provides definitive verification.
How to verify from a 10046 trace that Smart Scan is happening?
One of the most straightforward ways to determine whether a Smart Scan is being used by a query is to enable a 10046 trace on the statement in question. Tracing is a fairly foolproof way to verify whether a Smart Scan was used or not. If it was, there will be 'cell smart table scan' or 'cell smart index scan' wait events in the trace file.
Are there any cases where Smart Scan is considered an inefficient method for data retrieval?
1) Yes, when the data has lots of chained rows. It is possible that the cell is not able to determine the location of a chained row, and the compute node will then have to do additional single-block reads to fulfill the request.
2) Frequently updated tables: in order to do a Smart Scan, a checkpoint needs to be done, and frequent checkpoints slow down the queries. For example, if Oracle notices that a block is "newer" than the current query, the process of finding an age-appropriate version of the block is left for the database layer to deal with. This effectively pauses the Smart Scan processing while the database does its traditional read-consistency processing.
What are Storage Indexes?
Storage Indexes are a very useful Exadata feature; they are transparent to the database and are maintained automatically.
They are not indexes that are stored in the database like Oracle's traditional B-tree or bitmapped indexes, and they are not capable of identifying a set of records that has a certain value in a given column. Rather, they are a feature of the storage server software that is designed to eliminate disk I/O.
They work by storing minimum and maximum column values for disk storage units, which are 1 megabyte (MB) by default; these structures are called region indexes.
Because SQL predicates are passed to the storage servers when Smart Scans are performed, the storage software can check the predicates against the Storage Index metadata (minimum and maximum values) before doing the requested I/O. Any storage region that cannot possibly have a matching row is skipped.
Since the indexes are held in memory on the cell server, if a cell server is restarted the Storage Indexes are lost and must be rebuilt. They are generally created during the first Smart Scan that references a given column after a storage server has been restarted. They can also be created when a table is created via a CREATE TABLE AS SELECT statement, or during other direct-path loads.
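The min/max pruning mechanism can be sketched conceptually. This is illustrative Python, not the cellsrv implementation; the region contents and predicate range are invented for the example:

```python
# Conceptual sketch (not the cellsrv implementation): a region index stores the
# min/max of a column per 1 MB storage region; a Smart Scan predicate is checked
# against those bounds, and regions that cannot contain a match are never read.

regions = [
    {"min": 1,   "max": 100, "rows": range(1, 101)},
    {"min": 101, "max": 200, "rows": range(101, 201)},
    {"min": 201, "max": 300, "rows": range(201, 301)},
]

def scan_with_storage_index(regions, lo, hi):
    """Read only regions whose [min, max] range overlaps the predicate range."""
    read, skipped = [], 0
    for region in regions:
        if region["max"] < lo or region["min"] > hi:
            skipped += 1                         # this region's I/O is avoided entirely
            continue
        read.extend(v for v in region["rows"] if lo <= v <= hi)
    return read, skipped

matches, skipped = scan_with_storage_index(regions, 150, 160)
print(len(matches), skipped)   # 11 matching rows found; 2 of 3 regions skipped
```

Note the pruning is conservative: a region that passes the min/max check may still contain no matching rows, but a skipped region provably cannot.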
Under what conditions would a Storage Index be used?
In order for a Storage Index to be used, a query must meet all of the following:
1. Storage Indexes can only be used with statements that do Smart Scans. The main requirements are that the optimizer must choose a full scan and that the I/O must be done via the direct path read mechanism.
2. There must be a WHERE clause with at least one predicate.
3. Storage Indexes can be used with the following set of operators: =, <, >, BETWEEN, >=, <=, IN, IS NULL, IS NOT NULL.
What conditions prevent the use of Storage Indexes?
1. Storage Indexes are not created on CLOBs.
2. Storage Indexes do not work with predicates that use the != comparison operator.
3. Storage Indexes do not work on predicates that use the % wildcard.
4. Storage Indexes are created and maintained for at most eight columns per table.
5. The customer has set either of the following init.ora parameters:
"_smu_debug_mode=134217728"
or alternatively
"_enable_minscn_cr"=false
What are the statistics related to Storage Indexes?
There is only one database statistic related to Storage Indexes. The statistic, 'cell physical IO bytes saved by storage index', keeps track of the accumulated I/O that has been avoided by the use of Storage Indexes.
Since the statistic is cumulative, it must be checked before and after a given SQL statement in order to determine whether Storage Indexes were used for that particular statement.
Run the following query to check the statistic:
select name, value from v$sysstat where name like '%storage%';
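Because the statistic is cumulative, only the delta across a single statement is meaningful. A sketch of that before/after technique, with stand-in values (the real numbers come from v$sysstat via the query above):

```python
# Conceptual sketch of the before/after delta technique for a cumulative statistic.
# The snapshot values here are invented stand-ins; in practice they come from
# querying v$sysstat before and after the statement under test.

STAT = "cell physical IO bytes saved by storage index"

def storage_index_savings(snapshot_before, snapshot_after):
    """Bytes of I/O avoided by Storage Indexes during the measured interval."""
    return snapshot_after[STAT] - snapshot_before[STAT]

before = {STAT: 5_242_880}    # snapshot taken before running the SQL statement
# ... run the SQL statement under test here ...
after = {STAT: 57_671_680}    # snapshot taken immediately afterwards

saved = storage_index_savings(before, after)
print(saved)   # a positive delta means the statement benefited from Storage Indexes
```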
What is offloading and how does it work?
It refers to the fact that part of the traditional SQL processing done by the database can be "offloaded" from the database layer to the storage layer. The primary benefit of offloading is the reduction in the volume of data that must be returned to the database server; this is one of the major bottlenecks of most large databases.
How to diagnose Smart Scan causing wrong results?
Refer to MOS note 1260804.1 - Exadata: How to diagnose smart scan and wrong results.
Hard Disk Scrubbing
1) How to verify whether hard disk scrub activity is enabled on Exadata?
Answer:
cellcli -e list cell attributes name,hardDiskScrubInterval
2) How to verify that hard disk scrub activity has started on an Exadata cell node?
Answer: On the cell node, as root: cd $CELLTRACE, then check the cell alert log.
=> Scrubbing started:
"Begin scrubbing CellDisk: CD_00_celadm07" in cell alert.log
=> Scrubbing completed:
"Finished scrubbing CellDisk: CD_00_celadm07, scrubbed blocks (1MB): 1488, found bad blocks: 0" in cell alert.log
3) How to check the Exadata hard disk scrub schedule on a cell node?
Answer:
cellcli -e list cell attributes name,hardDiskScrubInterval
Example output: exatestcel12 biweekly
4) How to stop Exadata hard disk scrub activity?
Answer:
=> Stop the scrubbing job (this will also stop the currently running scrubbing job):
cellcli -e alter cell hardDiskScrubInterval=none
5) How to alter the hard disk scrub frequency on an Exadata cell node?
Answer:
cellcli -e alter cell hardDiskScrubInterval=daily
cellcli -e alter cell hardDiskScrubInterval=weekly
cellcli -e alter cell hardDiskScrubInterval=biweekly
6) How to alter the hard disk scrub activity to a specific date and time?
Answer:
cellcli -e alter cell hardDiskScrubStartTime='2013-08-07T21:19:22-07:00'
Hybrid Columnar Compression
What are the types of compression available with EHCC ?
EHCC is available in the following types:
1. Warehouse Compression
Within warehouse compression there are two subtypes:
a. Query HIGH
b. Query LOW
2. Archive Compression
Within archive compression there are two subtypes:
a. Archive HIGH
b. Archive LOW
Can CTAS be used to create EHCC tables ?
Yes, CTAS can be used to create EHCC tables.
Example:
SQL> create table example compress for query high as select * from example2;
In this example "compress for query high" is the EHCC clause that creates the table with compression enabled.
How to convert an existing / non-EHCC table to EHCC ?
Existing non-EHCC tables can be converted to EHCC using alter table commands as below.
Example:
SQL> alter table example move compress for query high;
SQL> alter table example compress for query high;
In the first case, because move is specified, existing data is moved to the new compression format.
In the second case, new data loaded into the existing table gets EHCC compression while existing data is left as-is.
How to disable EHCC compression on a table ?
To disable or uncompress an EHCC table, use alter table commands as below:
SQL> alter table example nocompress;
SQL> alter table example move nocompress;
Can we use the DBMS_REDEFINITION online redefinition package for EHCC tables ?
Yes. DBMS_REDEFINITION supports EHCC tables and can be used to move data online.
How does update work for an EHCC table ?
When a row in an EHCC table is updated, in the background the database does a delete followed by an insert of the row.
While doing so, two major changes happen at the row level:
1. The ROWID of the updated row changes, because of the delete followed by insert.
2. The updated row comes out of EHCC compression and is stored in OLTP compression format.
What is Compression Unit?
Hybrid
Columnar Compression uses a logical construct called a compression unit (CU) to
store a set of rows. When you load data into a table, the database stores
groups of rows in columnar format, with the values for each column stored and
compressed together. After the database has compressed the column data for a
set of rows, the database fits the data into the compression unit.
How to check the compression format of a particular row ?
Use dbms_compression.get_compression_type() to get the compression type at row level.
SQL> desc dbms_compression
FUNCTION GET_COMPRESSION_TYPE RETURNS NUMBER
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 OWNNAME                        VARCHAR2                IN
 TABNAME                        VARCHAR2                IN
 ROW_ID                         ROWID                   IN
It returns the compression type for the input rowid. The constants defined for compression type:
COMP_NOCOMPRESS = 1
COMP_FOR_OLTP = 2
COMP_FOR_QUERY_HIGH = 4
COMP_FOR_QUERY_LOW = 8
COMP_FOR_ARCHIVE_HIGH = 16
COMP_FOR_ARCHIVE_LOW = 32
Example:
SQL> select DBMS_COMPRESSION.GET_COMPRESSION_TYPE('SCOTT','EXAMPLE','AAASi7AAFAAABR6AAA') FROM DUAL;
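The function returns one of the numeric constants listed above. A minimal sketch of translating that return code back into a readable label (plain Python, not Oracle code):

```python
# Hedged sketch: map GET_COMPRESSION_TYPE return codes to labels, using the
# constant values documented above.
COMPRESSION_TYPES = {
    1: "NOCOMPRESS",
    2: "OLTP",
    4: "QUERY HIGH",
    8: "QUERY LOW",
    16: "ARCHIVE HIGH",
    32: "ARCHIVE LOW",
}

def compression_name(code: int) -> str:
    """Translate a GET_COMPRESSION_TYPE return code into a label."""
    return COMPRESSION_TYPES.get(code, f"UNKNOWN ({code})")
```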
How to get an estimate of compression ratios for EHCC compression types ?
Use dbms_compression.get_compression_ratio().
SQL> desc dbms_compression
PROCEDURE GET_COMPRESSION_RATIO
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SCRATCHTBSNAME                 VARCHAR2                IN
 OWNNAME                        VARCHAR2                IN
 TABNAME                        VARCHAR2                IN
 PARTNAME                       VARCHAR2                IN
 COMPTYPE                       NUMBER                  IN
 BLKCNT_CMP                     BINARY_INTEGER          OUT
 BLKCNT_UNCMP                   BINARY_INTEGER          OUT
 ROW_CMP                        BINARY_INTEGER          OUT
 ROW_UNCMP                      BINARY_INTEGER          OUT
 CMP_RATIO                      NUMBER                  OUT
 COMPTYPE_STR                   VARCHAR2                OUT
 SUBSET_NUMROWS                 NUMBER                  IN     DEFAULT
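The CMP_RATIO output is, in essence, the number of uncompressed blocks divided by the number of compressed blocks (an assumption about the procedure's arithmetic, stated here for illustration). A sketch of that calculation in plain Python:

```python
# Hedged sketch: CMP_RATIO reported by GET_COMPRESSION_RATIO is assumed to be
# BLKCNT_UNCMP / BLKCNT_CMP; this is not Oracle code, just the arithmetic.
def estimated_ratio(blkcnt_uncmp: int, blkcnt_cmp: int) -> float:
    """Estimate the compression ratio from the two OUT block counts."""
    if blkcnt_cmp == 0:
        raise ValueError("compressed block count must be non-zero")
    return blkcnt_uncmp / blkcnt_cmp

# e.g. 10000 uncompressed blocks fitting into 1000 compressed blocks is ~10x.
```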
How to enable tracing for EHCC ?
The following events can be used to enable tracing for EHCC:
event="trace[ADVCMP_COMP] disk <level>" (tracing for EHCC loading)
event="trace[ADVCMP_DECOMP] disk <level>" (tracing for EHCC queries)
event="trace[ADVCMP_MAIN] disk <level>" (tracing for everything)
where <level> can be one of: lowest, low, medium, high, highest
How to query stats about CUs ?
Use the SQL below to capture stats about CUs.
SQL> select name, value from v$sysstat where name like '%cell CU%';
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell CUs sent uncompressed                                                0
cell CUs sent compressed                                                  0
cell CUs sent head piece                                                  0
cell CUs processed for uncompressed                                       0
cell CUs processed for compressed                                         0
SQL> select name, value from v$sysstat where name like 'CC%';
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
CC CUs Decompressed                                                   22591
CC Query High CUs Decompressed                                        11466
CC Compressed Length Decompressed                                 807050919
CC Decompressed Length Decompressed                              8367891105
CC Columns Decompressed                                               68403
CC Total Columns for Decompression                                   271092
CC Total Rows for Decompression                                   113019839
CC Pieces Buffered for Decompression                                  22698
CC Total Pieces for Decompression                                     22698
CC Scan CUs Decompressed                                                 70
CC Turbo Scan CUs Decompressed                                        11126
What storage types are supported for EHCC or HCC ?
Apart from Exadata, HCC is supported on Pillar Axiom and the Sun ZFS Storage Appliance (ZFSSA).
Can we specify a different compression type for each partition in a table ?
Yes. Compression can be specified at the level of a segment, hence EHCC can be specified at the partition level.
For example:
SQL> create table orders (cid, pid, sid, price, discount, odate)
partition by range (cid)
(partition p1 values less than (100000) nocompress,
partition p2 values less than (200000) compress for archive low,
partition p3 values less than (300000) compress for query high,
partition p4 values less than (maxvalue) compress for query low)
as select * from prev_orders;
Does EHCC support function-based indexes ?
Yes, all index types are supported, including function-based indexes.
Is EHCC suitable for single-row access methods ?
Yes, the EHCC compression types Query HIGH and Query LOW are optimized for single-row queries.
Single-row access may be expensive for Archive HIGH and Archive LOW compared to the Query types, but it can still be used.
Does EHCC uncompress CUs to access the data? Will this result in multiple I/Os for a single-row access ?
Yes, to access a row the entire CU, which may comprise multiple blocks, is read. This decompression happens on the cell side, so the CPU and memory available on the cell are used.
That said, the EHCC query architecture is optimized to increase scan query performance by taking advantage of reading fewer blocks.
Can we use EHCC on a normal server with a Sun ZFS filesystem ?
Yes, EHCC can be used on any server whose storage is ZFS.
While performing the compression, the database checks for certain libraries; if those libraries indicate a supported storage type, which is Exadata, Sun ZFSSA or Pillar, EHCC compression is achieved.
What are the supported RDBMS versions for EHCC ?
All 11.2.X.X and 12c RDBMS versions support EHCC.
If a table already has OLTP compression on, will I be able to enable EHCC on the same table ?
Yes, it can be converted to EHCC compression.
Use "alter table [table name] move compress for [compression type]" to move all existing rows to EHCC.
But at any time you can have only one compression type enabled at the table level.
Does EHCC support the BLOB data type ?
EHCC does not support BLOBs. Since BLOBs are unstructured data, SecureFiles have to be used instead.
Does EHCC support the LONG data type ?
No, the LONG datatype is not supported for EHCC.
Can both EHCC and OLTP compression be on for a single table ?
Only one compression type can be enabled at the table level. If EHCC is enabled at the table level and there are single-row updates on the table, those updated rows go into OLTP-compressed blocks. As a result, a table can contain a mix of EHCC-compressed and OLTP-compressed rows.
In a warehouse with rows being added to an EHCC table via ETL, can the new rows be converted to EHCC, or does the whole table have to be periodically recompressed ?
If the ETL is loading with direct load, the new rows will be added with EHCC compression; otherwise they will go to OLTP compression.
If new rows go to OLTP compression, you will have to do an "alter table move" periodically to move the data to EHCC compression.
Are there any application code changes if we convert tables to EHCC ?
No, there are no application-level code changes required to read or update data in EHCC tables.
Can we join a regular table with an EHCC table in a query ?
Yes, we can join EHCC and non-EHCC tables.
Which view shows the compression type ?
Use dba_tables to check the compression type.
Example:
SQL> select owner, table_name, compress_for from sys.dba_tables where compression = 'ENABLED' order by 1,2;
OWNER                          TABLE_NAME                     COMPRESS_FOR
------------------------------ ------------------------------ ------------
INT_AGIIE                      I_POSITIONS                    BASIC
INT_AGIIE                      X_INT_AGIIE_STATUS             BASIC
SCOTT                          AK                             QUERY HIGH
SCOTT                          EHCC                           QUERY HIGH
SCOTT                          EHCC_W                         QUERY LOW
SCOTT                          EXAMPLE                        ARCHIVE LOW
SCOTT                          HCCTEST                        QUERY LOW
SCOTT                          OAVAL                          QUERY HIGH
SCOTT                          TAB1                           QUERY LOW
SYS                            AJAY                           QUERY LOW
SYS                            EHCC1                          QUERY HIGH
For partitioned tables, use DBA_TAB_PARTITIONS as well, since compression can differ at the partition level.
What's the difference between query low and query high ?
Query High gives around a 10X compression ratio whereas Query Low gives around 6X.
Query performance is better with Query High than with Query Low.
Load time is better with Query Low than with Query High.
Does Oracle Golden Gate supports EHCC from
source to target replication ?
Oracle
GoldenGate fully supports basic, advanced, and EHCC compressed tables in
integrated capture mode only. In classic capture mode, Oracle GoldenGate can
deliver to,
but not capture from, regular and EHCC compressed tables on Oracle Exadata.
If a table is marked as query high, can it be moved to archive high or any other EHCC method directly ?
Yes, this can be done using the "alter table" command. If move is specified with the alter table command, existing data is moved to the new compression format, whereas if move is not specified, the new compression takes effect for subsequent loads.
What is the impact on performance of taking a table out of EHCC, especially if the table is large ?
To take a table out of EHCC you need to use "alter table .. move nocompress". When you do this, the table is locked and offline for DML. To do it online, one can use the online redefinition package (DBMS_REDEFINITION). Apart from this there is no other performance impact.
To improve the performance of this activity, run the alter table command in parallel.
In EHCC a table may have a mix of HCC and OLTP compression. Does this create extra overhead when fetching data with a range scan ?
Yes, it can cause extra overhead if data is spread across blocks with different compression types. To avoid this, move the table to the EHCC compression type regularly so the OLTP-compressed rows are converted back to EHCC.
Is the row migrated out of the CU when an update causes row migration ?
Yes, the row goes out of the CU and is stored in an OLTP-compressed block.
The default segment size is 8MB; can EHCC compression make it less than 8MB ?
No, it does not make the segment size smaller; rather it accommodates more data within the 8MB segment size.
Flash Cache Compression
How to enable Flash Cache compression
1. Flush the dirty blocks from the Flash Disks to the GridDisks
CellCLI> ALTER FLASHCACHE all flush
To monitor the flush operation, open a new cellcli session on the cell and run:
CellCLI> LIST CELLDISK attributes name, flushstatus, flusherror
While the flush operation is running, the flushstatus attribute will report working:
CD_00_celadm08
CD_01_celadm08
CD_02_celadm08
CD_03_celadm08
CD_04_celadm08
CD_05_celadm08
CD_06_celadm08
CD_07_celadm08
CD_08_celadm08
CD_09_celadm08
CD_10_celadm08
CD_11_celadm08
FD_00_celadm08 working
FD_01_celadm08 working
FD_02_celadm08 working
FD_03_celadm08 working
FD_04_celadm08 working
FD_05_celadm08 working
FD_06_celadm08 working
FD_07_celadm08 working
FD_08_celadm08 working
FD_09_celadm08 working
FD_10_celadm08 working
FD_11_celadm08 working
FD_12_celadm08 working
FD_13_celadm08 working
FD_14_celadm08 working
FD_15_celadm08 working
To validate that the flush operation was successful, use any of the following methods from a new cellcli session.
The flushstatus attribute reports complete:
CellCLI> LIST CELLDISK attributes name, flushstatus, flusherror
CD_00_celadm08
CD_01_celadm08
CD_02_celadm08
CD_03_celadm08
CD_04_celadm08
CD_05_celadm08
CD_06_celadm08
CD_07_celadm08
CD_08_celadm08
CD_09_celadm08
CD_10_celadm08
CD_11_celadm08
FD_00_celadm08 complete
FD_01_celadm08 complete
FD_02_celadm08 complete
FD_03_celadm08 complete
FD_04_celadm08 complete
FD_05_celadm08 complete
FD_06_celadm08 complete
FD_07_celadm08 complete
FD_08_celadm08 complete
FD_09_celadm08 complete
FD_10_celadm08 complete
FD_11_celadm08 complete
FD_12_celadm08 complete
FD_13_celadm08 complete
FD_14_celadm08 complete
FD_15_celadm08 complete
Verify the cachedby attribute is null. If a FlashDisk is reported, the flush is still in progress:
CellCLI> LIST GRIDDISK attributes name,cachedby
DATA_DMORL_CD_00_celadm08
DATA_DMORL_CD_01_celadm08
DATA_DMORL_CD_02_celadm08
DATA_DMORL_CD_03_celadm08
DATA_DMORL_CD_04_celadm08
DATA_DMORL_CD_05_celadm08
DATA_DMORL_CD_06_celadm08
DATA_DMORL_CD_07_celadm08
DATA_DMORL_CD_08_celadm08
DATA_DMORL_CD_09_celadm08
DATA_DMORL_CD_10_celadm08
DATA_DMORL_CD_11_celadm08
DBFS_DG_CD_02_celadm08
DBFS_DG_CD_03_celadm08
DBFS_DG_CD_04_celadm08
DBFS_DG_CD_05_celadm08
DBFS_DG_CD_06_celadm08
DBFS_DG_CD_07_celadm08
DBFS_DG_CD_08_celadm08
DBFS_DG_CD_09_celadm08
DBFS_DG_CD_10_celadm08
DBFS_DG_CD_11_celadm08
RECO_DMORL_CD_00_celadm08
RECO_DMORL_CD_01_celadm08
RECO_DMORL_CD_02_celadm08
RECO_DMORL_CD_03_celadm08
RECO_DMORL_CD_04_celadm08
RECO_DMORL_CD_05_celadm08
RECO_DMORL_CD_06_celadm08
RECO_DMORL_CD_07_celadm08
RECO_DMORL_CD_08_celadm08
RECO_DMORL_CD_09_celadm08
RECO_DMORL_CD_10_celadm08
RECO_DMORL_CD_11_celadm08
The number of dirty (unflushed) buffers will be 0 after the flush is complete:
CellCLI> LIST METRICCURRENT FC_BY_DIRTY
FC_BY_DIRTY FLASHCACHE 0.000 MB
2. Drop the FlashCache
CellCLI> DROP FLASHCACHE all
3. Drop the FlashLog
CellCLI> DROP FLASHLOG all
4. Drop the celldisks on the FlashDisks
CellCLI> DROP CELLDISK all flashdisk
5. Enable flash cache compression using the following commands, based on the system:
If the Exadata Storage Cell Server image is 11.2.3.3.0 and the Storage Cell is X3-2:
CellCLI> ALTER CELL flashCacheCompX3Support=true
CellCLI> ALTER CELL flashCacheCompress=true
If the Exadata Storage Cell server image is 11.2.3.3.1 or higher:
CellCLI> ALTER CELL flashCacheCompress=true
6. Validate the PhysicalDisks/LUNs have the new size (reference Table 1. PhysicalDisk size for FlashDisks)
CellCLI> LIST PHYSICALDISK attributes name,physicalSize where disktype=flashdisk
                   | AURA 2.0 / F40 / X3                | AURA 2.1 / F80 / X4
                   | Compression OFF | Compression ON   | Compression OFF | Compression ON
Physical Disk Size | 93.13 G         | 186.26 G         | 186.26 G        | 372.53 G
Flash Cache Size   | 1489 G          | 2979 G           | 2979 G          | 5959 G
Table 1. PhysicalDisk size for FlashDisks
7. Create the celldisks on the FlashDisks
CellCLI> CREATE CELLDISK all flashdisk
Validate all sixteen FlashDisks are present:
CellCLI> LIST CELLDISK where disktype=flashdisk
8. Create the FlashLog
CellCLI> CREATE FLASHLOG all
Validate the FlashLog is in normal mode:
CellCLI> LIST FLASHLOG detail
9. Create the FlashCache
CellCLI> CREATE FLASHCACHE all
Validate the FlashCache is in normal mode:
CellCLI> LIST FLASHCACHE detail
How to disable Flash Cache compression
1. Flush the dirty blocks from the Flash Disks to the GridDisks
CellCLI> ALTER FLASHCACHE all flush
To validate that the flush operation was successful, use any of the following methods from a new cellcli session.
Verify the cachedby attribute is null. If a FlashDisk is reported, the flush is still in progress:
CellCLI> LIST GRIDDISK attributes name,cachedby
The number of dirty (unflushed) buffers will be 0 after the flush is complete:
CellCLI> LIST METRICCURRENT fc_by_dirty
2. Drop the FlashCache
CellCLI> DROP FLASHCACHE all
3. Drop the FlashLog
CellCLI> DROP FLASHLOG all
4. Drop the celldisks on the FlashDisks
CellCLI> DROP CELLDISK all flashdisk
5. Disable Flash Cache Compression
If the Exadata Storage Cell Server image is 11.2.3.3.0 and the Storage Cell is X3-2:
CellCLI> ALTER CELL flashCacheCompX3Support=true
CellCLI> ALTER CELL flashCacheCompress=false
If the Exadata Storage Cell server image is 11.2.3.3.1 or higher:
CellCLI> ALTER CELL flashCacheCompress=false
Validate it has been disabled by checking the cell attributes:
CellCLI> LIST CELL attributes name,flashCacheCompress
Correct values are FALSE or a null string.
6. Validate the Physical Disks have been shrunk:
CellCLI> LIST PHYSICALDISK attributes name,physicalSize,status where disktype=flashdisk
The status should be normal.
Use the following table to validate the expected size when Compression is OFF:
                   | AURA 2.0 / F40 / X3                | AURA 2.1 / F80 / X4
                   | Compression OFF | Compression ON   | Compression OFF | Compression ON
Physical Disk Size | 93.13 G         | 186.26 G         | 186.26 G        | 372.53 G
Flash Cache Size   | 1489 G          | 2979 G           | 2979 G          | 5959 G
Table 2. PhysicalDisk size for FlashDisks
7. Create the celldisks on the FlashDisks
CellCLI> create celldisk all flashdisk
Validate all sixteen FlashDisks are present:
CellCLI> list celldisk where disktype=flashdisk
8. Create the FlashLog
CellCLI> create flashlog all
Validate the FlashLog is in normal mode:
CellCLI> list flashlog detail
9. Create the FlashCache
CellCLI> create flashcache all
Validate the FlashCache is in normal mode:
CellCLI> list flashcache detail
Scrubbing
Is disk scrubbing needed on Extreme Flash cells and disks ?
Scrubbing is only necessary on spinning-disk cells (High Capacity or older High Performance models); it is not necessary to configure it on EF disks or cells.
Flash cache:
1. What is Write Back Flash Cache ?
Write Back Flash Cache provides the ability to cache write I/Os directly to PCI flash in addition to read I/Os.
Exadata storage software version 11.2.3.2.1 is the minimum version required to use Write Back Flash Cache.
2. What software and hardware versions are supported ?
The Write Back Smart Flash Cache requirement has been amended to state that both the grid infrastructure and database homes must run 11.2.0.3.9 or later.
Database homes running 11.2.0.2 that cannot promptly upgrade to 11.2.0.3 to meet the amended requirement must install Patch 17342825.
As long as the minimum software requirements are met, any Exadata hardware with flash cache (V2 and later) can take advantage of this feature.
Since April 2017, Oracle Exadata Deployment Assistant (OEDA) enables Write Back Flash Cache by default if the following conditions are met:
1. The GI and DB homes must be
11.2.0.4.1 or higher
12.1.0.2 or higher
12.2.0.2 or higher
AND
2. The DATA diskgroup has HIGH redundancy
Additional notes:
Extreme Flash is always checked by default and cannot be modified.
A user can check/uncheck the WBFC checkbox if not using Extreme Flash, to disable it manually.
3. When should I use Write Back Flash Cache ?
If your application is write-intensive and you find significant "free buffer waits" or high I/O times indicating a write bottleneck in AWR reports, then you should consider using Write Back Flash Cache.
4. How to determine if Write Back Flash Cache is enabled ?
Execute:
# dcli -g ~/cell_group -l root cellcli -e "list cell attributes flashcachemode"
Results:
flashCacheMode: WriteBack -> Write Back Flash Cache is enabled
flashCacheMode: WriteThrough -> Write Back Flash Cache is not enabled
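During a rollout you may want to spot, from the dcli output above, any cells still in WriteThrough. A hedged sketch of that check (plain Python, not an Oracle utility; the sample cell names and line format are hypothetical):

```python
# Hedged sketch: given text output where each line is "<cell>: <mode>",
# return the cells whose flashCacheMode is not WriteBack.
def cells_not_writeback(dcli_output: str) -> list:
    """Return cell names whose flashCacheMode is not WriteBack."""
    laggards = []
    for line in dcli_output.strip().splitlines():
        # dcli prefixes each output line with the cell name and a colon
        cell, _, value = line.partition(":")
        if "writeback" not in value.strip().lower():
            laggards.append(cell.strip())
    return laggards

sample = """celadm07: WriteBack
celadm08: WriteThrough
celadm09: WriteBack"""
# cells_not_writeback(sample) -> ['celadm08']
```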
5. How can I enable Write Back Flash Cache ?
Preparation:
Before performing the steps below, you should perform some prerequisite checks to ensure that all cells are in the proper state. Perform the following checks as root from one of the compute nodes:
Check the griddisk asmdeactivationoutcome and asmmodestatus attributes to ensure that all griddisks on all cells are "Yes" and "ONLINE" respectively:
# dcli -g cell_group -l root cellcli -e list griddisk attributes asmdeactivationoutcome, asmmodestatus
Check that all of the flash cache is in the "normal" state and that no flash disks are in a degraded or critical state:
# dcli -g cell_group -l root cellcli -e list flashcache detail
You may instead use the setWBFC.sh script (version 1.0.0.21_20160602) to enable or disable Write Back Flash Cache on Exadata Machine DB nodes or Sun SuperCluster DB nodes. This script also supports X5-2 High Capacity storage cells. It automates all of the steps described below and supports both rolling and non-rolling methods. Please refer to its readme for further details.
The script is supported for storage cell software versions:
- 11.2.3.2.1 and above
- 11.2.3.3.1 (with enhancements that no longer require cellsrv service shutdown and restart; see the README)
- 12.1.1.1.1
- 12.1.2.1.0 (except for X5-2 with Extreme Flash, as WBFC will be enabled automatically)
GI versions:
- 11.2.0.3.9 and above
- 12.1
Enabling Write Back Flash Cache when the storage cell software version is 11.2.3.3.1 or higher
Note: With 11.2.3.3.1 or higher it is not required to stop the cellsrv process or inactivate the griddisks.
To reduce the performance impact on the application, execute the change during a period of reduced workload.
1. Validate all the Physical Disks are in NORMAL state before modifying the FlashCache. The following command should return no rows:
# dcli -l root -g cell_group cellcli -e "list physicaldisk attributes name,status" | grep -v normal
2. Drop the flash cache
# dcli -l root -g cell_group cellcli -e drop flashcache
3. Set the flashCacheMode attribute to writeback
# dcli -l root -g cell_group cellcli -e "alter cell flashCacheMode=writeback"
4. Re-create the flash cache
# dcli -l root -g cell_group cellcli -e create flashcache all
5. Check the flashCacheMode attribute is WriteBack:
# dcli -l root -g cell_group cellcli -e list cell detail | grep flashCacheMode
6. Validate the griddisk attributes cachingPolicy and cachedby
# cellcli -e list griddisk attributes name,cachingpolicy,cachedby
Enabling Write Back Flash Cache when the storage cell software version is lower than 11.2.3.3.1
Note: With versions lower than 11.2.3.3.1, changing the flashCacheMode requires stopping the cellsrv process. This introduces the option of making the change cell by cell (rolling) or on all cells at once (non-rolling) with CRS down.
A. Enable Write Back Flash Cache using a ROLLING method
(RDBMS & ASM instances are up; Write Back Flash Cache is enabled one cell at a time)
Log onto the first cell on which you wish to enable Write Back Flash Cache.
1. Drop the flash cache on that cell
# cellcli -e drop flashcache
2. Check if ASM will be OK if the grid disks go OFFLINE. The following command should return 'Yes' for the grid disks being listed:
# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
3. Inactivate the griddisks on the cell
# cellcli -e alter griddisk all inactive
4. Shut down the cellsrv service
# cellcli -e alter cell shutdown services cellsrv
5. Set the cell flashcache mode to writeback
# cellcli -e "alter cell flashCacheMode=writeback"
6. Restart the cellsrv service
# cellcli -e alter cell startup services cellsrv
7. Reactivate the griddisks on the cell
# cellcli -e alter griddisk all active
8. Verify all grid disks have been successfully put online using the following command:
# cellcli -e list griddisk attributes name, asmmodestatus
9. Recreate the flash cache
# cellcli -e create flashcache all
10. Check the status of the cell to confirm that it is now in WriteBack mode:
# cellcli -e list cell detail | grep flashCacheMode
11. Repeat these same steps on the next cell. However, before taking another storage server offline, execute the following and make sure 'asmdeactivationoutcome' displays YES:
# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
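The gating condition in step 11 can be sketched as a small parser over the griddisk listing (plain Python, not an Oracle tool; the sample output lines are hypothetical):

```python
# Hedged sketch: parse "list griddisk attributes
# name,asmmodestatus,asmdeactivationoutcome" output and decide whether it is
# safe to take the next cell offline.
def safe_to_proceed(cellcli_output: str) -> bool:
    """All griddisks must be ONLINE with asmdeactivationoutcome = Yes."""
    for line in cellcli_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or partial lines
        asmmodestatus, outcome = fields[1], fields[2]
        if asmmodestatus != "ONLINE" or outcome != "Yes":
            return False
    return True

sample = """DATA_CD_00_celadm08 ONLINE Yes
DATA_CD_01_celadm08 SYNCING Yes"""
# safe_to_proceed(sample) -> False, because one griddisk is still resyncing
```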
B. Enable Write Back Flash Cache using a NON-ROLLING method
(RDBMS & ASM instances are down while enabling Write Back Flash Cache)
1. Drop the flash cache on the cell
# cellcli -e drop flashcache
2. Shut down the cellsrv service
# cellcli -e alter cell shutdown services cellsrv
3. Set the cell flashcache mode to writeback
# cellcli -e "alter cell flashCacheMode=writeback"
4. Restart the cellsrv service
# cellcli -e alter cell startup services cellsrv
5. Recreate the flash cache
# cellcli -e create flashcache all
6. How can I disable Write Back Flash Cache ?
* Please note, disabling Write Back Flash Cache is not a typical operation and should only be done under the guidance of Oracle Support.
Disabling Write Back Flash Cache requires flushing the dirty blocks to disk before changing the flashCacheMode to writethrough. The flush can be performed in parallel across all cells using the dcli command. Once the flush begins, all caching to the flash cache is stopped, so applications will experience some performance impact that will vary depending upon the nature of the workload. The following steps are common to all cells and can be performed in parallel using the dcli utility from one of the compute nodes as shown below.
Disabling Write Back Flash Cache when the storage cell software version is 11.2.3.3.1 or higher
Note: With 11.2.3.3.1 or higher there is no need to stop the cellsrv process or inactivate the griddisks.
To reduce the performance impact on the application, execute the change during a period of reduced workload.
1. Validate all the Physical Disks are in NORMAL state before modifying the FlashCache. The following command should return no rows:
# dcli -l root -g cell_group cellcli -e "list physicaldisk attributes name,status" | grep -v normal
2. Determine the amount of dirty data in the flash cache:
# cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "
3. Flush the flash cache
# dcli -g cell_group -l root cellcli -e "alter flashcache all flush"
4. Check the progress of the flush of the flash cache
The flushing process is complete when FC_BY_DIRTY is 0 MB:
# dcli -g cell_group -l root cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "
or when the flushstatus attribute is "completed":
# dcli -g cell_group -l root cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" | grep FD
5. Once flushing the cache is complete, drop the flash cache
# dcli -g cell_group -l root cellcli -e drop flashcache
6. Set the flashCacheMode attribute to writethrough
# dcli -g cell_group -l root cellcli -e "alter cell flashCacheMode=writethrough"
7. Re-create the flash cache
# dcli -g cell_group -l root cellcli -e create flashcache all
8. Check the flashCacheMode attribute is WriteThrough:
# dcli -g cell_group -l root cellcli -e list cell detail | grep flashCacheMode
Disabling Write Back Flash Cache when the storage cell software version is lower than 11.2.3.3.1
Note: With versions lower than 11.2.3.3.1, changing the flashCacheMode requires stopping the cellsrv process. This introduces the option of making the change cell by cell (rolling) or on all cells at once (non-rolling) with CRS down.
A. Disable Write Back Flash Cache using a ROLLING method
(RDBMS & ASM instances are up; Write Back Flash Cache is disabled one cell at a time)
1. Check the griddisk status by verifying that the griddisk attribute "asmdeactivationoutcome" is "Yes" for all griddisks on this cell. Do not proceed if a griddisk is returned by the following command; it should return no rows:
# dcli -g cell_group -l root cellcli -e "list griddisk where asmdeactivationoutcome != 'Yes' attributes name,asmmodestatus,asmdeactivationoutcome"
2. Determine the number of bytes to be flushed by determining how much dirty data is in the flash cache. This gives the number of bytes that need to be de-staged to disk per cell, which gives an indication of how long the flush will take:
# dcli -g cell_group -l root cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "
3. Flush the flash cache across all cells
To perform this step, it is recommended to have two separate sessions: one to execute the command below and the other to monitor its progress in the next step.
Issue the following command to begin the flush in one of the two sessions:
# dcli -g cell_group -l root cellcli -e "alter flashcache all flush"
If any errors occur, they will be displayed in this session; otherwise, this session will show a successful flush across all cells.
4. Check the flush status across all cells
a. Execute the following command every few minutes in the second session to monitor the progress. As dirty blocks are de-staged to disk, this count will reduce to zero (0). This will take some time, and you can build a time estimate by executing the command repeatedly:
# dcli -g cell_group -l root cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "
b. The following command should return "working" for each flash disk on each cell while the cache is being flushed and "completed" when it is finished. Execute it in the second session:
# dcli -g cell_group -l root cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" | grep FD
DO NOT CONTINUE UNTIL flushstatus is COMPLETE for all the celldisks.
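The "wait until every flash disk reports complete" check can be sketched as a small parser over the flushstatus listing (plain Python, not an Oracle tool; the sample lines are hypothetical):

```python
# Hedged sketch: given the "LIST CELLDISK ATTRIBUTES name, flushstatus,
# flusherror" output, report whether every FD_* disk has finished flushing.
def flush_complete(listing: str) -> bool:
    """True only when every FD_* celldisk reports a complete flushstatus."""
    flash_lines = [l for l in listing.strip().splitlines()
                   if l.strip().startswith("FD_")]
    if not flash_lines:
        return False  # no flash disks seen yet; keep waiting
    # matches both "complete" and "completed" spellings seen in cell output
    return all("complete" in l.lower() for l in flash_lines)

sample = """FD_00_celadm08 working
FD_01_celadm08 completed"""
# flush_complete(sample) -> False, because FD_00 is still working
```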
The following steps are to be executed individually on each cell, one at a time. All steps that must be performed directly on a cell use the cellcli utility.
Log onto the first cell that will have Write Back Flash Cache disabled.
5. Drop the flashcache on this cell after the flush completes
# cellcli -e drop flashcache
6. Inactivate all griddisks on the cell
# cellcli -e alter griddisk all inactive
7. Shut down the cellsrv service
# cellcli -e alter cell shutdown services cellsrv
8. Reset the cell flash cache state to writethrough
# cellcli -e "alter cell flashCacheMode=writethrough"
9. Restart the cellsrv service
# cellcli -e alter cell startup services cellsrv
10. Reactivate the griddisks on the cell
# cellcli -e alter griddisk all active
11. Recreate the flash cache
# cellcli -e create flashcache all
12. Check the flash cache state of this cell
# cellcli -e list cell detail | grep flashCacheMode
13. Check the griddisks of the cell
Before moving on to the next cell, check that the "asmModestatus" attribute of all of the griddisks is "ONLINE" and that the "asmdeactivationoutcome" attribute is set to "Yes". It may be necessary to execute the following command several times until "asmModestatus" shows "ONLINE":
# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
B. Disable Write Back Flash Cache using a NON-ROLLING method
(RDBMS & ASM instances are down while disabling write-back flash cache)
To reduce the total outage time, the flash cache flush operation can be performed in advance of shutting down the entire cluster.
1. Determine the number of bytes to be flushed by checking how much of the flash cache is dirty.
# dcli -g cell_group -l root cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "
2. Flush the flash cache across all cells
*To perform this step, it is recommended to have two separate sessions: one to execute the command below and the other to monitor its progress in the next step.
Issue the following command to begin the flush in one of the two sessions:
# dcli -g cell_group -l root cellcli -e "alter flashcache all flush"
If any errors occur, they will be displayed in this session; otherwise, this session will show a successful flush across all cells.
Once the flush begins, all caching to the flash cache is stopped. Applications will therefore experience some performance impact, which will vary depending upon the nature of the workload.
3. Check the flush status across all cells
a. Execute the following command in the second session every few minutes to monitor the progress. As dirty blocks are de-staged to disk, this count will reduce to zero (0). This will take some time, and you can estimate the remaining time by running the command repeatedly:
# dcli -g cell_group -l root cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "
b. Check the status. For each flash disk the command returns "working" while the cache is being flushed, and "completed" when it is finished:
# dcli -g cell_group -l root cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" | grep FD
4. Shut down all databases and the entire cluster
Ensure the entire cluster is shut down (all databases and CRS). Not doing so will crash all databases and the cluster when the cellsrv service is shut down and restarted. It is recommended to shut down each database with the srvctl command. However, you can shut down the entire cluster at once, logged on as root on one of the compute nodes, as follows:
# cd $GRID_HOME/bin
# ./crsctl stop cluster -all
$GRID_HOME refers to the location where the Grid Infrastructure is installed, typically /u01/app/11.2.0.3/grid.
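The per-database shutdown recommended above can be scripted. This is a hedged sketch, not part of the documented procedure: it assumes srvctl is on the PATH on a compute node and that "srvctl config database" lists the registered database unique names in your Grid Infrastructure version.

```shell
#!/bin/bash
# Sketch only: stop each registered database cleanly with srvctl before
# stopping the cluster. "srvctl config database" lists database unique
# names; adjust the stop option (-o) to suit your change window.
for db in $(srvctl config database); do
  srvctl stop database -d "$db" -o immediate
done
```

Stopping databases individually this way makes it easier to spot one that refuses to come down before the crsctl stop is issued.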
5. Drop the flash cache across all cells after the flush completes
# dcli -g cell_group -l root cellcli -e drop flashcache
6. Shut down the cellsrv service across all cells
# dcli -g cell_group -l root cellcli -e alter cell shutdown services cellsrv
7. Reset the cell flash cache state to writethrough across all cells
# dcli -g cell_group -l root cellcli -e "alter cell flashCacheMode=writethrough"
8. Restart the cellsrv service across all cells
# dcli -g cell_group -l root cellcli -e alter cell startup services cellsrv
9. Recreate the flash cache across all cells
# dcli -g cell_group -l root cellcli -e create flashcache all
10. Check the flash cache state of all cells
# dcli -g cell_group -l root cellcli -e list cell detail | grep flashCacheMode
The flashCacheMode of all cells should now be writethrough.
11. Start up the cluster and all databases. As root on one of the compute nodes, issue the following commands:
# cd $GRID_HOME/bin
# ./crsctl start cluster -all
7. What is the performance benefit we can get by enabling the write back flash cache?
Write-back flash cache significantly improves write-intensive operations because writing to flash cache is much faster than writing to hard disks. Depending on your application, write IOPS can be up to 20X higher than disk on X3-2 machines, and up to 10X higher than disk on V2 and X2 machines.
8. Can I disable write back flash cache on certain disk groups?
Yes. When using Write Back Flash Cache, you can disable flash caching at the cell level for grid disks that do not need it. For further details, refer to the best practice recommendations here.
9. Is write-back flash cache persistent upon reboots?
Yes. As of Exadata storage software version 11.2.3.2.1, Exadata Smart Flash Cache is persistent across Exadata Storage Server restarts.