Saturday, February 25, 2017

Exadata Question and Answers

General :
What environment is a good fit for Exadata?
Exadata was originally designed for data warehouse environments. It was later enhanced for use with OLTP databases as well.

What are the advantages of Exadata?
The Exadata cluster allows for consistent performance while allowing for increased throughput. As load increases on the cluster the performance remains consistent by utilizing inter-instance and intra-instance parallelism.

Simply moving to Exadata does not guarantee improved performance, although in most cases performance will improve, especially if the current database host is overloaded.

What is the secret behind Exadata’s higher throughput?

Exadata ships less data through the pipes between the storage cells and the database nodes in the RAC cluster.
Its ability to do massive parallelism, by running parallel processes across all the nodes in the cluster, also provides a much higher level of throughput.
In addition, it has much bigger pipes in the cluster, using an InfiniBand interconnect for inter-instance data block transfers with bandwidth up to 5x that of Fibre Channel networks.

What are the different Exadata configurations?

The Exadata Appliance configuration comes as a Full Rack, Half Rack, Quarter Rack, or Eighth Rack.

What are the key Hardware components?

DB Server
Storage Server Cells
High-speed InfiniBand Switch
Cisco Switch

What are the Key Software Features?

Smart Scan,
Smart Flash Cache
Storage Index
Exadata Hybrid Columnar Compression (EHCC)
IORM (I/O Resource Manager)

What is a Cell and Grid Disk?

Cell Disks and Grid Disks are logical components of the physical Exadata storage. A cell, or Exadata Storage Server cell, is a combination of disk drives put together to store user data. Each Cell Disk corresponds to a LUN (Logical Unit) which has been formatted by the Exadata Storage Server Software. Typically, each cell has 12 disk drives mapped to it.
Grid Disks are created on top of Cell Disks and are presented to Oracle ASM as ASM disks. Space is allocated in chunks starting from the outer tracks of the Cell Disk and moving inwards. One can have multiple Grid Disks per Cell Disk.

What are the parallelism instance parameters used in Exadata?

The parameter PARALLEL_FORCE_LOCAL can be set at the session level to restrict a particular job's parallel execution servers to the local instance.
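As a sketch (the table name and parallel degree below are illustrative), the parameter can be toggled per session:

```sql
-- Restrict parallel execution servers to the local instance for this session
ALTER SESSION SET parallel_force_local = TRUE;

-- Run the parallel job; PX servers are allocated on the local node only
SELECT /*+ PARALLEL(s, 8) */ COUNT(*) FROM sales s;

-- Revert to the default cluster-wide PX server allocation
ALTER SESSION SET parallel_force_local = FALSE;
```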

How do you Test performance of Exadata?

You can use the CALIBRATE command at the cellcli command line.
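A sketch of a calibration run on one cell (CALIBRATE normally requires cellsrv to be shut down first, and it generates heavy I/O while it runs):

```
CellCLI> ALTER CELL SHUTDOWN SERVICES CELLSRV;
CellCLI> CALIBRATE;
CellCLI> ALTER CELL STARTUP SERVICES CELLSRV;
```

Alternatively, CALIBRATE FORCE can be used to run the calibration without stopping cellsrv, at the cost of impacting live workload performance.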

What are the ways to migrate onto Exadata?

Depending on the downtime allowed there are several options:

Oracle DataGuard
Traditional Export/Import
Tablespace transportation
Goldengate Replication after a data restore onto Exadata.

What types of operations does Exadata “offload”?

Some of the operations that are offloaded from the database host to the cell servers are:

Predicate filtering
Column project filtering
Join processing
Backups

What is cellcli?

This is the command-line utility used to manage the cell storage.

How do you obtain info on the Cell Disks?

At the cellcli command line you can issue the “list celldisk” command.

How would you create a grid disk?

At the cellcli command line you would need to issue the "create griddisk all ..." command.

What are the cellinit.ora and the cellip.ora files used for?

The cellinit.ora file contains the IP address of the local server's storage network interface, while the cellip.ora file on each database server lists the IP addresses of the storage cells that the database can access.
Example:
cat /etc/oracle/cell/network-config/cellinit.ora
ipaddress1=192.168.47.21/24
$ cat /etc/oracle/cell/network-config/cellip.ora
cell="192.168.47.21:5042"
cell="192.168.47.22:5042"
cell="192.168.47.23:5042"

What operating systems does Exadata support?

Exadata has traditionally run Oracle Linux OS. Recently, Solaris has also been made available on this engineered system.

How can I tell the current cell software version and also available versions to rollback to?

Useful commands

imagehistory

imageinfo

I’ve heard I should keep an eye on the temperature of the cells. How can I do that?

Inlet Ambient Temperature across all cells can be determined with the following command:

dcli -g cell_group -l root "ipmitool sensor | grep 'Inlet Amb Temp'"

I see STORAGE in my explain plan but it doesn't appear that my query is offloading based on runtime and IO throughput rates. Why?

This can happen if the query is run serially and direct reads aren't chosen.

Offloading only happens for direct reads (reads to PGA)

PQ processes generally always use direct reads, and therefore offload

Foreground processes running serially may choose direct or buffered reads

Note that in some cases buffered reads are actually better

 

My offload performance was great, but all of a sudden it degraded by over 20%. What may be causing this?

 Check for a cell reboot and resultant offline disks. Also, check for issues above the storage layer such as a change in execution plan, which can cause this.

 One of my cells rebooted. What should I do?

Confirm that the cell is back up and that the grid disks are online in ASM. Check that each cell has 34 or 36 grid disks online:

 SQL> select failgroup, mode_status, count(*) from v$asm_disk group by failgroup, mode_status;

 

Also consider these sources for information:

 ASMCMD> lsdg (see offline_disks column in output)

ASM instance alert log

v$asm_disk.repair_timer (seconds remaining until drop)

v$asm_disk.mode_status

 My ASM instance is hung trying to discover disks?

  Confirm hung on discovery (instead of hang for a different reason) via ASM instance alert log. Once confirmed, try kfod to see if it hangs as well.

If kfod hangs, check that libskgxp11.so has the same size and checksum as libskgxpr.so in all ORACLE_HOMEs (including CRS_HOME). If not, relink the oracle binary with RDS.

Check cellinit.ora and cellip.ora files for correctness

Check both master and slave diskmon trace files

 I suspect network problems. How do I know my InfiniBand fabric is okay?

Refer to MOS note 1053498.1 and try the following utilities:

 These return fabric-wide info, so run from one node, not all

/usr/bin/ibdiagnet -c 1000

Check for errors/warnings summary at the end of output

Search for details in log files (/tmp/ibdiagnet*)

/opt/oracle.cellos/SupportTools/ibdiagtools/infinicheck

This script generates a lot of IB traffic to measure throughput, so it will impact performance while it is running

/usr/sbin/iblinkinfo.pl Shows link status on all ports

 

What is IORM and what is its role in Exadata?

IORM stands for I/O Resource Manager, which manages the I/O of multiple databases on the storage cells.

 

How we can check whether oracle best practice has been configured on Exadata?

We can execute Exachk and verify the best-practice setup on the Exadata machine.

 

How many networks are required in Exadata?

Public/Client Network --- For Application Connectivity

Management Network --- For Exadata H/W management

Private Network -- For cluster interconnect and storage connectivity

 

Which ASM parameters are responsible for Auto disk management in Exadata?

_AUTO_MANAGE_MAX_ONLINE_TRIES --- Controls the maximum number of attempts to bring a disk online

_AUTO_MANAGE_EXADATA_DISKS --- Controls the auto disk management feature

_AUTO_MANAGE_NUM_TRIES    --- Controls the maximum number of attempts to perform an automatic operation

Smart Scan:


What is Smart Scan ?

The data search and retrieval processing can be offloaded to the Exadata Storage Servers. This feature is called Smart Scan. Using this Smart Scan, Oracle Database can optimize the performance of operations that perform table and index scans by performing the scans inside Exadata Storage Server, rather than transporting all the data to the database server.
Smart Scan capabilities include:
1) Predicate Filtering
2) Column filtering
3) Join Processing

What is predicate filtering and what all conditional operators are supported by predicate filtering ?

 Exadata Storage Server enables predicate filtering for table scans. Rather than returning all the rows for the database to evaluate, Exadata Storage Server returns only the rows that match the filter condition.
 In addition, many common SQL functions can be evaluated by Exadata Storage Server during predicate filtering.
The definitive list of which functions are offloadable for your particular version is contained in V$SQLFN_METADATA.
   SQL> select * from v$sqlfn_metadata where offloadable = 'YES';
A list of conditional operators that are supported by predicate filtering include =, !=, <, >, <=, >=, IS [NOT] NULL, LIKE, [NOT] BETWEEN, [NOT] IN, EXISTS, IS OF type, NOT, AND, OR.


 How to cache an object in smart cache?

A DBA can cache an object by setting its CELL_FLASH_CACHE attribute to KEEP. The default value of this attribute is DEFAULT.
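For example (the table name below is illustrative), the attribute is set via the segment's storage clause:

```sql
-- Pin a table's blocks in Exadata Smart Flash Cache
ALTER TABLE sales STORAGE (CELL_FLASH_CACHE KEEP);

-- Revert to the default caching policy
ALTER TABLE sales STORAGE (CELL_FLASH_CACHE DEFAULT);
```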

 What are Exadata Cell node Unique Features?

 Cell node unique feature:

 Smart Scans

Hybrid Columnar Compression

Storage Indexes

Flash Cache

Exadata I/O Resource Management in a Multi-Database Environment

 

What is the location of Cell Nodes Logs and Traces?

 

Cell Nodes Logs and Traces

 

$ADR_BASE/diag/asm/cell/`hostname`/trace/alert.log

$ADR_BASE/diag/asm/cell/`hostname`/trace/ms-odl.*

$ADR_BASE/diag/asm/cell/`hostname`/trace/svtrc__0.trc -- ps -ef | grep "cellsrv 100"

$ADR_BASE/diag/asm/cell/`hostname`/incident/*

/var/log/messages*, dmesg

/var/log/sa/*

/var/log/cellos/*

 

How to disable Smart Scan?

 

CELL_OFFLOAD_PROCESSING=FALSE (default TRUE)
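The parameter can be changed dynamically at the session level, which is a common way to compare offloaded and non-offloaded runs of the same statement:

```sql
-- Disable Smart Scan offload for the current session
ALTER SESSION SET cell_offload_processing = FALSE;

-- Re-enable it (TRUE is the default)
ALTER SESSION SET cell_offload_processing = TRUE;
```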

 

What are the general cell-related database views?

 

Cell-related database views:

sql> select * from sys.GV_$CELL_STATE;

sql> select * from sys.GV_$CELL;

sql> select * from sys.GV_$CELL_THREAD_HISTORY;

sql> select * from sys.GV_$CELL_REQUEST_TOTALS;

sql> select * from sys.GV_$CELL_CONFIG;

 

What is Bloom Filter ?

Offloaded joins are accomplished by creating what is called a bloom filter.
A Bloom filter, conceived by Burton Howard Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. The properties of a Bloom filter make it a very efficient way of determining which values are not in a set. This is very useful for processing join conditions where a significant proportion of the data does not fulfill the join criteria.
Oracle Database 10g Release 2 first used Bloom filters to optimize parallel join operations. When two tables are joined via a hash join, the first table (typically the smaller table) is scanned and the rows that satisfy the WHERE clause predicates (for that table) are used to create a hash table. During the hash table creation, a Bloom filter bit string is also created based on the join column. The bit string is then sent as an additional predicate to the second table scan. After the WHERE clause predicates relating to the second table are applied, the resulting rows are tested using the Bloom filter. Any rows rejected by the Bloom filter must fail the join criteria and are discarded. Any rows that match using the Bloom filter are sent to the hash join.
With Exadata, the Bloom filter is passed to the storage servers as an additional predicate. Processing the Bloom filter inside Exadata Storage Server can reduce the amount of data transported to the database server to process a join, which in turn can speed up query performance.


Will bloom filter work with all type of joins?

No, bloom filter works only with hash joins.

 

How to identify a Bloom filter in an execution plan?

 You can identify a bloom filter in a plan when you see :BF0000 in the Name column of the execution plan.

How to disable Bloom filters?

 To disable the feature, the initialization parameter _bloom_pruning_enabled must be set to FALSE.
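As a sketch, at the session level (note that this is a hidden parameter and should only be changed under Oracle Support guidance):

```sql
-- _bloom_pruning_enabled is a hidden (underscore) parameter
ALTER SESSION SET "_bloom_pruning_enabled" = FALSE;
```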

Does Smart Scan work with encrypted and compressed data ?

Yes, smart scan works with encrypted and compressed data.
Exadata Storage Server performs Smart Scans on encrypted tablespaces and encrypted columns. For encrypted tablespaces, Exadata Storage Server can decrypt blocks and return the decrypted blocks to Oracle Database, or it can perform row and column filtering on encrypted data. Significant CPU savings can be made within the database server by offloading the CPU-intensive decryption task to Exadata cells.
Smart Scan works in conjunction with Exadata Hybrid Columnar Compression so that column projection and row filtering can be executed along with decompression at the storage level to save CPU cycles on the database servers.


What are prerequisites for Smart Scan to occur ?

There are 4 basic requirements that must be met for Smart Scans to occur :
 1) There must be a full scan of an object; that is, full table scans, fast full index scans and fast full bitmap index scans.
 2) The scan must use Oracle's Direct Path Read mechanism. Direct path reads are generally used by Oracle when reading directly into PGA memory (as opposed to into the buffer cache).
      - Direct-path reads are automatically used for parallel queries
      - Direct-path reads may be used for serial queries
             o Not used by default for serial small table scans
             o Use _serial_direct_read=TRUE to force direct path reads
3) Each segment being scanned must be on a disk group that is completely stored on Exadata cells. The disk group must also have the following disk group attribute settings:
        'compatible.rdbms' = '11.2.0.0.0' (or later)
        'compatible.asm' = '11.2.0.0.0' (or later)
        'cell.smart_scan_capable' = 'TRUE'
4) The CELL_OFFLOAD_PROCESSING initialization parameter enables or disables Smart Scan.
   The default value of the parameter is TRUE, meaning that Smart Scan is enabled by default. If it is set to FALSE, Smart Scan is disabled and the database uses Exadata storage to serve data blocks similar to traditional storage.
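The checks above can be sketched as follows (the attribute query runs in the ASM instance; hidden parameters should only be changed under Oracle Support guidance):

```sql
-- Verify the disk group attributes required for Smart Scan (ASM instance)
SELECT g.name AS diskgroup, a.name AS attribute, a.value
  FROM v$asm_attribute a
  JOIN v$asm_diskgroup g ON a.group_number = g.group_number
 WHERE a.name IN ('compatible.rdbms', 'compatible.asm', 'cell.smart_scan_capable');

-- Force direct path reads for serial scans in this session
ALTER SESSION SET "_serial_direct_read" = TRUE;
```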


What are the situations that prevent Smart Scan from happening ?

Scan on a clustered table
Scan on an index-organized table
Fast full scan on a compressed index
Fast full scan on a reverse-key index
The table has row-level dependency tracking enabled
The ORA_ROWSCN pseudocolumn is being fetched
The optimizer wants the scan to return rows in ROWID order
The command is CREATE INDEX using NOSORT
A LOB or LONG column is being selected or queried
A SELECT .. VERSIONS flashback query is being executed
To evaluate a predicate based on a virtual column
More than 255 columns are referenced in the query
The data is encrypted and cell-based decryption is disabled
If table has CACHE property
If _serial_direct_read is turned off (NEVER)
If the partitioned object's size is less than _small_table_threshold.
Offloading is not done on serial DMLs.
Serial Direct read is not applied for these cases (and thus no smart scan) :
          - sql from a table function
          - sql from dbms_sql
          - sql from plsql trigger
Smart scan is not enabled for the SQL in PL/SQL when the PL/SQL package is called by 'CALL plsql'
Does not work on serial queries issued from shared servers
The SQL in question must not be quarantined.
High CPU usage on the Storage Cells


What does it mean by sql is quarantined ?

When a cellsrv process (on an Exadata storage server cell) terminates, and the thread that encountered the failure was performing a Smart Scan, the SQL plan step the thread was executing is determined and noted in a file on the cell. When cellsrv restarts, that plan step is considered quarantined. After three quarantines, Smart Scan for the database as a whole gets quarantined. When Smart Scan is quarantined, cellsrv will not perform any predicate evaluation or filtering; it will issue parallel I/Os and return the blocks as-is.

How to verify from the explain plan that smart scan is being used ?

Unfortunately, the normal execution plan output produced by the DBMS_XPLAN package will not show you whether a Smart Scan was used or not.
If the explain plan output shows a "TABLE ACCESS STORAGE FULL" operation and the predicate section shows a "storage()" predicate associated with the plan, a Smart Scan was possible, but neither characteristic provides definitive verification.

How to Verify from 10046 trace that Smart Scan is happening ?

One of the most straightforward ways to determine whether a Smart Scan is being used by a query is to enable a 10046 trace on the statement in question. Tracing is a fairly foolproof way to verify whether a Smart Scan was used or not. If Smart Scan was used, there will be 'cell smart table scan' or 'cell smart index scan' events in the trace file.
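A sketch of the procedure (the traced query is illustrative):

```sql
-- Enable extended SQL trace for the session (level 8 includes wait events)
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

-- Run the statement under investigation
SELECT /* smart_scan_test */ COUNT(*) FROM sales;

-- Turn tracing off, then search the trace file for
-- 'cell smart table scan' / 'cell smart index scan' wait events
ALTER SESSION SET EVENTS '10046 trace name context off';
```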

Are there any cases where smart scan is considered as an inefficient method for data retrieval ?

1) Yes. When the data has many chained rows, the cell may not be able to determine the location of a chained row, and the compute node will then have to perform additional single-block reads to fulfill the request.
2) Frequently updated tables: in order to do a Smart Scan, a checkpoint needs to be done, and frequent checkpoints slow down the queries.
    For example, if Oracle notices that a block is "newer" than the current query, the process of finding an age-appropriate version of the block is left for the database layer to deal with. This effectively pauses the Smart Scan processing while the database does its traditional read consistency processing.


What are Storage Indexes ?

Storage Indexes are a very useful Exadata feature; they are transparent to the database and are maintained automatically.
They are not indexes that are stored in the database like Oracle’s traditional B-Tree or bitmapped indexes. They are not capable of identifying a set of records that has a certain value in a given column. Rather, they are a feature of the storage server software that is designed to eliminate disk I/O.
They work by storing minimum and maximum column values for disk storage regions, which are 1 megabyte (MB) by default; these min/max structures are called region indexes.
Because SQL predicates are passed to the storage servers when Smart Scans are performed, the storage software can check the predicates against the Storage Index metadata (maximum and minimum values) before doing the requested I/O. Any storage region that cannot possibly have a matching row is skipped.

Since the indexes are in-memory on the cell server, if a cell server is restarted the SIs are lost and must be rebuilt.
They are generally created during the first smart scan that references a given column after a storage server has been restarted.
They can also be created when a table is created via a CREATE TABLE AS SELECT statement, or during other direct-path loads.


Under what conditions Storage Index would be used ?

In order for a storage index to be used, a query must include or make use of all the following :
1. Storage Indexes can only be used with statements that do Smart Scans. The main requirements are that the optimizer must choose a full scan and that the I/O must be done via the direct path read mechanism.
2. In order for a statement to use a Storage Index, there must be a WHERE clause with at least one predicate.
3. Storage Indexes can be used with the following set of operators: =, <, >, BETWEEN, >=, <=, IN, IS NULL, IS NOT NULL


What conditions prevent the use of Storage Indexes ?

1. Storage Indexes are not created on CLOBs.
2. Storage Indexes do not work with predicates that use the != comparison operator.
3. Storage Indexes do not work on predicates that use the % wildcard.
4. Storage Indexes are created and maintained for at most eight columns per table.
5. The customer has set either of the following init.ora parameters:

"_smu_debug_mode=134217728"

or alternatively

"_enable_minscn_cr"=false

What are the statistics related to Storage Indexes ?

There is only one database statistic related to Storage Indexes. The statistic, 'cell physical IO bytes saved by storage index', keeps track of the accumulated I/O that has been avoided by the use of Storage Indexes. Since the statistic is cumulative, it must be checked before and after a given SQL statement in order to determine whether Storage Indexes were used on that particular statement.

Run the following query to check the statistics for Storage Indexes :
select name, value from v$sysstat where name like '%storage%';
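Since v$sysstat is instance-wide and cumulative, a per-session before/after check is often more useful, e.g.:

```sql
-- Sample this in the same session before and after the statement;
-- the delta is the I/O avoided by Storage Indexes for that statement
SELECT s.name, m.value
  FROM v$mystat m
  JOIN v$statname s ON s.statistic# = m.statistic#
 WHERE s.name = 'cell physical IO bytes saved by storage index';
```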

What is offloading and how does it work?

It refers to the fact that part of the traditional SQL processing done by the database can be "offloaded" from the database layer to the storage layer.

The primary benefit of offloading is the reduction in the volume of data that must be returned to the database server. This is one of the major bottlenecks of most large databases.
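The saving can be observed per statement from V$SQL (columns populated on Exadata; the sql_id filter is illustrative):

```sql
-- Bytes eligible for offload vs. bytes actually shipped over the interconnect
SELECT sql_id,
       io_cell_offload_eligible_bytes AS eligible_bytes,
       io_interconnect_bytes          AS returned_bytes
  FROM v$sql
 WHERE sql_id = '&sql_id';
```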

What does it mean by sql is quarantined ?

When a cellsrv process (on an exadata storage server cell) terminates and, if the thread that caused the death (or rather encountered the death) was performing smart scan, we determine the sql step that the thread was performing and make a note of it in a file on the cell. When the cellsrv restarts, the plan step is considered quarantined. When there are 3 quarantines, smart scan for a database gets quarantined. When smart scan gets quarantined, cellsrv will not perform any predicate evaluation/filtering. It will issue parallel IOs, and return the blocks as is.

How to verify from the explain plan that smart scan is being used ?

Unfortunately, the normal execution plan output produced by the DBMS_XPLAN package will not show you whether a Smart Scan is used or not.
If you explain plan output shows a "TABLE ACCESS STORAGE FULL" operation and the predicate section shows a "storage()" predicate associated to the plan. Both of these characteristics indicate that a Smart Scan was possible, but neither provides a definitive verification.


How to Verify from 10046 trace that Smart Scan is happening ?

One of the most straightforward ways to determine whether a Smart Scan is being used by the query, is to enable a 10046
trace on the statement in question. Tracing is a fairly foolproof way to verify whether a Smart Scan was used or not. If Smart Scan was used, there will be 'CELL SMART TABLE SCAN' or 'CELL SMART INDEX SCAN' events in the trace file.

Are there any cases where smart scan is considered as an inefficient method for data retrieval ?

1) Yes, when the data has lots of chained rows, it is possible for the cell not able to determine the locale of the chained row and then the compute node will have to do additional single block reads to fulfill the request.
2) Frequently updated tables, as in order to do smart scan a checkpoint needs to be done and frequent ones slow down the queries.
    For example, if Oracle notices that a block is “newer” than the current query, the process of finding an age-appropriate version of the block is left for the database layer to deal with. This
    effectively pauses the Smart Scan processing while the database does its traditional read consistency processing.


What are Storage Indexes ?

Storage Indexes is a very useful Exadata feature which is transparent to database and are maintained automatically.
They are not indexes that are stored in the database like Oracle’s traditional B-Tree or bitmapped indexes. They are not capable of identifying a set of records that has a certain value in a given column. Rather, they are a feature of the storage server software that is designed to eliminate disk I/O.
They work by storing minimum and maximum column values for disk storage units, which are 1 Megabyte (MB) by default and are called region indexes.
Because SQL predicates are passed to the storage servers when Smart Scans are performed, the storage software can check the predicates against the Storage Index metadata (maximum and minimum values) before doing the requested I/O. Any storage region that cannot possibly have a matching row is skipped.

Since the indexes are in-memory on the cell server, if a cell server is restarted the SIs are lost and must be rebuilt.
They are generally created during the first smart scan that references a given column after a storage server has been restarted.
They can also be created when a table is created via a CREATE TABLE AS SELECT statement, or during other direct-path loads.


Under what conditions Storage Index would be used ?

In order for a storage index to be used, a query must include or make use of all the following :
1. Storage Indexes can only be used with statements that do Smart Scans. The main requirements are that the optimizer must choose a full scan and that the I/O must be done via the direct path read mechanism.
2. In order for a statement to use a Storage Index, there must be a WHERE clause with at least one predicate.
3. Storage Indexes can be used with the following set of operators: =, <, >, BETWEEN, >=, <=, IN, IS NULL, IS NOT NULL


What conditions prevent the use of Storage Indexes ?

1. Storage Indexes are not created on CLOBs.
2. Storage Indexes do not work with predicates that use the != comparison operator.
3. Storage Indexes do not work on predicates that use the % wildcard.
4. Storage Indexes are created and maintained for eight-columns per table.
5. Customer had set either of the following init.ora parameter

"_smu_debug_mode=134217728"

or alternatively

"_enable_minscn_cr"=false



What are the statistics related to Storage Indexes ?

There is only one database statistic related to storage indexes. The statistic, 'Cell Physical IO Bytes
Saved by Storage Index', keeps track of the accumulated I/O that has been avoided by the use of Storage Indexes.
Since the statistic is cumulative, it must be checked before and after a given SQL statement in order to
determine whether Storage Indexes were used on that particular statement.

Run the following query to check the statistics for Storage Indexes :
select name, value from v$sysstat where name like '%storage%';
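Because the statistic is cumulative, one practical approach is to sample it at the session level before and after the statement of interest and compare the two values. A sketch:

```sql
-- Sample the session-level statistic before the statement of interest.
SELECT s.value
FROM   v$mystat s, v$statname n
WHERE  s.statistic# = n.statistic#
AND    n.name = 'cell physical IO bytes saved by storage index';

-- ... run the statement of interest here ...

-- Re-run the query above; a positive delta between the two samples
-- indicates that Storage Indexes saved I/O for that statement.
```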


How to diagnose Smart Scan causing wrong results ?


Refer to Note 1260804.1 - Exadata: How to diagnose smart scan and wrong results.


HardDisk Scrubbing

1) How to verify whether hard disk scrubbing is enabled on Exadata ?
Answer:
cellcli -e list cell attributes name,hardDiskScrubInterval

2) How to verify that a hard disk scrub has started on an Exadata cell node ?
Answer : On the cell node, as root: cd $CELLTRACE
In the cell alert log:
=>Scrubbing started:
"Begin scrubbing CellDisk: CD_00_celadm07" in cell alert.log
=>Scrubbing completed:
"Finished scrubbing CellDisk: CD_00_celadm07, scrubbed blocks (1MB): 1488,
found bad blocks: 0" in cell alert.log

3) How to check the hard disk scrub schedule on an Exadata cell node ?
Answer :
cellcli -e list cell attributes name,hardDiskScrubInterval
Example output : exatestcel12 biweekly

4) How to stop hard disk scrub activity ?
Answer :
=>Stop the scrubbing job (this also stops any currently running scrubbing job):
cellcli -e alter cell hardDiskScrubInterval=none

5) How to alter the hard disk scrub frequency on an Exadata cell node ?
Answer :
cellcli -e alter cell hardDiskScrubInterval=daily
cellcli -e alter cell hardDiskScrubInterval=weekly
cellcli -e alter cell hardDiskScrubInterval=biweekly

6) How to set the hard disk scrub to start at a specific date and time ?
Answer :
cellcli -e alter cell hardDiskScrubStartTime='2013-08-07T21:19:22-07:00'



Hybrid Columnar Compression



What are the types of compression available with EHCC ?

EHCC is available with the following types:

1. Warehouse Compression
Within warehouse compression there are two subtypes:
a. Query HIGH
b. Query LOW

2. Archive Compression
Within archive compression there are two subtypes:
a. Archive HIGH
b. Archive LOW

Can CTAS be used to create EHCC tables ?

Yes, CTAS can be used to create EHCC tables.
Example :

SQL> create table example compress for query high as select * from example2;

In this example, "compress for query high" is the EHCC clause that creates the table with compression enabled.



How to convert an existing / non-EHCC table to use EHCC ?

Existing / non-EHCC tables can be converted to EHCC using alter table commands as below.
Example:

SQL> alter table example move compress for query high;
SQL> alter table example compress for query high;

In the first case, because MOVE is specified, the existing data is rewritten in the new compression format.
In the second case, only data newly loaded into the table gets EHCC compression; the existing data is kept as it is.


How to disable EHCC compression on a table ?

To disable or uncompress an EHCC table, use alter table commands as below:

SQL> alter table example nocompress;
SQL> alter table example move nocompress;

 

 

Can we use DBMS_REDEFINITION online redefinition package for EHCC tables ?

Yes. DBMS_REDEFINITION does support EHCC tables and can be used to move data in online mode.

How does an update work for an EHCC table ?

When a row in an EHCC table is updated, in the background the database performs a delete followed by an insert of the row.
While doing so, two major changes happen at the row level:
1. The ROWID of the updated row changes, because a delete followed by an insert was performed.
2. The updated row comes out of EHCC compression and is stored in OLTP compression format.
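This behavior can be observed directly. A sketch, assuming a hypothetical EHCC table EXAMPLE with columns ID and VAL:

```sql
-- Note the ROWID of the row before the update.
SELECT rowid FROM example WHERE id = 1;

UPDATE example SET val = 'new' WHERE id = 1;
COMMIT;

-- The ROWID now differs from the one captured above, because the
-- update was internally performed as a delete followed by an insert.
SELECT rowid FROM example WHERE id = 1;
```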

What is Compression Unit?

Hybrid Columnar Compression uses a logical construct called a compression unit (CU) to store a set of rows. When you load data into a table, the database stores groups of rows in columnar format, with the values for each column stored and compressed together. After the database has compressed the column data for a set of rows, the database fits the data into the compression unit.

How to check compression format for a particular row ?

Use dbms_compression.get_compression_type() to get the compression type at row level.

SQL> desc dbms_compression
FUNCTION GET_COMPRESSION_TYPE RETURNS NUMBER
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 OWNNAME                        VARCHAR2                IN
 TABNAME                        VARCHAR2                IN
 ROW_ID                         ROWID                   IN


Returns the compression type for the input rowid. The constants defined for compression type :
COMP_NOCOMPRESS = 1
COMP_FOR_OLTP = 2
COMP_FOR_QUERY_HIGH = 4
COMP_FOR_QUERY_LOW = 8
COMP_FOR_ARCHIVE_HIGH = 16
COMP_FOR_ARCHIVE_LOW = 32
Example :

SQL> select DBMS_COMPRESSION.GET_COMPRESSION_TYPE ('SCOTT','EXAMPLE','AAASi7AAFAAABR6AAA') FROM DUAL;
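The numeric return value can be decoded into a readable label using the constants above (the owner, table and rowid are the same hypothetical values as in the example):

```sql
-- Map the numeric compression type to a readable label.
SELECT DECODE(DBMS_COMPRESSION.GET_COMPRESSION_TYPE(
                'SCOTT', 'EXAMPLE', 'AAASi7AAFAAABR6AAA'),
              1,  'NOCOMPRESS',
              2,  'OLTP',
              4,  'QUERY HIGH',
              8,  'QUERY LOW',
              16, 'ARCHIVE HIGH',
              32, 'ARCHIVE LOW',
              'UNKNOWN') AS compression_type
FROM dual;
```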

 

 

How to get estimate of compression ratios for EHCC compression types ?

Use dbms_compression.get_compression_ratio().

desc dbms_compression
PROCEDURE GET_COMPRESSION_RATIO
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SCRATCHTBSNAME                 VARCHAR2                IN
 OWNNAME                        VARCHAR2                IN
 TABNAME                        VARCHAR2                IN
 PARTNAME                       VARCHAR2                IN
 COMPTYPE                       NUMBER                  IN
 BLKCNT_CMP                     BINARY_INTEGER          OUT
 BLKCNT_UNCMP                   BINARY_INTEGER          OUT
 ROW_CMP                        BINARY_INTEGER          OUT
 ROW_UNCMP                      BINARY_INTEGER          OUT
 CMP_RATIO                      NUMBER                  OUT
 COMPTYPE_STR                   VARCHAR2                OUT
 SUBSET_NUMROWS                 NUMBER                  IN     DEFAULT
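A minimal PL/SQL sketch of a call, assuming a hypothetical table SCOTT.EXAMPLE and a scratch tablespace named USERS (run with SERVEROUTPUT enabled):

```sql
DECLARE
  l_blkcnt_cmp   BINARY_INTEGER;
  l_blkcnt_uncmp BINARY_INTEGER;
  l_row_cmp      BINARY_INTEGER;
  l_row_uncmp    BINARY_INTEGER;
  l_cmp_ratio    NUMBER;
  l_comptype_str VARCHAR2(100);
BEGIN
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    scratchtbsname => 'USERS',          -- scratch tablespace for sampling
    ownname        => 'SCOTT',
    tabname        => 'EXAMPLE',
    partname       => NULL,
    comptype       => DBMS_COMPRESSION.COMP_FOR_QUERY_HIGH,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  DBMS_OUTPUT.PUT_LINE('Estimated ratio ' || l_cmp_ratio ||
                       ' for ' || l_comptype_str);
END;
/
```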

 

How to enable tracing for EHCC ?

The following events can be used to enable tracing for EHCC:

event="trace[ADVCMP_COMP] disk low"    (tracing for EHCC loading)
event="trace[ADVCMP_DECOMP] disk low"  (tracing for EHCC queries)
event="trace[ADVCMP_MAIN] disk low"    (tracing for everything)

The trace level (low in the examples above) can be one of the following:
lowest, low, medium, high, highest


How to query stats about CUs ?

Use the SQL below to capture stats about CUs.

SQL> select name, value from v$sysstat where name like '%cell CU%';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell CUs sent uncompressed                                                0
cell CUs sent compressed                                                  0
cell CUs sent head piece                                                  0
cell CUs processed for uncompressed                                       0
cell CUs processed for compressed                                         0


SQL> select name, value from v$sysstat where name like 'CC%';
NAME                                                               VALUE
---------------------------------------------------------------- ----------
CC CUs Decompressed                                                22591
CC Query High CUs Decompressed                                     11466
CC Compressed Length Decompressed                                  807050919
CC Decompressed Length Decompressed                                8367891105
CC Columns Decompressed                                            68403
CC Total Columns for Decompression                                 271092
CC Total Rows for Decompression                                    113019839
CC Pieces Buffered for Decompression                               22698
CC Total Pieces for Decompression                                  22698
CC Scan CUs Decompressed                                           70
CC Turbo Scan CUs Decompressed                                     11126

 

 

What storage types are supported for EHCC or HCC ?

Apart from Exadata, HCC is supported on Pillar Axiom and Sun ZFS Storage Appliance (ZFSSA).


Can we specify a different compression type for each partition in a table ?

Yes. Compression can be specified at the level of a segment, so EHCC can be specified at the partition level.
For Example:

SQL> create table orders (cid, pid, sid, price, discount, odate)
partition by range (cid)
(partition p1 values less than (100000) nocompress,
partition p2 values less than (200000) compress for archive low,
partition p3 values less than (300000) compress for query high,
partition p4 values less than (maxvalue) compress for query low)
as select * from prev_orders;

 

 

Does EHCC support function based indexes ?

Yes, all index types are supported including function based indexes.


Is EHCC suitable for single row access methods ?

Yes, the EHCC compression types Query HIGH and Query LOW are optimized for single-row queries.
Single row access can be expensive for Archive HIGH and Archive LOW compared to the Query compression types, but it can still be used.


Does EHCC uncompress CUs to access the data? Will this result in multiple I/Os for a single row access ?

Yes, to access a row the entire CU, which may comprise multiple blocks, is read.
The decompression happens on the cell side, so the CPU and memory available on the cells are used.
That said, the EHCC query architecture is optimized to increase scan query performance by taking advantage of reading fewer blocks.


Can we use EHCC on a normal server with a SUN ZFS filesystem ?

Yes, EHCC can be used with any server whose storage is ZFS.
While performing the compression, the database checks for certain libraries, and if they map to a supported storage type (Exadata, SUN ZFSSA or Pillar), EHCC compression is achieved.


What are the supported RDBMS versions for EHCC ?

All 11.2.X.X and 12c RDBMS versions support EHCC.


If a table already has OLTP compression enabled, can EHCC be enabled on the same table ?

Yes, it can be converted to EHCC compression.
Use "alter table [table name] move compress for [compression type]" to move all existing rows to EHCC.
But at any time only one compression type can be enabled at the table level.


Does EHCC support the BLOB data type ?

EHCC does not support BLOBs. Because a BLOB holds unstructured data, SecureFiles should be used instead.


Does EHCC support the LONG data type ?

No, the LONG datatype is not supported by EHCC.


Can both EHCC and OLTP compression be enabled for a single table ?

Only one compression type can be enabled at the table level. If EHCC is enabled on a table and single-row updates occur, the updated rows go into OLTP-compressed blocks. As a result, a table can contain a mix of EHCC-compressed and OLTP-compressed rows.


In a warehouse with rows being added to an EHCC table via ETL, can the new rows be converted to EHCC, or does the whole table need to be periodically recompressed ?

If the ETL uses direct-path loads, the new rows are added with EHCC compression; otherwise they get OLTP compression.
If new rows go to OLTP compression, you will have to run "alter table move" periodically to move the data to EHCC compression.


Are there any application code changes if we convert tables to EHCC ?

No, there are no application-level code changes required to read or update data in EHCC tables.


Can we join a regular table with EHCC table in a query ?

Yes, we can join between EHCC and non EHCC table.


Which view to check the compression type ?

Use dba_tables to check the compression type.
Example :

SQL> Select owner, table_name, compress_for from sys.dba_tables where compression = 'ENABLED' order by 1,2;

OWNER                          TABLE_NAME                     COMPRESS_FOR
------------------------------ ------------------------------ ------------
INT_AGIIE                      I_POSITIONS                    BASIC
INT_AGIIE                      X_INT_AGIIE_STATUS             BASIC
SCOTT                          AK                             QUERY HIGH
SCOTT                          EHCC                           QUERY HIGH
SCOTT                          EHCC_W                         QUERY LOW
SCOTT                          EXAMPLE                        ARCHIVE LOW
SCOTT                          HCCTEST                        QUERY LOW
SCOTT                          OAVAL                          QUERY HIGH
SCOTT                          TAB1                           QUERY LOW
SYS                            AJAY                           QUERY LOW
SYS                            EHCC1                          QUERY HIGH

For partitioned tables, use DBA_TAB_PARTITIONS as well, since compression can be set differently at the partition level.


What's the difference between query low and query high ?

Query High gives around a 10X compression ratio, whereas Query Low gives around a 6X compression ratio.
Query performance is better with Query High than with Query Low.
Load time is better with Query Low than with Query High.


Does Oracle Golden Gate supports EHCC from source to target replication ?

Oracle GoldenGate fully supports basic, advanced, and EHCC compressed tables in integrated capture mode only. In classic capture mode, Oracle GoldenGate can deliver to,
but not capture from, regular and EHCC compressed tables on Oracle Exadata.



If a table is marked as query high, can it be moved to archive high or any other EHCC method directly ?

Yes, this can be done using the "alter table" command. If MOVE is specified with the alter table command, the existing data is moved to the new compression format, whereas if MOVE is not specified, the new compression takes effect from subsequent loads.


What is the impact on performance of taking a table out of EHCC, especially if the table is large ?

To take a table out of EHCC you need to use "alter table .. move nocompress". When you do this, the table is
locked and offline for DML. To do this online, one can use the online redefinition package (DBMS_REDEFINITION). Apart from this there is no other performance impact.
In order to improve the performance of this activity, run the alter table command in parallel.
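As a sketch (the table name, index name and degree of parallelism are hypothetical), the move can be run in parallel, after which any indexes invalidated by the move should be rebuilt:

```sql
-- Rewrite the table without EHCC compression, in parallel.
ALTER TABLE example MOVE NOCOMPRESS PARALLEL 8;

-- A move changes ROWIDs, so dependent indexes become UNUSABLE
-- and must be rebuilt, e.g.:
ALTER INDEX example_pk REBUILD PARALLEL 8;
```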


In EHCC a table may have a mix of HCC and OLTP compression. Does this create extra overhead when fetching data via a range scan ?

Yes, it can cause extra overhead if the data is spread across blocks with different compression types.
To avoid this, one should regularly move the table to the EHCC compression type so that the OLTP-compressed rows are re-compressed with EHCC.


Is the row migrated out of CU when update occurs causing row migration ?

Yes, row goes out of CU and is stored in OLTP compressed block.


Default segment size is 8MB, can EHCC compression make it less than 8MB ?

No, it does not make the segment size smaller, rather it accommodates more data within 8MB segment size.


Flash Cache Compression

How to enable Flash Cache compression

 1. Flush the dirty blocks from Flash Disks to the GridDisks


CellCLI> ALTER FLASHCACHE all flush

To monitor the flush operation, open a new cellcli session on the cell and run:

CellCLI> LIST CELLDISK attributes name, flushstatus, flusherror

While the flush operation is running, the flushstatus attribute reports "working":

CellCLI> LIST CELLDISK attributes name, flushstatus, flusherror
         CD_00_celadm08                   
         CD_01_celadm08                   
         CD_02_celadm08                   
         CD_03_celadm08                   
         CD_04_celadm08                   
         CD_05_celadm08                   
         CD_06_celadm08                   
         CD_07_celadm08                   
         CD_08_celadm08                   
         CD_09_celadm08                   
         CD_10_celadm08                   
         CD_11_celadm08                   
         FD_00_celadm08    working        
         FD_01_celadm08    working        
         FD_02_celadm08    working        
         FD_03_celadm08    working        
         FD_04_celadm08    working        
         FD_05_celadm08    working        
         FD_06_celadm08    working        
         FD_07_celadm08    working        
         FD_08_celadm08    working        
         FD_09_celadm08    working        
         FD_10_celadm08    working        
         FD_11_celadm08    working        
         FD_12_celadm08    working        
         FD_13_celadm08    working        
         FD_14_celadm08    working        
         FD_15_celadm08    working        

 

To validate that the flush operation was successful, use any of the following methods in a new cellcli session:

The flushstatus attribute reports complete:

CellCLI> LIST CELLDISK attributes name, flushstatus, flusherror
         CD_00_celadm08                   
         CD_01_celadm08                   
         CD_02_celadm08                   
         CD_03_celadm08                   
         CD_04_celadm08                   
         CD_05_celadm08                   
         CD_06_celadm08                   
         CD_07_celadm08                   
         CD_08_celadm08                   
         CD_09_celadm08                   
         CD_10_celadm08                   
         CD_11_celadm08                   
         FD_00_celadm08    complete       
         FD_01_celadm08    complete       
         FD_02_celadm08    complete       
         FD_03_celadm08    complete       
         FD_04_celadm08    complete       
         FD_05_celadm08    complete       
         FD_06_celadm08    complete       
         FD_07_celadm08    complete       
         FD_08_celadm08    complete       
         FD_09_celadm08    complete       
         FD_10_celadm08    complete       
         FD_11_celadm08    complete       
         FD_12_celadm08    complete       
         FD_13_celadm08    complete       
         FD_14_celadm08    complete       
         FD_15_celadm08    complete 

Verify the cachedby attribute is null. If a FlashDisk is reported, the flush is still in progress.

CellCLI> list griddisk attributes name,cachedby
         DATA_DMORL_CD_00_celadm08
         DATA_DMORL_CD_01_celadm08
         DATA_DMORL_CD_02_celadm08
         DATA_DMORL_CD_03_celadm08
         DATA_DMORL_CD_04_celadm08
         DATA_DMORL_CD_05_celadm08
         DATA_DMORL_CD_06_celadm08
         DATA_DMORL_CD_07_celadm08
         DATA_DMORL_CD_08_celadm08
         DATA_DMORL_CD_09_celadm08
         DATA_DMORL_CD_10_celadm08
         DATA_DMORL_CD_11_celadm08
         DBFS_DG_CD_02_celadm08
         DBFS_DG_CD_03_celadm08
         DBFS_DG_CD_04_celadm08
         DBFS_DG_CD_05_celadm08
         DBFS_DG_CD_06_celadm08
         DBFS_DG_CD_07_celadm08
         DBFS_DG_CD_08_celadm08
         DBFS_DG_CD_09_celadm08
         DBFS_DG_CD_10_celadm08
         DBFS_DG_CD_11_celadm08
         RECO_DMORL_CD_00_celadm08
         RECO_DMORL_CD_01_celadm08
         RECO_DMORL_CD_02_celadm08
         RECO_DMORL_CD_03_celadm08
         RECO_DMORL_CD_04_celadm08
         RECO_DMORL_CD_05_celadm08
         RECO_DMORL_CD_06_celadm08
         RECO_DMORL_CD_07_celadm08
         RECO_DMORL_CD_08_celadm08
         RECO_DMORL_CD_09_celadm08
         RECO_DMORL_CD_10_celadm08
         RECO_DMORL_CD_11_celadm08

The number of dirty (unflushed) buffers will be 0 after the flush is complete:

CellCLI>  LIST METRICCURRENT FC_BY_DIRTY
         FC_BY_DIRTY     FLASHCACHE      0.000 MB

2. Drop the FlashCache

CellCLI> DROP FLASHCACHE all

 

3. Drop the FlashLog

CellCLI> DROP FLASHLOG all

 

4. Drop the celldisks on the FlashDisks

CellCLI> DROP CELLDISK all flashdisk

 

5. Enable flash cache compression using the following commands, based on the system:

If Exadata Storage Cell Server image is 11.2.3.3.0 and the Storage Cell is X3-2:

CellCLI> ALTER CELL flashCacheCompX3Support=true

CellCLI> ALTER CELL flashCacheCompress=true

If Exadata Storage Cell server image is 11.2.3.3.1 or higher:


CellCLI> ALTER CELL flashCacheCompress=true

 

6. Validate that the PhysicalDisks/LUNs have the new size (see Table 1. PhysicalDisk size for FlashDisks)

CellCLI> LIST PHYSICALDISK attributes name,physicalSize where disktype=flashdisk

 

                      AURA 2.0 / F40 / X3                AURA 2.1 / F80 / X4
                      Compression OFF   Compression ON   Compression OFF   Compression ON
Physical Disk Size    93.13 G           186.26 G         186.26 G          372.53 G
Flash Cache Size      1489 G            2979 G           2979 G            5959 G

Table 1. PhysicalDisk size for FlashDisks

 

7. Create  the celldisks on the FlashDisks

CellCLI> CREATE CELLDISK all flashdisk

Validate all sixteen FlashDisks are present

CellCLI> LIST CELLDISK where disktype=flashdisk

 

8. Create the FlashLog

CellCLI> CREATE FLASHLOG all

Validate FlashLog is in normal mode

CellCLI> LIST FLASHLOG detail

 

9.  Create the FlashCache

CellCLI> CREATE FLASHCACHE all

Validate FlashCache is in normal mode

CellCLI> LIST FLASHCACHE detail

 

How to disable Flash Cache compression



1. Flush the dirty blocks from Flash Disks to the GridDisks


CellCLI> ALTER FLASHCACHE all flush

 

To validate that the flush operation was successful, use any of the following methods in a new cellcli session:

Verify the cachedby attribute is null. If a FlashDisk is reported, the flush is still in progress.

CellCLI> LIST GRIDDISK attributes name,cachedby

The number of dirty (unflushed) buffers will be 0 after the flush is complete:

CellCLI> LIST METRICCURRENT fc_by_dirty

 

2. Drop the FlashCache

CellCLI> DROP FLASHCACHE all

 

3. Drop the FlashLog

CellCLI> DROP FLASHLOG all

 

4. Drop the celldisks on the FlashDisks

CellCLI> DROP CELLDISK all flashdisk

 

5. Disable Flash Cache Compression

If Exadata Storage Cell Server image is 11.2.3.3.0 and the Storage Cell is X3-2:

CellCLI> ALTER CELL flashCacheCompX3Support=true

CellCLI> ALTER CELL flashCacheCompress=false

If Exadata Storage Cell server image is 11.2.3.3.1 or higher:


CellCLI> ALTER CELL flashCacheCompress=false

 

Validate it has been disabled by checking cell attributes:

CellCLI> LIST CELL attributes name,flashCacheCompress

Correct values are FALSE or a null string.

 

6. Validate the Physical Disks have been shrunk

CellCLI> LIST PHYSICALDISK attributes name,physicalSize,status where disktype=flashdisk

The status should be normal.  

Use the following table to validate the expected size when Compression is OFF

 

 

                      AURA 2.0 / F40 / X3                AURA 2.1 / F80 / X4
                      Compression OFF   Compression ON   Compression OFF   Compression ON
Physical Disk Size    93.13 G           186.26 G         186.26 G          372.53 G
Flash Cache Size      1489 G            2979 G           2979 G            5959 G

Table 2. PhysicalDisk size for FlashDisks

7. Create  the celldisks on the FlashDisks

CellCLI> create celldisk all flashdisk

Validate all sixteen FlashDisks are present

CellCLI> list celldisk where disktype=flashdisk


8. Create the FlashLog

CellCLI> create flashlog all

Validate FlashLog is in normal mode

CellCLI> list flashlog detail

 

9.  Create the FlashCache

CellCLI> create flashcache all

Validate FlashCache is in normal mode

 

CellCLI> list flashcache detail

Scrubbing

Is Disk Scrubbing needed on Extreme Flash Cells and Disks ?

Scrubbing is only necessary on cells with spinning disks (High Capacity or the older High Performance models); it is not necessary to configure it on Extreme Flash (EF) disks or cells.

 

Flash cache:

1. What is Write back Flash cache?

 

Write back flash cache provides the ability to cache write I/Os directly to PCI flash in addition to read I/Os.

Exadata storage software version 11.2.3.2.1 is the minimum version required to use write back flash cache.

2. What software and hardware versions are supported?

The Write-Back Smart Flash Cache requirement has been amended to state that both the grid infrastructure and database homes must run 11.2.0.3.9 or later.

Database homes running 11.2.0.2 that cannot promptly upgrade to 11.2.0.3 to meet the amended requirement must install Patch 17342825.

As long as the minimum software requirements are met, any Exadata hardware with flashcache (V2 and later) can take advantage of this feature.

Since April 2017, Oracle Exadata Deployment Assistant (OEDA) enables Write-Back Flash Cache by default if the following conditions are met:

1. GI and DB home must be

 

11.2.0.4.1 or higher

12.1.0.2 or higher

12.2.0.2 or higher

AND

 

2. DATA diskgroup has HIGH redundancy

Additional Notes:

Extreme Flash is always checked by default and cannot be modified.

A user can check/uncheck the WBFC checkbox, if not using Extreme Flash, to manually disable it.


3. When should I use the write back flash cache?

If your application is write-intensive, and AWR reports show significant "free buffer waits" or high I/O times pointing to a write bottleneck, then you should consider using the write back flash cache.
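As a quick first check outside AWR, the cumulative system wait events can be inspected directly (illustrative query; interpret the numbers relative to total instance activity):

```sql
-- High accumulated waits on these events may indicate a write
-- bottleneck that write back flash cache could relieve.
SELECT event, total_waits,
       time_waited_micro / 1e6 AS seconds_waited
FROM   v$system_event
WHERE  event IN ('free buffer waits', 'write complete waits')
ORDER  BY time_waited_micro DESC;
```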


4. How to determine if you have write back flash cache enabled?

Execute:

#dcli -g ~/cell_group -l root cellcli -e "list cell attributes flashcachemode"

 

Results:

flashCacheMode: WriteBack  -> write back flash cache is enabled

flashCacheMode: WriteThrough  -> write back flash cache is not enabled

 

5. How can I enable the write back flash cache?

Preparation:

Before performing the below steps, you should perform some prerequisite checks to ensure that all cells are in the proper state.  Please perform the following check as root from one of the compute nodes:

 

Check the asmdeactivationoutcome and asmmodestatus attributes of all griddisks to ensure that all griddisks on all cells report "Yes" and "ONLINE" respectively.

# dcli -g cell_group -l root cellcli -e list griddisk attributes asmdeactivationoutcome, asmmodestatus

 

Check that all of the flashcache are in the “normal” state and that no flash disks are in a degraded or critical state:

# dcli -g cell_group -l root cellcli -e list flashcache detail

You may instead use the setWBFC.sh script (version 1.0.0.21_20160602) to enable or disable Write Back Flash Cache on the Exadata Machine DB nodes or Sun SuperCluster DB nodes. This script now supports X5-2 High Capacity storage cells. It automates all of the steps described below and supports both rolling and non-rolling methods. Please refer to the readme for further details.

 The script is supported for Storage cell software versions:

 - 11.2.3.2.1 and above

- 11.2.3.3.1 (with enhancements that no longer require cellsrv service shutdown and restart; see the README)

- 12.1.1.1.1

- 12.1.2.1.0 (Except for X5-2 with Extreme Flash as WBFC will be enabled automatically)

 

GI versions

-  11.2.0.3.9 and above

-  12.1

 Enabling  Write Back Flash Cache when Storage cell software version is 11.2.3.3.1 or higher

Note:

 

With 11.2.3.3.1 or higher it is not required to stop the cellsrv process or inactivate the griddisks.

To reduce the performance impact on the application, execute the change during a period of reduced workload.

1. Validate all the Physical Disks are in NORMAL state before modifying the FlashCache. The following command should return no rows.

# dcli -l root -g cell_group cellcli -e "list physicaldisk attributes name,status"|grep -v normal

 

2. Drop the flash cache

# dcli -l root -g cell_group cellcli -e drop flashcache

3. Set flashCacheMode attribute to writeback


# dcli -l root -g cell_group cellcli -e "alter cell flashCacheMode=writeback"

4. Re-create the flash cache

# dcli -l root -g cell_group cellcli -e create flashcache all

5. Check attribute flashCacheMode is WriteBack:

# dcli -l root -g cell_group cellcli -e list cell detail | grep flashCacheMode

6. Validate griddisk attributes cachingPolicy and cachedby

 

# cellcli -e list griddisk attributes name,cachingpolicy,cachedby

Enabling Write Back Flash Cache when Storage cell software version is  lower than 11.2.3.3.1

Note: With versions lower than 11.2.3.3.1, changing the FlashCacheMode requires stopping the cellsrv process. This introduces the option of making the change cell by cell (rolling) or on all cells at once (non-rolling) with CRS down.

A. Enable Write Back Flash Cache using a ROLLING method

(RDBMS & ASM instance is up - enabling write-back flashcache one cell at a time)

Log onto the first cell on which you wish to enable write-back FlashCache

 1. Drop the flash cache on that cell

# cellcli -e drop flashcache

 2. Check if ASM will be OK if the grid disks go OFFLINE. The following command should return 'Yes' for the grid disks being listed:

# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

 

3. Inactivate the griddisk on the cell

# cellcli -e alter griddisk all inactive

 

4. Shut down cellsrv service

# cellcli -e alter cell shutdown services cellsrv

 

5. Set the cell flashcache mode to writeback

# cellcli -e "alter cell flashCacheMode=writeback"

 

6. Restart the cellsrv service

# cellcli -e alter cell startup services cellsrv

 

7. Reactivate the griddisks on the cell

# cellcli -e alter griddisk all active

 

8. Verify all grid disks have been successfully put online using the following command:

# cellcli -e list griddisk attributes name, asmmodestatus

 

9. Recreate the flash cache

# cellcli -e create flashcache all

 

10. Check the status of the cell to confirm that it's now in WriteBack mode:

# cellcli -e list cell detail | grep flashCacheMode

 

11. Repeat these same steps again on the next cell. However, before taking another storage server offline, execute the following making sure 'asmdeactivationoutcome' displays YES:

# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

 

 B . Enable Write Back Flash Cache using a NON-ROLLING method

    (RDBMS & ASM instances are down while enabling write-back flashcache)

 

1. Drop the flash cache on that cell

# cellcli -e drop flashcache

 

2. Shut down cellsrv service

# cellcli -e alter cell shutdown services cellsrv

 

3. Set the cell flashcache mode to writeback

# cellcli -e "alter cell flashCacheMode=writeback"

 

4. Restart the cellsrv service

# cellcli -e alter cell startup services cellsrv

 

5. Recreate the flash cache

# cellcli -e create flashcache all

 6. How can I disable the write back flash cache?

 * Please note, disabling write back flash cache is not a typical operation and should only be done under the guidance of Oracle Support.

Disabling the write back flash cache requires flushing the dirty blocks to disk before changing the "flashCacheMode" to "writethrough". The flush can be performed in parallel across all cells using the dcli command. Once the flush begins, all caching to the flash cache is stopped, so applications will experience some performance impact that will vary depending upon the nature of the workload. The following steps are common to all cells and can be performed in parallel using the dcli utility from one of the compute nodes as shown below.

 Disabling Write Back Flash Cache when Storage cell software version is 11.2.3.3.1 or higher

 Note:

With 11.2.3.3.1 or higher there is no need to stop the cellsrv process or inactivate the grid disks.

To reduce the performance impact on the application, execute the change during a period of reduced workload.

1. Validate that all physical disks are in NORMAL state before modifying the flash cache. The following command should return no rows.

# dcli -l root -g cell_group cellcli -e "list physicaldisk attributes name,status" | grep -v normal

 2. Determine amount of dirty data in the flash cache. 

 # cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "
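The per-cell values can be totaled with a small pipeline. A sketch with an illustrative `sum_dirty_mb` helper, assuming each output line ends in `<value> MB` (with or without a `cellname:` dcli prefix; commas in large values are stripped):

```shell
# Illustrative helper: totals the dirty flash-cache megabytes from
# "list metriccurrent" output, where each line ends in "<value> MB".
# Works with or without a "cellname:" dcli prefix.
sum_dirty_mb() {
    awk '/FC_BY_DIRTY/ { gsub(",", "", $(NF-1)); total += $(NF-1) }
         END { print total + 0 }'
}

# Example with sample output (real input would come from cellcli/dcli):
printf 'cel01: FC_BY_DIRTY FLASHCACHE 512 MB\ncel02: FC_BY_DIRTY FLASHCACHE 1,024 MB\n' \
    | sum_dirty_mb    # prints 1536
```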

 

3. Flush the flash cache 

 

# dcli -g cell_group -l root cellcli -e "alter flashcache all flush"

 

4. Check the progress of the flushing of flash cache 

 

The flushing process is complete when FC_BY_DIRTY is 0 MB

 

# dcli -g cell_group -l root cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "

 or flushstatus attribute is "Completed"

 # dcli -g cell_group -l root cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" | grep FD
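This check can also be scripted as a simple filter. A sketch with a hypothetical `flush_complete` helper, assuming each flash-disk line ends in the flushstatus value (flusherror being empty when there is no error):

```shell
# Hypothetical helper: succeeds only when every flash disk reports
# "completed"; any "working" (or error) lines are printed instead.
flush_complete() {
    ! grep -iv 'completed[[:space:]]*$'
}

# On a compute node you might poll like this (commented out here, since
# dcli/cellcli exist only on Exadata):
#   until dcli -g cell_group -l root cellcli -e \
#       "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" \
#       | grep FD | flush_complete; do sleep 60; done

# Example with sample output: exits 0, so the flush is done
printf 'cel01: FD_00_cel01 Completed\n' | flush_complete
```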

 5. If flushing the cache is complete, drop the flash cache

 

# dcli -g cell_group -l root cellcli -e drop flashcache

 

6. Set the flashCacheMode attribute to writethrough   

 

# dcli -g cell_group -l root cellcli -e "alter cell flashCacheMode=writethrough"

 

7. Re-create the flash cache  

 # dcli -g cell_group -l root cellcli -e create flashcache all

 8. Check the attribute flashcacheMode is Writethrough

# dcli -g cell_group -l root cellcli -e list cell detail | grep flashCacheMode

  Disabling Write Back Flash Cache when Storage cell software version is  lower than 11.2.3.3.1

Note: With versions lower than 11.2.3.3.1, changing the flashCacheMode attribute requires stopping the cellsrv process. This can be done either cell by cell (rolling) or on all cells at once (non-rolling) with CRS down.

 A.  Disable Write Back Flash Cache using a ROLLING method

   (RDBMS & ASM instance is up - disabling write-back flashcache one cell at a time)

 

1. Check griddisk status by verifying that the griddisk attribute "asmdeactivationoutcome" is "Yes" for all griddisks on this cell. Do not proceed if the following command returns any rows; it should return no rows.

# dcli  -g cell_group -l root cellcli -e "list griddisk where asmdeactivationoutcome != 'Yes' attributes name,asmmodestatus,asmdeactivationoutcome"

2. Determine the amount of data to be flushed by checking how much dirty data is in the flash cache. This returns the number of bytes that must be de-staged to disk per cell, which gives an indication of how long the flush will take.

# dcli -g cell_group -l root cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "
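Since the dirty count falls steadily during the flush, two readings taken a known interval apart give a rough time estimate. A sketch with a hypothetical `estimate_minutes` helper (readings in MB, interval in seconds):

```shell
# Hypothetical helper: given two FC_BY_DIRTY readings (in MB) taken
# $3 seconds apart, print the estimated minutes left to flush.
estimate_minutes() {
    awk -v first="$1" -v second="$2" -v secs="$3" 'BEGIN {
        rate = (first - second) / secs          # MB de-staged per second
        if (rate <= 0) { print "no progress yet"; exit }
        printf "%.1f\n", second / rate / 60     # minutes remaining
    }'
}

# Example: 4096 MB dirty, then 3584 MB five minutes (300 s) later
estimate_minutes 4096 3584 300    # prints 35.0
```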

 3. Flush the flashcache across all cells

 

To perform this step, it is recommended to have two separate sessions: one to execute the command below and the other to monitor its progression in the next step.

 Issue the following command to begin the flush in one of the two sessions:

# dcli -g cell_group -l root cellcli -e "alter flashcache all flush"

If any errors occur, they will be displayed in this session; otherwise, this session will show a successful flush across all cells.

4. Check the flush status across all cells

 

a. Execute the following command every few minutes in the second session to monitor the progress. As dirty blocks are de-staged to disk, this count will reduce to zero (0). This will take some time; repeated executions of the following command let you estimate how long remains:

# dcli -g cell_group -l root cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "

 

b. The following command should return  "working" for each flash disk on each cell while the cache is being flushed and "completed" when it is finished. Execute the following command in the second session:

# dcli -g cell_group -l root cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" | grep FD

DO NOT CONTINUE UNTIL flushstatus is "completed" for all the celldisks.

The following steps are to be executed individually on each cell, one at a time.  All steps that must be performed directly on a cell use the cellcli utility.

 Log onto the first cell that will have the write back flash cache disabled.

5. Drop the flashcache for this cell after the flush completes

# cellcli -e drop flashcache

 

6. Inactivate all griddisks on the cell

# cellcli -e alter griddisk all inactive

 

7. Shut down the cellsrv service

# cellcli -e alter cell shutdown services cellsrv

 

8. Reset the cell flash cache state to writethrough

# cellcli -e "alter cell flashCacheMode=writethrough"

 

9. Restart the cellsrv service

# cellcli -e alter cell startup services cellsrv

 

10. Reactivate the griddisks on the cell

# cellcli -e alter griddisk all active

11. Recreate the flash cache

# cellcli -e create flashcache all

12. Check the status of this cell flash cache state

# cellcli -e list cell detail | grep flashCacheMode

13. Check the griddisks of the cell

Before moving on to the next cell, check the attribute "asmmodestatus" of all of the griddisks and make sure they are all "ONLINE" and the attribute "asmdeactivationoutcome" is "Yes". It may be necessary to execute the following command several times until "asmmodestatus" shows "ONLINE".

# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

 B. Disable Write Back Flash Cache using a NON-ROLLING method

   (RDBMS & ASM instances are down while disabling write-back flashcache)

To reduce the time for a total outage, the flashcache flush operation can be performed in advance of shutting down the entire cluster. 

1. Determine the amount of data to be flushed by checking how much dirty data is in the flash cache.

# dcli -g cell_group -l root cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "

 2. Flush the flashcache across all cells

 *To perform this step, it is recommended to have two separate sessions: one to execute the command below and the other to monitor its progression in the next step.

Issue the following command to begin the flush in one of the two sessions:

# dcli -g cell_group -l root cellcli -e "alter flashcache all flush"

If any errors occur, they will be displayed in this session; otherwise, this session will show a successful flush across all cells.

Once the flush begins, all caching to the flash cache is stopped.  Therefore, applications will experience some performance impact that will vary depending upon the nature of the workload.

 3. Check the flush status across all cells 

a. Execute the following command in the second session every few minutes to monitor the progress. As dirty blocks are de-staged to disk, this count will reduce to zero (0). This will take some time; repeated executions of the following command let you estimate how long remains:

# dcli -g cell_group -l root cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "

 

b. Check the status. For each flash disk it will return "working" while the cache is being flushed, and "Completed" when it is finished:

# dcli -g cell_group -l root cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" | grep FD

 4. Shutdown all databases and the entire cluster

Ensure the entire cluster is shut down (all databases and CRS). Not doing so will result in a crash of all databases and the cluster when the cellsrv service is shut down and restarted. It is recommended to shut down each database with the srvctl command. However, you can shut down the entire cluster at once, logged on as root on one of the compute nodes, as follows:

 

# cd $GRID_HOME/bin

# ./crsctl stop cluster -all

 $GRID_HOME refers to the location where the Grid Infrastructure is installed.  Typically: /u01/app/11.2.0.3/grid.

 5. Drop the flash cache across all cells after flush completes

# dcli -g cell_group -l root cellcli -e drop flashcache

 6. Shut down the cellsrv service across all cells

# dcli -g cell_group -l root cellcli -e alter cell shutdown services cellsrv

 7. Reset the cell flashcache state to writethrough across all cells

# dcli -g cell_group -l root cellcli -e "alter cell flashCacheMode=writethrough"

8. Restart the cellsrv service across all cells

# dcli -g cell_group -l root cellcli -e alter cell startup services cellsrv

9. Recreate the flash cache across all cells

# dcli -g cell_group -l root cellcli -e create flashcache all

 

10. Check the status of all cells’ flash cache state

# dcli -g cell_group -l root cellcli -e list cell detail | grep flashCacheMode

The flashCacheMode of all cells should now be writethrough.
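This final verification can be scripted as well. A sketch with a hypothetical `all_writethrough` helper, assuming each line of the dcli output looks like `cel01: flashCacheMode: writethrough`:

```shell
# Hypothetical helper: succeeds only if every cell reports
# flashCacheMode: writethrough; any other cells are printed instead.
all_writethrough() {
    ! grep -Eiv 'flashCacheMode:[[:space:]]*writethrough[[:space:]]*$'
}

# Example with sample output (real input would come from the dcli
# command above): exits 0, so all cells are back in writethrough
printf 'cel01: flashCacheMode: writethrough\n' | all_writethrough
```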

 

11. Start up the cluster and all databases.  As root on one of the compute nodes, issue the following command:

 # cd $GRID_HOME/bin

# ./crsctl start cluster -all

 

7. What is the performance benefit we can get by enabling the write back flash cache?

Write-back flash cache significantly improves write-intensive operations, because writing to flash cache is much faster than writing to hard disks.

Depending on your application, write performance can improve by up to 20X more write IOPS than disk on X3-2 machines, and up to 10X more write IOPS than disk on V2 and X2.

  8. Can I disable write back Flash cache on certain disk groups?

Yes, you can disable cell-level flash caching for grid disks that do not need it when using Write Back Flash Cache. For further details, refer to Oracle's best practice recommendations.

 9. Is Write-back flash cache persistent upon reboots?

   Yes

Starting with Exadata storage software version 11.2.3.2.1, Exadata Smart Flash Cache is persistent across Exadata Storage Server restarts.

 
