Monday, March 6, 2023

Exadata : Increase or decrease the number of activated CPUs / cores (Capacity-On-Demand)

 

You may need to decrease or increase the number of CPUs/cores activated on an Exadata. This feature, known as "Capacity-On-Demand", has been available since the X4 generation (X4-2 and X4-8).



Each server whose number of cores is increased or decreased has to be rebooted.

I will use the rac-status.sh script to check the status of all the running resources before and after the maintenance.

Keep in mind that there is a minimum and maximum number of cores that can be activated.

Database servers that are part of the same Grid Infrastructure (GI) can have different numbers of cores activated, although this is not recommended.
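The minimum/maximum constraint can be checked before touching any server. Below is a minimal sketch, assuming the caller supplies the limits: the function name validate_core_count is hypothetical, and the minimum, maximum and increment depend on the Exadata model, so they must be taken from Oracle's documentation for your hardware rather than from this example.

```shell
# validate_core_count REQUESTED MIN MAX STEP
# Returns 0 if REQUESTED is an integer within [MIN, MAX] and a multiple of STEP.
# MIN, MAX and STEP are assumptions supplied by the caller;
# nothing is read from the server here.
validate_core_count() {
    local requested="$1" min="$2" max="$3" step="$4"

    # Reject empty values, non-digits, zero and numbers with a leading zero
    case "$requested" in
        ''|*[!0-9]*|0*) echo "not a valid core count: $requested" >&2; return 1 ;;
    esac

    if [ "$requested" -lt "$min" ] || [ "$requested" -gt "$max" ]; then
        echo "$requested is outside the allowed range $min-$max" >&2
        return 1
    fi

    if [ $(( requested % step )) -ne 0 ]; then
        echo "$requested is not a multiple of $step" >&2
        return 1
    fi

    return 0
}
```

For example, with purely illustrative limits of 14 to 36 cores in steps of 2, "validate_core_count 32 14 36 2" succeeds, while 33 or 40 would be rejected with a message on stderr.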


1/ Save the status of the resources before the maintenance

Before any maintenance, I like to save the status of every resource so that I can compare it with the status after the maintenance, make sure that everything is back to normal and avoid any unpleasant surprise.


 ./rac-status.sh -a | tee -a ~/status_before_cpu_change


2/ Ensure that the ~/dbs_group file is up to date

This step is optional: the ~/dbs_group file is supposed to be fairly static, but I like to double-check it before an important maintenance so that no node is forgotten.


-- The 2 commands below should return the same output

 ibhosts | sed s'/"//' | grep db | awk '{print $6}' | sort

 cat ~/dbs_group


-- if not, update the ~/dbs_group file

ibhosts | sed s'/"//' | grep db | awk '{print $6}' | sort > ~/dbs_group

 cat ~/dbs_group
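The comparison in this step can also be scripted instead of being done by eye. Below is a minimal sketch, assuming both lists are stored in files with one node name per line; the function name node_lists_match and the temporary file in the usage note are hypothetical.

```shell
# node_lists_match DISCOVERED_FILE GROUP_FILE
# Returns 0 when both files contain the same set of node names,
# ignoring order, blank lines and duplicates.
node_lists_match() {
    local discovered="$1" group="$2"
    local a b

    # Normalize each list: drop blank lines, sort, remove duplicates
    a=$(grep -v '^[[:space:]]*$' "$discovered" | sort -u)
    b=$(grep -v '^[[:space:]]*$' "$group" | sort -u)

    [ "$a" = "$b" ]
}
```

On a database server, the output of the ibhosts pipeline above could be redirected to a file such as /tmp/discovered_nodes, then checked with: node_lists_match /tmp/discovered_nodes ~/dbs_group || echo "dbs_group is out of date"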


3/ Check the current configuration

Here, we are just checking the current configuration to know what we will modify.


 dcli -g ~/dbs_group -l root "dbmcli -e list dbserver attributes corecount, cpucount, pendingCoreCount" | awk 'BEGIN{printf("%10s%10s%10s%10s\n\n", "Node", "Cpu", "Cores", "Pending")} {printf("%10s|%10s|%10s|%10s\n", $1, $2, $3, $4)}'

      Node          Cpu       Cores    Pending

 exadadb01:|     36/36|     72/72|          
 exadadb02:|     36/36|     72/72|          
 exadadb03:|     36/36|     72/72|          
 exadadb04:|     36/36|     72/72|          


The Pending column is expected to be empty at this stage.



4/ Modify the pending core count

Depending on your needs, you can modify the pending core count on one node or on all the nodes of the Exadata (you can also modify only some of the nodes by updating the ~/dbs_group file accordingly).

Here, I will reduce the number of CPUs from 36 to 16, so I will set the number of cores to 32, as this Exadata is an X6-2 (hence 2 cores per CPU).


-- To modify the pending core count on one server

dbmcli -e alter dbserver pendingCoreCount = 32 force


-- To modify the pending core count on all the nodes in one command

 dcli -g ~/dbs_group -l root "dbmcli -e alter dbserver pendingCoreCount = 32 force"


5/ Verify the pending core count setting before reboot

We can see here that the Pending column now contains the new setting, which will be applied at the next reboot.


 dcli -g ~/dbs_group -l root "dbmcli -e list dbserver attributes corecount, cpucount, pendingCoreCount" | awk 'BEGIN{printf("%10s%10s%10s%10s\n\n", "Node", "Cpu", "Cores", "Pending")} {printf("%10s|%10s|%10s|%10s\n", $1, $2, $3, $4)}'


      Node          Cpu       Cores    Pending

 exadadb01:|     36/36|     72/72|     32/32
 exadadb02:|     36/36|     72/72|     32/32
 exadadb03:|     36/36|     72/72|     32/32
 exadadb04:|     36/36|     72/72|     32/32
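This verification can be scripted so that a node with a missing or wrong pending value is not overlooked. Below is a minimal sketch, assuming the raw dcli output (before the awk formatting) has one line per node, with the pending value in the fourth whitespace-separated field and formatted like 32/32 as in the output above; the function name pending_mismatches is hypothetical.

```shell
# pending_mismatches TARGET
# Reads raw "dcli ... list dbserver attributes ..." output on stdin and
# prints every node whose pending value differs from TARGET (or is empty).
# Returns 0 only when no node is printed.
pending_mismatches() {
    awk -v target="$1" '
        NF == 0 { next }                  # skip blank lines
        {
            split($4, p, "/")             # "32/32" -> p[1] = "32"; empty if no 4th field
            if (p[1] != target) {
                print $1                  # node name as printed by dcli, e.g. "exadadb02:"
                bad = 1
            }
        }
        END { exit bad }
    '
}
```

It can be fed directly from the command used in this step, for example: dcli -g ~/dbs_group -l root "dbmcli -e list dbserver attributes corecount, cpucount, pendingCoreCount" | pending_mismatches 32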



6/ Reboot

A reboot is needed to apply the changes. Note that you can first relocate the database services to a server that will not be rebooted, to avoid any downtime from an application perspective.


reboot


7/ Verify the pending core count setting after reboot

Check that everything looks good after reboot.


 dcli -g ~/dbs_group -l root "dbmcli -e list dbserver attributes corecount, cpucount, pendingCoreCount" | awk 'BEGIN{printf("%10s%10s%10s%10s\n\n", "Node", "Cpu", "Cores", "Pending")} {printf("%10s|%10s|%10s|%10s\n", $1, $2, $3, $4)}'

      Node          Cpu       Cores    Pending

 exadadb01:|     16/36|     32/32|          
 exadadb02:|     16/36|     32/32|          
 exadadb03:|     16/36|     32/32|          
 exadadb04:|     16/36|     32/32|          


8/ Verify the status of the resources after the reboot(s)

Here, we check the status of the running resources and compare it with the status saved before the maintenance, to be sure that nothing has changed.


 ./rac-status.sh -a | tee -a ~/status_after_cpu_change

 diff ~/status_before_cpu_change ~/status_after_cpu_change