

Note: All posts take a practical approach and avoid lengthy theory. Everything has been tested on development servers; please do not try any post on a production server until you are sure of its effect.

Wednesday, December 03, 2014

Exadata: Defining the Threshold for Exadata Cell

On Exadata, an alert is triggered automatically when a predefined hardware or software issue is detected, or when a metric crosses a threshold. By default no metric thresholds are defined, but you can create your own with CellCLI, as the following steps show.


1- List the thresholds currently defined on the Exadata cell.

CellCLI> list threshold
CellCLI>
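The same LIST commands can also be run from the cell's OS prompt with `cellcli -e`, which makes them easy to script. The sketch below is a minimal example assuming a POSIX shell; the helper name `has_thresholds` is hypothetical. It only checks whether `list threshold` printed anything, since an empty result (as above) means no thresholds are defined.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if stdin contains at least one non-blank line,
# i.e. LIST THRESHOLD returned at least one threshold definition.
has_thresholds() {
    grep -q '[^[:space:]]'
}

# Example usage on a cell (assumes cellcli is on the PATH):
#   cellcli -e "list threshold" | has_thresholds || echo "no thresholds defined"
```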

2- The LIST ALERTDEFINITION command displays all available alert sources on the cell. Use it to check which metrics can have thresholds associated with them; the second command below shows the details of CL_FSUT, the file system utilization metric.

CellCLI> list alertdefinition 
CellCLI> list alertdefinition cl_fsut detail 

3- Create a warning threshold for file system utilization on the root (/) file system.

CellCLI> list metriccurrent cl_fsut
         CL_FSUT         "/"                     63 %
         CL_FSUT         "/boot"                 28 %
         CL_FSUT         "/dev/shm"              0 %
         CL_FSUT         "/opt/oracle"           36 %
         CL_FSUT         "/var/log/oracle"       7 %
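When scripting, the value for a single mount point can be pulled out of the summary output above. A minimal sketch, assuming the column layout shown (metric name, quoted mount point, value, `%`) and mount points without spaces; `fsut_for_mount` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read LIST METRICCURRENT CL_FSUT summary output on stdin
# and print the utilization value for one mount point.
fsut_for_mount() {
    # $1 = mount point, e.g. / or /opt/oracle. CellCLI prints it in double
    # quotes, so the quotes are added before comparing with column 2.
    awk -v mp="\"$1\"" '$1 == "CL_FSUT" && $2 == mp { print $3 }'
}

# Example usage on a cell:
#   cellcli -e "list metriccurrent cl_fsut" | fsut_for_mount /
```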

CellCLI> list metriccurrent cl_fsut detail
         name:                   CL_FSUT
         alertState:             normal
         collectionTime:         2014-12-02T13:33:50+03:00
         metricObjectName:       "/"
         metricType:             Instantaneous
         metricValue:            63 %
         objectType:             CELL_FILESYSTEM

         name:                   CL_FSUT
         alertState:             normal
         collectionTime:         2014-12-02T13:33:50+03:00
         metricObjectName:       "/boot"
         metricType:             Instantaneous
         metricValue:            28 %
         objectType:             CELL_FILESYSTEM

         name:                   CL_FSUT
         alertState:             normal
         collectionTime:         2014-12-02T13:33:50+03:00
         metricObjectName:       "/dev/shm"
         metricType:             Instantaneous
         metricValue:            0 %
         objectType:             CELL_FILESYSTEM

         name:                   CL_FSUT
         alertState:             normal
         collectionTime:         2014-12-02T13:33:50+03:00
         metricObjectName:       "/opt/oracle"
         metricType:             Instantaneous
         metricValue:            36 %
         objectType:             CELL_FILESYSTEM

         name:                   CL_FSUT
         alertState:             normal
         collectionTime:         2014-12-02T13:33:50+03:00
         metricObjectName:       "/var/log/oracle"
         metricType:             Instantaneous
         metricValue:            7 %
         objectType:             CELL_FILESYSTEM

Set the warning level slightly above the utilization observed above (63% for the root file system), so that writing a modest amount of data will cross it.

CellCLI> create threshold cl_fsut."/" comparison='>', warning=64
Threshold cl_fsut."/" successfully created

CellCLI>
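Because the warning level is derived from the observed value, the statement can also be generated by a script. This sketch assumes an integer utilization value and a margin of one percentage point, as in this example; `fsut_threshold_cmd` is hypothetical, and passing the whole statement to `cellcli -e` as a single argument is an assumption to verify on your cell.

```shell
#!/bin/sh
# Hypothetical helper: print a CREATE THRESHOLD statement for CL_FSUT on a
# mount point, with the warning level one point above the current value.
fsut_threshold_cmd() {
    # $1 = mount point, $2 = current utilization as an integer percentage
    printf "create threshold cl_fsut.\"%s\" comparison='>', warning=%d\n" "$1" "$(( $2 + 1 ))"
}

# Example usage on a cell (assumes cellcli -e accepts the quoted statement):
#   cellcli -e "$(fsut_threshold_cmd / 63)"
```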

4- View the newly created threshold definition, then exit CellCLI.

CellCLI> list threshold detail
         name:                   cl_fsut./
         comparison:             >
         warning:                64.0
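With `comparison='>'`, the warning fires only when the metric is strictly greater than 64.0, so a utilization of exactly 64% does not trigger it. The sketch below models that comparison in a POSIX shell; it illustrates the `>` semantics and is not CellCLI's internal logic, and `fsut_state` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper modelling a '>' threshold: print "warning" when the
# metric value is strictly greater than the warning level, else "normal".
# awk does the comparison so decimal values such as 64.0 and 65.0 work.
fsut_state() {
    # $1 = current metric value, $2 = warning level
    awk -v v="$1" -v w="$2" 'BEGIN { print ((v + 0 > w + 0) ? "warning" : "normal") }'
}
```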

5- At the OS prompt on the cell, run the following command. It writes a 256 MB file (500,000 blocks of 512 bytes) to the root file system, which increases the utilization metric. Once the metric crosses the threshold defined above, an alert is generated.

[root@pk3-iub-cel-es01 ~]# dd if=/dev/zero of=/tmp/file.out bs=512 count=500000
500000+0 records in
500000+0 records out
256000000 bytes (256 MB) copied, 2.39039 seconds, 107 MB/s
[root@pk3-iub-cel-es01 ~]#
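Rather than guessing the file size, you can estimate how much data is needed to push the file system past the threshold. A minimal sketch, assuming sizes in KB as reported by `df -Pk` (column 2 is the total, column 3 is used); `kb_to_exceed` is hypothetical. CL_FSUT may not be computed exactly the way `df` computes its figures, so treat the result as an estimate and add some margin.

```shell
#!/bin/sh
# Hypothetical helper: print how many KB must be written so that used/total
# is strictly greater than the threshold percentage (0 if it already is).
kb_to_exceed() {
    # $1 = total KB, $2 = used KB, $3 = threshold as an integer percentage
    # The smallest used value above the threshold is floor(total*pct/100) + 1.
    need=$(( $1 * $3 / 100 + 1 - $2 ))
    if [ "$need" -lt 0 ]; then
        need=0
    fi
    echo "$need"
}

# Example usage on a cell:
#   set -- $(df -Pk / | awk 'NR == 2 { print $2, $3 }')
#   dd if=/dev/zero of=/tmp/file.out bs=1024 count=$(kb_to_exceed "$1" "$2" 64)
```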


6- Relaunch CellCLI and execute the LIST ALERTHISTORY command.

CellCLI> list alerthistory
         31_1    2014-10-17T02:01:06+03:00       info            "The disk controller battery is executing a learn cycle and may temporarily enter WriteThrough Caching mode as part of the learn cycle. Disk write throughput might be temporarily lower during this time. The flash drives are not affected. The battery learn cycle is a normal maintenance activity that occurs quarterly and runs for approximately 1 to 12 hours.  Note that many learn cycles do not require entering WriteThrough caching mode.  When the disk controller cache returns to the normal WriteBack caching mode, an additional informational alert will be sent.  Battery Serial Number : 6297  Battery Type          : iBBU08  Battery Temperature   : 31 C  Full Charge Capacity  : 1264 mAh  Relative Charge       : 100 %  Ambient Temperature   : 16 C"
         31_2    2014-10-17T07:51:08+03:00       clear           "All disk drives are in WriteBack caching mode.  Battery Serial Number : 6297  Battery Type          : iBBU08  Battery Temperature   : 35 C  Full Charge Capacity  : 1256 mAh  Relative Charge       : 55 %  Ambient Temperature   : 17 C"
         32_1    2014-12-02T14:20:50+03:00       warning         "The warning threshold for the following metric has been crossed. Metric Name        : CL_FSUT  Metric Description : Percentage of total space on this file system that is currently used  Object Name        : /  Current Value      : 65.0 %  Threshold Value    : 64.0 %  "

CellCLI>
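Each summary line lists the alert name, the time, the severity and the message, so a script can filter on the third column. A minimal sketch; `alerts_with_severity` is a hypothetical helper, and it assumes the whitespace-separated layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read LIST ALERTHISTORY summary output on stdin and
# print the names of alerts with the given severity (e.g. warning, clear).
alerts_with_severity() {
    # $1 = severity to match against the third column
    awk -v sev="$1" 'NF >= 3 && $3 == sev { print $1 }'
}

# Example usage on a cell:
#   cellcli -e "list alerthistory" | alerts_with_severity warning
```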

CellCLI> list alerthistory detail
         name:                   31_1
         alertMessage:           "The disk controller battery is executing a learn cycle and may temporarily enter WriteThrough Caching mode as part of the learn cycle. Disk write throughput might be temporarily lower during this time. The flash drives are not affected. The battery learn cycle is a normal maintenance activity that occurs quarterly and runs for approximately 1 to 12 hours.  Note that many learn cycles do not require entering WriteThrough caching mode.  When the disk controller cache returns to the normal WriteBack caching mode, an additional informational alert will be sent.  Battery Serial Number : 6297  Battery Type          : iBBU08  Battery Temperature   : 31 C  Full Charge Capacity  : 1264 mAh  Relative Charge       : 100 %  Ambient Temperature   : 16 C"
         alertSequenceID:        31
         alertShortName:         Hardware
         alertType:              Stateful
         beginTime:              2014-10-17T02:01:06+03:00
         endTime:                2014-10-17T07:51:08+03:00
         examinedBy:
         metricObjectName:       LUN_LEARN_CYCLE_ALERT
         notificationState:      0
         sequenceBeginTime:      2014-10-17T02:01:06+03:00
         severity:               info
         alertAction:            Informational.

         name:                   31_2
         alertMessage:           "All disk drives are in WriteBack caching mode.  Battery Serial Number : 6297  Battery Type          : iBBU08  Battery Temperature   : 35 C  Full Charge Capacity  : 1256 mAh  Relative Charge       : 55 %  Ambient Temperature   : 17 C"
         alertSequenceID:        31
         alertShortName:         Hardware
         alertType:              Stateful
         beginTime:              2014-10-17T07:51:08+03:00
         endTime:                2014-10-17T07:51:08+03:00
         examinedBy:
         metricObjectName:       LUN_LEARN_CYCLE_ALERT
         notificationState:      0
         sequenceBeginTime:      2014-10-17T02:01:06+03:00
         severity:               clear
         alertAction:            Informational.

         name:                   32_1
         alertMessage:           "The warning threshold for the following metric has been crossed. Metric Name        : CL_FSUT  Metric Description : Percentage of total space on this file system that is currently used  Object Name        : /  Current Value      : 65.0 %  Threshold Value    : 64.0 %  "
         alertSequenceID:        32
         alertShortName:         CL_FSUT
         alertType:              Stateful
         beginTime:              2014-12-02T14:20:50+03:00
         endTime:
         examinedBy:
         metricObjectName:       "/"
         metricValue:            65.0
         notificationState:      1
         sequenceBeginTime:      2014-12-02T14:20:50+03:00
         severity:               warning
         alertAction:            "Examine the metric value that is violating the specified threshold, and take appropriate actions if needed."

CellCLI>
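In the detail output, the still-active warning 32_1 has an empty endTime, while the completed battery alerts have one. A minimal sketch that uses this observation to list open alerts, assuming the `attribute: value` layout shown above; `open_alerts` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read LIST ALERTHISTORY DETAIL output on stdin and
# print the name of every alert whose endTime attribute has no value.
open_alerts() {
    awk '$1 == "name:" { name = $2 }
         $1 == "endTime:" && NF == 1 { print name }'
}

# Example usage on a cell:
#   cellcli -e "list alerthistory detail" | open_alerts
```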


If email notification is configured for the cell, you will also receive an email for this alert.

7- Delete the file created above to reclaim the space.
[root@pn3-esk-cel-es01 ~]# rm /tmp/file.out
Once the file is deleted and utilization falls back below the threshold, the alert is cleared; if email notification is configured, you will receive another email reporting that it has cleared.

8- Relaunch CellCLI and examine the file system utilization and confirm that the root (/) file system utilization has fallen back below the warning threshold. If the metric still exceeds the warning threshold, re-execute the command periodically until the metric value is updated.
CellCLI> list metriccurrent cl_fsut
         CL_FSUT         "/"                     63 %
         CL_FSUT         "/boot"                 28 %
         CL_FSUT         "/dev/shm"              0 %
         CL_FSUT         "/opt/oracle"           36 %
         CL_FSUT         "/var/log/oracle"       7 %

CellCLI>
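Step 8 asks you to re-run the command periodically until the metric is updated. A minimal polling sketch, assuming a POSIX shell and a command that prints a single number; `poll_until_not_above` and `root_fsut` are hypothetical names, and the attempt count and interval in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: run a command that prints the current utilization
# until the value is no longer above the threshold (the '>' comparison no
# longer holds) or the attempts run out. Prints the final value and returns
# 0 on success; returns 1 if the value stayed above the threshold.
poll_until_not_above() {
    # $1 = threshold, $2 = max attempts, $3 = seconds between attempts,
    # remaining arguments = the command to run
    thr=$1; tries=$2; pause=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        val=$("$@")
        if [ -n "$val" ] && awk -v v="$val" -v t="$thr" 'BEGIN { exit !(v + 0 <= t + 0) }'; then
            echo "$val"
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$pause"
        fi
    done
    return 1
}

# Example usage on a cell (root_fsut is a hypothetical wrapper):
#   root_fsut() { cellcli -e "list metriccurrent cl_fsut" | awk '$2 == "\"/\"" { print $3 }'; }
#   poll_until_not_above 64 10 60 root_fsut
```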

9- Re-execute LIST ALERTHISTORY; the alert should now be listed as cleared (entry 32_2).
CellCLI> list alerthistory
         31_1    2014-10-17T02:01:06+03:00       info            "The disk controller battery is executing a learn cycle and may temporarily enter WriteThrough Caching mode as part of the learn cycle. Disk write throughput might be temporarily lower during this time. The flash drives are not affected. The battery learn cycle is a normal maintenance activity that occurs quarterly and runs for approximately 1 to 12 hours.  Note that many learn cycles do not require entering WriteThrough caching mode.  When the disk controller cache returns to the normal WriteBack caching mode, an additional informational alert will be sent.  Battery Serial Number : 6297  Battery Type          : iBBU08  Battery Temperature   : 31 C  Full Charge Capacity  : 1264 mAh  Relative Charge       : 100 %  Ambient Temperature   : 16 C"
         31_2    2014-10-17T07:51:08+03:00       clear           "All disk drives are in WriteBack caching mode.  Battery Serial Number : 6297  Battery Type          : iBBU08  Battery Temperature   : 35 C  Full Charge Capacity  : 1256 mAh  Relative Charge       : 55 %  Ambient Temperature   : 17 C"
         32_1    2014-12-02T14:20:50+03:00       warning         "The warning threshold for the following metric has been crossed. Metric Name        : CL_FSUT  Metric Description : Percentage of total space on this file system that is currently used  Object Name        : /  Current Value      : 65.0 %  Threshold Value    : 64.0 %  "
         32_2    2014-12-02T14:31:50+03:00       clear           "The warning threshold for the following metric has been cleared. Metric Name        : CL_FSUT  Metric Description : Percentage of total space on this file system that is currently used  Object Name        : /  Current Value      : 63.0 %  Threshold Value    : 64.0 %  "

CellCLI>
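The clear entry shares the sequence number of the warning it resolves (32_1 and 32_2 here, just as 31_1 and 31_2 for the battery alert). A minimal sketch that checks for such an entry, assuming the summary layout above; `is_cleared` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read LIST ALERTHISTORY summary output on stdin and
# succeed if the given alert sequence has an entry with severity "clear".
is_cleared() {
    # $1 = alert sequence ID, the part of the alert name before "_"
    awk -v seq="$1" '
        { split($1, parts, "_") }
        parts[1] == seq && $3 == "clear" { found = 1 }
        END { exit !found }'
}

# Example usage on a cell:
#   cellcli -e "list alerthistory" | is_cleared 32 && echo "alert 32 cleared"
```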

