Tuesday, June 1, 2010
Oracle Clusterware Failures: Useful Logs
To determine why a node failed, check the following clusterware logs (in order of usefulness):
- alert log
- ocssd log
- evm log
Oracle Clusterware Parameters: misscount, disktimeout, reboottime
Clusterware timeout parameters:
- misscount - The maximum time, in seconds, that a network heartbeat can be missed before the cluster enters a reconfiguration to evict the node.
- disktimeout - The maximum time allowed for a voting file I/O to complete; if it is exceeded, the voting disk is marked offline.
- reboottime - The time allowed for a node to complete a reboot after the CSS daemon has been evicted.
Default values (these may vary by platform and release):
- misscount = 60 seconds
- disktimeout = 200 seconds
- reboottime = 3 seconds
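To make the definitions concrete, here is a minimal sketch of how the two thresholds can be read as simple comparisons. It is only an illustration under stated assumptions, not how the CSS daemon is actually implemented: the class `CssTimeouts` and both helper functions are hypothetical names, and the defaults are just the values listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CssTimeouts:
    """Timeout values in seconds (hypothetical container, values from the list above)."""
    misscount: int = 60
    disktimeout: int = 200
    reboottime: int = 3


def heartbeat_exceeded(seconds_since_heartbeat: float, t: CssTimeouts) -> bool:
    """True once the network heartbeat has been missing for longer than misscount."""
    return seconds_since_heartbeat > t.misscount


def voting_io_exceeded(seconds_io_pending: float, t: CssTimeouts) -> bool:
    """True once a voting file I/O has been pending for longer than disktimeout."""
    return seconds_io_pending > t.disktimeout


timeouts = CssTimeouts()
print(heartbeat_exceeded(45, timeouts))   # 45 s is still within misscount
print(heartbeat_exceeded(61, timeouts))   # 61 s is past misscount
print(voting_io_exceeded(200, timeouts))  # exactly at the limit is not past it
```

Whether the real comparison is strict or inclusive at the boundary is an assumption here; the sketch only shows which parameter guards which event.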
To check the current values:
- crsctl get css misscount
- crsctl get css disktimeout
- crsctl get css reboottime
To change the values:
- crsctl set css misscount 120 (sets misscount to 120 seconds)
- crsctl set css disktimeout 200 (sets disktimeout to 200 seconds)
- crsctl set css reboottime 3 (sets reboottime to 3 seconds)
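If you check these values often, the three crsctl get css commands can be wrapped in a small script. The following is a rough sketch rather than a tested tool: the function names are hypothetical, and the parsing assumes the command prints either a bare number or a message containing the parameter name followed by its value. Adjust the pattern if your release prints something else.

```python
import re
import subprocess
from typing import Callable, Dict

PARAMETERS = ("misscount", "disktimeout", "reboottime")


def run_crsctl(parameter: str) -> str:
    """Run `crsctl get css <parameter>` and return its raw standard output."""
    result = subprocess.run(
        ["crsctl", "get", "css", parameter],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_value(parameter: str, output: str) -> int:
    """Extract the integer value from the command output.

    Assumption: the output is a bare number, or a message that contains
    "<parameter> <number>" somewhere in it.
    """
    text = output.strip()
    if text.isdigit():
        return int(text)
    match = re.search(rf"\b{re.escape(parameter)}\s+(\d+)\b", text)
    if match is None:
        raise ValueError(f"could not parse {parameter} from: {text!r}")
    return int(match.group(1))


def read_css_timeouts(runner: Callable[[str], str] = run_crsctl) -> Dict[str, int]:
    """Return all three timeout values; `runner` can be swapped out for testing."""
    return {name: parse_value(name, runner(name)) for name in PARAMETERS}
```

Calling `read_css_timeouts()` on a cluster node, as a user who is allowed to run crsctl, returns a dictionary keyed by parameter name. Passing a fake `runner` lets you try the parsing without a cluster.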
Sample CSS log messages showing these values:
clssnmNMInitialize: misscount set to (30)
clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 15000 ms, reconfig start (misscount) 30000 ms
clssgmInitCMInfo: Wait for remote node termination set to 13 seconds
clssnmNMInitialize: misscount set to (60), impending reconfig threshold set to (56000)
clssnmNMInitialize: diskShortTimeout set to (57000)ms
clssnmNMInitialize: diskLongTimeout set to (200000)ms
clssnmHandleUpdate: diskTimeout set to (200000)ms
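Messages like the ones above can be scanned for "set to (N)" entries to see which values CSS actually picked up. This is a small sketch under assumptions: the function name `extract_settings` is hypothetical, the sample lines are copied from the messages above, and real log lines may carry timestamps or other prefixes, which the pattern tries to skip.

```python
import re
from typing import List, Tuple

SAMPLE = """\
clssnmNMInitialize: misscount set to (60), impending reconfig threshold set to (56000)
clssnmNMInitialize: diskShortTimeout set to (57000)ms
clssnmNMInitialize: diskLongTimeout set to (200000)ms
clssnmHandleUpdate: diskTimeout set to (200000)ms
"""

# A CSS function name (starts with "clss") followed by a colon and the message.
LINE = re.compile(r"(clss\w+):\s*(.*)$")
# One "<name> set to (<number>)" entry inside a message.
SETTING = re.compile(r"([^,()]+?)\s+set to \((\d+)\)")


def extract_settings(log_text: str) -> List[Tuple[str, str, int]]:
    """Return (function, setting name, raw value) for each 'set to (N)' entry.

    Values are returned as logged; units differ (misscount is in seconds,
    the disk timeouts in milliseconds).
    """
    found = []
    for line in log_text.splitlines():
        line_match = LINE.search(line)
        if line_match is None:
            continue
        function, message = line_match.groups()
        for name, value in SETTING.findall(message):
            found.append((function, name.strip(), int(value)))
    return found


for function, name, value in extract_settings(SAMPLE):
    print(f"{function}: {name} = {value}")
```

Lines without a "set to (N)" entry, such as the "Network heartbeat thresholds" message, are skipped rather than parsed.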