Tuesday, 27 October 2020

What is Split Brain Syndrome in Oracle RAC?

Split brain syndrome occurs when the Oracle RAC nodes are unable to communicate with each other via the private interconnect, while communication between the clients and the RAC nodes is maintained. This can cause data integrity issues: when the same block is read or updated by two nodes, changes made on one node can be overwritten by the other node because the block being changed is not locked.
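The overwrite described above is a classic lost update. A minimal sketch in Python, assuming a toy in-memory "shared disk" (the dictionary, the block name and the helper functions are all hypothetical and have nothing to do with Oracle internals), shows how one node's change disappears when nothing locks the block:

```python
# Toy model of a lost update: two nodes change the same block
# with no lock coordinating them. This is not Oracle code.

shared_disk = {"block_42": 100}  # hypothetical block holding a counter


def read_block(disk, block_id):
    """Return the block value, as a node would copy it into its own cache."""
    return disk[block_id]


def write_block(disk, block_id, value):
    """Write a node's cached value back to the shared disk."""
    disk[block_id] = value


# Both nodes read the block while they cannot talk to each other,
# so neither knows the other holds a copy.
node_a_copy = read_block(shared_disk, "block_42")
node_b_copy = read_block(shared_disk, "block_42")

# Each node applies its own change to its private copy.
node_a_copy += 10
node_b_copy += 5

# Node A writes first, then node B overwrites it.
write_block(shared_disk, "block_42", node_a_copy)
write_block(shared_disk, "block_42", node_b_copy)

final_value = shared_disk["block_42"]
print(final_value)  # 105: node A's +10 is lost
```

Had the two updates been serialized by a lock, the block would end at 115 instead of 105.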

When a node fails, the failed node is prevented from accessing all the shared disk devices and groups. This methodology is called I/O Fencing, Disk Fencing or Failure Fencing.
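The idea of fencing can be illustrated with a small sketch. It assumes a made-up `SharedDisk` class that keeps a set of fenced nodes and rejects their writes; the class, method and exception names are hypothetical, and real fencing is enforced by the clusterware and storage layer, not by application code like this:

```python
class FencedNodeError(Exception):
    """Raised when a fenced node tries to touch the shared disk."""


class SharedDisk:
    """Toy shared disk that refuses I/O from fenced nodes."""

    def __init__(self):
        self.blocks = {}
        self.fenced_nodes = set()

    def fence(self, node_name):
        """Cut the given node off from the shared disk."""
        self.fenced_nodes.add(node_name)

    def write(self, node_name, block_id, value):
        """Write a block unless the calling node has been fenced."""
        if node_name in self.fenced_nodes:
            raise FencedNodeError(f"{node_name} is fenced; write rejected")
        self.blocks[block_id] = value


disk = SharedDisk()
disk.write("node1", "block_1", "ok")   # healthy node: write succeeds
disk.fence("node2")                    # node2 has failed and is fenced
try:
    disk.write("node2", "block_1", "stale")
except FencedNodeError as error:
    print(error)  # node2 is fenced; write rejected
```

The point of the sketch is that the failed node's write never reaches the block, so the data written by the healthy node stays intact.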

The node that first detects that another node is not accessible evicts that node from the RAC cluster group. The split brain problem can be addressed by configuring the heartbeat connections through the same communication channels that are used to access the clients.

What causes Node eviction in Oracle RAC?

What is node eviction?

Node eviction in RAC happens when a heartbeat indicates that a node is not responding. The evicted node is then restarted so that it can rejoin the cluster.
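The heartbeat check can be sketched as a simple timeout comparison. The function name, the node names and the 30-second timeout below are illustrative assumptions, not values taken from Oracle Clusterware:

```python
def find_unresponsive(last_heartbeat, now, timeout_seconds):
    """Return the nodes whose last heartbeat is older than the timeout.

    last_heartbeat maps a node name to the time (in seconds) at which
    its most recent heartbeat was received.
    """
    return sorted(
        node
        for node, seen_at in last_heartbeat.items()
        if now - seen_at > timeout_seconds
    )


# Hypothetical timestamps: node2 last reported 33 seconds ago.
heartbeats = {"node1": 100.0, "node2": 72.0}
candidates = find_unresponsive(heartbeats, now=105.0, timeout_seconds=30)
print(candidates)  # ['node2']
```

In this toy model, a node that appears in the returned list is the one that would be evicted and restarted.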

Causes for RAC node eviction:

Node eviction in an Oracle RAC environment can be due to any of the reasons below.

- A failure of any of the major hardware components (CPU, RAM, network interconnect).

- A server that is experiencing RAM swapping.

- Communication with the voting disk is interrupted, causing the disconnected node to be evicted and rebooted.

- Database or ASM hang condition.

Below is the list of important log files to review in case of a node eviction:

- Clusterware alert log

- Database alert log

- CSSD agent logs

- CSSD monitor logs

- System Message logs (/var/log/messages)
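When reviewing these files, a first pass can be automated by searching for lines that mention an eviction. The sketch below is a generic text search in Python; the `find_eviction_lines` helper and the `evict` keyword are assumptions chosen for illustration, and the exact wording of the messages depends on the Oracle version and on the log being read:

```python
import re
from pathlib import Path

# Assumed keyword; adjust it to the messages your logs actually contain.
EVICTION_PATTERN = re.compile(r"evict", re.IGNORECASE)


def find_eviction_lines(log_path, pattern=EVICTION_PATTERN):
    """Return (line_number, text) pairs for lines matching the pattern."""
    matches = []
    # errors="replace" keeps the scan going if the log has odd bytes.
    with Path(log_path).open(errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            if pattern.search(line):
                matches.append((line_number, line.rstrip("\n")))
    return matches
```

For example, `find_eviction_lines("/var/log/messages")` scans the system message log. The locations of the clusterware, CSSD and database logs vary by installation, so pass their paths explicitly.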