Oracle CRS troubleshooting
This post is about CRS 11.1.0.7, but the concepts and fundamentals are largely the same in other versions. I am writing it up because this is what I ran into on 11.1.0.7 after applying a PSU patch and running the post-patch root scripts.
Problem Description:
DBAs often hit a problem where crs_stat -t (or crsctl stat res -t in 11gR2 and later) returns no output, CRS does not come up after patching, or CRS comes up but does not display its registered services. I faced this on a 3-node cluster on Linux 5.11. The plan was to upgrade CRS from 11.1.0.7 to 11.2.0.4, and the latest PSU had to be applied on 11.1.0.7 as a prerequisite for the upgrade. The PSU (11724953) was applied successfully, but running postrootpatch.sh failed with the following errors:
./postrootpatch.sh -crshome /grid/app/oracle/product/11.1.0./crs
Checking to see if Oracle CRS stack is already up…
Checking to see if Oracle CRS stack is already starting
Startup will be queued to init within 30 seconds.
/bin/sh: /grid/app/oracle/product/11.1.0./crs/bin/crsctl: Permission denied
/bin/sh: /grid/app/oracle/product/11.1.0./crs/bin/crsctl: Permission denied
/bin/sh: /grid/app/oracle/product/11.1.0./crs/bin/crsctl: Permission denied
/bin/sh: /grid/app/oracle/product/11.1.0./crs/bin/crsctl: Permission denied
/bin/sh: /grid/app/oracle/product/11.1.0./crs/bin/crsctl: Permission denied
/bin/sh: /grid/app/oracle/product/11.1.0./crs/bin/crsctl: Permission denied
/bin/sh: /grid/app/oracle/product/11.1.0./crs/bin/crsctl: Permission denied
/bin/sh: /grid/app/oracle/product/11.1.0./crs/bin/crsctl: Permission denied
I then followed these steps, as suggested by Oracle Support:
1. Run <CRS_HOME>/install/rootdelete.sh. It removes the init* scripts and puts back a clean inittab:
   <CRS_HOME>/install/rootdelete.sh
2. Run <CRS_HOME>/install/rootdeinstall.sh. It blanks out $ORACLE_HOME/cdata/localhost/local.ocr and removes ocr.loc:
   <CRS_HOME>/install/rootdeinstall.sh
3. Run <CRS_HOME>/root.sh. CRS should start automatically after this:
   <CRS_HOME>/root.sh
4. Confirm that Clusterware has started successfully on the node:
   crs_stat -t
5. Only if everything looks OK in step 4, repeat on the next node.
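The per-node sequence above can be wrapped in a small script so it is applied the same way on each node. This is a minimal sketch, not an Oracle-supplied tool: the CRS_HOME default is simply the path from this post, DRY_RUN is my own switch, and by default the script only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/bash
# Sketch only: wraps the Oracle Support reset sequence for ONE node.
# CRS_HOME and DRY_RUN are assumptions for illustration; run as root.
CRS_HOME="${CRS_HOME:-/grid/app/oracle/product/11.1.0./crs}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = execute them

run() {
  # Always print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  [ "$DRY_RUN" = "0" ] || return 0
  "$@"
}

reset_node_crs() {
  run "$CRS_HOME/install/rootdelete.sh"    || return 1  # step 1: remove init* scripts, restore inittab
  run "$CRS_HOME/install/rootdeinstall.sh" || return 1  # step 2: blank local.ocr, remove ocr.loc
  run "$CRS_HOME/root.sh"                  || return 1  # step 3: reconfigure and start CRS
  run "$CRS_HOME/bin/crs_stat" -t                       # step 4: check before moving to the next node
}
```

For example, source the file and call reset_node_crs to review the commands, then DRY_RUN=0 reset_node_crs to actually run them on the current node only.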
That did not help, so Oracle Support provided another plan:
1. Check the permissions of /grid/app/oracle/product:
   ls -al /grid/app/oracle/product
   You will likely see that the oracle user has no access, i.e. the mode is set to 700.
2. Change the permissions of /grid/app/oracle/product to 777:
   chmod 777 /grid/app/oracle/product
3. Rerun the deinstall script:
   ./rootdeinstall.sh
4. Check the permissions again and make sure the oracle user has access, i.e. the directory is no longer restricted to root (700):
   ls -al /grid/app/oracle/product
5. Now run root.sh again:
   <CRS_HOME>/root.sh
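Before and after changing the permissions, it helps to look at the directory in a form that is easy to compare. A minimal sketch, assuming GNU stat on Linux; dir_info and others_can_traverse are hypothetical helper names:

```shell
#!/bin/bash
# Sketch: inspect a directory's owner, group and octal mode, and test whether
# a user who is neither owner nor group member could traverse it.
# Assumes GNU coreutils stat (Linux), as on the cluster in this post.
dir_info() {
  stat -c '%U:%G %a %n' "$1"
}

others_can_traverse() {
  # Succeeds when the "others" execute bit is set on the directory.
  local mode
  mode=$(stat -c '%a' "$1") || return 2
  (( (8#$mode & 1) != 0 ))
}
```

For example: dir_info /grid/app/oracle/product, then others_can_traverse /grid/app/oracle/product || echo "others cannot traverse". Traversing a directory only needs the execute bit, so a mode such as 755 would already let a non-owner like oracle through; 777 additionally makes the directory writable by everyone.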
This did not help either. Worse, as soon as I ran root.sh the server rebooted and never came back up. To recover it, I had to boot the server in single-user mode, comment out the CRS startup scripts in init.d, and then boot it in normal mode. This was a serious permission issue.
I then repeated the steps above and ran root.sh with shell tracing enabled (sh -x root.sh). It produced a huge amount of output, ending with:
/grid/app/oracle/product/11.1.0./crs/bin/crsctl: Permission denied
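Since the only clue was Permission denied on crsctl, one quick check is to look at every directory on the path to the binary, because a single restrictive parent directory is enough to block a non-root user. A minimal sketch, assuming GNU stat and an absolute path; show_path_perms is a hypothetical name, and namei -l from util-linux, where installed, gives similar output:

```shell
#!/bin/bash
# Sketch: print mode, owner:group and name of every component of a path,
# from / down to the target, to spot the component that denies access.
# Expects an absolute path. Assumes GNU coreutils stat.
show_path_perms() {
  local target="$1" prefix="" part
  local IFS='/'
  stat -c '%a %U:%G %n' / || return 1
  for part in $target; do
    [ -n "$part" ] || continue        # skip the empty field before the leading /
    prefix="$prefix/$part"
    stat -c '%a %U:%G %n' "$prefix" || return 1
  done
}
```

For example: show_path_perms /grid/app/oracle/product/11.1.0./crs/bin/crsctl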
So I changed the ownership of the whole CRS home to oracle:dba and reran the deinstall and root scripts:
chown -R oracle:dba /grid/app/oracle/product/11.1.0./crs
./rootdeinstall.sh
./root.sh
This fixed the permission problem and brought CRS up on one node: ps -ef | grep d.bin started showing the clusterware daemons up and running. But the crs_stat command failed with the following error:
PRKH-1010 : Unable to communicate with CRS services.
I checked the various log files (ocssd.log, evmd.log, crsd.log). They contained many error messages, but none of them clearly pointed to the cause.
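When the daemon logs are long, pulling only the latest error-looking lines from each one makes them easier to compare across nodes. A minimal sketch; the log layout under $CRS_HOME/log/<hostname>/ and the keyword filter are assumptions, so adjust them to what you actually see on your system:

```shell
#!/bin/bash
# Sketch: show the latest error-looking lines from the clusterware daemon logs.
# The layout $CRS_HOME/log/<host>/{cssd/ocssd.log,crsd/crsd.log,evmd/evmd.log}
# is assumed to be the usual 10.2/11.1 one; the keyword filter is a heuristic.
recent_crs_errors() {
  local crs_home="$1" host="${2:-$(hostname -s)}" lines="${3:-20}"
  local rel f
  for rel in cssd/ocssd.log crsd/crsd.log evmd/evmd.log; do
    f="$crs_home/log/$host/$rel"
    echo "== $f"
    if [ -r "$f" ]; then
      grep -iE 'error|fail|denied' "$f" | tail -n "$lines"
    else
      echo "(not readable)"
    fi
  done
}
```

For example: recent_crs_errors /grid/app/oracle/product/11.1.0./crs (the host name defaults to hostname -s, the line count to 20).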
Next I enabled srvctl tracing (which runs at trace level 2) and tried to start nodeapps manually on one node:
[oracle@ldsfsxs012q ~]$ export SRVM_TRACE=true
[oracle@ldsfsxs012q ~]$ srvctl start nodeapps -n <hostname>
This gave the following output:
[main] [10:50:33:911] [OPSCTLDriver.setInternalDebugLevel:173] tracing is true at level 2 to file null
[main] [10:50:33:911] [OPSCTLDriver.main:116] SRVCTL arguments : args[0]=start args[1]=nodeapps args[2]=-n args[3]=<nodename>
[main] [10:50:33:918] [OPSCTLDriver.<init>:96] Security manager is set
[main] [10:50:33:924] [CommandLineParser.parse:193] parsing cmdline args
[main] [10:50:33:924] [CommandLineParser.parse2WordCommandOptions:981] parsing 2-word cmdline
[main] [10:50:33:949] [HASContext.getInstance:199] Module init : 16
[main] [10:50:33:949] [HASContext.getInstance:222] Local Module init : 19
[main] [10:50:33:949] [HASContext.<init>:92] moduleInit = 19
[main] [10:50:33:959] [Library.getInstance:106] Created instance of Library.
[main] [10:50:33:959] [Library.load:206] Loading libsrvmhas11.so…
[main] [10:50:33:960] [Library.load:212] oracleHome /ora/app/oracle/product/11.1.0/db_1
[main] [10:50:33:960] [sPlatform.isHybrid:63] osName=Linux osArch=amd64 JVM=64 rc=false
[main] [10:50:33:960] [Library.load:238] Loading library /ora/app/oracle/product/11.1.0/db_1/lib/libsrvmhas11.so
[main] [10:50:33:967] [Library.load:262] Loaded library /ora/app/oracle/product/11.1.0/db_1/lib/libsrvmhas11.so from path=/ora/app/oracle/product/11.1.0/db_1/lib
[main] [10:50:33:968] [has.HASContextNative.Native] prsr_trace: no lsf ctx, line=Native: allocHASContext
[main] [10:50:33:968] [has.HASContextNative.Native] allocHASContext: Came in
[main] [10:50:33:968] [has.HASContextNative.Native] allocHASContext: module_init = 19
[main] [10:50:33:968] [has.HASContextNative.Native] allocHASContext: META context [1]
[main] [10:50:33:969] [has.HASContextNative.Native] allocHASContext: LSF context [1]
[main] [10:50:33:969] [has.HASContextNative.Native] prsr_trace: Native: prsr_initCLSS
[main] [10:50:35:617] [has.HASContextNative.Native] prsr_trace: clsc_connect: (0x2b8e60164920) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ldsfsxs012q_))
[main] [10:50:35:618] [has.HASContextNative.Native] prsr_trace: Native: clss error 3
[main] [10:50:35:618] [has.HASContextNative.Native] prsr_trace: Native: prsr_freeCLSS
[main] [10:50:35:618] [has.HASContextNative.Native] prsr_trace: prsr_throwException: oracle/ops/mgmt/has/HASContextException[Communications Error–Native: prsr_initCLSS]
oracle.ops.mgmt.cluster.ClusterException: PRKC-1056 : Failed to get the hostname for node <nodename>
PRKH-1010 : Unable to communicate with CRS services.
On the remaining two nodes, CRS was not coming up at all.
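In a trace this long, the useful lines are easy to miss. The one that stands out above is "no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_...))": srvctl could not reach the CSS daemon over its local IPC endpoint, which suggests CSS was not really up on that node. A minimal sketch that saves a traced run to a file and filters for such lines; the function names are hypothetical and the patterns are taken from the trace above, so treat them as a heuristic only:

```shell
#!/bin/bash
# Sketch: save a traced srvctl run to a file and keep only the lines that
# point at the cause. The patterns come from the trace in this post and are
# a heuristic, not a complete list of srvctl or CSS messages.
trace_srvctl() {
  # Usage: trace_srvctl <outfile> <srvctl arguments...>
  local out="$1"; shift
  SRVM_TRACE=true srvctl "$@" > "$out" 2>&1
}

key_trace_lines() {
  # "no listener at ...OCSSD_LL_<host>..." means nothing answered on the local
  # CSS IPC endpoint, which suggests ocssd was not running on this node.
  grep -E 'no listener at|clss error|PRK[A-Z]-[0-9]+' "$1"
}
```

For example: trace_srvctl /tmp/srvctl_trace.log start nodeapps -n <hostname>, then key_trace_lines /tmp/srvctl_trace.log. On the trace above this keeps the "no listener", "clss error 3", PRKC-1056 and PRKH-1010 lines.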
Solution:
The voting disk was corrupted, so the solution was to re-create it as follows:
./crsctl stop crs -f
./crsctl query css votedisk
./crsctl delete css votedisk /dev/raw/raw4 -force
Successful deletion of voting disk /dev/raw/raw4.
./crsctl add css votedisk /dev/raw/raw4 -force
./crsctl start crs
After this, CRS came up successfully.
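The voting disk steps can be scripted the same way, with a copy of the old disk taken first. A minimal sketch, assuming a single voting disk and a run as root; VOTEDISK, BACKUP_DIR and DRY_RUN are my own placeholders, and the dd copy is an extra precaution that was not part of the fix above:

```shell
#!/bin/bash
# Sketch: back up, delete and re-add one voting disk on 11.1, as root.
# CRS_HOME, VOTEDISK, BACKUP_DIR and DRY_RUN are placeholders (assumptions).
# The dd backup is an extra precaution, not part of the original fix.
CRS_HOME="${CRS_HOME:-/grid/app/oracle/product/11.1.0./crs}"
VOTEDISK="${VOTEDISK:-/dev/raw/raw4}"
BACKUP_DIR="${BACKUP_DIR:-/root/votedisk_backup}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = execute them

run() {
  # Always print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  [ "$DRY_RUN" = "0" ] || return 0
  "$@"
}

recreate_votedisk() {
  local crsctl="$CRS_HOME/bin/crsctl"
  local backup
  backup="$BACKUP_DIR/$(basename "$VOTEDISK").$(date +%Y%m%d%H%M%S).dd"
  run "$crsctl" stop crs -f || return 1
  run "$crsctl" query css votedisk                   # informational only
  run mkdir -p "$BACKUP_DIR" || return 1
  run dd if="$VOTEDISK" of="$backup" bs=4k || return 1
  # -force bypasses the running daemons: only safe with CRS down on ALL nodes.
  run "$crsctl" delete css votedisk "$VOTEDISK" -force || return 1
  run "$crsctl" add css votedisk "$VOTEDISK" -force || return 1
  run "$crsctl" start crs
}
```

Review the printed commands first, then run DRY_RUN=0 recreate_votedisk from one node only. The 11.1 documentation warns that -force should only be used while Clusterware is down on every node, since using it with any node active can corrupt the configuration.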
Then manually add the nodeapps for all three nodes:
srvctl add nodeapps -n <node1_name> -A <vip address>/<subnet_mask>/<interface_name like bond0 or eth0 etc>
srvctl add nodeapps -n <node2_name> -A <vip address>/<subnet_mask>/<interface_name like bond0 or eth0 etc>
srvctl add nodeapps -n <node3_name> -A <vip address>/<subnet_mask>/<interface_name like bond0 or eth0 etc>
The -A value is the node's virtual IP (VIP) address, which sits on the public network, followed by the public subnet mask and interface name; the mask and interface name can be seen with the "ifconfig -a" command. I am not giving any hostnames or IP addresses in this blog for security reasons.
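With three nodes it is easy to mistype an address string, so generating the commands from a small table and reviewing them before running anything can help. A minimal sketch; the function names and the table format (node, VIP, netmask, interface on each line) are hypothetical, and it only prints the commands:

```shell
#!/bin/bash
# Sketch: print "srvctl add nodeapps" commands so the -A strings can be
# reviewed before anything is run. Function names and input format are
# hypothetical.
nodeapps_add_cmd() {
  # $1 = node name, $2 = VIP address, $3 = netmask, $4 = interface
  printf 'srvctl add nodeapps -n %s -A %s/%s/%s\n' "$1" "$2" "$3" "$4"
}

print_nodeapps_cmds() {
  # Reads "node vip netmask interface" lines on stdin; skips blanks and # comments.
  local node vip mask ifc
  while read -r node vip mask ifc; do
    case "$node" in ''|'#'*) continue ;; esac
    nodeapps_add_cmd "$node" "$vip" "$mask" "$ifc"
  done
}
```

For example, printf 'node1 192.0.2.11 255.255.255.0 bond0\n' | print_nodeapps_cmds prints srvctl add nodeapps -n node1 -A 192.0.2.11/255.255.255.0/bond0 (192.0.2.x is a documentation-only address range).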
Once this is done, start the nodeapps as root:
./srvctl start nodeapps -n <node1>
./srvctl start nodeapps -n <node2>
./srvctl start nodeapps -n <node3>
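Starting the nodeapps can be done in one loop that also checks each node's status straight away. A minimal sketch, assuming srvctl is on the PATH of the root user; the node names are hypothetical, and with DRY_RUN=1 (the default) it only prints the commands:

```shell
#!/bin/bash
# Sketch: start the nodeapps on every node and show their status afterwards.
# NODES holds hypothetical node names; srvctl is assumed to be on PATH (run as root).
NODES="${NODES:-node1 node2 node3}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = execute them

run() {
  # Always print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  [ "$DRY_RUN" = "0" ] || return 0
  "$@"
}

start_all_nodeapps() {
  local n
  for n in $NODES; do            # unquoted on purpose: one word per node
    run srvctl start nodeapps -n "$n" || return 1
    run srvctl status nodeapps -n "$n"
  done
}
```

For example: NODES="<node1> <node2> <node3>" DRY_RUN=0 start_all_nodeapps, with the real node names filled in.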
Super!! All three came up without any issues.
Then add ASM as follows:
./srvctl add asm -n <node1> -i +ASM1 -o /ora/app/oracle/product/11.1.0/asm
./srvctl add asm -n <node2> -i +ASM2 -o /ora/app/oracle/product/11.1.0/asm
./srvctl add asm -n <node3> -i +ASM3 -o /ora/app/oracle/product/11.1.0/asm
Then start ASM. Add the database, instances, listeners, etc. in the same way.
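Adding and starting ASM on all nodes follows the same pattern. A minimal sketch, assuming instance numbers follow the node order as above (+ASM1 on the first node, and so on) and that srvctl is on the PATH; NODES, ASM_HOME and DRY_RUN are placeholders, and the default only prints the commands:

```shell
#!/bin/bash
# Sketch: register one ASM instance per node (+ASM1, +ASM2, ...) in node order,
# then start ASM on every node. NODES, ASM_HOME and DRY_RUN are assumptions.
NODES="${NODES:-node1 node2 node3}"
ASM_HOME="${ASM_HOME:-/ora/app/oracle/product/11.1.0/asm}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = execute them

run() {
  # Always print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  [ "$DRY_RUN" = "0" ] || return 0
  "$@"
}

add_and_start_asm() {
  local i=1 n
  for n in $NODES; do
    run srvctl add asm -n "$n" -i "+ASM$i" -o "$ASM_HOME" || return 1
    i=$((i + 1))
  done
  for n in $NODES; do
    run srvctl start asm -n "$n" || return 1
  done
}
```

The numbering is positional, so list the nodes in NODES in the same order in which the +ASM instances were originally assigned.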
Problem solved!