Sunday, 10 November 2013

Oracle 10g Dataguard Best Practice and setup steps

Oracle 10g Dataguard Best Practice and setup steps


How it works:

As the primary database processes transactions, the ARCH process on the primary database is configured to transmit the redo data to the standby.  On standby, one or more RFS processes are used to receive redo data and MRP process applies the redo data to the standby.  The primary database also has the Fetch Archive Log (FAL) process to provide a client-server mechanism for transmitting archived logs to the standby following a communication loss between the primary and standby for automatic gap resolution and resynchronization.

Configuration Steps:

This section provides step-by-step instructions on how to setup 10g Dataguard.  Additional information has been provided as notes to elaborate or clarify a point. All steps shown were tested on linux.

Primary host   : prod-lnx-01
Standby host   : stdby-linx-01
ORACLE_SID=SILVER
Kernel   :Linux x86 2.4.21-27.EL (CEL 5)
Service names : SILVER_primary, SILVER_stdby

Initial Setup:

Initialization Parameters:

Primary database init.ora:

instance_name = SILVER
DB_UNIQUE_NAME  = 'SILVER_primary'
_LOG_ARCHIVE_CALLOUT='LOCAL_FIRST=TRUE'
##COMMON TO BOTH PRIMARY AND STANDBY ROLES
log_archive_config ='DG_CONFIG=(SILVER_primary, SILVER_stdby)'
log_archive_dest_1='LOCATION=/oracle/archive/SILVER/SILVER.arch VALID_FOR=(ALL_LOGFILES, ALL_ROLES) DB_UNIQUE_NAME=SILVER_primary'
log_archive_dest_2='SERVICE=SILVER_stdby ARCH VALID_FOR=(ONLINE_LOGFILES, PRIMARY_ROLE) DB_UNIQUE_NAME=SILVER_stdby max_connections=4 delay=30'
log_archive_dest_state_1=enable
log_archive_dest_state_2=enable
log_archive_max_processes=20
log_archive_format='%t_%s_%r.dbf'
remote_login_passwordfile=EXCLUSIVE
#SPECIFIC TO STANDBY ROLE
FAL_CLIENT=SILVER_primary
FAL_SERVER=SILVER_stdby
standby_archive_dest=/oracle/archive/SILVER/SILVER.arch
standby_file_management=auto

NOTE: Values for log_archive_max_processes, max_connections and delay are just recommended values and can be changed as applicable to the env based on the amount of generated redo. You can refer to the test cases to get some idea on what the values should be.

Standby database init.ora:
instance_name = SILVER
DB_UNIQUE_NAME   = 'SILVER_stdby'
log_archive_config ='DG_CONFIG=(SILVER_primary, SILVER_stdby)'
log_archive_dest_1='LOCATION=/oracle/archive/SILVER/SILVER.arch VALID_FOR=(ALL_LOGFILES, ALL_ROLES) DB_UNIQUE_NAME=SILVER_stdby'
log_archive_dest_2='SERVICE=SILVER_primary VALID_FOR=(ONLINE_LOGFILES, PRIMARY_ROLE) DB_UNIQUE_NAME=SILVER_primary max_connections=4 delay=30'
log_archive_dest_state_1=enable
log_archive_dest_state_2=enable
log_archive_max_processes=20
log_archive_format='%t_%s_%r.dbf'
remote_login_passwordfile=EXCLUSIVE
#SPECIFIC TO STANDBY ROLE
FAL_CLIENT='SILVER_stdby'
FAL_SERVER='SILVER_primary'
standby_archive_dest='/oracle/archive/SILVER/SILVER.arch'
standby_file_management='auto'
# SPECIFIC TO RECOVERY
parallel_max_servers=16
parallel_execution_message_size=16384
recovery_parallelism=4

Note:
Create just one initialization parameter file that contains parameters used in both Roles (primary / standby). Parameters specific to that role will be used eg: FAL_ parameter will be used only when the database is in standby role. For Log_archive_dest_n the VALID_FOR attribute will differentiate the roles, if not specified the default is (ALL_LOGFILES,ALL_ROLES). This VALID_FOR attribute allows us to use the same initialization parameter file for both the primary and standby roles. This will allow to failover without any changes

Note:
- The parameters specific to recovery should be set only on standby database and  they enhance the performance of the recovery. These specific parameters are not recommended on primary.
- Max_connection setting has issue. Refer Issues mentioned at the start of the document
- Settings like Log_archive_max_processes, delay can vary based on the env needs

Configure Net Services:

Note:

The recommendation is to use ports between 1910 to 1920 for Data Guard traffic for DG listener as there is a requirement  for asynchronous traffic over WAN to be marked for Scavenger class and these ports are included in scavenger class Qos list.

What is Scavenger class?
Scavenger class traffic is defined as limited percent of the total bandwidth available. When there is available bandwidth the scavenger class will exceed its  allocation and is in no way policed or shaped. The Qos policies will kick in only when there is no available bandwidth, and in that scenario the traffic will be limited to that allocation (which isn't often).  What it means is during the periods of congestion, the Scavenger class policies get kicked in and the traffic marked for this class will get limited bandwidth, packets might get dropped etc..prior to impacting the critical transactional data

Verify the TCP send/receive size are set to 4665000 (for improved efficiency in log shipping).
cat /proc/sys/net/core/wmem_max
cat /proc/sys/net/core/rmem_max
Add SDU = 32768 in tnsnames.ora  in both Primary and DR.
Setup separate listener for DG
 
On Primary database host:
/etc/listener.ora
LISTENER_SILVER_DG =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST =erpdb-prd-06)(PORT = 1523))
      )
    )
  )
SID_LIST_LISTENER_SILVER_DG =
  (SID_LIST =
    (SID_DESC =
      (SDU=32768)
      (SID_NAME = SILVER)
      (ORACLE_HOME = /oracle/product/10.2.0.2-64)
    )
  )

/etc/tnsnames.ora
SILVER_stdby =
 (DESCRIPTION =
  (SDU = 32768)
   (ADDRESS_LIST =
       (ADDRESS =
         (PROTOCOL = TCP)
         (HOST =hpquadopt)
         (PORT = 1523)
        )
    )
   (CONNECT_DATA =
     (SERVICE_NAME = SILVER)
   )
)
On standby host:
/etc/listener.ora
LISTENER_SILVER_DG =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST =hpquadopt)(PORT = 1523))
      )
    )
  )
SID_LIST_LISTENER_SILVER_DG =
  (SID_LIST =
    (SID_DESC =
      (SDU=32768)
      (SID_NAME = SILVER)
      (ORACLE_HOME = /oracle/product/10.2.0.2-64)
    )
  )
SILVER_primary =
 (DESCRIPTION =
  (SDU = 32768)
   (ADDRESS_LIST =
       (ADDRESS =
         (COMMUNITY = portal.com)
         (PROTOCOL = TCP)
         (HOST =erpdb-prd-06)
         (PORT = 1523)
        )
    )
   (CONNECT_DATA =
     (SERVICE_NAME = SILVER)
   )
)

Configure Primary Database

Ensure initialization parameters and net services configuration in place (refer initial setup)
Shutdown the Database
Setup Remote OS authentication
Create the password file if not in place already
orapwd file=/oracle/admin/<SID>/pfile/orapw<SID> password=<sys password>
Ensure sym link is created from $ORACLE_HOME/dbs to /oracle/admin/<SID>/pfile/orapw<SID>
Copy the orapw<sid> from primary to same location on standby
Ensure the Database is in archivelog mode
Startup the Database

Configure Standby Database

1. Ensure initialization parameters and net services configuration in place (refer initial setup)
2. Add entry in /etc/oratab. Put automatic startup as N
3. Copy the datafiles and required archives from cold/hot backup to DR server
4. Create the standby controlfile on Primary host
SQL>ALTER DATABASE CREATE STANDBY CONTROFLE AS '/tmp/stdby.ctl';
Copy the stdby.ctl file to the standby host. Make sure the control_file parameter reflects the correct name/path
Startup the standby database
SQL>STARTUP MOUNT
(optional) Create the SRL's (standby redo logs). Number of SRL’s should be one more than the number of redo logs on primary and the log size should be same as primary
                SQL>ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 size 50 M;
                                 
                SQL>ALTER DATABASE ADD STANDBY LOGFILE GROUP 5 size 50 M;
                 
                SQL>ALTER DATABASE ADD STANDBY LOGFILE GROUP 6 size 50 M;

Note:

The archiver (ARCn) process on the primary database can optionally transmit redo data to remote standby redo logs (SRL’s) if  SRL’s are created.  In a failover situation, you will have access to the information already written in the Standby Redo Logs rather than wait until it is archived completely in case of no SRL’s and hence reducing the risk of loosing data.
MAXLOGFILE defaults to 16 , To create online redologs + standby redologs more than this ensure that you recreate the control file to modify maxlogfile to accomodate this number.

The POC testing was done with and without SRL’s. No issues were noticed with either option. But, if you can not afford to loose any data in case of a failover and you need standby recovered real time, you have to use SRL’s

Verify log shipping

Check to see if the redo from the primary is being shipped to the standby    
            On the primary do the following:
            SQL>ALTER SYSTEM SWITCH LOGFILE;
            SQl>SELECT SEQUENCE#, FIRST_TIME, NEXT_TIME, ARCHIVED, DEST_ID  FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;

            SEQUENCE# FIRST_TIME  NEXT_TIME   ARC    DEST_ID
      ---------- ----------- ----------- --- ----------
      1243 01-OCT-2006 07-OCT-2006 YES          1
      1243 01-OCT-2006 07-OCT-2006 YES          2
      1244 01-OCT-2006 07-OCT-2006 YES          1
      1244 01-OCT-2006 07-OCT-2006 YES          2

Note

In this configuration, arch process is used for log_archive_dest_2 and SRL’s are created on the standby. On the primary database, after the ARC0 process successfully archives the local online redo log to the local destination (LOG_ARCHIVE_DEST_1), the ARC1 process transmits redo from the local archived redo log files (instead of the online redo log files) to the remote standby destination (LOG_ARCHIVE_DEST_2). On the remote destination, the remote file server process (RFS) will in turn write the redo data to an archived redo log file from a standby redo log file.

Verify log apply

Start managed recovery
SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
Verify if the archived logs are applied successfully on the standby
SQL>SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;

Note:

In our setup, the log apply services on standby are delayed by 90 mins. If we need real time apply enabled, the log apply services can apply redo data as it is received, without waiting for the current standby redo log file to be archived. On the standby cancel out of the current managed recovery and place it back in recovery with Real Time Apply
SQL>ALTER DATABASER RECOVER MANAGED STANDBY DATABASE CANCEL;    
SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;

1 comment:

  1. very good articles. Nice job. could please you write on cold & hot back?

    ReplyDelete