Thursday, November 28, 2013

IOD 2013

BLU is one of the highlights at IOD this year. Here is a quick summary from Susan --> Click here

Tuesday, May 14, 2013

DB2 LUW HADR supports various sync modes. Which hadr_syncmode should I use for my database?

First, I would encourage you to read this document. Then I would recommend getting familiar with the HADR simulator.

Selecting the right hadr_syncmode for your production database is essential for getting optimal network throughput and performance from the HADR pair and for meeting the business service level agreement.

Consider the factors below, then experiment with the HADR simulator in several configurations with different sync modes, flush sizes, overhead/transfer rates, TCP/IP send and receive buffer sizes, and HADR receive buffer sizes, tabulating the results of the runs as you go. Pick the configuration that works best.

A few factors that influence the choice of hadr_syncmode:

Distance between Primary site and Standby site.
The preference for sync mode at a high level would be
  • SYNC if the primary and standby are located in the same data center.
  • NEARSYNC if the primary and standby are located in different data centers but same city limits.
  • ASYNC if the primary and standby are separated by great distances.
Network type between Primary and Standby: LAN or WAN?
The general preference is to use SYNC or NEARSYNC for systems over a LAN, and ASYNC or SUPERASYNC for systems over a WAN.

Log data generation on the Primary.
Defining the workload and estimating the amount of log data generated (and the flush size) is necessary to enable smooth log shipping and replay on the standby. Estimate, approximately or by doing a quick test run on a representative database, the number of write transactions per second your business will drive and the maximum amount of data (transaction logs) written by each transaction.
Total data generated/sec = number of transactions per sec * data per transaction

Hint: Use this to determine the flush size for your hadr simulator runs.
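A quick worked example with purely hypothetical numbers: if the workload drives 500 write transactions per second and each transaction writes about 8 KB of log data, then

Total data generated/sec = 500 * 8 KB = 4000 KB/sec ≈ 4 MB/sec

which is the log rate your sync mode and network have to sustain, and a reasonable starting point when picking the flush size for the simulator runs.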

Note: The sync mode that is picked today may not perform well later as the business grows. Continue to monitor, evaluate, and redesign the configuration to keep up with growing business demands and service level agreements.

Thursday, June 14, 2012

Identifying Primary/Standby db's.

Is there a way to tell whether a database is an HADR Primary or Standby database when the database and instance are down?
Sure. Search (grep) for "HADR state" in the db2diag.log file (this file is present in the db2dump directory). The last state in the search results indicates what state the database was in before it was shut down. If the state starts with 'S' (like S-Peer), the database was a standby database and was in Peer state. If the state starts with 'P' (like P-Peer), the database was a primary database and was in Peer state.
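For example, something like the following (the db2diag.log path shown is the typical default for an instance owner and may differ in your environment, depending on the DIAGPATH setting):

grep "HADR state" ~/sqllib/db2dump/db2diag.log | tail -1

The last matching entry shows the most recent HADR state recorded before the shutdown.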

Query on standby returns error SQL1773 RC5

Read applications are supported on a reads-on-standby database. At times, though, even when a query appears to be read only, it fails to execute and an error is returned. I was asked the question below.

<Question> HADR ROS is enabled on the standby server. When I run a read-only query on the standby, the query fails with error SQL1773 RC5. In the db2diag.log I notice the following messages:

2012-06-12-11.10.06.782777-240 I1749E551           LEVEL: Error
PID     : 23655                TID  : 47079342729536PROC : db2sysc 0
INSTANCE: hdrinst1             NODE : 000          DB   : HADRDB71
APPHDL  : 0-609                APPID: 169.193.146.11.37629.120612150952
AUTHID  : kkchinta
EDUID   : 46                   EDUNAME: db2agent (HADRDB71) 0
FUNCTION: DB2 UDB, Common Trace API, sqlbfix, probe:123
MESSAGE : ZRC=0x80100469=-2146433943=SQLP_HDRS_READ_ONLY
          "The operation that attempted to modify the contents of the database failed"


2012-06-12-11.10.06.784560-240 I2301E1262          LEVEL: Error
PID     : 23655                TID  : 47079342729536PROC : db2sysc 0
INSTANCE: hdrinst1             NODE : 000          DB   : HADRDB71
APPHDL  : 0-609                APPID: 169.193.146.11.37629.120612150952
AUTHID  : kkchinta
EDUID   : 46                   EDUNAME: db2agent (HADRDB71) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbFixAndSearchOneSMPPage, probe:800
MESSAGE : ZRC=0x80100469=-2146433943=SQLP_HDRS_READ_ONLY
          "The operation that attempted to modify the contents of the database failed"
DATA #1 : Page ID, PD_TYPE_SQLB_PAGE_ID, 4 bytes

64
DATA #2 : Object descriptor, PD_TYPE_SQLB_OBJECT_DESC, 88 bytes
    Obj: {pool:25;obj:2;type:128} Parent={25;2}
  lifeLSN:       7726780610111206
  tid:           0 0  0
  extentAnchor:                   0
  initEmpPages:                   0
  poolPage0:                      0
  poolflags:                   6112
  objectState:                   45
  lastSMP:                        0
  pageSize:                   32768
  extentSize:                    64
  bufferPoolID:                  13
  partialHash:           2147614745
  bufferPool:    0x00002ad1a226ce20
  pdef:          0x00002ad3438a6840
DATA #3 : Codepath, 8 bytes
60:62


<Answer> To explain the above failure at a high level -- queries on the standby database can only make use of SMS system temporary table spaces. A query executed on the standby database that uses DMS system temporary table spaces may fail with an error (SQL1773N Reason Code 5). You can read about this in the IBM docs here.

You can confirm whether the table space used by the temp table is a DMS temporary table space by looking at the pool ID dumped in the db2diag.log. In the entries above, the pool ID is 25. Run a list tablespaces command to check whether table space 25 is a DMS temporary table space. </Answer>
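For example, something along these lines (the database name comes from the diag entries above; the exact output labels can vary slightly by release):

db2 connect to HADRDB71
db2 list tablespaces show detail

In the output, find the entry with Tablespace ID = 25 and check whether its Type is reported as Database managed space (DMS) and its Contents as System Temporary data.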

Wednesday, May 16, 2012

DB2 LUW HADR old primary reintegration

An overview of "Reintegrating a database after a takeover operation" is given in the official docs.
The steps (procedure) for reintegrating a database are also described in the same document.


I was recently asked -- what would be returned if a customer attempted to start the old primary as a standby but the log streams were different (i.e. the data was not consistent)?

Well, even if the old primary is started as a standby with the command "start hadr on db <dbName> as standby" while its log stream is different (ahead of the new primary), the user still gets a successful return code ("DB20000I  The START HADR ON DATABASE command completed successfully"), but internally the pairing will fail and the new standby database will shut down. As the official document says, "Monitor the standby states to ensure that the reintegration of the new standby is successful, meaning that it stays online and proceeds with the normal state transition. You can do this using the GET SNAPSHOT FOR DATABASE command or the db2pd tool. If necessary, you can check the administration log file db2diag.log to find out the status of the database." So the user has to monitor the states and the db2diag.log.
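A quick way to do that monitoring, for example (<dbName> is a placeholder):

db2pd -db <dbName> -hadr

On the reintegrating side, check that the reported HADR role is standby and that it moves toward Peer state; if the reintegration is rejected, the database shuts down and messages like the ones below show up instead.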

The following messages will be dumped to the db2diag.log -- notice the message "Standby is ahead of Primary" below.



db2diag.log messages on old-Primary coming up as new standby:
----------------------------------------------------------------------------------------------

2012-05-16-08.26.33.400961-420 E203001E656         LEVEL: Error
PID     : 7425                 TID  : 47494469773632PROC : db2sysc
INSTANCE: kkchinta             NODE : 000
EDUID   : 344                  EDUNAME: db2hadrs (SSD)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduS, probe:20870
MESSAGE : ADM12500E  The HADR standby database cannot be made consistent with
          the primary database. The log stream of the standby database is
          incompatible with that of the primary database. To use this database
          as a standby, it must be recreated from a backup image or split
          mirror of the primary database.


2012-05-16-08.26.33.416153-420 I203658E360         LEVEL: Warning
PID     : 7425                 TID  : 47494469773632PROC : db2sysc
INSTANCE: kkchinta             NODE : 000
EDUID   : 344                  EDUNAME: db2hadrs (SSD)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduS, probe:20870
MESSAGE : HADR: Pair validation rejected by primary.

2012-05-16-08.26.33.423572-420 I204019E424         LEVEL: Error
PID     : 7425                 TID  : 47494469773632PROC : db2sysc
INSTANCE: kkchinta             NODE : 000
EDUID   : 344                  EDUNAME: db2hadrs (SSD)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduS, probe:20880
RETCODE : ZRC=0x87800145=-2021654203=HDR_ZRC_VALIDATION_REJECT
          "HADR shuts down due to validation rejection"
 

db2diag.log messages on new-primary
--------------------------------------------------
2012-05-16-08.26.33.379656-420 I657118E373         LEVEL: Warning
PID     : 14075                TID  : 47572295084352PROC : db2sysc
INSTANCE: kkchinta             NODE : 000
EDUID   : 330                  EDUNAME: db2hadrp (SSD)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduP, probe:20482
MESSAGE : Old primary requesting rejoining HADR pair as a standby

2012-05-16-08.26.33.385077-420 E657492E656         LEVEL: Error
PID     : 14075                TID  : 47572295084352PROC : db2sysc
INSTANCE: kkchinta             NODE : 000
EDUID   : 330                  EDUNAME: db2hadrp (SSD)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduP, probe:20485
MESSAGE : ADM12500E  The HADR standby database cannot be made consistent with
          the primary database. The log stream of the standby database is
          incompatible with that of the primary database. To use this database
          as a standby, it must be recreated from a backup image or split
          mirror of the primary database.

2012-05-16-08.26.33.396369-420 I658149E431         LEVEL: Warning
PID     : 14075                TID  : 47572295084352PROC : db2sysc
INSTANCE: kkchinta             NODE : 000
EDUID   : 330                  EDUNAME: db2hadrp (SSD)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduP, probe:20485
MESSAGE : HADR Pair validation failed. Standby is ahead of Primary.  Standby
          LSO: 40857824 Primary LSO: 38510302

Thursday, April 26, 2012

Simple Steps to setup DB2 LUW HADR

I am a newbie; how do I set up HADR?

Official document --> Here
(OR) a cheat-sheet example (quick guide) -- for DB2 v9.7 and earlier releases:

Please follow the step numbers:
Setting up Primary

Step (1)
create db $DbName;
update db cfg for $DbName using
HADR_LOCAL_HOST  $Host1
HADR_REMOTE_HOST $Host2
HADR_LOCAL_SVC $Service1
HADR_REMOTE_SVC $Service2
HADR_REMOTE_INST $Instance2
HADR_TIMEOUT 120
HADR_SYNCMODE ASYNC
LOGRETAIN ON;

Step (2)
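A minimal sketch of what the standby side typically looks like, assuming the standard procedure (take a backup of the primary, transfer and restore it on the standby host, mirror the HADR parameters, then start HADR on the standby first and on the primary last). $Instance1 below is a placeholder for the primary's instance name, which Step (1) does not spell out:

On the primary ($Host1):
backup db $DbName;
(copy the backup image to $Host2)

On the standby ($Host2, instance $Instance2):
restore db $DbName;
update db cfg for $DbName using
HADR_LOCAL_HOST  $Host2
HADR_REMOTE_HOST $Host1
HADR_LOCAL_SVC $Service2
HADR_REMOTE_SVC $Service1
HADR_REMOTE_INST $Instance1
HADR_TIMEOUT 120
HADR_SYNCMODE ASYNC;
start hadr on db $DbName as standby;

Back on the primary ($Host1):
start hadr on db $DbName as primary;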

Tuesday, April 17, 2012

Calculating the overhead and data rate (transfer rate) for a disk -- HADR Simulator Tool

If you have run the HADR simulator before or read the HADR_sim page, then you might have already gone through the things I explain here. If not, this might come in handy.

Recently, I was setting up a DB2 LUW HADR pair over a WAN and thought it might be a good idea to first run the HADR simulator to simulate HADR in the different sync modes and see which one fits best here. The binary for the HADR simulator can be downloaded here. The HADR_sim web page contains all the details on how to use the tool. When running the tool with one of the sync modes, it is advised to provide details to the -disk argument. The -disk argument takes two values:
1) data rate in MBytes/sec and
2) per write overhead in seconds.

The HADR_sim page describes how to calculate these two values. I will paste it here for easy reference:
"Disk speed

Simhadr can measure disk speed and simulate disk write.

To measure disk speed, use "-testDisk" option. To specify disk speed to simulation runs, use the "-disk" option.

Disk write time is modeled as: write_time = data_amount / write_rate + per_write_overhead. Write_rate is in units of MB/second. Per_write_overhead is in units of seconds. Theoretically, given the write times of two runs with different data amounts, you can solve the equations and get the write rate and per-write overhead. To make things simple, you can do a run with a 1-page flush size. The reported write time is an approximation of the per-write overhead. Then do a run with a large flush size such as 500 or 1000. The reported MB/second is an approximation of the write rate.

When testing disk, simhadr issues synchronous writes (write() does not return until data is on disk), just like log writing in real DB2. Simhadr does not remove the temp file created for disk testing. You may examine, then delete, the file. For example, you may want to examine the content of the file, or the degree of sector fragmentation of the file.

With a single disk, typical write rate is 30 to 60 MB/second. Typical per write overhead is 1 to 20 millisecond. Newer disks usually have shorter per write overhead. Disk arrays may have better performance. A device with short per write overhead is recommended as log device.

Once you have the disk speed parameters, you may feed it back to simhadr using -disk option. When disk speed is specified, simhadr will compute the time needed to write a log flush and use sleep() to simulate the write. No actual data is written. This allows you to use hypothetical disk speed for "what if" questions like "what if my disk is faster?".