Tuesday, April 17, 2012

Calculating the overhead and data rate (transfer rate) for a disk -- HADR Simulator Tool

If you have run the HADR simulator before or read the HADR_sim page, you may already have gone through what I explain here. If not, this might come in handy.

Recently, I was setting up a DB2 LUW HADR pair over a WAN and thought it would be a good idea to first run the HADR Simulator to simulate HADR in the different sync modes and see which one fits best. The binary for the HADR simulator can be downloaded here. The HADR_sim web page contains all the details on how to use the tool. When running the tool in one of the sync modes, it is advisable to supply the -disk argument. The -disk argument takes two values:
1) data rate in MBytes/sec and
2) per write overhead in seconds.

The HADR_sim page describes how to calculate these two values. I will quote it here for easy reference:
"Disk speed

Simhadr can measure disk speed and simulate disk write.

To measure disk speed, use "-testDisk" option. To specify disk speed to simulation runs, use the "-disk" option.

Disk write time is modeled as: write_time = data_amount * write_rate + per_write_overhead. Write_rate is in unit of MB/second. Per_write_overhead is in unit of second. Theoretically, given the write time of two runs with different data amount, you can solve the equation and get write rate and per write overhead. To make things simple, you can do a run with 1 page flush size. The reported write time is an approximation of per write overhead. Then do a run with a large flush size such as 500 or 1000. The reported MB/second is an approximation of write rate.

When testing disk, simhadr issues synchronous write (write() does not return until data is on disk), just like log writing in real DB2. Simhadr does not remove the temp file created for disk testing. You may examine, then delete the file. For example, you may want to examine the content of the file, or the degree of sector fragmentation on the file.

With a single disk, typical write rate is 30 to 60 MB/second. Typical per write overhead is 1 to 20 milliseconds. Newer disks usually have shorter per write overhead. Disk arrays may have better performance. A device with short per write overhead is recommended as log device.

Once you have the disk speed parameters, you may feed it back to simhadr using -disk option. When disk speed is specified, simhadr will compute the time needed to write a log flush and use sleep() to simulate the write. No actual data is written. This allows you to use hypothetical disk speed for 'what if' questions like 'what if my disk is faster?'."
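To make the quoted description concrete, here is a minimal Python sketch of what a simulated write could look like once disk speed is given. It is not simhadr's code; the function names, the 4 KB page size and the 1 MB = 1024 KB convention are assumptions for illustration. Because the rate is given in MB/second, the sketch divides the data amount by it (equivalently, it multiplies by seconds per MB).

```python
import time

PAGE_KB = 4       # assumption: 4 KB log pages
KB_PER_MB = 1024  # assumption: binary megabytes


def simulated_write_time(pages, rate_mb_per_sec, overhead_sec):
    """Seconds needed to write `pages` log pages: data / rate + overhead."""
    data_mb = pages * PAGE_KB / KB_PER_MB
    return data_mb / rate_mb_per_sec + overhead_sec


def simulate_flush(pages, rate_mb_per_sec, overhead_sec):
    """Hypothetical stand-in for one log flush: nothing is written, we only sleep."""
    seconds = simulated_write_time(pages, rate_mb_per_sec, overhead_sec)
    time.sleep(seconds)
    return seconds


# A 1-page flush and a 4096-page flush on a hypothetical 50 MB/sec disk
# with a 5 ms per-write overhead.
for pages in (1, 4096):
    print(pages, "pages:", round(simulated_write_time(pages, 50.0, 0.005), 6), "s")
```

With small flushes the overhead term dominates; with large flushes the data/rate term does, which is why the two test runs below use very different flush sizes.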

There are two methods described there to calculate the overhead and the data rate. Quoting:
1) Theoretically, given the write time of two runs with different data amount, you can solve the equation and get write rate and per write overhead.
2) To make things simple, you can do a run with 1 page flush size. The reported write time is an approximation of per write overhead. Then do a run with a large flush size such as 500 or 1000. The reported MB/second is an approximation of write rate.

I will walk you through the steps to calculate the overhead and data rate (data transfer rate) the harder, but more precise, way: Method [1].

Following method [1], I did two runs using the HADR_sim -testDisk option: run 1 flushed 4096 pages per write and run 2 flushed 1 page per write.

In run 1, I observed 0.265787 sec/write, and in run 2, 0.024283 sec/write.

So basically, what I got is:
Run1 -- 0.265787 sec/write, 4096 pages/write
Run2 -- 0.024283 sec/write, 1 page/write

Disk write time is modeled as (quoting from the HADR_sim page):
   write_time = data_amount * write_rate + per_write_overhead. Write_rate is in unit of MB/second. Per_write_overhead is in unit of second.

   Assume
       Y = write_time (seconds per write)
       X = data_amount (pages per write)
       a = write_rate
       b = per_write_overhead

   Let's get down to some middle-school algebra here.
   Since I have two data samples, I can plug each of them into the equation above and solve the resulting pair of equations for the two unknowns, 'a' and 'b'.

Consider,
     Run1: Y1 = X1 * a + b
     Run2: Y2 = X2 * a + b

We need to solve these two equations to find 'a' and 'b' where
               a = fa(Y1,Y2,X1,X2)
               b = fb(Y1,Y2,X1,X2)

Subtracting the two equations (Run1 - Run2) gives:
      Y1 - Y2 = a(X1 - X2)
            a = (Y1 - Y2) / (X1 - X2)

Now that we know fa, let's substitute it into either of the equations above to find fb:
        Y1 = X1 * a + b
        Y1 = X1 * ((Y1 - Y2) / (X1 - X2)) + b
        Y1 = (Y1X1 - Y2X1) / (X1 - X2) + b
         b = Y1 - (Y1X1 - Y2X1) / (X1 - X2)
         b = (Y1X1 - Y1X2 - Y1X1 + Y2X1) / (X1 - X2)
         b = (Y2X1 - Y1X2) / (X1 - X2)
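The closed forms for 'a' and 'b' translate directly into a few lines of Python. This is a minimal sketch; `solve_two_runs` is a hypothetical name, X is in pages per write and Y in seconds per write, so 'a' comes back in seconds/page and 'b' in seconds.

```python
def solve_two_runs(x1, y1, x2, y2):
    """Solve Y = X * a + b through the points (x1, y1) and (x2, y2).

    x: pages per write, y: seconds per write.
    Returns (a, b): a in seconds/page, b (per-write overhead) in seconds.
    """
    if x1 == x2:
        raise ValueError("the two runs need different flush sizes")
    a = (y1 - y2) / (x1 - x2)
    b = (y2 * x1 - y1 * x2) / (x1 - x2)
    return a, b


# The two runs from this post: 4096 pages/write and 1 page/write.
a, b = solve_two_runs(4096, 0.265787, 1, 0.024283)
print(f"a = {a:.10f} seconds/page")
print(f"b = {b:.7f} seconds")
```

The guard on `x1 == x2` mirrors the division by (X1 - X2) in the derivation: two runs with the same flush size give no information about 'a'.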

Now all we have to do is plug in the numbers and calculate the overhead and data rate.
Y1 = 0.265787, X1 = 4096
Y2 = 0.024283, X2 = 1

a  = (Y1 - Y2) / (X1 - X2)
a  = (0.265787 - 0.024283) / (4096 - 1)
a  = 0.241504 / 4095
a  ~ 0.00005897 seconds/page (0.0000589753..., truncated)

Something to note here: observe the unit of 'a'.
----------------------------------------------------------
a = (Y1 - Y2) / (X1 - X2); let's see how the units play out here:
 unit of 'a' = (unit of (Y1 - Y2)) / (unit of (X1 - X2))
 unit of 'a' = seconds / page

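So 'a' is currently in seconds/page: it is the time to write one page, i.e. the reciprocal of a transfer rate. We want the data rate (transfer rate) in data/sec, or more precisely here, in MB/sec.
Convert 0.00005897 seconds/page to MB/sec by taking the reciprocal and multiplying by the page size:
                 1 / 0.00005897 seconds/page
                 = 16957.77 pages/sec
                 = 16957.77 * 4 KB/sec = 67831.08 KB/sec  (size of a page is 4 KB)
                 = 66.241 MB/sec  (1 MB = 1024 KB)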
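The same conversion as a short function, assuming a 4 KB page and 1 MB = 1024 KB, which is what the arithmetic above uses (the function name is hypothetical):

```python
PAGE_KB = 4       # log page size assumed to be 4 KB
KB_PER_MB = 1024  # binary megabytes, matching the hand calculation


def seconds_per_page_to_mb_per_sec(sec_per_page):
    pages_per_sec = 1.0 / sec_per_page          # invert: seconds/page -> pages/second
    return pages_per_sec * PAGE_KB / KB_PER_MB  # pages/second -> MB/second


rate = seconds_per_page_to_mb_per_sec(0.00005897)
print(f"{rate:.3f} MB/sec")  # 66.241 MB/sec, matching the hand calculation
```

If your log pages are a different size, only `PAGE_KB` changes; the inversion step stays the same.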


Calculate 'b':
b = (Y2X1 - Y1X2) / (X1 - X2)
  = (4096 * 0.024283 - 0.265787 * 1) / (4096 - 1)
  = 99.197381 / 4095
  = 0.0242240 seconds  (about 24 milliseconds)

HADR_sim takes two values for the -disk option: the data rate in MBytes/sec and the overhead in seconds.


Calculated values
    data rate = 66.241 MB/sec
    overhead  = 0.024224 seconds


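To verify that these values are correct, plug them into the right-hand side (R) of the two equations, using 'a' in seconds/page rather than the converted MB/sec figure; the result should equal the left-hand side (L), i.e. the measured write times Y1 and Y2.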
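A quick sketch of that check: evaluate X * a + b for both runs and compare it with the measured Y. Because 'a' was truncated to 0.00005897 above, expect agreement to within a few hundredths of a millisecond rather than an exact match.

```python
a = 0.00005897  # seconds/page, as calculated above
b = 0.024224    # seconds, as calculated above

# (X pages/write, measured Y seconds/write) for run 1 and run 2
runs = [(4096, 0.265787), (1, 0.024283)]

for x, y in runs:
    rhs = x * a + b  # right-hand side of Y = X * a + b
    print(f"X={x}: R = {rhs:.6f} s, L = {y:.6f} s, diff = {abs(rhs - y) * 1000:.3f} ms")
```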

You can compare these results with those you would get from Method [2]; they should be pretty close.
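For these two runs, Method [2] can be approximated from the numbers already in hand: the 1-page run's time stands in for the overhead, and the large run's throughput stands in for the data rate. This is a sketch under the same 4 KB page and 1024 KB/MB assumptions; the tool reports its own MB/second figure, which may be computed differently.

```python
PAGE_KB = 4
KB_PER_MB = 1024

small_pages, small_sec = 1, 0.024283     # run 2
large_pages, large_sec = 4096, 0.265787  # run 1

overhead_approx = small_sec  # Method [2]: the 1-page write time approximates the overhead
rate_approx = large_pages * PAGE_KB / KB_PER_MB / large_sec  # MB per write / seconds per write

print(f"Method [2]: overhead ~ {overhead_approx:.6f} s, data rate ~ {rate_approx:.1f} MB/sec")
print("Method [1]: overhead = 0.024224 s, data rate = 66.241 MB/sec")
```

The overheads agree closely. The Method [2] data rate comes out lower (about 60 MB/sec) because the large run's time still includes the roughly 24 ms per-write overhead, whereas Method [1] separates the two terms.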