Tuesday, September 11, 2012

Fdisk and parted


>>  Why is fdisk unable to create a partition larger than 2 TB, and how does GNU parted get around it?

* fdisk will not let you create a partition larger than 2 TB. Instead it gives the warning shown below. The problem is not fdisk itself but the old PC-DOS (MBR) disk label used on the disk. An MBR partition entry stores the starting sector and the length in sectors as 32-bit values, so with 512-byte sectors no partition can exceed (2^32 - 1) x 512 = 2199023255040 bytes. Larger disks need a GPT label, and the fdisk version used here does not work with GPT drives, so we need a different partitioning tool. The usual recommendation for Linux users is GNU parted.
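The 2 TB figure is plain arithmetic. A quick check in Python, using the byte counts from the fdisk warning below. The constant names are only illustrative.

```python
# MBR (DOS) partition entries hold sector numbers and lengths as 32-bit values.
SECTOR_SIZE = 512                 # logical sector size reported for /dev/sdb
MAX_MBR_SECTORS = 2**32 - 1       # largest value a 32-bit field can hold

mbr_limit_bytes = MAX_MBR_SECTORS * SECTOR_SIZE
disk_bytes = 5908688535552        # disk size fdisk reports for /dev/sdb

print(mbr_limit_bytes)                          # 2199023255040, as in the warning
print(disk_bytes > mbr_limit_bytes)             # True: the disk is too big for an MBR label
print(round(disk_bytes / mbr_limit_bytes, 2))   # 2.69 times the limit
```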

[root@rockie ~]# fdisk /dev/sdb
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: The size of this disk is 5.9 TB (5908688535552 bytes).
DOS partition table format can not be used on drives for volumes
larger than (2199023255040 bytes) for 512-byte sectors. Use parted(1) and GUID
partition table format (GPT).

Creating 2TB partition using Fdisk

The disk in this example is roughly 6 TB. You can still use fdisk to create a 2 TB partition on it, as shown below.

[root@rockie ~]# fdisk /dev/sdb
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-718357, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-267349, default 267349):
Using default value 267349

As you can see above, the disk has 718357 cylinders (roughly 6 TB in total), but the last cylinder fdisk offers is only 267349, which is roughly 2 TB in this example.

So fdisk creates a 2 TB partition, as shown below, even though the disk is around 6 TB.

Command (m for help): p

Disk /dev/sdb: 5908.7 GB, 5908688535552 bytes
255 heads, 63 sectors/track, 718357 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x3dffd626

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      267349  2147480811   83  Linux
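Why does fdisk stop at cylinder 267349? The listing above gives the geometry: 255 heads x 63 sectors = 16065 sectors per cylinder. The partition's length in sectors must fit in the 32-bit field of the MBR entry.

The sketch below assumes the DOS-compatible layout, in which partition 1 starts at sector 63 after the first track. That assumption is consistent with the Blocks value fdisk prints. The sketch reproduces both the cylinder limit and the Blocks column.

```python
HEADS, SECTORS_PER_TRACK = 255, 63
SECTORS_PER_CYL = HEADS * SECTORS_PER_TRACK   # 16065, matching "cylinders of 16065 * 512"
FIRST_SECTOR = 63                             # assumed: partition 1 starts after the first track
MAX_SECTORS = 2**32 - 1                       # 32-bit length field in an MBR entry

def partition_sectors(last_cyl):
    """Length in sectors of a partition that ends at the end of cylinder last_cyl."""
    return last_cyl * SECTORS_PER_CYL - FIRST_SECTOR

# Largest last cylinder whose partition length still fits in 32 bits.
last_cyl = (MAX_SECTORS + FIRST_SECTOR) // SECTORS_PER_CYL
blocks = partition_sectors(last_cyl) // 2     # the Blocks column counts 1 KiB units

print(last_cyl)   # 267349
print(blocks)     # 2147480811
```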

Set Partition Table to GPT using Parted mklabel

In our case, we need a partition larger than 2 TB, so we should use the parted command.

Before creating the partition, we must set the disk label to GPT.

GPT stands for GUID Partition Table.
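GPT records partition boundaries as 64-bit logical block addresses (LBAs) rather than 32-bit sector fields. The same 512-byte-sector arithmetic therefore gives a ceiling far beyond any real disk. The quick comparison below uses illustrative names only.

```python
SECTOR_SIZE = 512
disk_bytes = 5908688535552                     # /dev/sdb in this example

mbr_max = (2**32 - 1) * SECTOR_SIZE            # 32-bit sector fields (MBR/DOS label)
gpt_max = 2**64 * SECTOR_SIZE                  # 64-bit LBAs can address up to 2**64 sectors

for name, limit in (("MBR", mbr_max), ("GPT", gpt_max)):
    print(name, limit >= disk_bytes)
# MBR False
# GPT True
```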

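* Use parted’s mklabel command to set the disk label to GPT, as shown below. Note that mklabel writes a new, empty partition table, so any existing partitions on the disk are lost.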

[root@rockie ~]# parted /dev/sdb
GNU Parted 2.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.

(parted) print
Error: /dev/sdb: unrecognised disk label

(parted) mklabel gpt

(parted) print
Model: Unknown (unknown)
Disk /dev/sdb: 5909GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Use parted’s mkpart command, as shown below, to create a partition larger than 2 TB. In this example, we create a single partition of roughly 6 TB.

[root@rockie ~]# parted /dev/sdb

(parted) mkpart primary 0GB 5909GB

(parted) print
Model: Unknown (unknown)
Disk /dev/sdb: 5909GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  5909GB  5909GB               primary
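parted prints sizes in decimal (SI) units, where 1 kB = 1000 bytes and 1 GB = 10^9 bytes. That explains two values in the listing above.

- The partition was requested at 0GB but starts at 1049kB. This is most likely sector 2048 (1 MiB), the usual alignment boundary.
- The disk size of 5909GB is 5908688535552 bytes, rounded.

The helper below is hypothetical and only approximates how parted rounds for display.

```python
def parted_size(num_bytes, unit):
    """Approximate parted's decimal-unit display by rounding to the nearest whole unit."""
    factors = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
    return f"{round(num_bytes / factors[unit])}{unit}"

start_bytes = 2048 * 512          # assumed 1 MiB-aligned start sector
disk_bytes = 5908688535552

print(parted_size(start_bytes, "kB"))   # 1049kB
print(parted_size(disk_bytes, "GB"))    # 5909GB
```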

To learn how to use the parted command effectively, refer to: man parted

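Just out of curiosity, let us see how this >2TB partition is displayed in fdisk. fdisk does not actually show the GPT partition. It shows the protective MBR entry (Id ee, System “GPT”) that GPT writes into the legacy partition table so that older tools see the disk as in use. That entry can describe at most 2^32 - 1 sectors, so its size still appears as roughly 2TB under the Blocks column. The + at the end of the Blocks value means the entry spans an odd number of 512-byte sectors, so its size is not a whole number of 1 KiB blocks. It does not mean that the partition is larger than 2TB.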

[root@rockie ~]# fdisk /dev/sdb

Command (m for help): print

Disk /dev/sdb: 5908.7 GB, 5908688535552 bytes
255 heads, 63 sectors/track, 718357 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      267350  2147483647+  ee  GPT
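The ee line is GPT's protective MBR entry. It starts at sector 1 and would cover the rest of the disk, but its 32-bit length field caps it at 0xFFFFFFFF sectors. That cap produces both the 2147483647 value and the trailing +.

This sketch assumes, as in this fdisk version, that + flags a leftover 512-byte sector that does not fill a whole 1 KiB block.

```python
SECTOR_SIZE = 512
disk_sectors = 5908688535552 // SECTOR_SIZE    # 11540407296 sectors on /dev/sdb

# Protective entry: starts at sector 1, length capped by the 32-bit field.
pmbr_sectors = min(disk_sectors - 1, 0xFFFFFFFF)

blocks, leftover = divmod(pmbr_sectors, 2)     # Blocks column is in 1 KiB units
flag = "+" if leftover else ""
print(f"{blocks}{flag}")    # 2147483647+
```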

Use mkfs to format the partition. This takes a while, depending on the size of the partition. You will see “Writing inode tables” with a counter that keeps increasing. In this example, mkfs took roughly 15 minutes to complete. Without a -t option, mkfs creates an ext2 filesystem by default, which is why the output below comes from mke2fs.

[root@rockie ~]# mkfs /dev/sdb1
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
360644608 inodes, 1442550528 blocks
72127526 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
44024 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848, 512000000, 550731776, 644972544

Writing inode tables:  3955/44024
Writing inode tables:  5022/44024
Writing inode tables:  7218/44024
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
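Most of the mke2fs summary above can be derived from the block count and the per-group settings it prints. The sketch below reproduces the group count, inode count, reserved blocks and superblock backup locations.

It assumes the sparse_super feature, which mke2fs enables by default. Under sparse_super, backups are kept only in group 1 and in groups whose numbers are powers of 3, 5 and 7. The helper name backup_groups is hypothetical.

```python
BLOCKS = 1442550528          # "1442550528 blocks" in the mke2fs output
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
INODES_PER_GROUP = 8192
RESERVED_PERCENT = 5         # mke2fs default (-m 5)

groups = (BLOCKS + BLOCKS_PER_GROUP - 1) // BLOCKS_PER_GROUP   # round up: last group is partial
inodes = groups * INODES_PER_GROUP
reserved = BLOCKS * RESERVED_PERCENT // 100

def backup_groups(group_count):
    """Groups holding superblock backups under sparse_super: 1 and powers of 3, 5, 7."""
    found = {1}
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            found.add(g)
            g *= base
    return sorted(found)

# First data block is 0 with 4 KiB blocks, so group g begins at block g * 32768.
backups = [g * BLOCKS_PER_GROUP for g in backup_groups(groups)]

print(groups, inodes, reserved)   # 44024 360644608 72127526
print(backups[:5])                # [32768, 98304, 163840, 229376, 294912]
print(backups[-1])                # 644972544
print(BLOCKS * BLOCK_SIZE)        # 5908686962688 bytes of filesystem
```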

Monday, September 10, 2012

What is Parity in RAID?

>>  A parity bit is an extra bit added to a set of bits so that the total number of bits with the value one is even (even parity) or odd (odd parity). Parity bits are the simplest form of error-detecting code.
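For instance, here is a minimal Python helper that computes the even-parity bit for a value, i.e. the bit that makes the total count of 1s even. The odd-parity bit is simply its inverse. Both function names are illustrative.

```python
def even_parity_bit(value):
    """Return 1 if value has an odd number of 1 bits, so appending it makes the count even."""
    return bin(value).count("1") % 2

def odd_parity_bit(value):
    """Return the bit that makes the total number of 1 bits odd."""
    return 1 - even_parity_bit(value)

data = 0b01101101                     # five 1 bits
print(even_parity_bit(data))          # 1 -> six 1 bits in total
print(odd_parity_bit(data))           # 0 -> five 1 bits, already odd
```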

>>  Some RAID levels use parity data to achieve redundancy. If a drive in the array fails, the data remaining on the other drives can be combined with the parity data (using the Boolean XOR function) to reconstruct the missing data.

For example, suppose two drives in a three-drive RAID 5 array contained the following data:

Drive 1: 01101101
Drive 2: 11010100
To calculate parity data for the two drives, an XOR is performed on their data:
    01101101
XOR 11010100
    ________
    10111001

The resulting parity data, 10111001, is then stored on Drive 3.
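The same calculation in Python treats each drive's contents as a single byte. A real array works stripe by stripe on much larger blocks.

```python
drive1 = 0b01101101
drive2 = 0b11010100

parity = drive1 ^ drive2          # bitwise XOR of the two data drives
print(f"{parity:08b}")            # 10111001 -> stored on Drive 3
```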

Should any one of the three drives fail, the contents of the failed drive can be reconstructed on a replacement drive by subjecting the data from the remaining drives to the same XOR operation. If Drive 2 were to fail, its data could be rebuilt by XORing the contents of the two remaining drives, Drive 1 and Drive 3:

Drive 1: 01101101
Drive 3: 10111001
XORing the two gives:
    10111001
XOR 01101101
    ________
    11010100

The result of that XOR calculation is Drive 2's original contents. 11010100 is then written to the replacement Drive 2, fully repairing the array. The same XOR concept extends to larger arrays with any number of disks. In a 12-drive RAID 3 array, for example, 11 drives participate in the XOR calculation and the result is stored on the dedicated parity drive.
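The general case can be sketched with functools.reduce. Parity is the XOR of all data blocks. Any single missing block is the XOR of everything that survives, meaning the remaining data blocks plus parity. The drives here are hypothetical byte strings and the helper names are illustrative; this is not a real RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(blocks, failed):
    """Recover blocks[failed] by XORing all the other blocks (data and parity)."""
    survivors = [blk for i, blk in enumerate(blocks) if i != failed]
    return xor_blocks(survivors)

# Hypothetical stripe: 11 data drives (as in the 12-drive RAID 3 example) plus parity.
data = [bytes([17 * i % 256, (i * i) % 256]) for i in range(1, 12)]
parity = xor_blocks(data)
array = data + [parity]

for failed in range(len(array)):
    assert rebuild(array, failed) == array[failed]
print("every single-drive failure can be rebuilt")
```

With two data drives holding 01101101 and 11010100, xor_blocks returns 10111001, the same parity as in the worked example. Because XOR is its own inverse, one helper serves for both computing parity and rebuilding a lost drive.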