.\" $MirOS: src/sbin/raidctl/raidctl.8,v 1.3 2005/11/23 16:44:05 tg Exp $
.\" $OpenBSD: raidctl.8,v 1.33 2005/03/12 12:21:08 jmc Exp $
.\" $NetBSD: raidctl.8,v 1.24 2001/07/10 01:30:52 lukem Exp $
.\"
.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
.\"  School of Computer Science
.\"  Carnegie Mellon University
.\"  Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd July 10, 2001
.Dt RAIDCTL 8
.Os
.Sh NAME
.Nm raidctl
.Nd configuration utility for the RAIDframe disk driver
.Sh SYNOPSIS
.Nm raidctl
.Bk -words
.Op Fl v
.Op Fl afFgrR Ar component
.Op Fl BGipPsSu
.Op Fl cC Ar config_file
.Op Fl A Op yes | no | root
.Op Fl I Ar serial_number
.Ar dev
.Ek
.Sh DESCRIPTION
.Nm
is the user-land control program for
.Xr raid 4 ,
the RAIDframe disk device.
.Nm
is primarily used to dynamically configure and unconfigure RAIDframe disk
devices.
For more information about the RAIDframe disk device, see
.Xr raid 4 .
.Pp
This document assumes the reader has at least rudimentary knowledge of
RAID and RAID concepts.
.Pp
The device used by
.Nm
is specified by
.Ar dev .
.Ar dev
may be either the full name of the device, e.g.\&
.Pa /dev/rraid0c ,
or simply raid0 (for
.Pa /dev/rraid0c ) .
.Pp
For several commands
.Pq Fl BGipPsSu ,
.Nm
can accept the word
.Ic all
as the
.Ar dev
argument.
If
.Ic all
is used,
.Nm
will execute the requested action for all the configured
.Xr raid 4
devices.
.Pp
The command-line options for
.Nm
are as follows:
.Bl -tag -width indent
.It Fl a Ar component Ar dev
Add
.Ar component
as a hot spare for the device
.Ar dev .
.It Fl A Ic yes Ar dev
Make the RAID set auto-configurable.
The RAID set will be automatically configured at boot
.Em before
the root file system is
mounted.
Note that all components of the set must be of type RAID in the disklabel.
.It Fl A Ic no Ar dev
Turn off auto-configuration for the RAID set.
.It Fl A Ic root Ar dev
Make the RAID set auto-configurable, and also mark the set as being
eligible to contain the root partition.
A RAID set configured this way will
.Em override
the use of the boot disk as the root device.
In
.Mx ,
however, this only takes place if the kernel is configured
with "config bsd generic" or "root on raid0".
All components of the set must be of type RAID in the disklabel.
Note that the kernel being booted must currently reside on a non-RAID set and,
in order to have the root file system correctly mounted from it,
the RAID set must have its
.Sq a
partition (aka raid[0..n]a) set up.
.It Fl B Ar dev
Initiate a copyback of reconstructed data from a spare disk to
its original disk.
This is performed after a component has failed,
and the failed drive has been reconstructed onto a spare drive.
.It Fl c Ar config_file Ar dev
Configure the RAIDframe device
.Ar dev
according to the configuration given in
.Ar config_file .
A description of the contents of
.Ar config_file
is given later.
.It Fl C Ar config_file Ar dev
As for
.Fl c ,
but forces the configuration to take place.
This is required the first time a RAID set is configured.
.It Fl f Ar component Ar dev
This marks the specified
.Ar component
as having failed, but does not initiate a reconstruction of that
component.
.It Fl F Ar component Ar dev
Fails the specified
.Ar component
of the device, and immediately begins a reconstruction of the failed
disk onto an available hot spare.
This is one of the mechanisms used to start the reconstruction process
if a component does have a hardware failure.
.It Fl g Ar component Ar dev
Get the component label for the specified component.
.It Fl G Ar dev
Generate the configuration of the RAIDframe device in a format suitable for
use with
.Nm
.Fl c
or
.Fl C .
.It Fl i Ar dev
Initialize the RAID device.
In particular, (re-)write the parity on the selected device.
This
.Em MUST
be done for
.Em all
RAID sets before the RAID device is labeled and before
file systems are created on the RAID device.
.It Fl I Ar serial_number Ar dev
Initialize the component labels on each component of the device.
.Ar serial_number
is used as one of the keys in determining whether a
particular set of components belongs to the same RAID set.
While not strictly enforced, different serial numbers should be used for
different RAID sets.
This step
.Em MUST
be performed when a new RAID set is created.
.It Fl p Ar dev
Check the status of the parity on the RAID set.
Displays a status message, and returns successfully if the parity
is up-to-date.
.It Fl P Ar dev
Check the status of the parity on the RAID set, and initialize
(re-write) the parity if the parity is not known to be up-to-date.
This is normally used after a system crash (and before a
.Xr fsck 8 )
to ensure the integrity of the parity.
.It Fl r Ar component Ar dev
Remove the spare disk specified by
.Ar component
from the set of available spare components.
.It Fl R Ar component Ar dev
Fails the specified
.Ar component ,
if necessary, and immediately begins a reconstruction back to
.Ar component .
This is useful for reconstructing back onto a component after
it has been replaced following a failure.
.It Fl s Ar dev
Display the status of the RAIDframe device for each of the components
and spares.
.It Fl S Ar dev
Check the status of parity re-writing, component reconstruction, and
component copyback.
The output indicates the amount of progress achieved in each of these areas.
.It Fl u Ar dev
Unconfigure the RAIDframe device.
.It Fl v
Be more verbose.
For operations such as reconstructions, parity re-writing,
and copybacks, provide a progress indicator.
.El
.Ss Configuration file
The format of the configuration file is complex, and
only an abbreviated treatment is given here.
In the configuration files, a
.Sq #
indicates the beginning of a comment.
.Pp
There are 4 required sections of a configuration file, and 2
optional sections.
Each section begins with a
.Sq START ,
followed by
the section name, and the configuration parameters associated with that
section.
The first section is the
.Sq array
section, and it specifies
the number of rows, columns, and spare disks in the RAID set.
For example:
.Bd -unfilled -offset indent
START array
1 3 0
.Ed
.Pp
indicates an array with 1 row, 3 columns, and 0 spare disks.
Note that although multi-dimensional arrays may be specified, they are
.Em NOT
supported in the driver.
.Pp
The second section, the
.Sq disks
section, specifies the actual
components of the device.
For example:
.Bd -unfilled -offset indent
START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e
.Ed
.Pp
specifies the three component disks to be used in the RAID device.
If any of the specified drives cannot be found when the RAID device is
configured, then they will be marked as
.Sq failed ,
and the system will
operate in degraded mode.
Note that it is
.Em imperative
that the order of the components in the configuration file does not
change between configurations of a RAID device.
Changing the order of the components will result in data loss if the set
is configured with the
.Fl C
option.
In normal circumstances, the RAID set will not configure if only
.Fl c
is specified, and the components are out-of-order.
.Pp
The next section, which is the
.Sq spare
section, is optional, and, if
present, specifies the devices to be used as
.Sq hot spares :
devices
which are on-line, but are not actively used by the RAID driver unless
one of the main components fails.
A simple
.Sq spare
section might be:
.Bd -unfilled -offset indent
START spare
/dev/sd3e
.Ed
.Pp
for a configuration with a single spare component.
If no spare drives are to be used in the configuration, then the
.Sq spare
section may be omitted.
.Pp
The next section is the
.Sq layout
section.
This section describes the general layout parameters for the RAID device,
and provides such information as sectors per stripe unit,
stripe units per parity unit, stripe units per reconstruction unit,
and the parity configuration to use.
This section might look like:
.Bd -unfilled -offset indent
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5
.Ed
.Pp
The sectors per stripe unit value specifies, in blocks, the interleave
factor; i.e. the number of contiguous sectors to be written to each
component for a single stripe.
Appropriate selection of this value (32 in this example) is the subject
of much research in RAID architectures.
The stripe units per parity unit and stripe units per reconstruction unit
are normally each set to 1.
While certain values above 1 are permitted, a discussion of valid
values and the consequences of using anything other than 1 is outside
the scope of this document.
The last value in this section (5 in this example) indicates the
parity configuration desired.
Valid entries include:
.Bl -tag -width inde
.It 0
RAID level 0.
No parity, only simple striping.
.It 1
RAID level 1.
Mirroring.
The parity is the mirror.
.It 4
RAID level 4.
Striping across components, with parity stored on the last component.
.It 5
RAID level 5.
Striping across components, parity distributed across all components.
.El
.Pp
There are other valid entries here, including those for Even-Odd
parity, RAID level 5 with rotated sparing, Chained declustering,
and Interleaved declustering, but as of this writing the code for
those parity operations has not been tested with
.Ox .
.Pp
The next required section is the
.Sq queue
section.
This is most often specified as:
.Bd -unfilled -offset indent
START queue
fifo 100
.Ed
.Pp
where the queuing method is specified as FIFO (First-In, First-Out),
and the size of the per-component queue is limited to 100 requests.
Other queuing methods may also be specified, but a discussion of them
is beyond the scope of this document.
.Pp
The final section, the
.Sq debug
section, is optional.
For more details on this the reader is referred to the RAIDframe
documentation discussed in the
.Sx HISTORY
section.
See
.Sx EXAMPLES
for a more complete configuration file example.
.Sh EXAMPLES
It is highly recommended that, before using the RAID driver for real
file systems, the system administrator(s) become quite familiar
with the use of
.Nm raidctl ,
and that they understand how the component reconstruction process
works.
The examples in this section will focus on configuring a
number of different RAID sets of varying degrees of redundancy.
By working through these examples, administrators should be able to
develop a good feel for how to configure a RAID set, and how to
initiate reconstruction of failed components.
.Pp
In the following examples
.Sq raid0
will be used to denote the RAID device.
.Sq Pa /dev/rraid0c
may be used in place of
.Sq raid0 .
.Ss Initialization and Configuration
The initial step in configuring a RAID set is to identify the components
that will be used in the RAID set.
All components should be the same size.
Each component should have a disklabel type of
.Dv FS_RAID ,
and a typical disklabel entry for a RAID component might look like:
.Bd -unfilled -offset indent
f:  1800000   200495      RAID              # (Cyl.  405*- 4041*)
.Ed
.Pp
While
.Dv FS_BSDFFS
(e.g. 4.2BSD) will also work as the component type, the type
.Dv FS_RAID
(e.g.\&
RAID) is preferred for RAIDframe use, as it is required for
features such as auto-configuration.
As part of the initial configuration of each RAID set, each component
will be given a
.Sq component label .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, the redundancy level of the RAID set, a
.Sq modification counter ,
and whether the parity information (if any) on that
component is known to be correct.
Component labels are an integral part of the RAID set, since they are used
to ensure that components are configured in the correct order, and used
to keep track of other vital information about the RAID set.
Component labels are also required for the auto-detection and
auto-configuration of RAID sets at boot time.
For a component label to be considered valid, that particular component label
must be in agreement with the other component labels in the set.
For example, the serial number,
.Sq modification counter ,
number of rows and number of columns must all
be in agreement.
If any of these are different, then the component is not considered to be
part of the set.
See
.Xr raid 4
for more information about component labels.
.Pp
Once the components have been identified, and the disks have
appropriate labels,
.Nm
is then used to configure the
.Xr raid 4
device.
To configure the device, a configuration file which looks something like:
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 3 1

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e

START spare
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5

START queue
fifo 100
.Ed
.Pp
is created in a file.
The above configuration file specifies a RAID 5 set consisting of
the components
.Pa /dev/sd1e , /dev/sd2e ,
and
.Pa /dev/sd3e ,
with
.Pa /dev/sd4e
available as a
.Sq hot spare
in case one of
the three main drives should fail.
A RAID 0 set would be specified in a similar way:
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 4 0

START disks
/dev/sd10e
/dev/sd11e
/dev/sd12e
/dev/sd13e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
64 1 1 0

START queue
fifo 100
.Ed
.Pp
In this case, devices
.Pa /dev/sd10e , /dev/sd11e , /dev/sd12e ,
and
.Pa /dev/sd13e
are the components that make up this RAID set.
Note that there are no hot spares for a RAID 0 set, since there is no way
to recover data if any of the components fails.
.Pp
For a RAID 1 (mirror) set, the following configuration might be used:
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/sd20e
/dev/sd21e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
In this case,
.Pa /dev/sd20e
and
.Pa /dev/sd21e
are the two components of the
mirror set.
While no hot spares have been specified in this configuration,
they easily could be, just as they were specified in the RAID 5 case above.
Note as well that RAID 1 sets are currently limited to only 2 components.
At present, n-way mirroring is not possible.
.Pp
The first time a RAID set is configured, the
.Fl C
option must be used:
.Bd -unfilled -offset indent
# raidctl -C raid0.conf raid0
.Ed
.Pp
where
.Sq raid0.conf
is the name of the RAID configuration file.
The
.Fl C
forces the configuration to succeed, even if any of the component
labels are incorrect.
The
.Fl C
option should not be used lightly in
situations other than initial configurations: if
the system is refusing to configure a RAID set, there is probably a
very good reason for it.
After the initial configuration is done (and appropriate component labels
are added with the
.Fl I
option) then raid0 can be configured normally with:
.Bd -unfilled -offset indent
# raidctl -c raid0.conf raid0
.Ed
.Pp
When the RAID set is configured for the first time, it is
necessary to initialize the component labels, and to initialize the
parity on the RAID set.
Initializing the component labels is done with:
.Bd -unfilled -offset indent
# raidctl -I 112341 raid0
.Ed
.Pp
where
.Sq 112341
is a user-specified serial number for the RAID set.
This initialization step is
.Em required
for all RAID sets.
Also, using different serial numbers between RAID sets is
.Em strongly encouraged ,
as using the same serial number for all RAID sets will only serve to
decrease the usefulness of the component label checking.
.Pp
Initializing the RAID set is done via the
.Fl i
option.
This initialization
.Em MUST
be done for
.Em all
RAID sets, since among other things it verifies that the parity (if
any) on the RAID set is correct.
Since this initialization may be quite time-consuming, the
.Fl v
option may also be used in conjunction with
.Fl i :
.Bd -unfilled -offset indent
# raidctl -iv raid0
.Ed
.Pp
This will give more verbose output on the
status of the initialization:
.Bd -unfilled -offset indent
Initiating re-write of parity
Parity Re-write status:
 10% |****                                   | ETA:    06:03 /
.Ed
.Pp
The output provides a
.Sq Percent Complete
in both a numeric and graphical format, as well as an estimated time
to completion of the operation.
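.Pp
Taken together, the first-time steps described above amount to the
following sequence.
This is only a sketch that reuses the file name and serial number from
the examples in this section; both are placeholders to be replaced with
local values:
.Bd -unfilled -offset indent
# raidctl -C raid0.conf raid0
# raidctl -I 112341 raid0
# raidctl -iv raid0
.Ed
.Pp
Later configurations of the same set would then use
.Fl c
rather than
.Fl C .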
.Pp
Since it is the parity that provides the
.Sq redundancy
part of RAID, it is critical that the parity be correct
as much of the time as possible.
If the parity is not correct, then there is no guarantee that data will not
be lost if a component fails.
.Pp
Once the parity is known to be correct, it is then safe to perform
.Xr disklabel 8 ,
.Xr newfs 8 ,
or
.Xr fsck 8
on the device or its file systems, and then to mount the file systems
for use.
.Pp
Under certain circumstances (e.g. the additional component has not
arrived, or data is being migrated off of a disk destined to become a
component) it may be desirable to configure a RAID 1 set with only
a single component.
This can be achieved by configuring the set with a physically existing
component (as either the first or second component) and with a
.Sq fake
component.
In the following:
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/sd6e
/dev/sd0e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
.Pa /dev/sd0e
is the real component, and will be the second disk of a RAID 1
set.
The component
.Pa /dev/sd6e ,
which must exist, but have no physical
device associated with it, is simply used as a placeholder.
Configuration (using
.Fl C
and
.Fl I Ar 12345
as above) proceeds normally, but initialization of the RAID set will
have to wait until all physical components are present.
After configuration, this set can be used normally, but will be operating
in degraded mode.
Once a second physical component is obtained, it can be hot-added,
the existing data mirrored, and normal operation resumed.
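.Pp
As a sketch of that last step, assume the second disk attaches as
.Pa sd7
and carries an
.Sq e
partition of type RAID that is at least as large as
.Pa /dev/sd0e .
The name
.Pa /dev/sd7e
is hypothetical and is not part of the example above.
Using only options documented in this page, the new component could be
added as a hot spare and the placeholder reconstructed onto it:
.Bd -unfilled -offset indent
# raidctl -a /dev/sd7e raid0
# raidctl -F /dev/sd6e raid0
# raidctl -S raid0
.Ed
.Pp
The
.Fl S
option reports reconstruction progress; the set should be expected to
remain degraded until the reconstruction has completed.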
.Ss Maintenance of the RAID set
After the parity has been initialized for the first time, the command:
.Bd -unfilled -offset indent
# raidctl -p raid0
.Ed
.Pp
can be used to check the current status of the parity.
To check the parity and rebuild it if necessary (for example, after an
unclean shutdown) the command:
.Bd -unfilled -offset indent
# raidctl -P raid0
.Ed
.Pp
is used.
Note that re-writing the parity can be done while other operations on the
RAID set are taking place (e.g. while doing an
.Xr fsck 8
on a file system on the RAID set).
However, for maximum effectiveness of the RAID set, the parity should be
known to be correct before any data on the set is modified.
.Pp
To see how the RAID set is doing, the following command can be used to
show the RAID set's status:
.Bd -unfilled -offset indent
# raidctl -s raid0
.Ed
.Pp
The output will look something like:
.Bd -unfilled -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that all is well with the RAID set.
Of importance here are the component lines which read
.Sq optimal ,
and the
.Sq Parity status
line which indicates that the parity is up-to-date.
Note that if there are file systems open on the RAID set,
the individual components will not be
.Sq clean
but the set as a whole can still be clean.
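.Pp
For a quick check from a script, the status output can be filtered for
components that are not
.Sq optimal .
The following is a minimal sketch that assumes the output format shown
above and the component states that appear in the examples in this
document, namely
.Sq failed ,
.Sq reconstructing ,
and
.Sq spared ;
the exact wording of the output may differ between releases:
.Bd -unfilled -offset indent
# raidctl -s raid0 | grep -E ': (failed|reconstructing|spared)'
.Ed
.Pp
No output means that none of those states was reported.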
.Pp
The
.Fl v
option may also be used in conjunction with
.Fl s :
.Bd -unfilled -offset indent
# raidctl -sv raid0
.Ed
.Pp
In this case, the components' label information (see the
.Fl g
option) will be given as well:
.Bd -unfilled -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
Component label for /dev/sd1e:
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5 blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Component label for /dev/sd2e:
   Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5 blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Component label for /dev/sd3e:
   Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5 blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
To check the component label of
.Pa /dev/sd1e ,
the following is used:
.Bd -unfilled -offset indent
# raidctl -g /dev/sd1e raid0
.Ed
.Pp
The output of this command will look something like:
.Bd -unfilled -offset indent
Component label for /dev/sd1e:
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5 blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
.Ed
.Ss Dealing with Component Failures
If for some reason
(perhaps to test reconstruction) it is necessary to pretend a drive
has failed, the following will perform that function:
.Bd -unfilled -offset indent
# raidctl -f /dev/sd2e raid0
.Ed
.Pp
The system will then be performing all operations in degraded mode,
where missing data is re-computed from existing data and the parity.
In this case, obtaining the status of raid0 will return (in part):
.Bd -unfilled -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: failed
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
.Ed
.Pp
Note that with the use of
.Fl f
a reconstruction has not been started.
To both fail the disk and start a reconstruction, the
.Fl F
option must be used:
.Bd -unfilled -offset indent
# raidctl -F /dev/sd2e raid0
.Ed
.Pp
The
.Fl f
option may be used first, and then the
.Fl F
option used later, on the same disk, if desired.
Immediately after the reconstruction is started, the status will report:
.Bd -unfilled -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: reconstructing
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 10% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that a reconstruction is in progress.
To find out how the reconstruction is progressing the
.Fl S
option may be used.
This will indicate the progress in terms of the percentage of the
reconstruction that is completed.
When the reconstruction is finished the
.Fl s
option will show:
.Bd -unfilled -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: spared
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
At this point there are at least two options.
First, if
.Pa /dev/sd2e
is known to be good (i.e. the failure was either caused by
.Fl f
or
.Fl F ,
or the failed disk was replaced), then a copyback of the data can
be initiated with the
.Fl B
option.
In this example, this would copy the entire contents of
.Pa /dev/sd4e
to
.Pa /dev/sd2e .
Once the copyback procedure is complete, the
status of the device would be (in part):
.Bd -unfilled -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
.Ed
.Pp
and the system is back to normal operation.
.Pp
The second option after the reconstruction is to simply use
.Pa /dev/sd4e
in place of
.Pa /dev/sd2e
in the configuration file.
For example, the configuration file (in part) might now look like:
.Bd -unfilled -offset indent
START array
1 3 0

START disks
/dev/sd1e
/dev/sd4e
/dev/sd3e
.Ed
.Pp
This can be done as
.Pa /dev/sd4e
is completely interchangeable with
.Pa /dev/sd2e
at this point.
Note that extreme care must be taken when changing the order of the drives
in a configuration.
This is one of the few instances where the devices and/or their orderings
can be changed without loss of data!
In general, the ordering of components in a configuration file should
.Em never
be changed.
.Pp
If a component fails and there are no hot spares
available on-line, the status of the RAID set might (in part) look like:
.Bd -unfilled -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: failed
           /dev/sd3e: optimal
No spares.
.Ed
.Pp
In this case there are a number of options.
The first option is to add a hot spare using:
.Bd -unfilled -offset indent
# raidctl -a /dev/sd4e raid0
.Ed
.Pp
After the hot add, the status would then be:
.Bd -unfilled -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: failed
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
.Ed
.Pp
Reconstruction could then take place using
.Fl F
as described above.
.Pp
A second option is to rebuild directly onto
.Pa /dev/sd2e .
Once the disk containing
.Pa /dev/sd2e
has been replaced, one can simply use:
.Bd -unfilled -offset indent
# raidctl -R /dev/sd2e raid0
.Ed
.Pp
to rebuild the
.Pa /dev/sd2e
component.
As the rebuilding is in progress, the status will be:
.Bd -unfilled -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: reconstructing
           /dev/sd3e: optimal
No spares.
.Ed
.Pp
and when completed, will be:
.Bd -unfilled -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
No spares.
.Ed
.Pp
In circumstances where a particular component is completely
unavailable after a reboot, a special component name will be used to
indicate the missing component.
For example:
.Bd -unfilled -offset indent
Components:
           /dev/sd2e: optimal
          component1: failed
No spares.
.Ed
.Pp
indicates that the second component of this RAID set was not detected
at all by the auto-configuration code.
The name
.Sq component1
can be used anywhere a normal component name would be used.
For example, to add a hot spare to the above set, and rebuild to that hot
spare, the following could be done:
.Bd -unfilled -offset indent
# raidctl -a /dev/sd3e raid0
# raidctl -F component1 raid0
.Ed
.Pp
at which point the data missing from
.Sq component1
would be reconstructed onto
.Pa /dev/sd3e .
.Ss RAID on RAID
RAID sets can be layered to create more complex and much larger RAID
sets.
A RAID 0 set, for example, could be constructed from four RAID 5 sets.
The following configuration file shows such a setup:
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 4 0

START disks
/dev/raid1e
/dev/raid2e
/dev/raid3e
/dev/raid4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
128 1 1 0

START queue
fifo 100
.Ed
.Pp
A similar configuration file might be used for a RAID 0 set
constructed from components on RAID 1 sets.
In such a configuration, the mirroring provides a high degree of redundancy,
while the striping provides additional speed benefits.
.Ss Auto-configuration and Root on RAID
RAID sets can also be auto-configured at boot.
To make a set auto-configurable, simply prepare the RAID set as above,
and then run:
.Pp
.Dl # raidctl -A yes raid0
.Pp
to turn on auto-configuration for that set.
To turn off auto-configuration, use:
.Pp
.Dl # raidctl -A no raid0
.Pp
RAID sets which are auto-configurable will be configured before the
root file system is mounted.
These RAID sets are thus available for use as a root file system,
or for any other file system.
A primary advantage of using the auto-configuration is that RAID components
become more independent of the disks they reside on.
For example, SCSI IDs can change, but auto-configured sets will always be
configured correctly, even if the SCSI IDs of the component disks
have become scrambled.
.Pp
Having a system's root file system
.Pq Pa /
on a RAID set is also allowed,
with the
.Sq a
partition of such a RAID set being used for
.Pa / .
To use raid0a as the root file system, simply use:
.Bd -unfilled -offset indent
# raidctl -A root raid0
.Ed
.Pp
To return raid0 to be just an auto-configuring set simply use the
.Fl A Ar yes
arguments.
.Pp
.\" Note that kernels can only be directly read from RAID 1 components on
.\" alpha and pmax architectures.
.\" On those architectures, the
.\" .Dv FS_RAID
.\" file system is recognized by the bootblocks, and will properly load the
.\" kernel directly from a RAID 1 component.
.\" For other architectures, or
Note that kernels can't be directly read from a RAID component.
To support the root file system on RAID sets, some mechanism must be
used to get a kernel booting.
For example, a small partition containing only the secondary boot-blocks
and an alternate kernel (or two) could be used.
Once a kernel is booting, however, and an auto-configured RAID
set is found that is eligible to be root, then that RAID set will be
auto-configured and its
.Sq a
partition (aka raid[0..n]a) will be used as the root file system.
If two or more RAID sets claim to be root devices, then the user will be
prompted to select the root device.
At this time, RAID 0, 1, 4, and 5 sets are all supported as root devices.
.Pp
A typical RAID 1 setup with root on RAID might be as follows:
.Bl -enum
.It
wd0a - a small partition, which contains a complete, bootable, basic
.Ox
installation.
.It
wd1a - also contains a complete, bootable, basic
.Ox
installation.
.It
wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.
.It
wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
swap space.
.It
wd0g and wd1g - a RAID 1 set, raid2, used for
.Pa /usr ,
.Pa /home ,
or other data, if desired.
.It
wd0h and wd1h - a RAID 1 set, raid3, if desired.
.El
.Pp
RAID sets raid0, raid1, and raid2 are all marked as
auto-configurable.
raid0 is marked as being a root-able raid.
When new kernels are installed, the kernel is not only copied to
.Pa / ,
but also to wd0a and wd1a.
The kernel on wd0a is required, since that is the kernel the system
boots from.
The kernel on wd1a is also required, since that will be the kernel used
should wd0 fail.
The important point here is to have redundant copies of the kernel
available, in the event that one of the drives fails.
.Pp
There is no requirement that the root file system be on the same disk
as the kernel.
For example, obtaining the kernel from wd0a, and using
sd0e and sd1e for raid0, and the root file system, is fine.
It
.Em is
critical, however, that there be multiple kernels available, in the
event of media failure.
.Pp
Multi-layered RAID devices (such as a RAID 0 set made
up of RAID 1 sets) are
.Em not
supported as root devices or auto-configurable devices at this point.
(Multi-layered RAID devices
.Em are
supported in general, however, as mentioned earlier.)
Note that in order to enable component auto-detection and
auto-configuration of RAID devices, the line:
.Bd -unfilled -offset indent
option RAID_AUTOCONFIG
.Ed
.Pp
must be in the kernel configuration file.
See
.Xr raid 4
for more details.
.Ss Unconfiguration
The final operation performed by
.Nm
is to unconfigure a
.Xr raid 4
device.
This is accomplished via a simple:
.Pp
.Dl # raidctl -u raid0
.Pp
at which point the device is ready to be reconfigured.
.Ss Performance Tuning
Selection of the various parameter values which result in the best
performance can be quite tricky, and often requires a bit of
trial-and-error to get those values most appropriate for a given system.
A whole range of factors come into play, including:
.Bl -enum
.It
Types of components (e.g. SCSI vs. IDE) and their bandwidth
.It
Types of controller cards and their bandwidth
.It
Distribution of components among controllers
.It
IO bandwidth
.It
File system access patterns
.It
CPU speed
.El
.Pp
As with most performance tuning, benchmarking under real-life loads
may be the only way to measure expected performance.
Understanding some of the underlying technology is also useful in tuning.
The goal of this section is to provide pointers to those parameters which may
make significant differences in performance.
.Pp
For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient.
Since data in a RAID 1 set is arranged in a linear
fashion on each component, selecting an appropriate stripe size is
somewhat less critical than it is for a RAID 5 set.
However, a stripe size that is too small will cause large IOs to be
broken up into a number of smaller ones, hurting performance.
At the same time, a large stripe size may cause problems with concurrent
accesses to stripes, which may also affect performance.
Thus values in the range of 32 to 128 are often the most effective.
.Pp
Tuning RAID 5 sets is trickier.
In the best case, IO is presented to the RAID set one stripe at a time.
Since the entire stripe is available at the beginning of the IO,
the parity of that stripe can be calculated before the stripe is written,
and then the stripe data and parity can be written in parallel.
When the amount of data being written is less than a full stripe's worth, the
.Sq small write
problem occurs.
Since a
.Sq small write
means only a portion of the stripe on the components is going to
change, the data (and parity) on the components must be updated
slightly differently.
First, the
.Sq old parity
and
.Sq old data
must be read from the components.
Then the new parity is constructed, using the new data to be written,
and the old data and old parity.
Finally, the new data and new parity are written.
All this extra data shuffling results in a serious loss of performance,
and is typically 2 to 4 times slower than a full stripe write (or read).
To combat this problem in the real world, it may be useful to ensure that
stripe sizes are small enough that a
.Sq large IO
from the system will use exactly one large stripe write.
As is seen later, there are some file system dependencies which may come
into play here as well.
.Pp
Since the size of a
.Sq large IO
is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
be desirable to select a SectPerSU value of 16 blocks (8K) or 32
blocks (16K).
Since there are 4 data stripe units per stripe (the fifth holds parity),
the maximum data per stripe is 64 blocks (32K) or 128 blocks (64K).
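.Pp
The stripe arithmetic above can be checked with a short
.Xr sh 1
sketch.
The helper name
.Sq stripe_data_kb
is hypothetical and is not part of
.Nm ;
512-byte blocks are assumed, consistent with 16 blocks being 8K.
.Bd -literal -offset indent
```sh
#!/bin/sh
# Illustrative sketch only: stripe_data_kb is a hypothetical helper,
# and 512-byte blocks are an assumption (16 blocks = 8K).
# usage: stripe_data_kb SectPerSU number_of_components
stripe_data_kb() {
	sect_per_su=$1
	ncomp=$2
	# RAID 5 uses one stripe unit per stripe for parity.
	data_units=$((ncomp - 1))
	# blocks of data per stripe, converted to kilobytes
	echo $((sect_per_su * data_units * 512 / 1024))
}

# The two SectPerSU values suggested for a 5-drive RAID 5 set.
for su in 16 32; do
	echo "SectPerSU $su: $(stripe_data_kb "$su" 5)K of data per stripe"
done
```
.Ed
.Pp
For a 5-component set this reports 32K of data per stripe for a SectPerSU
of 16 and 64K for a SectPerSU of 32, so a 32K or 64K IO fills exactly one
stripe.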
Again, empirical measurement will provide the best indicators of which
values will yield better performance.
.Pp
The parameters used for the file system are also critical to good
performance.
For
.Xr newfs 8 ,
for example, increasing the block size to 32K or 64K may improve
performance dramatically.
Also, changing the cylinders-per-group parameter from 16 to 32 or higher
is often not only necessary for larger file systems, but may also have
positive performance implications.
.Ss Summary
Despite the length of this man-page, configuring a RAID set is a
relatively straightforward process.
All that needs to be done is to follow these steps:
.Bl -enum
.It
Use
.Xr disklabel 8
to create the components (of type RAID).
.It
Construct a RAID configuration file: e.g.\&
.Sq raid0.conf
.It
Configure the RAID set with:
.Bd -unfilled -offset indent
# raidctl -C raid0.conf raid0
.Ed
.Pp
.It
Initialize the component labels with:
.Bd -unfilled -offset indent
# raidctl -I 123456 raid0
.Ed
.Pp
.It
Initialize other important parts of the set with:
.Bd -unfilled -offset indent
# raidctl -i raid0
.Ed
.Pp
.It
Get the default label for the RAID set:
.Bd -unfilled -offset indent
# disklabel raid0 > /tmp/label
.Ed
.Pp
.It
Edit the label:
.Bd -unfilled -offset indent
# vi /tmp/label
.Ed
.Pp
.It
Put the new label on the RAID set:
.Bd -unfilled -offset indent
# disklabel -R -r raid0 /tmp/label
.Ed
.Pp
.It
Create the file system:
.Bd -unfilled -offset indent
# newfs /dev/rraid0e
.Ed
.Pp
.It
Mount the file system:
.Bd -unfilled -offset indent
# mount /dev/raid0e /mnt
.Ed
.Pp
.It
Use:
.Bd -unfilled -offset indent
# raidctl -c raid0.conf raid0
.Ed
.Pp
to re-configure the RAID set the next
time it is needed, or put
raid0.conf into
.Pa /etc
where it will automatically be started by the
.Pa /etc/rc
scripts.
.El
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However, the loss of two components of a RAID 4 or 5 system, or the loss
of a single component of a RAID 0 system will result in the entire
file system being lost.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been
compromised.
This includes after system crashes, or before a RAID
device has been used for the first time.
Failure to keep parity correct will be catastrophic should a component
ever fail -- it is better to use RAID 0 and get the additional space
and speed, than it is to use parity, but not keep the parity correct.
At least with RAID 0 there is no perception of increased data security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Cm raid
device special files.
.El
.Sh SEE ALSO
.Xr ccd 4 ,
.Xr raid 4 ,
.Xr rc 8
.Sh HISTORY
RAIDframe is a framework for rapid prototyping of RAID structures
developed by the folks at the Parallel Data Laboratory at Carnegie
Mellon University (CMU).
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
.Pp
The
.Nm
command first appeared as a program in CMU's RAIDframe v1.1 distribution.
This version of
.Nm
is a complete re-write, and first appeared in
.Nx 1.4
from where it was ported to
.Ox 2.5 .
.Sh BUGS
Hot-spare removal is currently not available.
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:

Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.

Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

Carnegie Mellon requests users of this software to return to

 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890

any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.

.Ed