1.\"	$MirOS: src/sbin/raidctl/raidctl.8,v 1.3 2005/11/23 16:44:05 tg Exp $
2.\"	$OpenBSD: raidctl.8,v 1.33 2005/03/12 12:21:08 jmc Exp $
3.\"     $NetBSD: raidctl.8,v 1.24 2001/07/10 01:30:52 lukem Exp $
4.\"
5.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
6.\" All rights reserved.
7.\"
8.\" This code is derived from software contributed to The NetBSD Foundation
9.\" by Greg Oster
10.\"
11.\" Redistribution and use in source and binary forms, with or without
12.\" modification, are permitted provided that the following conditions
13.\" are met:
14.\" 1. Redistributions of source code must retain the above copyright
15.\"    notice, this list of conditions and the following disclaimer.
16.\" 2. Redistributions in binary form must reproduce the above copyright
17.\"    notice, this list of conditions and the following disclaimer in the
18.\"    documentation and/or other materials provided with the distribution.
19.\" 3. All advertising materials mentioning features or use of this software
20.\"    must display the following acknowledgement:
21.\"        This product includes software developed by the NetBSD
22.\"        Foundation, Inc. and its contributors.
23.\" 4. Neither the name of The NetBSD Foundation nor the names of its
24.\"    contributors may be used to endorse or promote products derived
25.\"    from this software without specific prior written permission.
26.\"
27.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
28.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
29.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
30.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
31.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
32.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
33.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
34.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
35.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
36.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
37.\" POSSIBILITY OF SUCH DAMAGE.
38.\"
39.\"
40.\" Copyright (c) 1995 Carnegie-Mellon University.
41.\" All rights reserved.
42.\"
43.\" Author: Mark Holland
44.\"
45.\" Permission to use, copy, modify and distribute this software and
46.\" its documentation is hereby granted, provided that both the copyright
47.\" notice and this permission notice appear in all copies of the
48.\" software, derivative works or modified versions, and any portions
49.\" thereof, and that both notices appear in supporting documentation.
50.\"
51.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
52.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
53.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
54.\"
55.\" Carnegie Mellon requests users of this software to return to
56.\"
57.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
58.\"  School of Computer Science
59.\"  Carnegie Mellon University
60.\"  Pittsburgh PA 15213-3890
61.\"
62.\" any improvements or extensions that they make and grant Carnegie the
63.\" rights to redistribute these changes.
64.\"
65.Dd July 10, 2001
66.Dt RAIDCTL 8
67.Os
68.Sh NAME
69.Nm raidctl
70.Nd configuration utility for the RAIDframe disk driver
71.Sh SYNOPSIS
72.Nm raidctl
73.Bk -words
74.Op Fl v
75.Op Fl afFgrR Ar component
76.Op Fl BGipPsSu
77.Op Fl cC Ar config_file
78.Op Fl A Op yes | no | root
79.Op Fl I Ar serial_number
80.Ar dev
81.Ek
82.Sh DESCRIPTION
83.Nm
84is the user-land control program for
85.Xr raid 4 ,
86the RAIDframe disk device.
87.Nm
88is primarily used to dynamically configure and unconfigure RAIDframe disk
89devices.
90For more information about the RAIDframe disk device, see
91.Xr raid 4 .
92.Pp
93This document assumes the reader has at least rudimentary knowledge of
94RAID and RAID concepts.
95.Pp
96The device used by
97.Nm
98is specified by
99.Ar dev .
100.Ar dev
101may be either the full name of the device, e.g.\&
102.Pa /dev/rraid0c ,
or simply raid0 (for
104.Pa /dev/rraid0c ) .
105.Pp
106For several commands
107.Pq Fl BGipPsSu ,
108.Nm
109can accept the word
110.Ic all
111as the
112.Ar dev
113argument.
114If
115.Ic all
116is used,
117.Nm
118will execute the requested action for all the configured
119.Xr raid 4
120devices.
121.Pp
122The command-line options for
123.Nm
124are as follows:
125.Bl -tag -width indent
126.It Fl a Ar component Ar dev
127Add
128.Ar component
129as a hot spare for the device
130.Ar dev .
131.It Fl A Ic yes Ar dev
132Make the RAID set auto-configurable.
133The RAID set will be automatically configured at boot
134.Em before
135the root file system is
136mounted.
137Note that all components of the set must be of type RAID in the disklabel.
138.It Fl A Ic no Ar dev
139Turn off auto-configuration for the RAID set.
140.It Fl A Ic root Ar dev
141Make the RAID set auto-configurable, and also mark the set as being
142eligible to contain the root partition.
143A RAID set configured this way will
144.Em override
145the use of the boot disk as the root device.
146In
147.Mx
148however, this only takes place if the kernel is configured
149with "config bsd generic" or "root on raid0".
150All components of the set must be of type RAID in the disklabel.
151Note that the kernel being booted must currently reside on a non-RAID set and,
152in order to have the root file system correctly mounted from it,
153the RAID set must have its
154.Sq a
155partition (aka raid[0..n]a) set up.
156.It Fl B Ar dev
157Initiate a copyback of reconstructed data from a spare disk to
158its original disk.
159This is performed after a component has failed,
160and the failed drive has been reconstructed onto a spare drive.
161.It Fl c Ar config_file Ar dev
162Configure the RAIDframe device
163.Ar dev
164according to the configuration given in
165.Ar config_file .
166A description of the contents of
167.Ar config_file
168is given later.
169.It Fl C Ar config_file Ar dev
170As for
171.Fl c ,
172but forces the configuration to take place.
173This is required the first time a RAID set is configured.
174.It Fl f Ar component Ar dev
175This marks the specified
176.Ar component
177as having failed, but does not initiate a reconstruction of that
178component.
179.It Fl F Ar component Ar dev
180Fails the specified
181.Ar component
of the device, and immediately begins a reconstruction of the failed
183disk onto an available hot spare.
184This is one of the mechanisms used to start the reconstruction process
185if a component does have a hardware failure.
186.It Fl g Ar component Ar dev
187Get the component label for the specified component.
188.It Fl G Ar dev
189Generate the configuration of the RAIDframe device in a format suitable for
190use with
191.Nm
192.Fl c
193or
194.Fl C .
195.It Fl i Ar dev
196Initialize the RAID device.
In particular, (re-)write the parity on the selected device.
198This
199.Em MUST
200be done for
201.Em all
202RAID sets before the RAID device is labeled and before
203file systems are created on the RAID device.
204.It Fl I Ar serial_number Ar dev
205Initialize the component labels on each component of the device.
206.Ar serial_number
207is used as one of the keys in determining whether a
208particular set of components belong to the same RAID set.
209While not strictly enforced, different serial numbers should be used for
210different RAID sets.
211This step
212.Em MUST
213be performed when a new RAID set is created.
214.It Fl p Ar dev
215Check the status of the parity on the RAID set.
216Displays a status message, and returns successfully if the parity
217is up-to-date.
218.It Fl P Ar dev
219Check the status of the parity on the RAID set, and initialize
220(re-write) the parity if the parity is not known to be up-to-date.
221This is normally used after a system crash (and before a
222.Xr fsck 8 )
223to ensure the integrity of the parity.
224.It Fl r Ar component Ar dev
225Remove the spare disk specified by
226.Ar component
227from the set of available spare components.
228.It Fl R Ar component Ar dev
229Fails the specified
230.Ar component ,
231if necessary, and immediately begins a reconstruction back to
232.Ar component .
233This is useful for reconstructing back onto a component after
234it has been replaced following a failure.
235.It Fl s Ar dev
236Display the status of the RAIDframe device for each of the components
237and spares.
238.It Fl S Ar dev
239Check the status of parity re-writing, component reconstruction, and
240component copyback.
241The output indicates the amount of progress achieved in each of these areas.
242.It Fl u Ar dev
243Unconfigure the RAIDframe device.
244.It Fl v
245Be more verbose.
246For operations such as reconstructions, parity re-writing,
247and copybacks, provide a progress indicator.
248.El
249.Ss Configuration file
250The format of the configuration file is complex, and
251only an abbreviated treatment is given here.
252In the configuration files, a
253.Sq #
254indicates the beginning of a comment.
255.Pp
256There are 4 required sections of a configuration file, and 2
257optional sections.
258Each section begins with a
259.Sq START ,
260followed by
261the section name, and the configuration parameters associated with that
262section.
263The first section is the
264.Sq array
265section, and it specifies
266the number of rows, columns, and spare disks in the RAID set.
267For example:
268.Bd -unfilled -offset indent
269START array
2701 3 0
271.Ed
272.Pp
273indicates an array with 1 row, 3 columns, and 0 spare disks.
274Note that although multi-dimensional arrays may be specified, they are
275.Em NOT
276supported in the driver.
277.Pp
278The second section, the
279.Sq disks
280section, specifies the actual
281components of the device.
282For example:
283.Bd -unfilled -offset indent
284START disks
285/dev/sd0e
286/dev/sd1e
287/dev/sd2e
288.Ed
289.Pp
290specifies the three component disks to be used in the RAID device.
291If any of the specified drives cannot be found when the RAID device is
292configured, then they will be marked as
293.Sq failed ,
294and the system will
295operate in degraded mode.
296Note that it is
297.Em imperative
298that the order of the components in the configuration file does not
299change between configurations of a RAID device.
300Changing the order of the components will result in data loss if the set
301is configured with the
302.Fl C
303option.
304In normal circumstances, the RAID set will not configure if only
305.Fl c
306is specified, and the components are out-of-order.
307.Pp
308The next section, which is the
309.Sq spare
310section, is optional, and, if
311present, specifies the devices to be used as
312.Sq hot spares
313-- devices
314which are on-line, but are not actively used by the RAID driver unless
one of the main components fails.
316A simple
317.Sq spare
318section might be:
319.Bd -unfilled -offset indent
320START spare
321/dev/sd3e
322.Ed
323.Pp
324for a configuration with a single spare component.
325If no spare drives are to be used in the configuration, then the
326.Sq spare
327section may be omitted.
328.Pp
329The next section is the
330.Sq layout
331section.
332This section describes the general layout parameters for the RAID device,
333and provides such information as sectors per stripe unit,
334stripe units per parity unit, stripe units per reconstruction unit,
335and the parity configuration to use.
336This section might look like:
337.Bd -unfilled -offset indent
338START layout
339# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
34032 1 1 5
341.Ed
342.Pp
343The sectors per stripe unit specifies, in blocks, the interleave
344factor; i.e. the number of contiguous sectors to be written to each
345component for a single stripe.
346Appropriate selection of this value (32 in this example) is the subject
347of much research in RAID architectures.
348The stripe units per parity unit and stripe units per reconstruction unit
349are normally each set to 1.
350While certain values above 1 are permitted, a discussion of valid
values and the consequences of using anything other than 1 is outside
352the scope of this document.
353The last value in this section (5 in this example) indicates the
354parity configuration desired.
355Valid entries include:
356.Bl -tag -width inde
357.It 0
358RAID level 0.
359No parity, only simple striping.
360.It 1
361RAID level 1.
362Mirroring.
363The parity is the mirror.
364.It 4
365RAID level 4.
366Striping across components, with parity stored on the last component.
367.It 5
368RAID level 5.
369Striping across components, parity distributed across all components.
370.El
371.Pp
372There are other valid entries here, including those for Even-Odd
373parity, RAID level 5 with rotated sparing, Chained declustering,
374and Interleaved declustering, but as of this writing the code for
375those parity operations has not been tested with
376.Ox .
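.Pp
The arithmetic behind the layout line can be sketched as follows.
This is only an illustration, not code taken from the driver: the
function names are hypothetical, a 512-byte sector is assumed, and
only the four RAID levels listed above are covered.
.Bd -literal -offset indent
```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors


def stripe_unit_bytes(sect_per_su):
    """Bytes written to one component before moving to the next."""
    return sect_per_su * SECTOR_BYTES


def data_columns(level, num_col):
    """Number of columns that hold data rather than redundancy."""
    if level == 0:
        return num_col
    if level == 1:
        return 1            # the other column is the mirror
    if level in (4, 5):
        return num_col - 1  # one column's worth of parity
    raise ValueError("level not covered by this sketch")


def full_stripe_bytes(sect_per_su, level, num_col):
    """Bytes of user data held by one complete stripe."""
    return stripe_unit_bytes(sect_per_su) * data_columns(level, num_col)


print(stripe_unit_bytes(32))        # 16384
print(full_stripe_bytes(32, 5, 3))  # 32768
```
.Ed
.Pp
With the example layout, each stripe unit is 16 KB, and a three-column
RAID 5 set holds 32 KB of data per full stripe.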
377.Pp
378The next required section is the
379.Sq queue
380section.
381This is most often specified as:
382.Bd -unfilled -offset indent
383START queue
384fifo 100
385.Ed
386.Pp
387where the queuing method is specified as FIFO (First-In, First-Out),
388and the size of the per-component queue is limited to 100 requests.
389Other queuing methods may also be specified, but a discussion of them
390is beyond the scope of this document.
391.Pp
392The final section, the
393.Sq debug
394section, is optional.
395For more details on this the reader is referred to the RAIDframe
396documentation discussed in the
397.Sx HISTORY
398section.
399See
400.Sx EXAMPLES
401for a more complete configuration file example.
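.Pp
The section structure described above can be checked before a file is
handed to
.Nm .
The following is a minimal sketch, not the parser used by
.Nm
itself: the function name is hypothetical, and it only splits the file
into sections, strips comments, and reports which of the four required
sections are missing.
.Bd -literal -offset indent
```python
def parse_raid_config(text):
    """Split a RAIDframe-style config into {section name: [data lines]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # a '#' starts a comment
        if not line:
            continue
        words = line.split()
        if words[0] == "START":
            if len(words) != 2:
                raise ValueError("START needs exactly one section name")
            current = words[1]
            sections[current] = []
        elif current is None:
            raise ValueError("data before the first START line")
        else:
            sections[current].append(line)
    return sections


REQUIRED = ("array", "disks", "layout", "queue")

example = """
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e

START layout
32 1 1 5

START queue
fifo 100
"""

config = parse_raid_config(example)
missing = [name for name in REQUIRED if name not in config]
print(config["disks"], missing)
```
.Ed
.Pp
An empty list of missing sections only shows that the file is
structurally plausible; it says nothing about whether the devices
exist or are listed in the right order.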
402.Sh EXAMPLES
It is highly recommended that, before using the RAID driver for real
file systems, the system administrator(s) become quite familiar
405with the use of
406.Nm raidctl ,
407and that they understand how the component reconstruction process
408works.
409The examples in this section will focus on configuring a
410number of different RAID sets of varying degrees of redundancy.
411By working through these examples, administrators should be able to
412develop a good feel for how to configure a RAID set, and how to
413initiate reconstruction of failed components.
414.Pp
415In the following examples
416.Sq raid0
417will be used to denote the RAID device.
418.Sq Pa /dev/rraid0c
419may be used in place of
420.Sq raid0 .
421.Ss Initialization and Configuration
422The initial step in configuring a RAID set is to identify the components
423that will be used in the RAID set.
424All components should be the same size.
425Each component should have a disklabel type of
426.Dv FS_RAID ,
427and a typical disklabel entry for a RAID component might look like:
428.Bd -unfilled -offset indent
429f:  1800000  200495     RAID              # (Cyl.  405*- 4041*)
430.Ed
431.Pp
432While
433.Dv FS_BSDFFS
434(e.g. 4.2BSD) will also work as the component type, the type
435.Dv FS_RAID
436(e.g. RAID) is preferred for RAIDframe use, as it is required for
437features such as auto-configuration.
438As part of the initial configuration of each RAID set, each component
439will be given a
440.Sq component label .
441A
442.Sq component label
443contains important information about the component, including a
444user-specified serial number, the row and column of that component in
445the RAID set, the redundancy level of the RAID set, a 'modification
446counter', and whether the parity information (if any) on that
447component is known to be correct.
448Component labels are an integral part of the RAID set, since they are used
449to ensure that components are configured in the correct order, and used
450to keep track of other vital information about the RAID set.
451Component labels are also required for the auto-detection and
452auto-configuration of RAID sets at boot time.
453For a component label to be considered valid, that particular component label
454must be in agreement with the other component labels in the set.
455For example, the serial number,
456.Sq modification counter ,
457number of rows and number of columns must all
458be in agreement.
459If any of these are different, then the component is not considered to be
460part of the set.
461See
462.Xr raid 4
463for more information about component labels.
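.Pp
The agreement rule can be illustrated with a short sketch.
The dictionary keys and function names below are hypothetical, and the
check performed by the driver is more involved; this only compares the
four fields named above against the first label in the set.
.Bd -literal -offset indent
```python
KEYS = ("serial", "mod_counter", "num_rows", "num_cols")


def labels_agree(a, b):
    """True if two labels match on every field that must agree."""
    return all(a[k] == b[k] for k in KEYS)


def odd_ones_out(labels):
    """Names of components whose label disagrees with the first label."""
    names = sorted(labels)
    reference = labels[names[0]]
    return [n for n in names if not labels_agree(labels[n], reference)]


labels = {
    "/dev/sd1e": dict(serial=13432, mod_counter=65, num_rows=1, num_cols=3),
    "/dev/sd2e": dict(serial=13432, mod_counter=65, num_rows=1, num_cols=3),
    "/dev/sd3e": dict(serial=13432, mod_counter=64, num_rows=1, num_cols=3),
}
print(odd_ones_out(labels))
```
.Ed
.Pp
Here the third component carries a stale modification counter, so it
would not be treated as part of the set.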
464.Pp
465Once the components have been identified, and the disks have
466appropriate labels,
467.Nm
468is then used to configure the
469.Xr raid 4
470device.
471To configure the device, a configuration file which looks something like:
472.Bd -unfilled -offset indent
473START array
474# numRow numCol numSpare
4751 3 1
476
477START disks
478/dev/sd1e
479/dev/sd2e
480/dev/sd3e
481
482START spare
483/dev/sd4e
484
485START layout
486# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
48732 1 1 5
488
489START queue
490fifo 100
491.Ed
492.Pp
is created.
494The above configuration file specifies a RAID 5 set consisting of
495the components
496.Pa /dev/sd1e , /dev/sd2e ,
497and
498.Pa /dev/sd3e ,
499with
500.Pa /dev/sd4e
501available as a
502.Sq hot spare
503in case one of
504the three main drives should fail.
505A RAID 0 set would be specified in a similar way:
506.Bd -unfilled -offset indent
507START array
508# numRow numCol numSpare
5091 4 0
510
511START disks
512/dev/sd10e
513/dev/sd11e
514/dev/sd12e
515/dev/sd13e
516
517START layout
518# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
51964 1 1 0
520
521START queue
522fifo 100
523.Ed
524.Pp
525In this case, devices
526.Pa /dev/sd10e , /dev/sd11e , /dev/sd12e ,
527and
528.Pa /dev/sd13e
529are the components that make up this RAID set.
530Note that there are no hot spares for a RAID 0 set, since there is no way
531to recover data if any of the components fail.
532.Pp
533For a RAID 1 (mirror) set, the following configuration might be used:
534.Bd -unfilled -offset indent
535START array
536# numRow numCol numSpare
5371 2 0
538
539START disks
540/dev/sd20e
541/dev/sd21e
542
543START layout
544# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
545128 1 1 1
546
547START queue
548fifo 100
549.Ed
550.Pp
551In this case,
552.Pa /dev/sd20e
553and
554.Pa /dev/sd21e
555are the two components of the
556mirror set.
557While no hot spares have been specified in this configuration,
558they easily could be, just as they were specified in the RAID 5 case above.
559Note as well that RAID 1 sets are currently limited to only 2 components.
560At present, n-way mirroring is not possible.
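.Pp
Because the three configurations above differ only in their disks,
spares, stripe unit size, and RAID level, they can be produced from a
small template.
The helper below is a hypothetical sketch that assumes a single row
and the
.Sq fifo 100
queue used throughout these examples.
.Bd -literal -offset indent
```python
def render_config(disks, level, sect_per_su, spares=()):
    """Return the lines of a single-row RAIDframe-style config file."""
    lines = ["START array", "# numRow numCol numSpare",
             "1 %d %d" % (len(disks), len(spares)), ""]
    lines += ["START disks"] + list(disks) + [""]
    if spares:
        lines += ["START spare"] + list(spares) + [""]
    lines += ["START layout",
              "# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level",
              "%d 1 1 %d" % (sect_per_su, level), ""]
    lines += ["START queue", "fifo 100"]
    return lines


raid5 = render_config(["/dev/sd1e", "/dev/sd2e", "/dev/sd3e"],
                      level=5, sect_per_su=32, spares=["/dev/sd4e"])
for line in raid5:
    print(line)
```
.Ed
.Pp
The order of the list passed as the disks becomes the order of the
components, so it should be kept stable between runs.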
561.Pp
562The first time a RAID set is configured, the
563.Fl C
564option must be used:
565.Bd -unfilled -offset indent
566# raidctl -C raid0.conf raid0
567.Ed
568.Pp
569where
570.Sq raid0.conf
571is the name of the RAID configuration file.
572The
573.Fl C
574forces the configuration to succeed, even if any of the component
575labels are incorrect.
576The
577.Fl C
578option should not be used lightly in
situations other than initial configurations, because if
580the system is refusing to configure a RAID set, there is probably a
581very good reason for it.
582After the initial configuration is done (and appropriate component labels
583are added with the
584.Fl I
585option) then raid0 can be configured normally with:
586.Bd -unfilled -offset indent
587# raidctl -c raid0.conf raid0
588.Ed
589.Pp
590When the RAID set is configured for the first time, it is
591necessary to initialize the component labels, and to initialize the
592parity on the RAID set.
593Initializing the component labels is done with:
594.Bd -unfilled -offset indent
595# raidctl -I 112341 raid0
596.Ed
597.Pp
598where
599.Sq 112341
600is a user-specified serial number for the RAID set.
601This initialization step is
602.Em required
603for all RAID sets.
604Also, using different serial numbers between RAID sets is
605.Em strongly encouraged ,
606as using the same serial number for all RAID sets will only serve to
607decrease the usefulness of the component label checking.
608.Pp
609Initializing the RAID set is done via the
610.Fl i
611option.
612This initialization
613.Em MUST
614be done for
615.Em all
616RAID sets, since among other things it verifies that the parity (if
617any) on the RAID set is correct.
618Since this initialization may be quite time-consuming, the
619.Fl v
option may also be used in conjunction with
621.Fl i :
622.Bd -unfilled -offset indent
623# raidctl -iv raid0
624.Ed
625.Pp
626This will give more verbose output on the
627status of the initialization:
628.Bd -unfilled -offset indent
629Initiating re-write of parity
630Parity Re-write status:
631 10% |****                                   | ETA:    06:03 /
632.Ed
633.Pp
634The output provides a
635.Sq Percent Complete
636in both a numeric and graphical format, as well as an estimated time
637to completion of the operation.
638.Pp
639Since it is the parity that provides the
640.Sq redundancy
641part of RAID, it is critical that the parity is correct
as much of the time as possible.
643If the parity is not correct, then there is no guarantee that data will not
644be lost if a component fails.
645.Pp
646Once the parity is known to be correct, it is then safe to perform
647.Xr disklabel 8 ,
648.Xr newfs 8 ,
649or
650.Xr fsck 8
651on the device or its filesystems, and then to mount the filesystems
652for use.
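.Pp
The first-time sequence described above can be summarized as an
ordered list of commands.
The sketch below only builds and prints the command lines; it does not
run anything, and the function name is hypothetical.
.Bd -literal -offset indent
```python
def first_time_commands(config_file, serial, dev="raid0"):
    """Ordered argv lists for bringing up a brand-new RAID set."""
    return [
        ["raidctl", "-C", config_file, dev],  # force the first configuration
        ["raidctl", "-I", str(serial), dev],  # write the component labels
        ["raidctl", "-iv", dev],              # initialize parity, verbosely
    ]


commands = first_time_commands("raid0.conf", 112341)
for argv in commands:
    print("# " + " ".join(argv))
```
.Ed
.Pp
Only after the last command has completed should the device be
labeled and file systems created on it.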
653.Pp
654Under certain circumstances (e.g. the additional component has not
655arrived, or data is being migrated off of a disk destined to become a
656component) it may be desirable to configure a RAID 1 set with only
657a single component.
658This can be achieved by configuring the set with a physically existing
659component (as either the first or second component) and with a
660.Sq fake
661component.
662In the following:
663.Bd -unfilled -offset indent
664START array
665# numRow numCol numSpare
6661 2 0
667
668START disks
669/dev/sd6e
670/dev/sd0e
671
672START layout
673# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
674128 1 1 1
675
676START queue
677fifo 100
678.Ed
679.Pp
680.Pa /dev/sd0e
681is the real component, and will be the second disk of a RAID 1
682set.
683The component
684.Pa /dev/sd6e ,
685which must exist, but have no physical
686device associated with it, is simply used as a placeholder.
687Configuration (using
688.Fl C
689and
690.Fl I Ar 12345
691as above) proceeds normally, but initialization of the RAID set will
692have to wait until all physical components are present.
693After configuration, this set can be used normally, but will be operating
694in degraded mode.
695Once a second physical component is obtained, it can be hot-added,
696the existing data mirrored, and normal operation resumed.
697.Ss Maintenance of the RAID set
698After the parity has been initialized for the first time, the command:
699.Bd -unfilled -offset indent
700# raidctl -p raid0
701.Ed
702.Pp
703can be used to check the current status of the parity.
To check the parity and rebuild it if necessary (for example, after an unclean
705shutdown) the command:
706.Bd -unfilled -offset indent
707# raidctl -P raid0
708.Ed
709.Pp
710is used.
711Note that re-writing the parity can be done while other operations on the
712RAID set are taking place (e.g. while doing an
713.Xr fsck 8
714on a file system on the RAID set).
However, for maximum effectiveness of the RAID set, the parity should be
716known to be correct before any data on the set is modified.
717.Pp
718To see how the RAID set is doing, the following command can be used to
719show the RAID set's status:
720.Bd -unfilled -offset indent
721# raidctl -s raid0
722.Ed
723.Pp
724The output will look something like:
725.Bd -unfilled -offset indent
726Components:
727           /dev/sd1e: optimal
728           /dev/sd2e: optimal
729           /dev/sd3e: optimal
730Spares:
731           /dev/sd4e: spare
732Parity status: clean
733Reconstruction is 100% complete.
734Parity Re-write is 100% complete.
735Copyback is 100% complete.
736.Ed
737.Pp
738This indicates that all is well with the RAID set.
739Of importance here are the component lines which read
740.Sq optimal ,
741and the
742.Sq Parity status
743line which indicates that the parity is up-to-date.
744Note that if there are file systems open on the RAID set,
745the individual components will not be
746.Sq clean
747but the set as a whole can still be clean.
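.Pp
For monitoring scripts, the status output can be reduced to a few
values.
The parser below is a sketch that assumes the output keeps the layout
shown above; the function name is hypothetical, and the format is not
guaranteed to be stable across releases.
.Bd -literal -offset indent
```python
def parse_status(text):
    """Return ({component: state}, {spare: state}, parity status or None)."""
    components, spares, parity = {}, {}, None
    target = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Components:":
            target = components
        elif line == "Spares:":
            target = spares
        elif line.startswith("Parity status:"):
            parity = line.split(":", 1)[1].strip()
            target = None
        elif target is not None and ": " in line:
            name, state = line.rsplit(": ", 1)
            target[name] = state
    return components, spares, parity


sample = """
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
Parity status: clean
Reconstruction is 100% complete.
"""

components, spares, parity = parse_status(sample)
degraded = sorted(n for n, s in components.items() if s != "optimal")
print(degraded, sorted(spares), parity)
```
.Ed
.Pp
An empty list of degraded components together with a parity status of
.Sq clean
corresponds to the healthy case described above.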
748.Pp
749The
750.Fl v
option may also be used in conjunction with
752.Fl s :
753.Bd -unfilled -offset indent
754# raidctl -sv raid0
755.Ed
756.Pp
757In this case, the components' label information (see the
758.Fl g
759option) will be given as well:
760.Bd -unfilled -offset indent
761Components:
762           /dev/sd1e: optimal
763           /dev/sd2e: optimal
764           /dev/sd3e: optimal
765Spares:
766           /dev/sd4e: spare
767Component label for /dev/sd1e:
768   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
769   Version: 2 Serial Number: 13432 Mod Counter: 65
770   Clean: No Status: 0
771   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
772   RAID Level: 5  blocksize: 512 numBlocks: 1799936
773   Autoconfig: No
774   Last configured as: raid0
775Component label for /dev/sd2e:
776   Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
777   Version: 2 Serial Number: 13432 Mod Counter: 65
778   Clean: No Status: 0
779   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
780   RAID Level: 5  blocksize: 512 numBlocks: 1799936
781   Autoconfig: No
782   Last configured as: raid0
783Component label for /dev/sd3e:
784   Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
785   Version: 2 Serial Number: 13432 Mod Counter: 65
786   Clean: No Status: 0
787   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
788   RAID Level: 5  blocksize: 512 numBlocks: 1799936
789   Autoconfig: No
790   Last configured as: raid0
791Parity status: clean
792Reconstruction is 100% complete.
793Parity Re-write is 100% complete.
794Copyback is 100% complete.
795.Ed
796.Pp
797To check the component label of /dev/sd1e, the following is used:
798.Bd -unfilled -offset indent
799# raidctl -g /dev/sd1e raid0
800.Ed
801.Pp
802The output of this command will look something like:
803.Bd -unfilled -offset indent
804Component label for /dev/sd1e:
805   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
806   Version: 2 Serial Number: 13432 Mod Counter: 65
807   Clean: No Status: 0
808   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
809   RAID Level: 5  blocksize: 512 numBlocks: 1799936
810   Autoconfig: No
811   Last configured as: raid0
812.Ed
813.Ss Dealing with Component Failures
814If for some reason
815(perhaps to test reconstruction) it is necessary to pretend a drive
816has failed, the following will perform that function:
817.Bd -unfilled -offset indent
818# raidctl -f /dev/sd2e raid0
819.Ed
820.Pp
821The system will then be performing all operations in degraded mode,
822where missing data is re-computed from existing data and the parity.
823In this case, obtaining the status of raid0 will return (in part):
824.Bd -unfilled -offset indent
825Components:
826           /dev/sd1e: optimal
827           /dev/sd2e: failed
828           /dev/sd3e: optimal
829Spares:
830           /dev/sd4e: spare
831.Ed
832.Pp
833Note that with the use of
834.Fl f
835a reconstruction has not been started.
836To both fail the disk and start a reconstruction, the
837.Fl F
838option must be used:
839.Bd -unfilled -offset indent
840# raidctl -F /dev/sd2e raid0
841.Ed
842.Pp
843The
844.Fl f
845option may be used first, and then the
846.Fl F
847option used later, on the same disk, if desired.
848Immediately after the reconstruction is started, the status will report:
849.Bd -unfilled -offset indent
850Components:
851           /dev/sd1e: optimal
852           /dev/sd2e: reconstructing
853           /dev/sd3e: optimal
854Spares:
855           /dev/sd4e: used_spare
856[...]
857Parity status: clean
858Reconstruction is 10% complete.
859Parity Re-write is 100% complete.
860Copyback is 100% complete.
861.Ed
862.Pp
863This indicates that a reconstruction is in progress.
864To find out how the reconstruction is progressing the
865.Fl S
866option may be used.
867This will indicate the progress in terms of the percentage of the
868reconstruction that is completed.
869When the reconstruction is finished the
870.Fl s
871option will show:
872.Bd -unfilled -offset indent
873Components:
874           /dev/sd1e: optimal
875           /dev/sd2e: spared
876           /dev/sd3e: optimal
877Spares:
878           /dev/sd4e: used_spare
879[...]
880Parity status: clean
881Reconstruction is 100% complete.
882Parity Re-write is 100% complete.
883Copyback is 100% complete.
884.Ed
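.Pp
The progress lines at the end of the status output can be extracted
in the same spirit.
This sketch assumes the exact wording shown in the examples above, and
the names used are hypothetical.
.Bd -literal -offset indent
```python
import re

PROGRESS = re.compile(
    "^(Reconstruction|Parity Re-write|Copyback) is ([0-9]+)% complete")


def progress(text):
    """Map each operation name to its reported percentage."""
    result = {}
    for line in text.splitlines():
        match = PROGRESS.match(line.strip())
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


sample = """
Parity status: clean
Reconstruction is 10% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
"""

report = progress(sample)
busy = [name for name, pct in report.items() if pct < 100]
print(busy)
```
.Ed
.Pp
Any operation reported below 100% is still running, so a script could
poll until the list of busy operations is empty.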
885.Pp
886At this point there are at least two options.
887First, if
888.Pa /dev/sd2e
889is known to be good (i.e. the failure was either caused by
890.Fl f
891or
892.Fl F ,
893or the failed disk was replaced), then a copyback of the data can
894be initiated with the
895.Fl B
896option.
897In this example, this would copy the entire contents of
898.Pa /dev/sd4e
899to
900.Pa /dev/sd2e .
901Once the copyback procedure is complete, the
902status of the device would be (in part):
903.Bd -unfilled -offset indent
904Components:
905           /dev/sd1e: optimal
906           /dev/sd2e: optimal
907           /dev/sd3e: optimal
908Spares:
909           /dev/sd4e: spare
910.Ed
911.Pp
912and the system is back to normal operation.
913.Pp
914The second option after the reconstruction is to simply use
915.Pa /dev/sd4e
916in place of
917.Pa /dev/sd2e
918in the configuration file.
919For example, the configuration file (in part) might now look like:
920.Bd -unfilled -offset indent
921START array
9221 3 0
923
START disks
925/dev/sd1e
926/dev/sd4e
927/dev/sd3e
928.Ed
929.Pp
930This can be done as
931.Pa /dev/sd4e
932is completely interchangeable with
933.Pa /dev/sd2e
934at this point.
935Note that extreme care must be taken when changing the order of the drives
936in a configuration.
937This is one of the few instances where the devices and/or their orderings
938can be changed without loss of data!
939In general, the ordering of components in a configuration file should
940.Em never
941be changed.
942.Pp
943If a component fails and there are no hot spares
944available on-line, the status of the RAID set might (in part) look like:
945.Bd -unfilled -offset indent
946Components:
947           /dev/sd1e: optimal
948           /dev/sd2e: failed
949           /dev/sd3e: optimal
950No spares.
951.Ed
952.Pp
953In this case there are a number of options.
954The first option is to add a hot spare using:
955.Bd -unfilled -offset indent
956# raidctl -a /dev/sd4e raid0
957.Ed
958.Pp
959After the hot add, the status would then be:
960.Bd -unfilled -offset indent
961Components:
962           /dev/sd1e: optimal
963           /dev/sd2e: failed
964           /dev/sd3e: optimal
965Spares:
966           /dev/sd4e: spare
967.Ed
968.Pp
969Reconstruction could then take place using
970.Fl F
as described above.
972.Pp
973A second option is to rebuild directly onto
974.Pa /dev/sd2e .
975Once the disk containing
976.Pa /dev/sd2e
977has been replaced, one can simply use:
978.Bd -unfilled -offset indent
979# raidctl -R /dev/sd2e raid0
980.Ed
981.Pp
982to rebuild the
983.Pa /dev/sd2e
984component.
985As the rebuilding is in progress, the status will be:
986.Bd -unfilled -offset indent
987Components:
988           /dev/sd1e: optimal
989           /dev/sd2e: reconstructing
990           /dev/sd3e: optimal
991No spares.
992.Ed
993.Pp
994and when completed, will be:
995.Bd -unfilled -offset indent
996Components:
997           /dev/sd1e: optimal
998           /dev/sd2e: optimal
999           /dev/sd3e: optimal
1000No spares.
1001.Ed
1002.Pp
1003In circumstances where a particular component is completely
1004unavailable after a reboot, a special component name will be used to
1005indicate the missing component.
1006For example:
1007.Bd -unfilled -offset indent
1008Components:
1009           /dev/sd2e: optimal
1010          component1: failed
1011No spares.
1012.Ed
1013.Pp
1014indicates that the second component of this RAID set was not detected
1015at all by the auto-configuration code.
1016The name
1017.Sq component1
1018can be used anywhere a normal component name would be used.
1019For example, to add a hot spare to the above set, and rebuild to that hot
1020spare, the following could be done:
1021.Bd -unfilled -offset indent
1022# raidctl -a /dev/sd3e raid0
1023# raidctl -F component1 raid0
1024.Ed
1025.Pp
1026at which point the data missing from
1027.Sq component1
1028would be reconstructed onto
1029.Pa /dev/sd3e .
1030.Ss RAID on RAID
1031RAID sets can be layered to create more complex and much larger RAID
1032sets.
1033A RAID 0 set, for example, could be constructed from four RAID 5 sets.
1034The following configuration file shows such a setup:
1035.Bd -unfilled -offset indent
1036START array
1037# numRow numCol numSpare
10381 4 0
1039
1040START disks
1041/dev/raid1e
1042/dev/raid2e
1043/dev/raid3e
1044/dev/raid4e
1045
1046START layout
1047# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
1048128 1 1 0
1049
1050START queue
1051fifo 100
1052.Ed
1053.Pp
1054A similar configuration file might be used for a RAID 0 set
1055constructed from components on RAID 1 sets.
1056In such a configuration, the mirroring provides a high degree of redundancy,
1057while the striping provides additional speed benefits.
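.Pp
As an illustration only, one of the underlying RAID 1 sets in such a
layered setup might be described by a file like the one below.
The component names
.Pa /dev/sd0e
and
.Pa /dev/sd1e
are assumptions made for this sketch, not part of the example above:
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/sd0e
/dev/sd1e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
The lower-level sets would need to be configured before the RAID 0 set
that is built on top of them, since their
.Pa /dev/raid*
devices serve as its components.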
1058.Ss Auto-configuration and Root on RAID
1059RAID sets can also be auto-configured at boot.
To make a set auto-configurable, prepare the RAID set as above,
and then run:
1062.Pp
1063.Dl # raidctl -A yes raid0
1064.Pp
1065to turn on auto-configuration for that set.
1066To turn off auto-configuration, use:
1067.Pp
1068.Dl # raidctl -A no raid0
1069.Pp
1070RAID sets which are auto-configurable will be configured before the
1071root file system is mounted.
1072These RAID sets are thus available for use as a root file system,
1073or for any other file system.
1074A primary advantage of using the auto-configuration is that RAID components
1075become more independent of the disks they reside on.
For example, SCSI IDs can change, but auto-configured sets will always be
configured correctly, even if the SCSI IDs of the component disks
have become scrambled.
1079.Pp
1080Having a system's root file system
1081.Pq Pa /
1082on a RAID set is also allowed,
1083with the
1084.Sq a
1085partition of such a RAID set being used for
1086.Pa / .
1087To use raid0a as the root file system, simply use:
1088.Bd -unfilled -offset indent
1089# raidctl -A root raid0
1090.Ed
1091.Pp
To return raid0 to being just an auto-configuring set, simply use the
1093.Fl A Ar yes
1094arguments.
1095.Pp
1096.\" Note that kernels can only be directly read from RAID 1 components on
1097.\" alpha and pmax architectures.
1098.\" On those architectures, the
1099.\" .Dv FS_RAID
1100.\" file system is recognized by the bootblocks, and will properly load the
1101.\" kernel directly from a RAID 1 component.
1102.\" For other architectures, or
1103Note that kernels can't be directly read from a RAID component.
1104To support the root file system on RAID sets, some mechanism must be
1105used to get a kernel booting.
1106For example, a small partition containing only the secondary boot-blocks
1107and an alternate kernel (or two) could be used.
1108Once a kernel is booting however, and an auto-configured RAID
1109set is found that is eligible to be root, then that RAID set will be
1110auto-configured and its
1111.Sq a
1112partition (aka raid[0..n]a) will be used as the root file system.
1113If two or more RAID sets claim to be root devices, then the user will be
1114prompted to select the root device.
1115At this time, RAID 0, 1, 4, and 5 sets are all supported as root devices.
1116.Pp
1117A typical RAID 1 setup with root on RAID might be as follows:
1118.Bl -enum
1119.It
1120wd0a - a small partition, which contains a complete, bootable, basic
1121.Ox
1122installation.
1123.It
1124wd1a - also contains a complete, bootable, basic
1125.Ox
1126installation.
1127.It
1128wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.
1129.It
1130wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
1131swap space.
1132.It
1133wd0g and wd1g - a RAID 1 set, raid2, used for
1134.Pa /usr ,
1135.Pa /home ,
1136or other data, if desired.
1137.It
1138wd0h and wd1h - a RAID 1 set, raid3, if desired.
1139.El
1140.Pp
1141RAID sets raid0, raid1, and raid2 are all marked as
1142auto-configurable.
raid0 is marked as being a root-able RAID set.
1144When new kernels are installed, the kernel is not only copied to
1145.Pa / ,
1146but also to wd0a and wd1a.
1147The kernel on wd0a is required, since that is the kernel the system
1148boots from.
1149The kernel on wd1a is also required, since that will be the kernel used
1150should wd0 fail.
The important point here is to have redundant copies of the kernel
available, in the event that one of the drives fails.
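.Pp
A kernel update on such a system might look like the sketch below.
It assumes that the new kernel has already been installed as
.Pa /bsd
on the root file system and that
.Pa /mnt
is free to use as a temporary mount point; both names are assumptions
made for illustration:
.Bd -unfilled -offset indent
# mount /dev/wd0a /mnt
# cp /bsd /mnt/bsd
# umount /mnt
# mount /dev/wd1a /mnt
# cp /bsd /mnt/bsd
# umount /mnt
.Ed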
1153.Pp
1154There is no requirement that the root file system be on the same disk
1155as the kernel.
1156For example, obtaining the kernel from wd0a, and using
1157sd0e and sd1e for raid0, and the root file system, is fine.
1158It
1159.Em is
1160critical, however, that there be multiple kernels available, in the
1161event of media failure.
1162.Pp
1163Multi-layered RAID devices (such as a RAID 0 set made
1164up of RAID 1 sets) are
1165.Em not
1166supported as root devices or auto-configurable devices at this point.
1167(Multi-layered RAID devices
1168.Em are
supported in general, however, as mentioned earlier.)
.Pp
Note that in order to enable component auto-detection and
auto-configuration of RAID devices, the line:
1172.Bd -unfilled -offset indent
1173option	RAID_AUTOCONFIG
1174.Ed
1175.Pp
1176must be in the kernel configuration file.
1177See
1178.Xr raid 4
1179for more details.
1180.Ss Unconfiguration
1181The final operation performed by
1182.Nm
1183is to unconfigure a
1184.Xr raid 4
1185device.
1186This is accomplished via a simple:
1187.Pp
1188.Dl # raidctl -u raid0
1189.Pp
1190at which point the device is ready to be reconfigured.
1191.Ss Performance Tuning
1192Selection of the various parameter values which result in the best
1193performance can be quite tricky, and often requires a bit of
1194trial-and-error to get those values most appropriate for a given system.
1195A whole range of factors come into play, including:
1196.Bl -enum
1197.It
1198Types of components (e.g. SCSI vs. IDE) and their bandwidth
1199.It
1200Types of controller cards and their bandwidth
1201.It
1202Distribution of components among controllers
1203.It
1204IO bandwidth
1205.It
1206File system access patterns
1207.It
1208CPU speed
1209.El
1210.Pp
1211As with most performance tuning, benchmarking under real-life loads
1212may be the only way to measure expected performance.
1213Understanding some of the underlying technology is also useful in tuning.
1214The goal of this section is to provide pointers to those parameters which may
1215make significant differences in performance.
1216.Pp
1217For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient.
1218Since data in a RAID 1 set is arranged in a linear
1219fashion on each component, selecting an appropriate stripe size is
1220somewhat less critical than it is for a RAID 5 set.
However, a stripe size that is too small will cause large IOs to be
broken up into a number of smaller ones, hurting performance.
1223At the same time, a large stripe size may cause problems with concurrent
1224accesses to stripes, which may also affect performance.
1225Thus values in the range of 32 to 128 are often the most effective.
1226.Pp
1227Tuning RAID 5 sets is trickier.
1228In the best case, IO is presented to the RAID set one stripe at a time.
1229Since the entire stripe is available at the beginning of the IO,
1230the parity of that stripe can be calculated before the stripe is written,
1231and then the stripe data and parity can be written in parallel.
1232When the amount of data being written is less than a full stripe worth, the
1233.Sq small write
1234problem occurs.
1235Since a
1236.Sq small write
1237means only a portion of the stripe on the components is going to
1238change, the data (and parity) on the components must be updated
1239slightly differently.
1240First, the
1241.Sq old parity
1242and
1243.Sq old data
1244must be read from the components.
1245Then the new parity is constructed, using the new data to be written,
1246and the old data and old parity.
1247Finally, the new data and new parity are written.
All this extra data shuffling results in a serious loss of performance:
a small write is typically 2 to 4 times slower than a full stripe write
(or read).
1250To combat this problem in the real world, it may be useful to ensure that
1251stripe sizes are small enough that a
1252.Sq large IO
1253from the system will use exactly one large stripe write.
As discussed below, there are some file system dependencies which may come
1255into play here as well.
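.Pp
The parity update for a small write can be illustrated with a worked
example.
RAID 5 parity is the bitwise exclusive-or (XOR) of the data in a stripe,
so the new parity can be derived from the old parity, the old data and
the new data alone.
The byte values below are made up for illustration:
.Bd -unfilled -offset indent
data units:    D0 = 0x0f   D1 = 0x33   D2 = 0x55
old parity:    P  = D0 ^ D1 ^ D2         = 0x69

small write replaces D1 with D1' = 0xaa:
new parity:    P' = P ^ D1 ^ D1'
                  = 0x69 ^ 0x33 ^ 0xaa   = 0xf0

check:         D0 ^ D1' ^ D2
                  = 0x0f ^ 0xaa ^ 0x55   = 0xf0
.Ed
.Pp
The update thus costs two reads (old data, old parity) and two writes
(new data, new parity), where a full stripe write needs no reads at all.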
1256.Pp
1257Since the size of a
1258.Sq large IO
1259is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
1260be desirable to select a SectPerSU value of 16 blocks (8K) or 32
1261blocks (16K).
Since there are 4 data stripe units per stripe on such a set, the maximum
data per stripe is then 64 blocks (32K) or 128 blocks (64K).
1264Again, empirical measurement will provide the best indicators of which
1265values will yield better performance.
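.Pp
The arithmetic behind these figures can be written out as follows,
assuming 512-byte sectors and a 5-component RAID 5 set with no spares
(4 data stripe units plus 1 parity stripe unit per stripe):
.Bd -unfilled -offset indent
data per stripe = SectPerSU * data components * 512 bytes

SectPerSU = 16:   16 * 4 * 512 = 32768 bytes (32K)
SectPerSU = 32:   32 * 4 * 512 = 65536 bytes (64K)
.Ed
.Pp
A layout section matching the second case might then read as below;
the other values follow the layout example shown for the RAID 0 set
above and are not tuned:
.Bd -unfilled -offset indent
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5
.Ed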
1266.Pp
1267The parameters used for the file system are also critical to good
1268performance.
1269For
1270.Xr newfs 8 ,
1271for example, increasing the block size to 32K or 64K may improve
1272performance dramatically.
1273Also, changing the cylinders-per-group parameter from 16 to 32 or higher
1274is often not only necessary for larger file systems, but may also have
1275positive performance implications.
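.Pp
As a hedged sketch, a file system with larger blocks might be created as
shown below.
The
.Fl b ,
.Fl f
and
.Fl c
options are assumed to behave as documented in
.Xr newfs 8
on the system at hand, and the values are illustrative rather than
recommended:
.Bd -unfilled -offset indent
# newfs -b 32768 -f 4096 -c 32 /dev/rraid0e
.Ed
.Pp
The fragment size is raised together with the block size because FFS
allows at most eight fragments per block.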
1276.Ss Summary
Despite the length of this manual page, configuring a RAID set is a
relatively straightforward process.
Only the following steps are required:
1280.Bl -enum
1281.It
1282Use
1283.Xr disklabel 8
1284to create the components (of type RAID).
1285.It
Construct a RAID configuration file, e.g.\&
.Sq raid0.conf .
1288.It
1289Configure the RAID set with:
1290.Bd -unfilled -offset indent
1291# raidctl -C raid0.conf raid0
1292.Ed
1293.Pp
1294.It
1295Initialize the component labels with:
1296.Bd -unfilled -offset indent
1297# raidctl -I 123456 raid0
1298.Ed
1299.Pp
1300.It
1301Initialize other important parts of the set with:
1302.Bd -unfilled -offset indent
1303# raidctl -i raid0
1304.Ed
1305.Pp
1306.It
1307Get the default label for the RAID set:
1308.Bd -unfilled -offset indent
1309# disklabel raid0 > /tmp/label
1310.Ed
1311.Pp
1312.It
1313Edit the label:
1314.Bd -unfilled -offset indent
1315# vi /tmp/label
1316.Ed
1317.Pp
1318.It
1319Put the new label on the RAID set:
1320.Bd -unfilled -offset indent
1321# disklabel -R -r raid0 /tmp/label
1322.Ed
1323.Pp
1324.It
1325Create the file system:
1326.Bd -unfilled -offset indent
1327# newfs /dev/rraid0e
1328.Ed
1329.Pp
1330.It
1331Mount the file system:
1332.Bd -unfilled -offset indent
1333# mount /dev/raid0e /mnt
1334.Ed
1335.Pp
1336.It
1337Use:
1338.Bd -unfilled -offset indent
1339# raidctl -c raid0.conf raid0
1340.Ed
1341.Pp
1342to re-configure the RAID set the next time it is needed, or put
1343raid0.conf into
1344.Pa /etc
1345where it will automatically be started by the
1346.Pa /etc/rc
1347scripts.
1348.El
1349.Sh WARNINGS
1350Certain RAID levels (1, 4, 5, 6, and others) can protect against some
1351data loss due to component failure.
However, the loss of two components of a RAID 4 or 5 system, or the loss
of a single component of a RAID 0 system, will result in the entire
file system being lost.
1355RAID is
1356.Em NOT
1357a substitute for good backup practices.
1358.Pp
1359Recomputation of parity
1360.Em MUST
1361be performed whenever there is a chance that it may have been
1362compromised.
This includes after a system crash, and before a RAID
device is used for the first time.
1365Failure to keep parity correct will be catastrophic should a component
1366ever fail -- it is better to use RAID 0 and get the additional space
1367and speed, than it is to use parity, but not keep the parity correct.
1368At least with RAID 0 there is no perception of increased data security.
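.Pp
A minimal sketch of checking parity after an unclean shutdown, assuming
the
.Fl p
and
.Fl P
options described earlier in this manual:
.Bd -unfilled -offset indent
# raidctl -p raid0
# raidctl -P raid0
.Ed
.Pp
The first command only reports whether the parity is known to be
up to date; the second re-writes it if it is not.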
1369.Sh FILES
1370.Bl -tag -width /dev/XXrXraidX -compact
1371.It Pa /dev/{,r}raid*
1372.Cm raid
1373device special files.
1374.El
1375.Sh SEE ALSO
1376.Xr ccd 4 ,
1377.Xr raid 4 ,
1378.Xr rc 8
1379.Sh HISTORY
1380RAIDframe is a framework for rapid prototyping of RAID structures
1381developed by the folks at the Parallel Data Laboratory at Carnegie
1382Mellon University (CMU).
1383A more complete description of the internals and functionality of
1384RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
1385for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
1386Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
1387Parallel Data Laboratory of Carnegie Mellon University.
1388.Pp
1389The
1390.Nm
1391command first appeared as a program in CMU's RAIDframe v1.1 distribution.
1392This version of
1393.Nm
1394is a complete re-write, and first appeared in
1395.Nx 1.4
1396from where it was ported to
1397.Ox 2.5 .
1398.Sh BUGS
1399Hot-spare removal is currently not available.
1400.Sh COPYRIGHT
1401.Bd -unfilled
1402The RAIDframe Copyright is as follows:
1403
1404Copyright (c) 1994-1996 Carnegie-Mellon University.
1405All rights reserved.
1406
1407Permission to use, copy, modify and distribute this software and
1408its documentation is hereby granted, provided that both the copyright
1409notice and this permission notice appear in all copies of the
1410software, derivative works or modified versions, and any portions
1411thereof, and that both notices appear in supporting documentation.
1412
1413CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
1414CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
1415FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
1416
1417Carnegie Mellon requests users of this software to return to
1418
1419 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
1420 School of Computer Science
1421 Carnegie Mellon University
1422 Pittsburgh PA 15213-3890
1423
1424any improvements or extensions that they make and grant Carnegie the
1425rights to redistribute these changes.
1426
1427.Ed
1428