.\"	$OpenBSD: raid.4,v 1.26 2003/04/01 11:15:12 jmc Exp $
.\"     $NetBSD: raid.4,v 1.20 2001/09/22 16:03:58 wiz Exp $
.\"
.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
.\"  School of Computer Science
.\"  Carnegie Mellon University
.\"  Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd November 9, 1998
.Dt RAID 4
.Os
.Sh NAME
.Nm raid
.Nd RAIDframe disk driver
.Sh SYNOPSIS
.Cd "pseudo-device raid" Op Ar count
.Sh DESCRIPTION
The
.Nm
driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to
.Ox .
This
document assumes that the reader has at least some familiarity with RAID
and RAID concepts.
The reader is also assumed to know how to configure
disks and pseudo-devices into kernels, how to generate kernels, and how
to partition disks.
.Pp
RAIDframe provides a number of different RAID levels including:
.Bl -tag -width indent
.It RAID 0
provides simple data striping across the components.
.It RAID 1
provides mirroring.
.It RAID 4
provides data striping across the components, with parity
stored on a dedicated drive (in this case, the last component).
.It RAID 5
provides data striping across the components, with parity
distributed across all the components.
.El
.Pp
RAIDframe supports a wide variety of other RAID levels,
including Even-Odd parity, RAID level 5 with rotated sparing, Chained
declustering, and Interleaved declustering.
The reader is referred to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.
The number of failures allowed depends on the parity level selected.
If the driver is able to handle drive failures, and a drive does fail,
then the system is operating in
.Dq degraded mode .
In this mode, all missing data must be reconstructed from the data and
parity present on the other components.
This results in much slower data accesses, but does mean that a failure
need not bring the system to a complete halt.
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
If the driver determines that the labels are very inconsistent with
respect to each other (e.g., two or more serial numbers do not match)
or that the component label is not consistent with its assigned place
in the set (e.g., the component label claims the component should be
the 3rd one of a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set) then the device will fail to configure.
If the driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
Component labels are also used to support the auto-detection and
auto-configuration of RAID sets.
A RAID set can be flagged as auto-configurable, in which case it will be
configured automatically during the kernel boot process.
RAID filesystems which are
automatically configured are also eligible to be the root filesystem.
There is currently no support for booting a kernel directly from a RAID
set.
To use a RAID set as the root filesystem, a kernel is usually
obtained from a small non-RAID partition, after which any
auto-configuring RAID set can be used for the root filesystem.
See
.Xr raidctl 8
for more information on auto-configuration of RAID sets.
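.Pp
As a rough sketch of how a set might be flagged, assuming an already
configured set named
.Pa raid0
(the device name is hypothetical, and
.Xr raidctl 8
remains the authoritative reference for the syntax):
.Bd -literal -offset indent
# have raid0 configured automatically at boot
raidctl -A yes raid0
# as above, and also mark raid0 as eligible to be the root filesystem
raidctl -A root raid0
# turn auto-configuration off again
raidctl -A no raid0
.Ed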
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not actively used in an existing
filesystem.
Should a disk fail, the driver is capable of reconstructing
the failed disk onto a hot spare or back onto a replacement drive.
If the components are hot swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.
The copyback operation, as its name indicates, will copy
the reconstructed data from the hot spare to the previously failed
(and now replaced) disk.
Hot spares can also be hot-added using
.Xr raidctl 8 .
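.Pp
The spare-handling steps above might look roughly like the following
sketch.
It assumes a set named
.Pa raid0 ,
a failing component
.Pa /dev/sd1e ,
and a spare partition
.Pa /dev/sd3e ;
all three names are hypothetical, and
.Xr raidctl 8
should be consulted before running any of these against real data:
.Bd -literal -offset indent
# add /dev/sd3e to raid0 as a hot spare
raidctl -a /dev/sd3e raid0
# mark /dev/sd1e as failed and reconstruct onto the spare
raidctl -F /dev/sd1e raid0
# show component status, then reconstruction progress
raidctl -s raid0
raidctl -S raid0
# once the failed disk has been replaced, copy the data back to it
raidctl -B raid0
.Ed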
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for doing all
.Nm
configuration and other operations
is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.
In particular, this initialization includes re-building the parity data.
This rebuilding of parity data is also required when either a) a new RAID
device is brought up for the first time or b) after an unclean shutdown of a
RAID device.
By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity
before doing a
.Xr fsck 8
or a
.Xr newfs 8 ,
filesystem integrity and parity integrity can be ensured.
It bears repeating that parity recomputation is
.Em required
before any filesystems are created or used on the RAID device.
If the parity is not correct, then missing data cannot be correctly recovered.
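.Pp
A first-time setup might be sketched as follows.
The file name
.Pa /etc/raid0.conf ,
the three components, the serial number, and the layout values are
illustrative assumptions rather than recommendations; see
.Xr raidctl 8
for the configuration file format:
.Bd -literal -offset indent
cat > /etc/raid0.conf << EOF
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5

START queue
fifo 100
EOF

# force the initial configuration, since no component labels exist yet
raidctl -C /etc/raid0.conf raid0
# write component labels carrying a user-chosen serial number
raidctl -I 2003040101 raid0
# initialize (rewrite) the parity; do this before any newfs
raidctl -i raid0

# later, after an unclean shutdown: check the parity and rewrite it
# if necessary, before running fsck
raidctl -P raid0
.Ed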
.Pp
RAID levels may be combined in a hierarchical fashion.
For example, a RAID 0 device can be constructed out of a number of RAID 5
devices (which, in turn, may be constructed out of the physical disks,
or of other RAID devices).
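.Pp
For instance, a RAID 0 set layered on two existing RAID 5 sets could
use a configuration along these lines.
This is a sketch only: the component partitions
.Pa /dev/raid0e
and
.Pa /dev/raid1e ,
the file name, and the stripe unit size are assumptions, and the kernel
must be configured with at least three
.Nm
devices:
.Bd -literal -offset indent
cat > /etc/raid2.conf << EOF
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/raid0e
/dev/raid1e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
64 1 1 0

START queue
fifo 100
EOF

# configure raid2 on top of the two RAID 5 sets
raidctl -C /etc/raid2.conf raid2
.Ed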
.Pp
It is important that drives be hard-coded at their respective
addresses (i.e., not left free-floating, where a drive with SCSI ID of
4 can end up as
.Pa /dev/sd0c )
for well-behaved functioning of the RAID device.
This is true for all types of drives, including IDE, HP-IB, etc.
For normal SCSI drives, for example, the following can be used
to fix the device addresses:
.Bd -unfilled -offset indent
sd0     at scsibus0 target 0 lun ?      # SCSI disk drives
sd1     at scsibus0 target 1 lun ?      # SCSI disk drives
sd2     at scsibus0 target 2 lun ?      # SCSI disk drives
sd3     at scsibus0 target 3 lun ?      # SCSI disk drives
sd4     at scsibus0 target 4 lun ?      # SCSI disk drives
sd5     at scsibus0 target 5 lun ?      # SCSI disk drives
sd6     at scsibus0 target 6 lun ?      # SCSI disk drives
.Ed
.Pp
See
.Xr sd 4
for more information.
The rationale for fixing the device addresses is as follows:
Consider a system with three SCSI drives at SCSI IDs 4, 5, and 6,
which map to components
.Pa /dev/sd0e , /dev/sd1e ,
and
.Pa /dev/sd2e
of a RAID 5 set.
If the drive with SCSI ID 5 fails, and the system reboots, the old
.Pa /dev/sd2e
will show up as
.Pa /dev/sd1e .
The RAID driver is able to detect that component positions have changed, and
will not allow normal configuration.
If the device addresses are hard-coded, however, the RAID driver would
detect that the middle component
is unavailable, and bring the RAID 5 set up in degraded mode.
Note that the auto-detection and auto-configuration code does not care
about where the components live.
The auto-configuration code will
correctly configure a device even after any number of the components
have been re-arranged.
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.
This is done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device   raid   4       # RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.
The
.Sq count
argument
.Po
.Sq 4
in this case
.Pc
specifies the number of RAIDframe devices to configure.
To turn on component auto-detection and auto-configuration of RAID
sets, simply add:
.Bd -unfilled -offset indent
option	RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g., 4.2BSD) or
.Dv FS_RAID
(e.g., RAID).
The use of the latter is strongly encouraged, and is
required if auto-configuration of the RAID set is desired.
Since RAIDframe leaves room for disklabels, RAID components can be simply
raw disks, or partitions which use an entire disk.
Note that some platforms (such as SUN) do not allow using the
.Dv FS_RAID
partition type.
On these platforms, the
.Nm
driver can still auto-configure from
.Dv FS_BSDFFS
partitions.
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity are well understood by the system administrator(s)
.Em before
a component failure.
Doing the wrong thing when a component fails may result in data loss.
.Pp
Additional debug information can be sent to the console by specifying:
.Bd -unfilled -offset indent
option	RAIDDEBUG
.Ed
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However, the loss of two components of a RAID 4 or 5 system, or the loss
of a single component of a RAID 0 system, will result in all
filesystems on that RAID device being lost.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been
compromised.
This includes after system crashes, or before a RAID
device has been used for the first time.
Failure to keep parity correct will be catastrophic should a component
ever fail -- it is better to use RAID 0 and get the additional space and
speed, than it is to use parity, but not keep the parity correct.
At least with RAID 0 there is no perception of increased data security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr ccd 4 ,
.Xr sd 4 ,
.Xr wd 4 ,
.Xr MAKEDEV 8 ,
.Xr config 8 ,
.Xr fsck 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Ox
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).
RAIDframe, as originally distributed
by CMU, provides a RAID simulator for a number of different
architectures, and a user-level device driver and a kernel device
driver for Digital UNIX.
The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 ,
from which it was ported to
.Ox 2.5 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:
.Pp
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
.Pp
Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
.Pp
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.
CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR ANY DAMAGES
WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.Pp
Carnegie Mellon requests users of this software to return to
.Pp
 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890
.Pp
any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed