.\"	$MirOS: src/usr.bin/awk/awk.1,v 1.6 2014/03/23 20:18:27 tg Exp $
.\"	$OpenBSD: awk.1,v 1.40 2011/05/02 11:14:11 jmc Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd $Mdocdate: March 23 2014 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl V
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl V
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
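.Pp
As a minimal sketch of the multi-line case, the following program sets
.Va RS
to the null string so that each block of lines delimited by blank lines
becomes one record, then prints the record number, its first field and
its field count
.Pq the output layout is only illustrative :
.Bd -literal -offset indent
BEGIN { RS = "" }
{ print NR ": " $1 " (" NF " fields)" }
.Ed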
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
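.Pp
For illustration, the two invocations below select a field from
colon-separated and from tab-separated input respectively;
.Pa /etc/passwd
and
.Pa data.tsv
are only assumed example inputs:
.Bd -literal -offset indent
awk -F: \(aq{ print $1 }\(aq /etc/passwd
awk -Ft \(aq{ print $2 }\(aq data.tsv
.Ed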
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Xo Ic if ( Ar expression ) Ar statement
.Op Ic else Ar statement
.Xc
.It Ic while ( Ar expression ) Ar statement
.It Xo Ic for
.No ( Ar expression ; expression ; expression ) statement
.Xc
.It Xo Ic for
.No ( Ar var Ic in Ar array ) statement
.Xc
.It Xo Ic do
.Ar statement Ic while ( Ar expression )
.Xc
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No \ # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
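.Pp
A small sketch of multiple subscripts, using a hypothetical array
.Va a :
the combined subscript is a single string, so it can be tested with
.Ic in
and taken apart again by splitting it on
.Va SUBSEP .
.Bd -literal -offset indent
BEGIN {
        a["x", "y"] = 1
        if (("x", "y") in a)
                print "present"
        for (k in a) {
                # k is "x" SUBSEP "y"; recover the two parts
                split(k, part, SUBSEP)
                print part[1], part[2]
        }
}
.Ed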
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ns Ar file
or
.Pf >> Ns Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ns Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
as follows:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
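.Pp
The convention can be sketched as follows.
The function name
.Fn max
and its parameters are hypothetical;
.Va i
and
.Va m
are excess parameters used as local variables, and the array
.Va arr
is passed by reference:
.Bd -literal -offset indent
# Return the largest of arr[1] .. arr[n]; i and m are locals.
function max(arr, n,    i, m) {
        m = arr[1]
        for (i = 2; i <= n; i++)
                if (arr[i] > m)
                        m = arr[i]
        return m
}
BEGIN {
        n = split("3 9 4", v)
        print max(v, n)
}
.Ed
.Pp
The extra spaces before
.Va i
in the parameter list are a common way of marking where the real
arguments end; they have no meaning to
.Nm .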
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
No further guarantees are made, especially not on the
reproducibility of the random values, even in the face of
.Fn srand .
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
This statement is ignored in this implementation.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced in string
.Fa s
with the text matched by the regular expression
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
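.Pp
The following sketch combines several of the string functions above.
The variable names and the sample string are arbitrary, and the values
given in the comments assume exactly that string:
.Bd -literal -offset indent
BEGIN {
        s = "foo bar foo"
        # Wrap every match in brackets; & stands for the matched text.
        n = gsub(/foo/, "[&]", s)
        print n, s              # 2 [foo] bar [foo]
        # match() sets RSTART and RLENGTH for use with substr().
        if (match(s, /bar/))
                print substr(s, RSTART, RLENGTH)
        print toupper(substr(s, 2, 3)), index(s, "bar")
}
.Ed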
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets variable
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
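.Pp
As a sketch of reading from a pipe, the program below reads one record
from a command and then closes the stream.
The command
.Xr date 1
and the variable names are only assumptions for the example;
the string passed to
.Fn close
is the same one that opened the pipe:
.Bd -literal -offset indent
BEGIN {
        cmd = "date"
        # getline returns 1 on success, 0 at end of file, -1 on error
        if ((cmd | getline now) > 0)
                print "now: " now
        close(cmd)
}
.Ed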
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument
.Fa x .
.It Fn and x y
Performs a bitwise AND on integer arguments
.Fa x
and
.Fa y .
.It Fn or x y
Performs a bitwise OR on integer arguments
.Fa x
and
.Fa y .
.It Fn xor x y
Performs a bitwise Exclusive-OR on integer arguments
.Fa x
and
.Fa y .
.It Fn lshift x n
Returns integer argument
.Fa x
shifted by
.Fa n
bits to the left.
.It Fn rshift x n
Returns integer argument
.Fa x
shifted by
.Fa n
bits to the right.
.El
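.Pp
A brief sketch of the bit-operation functions, building a small flag
word and testing it; the variable name
.Va flags
and the chosen bit positions are arbitrary:
.Bd -literal -offset indent
BEGIN {
        # set bit 3 and bit 0
        flags = or(lshift(1, 3), 1)
        # test bit 3, toggle bit 0, shift bit 3 back down
        print flags, and(flags, 8), xor(flags, 1), rshift(flags, 3)
}
.Ed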
.Sh EXIT STATUS
.Ex -std awk
.Pp
Note, however, that the expression given to
.Ic exit
can modify the exit status.
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate
.Xr echo 1 :
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7 ,
.Xr script 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
.Pp
The flags
.Op Fl \&dV
and
.Op Fl safe ,
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Pp
.Nm
does not support {n,m} pattern matching.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
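.Pp
A minimal sketch of the difference, with arbitrary variable names:
the two string constants compare as strings until 0 is added to each,
so the program prints
.Dq 1 0 .
.Bd -literal -offset indent
BEGIN {
        x = "10"; y = "9"
        s = (x < y)             # string comparison: "10" sorts first
        n = (x + 0 < y + 0)     # numeric comparison: 10 is not below 9
        print s, n
}
.Ed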
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
806