.\" $MirOS: src/usr.bin/awk/awk.1,v 1.6 2014/03/23 20:18:27 tg Exp $
.\" $OpenBSD: awk.1,v 1.40 2011/05/02 11:14:11 jmc Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd $Mdocdate: March 23 2014 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl V
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
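.Pp
For example, the following sketch assumes a POSIX shell and uses invented
sample input; it prints each input line that matches the pattern
.Li /error/ ,
preceded by its record number
.Pq Va NR :
.Bd -literal -offset indent
# the four input lines are invented sample data
awk '/error/ { print NR ": " $0 }' <<EOF
boot ok
error: disk full
login ok
error: no route
EOF
.Ed
.Pp
Only the second and fourth lines match the pattern, so the output is
.Dq 2: error: disk full
followed by
.Dq 4: error: no route .
.Pp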
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl V
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
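.Pp
As a sketch (POSIX shell assumed, sample input invented), the first command
below relies on the default blank-separated fields, while the second uses
.Fl F
to set
.Va FS
to the regular expression
.Ql "[,;] *" ,
so that a comma or semicolon followed by optional spaces separates fields:
.Bd -literal -offset indent
# default FS: runs of blanks; leading and trailing blanks are ignored
echo "  one   two three  " | awk '{ print NF, $1 }'
# FS as a regular expression: comma or semicolon, then optional spaces
echo "alpha, beta;gamma" | awk -F '[,;] *' '{ print NF, $2 }'
.Ed
.Pp
The two commands print
.Dq 3 one
and
.Dq 3 beta ,
respectively.
.Pp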
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
By default, fields are separated by any number of blanks,
and leading and trailing blanks are ignored.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Xo Ic if ( Ar expression ) Ar statement
.Op Ic else Ar statement
.Xc
.It Ic while ( Ar expression ) Ar statement
.It Xo Ic for
.No ( Ar expression ; expression ; expression ) statement
.Xc
.It Xo Ic for
.No ( Ar var Ic in Ar array ) statement
.Xc
.It Xo Ic do
.Ar statement Ic while ( Ar expression )
.Xc
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No \ # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # stop reading input and run END actions; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Va $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string,
which has the numeric value zero.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
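.Pp
The following sketch (POSIX shell assumed; the array names
.Va count ,
.Va grid ,
and
.Va parts
are arbitrary) counts words in an associative array, then stores an element
under the multiple subscript
.Li [1, 2]
and splits its key back apart on
.Va SUBSEP :
.Bd -literal -offset indent
echo "b a b c a b" | awk '{
    # count, grid and parts are arbitrary illustrative names
    for (i = 1; i <= NF; i++)
        count[$i]++                # string subscripts: one slot per word
    print count["a"], count["b"], count["c"]
    grid[1, 2] = "x"               # key is "1" SUBSEP "2"
    for (k in grid) {
        split(k, parts, SUBSEP)
        print "found", parts[1], parts[2]
    }
}'
.Ed
.Pp
It should print
.Dq 2 3 1
and then
.Dq found 1 2 .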
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ns Ar file
or
.Pf >> Ns Ar file
is present, or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
An isolated regular expression
in a pattern is matched against the entire input line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ns Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from a line matching the first pattern
through the next line (possibly the same one) matching the second, inclusive.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
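.Pp
A brief sketch of the matching operators (POSIX shell assumed; the variable
name
.Va re
is arbitrary): the string held in
.Va re
is used as a dynamic regular expression, while
.Li /^v/
is a constant one.
.Bd -literal -offset indent
awk 'BEGIN {
    re = "^[0-9]+$"            # re: arbitrary name for a dynamic regex
    if ("2014" ~ re)
        print "numeric"
    if ("v7" !~ re)
        print "not numeric"
    if ("v7" ~ /^v/)           # constant regular expression
        print "constant match"
}'
.Ed
.Pp
All three tests succeed, so the program prints
.Dq numeric ,
.Dq not numeric ,
and
.Dq constant match
on separate lines.
.Pp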
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to gain control before the first input line is read
and after the last input line has been read, respectively.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format used when converting numbers to strings
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record, counted across all input files.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default octal 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
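.Pp
The following sketch (POSIX shell assumed; the names
.Va s ,
.Va n ,
and
.Va parts
are arbitrary) exercises several of the portable arithmetic and string
functions described below.
The bit-operation functions are left out because they are extensions
that other
.Nm
implementations may lack.
.Bd -literal -offset indent
awk 'BEGIN {
    s = "pattern scanning"         # s, n, parts: arbitrary names
    print length(s), index(s, "scan"), substr(s, 9, 4)
    print toupper(substr(s, 1, 1)) substr(s, 2)
    n = split("a:b:c", parts, ":")
    print n, parts[3], int(-3.7), sprintf("%05.1f", 3.14159)
}'
.Ed
.Pp
It should print
.Dq 16 9 scan ,
.Dq Pattern scanning ,
and
.Dq 3 c \-3 003.1 ;
note that
.Fn int
truncates toward zero.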
.Pp
Functions may be defined (at the position of a pattern-action statement)
as follows:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
No further guarantees are made, especially not on the
reproducibility of the random values, even in the face of
.Fn srand .
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Set the seed for
.Fn rand
to
.Fa expr
and return the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
In this implementation, the seed has no effect on the values returned by
.Fn rand .
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Fa fs
or with the field separator
.Va FS
if
.Fa fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s ,
which is modified in place.
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced by the part of
.Fa s
that matched
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise,
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Read the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise,
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(x, n)"
.It Fn compl x
Returns the bitwise complement of the integer argument
.Fa x .
.It Fn and x y
Performs a bitwise AND on the integer arguments
.Fa x
and
.Fa y .
.It Fn or x y
Performs a bitwise OR on the integer arguments
.Fa x
and
.Fa y .
.It Fn xor x y
Performs a bitwise exclusive OR on the integer arguments
.Fa x
and
.Fa y .
.It Fn lshift x n
Returns the integer argument
.Fa x
shifted
.Fa n
bits to the left.
.It Fn rshift x n
Returns the integer argument
.Fa x
shifted
.Fa n
bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
Note, however, that an
.Ic exit
statement with an
.Ar expression
sets the exit status to the value of that expression.
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, "average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate echo(1):
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
    for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
    printf "\en"
    exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7 ,
.Xr script 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
.Pp
The flags
.Op Fl \&dV
and
.Op Fl safe ,
as well as the functions
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Pp
.Nm
does not support interval expressions
.Pq Li {n,m}
in regular expressions.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number, add 0 to it;
to force it to be treated as a string, concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
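.Pp
A short sketch of the conversion idioms described at the start of this
section (POSIX shell assumed; the variable names are arbitrary):
variables assigned from string constants are compared as strings,
adding 0 forces a numeric comparison, and concatenating
.Li \&""
converts a number to a string through
.Va CONVFMT .
.Bd -literal -offset indent
awk 'BEGIN {
    a = "10"; b = "9"        # a, b, n: arbitrary names
    print (a < b)            # string comparison: prints 1
    print (a + 0 < b + 0)    # numeric comparison: prints 0
    n = 0.1 + 0.2
    print (n "" == "0.3")    # converted with CONVFMT: prints 1
}'
.Ed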