.\" $NetBSD: yacc.1,v 1.10 2024/09/14 22:13:34 christos Exp $
.\"
.\" Copyright (c) 1989, 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Robert Paul Corbett.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" from: @(#)yacc.1 5.7 (Berkeley) 7/30/91
.\" from: Id: yacc.1,v 1.24 2014/10/06 00:03:48 tom Exp
.\" $NetBSD: yacc.1,v 1.10 2024/09/14 22:13:34 christos Exp $
.\"
.Dd September 14, 2024
.Dt YACC 1
.Os
.Sh NAME
.Nm yacc
.Nd an
.Tn LALR Ns (1)
parser generator
.Sh SYNOPSIS
.Nm
.Op Fl BdhgilLPrstvVy
.Op Fl b Ar file_prefix
.Op Fl H Ar defines_file
.Op Fl o Ar output_file
.Op Fl p Ar symbol_prefix
.Ar filename
.Sh DESCRIPTION
.Nm
reads the grammar specification in the file
.Ar filename
and generates an
.Tn LALR Ns (1)
parser for it.
The parsers consist of a set of
.Tn LALR Ns (1)
parsing tables and a driver routine
written in the C programming language.
.Nm
normally writes the parse tables and the driver routine to the file
.Pa y.tab.c .
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl b Ar file_prefix
The
.Fl b
option changes the prefix prepended to the output file names to
the string denoted by
.Ar file_prefix .
The default prefix is the character
.Ql y .
.It Fl B
Create a backtracking parser (compile-time configuration for
.Nm ) .
.It Fl d
causes the header file
.Pa y.tab.h
to be written.
It contains
.No #define Ns 's
for the token identifiers.
.It Fl h
print a usage message to the standard error.
.It Fl H Ar defines_file
causes
.No #define Ns 's
for the token identifiers
to be written to the given
.Ar defines_file
rather
than the
.Pa y.tab.h
file used by the
.Fl d
option.
.It Fl g
The
.Fl g
option causes a graphical description of the generated
.Tn LALR Ns (1)
parser to be written to the file
.Pa y.dot
in graphviz format, ready to be processed by
.Xr dot 1 .
.It Fl i
The
.Fl i
option causes a supplementary header file
.Pa y.tab.i
to be written.
It contains extern declarations
and supplementary
.No #define Ns 's
as needed to map the conventional
.Nm
.Va yy Ns \&-prefixed
names to whatever the
.Fl p
option may specify.
The code file, e.g.,
.Pa y.tab.c ,
is modified to
.No #include
this file as well as the
.Pa y.tab.h
file, enforcing consistent usage of the symbols defined in those files.
The supplementary header file makes it simpler to separate compilation
of lex- and yacc-files.
.It Fl l
If the
.Fl l
option is not specified,
.Nm
will insert
.No #line
directives in the generated code.
The
.No #line
directives let the C compiler relate errors in the
generated code to the user's original code.
If the
.Fl l
option is specified,
.Nm
will not insert the
.No #line
directives.
.No #line
directives specified by the user will be retained.
.It Fl L
Enable position processing, e.g.,
.Ql %locations
(compile-time configuration for
.Nm ) .
.It Fl o Ar output_file
specify the filename for the parser file.
If this option is not given, the output filename is
the file prefix concatenated with the file suffix, e.g.
.Pa y.tab.c .
This overrides the
.Fl b
option.
.It Fl p Ar symbol_prefix
The
.Fl p
option changes the prefix prepended to yacc-generated symbols to
the string denoted by
.Ar symbol_prefix .
The default prefix is the string
.Ql yy .
.It Fl P
create a reentrant parser, e.g.,
.Ql %pure-parser .
.It Fl r
The
.Fl r
option causes
.Nm
to produce separate files for code and tables.
The code file is named
.Pa y.code.c ,
and the tables file is named
.Pa y.tab.c .
The prefix
.Ql y
can be overridden using the
.Fl b
option.
.It Fl s
Suppress
.No #define
statements generated for string literals in a
.Ql %token
statement, to more closely match original
.Nm
behavior.
.Pp
Normally when
.Nm
sees a line such as
.Pp
.Dl %token OP_ADD \*qADD\*q
.Pp
it notices that the quoted
.Dq ADD
is a valid C identifier, and generates a
.No #define
not only for
.Dv OP_ADD ,
but for
.Dv ADD
as well,
e.g.,
.Bd -literal -offset indent
#define OP_ADD 257
#define ADD 258
.Ed
.Pp
The original
.Nm
does not generate the second
.No #define .
The
.Fl s
option suppresses this
.No #define .
.Pp
.St -p1003.1
documents only names and numbers for
.Ql %token ,
though the original
.Nm
and
.Xr bison 1
also accept string literals.
.It Fl t
The
.Fl t
option changes the preprocessor directives generated by
.Nm
so that debugging statements will be incorporated in the compiled code.
.It Fl v
The
.Fl v
option causes a human-readable description of the generated parser to
be written to the file
.Pa y.output .
.It Fl V
The
.Fl V
option prints the version number to the standard output.
.It Fl y
.Nm
ignores this option,
which
.Xr bison 1
supports for ostensible POSIX compatibility.
.El
.Pp
The filename parameter is not optional.
However,
.Nm
accepts a single
.Dq \&-
to read the grammar from the standard input.
A double
.Dq \&--
marker denotes the end of options.
A single filename parameter is expected after a
.Dq \&--
marker.
.Sh EXTENSIONS
.Nm
provides some extensions for
compatibility with
.Xr bison 1
and other implementations of yacc.
It accepts several
.Ql long options
which have equivalents in
.Nm .
The
.Ql %destructor
and
.Ql %locations
features are available only if
.Nm yacc
has been configured and compiled to support the back-tracking
.Aq ( btyacc )
functionality.
The remaining features are always available:
.Bl -tag -width Fl
.It Ic %code Ar keyword { Ar code Ic }
Adds the indicated source code at a given point in the output
file.
The optional
.Ar keyword
tells yacc where to insert the
.Ar code :
.Bl -tag -width Fl
.It Ic top
just after the version-definition in the generated code-file.
.It Ic requires
just after the declaration of public parser variables.
If the
.Fl d
option is given, the code is inserted at the beginning of the
.Ar defines_file .
.It Ic provides
just after the declaration of private parser variables.
If the
.Fl d
option is given, the code is inserted at the end of the
.Ar defines_file .
.El
.Pp
If no
.Ar keyword
is given, the code is inserted at the beginning of
the section of code copied verbatim from the source file.
Multiple
.Ar %code
directives may be given;
.Nm
inserts those into the corresponding code- or defines_file in the order that
they appear in the source file.
.It Ic %debug
This has the same effect as the
.Fl t
command-line option.
.It Ic %destructor { Ar code Ic } Ar symbol Ns +
defines code that is invoked when a symbol is automatically
discarded during error recovery.
This code can be used to
reclaim dynamically allocated memory associated with the corresponding
semantic value for cases where user actions cannot manage the memory
explicitly.
.Pp
On encountering a parse error, the generated parser
discards symbols on the stack and input tokens until it reaches a state
that will allow parsing to continue.
This error recovery approach results in a memory leak
if the
.Vt YYSTYPE
value is, or contains, pointers to dynamically allocated memory.
.Pp
The bracketed
.Ar code
is invoked whenever the parser discards one of the symbols.
Within it
.Sq Li $$
or
.Sq Li $\*[Lt] Ns Ar tag Ns Li \*[Gt]$
designates the semantic value associated with the discarded symbol, and
.Sq Li @$
designates its location (see
.Ql %locations
directive).
.Pp
A per-symbol destructor is defined by listing a grammar symbol
in
.Ar symbol Ns + .
A per-type destructor is defined by listing a semantic type tag (e.g.,
.Sq Li \*[Lt] Ns Ar some_tag Ns Li \*[Gt] )
in
.Ar symbol Ns + ;
in this case, the parser will invoke
.Ar code
whenever it discards any grammar symbol that has that semantic type tag,
unless that symbol has its own per-symbol destructor.
.Pp
Two categories of default destructor are supported that are
invoked when discarding any grammar symbol that has no per-symbol and no
per-type destructor:
.Bl -bullet
.It
The code for
.Sq Li \*[Lt]*\*[Gt]
is used
for grammar symbols that have an explicitly declared semantic type tag
(via
.Ql %type ) ;
.It
The code for
.Sq Li \*[Lt]\*[Gt]
is used for grammar symbols that have no declared semantic type tag.
.El
.It Ic %empty
ignored by
.Nm .
.It Ic %expect Ar number
tells
.Nm
the expected number of shift/reduce conflicts.
That makes it only report the number if it differs.
.It Ic %expect-rr Ar number
tells
.Nm
the expected number of reduce/reduce conflicts.
That makes it only report the number if it differs.
This is, unlike
.Xr bison 1 ,
allowable in
.Tn LALR Ns (1)
parsers.
.It Ic %locations
tells
.Nm
to enable management of position information associated with each token,
provided by the lexer in the global variable
.Va yylloc ,
similar to management of semantic value information provided in
.Va yylval .
.Pp
As for semantic values, locations can be referenced within actions using
.Sq Li @$
to refer to the location of the left hand side symbol, and
.Sq Li @ Ns Ar N\|
.Ar ( N
an integer) to refer to the location of one of the right hand side
symbols.
Also as for semantic values, when a rule is matched, a default
action is used to compute the location represented by
.Sq Li @$
as the beginning of the first symbol and the end of the last symbol
in the right hand side of the rule.
This default computation can be overridden by
explicit assignment to
.Sq Li @$
in a rule action.
.Pp
The type of
.Va yylloc
is
.Vt YYLTYPE ,
which is defined by default as:
.Bd -literal -offset indent
typedef struct YYLTYPE {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} YYLTYPE;
.Ed
.Pp
.Vt YYLTYPE
can be redefined by the user
.Dv ( YYLTYPE_IS_DEFINED
must be defined, to inhibit the default)
in the declarations section of the specification file.
As in
.Xr bison 1 ,
the macro
.Dv YYLLOC_DEFAULT
is invoked each time a rule is matched to calculate a position for the
left hand side of the rule, before the associated action is executed;
this macro can be redefined by the user.
.Pp
This directive adds a
.Vt YYLTYPE
parameter to
.Fn yyerror .
If the
.Ql %pure-parser
directive is present,
a
.Vt YYLTYPE
parameter is added to
.Fn yylex
calls.
.It Ic %lex-param { Ar argument-declaration Ic }
By default, the lexer accepts no parameters, e.g.,
.Fn yylex .
Use this directive to add parameter declarations for your customized lexer.
.It Ic %parse-param { Ar argument-declaration Ic }
By default, the parser accepts no parameters, e.g.,
.Fn yyparse .
Use this directive to add parameter declarations for your customized parser.
.It Ic %pure-parser
Most variables (other than
.Va yydebug
and
.Va yynerrs )
are allocated on the stack within
.Fn yyparse ,
making the parser reasonably reentrant.
.It Ic %token-table
Make the parser's names for tokens available in the
.Va yytname
array.
However,
.Nm
does not predefine
.Dq $end ,
.Dq $error
or
.Dq $undefined
in this array.
.El
.Sh PORTABILITY
According to Robert Corbett:
.Bd -filled -offset indent
Berkeley Yacc is an
.Tn LALR Ns (1)
parser generator.
Berkeley Yacc has been made as compatible as possible with
.Tn AT\*[Am]T
Yacc.
Berkeley Yacc can accept any input specification that conforms to the
.Tn AT\*[Am]T
Yacc documentation.
Specifications that take advantage of undocumented features of
.Tn AT\*[Am]T
Yacc will probably be rejected.
.Ed
.Pp
The rationale in
.%U http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html
documents some features of
.Tn AT\*[Am]T
yacc which are no longer required for POSIX compliance.
.Pp
That said, you may be interested in reusing grammar files with some
other implementation which is not strictly compatible with
.Tn AT\*[Am]T
yacc.
For instance, there is
.Xr bison 1 .
Here are a few differences:
.Bl -bullet
.It
.Nm
accepts an equals mark preceding the left curly brace
of an action (as in the original grammar file
.Dv ftp.y ) :
.Bd -literal -offset indent
    | STAT CRLF
    = {
        statcmd();
    }
.Ed
.It
.Nm
and
.Xr bison 1
emit code in different order, and in particular
.Xr bison 1
makes forward reference to common functions such as
.Fn yylex ,
.Fn yyparse
and
.Fn yyerror
without providing prototypes.
.It
.Xr bison 1
support for
.Ql %expect
is broken in more than one release.
For best results using
.Xr bison 1 ,
delete that directive.
.It
.Xr bison 1
has no equivalent for some of
.Nm Ns 's
command-line options, relying on directives embedded in the grammar file.
.It
.Xr bison 1
.Fl y
option does not affect bison's lack of support for
features of AT\*[Am]T yacc which were deemed obsolescent.
.It
.Nm
accepts multiple parameters with
.Ql %lex-param
and
.Ql %parse-param
in two forms
.Bd -literal -offset indent
{type1 name1} {type2 name2} ...
{type1 name1, type2 name2 ...}
.Ed
.Pp
.Xr bison 1
accepts the latter (though undocumented), but depending on the
release may generate bad code.
.It
Like
.Xr bison 1 ,
.Nm
will add parameters specified via
.Ql %parse-param
to
.Fn yyparse ,
.Fn yyerror
and (if configured for back-tracking)
to the destructor declared using
.Ql %destructor .
.Pp
.Xr bison 1
puts the additional parameters
.Em first
for
.Fn yyparse
and
.Fn yyerror
but
.Em last
for destructors.
.Nm
matches this behavior.
.El
.Sh ENVIRONMENT
The following environment variable is referenced by
.Nm :
.Bl -tag -width TMPDIR
.It Ev TMPDIR
If the environment variable
.Ev TMPDIR
is set, the string denoted by
.Ev TMPDIR
will be used as the name of the directory where the temporary
files are created.
.El
.Sh TABLES
The names of the tables generated by this version of
.Nm
are
.Va yylhs ,
.Va yylen ,
.Va yydefred ,
.Va yydgoto ,
.Va yysindex ,
.Va yyrindex ,
.Va yygindex ,
.Va yytable ,
and
.Va yycheck .
Two additional tables,
.Va yyname
and
.Va yyrule ,
are created if
.Dv YYDEBUG
is defined and non-zero.
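.Sh EXAMPLES
The following is a minimal sketch, not taken from the
.Nm
distribution, of a grammar for an integer calculator.
The file names
.Pa calc.y
and
.Pa calc.c ,
the hand-written
.Fn yylex ,
and the reliance on the default integer semantic value type
are assumptions made for illustration only.
.Bd -literal -offset indent
%{
#include <ctype.h>
#include <stdio.h>

int yylex(void);
int yyparse(void);
void yyerror(const char *);
%}

%token NUMBER
%left '+' '-'
%left '*'

%%
input	: /* empty */
	| input line
	;

line	: '\en'
	| expr '\en'	{ printf("%d\en", $1); }
	;

expr	: NUMBER
	| expr '+' expr	{ $$ = $1 + $3; }
	| expr '-' expr	{ $$ = $1 - $3; }
	| expr '*' expr	{ $$ = $1 * $3; }
	| '(' expr ')'	{ $$ = $2; }
	;
%%

/* Return NUMBER for a run of digits, 0 at end of input, */
/* and any other non-blank character as its own token. */
int
yylex(void)
{
	int c;

	/* Skip blanks and tabs between tokens. */
	while ((c = getchar()) == ' ' || c == '\et')
		continue;
	if (isdigit(c)) {
		yylval = c - '0';
		while (isdigit(c = getchar()))
			yylval = yylval * 10 + (c - '0');
		ungetc(c, stdin);
		return NUMBER;
	}
	return c == EOF ? 0 : c;
}

void
yyerror(const char *msg)
{
	fprintf(stderr, "%s\en", msg);
}

int
main(void)
{
	return yyparse();
}
.Ed
.Pp
Assuming the grammar is saved as
.Pa calc.y ,
it could be built and run along these lines:
.Bd -literal -offset indent
$ yacc -o calc.c calc.y
$ cc -o calc calc.c
$ echo '1 + 2 * 3' | ./calc
.Ed
.Pp
Because
.Ql *
is declared after
.Ql +
and therefore binds more tightly, the last command should print 7.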
.Sh FILES
.Bl -tag -compact
.It Pa y.code.c
.It Pa y.tab.c
.It Pa y.tab.h
.It Pa y.output
.It Pa /tmp/yacc.aXXXXXX
.It Pa /tmp/yacc.tXXXXXX
.It Pa /tmp/yacc.uXXXXXX
.El
.Sh DIAGNOSTICS
If there are rules that are never reduced, the number of such rules is
written to the standard error.
If there are any
.Tn LALR Ns (1)
conflicts, the number of conflicts is also written
to the standard error.
.Sh SEE ALSO
.Xr flex 1 ,
.Xr lex 1
.\" .Xr yyfix 1
.Sh STANDARDS
The
.Nm
utility conforms to
.St -p1003.2 .