.\"	$NetBSD: yacc.1,v 1.10 2024/09/14 22:13:34 christos Exp $
.\"
.\" Copyright (c) 1989, 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Robert Paul Corbett.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	from: @(#)yacc.1	5.7 (Berkeley) 7/30/91
.\"	from: Id: yacc.1,v 1.24 2014/10/06 00:03:48 tom Exp
.\"	$NetBSD: yacc.1,v 1.10 2024/09/14 22:13:34 christos Exp $
.\"
.Dd September 14, 2024
.Dt YACC 1
.Os
.Sh NAME
.Nm yacc
.Nd an
.Tn LALR Ns (1)
parser generator
.Sh SYNOPSIS
.Nm
.Op Fl BdghilLPrstvVy
.Op Fl b Ar file_prefix
.Op Fl H Ar defines_file
.Op Fl o Ar output_file
.Op Fl p Ar symbol_prefix
.Ar filename
.Sh DESCRIPTION
.Nm
reads the grammar specification in the file
.Ar filename
and generates an
.Tn LALR Ns (1)
parser for it.
The parser consists of a set of
.Tn LALR Ns (1)
parsing tables and a driver routine
written in the C programming language.
.Nm
normally writes the parse tables and the driver routine to the file
.Pa y.tab.c .
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl b Ar file_prefix
The
.Fl b
option changes the prefix prepended to the output file names to
the string denoted by
.Ar file_prefix .
The default prefix is the character
.Ql y .
.It Fl B
Create a backtracking parser (compile-time configuration for
.Nm ) .
.It Fl d
Causes the header file
.Pa y.tab.h
to be written.
It contains
.No #define Ns 's
for the token identifiers.
.It Fl h
Print a usage message to the standard error.
.It Fl H Ar defines_file
Causes
.No #define Ns 's
for the token identifiers
to be written to the given
.Ar defines_file
rather
than the
.Pa y.tab.h
file used by the
.Fl d
option.
.It Fl g
The
.Fl g
option causes a graphical description of the generated
.Tn LALR Ns (1)
parser to be written to the file
.Pa y.dot
in graphviz format, ready to be processed by
.Xr dot 1 .
.It Fl i
The
.Fl i
option causes a supplementary header file
.Pa y.tab.i
to be written.
It contains extern declarations
and supplementary
.No #define Ns 's
as needed to map the conventional
.Nm
.Va yy Ns \&-prefixed
names to whatever the
.Fl p
option may specify.
The code file, e.g.,
.Pa y.tab.c ,
is modified to
.No #include
this file as well as the
.Pa y.tab.h
file, enforcing consistent usage of the symbols defined in those files.
The supplementary header file makes it simpler to separate compilation
of lex- and yacc-files.
.It Fl l
If the
.Fl l
option is not specified,
.Nm
will insert
.No #line
directives in the generated code.
The
.No #line
directives let the C compiler relate errors in the
generated code to the user's original code.
If the
.Fl l
option is specified,
.Nm
will not insert the
.No #line
directives.
.No #line
directives specified by the user will be retained.
.It Fl L
Enable position processing, e.g.,
.Ql %locations
(compile-time configuration for
.Nm ) .
.It Fl o Ar output_file
Specify the filename for the parser file.
If this option is not given, the output filename is
the file prefix concatenated with the file suffix, e.g.,
.Pa y.tab.c .
This overrides the
.Fl b
option.
.It Fl p Ar symbol_prefix
The
.Fl p
option changes the prefix prepended to yacc-generated symbols to
the string denoted by
.Ar symbol_prefix .
The default prefix is the string
.Ql yy .
.It Fl P
Create a reentrant parser, e.g.,
.Ql %pure-parser .
.It Fl r
The
.Fl r
option causes
.Nm
to produce separate files for code and tables.
The code file is named
.Pa y.code.c ,
and the tables file is named
.Pa y.tab.c .
The prefix
.Ql y
can be overridden using the
.Fl b
option.
.It Fl s
Suppress
.No #define
statements generated for string literals in a
.Ql %token
statement, to more closely match original
.Nm
behavior.
.Pp
Normally when
.Nm
sees a line such as
.Pp
.Dl %token OP_ADD \*qADD\*q
.Pp
it notices that the quoted
.Dq ADD
is a valid C identifier, and generates a
.No #define
not only for
.Dv OP_ADD ,
but for
.Dv ADD
as well,
e.g.,
.Bd -literal -offset indent
#define OP_ADD 257
#define ADD 258
.Ed
.Pp
The original
.Nm
does not generate the second
.No #define .
The
.Fl s
option suppresses this
.No #define .
.Pp
.St -p1003.1
documents only names and numbers for
.Ql %token ,
though the original
.Nm
and
.Xr bison 1
also accept string literals.
.It Fl t
The
.Fl t
option changes the preprocessor directives generated by
.Nm
so that debugging statements will be incorporated in the compiled code.
.It Fl v
The
.Fl v
option causes a human-readable description of the generated parser to
be written to the file
.Pa y.output .
.It Fl V
The
.Fl V
option prints the version number to the standard output.
.It Fl y
.Nm
ignores this option,
which
.Xr bison 1
supports for ostensible POSIX compatibility.
.El
.Pp
The filename parameter is not optional.
However,
.Nm
accepts a single
.Dq \&-
to read the grammar from the standard input.
A double
.Dq \&--
marker denotes the end of options.
A single filename parameter is expected after a
.Dq \&--
marker.
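.Pp
As an illustration of how the options above combine, the following
invocations are a sketch only; the grammar file name
.Pa calc.y
and the prefixes
.Ql calc
and
.Ql calc_
are hypothetical:
.Bd -literal -offset indent
# Write calc.tab.c, calc.tab.h, and calc.output instead of y.*.
yacc -b calc -d -v calc.y

# Replace the default yy symbol prefix with calc_ (hypothetical).
yacc -p calc_ -d calc.y

# Read the grammar from the standard input.
yacc -d - < calc.y
.Ed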
.Sh EXTENSIONS
.Nm
provides some extensions for
compatibility with
.Xr bison 1
and other implementations of yacc.
It accepts several
.Ql long options
which have equivalents in
.Nm .
The
.Ql %destructor
and
.Ql %locations
features are available only if
.Nm
has been configured and compiled to support the back-tracking
.Pq btyacc
functionality.
The remaining features are always available:
.Bl -tag -width Fl
.It Ic %code Ar keyword { Ar code Ic }
Adds the indicated source code at a given point in the output
file.
The optional
.Ar keyword
tells yacc where to insert the
.Ar code :
.Bl -tag -width Fl
.It Ic top
just after the version-definition in the generated code-file.
.It Ic requires
just after the declaration of public parser variables.
If the
.Fl d
option is given, the code is inserted at the beginning of the
.Ar defines_file .
.It Ic provides
just after the declaration of private parser variables.
If the
.Fl d
option is given, the code is inserted at the end of the
.Ar defines_file .
.El
.Pp
If no
.Ar keyword
is given, the code is inserted at the beginning of
the section of code copied verbatim from the source file.
Multiple
.Ic %code
directives may be given;
.Nm
inserts them into the corresponding code file or
.Ar defines_file
in the order that
they appear in the source file.
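.Pp
A minimal sketch of the three keywords follows; the type
.Vt struct node
and the function
.Fn free_node
are hypothetical names used only for illustration:
.Bd -literal -offset indent
%code top {
#include <stdio.h>	/* just after the version-definition */
}
%code requires {
struct node;		/* with -d: beginning of the defines file */
}
%code provides {
void free_node(struct node *);	/* with -d: end of the defines file */
}
.Ed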
.It Ic %debug
This has the same effect as the
.Fl t
command-line option.
.It Ic %destructor { Ar code Ic } Ar symbol Ns +
Defines code that is invoked when a symbol is automatically
discarded during error recovery.
This code can be used to
reclaim dynamically allocated memory associated with the corresponding
semantic value for cases where user actions cannot manage the memory
explicitly.
.Pp
On encountering a parse error, the generated parser
discards symbols on the stack and input tokens until it reaches a state
that will allow parsing to continue.
This error recovery approach results in a memory leak
if the
.Vt YYSTYPE
value is, or contains, pointers to dynamically allocated memory.
.Pp
The bracketed
.Ar code
is invoked whenever the parser discards one of the symbols.
Within it
.Sq Li $$
or
.Sq Li $\*[Lt] Ns Ar tag Ns Li \*[Gt]$
designates the semantic value associated with the discarded symbol, and
.Sq Li @$
designates its location (see
.Ql %locations
directive).
.Pp
A per-symbol destructor is defined by listing a grammar symbol
in
.Ar symbol Ns + .
A per-type destructor is defined by listing a semantic type tag (e.g.,
.Sq Li \*[Lt] Ns Ar some_tag Ns Li \*[Gt] )
in
.Ar symbol Ns + ;
in this case, the parser will invoke
.Ar code
whenever it discards any grammar symbol that has that semantic type tag,
unless that symbol has its own per-symbol destructor.
.Pp
Two categories of default destructor are supported that are
invoked when discarding any grammar symbol that has no per-symbol and no
per-type destructor:
.Bl -bullet
.It
The code for
.Sq Li \*[Lt]*\*[Gt]
is used
for grammar symbols that have an explicitly declared semantic type tag
(via
.Ql %type ) ;
.It
The code for
.Sq Li \*[Lt]\*[Gt]
is used for grammar symbols that have no declared semantic type tag.
.El
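.Pp
The following fragment is a sketch, not taken from any particular grammar,
showing a per-type and a per-symbol destructor.
The tag
.Ql str ,
the token
.Dv STRING ,
and the nonterminal
.Ql name
are hypothetical, and the lexer is assumed to allocate each string with
.Xr malloc 3 :
.Bd -literal -offset indent
%{
#include <stdio.h>
#include <stdlib.h>
%}
%union {
    char *str;	/* assumed to be heap-allocated by the lexer */
    int   num;
}
%token <str> STRING
%type  <str> name

/* Per-type: used for any discarded symbol tagged <str>. */
%destructor { free($$); } <str>

/* Per-symbol: takes precedence over the per-type code for name. */
%destructor { fprintf(stderr, "dropping %s\en", $$); free($$); } name
.Ed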
.It Ic %empty
Ignored by
.Nm .
.It Ic %expect Ar number
Tells
.Nm
the expected number of shift/reduce conflicts.
The number of conflicts is then reported only if it differs.
.It Ic %expect-rr Ar number
Tells
.Nm
the expected number of reduce/reduce conflicts.
The number of conflicts is then reported only if it differs.
Unlike
.Xr bison 1 ,
.Nm
allows this directive in
.Tn LALR Ns (1)
parsers.
.It Ic %locations
Tells
.Nm
to enable management of position information associated with each token,
provided by the lexer in the global variable
.Va yylloc ,
similar to management of semantic value information provided in
.Va yylval .
.Pp
As for semantic values, locations can be referenced within actions using
.Sq Li @$
to refer to the location of the left hand side symbol, and
.Sq Li @ Ns Ar N\|
.Ar ( N
an integer) to refer to the location of one of the right hand side
symbols.
Also as for semantic values, when a rule is matched, a default
action is used to compute the location represented by
.Sq Li @$
as the beginning of the first symbol and the end of the last symbol
in the right hand side of the rule.
This default computation can be overridden by
explicit assignment to
.Sq Li @$
in a rule action.
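.Pp
As a sketch of such an explicit assignment, the hypothetical rule below sets
.Sq Li @$
by hand, assuming the default
.Vt YYLTYPE
fields; it records the same span that the default action is described
as computing:
.Bd -literal -offset indent
expr : expr '+' expr
        {
            $$ = $1 + $3;
            /* Start at the first symbol, end at the last one. */
            @$.first_line   = @1.first_line;
            @$.first_column = @1.first_column;
            @$.last_line    = @3.last_line;
            @$.last_column  = @3.last_column;
        }
     ;
.Ed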
.Pp
The type of
.Va yylloc
is
.Vt YYLTYPE ,
which is defined by default as:
.Bd -literal -offset indent
typedef struct YYLTYPE {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} YYLTYPE;
.Ed
.Pp
.Vt YYLTYPE
can be redefined by the user
.Dv ( YYLTYPE_IS_DEFINED
must be defined, to inhibit the default)
in the declarations section of the specification file.
As in
.Xr bison 1 ,
the macro
.Dv YYLLOC_DEFAULT
is invoked each time a rule is matched to calculate a position for the
left hand side of the rule, before the associated action is executed;
this macro can be redefined by the user.
.Pp
This directive adds a
.Vt YYLTYPE
parameter to
.Fn yyerror .
If the
.Ql %pure-parser
directive is present,
a
.Vt YYLTYPE
parameter is added to
.Fn yylex
calls.
.It Ic %lex-param { Ar argument-declaration Ic }
By default, the lexer accepts no parameters, e.g.,
.Fn yylex .
Use this directive to add parameter declarations for your customized lexer.
.It Ic %parse-param { Ar argument-declaration Ic }
By default, the parser accepts no parameters, e.g.,
.Fn yyparse .
Use this directive to add parameter declarations for your customized parser.
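.Pp
For instance, a parser that carries its state in a caller-supplied structure
might declare the following; this is a sketch, and
.Vt struct calc_state
is a hypothetical type:
.Bd -literal -offset indent
%{
struct calc_state;	/* hypothetical, defined elsewhere */
%}
%parse-param { struct calc_state *state }
%lex-param { struct calc_state *state }
.Ed
.Pp
The caller would then invoke the parser as
.Fn yyparse state ,
and the lexer is expected to accept the same argument.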
.It Ic %pure-parser
Most variables (other than
.Va yydebug
and
.Va yynerrs )
are allocated on the stack within
.Fn yyparse ,
making the parser reasonably reentrant.
.It Ic %token-table
Make the parser's names for tokens available in the
.Va yytname
array.
However,
.Nm
does not predefine
.Dq $end ,
.Dq $error
or
.Dq $undefined
in this array.
.El
.Sh PORTABILITY
According to Robert Corbett:
.Bd -filled -offset indent
Berkeley Yacc is an
.Tn LALR Ns (1)
parser generator.
Berkeley Yacc has been made as compatible as possible with
.Tn AT\*[Am]T
Yacc.
Berkeley Yacc can accept any input specification that conforms to the
.Tn AT\*[Am]T
Yacc documentation.
Specifications that take advantage of undocumented features of
.Tn AT\*[Am]T
Yacc will probably be rejected.
.Ed
.Pp
The rationale in
.Lk http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html
documents some features of
.Tn AT\*[Am]T
yacc which are no longer required for POSIX compliance.
.Pp
That said, you may be interested in reusing grammar files with some
other implementation which is not strictly compatible with
.Tn AT\*[Am]T
yacc.
For instance, there is
.Xr bison 1 .
Here are a few differences:
.Bl -bullet
.It
.Nm
accepts an equals sign preceding the left curly brace
of an action (as in the original grammar file
.Pa ftp.y ) :
.Bd -literal -offset indent
    |	STAT CRLF
	= {
		statcmd();
	}
.Ed
.It
.Nm
and
.Xr bison 1
emit code in different order, and in particular
.Xr bison 1
makes forward reference to common functions such as
.Fn yylex ,
.Fn yyparse
and
.Fn yyerror
without providing prototypes.
.It
.Xr bison 1
support for
.Ql %expect
is broken in more than one release.
For best results using
.Xr bison 1 ,
delete that directive.
.It
.Xr bison 1
has no equivalent for some of
.Nm Ns 's
command-line options, relying on directives embedded in the grammar file.
.It
The
.Xr bison 1
.Fl y
option does not affect bison's lack of support for
features of
.Tn AT\*[Am]T
yacc which were deemed obsolescent.
.It
.Nm
accepts multiple parameters with
.Ql %lex-param
and
.Ql %parse-param
in two forms:
.Bd -literal -offset indent
{type1 name1} {type2 name2} ...
{type1 name1, type2 name2 ...}
.Ed
.Pp
.Xr bison 1
accepts the latter (though undocumented), but depending on the
release may generate bad code.
.It
Like
.Xr bison 1 ,
.Nm
will add parameters specified via
.Ql %parse-param
to
.Fn yyparse ,
.Fn yyerror
and (if configured for back-tracking)
to the destructor declared using
.Ql %destructor .
.Pp
.Xr bison 1
puts the additional parameters
.Em first
for
.Fn yyparse
and
.Fn yyerror
but
.Em last
for destructors.
.Nm
matches this behavior.
.El
.Sh ENVIRONMENT
The following environment variable is referenced by
.Nm :
.Bl -tag -width TMPDIR
.It Ev TMPDIR
If the environment variable
.Ev TMPDIR
is set, the string denoted by
.Ev TMPDIR
will be used as the name of the directory where the temporary
files are created.
.El
.Sh TABLES
The names of the tables generated by this version of
.Nm
are
.Va yylhs ,
.Va yylen ,
.Va yydefred ,
.Va yydgoto ,
.Va yysindex ,
.Va yyrindex ,
.Va yygindex ,
.Va yytable ,
and
.Va yycheck .
Two additional tables,
.Va yyname
and
.Va yyrule ,
are created if
.Dv YYDEBUG
is defined and non-zero.
.Sh FILES
.Bl -tag -compact
.It Pa y.code.c
.It Pa y.tab.c
.It Pa y.tab.h
.It Pa y.output
.It Pa /tmp/yacc.aXXXXXX
.It Pa /tmp/yacc.tXXXXXX
.It Pa /tmp/yacc.uXXXXXX
.El
.Sh DIAGNOSTICS
If there are rules that are never reduced, the number of such rules is
written to the standard error.
If there are any
.Tn LALR Ns (1)
conflicts, the number of conflicts is also written
to the standard error.
.Sh SEE ALSO
.Xr flex 1 ,
.Xr lex 1
.\" .Xr yyfix 1
.Sh STANDARDS
The
.Nm
utility conforms to
.St -p1003.2 .
679