1.\" Copyright (c) 2000 Jonathan Lemon
2.\" All rights reserved.
3.\"
4.\" Redistribution and use in source and binary forms, with or without
5.\" modification, are permitted provided that the following conditions
6.\" are met:
7.\" 1. Redistributions of source code must retain the above copyright
8.\"    notice, this list of conditions and the following disclaimer.
9.\" 2. Redistributions in binary form must reproduce the above copyright
10.\"    notice, this list of conditions and the following disclaimer in the
11.\"    documentation and/or other materials provided with the distribution.
12.\"
13.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
14.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
15.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
16.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
17.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
18.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
19.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
20.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
21.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
22.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
23.\" SUCH DAMAGE.
24.\"
25.\" $FreeBSD: stable/10/lib/libc/sys/kqueue.2 321829 2017-07-31 22:36:03Z asomers $
26.\"
27.Dd June 22, 2017
28.Dt KQUEUE 2
29.Os
30.Sh NAME
31.Nm kqueue ,
32.Nm kevent
33.Nd kernel event notification mechanism
34.Sh LIBRARY
35.Lb libc
36.Sh SYNOPSIS
37.In sys/types.h
38.In sys/event.h
39.In sys/time.h
40.Ft int
41.Fn kqueue "void"
42.Ft int
43.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
44.Fn EV_SET "kev" ident filter flags fflags data udata
45.Sh DESCRIPTION
46The
47.Fn kqueue
48system call
49provides a generic method of notifying the user when an event
50happens or a condition holds, based on the results of small
51pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; a kqueue may hold
at most one kevent for any given pair.
54.Pp
55The filter is executed upon the initial registration of a kevent
56in order to detect whether a preexisting condition is present, and is also
57executed whenever an event is passed to the filter for evaluation.
58If the filter determines that the condition should be reported,
59then the kevent is placed on the kqueue for the user to retrieve.
60.Pp
61The filter is also run when the user attempts to retrieve the kevent
62from the kqueue.
63If the filter indicates that the condition that triggered
64the event no longer holds, the kevent is removed from the kqueue and
65is not returned.
66.Pp
67Multiple events which trigger the filter do not result in multiple
68kevents being placed on the kqueue; instead, the filter will aggregate
69the events into a single struct kevent.
70Calling
71.Fn close
72on a file descriptor will remove any kevents that reference the descriptor.
73.Pp
74The
75.Fn kqueue
76system call
77creates a new kernel event queue and returns a descriptor.
78The queue is not inherited by a child created with
79.Xr fork 2 .
80However, if
81.Xr rfork 2
82is called without the
83.Dv RFFDG
84flag, then the descriptor table is shared,
85which will allow sharing of the kqueue between two processes.
86.Pp
87The
88.Fn kevent
89system call
90is used to register events with the queue, and return any pending
91events to the user.
92The
93.Fa changelist
94argument
95is a pointer to an array of
96.Va kevent
97structures, as defined in
98.In sys/event.h .
99All changes contained in the
100.Fa changelist
101are applied before any pending events are read from the queue.
102The
103.Fa nchanges
104argument
105gives the size of
106.Fa changelist .
107The
108.Fa eventlist
109argument
110is a pointer to an array of kevent structures.
111The
112.Fa nevents
113argument
114determines the size of
115.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
returns immediately, even if a
.Fa timeout
is specified, unlike
.Xr select 2 .
124If
125.Fa timeout
126is a non-NULL pointer, it specifies a maximum interval to wait
127for an event, which will be interpreted as a struct timespec.
128If
129.Fa timeout
130is a NULL pointer,
131.Fn kevent
132waits indefinitely.
133To effect a poll, the
134.Fa timeout
135argument should be non-NULL, pointing to a zero-valued
136.Va timespec
137structure.
138The same array may be used for the
139.Fa changelist
140and
141.Fa eventlist .
142.Pp
143The
144.Fn EV_SET
145macro is provided for ease of initializing a
146kevent structure.
147.Pp
148The
149.Va kevent
150structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	short	  filter;	/* filter for event */
	u_short	  flags;	/* action flags for kqueue */
	u_int	  fflags;	/* filter flag value */
	intptr_t  data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
};
.Ed
161.Pp
162The fields of
163.Fa struct kevent
164are:
165.Bl -tag -width "Fa filter"
166.It Fa ident
167Value used to identify this event.
168The exact interpretation is determined by the attached filter,
169but often is a file descriptor.
170.It Fa filter
171Identifies the kernel filter used to process this event.
172The pre-defined
173system filters are described below.
174.It Fa flags
175Actions to perform on the event.
176.It Fa fflags
177Filter-specific flags.
178.It Fa data
179Filter-specific data value.
180.It Fa udata
181Opaque user-defined value passed through the kernel unchanged.
182.El
183.Pp
184The
185.Va flags
186field can contain the following values:
187.Bl -tag -width EV_DISPATCH
188.It Dv EV_ADD
189Adds the event to the kqueue.
190Re-adding an existing event
191will modify the parameters of the original event, and not result
192in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the
.Dv EV_DISABLE
flag.
195.It Dv EV_ENABLE
196Permit
197.Fn kevent
198to return the event if it is triggered.
199.It Dv EV_DISABLE
200Disable the event so
201.Fn kevent
202will not return it.
203The filter itself is not disabled.
204.It Dv EV_DISPATCH
205Disable the event source immediately after delivery of an event.
206See
207.Dv EV_DISABLE
208above.
209.It Dv EV_DELETE
210Removes the event from the kqueue.
211Events which are attached to
212file descriptors are automatically deleted on the last close of
213the descriptor.
214.It Dv EV_RECEIPT
215This flag is useful for making bulk changes to a kqueue without draining
216any pending events.
217When passed as input, it forces
218.Dv EV_ERROR
219to always be returned.
220When a filter is successfully added the
221.Va data
222field will be zero.
223.It Dv EV_ONESHOT
224Causes the event to return only the first occurrence of the filter
225being triggered.
226After the user retrieves the event from the kqueue,
227it is deleted.
228.It Dv EV_CLEAR
229After the event is retrieved by the user, its state is reset.
230This is useful for filters which report state transitions
231instead of the current state.
232Note that some filters may automatically
233set this flag internally.
234.It Dv EV_EOF
235Filters may set this flag to indicate filter-specific EOF condition.
236.It Dv EV_ERROR
237See
238.Sx RETURN VALUES
239below.
240.El
241.Pp
242The predefined system filters are listed below.
243Arguments may be passed to and from the filter via the
244.Va fflags
245and
246.Va data
247fields in the kevent structure.
248.Bl -tag -width "Dv EVFILT_PROCDESC"
249.It Dv EVFILT_READ
250Takes a descriptor as the identifier, and returns whenever
251there is data available to read.
252The behavior of the filter is slightly different depending
253on the descriptor type.
254.Bl -tag -width 2n
255.It Sockets
256Sockets which have previously been passed to
257.Fn listen
258return when there is an incoming connection pending.
259.Va data
260contains the size of the listen backlog.
261.Pp
262Other socket descriptors return when there is data to be read,
263subject to the
264.Dv SO_RCVLOWAT
265value of the socket buffer.
266This may be overridden with a per-filter low water mark at the
267time the filter is added by setting the
268.Dv NOTE_LOWAT
269flag in
270.Va fflags ,
271and specifying the new low water mark in
272.Va data .
273On return,
274.Va data
275contains the number of bytes of protocol data available to read.
276.Pp
If the read direction of the socket has been shut down, then the filter
also sets
279.Dv EV_EOF
280in
281.Va flags ,
282and returns the socket error (if any) in
283.Va fflags .
284It is possible for EOF to be returned (indicating the connection is gone)
285while there is still data pending in the socket buffer.
286.It Vnodes
287Returns when the file pointer is not at the end of file.
288.Va data
289contains the offset from current position to end of file,
290and may be negative.
291.It "Fifos, Pipes"
Returns when there is data to read;
293.Va data
294contains the number of bytes available.
295.Pp
296When the last writer disconnects, the filter will set
297.Dv EV_EOF
298in
299.Va flags .
300This may be cleared by passing in
301.Dv EV_CLEAR ,
302at which point the
303filter will resume waiting for data to become available before
304returning.
305.It "BPF devices"
306Returns when the BPF buffer is full, the BPF timeout has expired, or
307when the BPF has
308.Dq immediate mode
309enabled and there is any data to read;
310.Va data
311contains the number of bytes available.
312.El
313.It Dv EVFILT_WRITE
314Takes a descriptor as the identifier, and returns whenever
315it is possible to write to the descriptor.
316For sockets, pipes
317and fifos,
318.Va data
319will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
323Note that this filter is not supported for vnodes or BPF devices.
324.Pp
For sockets, the low water mark and socket error handling are
identical to the
327.Dv EVFILT_READ
328case.
329.It Dv EVFILT_AIO
330Events for this filter are not registered with
331.Fn kevent
332directly but are registered via the
333.Va aio_sigevent
member of an asynchronous I/O request when it is scheduled via an asynchronous I/O
system call such as
.Fn aio_read .
The filter returns under the same conditions as
.Fn aio_error .
For more details on this filter see
.Xr sigevent 3
and
.Xr aio 4 .
342.It Dv EVFILT_VNODE
343Takes a file descriptor as the identifier and the events to watch for in
344.Va fflags ,
345and returns when one or more of the requested events occurs on the descriptor.
346The events to monitor are:
347.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
348.It Dv NOTE_ATTRIB
349The file referenced by the descriptor had its attributes changed.
350.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2
or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
365.It Dv NOTE_DELETE
366The
367.Fn unlink
368system call was called on the file referenced by the descriptor.
369.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
374The
375.Dv NOTE_EXTEND
376event is not reported when a name is changed inside the directory.
377.It Dv NOTE_LINK
378The link count on the file changed.
379In particular, the
380.Dv NOTE_LINK
381event is reported if a subdirectory was created or deleted inside
382the directory referenced by the descriptor.
383.It Dv NOTE_OPEN
384The file referenced by the descriptor was opened.
385.It Dv NOTE_READ
386A read occurred on the file referenced by the descriptor.
387.It Dv NOTE_RENAME
388The file referenced by the descriptor was renamed.
389.It Dv NOTE_REVOKE
390Access to the file was revoked via
391.Xr revoke 2
392or the underlying file system was unmounted.
393.It Dv NOTE_WRITE
394A write occurred on the file referenced by the descriptor.
395.El
396.Pp
397On return,
398.Va fflags
399contains the events which triggered the filter.
400.It Dv EVFILT_PROC
401Takes the process ID to monitor as the identifier and the events to watch for
402in
403.Va fflags ,
404and returns when the process performs one or more of the requested events.
405If a process can normally see another process, it can attach an event to it.
406The events to monitor are:
407.Bl -tag -width "Dv NOTE_TRACKERR"
408.It Dv NOTE_EXIT
409The process has exited.
410The exit status will be stored in
411.Va data .
412.It Dv NOTE_FORK
413The process has called
414.Fn fork .
415.It Dv NOTE_EXEC
416The process has executed a new process via
417.Xr execve 2
418or a similar call.
419.It Dv NOTE_TRACK
420Follow a process across
421.Fn fork
422calls.
423The parent process registers a new kevent to monitor the child process
424using the same
425.Va fflags
426as the original event.
427The child process will signal an event with
428.Dv NOTE_CHILD
429set in
430.Va fflags
431and the parent PID in
432.Va data .
433.Pp
434If the parent process fails to register a new kevent
435.Pq usually due to resource limitations ,
436it will signal an event with
437.Dv NOTE_TRACKERR
438set in
439.Va fflags ,
440and the child process will not signal a
441.Dv NOTE_CHILD
442event.
443.El
444.Pp
445On return,
446.Va fflags
447contains the events which triggered the filter.
448.It Dv EVFILT_SIGNAL
449Takes the signal number to monitor as the identifier and returns
450when the given signal is delivered to the process.
451This coexists with the
452.Fn signal
453and
454.Fn sigaction
455facilities, and has a lower precedence.
456The filter will record
457all attempts to deliver a signal to a process, even if the signal has
458been marked as
459.Dv SIG_IGN ,
460except for the
461.Dv SIGCHLD
signal, which, if ignored, will not be recorded by the filter.
463Event notification happens after normal
464signal delivery processing.
465.Va data
466returns the number of times the signal has occurred since the last call to
467.Fn kevent .
468This filter automatically sets the
469.Dv EV_CLEAR
470flag internally.
471.It Dv EVFILT_TIMER
472Establishes an arbitrary timer identified by
473.Va ident .
474When adding a timer,
475.Va data
476specifies the timeout period.
477The timer will be periodic unless
478.Dv EV_ONESHOT
479is specified.
480On return,
481.Va data
482contains the number of times the timeout has expired since the last call to
483.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
485There is a system wide limit on the number of timers
486which is controlled by the
487.Va kern.kq_calloutmax
488sysctl.
489.Bl -tag -width "Dv NOTE_USECONDS"
490.It Dv NOTE_SECONDS
491.Va data
492is in seconds.
493.It Dv NOTE_MSECONDS
494.Va data
495is in milliseconds.
496.It Dv NOTE_USECONDS
497.Va data
498is in microseconds.
499.It Dv NOTE_NSECONDS
500.Va data
501is in nanoseconds.
502.El
503.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
509.It Dv EVFILT_USER
510Establishes a user event identified by
511.Va ident
512which is not associated with any kernel mechanism but is triggered by
513user level code.
514The lower 24 bits of the
515.Va fflags
516may be used for user defined flags and manipulated using the following:
517.Bl -tag -width "Dv NOTE_FFLAGSMASK"
518.It Dv NOTE_FFNOP
519Ignore the input
520.Va fflags .
521.It Dv NOTE_FFAND
522Bitwise AND
523.Va fflags .
524.It Dv NOTE_FFOR
525Bitwise OR
526.Va fflags .
527.It Dv NOTE_FFCOPY
528Copy
529.Va fflags .
530.It Dv NOTE_FFCTRLMASK
531Control mask for
532.Va fflags .
533.It Dv NOTE_FFLAGSMASK
534User defined flag mask for
535.Va fflags .
536.El
537.Pp
538A user event is triggered for output with the following:
539.Bl -tag -width "Dv NOTE_FFLAGSMASK"
540.It Dv NOTE_TRIGGER
541Cause the event to be triggered.
542.El
543.Pp
544On return,
545.Va fflags
contains the user-defined flags in the lower 24 bits.
547.El
548.Sh CANCELLATION BEHAVIOUR
549If
550.Fa nevents
551is non-zero, i.e. the function is potentially blocking, the call
552is a cancellation point.
553Otherwise, i.e. if
554.Fa nevents
555is zero, the call is not cancellable.
556Cancellation can only occur before any changes are made to the kqueue,
557or when the call was blocked and no changes to the queue were requested.
558.Sh RETURN VALUES
559The
560.Fn kqueue
561system call
562creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
565.Pp
566The
567.Fn kevent
568system call
569returns the number of events placed in the
570.Fa eventlist ,
571up to the value given by
572.Fa nevents .
573If an error occurs while processing an element of the
574.Fa changelist
575and there is enough room in the
576.Fa eventlist ,
577then the event will be placed in the
578.Fa eventlist
579with
580.Dv EV_ERROR
581set in
582.Va flags
583and the system error in
584.Va data .
585Otherwise,
586.Dv -1
587will be returned, and
588.Dv errno
589will be set to indicate the error condition.
590If the time limit expires, then
591.Fn kevent
592returns 0.
593.Sh EXAMPLES
.Bd -literal -compact
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    struct kevent event;    /* Event we want to monitor */
    struct kevent tevent;   /* Event triggered */
    int kq, fd, ret;

    /* A usage error is not an errno failure, so use errx(). */
    if (argc != 2)
	errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
	err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

    /* Create kqueue. */
    kq = kqueue();
    if (kq == -1)
	err(EXIT_FAILURE, "kqueue() failed");

    /* Initialize kevent structure. */
    EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
	0, NULL);
    /* Attach event to the kqueue. */
    ret = kevent(kq, &event, 1, NULL, 0, NULL);
    if (ret == -1)
	err(EXIT_FAILURE, "kevent register");
    if (event.flags & EV_ERROR)
	errx(EXIT_FAILURE, "Event error: %s", strerror(event.data));

    for (;;) {
	/* Sleep until something happens. */
	ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
	if (ret == -1) {
	    err(EXIT_FAILURE, "kevent wait");
	} else if (ret > 0) {
	    printf("Something was written in '%s'\en", argv[1]);
	}
    }
}
.Ed
644.Sh ERRORS
645The
646.Fn kqueue
647system call fails if:
648.Bl -tag -width Er
649.It Bq Er ENOMEM
650The kernel failed to allocate enough memory for the kernel queue.
651.It Bq Er EMFILE
652The per-process descriptor table is full.
653.It Bq Er ENFILE
654The system file table is full.
655.El
656.Pp
657The
658.Fn kevent
659system call fails if:
660.Bl -tag -width Er
661.It Bq Er EACCES
662The process does not have permission to register a filter.
663.It Bq Er EFAULT
664There was an error reading or writing the
665.Va kevent
666structure.
667.It Bq Er EBADF
668The specified descriptor is invalid.
669.It Bq Er EINTR
670A signal was delivered before the timeout expired and before any
671events were placed on the kqueue for return.
672.It Bq Er EINTR
673A cancellation request was delivered to the thread, but not yet handled.
674.It Bq Er EINVAL
675The specified time limit or filter is invalid.
676.It Bq Er ENOENT
677The event could not be found to be modified or deleted.
678.It Bq Er ENOMEM
679No memory was available to register the event
680or, in the special case of a timer, the maximum number of
681timers has been exceeded.
682This maximum is configurable via the
683.Va kern.kq_calloutmax
684sysctl.
685.It Bq Er ESRCH
686The specified process to attach to does not exist.
687.El
688.Pp
When a
.Fn kevent
call fails with an
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
696.Sh SEE ALSO
697.Xr aio_error 2 ,
698.Xr aio_read 2 ,
699.Xr aio_return 2 ,
700.Xr poll 2 ,
701.Xr read 2 ,
702.Xr select 2 ,
703.Xr sigaction 2 ,
704.Xr write 2 ,
705.Xr pthread_setcancelstate 3 ,
706.Xr signal 3
707.Sh HISTORY
708The
709.Fn kqueue
710and
711.Fn kevent
712system calls first appeared in
713.Fx 4.1 .
714.Sh AUTHORS
715The
716.Fn kqueue
717system and this manual page were written by
718.An Jonathan Lemon Aq jlemon@FreeBSD.org .
719.Sh BUGS
720The
721.Fa timeout
722value is limited to 24 hours; longer timeouts will be silently
723reinterpreted as 24 hours.
724