.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd April 21, 2020
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/event.h
.In sys/time.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "kev" ident filter flags fflags data udata
.Sh DESCRIPTION
The
.Fn kqueue
system call
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; a kqueue may hold
only one kevent for any given pair.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
The
.Fn kqueue
system call
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
The
.Fn kevent
system call
is used to register events with the queue, and return any pending
events to the user.
The
.Fa changelist
argument
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
The
.Fa nchanges
argument
gives the size of
.Fa changelist .
The
.Fa eventlist
argument
is a pointer to an array of kevent structures.
The
.Fa nevents
argument
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-NULL pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a NULL pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-NULL, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
The
.Fn EV_SET
macro is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	short	  filter;	/* filter for event */
	u_short	  flags;	/* action flags for kqueue */
	u_int	  fflags;	/* filter flag value */
	intptr_t  data;		/* filter data value */
	void	  *udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width "Fa filter"
.It Fa ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It Fa filter
Identifies the kernel filter used to process this event.
The predefined
system filters are described below.
.It Fa flags
Actions to perform on the event.
.It Fa fflags
Filter-specific flags.
.It Fa data
Filter-specific data value.
.It Fa udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width EV_DISPATCH
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event
will modify the parameters of the original event, and not result
in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the
.Dv EV_DISABLE
flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DISPATCH
Disable the event source immediately after delivery of an event.
See
.Dv EV_DISABLE
above.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to
file descriptors are automatically deleted on the last close of
the descriptor.
.It Dv EV_RECEIPT
This flag is useful for making bulk changes to a kqueue without draining
any pending events.
When passed as input, it forces
.Dv EV_ERROR
to always be returned.
When a filter is successfully added, the
.Va data
field will be zero.
Note that if this flag is encountered and there is no remaining space in
.Fa eventlist
to hold the
.Dv EV_ERROR
event, then subsequent changes will not get processed.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue,
it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically
set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width "Dv EVFILT_PROCDESC"
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes of protocol data available to read.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.Pp
This behavior is different from
.Xr poll 2 ,
where read events are triggered for regular files unconditionally.
This event can be triggered unconditionally by setting the
.Dv NOTE_FILE_POLL
flag in
.Va fflags .
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the
filter will resume waiting for data to become available before
returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes
and fifos,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for
the fifo case, this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_AIO
Events for this filter are not registered with
.Fn kevent
directly but are registered via the
.Va aio_sigevent
member of an asynchronous I/O request when it is scheduled via an
asynchronous I/O system call such as
.Fn aio_read .
The filter returns under the same conditions as
.Fn aio_error .
For more details on this filter see
.Xr sigevent 3 and
.Xr aio 4 .
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width "Dv NOTE_CLOSE_WRITE"
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_CLOSE
A file descriptor referencing the monitored file was closed.
The closed file descriptor did not have write access.
.It Dv NOTE_CLOSE_WRITE
A file descriptor referencing the monitored file was closed.
The closed file descriptor had write access.
.Pp
This note, as well as
.Dv NOTE_CLOSE ,
is not activated when files are closed forcibly by
.Xr unmount 2 or
.Xr revoke 2 .
Instead,
.Dv NOTE_REVOKE
is sent for such events.
.It Dv NOTE_DELETE
The
.Fn unlink
system call was called on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
For a regular file, the file referenced by the descriptor was extended.
.Pp
For a directory, reports that a directory entry was added or removed
as the result of a rename operation.
The
.Dv NOTE_EXTEND
event is not reported when a name is changed inside the directory.
.It Dv NOTE_LINK
The link count on the file changed.
In particular, the
.Dv NOTE_LINK
event is reported if a subdirectory was created or deleted inside
the directory referenced by the descriptor.
.It Dv NOTE_OPEN
The file referenced by the descriptor was opened.
.It Dv NOTE_READ
A read occurred on the file referenced by the descriptor.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width "Dv NOTE_TRACKERR"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process registers a new kevent to monitor the child process
using the same
.Va fflags
as the original event.
The child process will signal an event with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.Pp
If the parent process fails to register a new kevent
.Pq usually due to resource limitations ,
it will signal an event with
.Dv NOTE_TRACKERR
set in
.Va fflags ,
and the child process will not signal a
.Dv NOTE_CHILD
event.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROCDESC
Takes the process descriptor created by
.Xr pdfork 2
to monitor as the identifier and the events to watch for in
.Va fflags ,
and returns when the associated process performs one or more of the
requested events.
The events to monitor are:
.Bl -tag -width "Dv NOTE_EXIT"
.It Dv NOTE_EXIT
The process has exited.
The exit status will be stored in
.Va data .
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as
.Dv SIG_IGN ,
except for the
.Dv SIGCHLD
signal, which, if ignored, will not be recorded by the filter.
Event notification happens after normal
signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period.
The timer will be periodic unless
.Dv EV_ONESHOT
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.Bl -tag -width "Dv NOTE_USECONDS"
.It Dv NOTE_SECONDS
.Va data
is in seconds.
.It Dv NOTE_MSECONDS
.Va data
is in milliseconds.
.It Dv NOTE_USECONDS
.Va data
is in microseconds.
.It Dv NOTE_NSECONDS
.Va data
is in nanoseconds.
.El
.Pp
If
.Va fflags
is not set, the default is milliseconds.
On return,
.Va fflags
contains the events which triggered the filter.
.Pp
If an existing timer is re-added, the existing timer will be
effectively canceled (throwing away any undelivered record of previous
timer expiration) and re-started using the new parameters contained in
.Va data
and
.Va fflags .
.Pp
There is a system wide limit on the number of timers
which is controlled by the
.Va kern.kq_calloutmax
sysctl.
.It Dv EVFILT_USER
Establishes a user event identified by
.Va ident
which is not associated with any kernel mechanism but is triggered by
user-level code.
The lower 24 bits of the
.Va fflags
may be used for user-defined flags and manipulated using the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_FFNOP
Ignore the input
.Va fflags .
.It Dv NOTE_FFAND
Bitwise AND
.Va fflags .
.It Dv NOTE_FFOR
Bitwise OR
.Va fflags .
.It Dv NOTE_FFCOPY
Copy
.Va fflags .
.It Dv NOTE_FFCTRLMASK
Control mask for
.Va fflags .
.It Dv NOTE_FFLAGSMASK
User-defined flag mask for
.Va fflags .
.El
.Pp
A user event is triggered for output with the following:
.Bl -tag -width "Dv NOTE_FFLAGSMASK"
.It Dv NOTE_TRIGGER
Cause the event to be triggered.
.El
.Pp
On return,
.Va fflags
contains the user-defined flags in the lower 24 bits.
.El
.Sh CANCELLATION BEHAVIOUR
If
.Fa nevents
is non-zero, i.e., the function is potentially blocking, the call
is a cancellation point.
Otherwise, i.e., if
.Fa nevents
is zero, the call is not cancellable.
Cancellation can only occur before any changes are made to the kqueue,
or when the call was blocked and no changes to the queue were requested.
.Sh RETURN VALUES
The
.Fn kqueue
system call
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
.Pp
The
.Fn kevent
system call
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh EXAMPLES
.Bd -literal -compact
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	struct kevent event;	/* Event we want to monitor */
	struct kevent tevent;	/* Event triggered */
	int kq, fd, ret;

	/* A usage error is not a system error, so use errx(), not err(). */
	if (argc != 2)
		errx(EXIT_FAILURE, "Usage: %s path", argv[0]);
	fd = open(argv[1], O_RDONLY);
	if (fd == -1)
		err(EXIT_FAILURE, "Failed to open '%s'", argv[1]);

	/* Create kqueue. */
	kq = kqueue();
	if (kq == -1)
		err(EXIT_FAILURE, "kqueue() failed");

	/* Initialize kevent structure. */
	EV_SET(&event, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE,
	    0, NULL);
	/*
	 * Attach event to the kqueue.  No eventlist is supplied, so a
	 * failed registration is reported as -1 with errno set.
	 */
	ret = kevent(kq, &event, 1, NULL, 0, NULL);
	if (ret == -1)
		err(EXIT_FAILURE, "kevent register");

	for (;;) {
		/* Sleep until something happens. */
		ret = kevent(kq, NULL, 0, &tevent, 1, NULL);
		if (ret == -1) {
			err(EXIT_FAILURE, "kevent wait");
		} else if (ret > 0) {
			printf("Something was written in '%s'\en", argv[1]);
		}
	}
}
.Ed
.Sh ERRORS
The
.Fn kqueue
system call fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er ENOMEM
The
.Dv RLIMIT_KQUEUES
rlimit
(see
.Xr getrlimit 2 )
for the current user would be exceeded.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
system call fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINTR
A cancellation request was delivered to the thread, but not yet handled.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event
or, in the special case of a timer, the maximum number of
timers has been exceeded.
This maximum is configurable via the
.Va kern.kq_calloutmax
sysctl.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Pp
When a
.Fn kevent
call fails with an
.Er EINTR
error, all changes in the
.Fa changelist
have been applied.
.Sh SEE ALSO
.Xr aio_error 2 ,
.Xr aio_read 2 ,
.Xr aio_return 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr pthread_setcancelstate 3 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
system calls first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq Mt jlemon@FreeBSD.org .
.Sh BUGS
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.