.\" Copyright (c) 2003-2008 Joseph Koshy.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd October 4, 2008
.Dt PMC.P4 3
.Os
.Sh NAME
.Nm pmc.p4
.Nd measurement events for
.Tn "Intel Pentium 4"
and other
.Tn Netburst
architecture CPUs
.Sh LIBRARY
.Lb libpmc
.Sh SYNOPSIS
.In pmc.h
.Sh DESCRIPTION
Intel P4 PMCs are present in Intel
.Tn "Pentium 4"
and
.Tn Xeon
processors that use the
.Tn Netburst
CPU architecture.
.Pp
These PMCs are documented in
.Rs
.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
.%T "Volume 3: System Programming Guide"
.%N "Order Number 245472-012"
.%D 2003
.%Q "Intel Corporation"
.Re
Further information about using these PMCs may be found in
.Rs
.%B "IA-32 Intel(R) Architecture Optimization Guide"
.%D 2003
.%N "Order Number 248966-009"
.%Q "Intel Corporation"
.Re
Some of these events are affected by processor errata described in
.Rs
.%B "Intel(R) Pentium(R) 4 Processor Specification Update"
.%N "Document Number: 249199-059"
.%D "April 2005"
.%Q "Intel Corporation"
.Re
.Ss PMC Features
Intel Pentium 4 PMCs are 40 bits wide.
Each CPU contains 18 PMCs, divided into 4 groups with 4, 4, 4 and 6
PMCs respectively.
On processors with hyperthreading support, PMC resources are shared
between logical processors.
These PMCs support the following capabilities:
.Bl -column "PMC_CAP_INTERRUPT" "Support"
.It Em Capability Ta Em Support
.It PMC_CAP_CASCADE Ta Yes
.It PMC_CAP_EDGE Ta Yes
.It PMC_CAP_INTERRUPT Ta Yes
.It PMC_CAP_INVERT Ta Yes
.It PMC_CAP_READ Ta Yes
.It PMC_CAP_PRECISE Ta Unimplemented
.It PMC_CAP_SYSTEM Ta Yes
.It PMC_CAP_TAGGING Ta Yes
.It PMC_CAP_THRESHOLD Ta Yes
.It PMC_CAP_USER Ta Yes
.It PMC_CAP_WRITE Ta Yes
.El
.Ss Event Qualifiers
Event specifiers for Intel P4 PMCs can have the following common
qualifiers:
.Bl -tag -width indent
.It Li active= Ns Ar choice
(On P4 HTT CPUs) Filter event counting based on which logical
processors are active.
The allowed values of
.Ar choice
are:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count when either logical processor is active.
.It Li both
Count when both logical processors are active.
.It Li none
Count only when neither logical processor is active.
.It Li single
Count only when one logical processor is active.
.El
.Pp
The default is
.Dq Li both .
.It Li cascade
Configure the PMC to cascade onto its partner.
See
.Sx "Cascading P4 PMCs"
below for more information.
.It Li edge
Configure the counter to count false to true transitions of the threshold
comparison output.
This qualifier only takes effect if a threshold qualifier has also been
specified.
.It Li complement
Configure the counter to increment only when the event count seen is
less than the threshold qualifier value specified.
.It Li mask= Ns Ar qualifier
Many event specifiers for Intel P4 PMCs need to be additionally
qualified using a mask qualifier.
The allowed syntax for these qualifiers is event specific and is
described along with the events.
.It Li os
Configure the PMC to count when the CPL of the processor is 0.
.It Li precise
Select precise event based sampling.
Precise sampling is supported by the hardware for a limited set of
events.
.It Li tag= Ns Ar value
Configure the PMC to tag the internal uop selected by the other
fields in this event specifier with value
.Ar value .
This feature is used when cascading PMCs.
.It Li threshold= Ns Ar value
Configure the PMC to increment only when the event counts seen are
greater than the specified threshold value
.Ar value .
.It Li usr
Configure the PMC to count when the CPL of the processor is 1, 2 or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
.Pp
On Intel Pentium 4 processors with HTT, events are
divided into two classes:
.Pp
.Bl -tag -width indent -compact
.It "TS Events"
are those where hardware can differentiate between events
generated on one logical processor from those generated on the
other.
.It "TI Events"
are those where hardware cannot differentiate between events
generated by multiple logical processors in a package.
.El
.Pp
Only TS events are allowed for use with process-mode PMCs on
Pentium-4/HTT CPUs.
.Pp
The event specifiers supported by Intel P4 PMCs are:
.Bl -tag -width indent
.It Li p4-128bit-mmx-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count integer SIMD SSE2 instructions that operate on 128 bit SIMD
operands.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on 128 bit SIMD integer operands in memory or
XMM register.
.El
.Pp
If an instruction contains more than one 128 bit MMX uop, then each
uop will be counted.
.It Li p4-64bit-mmx-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count MMX instructions that operate on 64 bit SIMD operands.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on 64 bit SIMD integer operands in memory or
in MMX registers.
.El
.Pp
If an instruction contains more than one 64 bit MMX uop, then each
uop will be counted.
.It Li p4-b2b-cycles
.Pq "TI event"
Count back-to-back bus cycles.
Further documentation for this event is unavailable.
.It Li p4-bnr
.Pq "TI event"
Count bus-not-ready conditions.
Further documentation for this event is unavailable.
.It Li p4-bpu-fetch-request Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count instruction fetch requests qualified by additional
flags specified in
.Ar qualifier .
At this point only one flag is supported:
.Pp
.Bl -tag -width indent -compact
.It Li tcmiss
Count trace cache lookup misses.
.El
.Pp
The default qualifier is also
.Dq Li mask=tcmiss .
.It Li p4-branch-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count retired branches.
Qualifier
.Ar flags
is a list of the following
.Ql +
separated strings:
.Pp
.Bl -tag -width indent -compact
.It Li mmnp
Count branches not-taken and predicted.
.It Li mmnm
Count branches not-taken and mis-predicted.
.It Li mmtp
Count branches taken and predicted.
.It Li mmtm
Count branches taken and mis-predicted.
.El
.Pp
The default qualifier counts all four kinds of branches.
.It Li p4-bsq-active-entries Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count the number of entries (clipped at 15) currently active in the
BSQ.
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li req-type0 , Li req-type1
Forms a 2-bit number used to select the request type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
reads excluding read invalidate
.It Li 1
read invalidates
.It Li 2
writes other than writebacks
.It Li 3
writebacks
.El
.Pp
Bit
.Dq Li req-type1
is the MSB for this two bit number.
.It Li req-len0 , Li req-len1
Forms a two-bit number that specifies the request length encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
.El
.Pp
Bit
.Dq Li req-len1
is the MSB for this two bit number.
.It Li req-io-type
Count requests that are input or output requests.
.It Li req-lock-type
Count requests that lock the bus.
.It Li req-lock-cache
Count requests that lock the cache.
.It Li req-split-type
Count requests for a bus 8-byte chunk that is split across an
8-byte boundary.
.It Li req-dem-type
Count requests that are demand (not prefetches) if set.
Count requests that are prefetches if not set.
.It Li req-ord-type
Count requests that are ordered.
.It Li mem-type0 , Li mem-type1 , Li mem-type2
Forms a 3-bit number that specifies a memory type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
UC
.It Li 1
USWC
.It Li 4
WT
.It Li 5
WP
.It Li 6
WB
.El
.Pp
Bit
.Dq Li mem-type2
is the MSB of this 3-bit number.
.El
.Pp
The default qualifier has all the above bits set.
.Pp
Edge triggering using the
.Dq Li edge
qualifier should not be used with this event when counting cycles.
.It Li p4-bsq-allocation Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count allocations in the bus sequence unit according to the flags
specified in
.Ar qualifier ,
which is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li req-type0 , Li req-type1
Forms a 2-bit number used to select the request type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
reads excluding read invalidate
.It Li 1
read invalidates
.It Li 2
writes other than writebacks
.It Li 3
writebacks
.El
.Pp
Bit
.Dq Li req-type1
is the MSB for this two bit number.
.It Li req-len0 , Li req-len1
Forms a two-bit number that specifies the request length encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
.El
.Pp
Bit
.Dq Li req-len1
is the MSB for this two bit number.
.It Li req-io-type
Count requests that are input or output requests.
.It Li req-lock-type
Count requests that lock the bus.
.It Li req-lock-cache
Count requests that lock the cache.
.It Li req-split-type
Count requests for a bus 8-byte chunk that is split across an
8-byte boundary.
.It Li req-dem-type
Count requests that are demand (not prefetches) if set.
Count requests that are prefetches if not set.
.It Li req-ord-type
Count requests that are ordered.
.It Li mem-type0 , Li mem-type1 , Li mem-type2
Forms a 3-bit number that specifies a memory type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
UC
.It Li 1
USWC
.It Li 4
WT
.It Li 5
WP
.It Li 6
WB
.El
.Pp
Bit
.Dq Li mem-type2
is the MSB of this 3-bit number.
.El
.Pp
The default qualifier has all the above bits set.
.Pp
This event is usually used along with the
.Dq Li edge
qualifier to avoid multiple counting.
.It Li p4-bsq-cache-reference Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count cache references as seen by the bus unit (2nd or 3rd level
cache references).
Qualifier
.Ar qualifier
is a
.Ql +
separated list of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li rd-2ndl-hits
Count 2nd level cache hits in the shared state.
.It Li rd-2ndl-hite
Count 2nd level cache hits in the exclusive state.
.It Li rd-2ndl-hitm
Count 2nd level cache hits in the modified state.
.It Li rd-3rdl-hits
Count 3rd level cache hits in the shared state.
.It Li rd-3rdl-hite
Count 3rd level cache hits in the exclusive state.
.It Li rd-3rdl-hitm
Count 3rd level cache hits in the modified state.
.It Li rd-2ndl-miss
Count 2nd level cache misses.
.It Li rd-3rdl-miss
Count 3rd level cache misses.
.It Li wr-2ndl-miss
Count write-back lookups from the data access cache that miss the 2nd
level cache.
.El
.Pp
The default is to count all the above events.
.It Li p4-execution-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the execution
tagging mechanism.
Qualifier
.Ar flags
can contain the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus0 , Li nbogus1 , Li nbogus2 , Li nbogus3
The marked uops are not bogus.
.It Li bogus0 , Li bogus1 , Li bogus2 , Li bogus3
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default is to set all the above flags.
This event can be used for precise event based sampling.
.It Li p4-front-end-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the front-end
tagging mechanism.
Qualifier
.Ar flags
can contain the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
The marked uops are not bogus.
.It Li bogus
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default is to select both kinds of events.
This event can be used for precise event based sampling.
.It Li p4-fsb-data-activity Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count each DBSY or DRDY event selected by qualifier
.Ar flags .
Qualifier
.Ar flags
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li drdy-drv
Count when this processor is driving data onto the bus.
.It Li drdy-own
Count when this processor is reading data from the bus.
.It Li drdy-other
Count when data is on the bus but not being sampled by this processor.
.It Li dbsy-drv
Count when this processor reserves the bus for use in the next cycle
in order to drive data.
.It Li dbsy-own
Count when some agent reserves the bus for use in the next bus cycle
to drive data that this processor will sample.
.It Li dbsy-other
Count when some agent reserves the bus for use in the next bus cycle
to drive data that this processor will not sample.
.El
.Pp
Flags
.Dq Li drdy-own
and
.Dq Li drdy-other
are mutually exclusive.
Flags
.Dq Li dbsy-own
and
.Dq Li dbsy-other
are mutually exclusive.
The default value for
.Ar flags
is
.Dq Li drdy-drv+drdy-own+dbsy-drv+dbsy-own .
.It Li p4-global-power-events Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count cycles during which the processor is not stopped.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li running
Count cycles when the processor is active.
.El
.It Li p4-instr-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count instructions retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogusntag
Count non-bogus instructions that are not tagged.
.It Li nbogustag
Count non-bogus instructions that are tagged.
.It Li bogusntag
Count bogus instructions that are not tagged.
.It Li bogustag
Count bogus instructions that are tagged.
.El
.Pp
The default qualifier counts all the above kinds of instructions.
.It Li p4-ioq-active-entries Xo
.Op Li ,mask= Ns Ar qualifier
.Op Li ,busreqtype= Ns Ar req-type
.Xc
.Pq "TS event"
Count the number of entries (clipped at 15) in the IOQ that are
active.
The event masks are specified by qualifier
.Ar qualifier
and
.Ar req-type .
.Pp
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li all-read
Count read entries.
.It Li all-write
Count write entries.
.It Li mem-uc
Count entries accessing un-cacheable memory.
.It Li mem-wc
Count entries accessing write-combining memory.
.It Li mem-wt
Count entries accessing write-through memory.
.It Li mem-wp
Count entries accessing write-protected memory.
.It Li mem-wb
Count entries accessing write-back memory.
.It Li own
Count store requests driven by the processor (i.e., not by other
processors or by DMA).
.It Li other
Count store requests driven by other processors or by DMA.
.It Li prefetch
Include hardware and software prefetch requests in the count.
.El
.Pp
The default value for
.Ar qualifier
is to enable all the above flags.
.Pp
The
.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
The default is 0.
.Pp
The
.Dq Li edge
qualifier should not be used when counting cycles with this event.
The exact behavior of this event depends on the processor revision.
.It Li p4-ioq-allocation Xo
.Op Li ,mask= Ns Ar qualifier
.Op Li ,busreqtype= Ns Ar req-type
.Xc
.Pq "TS event"
Count various types of transactions on the bus matching the flags set
in
.Ar qualifier
and
.Ar req-type .
.Pp
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li all-read
Count read entries.
.It Li all-write
Count write entries.
.It Li mem-uc
Count entries accessing un-cacheable memory.
.It Li mem-wc
Count entries accessing write-combining memory.
.It Li mem-wt
Count entries accessing write-through memory.
.It Li mem-wp
Count entries accessing write-protected memory.
.It Li mem-wb
Count entries accessing write-back memory.
.It Li own
Count store requests driven by the processor (i.e., not by other
processors or by DMA).
.It Li other
Count store requests driven by other processors or by DMA.
.It Li prefetch
Include hardware and software prefetch requests in the count.
.El
.Pp
The default value for
.Ar qualifier
is to enable all the above flags.
.Pp
The
.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
The default is 0.
.Pp
The
.Dq Li edge
qualifier is normally used with this event to prevent multiple
counting.
The exact behavior of this event depends on the processor revision.
.It Li p4-itlb-reference Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count translations using the instruction translation look-aside
buffer.
The
.Ar qualifier
argument is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li hit
Count ITLB hits.
.It Li miss
Count ITLB misses.
.It Li hit-uc
Count un-cacheable ITLB hits.
.El
.Pp
If no
.Ar qualifier
is specified the default is to count all three kinds of ITLB
translations.
.It Li p4-load-port-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count replayed events at the load port.
Qualifier
.Ar qualifier
can take on one value:
.Pp
.Bl -tag -width indent -compact
.It Li split-ld
Count split loads.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li split-ld .
.It Li p4-mispred-branch-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count mispredicted IA-32 branch instructions.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
Count non-bogus retired branch instructions.
.El
.It Li p4-machine-clear Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the number of pipeline clears seen by the processor.
Qualifier
.Ar flags
is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li clear
Count for a portion of the many cycles when the machine is being
cleared for any reason.
.It Li moclear
Count machine clears due to memory ordering issues.
.It Li smclear
Count machine clears due to self-modifying code.
.El
.Pp
Use qualifier
.Dq Li edge
to get a count of occurrences of machine clears.
The default qualifier is
.Dq Li clear .
.It Li p4-memory-cancel Op Li ,mask= Ns Ar event-list
.Pq "TS event"
Count the canceling of various kinds of requests in the data cache
address control unit of the CPU.
The qualifier
.Ar event-list
is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li st-rb-full
Requests cancelled because no store request buffer was available.
.It Li 64k-conf
Requests that conflict due to 64K aliasing.
.El
.Pp
If
.Ar event-list
is not specified, then the default is to count both kinds of events.
.It Li p4-memory-complete Op Li ,mask= Ns Ar event-list
.Pq "TS event"
Count the completion of load split, store split, un-cacheable split and
un-cacheable load operations selected by qualifier
.Ar event-list .
The qualifier
.Ar event-list
is a
.Ql +
separated list of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li lsc
Count load splits completed, excluding loads from un-cacheable or
write-combining areas.
.It Li ssc
Count any split stores completed.
.El
.Pp
The default is to count both kinds of operations.
.It Li p4-mob-load-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count load replays triggered by the memory order buffer.
Qualifier
.Ar qualifier
can be a
.Ql +
separated list of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li no-sta
Count replays because of unknown store addresses.
.It Li no-std
Count replays because of unknown store data.
.It Li partial-data
Count replays because of partially overlapped data accesses between
load and store operations.
.It Li unalgn-addr
Count replays because of mismatches in the lower 4 bits of load and
store operations.
.El
.Pp
The default qualifier is
.Dq Li no-sta+no-std+partial-data+unalgn-addr .
.It Li p4-packed-dp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count packed double-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on packed double-precision operands.
.El
.It Li p4-packed-sp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count packed single-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on packed single-precision operands.
.El
.It Li p4-page-walk-type Op Li ,mask= Ns Ar qualifier
.Pq "TI event"
Count page walks performed by the page miss handler.
Qualifier
.Ar qualifier
can be a
.Ql +
separated list of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dtmiss
Count page walks for data TLB misses.
.It Li itmiss
Count page walks for instruction TLB misses.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li dtmiss+itmiss .
.It Li p4-replay-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the replay
tagging mechanism.
Qualifier
.Ar flags
contains a
.Ql +
separated set of the following strings:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
The marked uops are not bogus.
.It Li bogus
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default qualifier counts both kinds of uops.
This event can be used for precise event based sampling.
.It Li p4-resource-stall Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the occurrence or latency of stalls in the allocator.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li sbfull
A stall due to the lack of store buffers.
.El
.It Li p4-response
.Pq "TI event"
Count different types of responses.
Further documentation on this event is not available.
.It Li p4-retired-branch-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count branches retired.
Qualifier
.Ar flags
contains a
.Ql +
separated list of strings:
.Pp
.Bl -tag -width indent -compact
.It Li conditional
Count conditional jumps.
.It Li call
Count direct and indirect call branches.
.It Li return
Count return branches.
.It Li indirect
Count returns, indirect calls or indirect jumps.
.El
.Pp
The default qualifier counts all the above branch types.
.It Li p4-retired-mispred-branch-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count mispredicted branches retired.
Qualifier
.Ar flags
contains a
.Ql +
separated list of strings:
.Pp
.Bl -tag -width indent -compact
.It Li conditional
Count conditional jumps.
.It Li call
Count indirect call branches.
.It Li return
Count return branches.
.It Li indirect
Count returns, indirect calls or indirect jumps.
.El
.Pp
The default qualifier counts all the above branch types.
.It Li p4-scalar-dp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of scalar double-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count the number of scalar double-precision uops.
.El
.It Li p4-scalar-sp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of scalar single-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on scalar single-precision operands.
.El
.It Li p4-snoop
.Pq "TI event"
Count snoop traffic.
Further documentation on this event is not available.
.It Li p4-sse-input-assist Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of times an assist is required to handle problems
with the operands for SSE and SSE2 operations.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count assists for all SSE and SSE2 uops.
.El
.It Li p4-store-port-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count events replayed at the store port.
Qualifier
.Ar qualifier
can take on one value:
.Pp
.Bl -tag -width indent -compact
.It Li split-st
Count split stores.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li split-st .
.It Li p4-tc-deliver-mode Op Li ,mask= Ns Ar qualifier
.Pq "TI event"
Count the duration in cycles of operating modes of the trace cache and
decode engine.
The desired operating mode is selected by
.Ar qualifier ,
which is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li DD
Both logical processors are in deliver mode.
.It Li DB
Logical processor 0 is in deliver mode while logical processor 1 is in
build mode.
.It Li DI
Logical processor 0 is in deliver mode while logical processor 1 is
halted, or in machine clear, or transitioning to a long microcode
flow.
.It Li BD
Logical processor 0 is in build mode while logical processor 1 is in
deliver mode.
.It Li BB
Both logical processors are in build mode.
.It Li BI
Logical processor 0 is in build mode while logical processor 1 is
halted, or in machine clear or transitioning to a long microcode
flow.
.It Li ID
Logical processor 0 is halted, or in machine clear or transitioning to
a long microcode flow while logical processor 1 is in deliver mode.
.It Li IB
Logical processor 0 is halted, or in machine clear or transitioning to
a long microcode flow while logical processor 1 is in build mode.
.El
.Pp
If there is only one logical processor in the processor package then
the qualifier for logical processor 1 is ignored.
If no qualifier is specified, the default qualifier is
.Dq Li DD+DB+DI+BD+BB+BI+ID+IB .
.It Li p4-tc-ms-xfer Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of times uop delivery changed from the trace cache to
MS ROM.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li cisc
Count TC to MS transfers.
.El
.It Li p4-uop-queue-writes Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the number of valid uops written to the uop queue.
Qualifier
.Ar flags
is a list of the following strings, separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li from-tc-build
Count uops being written from the trace cache in build mode.
.It Li from-tc-deliver
Count uops being written from the trace cache in deliver mode.
.It Li from-rom
Count uops being written from microcode ROM.
.El
.Pp
The default qualifier counts all the above kinds of uops.
.It Li p4-uop-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
This event is used in conjunction with the front-end at-retirement
mechanism to tag load and store uops.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li tagloads
Mark uops that are load operations.
.It Li tagstores
Mark uops that are store operations.
.El
.Pp
The default qualifier counts both kinds of uops.
.It Li p4-uops-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count uops retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
Count marked uops that are not bogus.
.It Li bogus
Count marked uops that are bogus.
.El
.Pp
The default qualifier counts both kinds of uops.
.It Li p4-wc-buffer Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count write-combining buffer operations.
Qualifier
.Ar flags
contains the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li wcb-evicts
WC buffer evictions due to any cause.
.It Li wcb-full-evict
WC buffer evictions due to no WC buffer being available.
.El
.Pp
The default qualifier counts both kinds of evictions.
.It Li p4-x87-assist Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of x87 instructions that required special
handling.
Qualifier
.Ar flags
contains the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li fpsu
Count instructions that saw an FP stack underflow.
.It Li fpso
Count instructions that saw an FP stack overflow.
.It Li poao
Count instructions that saw an x87 output overflow.
.It Li poau
Count instructions that saw an x87 output underflow.
.It Li prea
Count instructions that needed an x87 input assist.
.El
.Pp
The default qualifier counts all the above types of instruction
retirements.
.It Li p4-x87-fp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count x87 floating-point uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all x87 floating-point uops.
.El
.Pp
If an instruction contains more than one x87 floating-point uop, then
all x87 floating-point uops will be counted.
This event does not count x87 floating-point data movement operations.
.It Li p4-x87-simd-moves-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count each x87 FPU, MMX, SSE, or SSE2 uop that loads data, stores
data, or performs a register-to-register move.
This event does not count integer move uops.
Qualifier
.Ar flags
may contain the following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li allp0
Count all x87 and SIMD store and move uops.
.It Li allp2
Count all x87 and SIMD load uops.
.El
.Pp
The default is to count all uops.
.Pq Errata
This event may be affected by processor errata N43.
.El
.Ss "Cascading P4 PMCs"
PMC cascading support is currently poorly implemented.
While individual event counters may be allocated with a
.Dq Li cascade
qualifier, the current API does not offer the ability
to name and allocate all the resources needed for a
cascaded event counter pair in a single operation.
.Ss "Precise Event Based Sampling"
Support for precise event based sampling is currently
unimplemented.
.Ss Event Name Aliases
The following table shows the mapping between the PMC-independent
aliases supported by
.Lb libpmc
and the underlying hardware events used.
.Bl -column "branch-mispredicts" "Description"
.It Em Alias Ta Em Event
.It Li branches Ta Li p4-branch-retired,mask=mmtp+mmtm
.It Li branch-mispredicts Ta Li p4-mispred-branch-retired
.It Li dc-misses Ta (unsupported)
.It Li ic-misses Ta (unsupported)
.It Li instructions Ta Li p4-instr-retired,mask=nbogusntag+nbogustag
.It Li interrupts Ta Li (unsupported)
.It Li unhalted-cycles Ta Li p4-global-power-events
.El
.Sh SEE ALSO
.Xr pmc 3 ,
.Xr pmc.atom 3 ,
.Xr pmc.core 3 ,
.Xr pmc.core2 3 ,
.Xr pmc.iaf 3 ,
.Xr pmc.k7 3 ,
.Xr pmc.k8 3 ,
.Xr pmc.p5 3 ,
.Xr pmc.p6 3 ,
.Xr pmc.soft 3 ,
.Xr pmc.tsc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
The
.Lb libpmc
library was written by
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org .
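.Sh EXAMPLES
The event specifiers described in this manual page are plain strings:
an event name, optionally followed by a
.Dq Li ,mask=
qualifier whose flags are joined by
.Ql +
characters.
The following is a minimal sketch of assembling such a string in C.
The helper
.Fn build_spec
is hypothetical and is not part of the library; only the event name and
flag names are taken from this page.
.Bd -literal -offset indent
#include <stdio.h>

/*
 * Hypothetical helper (not a libpmc function): write "event" into
 * "buf", followed by ",mask=" and the flags joined by '+'.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
build_spec(char *buf, size_t len, const char *event,
    const char *const *flags, size_t nflags)
{
	size_t i, used;
	int n;

	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	used = (size_t)n;
	for (i = 0; i < nflags; i++) {
		/* The first flag is introduced by ",mask=", later ones by "+". */
		n = snprintf(buf + used, len - used, "%s%s",
		    i == 0 ? ",mask=" : "+", flags[i]);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}

int
main(void)
{
	/* Flags documented for p4-branch-retired: taken branches only. */
	const char *const taken[] = { "mmtp", "mmtm" };
	char spec[128];

	if (build_spec(spec, sizeof(spec), "p4-branch-retired", taken, 2) != 0)
		return (1);
	puts(spec);
	return (0);
}
.Ed
.Pp
As written, the program prints
.Dq Li p4-branch-retired,mask=mmtp+mmtm ,
which is the specifier listed for the
.Dq Li branches
alias.
A string of this form would then typically be handed to
.Xr pmc_allocate 3 ;
that call is omitted here because it needs a system with
.Xr hwpmc 4
loaded.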