.\" Copyright (c) 2010 George Neville-Neil. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd March 24, 2012
.Dt PMC.MIPS24K 3
.Os
.Sh NAME
.Nm pmc.mips24k
.Nd measurement events for
.Tn MIPS24K
family CPUs
.Sh LIBRARY
.Lb libpmc
.Sh SYNOPSIS
.In pmc.h
.Sh DESCRIPTION
MIPS PMCs are present in MIPS
.Tn "24k"
and other processors in the MIPS family.
.Pp
There are two counters supported by the hardware and each is 32 bits
wide.
.Pp
MIPS PMCs are documented in
.Rs
.%B "MIPS32 24K Processor Core Family Software User's Manual"
.%D December 2008
.%Q "MIPS Technologies Inc."
.Re
.Ss Event Specifiers (Programmable PMCs)
MIPS programmable PMCs support the following events:
.Bl -tag -width indent
.It Li CYCLE
.Pq Event 0, Counter 0/1
Total number of cycles.
The performance counters are clocked by the
top-level gated clock.
If the core is built with that clock gater
present, none of the counters will increment while the clock is
stopped due to a WAIT instruction.
.It Li INSTR_EXECUTED
.Pq Event 1, Counter 0/1
Total number of instructions completed.
.It Li BRANCH_COMPLETED
.Pq Event 2, Counter 0
Total number of branch instructions completed.
.It Li BRANCH_MISPRED
.Pq Event 2, Counter 1
Counts all branch instructions which completed, but were mispredicted.
.It Li RETURN
.Pq Event 3, Counter 0
Counts all JR $31 instructions completed.
.It Li RETURN_MISPRED
.Pq Event 3, Counter 1
Counts all JR $31 instructions which completed, used the RPS for a
prediction, but were mispredicted.
.It Li RETURN_NOT_31
.Pq Event 4, Counter 0
Counts all JR $xx (not $31) and JALR instructions (indirect jumps).
.It Li RETURN_NOTPRED
.Pq Event 4, Counter 1
If RPS use is disabled, JR $31 will not be predicted.
.It Li ITLB_ACCESS
.Pq Event 5, Counter 0
Counts ITLB accesses that are due to fetches showing up in the
instruction fetch stage of the pipeline and which do not use a fixed
mapping or are not in unmapped space.
If an address is fetched twice from the pipe (as in the case of a
cache miss), that instruction will count as 2 ITLB accesses.
Since each fetch returns 2 instructions, there is one access marked per
double word.
.It Li ITLB_MISS
.Pq Event 5, Counter 1
Counts all misses in the ITLB except ones that are on the back of another
miss.
The hardware cannot process back-to-back misses, so those are
ignored.
They are also ignored if there is some form of address error.
.It Li DTLB_ACCESS
.Pq Event 6, Counter 0
Counts DTLB accesses, including those in unmapped address spaces.
.It Li DTLB_MISS
.Pq Event 6, Counter 1
Counts DTLB misses.
Back-to-back misses that result in only one DTLB
entry getting refilled are counted as a single miss.
.It Li JTLB_IACCESS
.Pq Event 7, Counter 0
Instruction JTLB accesses are counted exactly the same as ITLB misses.
.It Li JTLB_IMISS
.Pq Event 7, Counter 1
Counts instruction JTLB accesses that result in no match or a match on
an invalid translation.
.It Li JTLB_DACCESS
.Pq Event 8, Counter 0
Data JTLB accesses.
.It Li JTLB_DMISS
.Pq Event 8, Counter 1
Counts data JTLB accesses that result in no match or a match on an
invalid translation.
.It Li IC_FETCH
.Pq Event 9, Counter 0
Counts every time the instruction cache is accessed.
All replays, wasted fetches, etc. are counted.
For example, following a branch, even though the prediction is taken,
the fall through access is counted.
.It Li IC_MISS
.Pq Event 9, Counter 1
Counts all instruction cache misses that result in a bus request.
.It Li DC_LOADSTORE
.Pq Event 10, Counter 0
Counts cached loads and stores.
.It Li DC_WRITEBACK
.Pq Event 10, Counter 1
Counts cache lines written back to memory due to replacement or cacheops.
.It Li DC_MISS
.Pq Event 11, Counter 0/1
Counts loads and stores that miss in the cache.
.It Li LOAD_MISS
.Pq Event 13, Counter 0
Counts number of cacheable loads that miss in the cache.
.It Li STORE_MISS
.Pq Event 13, Counter 1
Counts number of cacheable stores that miss in the cache.
.It Li INTEGER_COMPLETED
.Pq Event 14, Counter 0
Non-floating point, non-Coprocessor 2 instructions.
.It Li FP_COMPLETED
.Pq Event 14, Counter 1
Floating point instructions completed.
.It Li LOAD_COMPLETED
.Pq Event 15, Counter 0
Integer and co-processor loads completed.
.It Li STORE_COMPLETED
.Pq Event 15, Counter 1
Integer and co-processor stores completed.
.It Li BARRIER_COMPLETED
.Pq Event 16, Counter 0
Direct jump (and link) instructions completed.
.It Li MIPS16_COMPLETED
.Pq Event 16, Counter 1
MIPS16e instructions completed.
.It Li NOP_COMPLETED
.Pq Event 17, Counter 0
NOPs completed.
This includes all instructions that normally write to a general
purpose register, but where the destination register was set to r0.
.It Li INTEGER_MULDIV_COMPLETED
.Pq Event 17, Counter 1
Integer multiply and divide instructions completed (MULxx, DIVx, MADDx,
MSUBx).
.It Li RF_STALL
.Pq Event 18, Counter 0
Counts the total number of cycles where no instructions are issued
from the IFU to ALU (the RF stage does not advance) which includes
both of the previous two events.
RF_STALL differs from the sum of them, though, because cycles
when both stalls are active are only counted once.
.It Li INSTR_REFETCH
.Pq Event 18, Counter 1
Replay traps (other than uTLB).
.It Li STORE_COND_COMPLETED
.Pq Event 19, Counter 0
Conditional stores completed.
Counts all events, including failed stores.
.It Li STORE_COND_FAILED
.Pq Event 19, Counter 1
Conditional store instruction that did not update memory.
Note: While this event and the SC instruction count event can be configured to
count in specific operating modes, the timing of the events differs
considerably and the observed operating mode could change between them,
causing some inaccuracy in the measured ratio.
.It Li ICACHE_REQUESTS
.Pq Event 20, Counter 0
Note that this only counts PREFs that are actually attempted.
PREFs to uncached addresses or ones with translation errors are not counted.
.It Li ICACHE_HIT
.Pq Event 20, Counter 1
Counts PREF instructions that hit in the cache.
.It Li L2_WRITEBACK
.Pq Event 21, Counter 0
Counts cache lines written back to memory due to replacement or cacheops.
.It Li L2_ACCESS
.Pq Event 21, Counter 1
Number of accesses to L2 Cache.
.It Li L2_MISS
.Pq Event 22, Counter 0
Number of accesses that missed in the L2 cache.
.It Li L2_ERR_CORRECTED
.Pq Event 22, Counter 1
Single bit errors in L2 Cache that were detected and corrected.
.It Li EXCEPTIONS
.Pq Event 23, Counter 0
Any type of exception taken.
.It Li RF_CYCLES_STALLED
.Pq Event 24, Counter 0
Counts cycles where the LSU is in fixup and cannot accept a new
instruction from the ALU.
Fixups are replays within the LSU that occur when an instruction needs
to re-access the cache or the DTLB.
.It Li IFU_CYCLES_STALLED
.Pq Event 25, Counter 0
Counts the number of cycles where the fetch unit is not providing a
valid instruction to the ALU.
.It Li ALU_CYCLES_STALLED
.Pq Event 25, Counter 1
Counts the number of cycles where the ALU pipeline cannot advance.
.It Li UNCACHED_LOAD
.Pq Event 33, Counter 0
Counts uncached and uncached accelerated loads.
.It Li UNCACHED_STORE
.Pq Event 33, Counter 1
Counts uncached and uncached accelerated stores.
.It Li CP2_REG_TO_REG_COMPLETED
.Pq Event 35, Counter 0
Co-processor 2 register to register instructions completed.
.It Li MFTC_COMPLETED
.Pq Event 35, Counter 1
Co-processor 2 move to and from instructions as well as loads and stores.
.It Li IC_BLOCKED_CYCLES
.Pq Event 37, Counter 0
Cycles when IFU stalls because an instruction miss caused the IFU not
to have any runnable instructions.
Ignores the stalls due to ITLB misses as well as the 4 cycles
following a redirect.
.It Li DC_BLOCKED_CYCLES
.Pq Event 37, Counter 1
Counts all cycles where integer pipeline waits on Load return data due
to a D-cache miss.
The LSU can signal a "long stall" on a D-cache miss, in which case
the waiting TC might be rescheduled so other TCs can execute
instructions until the data returns.
.It Li L2_IMISS_STALL_CYCLES
.Pq Event 38, Counter 0
Cycles where the main pipeline is stalled waiting for a SYNC to complete.
.It Li L2_DMISS_STALL_CYCLES
.Pq Event 38, Counter 1
Cycles where the main pipeline is stalled because of an index conflict
in the Fill Store Buffer.
.It Li DMISS_CYCLES
.Pq Event 39, Counter 0
Data miss is outstanding, but not necessarily stalling the pipeline.
The difference between this and D$ miss stall cycles can show the gain
from non-blocking cache misses.
.It Li L2_MISS_CYCLES
.Pq Event 39, Counter 1
L2 miss is outstanding, but not necessarily stalling the pipeline.
.It Li UNCACHED_BLOCK_CYCLES
.Pq Event 40, Counter 0
Cycles where the processor is stalled on an uncached fetch, load, or store.
.It Li MDU_STALL_CYCLES
.Pq Event 41, Counter 0
Counts all cycles where integer pipeline waits on MDU return data.
.It Li FPU_STALL_CYCLES
.Pq Event 41, Counter 1
Counts all cycles where integer pipeline waits on FPU return data.
.It Li CP2_STALL_CYCLES
.Pq Event 42, Counter 0
Counts all cycles where integer pipeline waits on CP2 return data.
.It Li COREXTEND_STALL_CYCLES
.Pq Event 42, Counter 1
Counts all cycles where integer pipeline waits on CorExtend return data.
.It Li ISPRAM_STALL_CYCLES
.Pq Event 43, Counter 0
Counts all pipeline bubbles that are a result of multicycle ISPRAM
access.
Pipeline bubbles are defined as all cycles in which the IFU does not
present an instruction to the ALU.
The four cycles after a redirect are not counted.
.It Li DSPRAM_STALL_CYCLES
.Pq Event 43, Counter 1
Counts stall cycles created by an instruction waiting for access to DSPRAM.
.It Li CACHE_STALL_CYCLES
.Pq Event 44, Counter 0
Counts all cycles where the pipeline is stalled due to CACHE
instructions.
Includes cycles where CACHE instructions themselves are
stalled in the ALU, and cycles where CACHE instructions cause
subsequent instructions to be stalled.
.It Li LOAD_TO_USE_STALLS
.Pq Event 45, Counter 0
Counts all cycles where integer pipeline waits on Load return data.
.It Li BASE_MISPRED_STALLS
.Pq Event 45, Counter 1
Counts stall cycles due to skewed ALU where the bypass to the address
generation takes an extra cycle.
.It Li CPO_READ_STALLS
.Pq Event 46, Counter 0
Counts all cycles where integer pipeline waits on return data from
MFC0, RDHWR instructions.
.It Li BRANCH_MISPRED_CYCLES
.Pq Event 46, Counter 1
Counts the number of cycles from a mispredicted branch until the
next non-delay slot instruction executes.
.It Li IFETCH_BUFFER_FULL
.Pq Event 48, Counter 0
Counts the number of times an instruction cache miss was detected, but
both fill buffers were already allocated.
.It Li FETCH_BUFFER_ALLOCATED
.Pq Event 48, Counter 1
Number of cycles where at least one of the IFU fill buffers is
allocated (miss pending).
.It Li EJTAG_ITRIGGER
.Pq Event 49, Counter 0
Number of times an EJTAG Instruction Trigger Point condition matched.
.It Li EJTAG_DTRIGGER
.Pq Event 49, Counter 1
Number of times an EJTAG Data Trigger Point condition matched.
.It Li FSB_LT_QUARTER
.Pq Event 50, Counter 0
Fill store buffer less than one quarter full.
.It Li FSB_QUARTER_TO_HALF
.Pq Event 50, Counter 1
Fill store buffer between one quarter and one half full.
.It Li FSB_GT_HALF
.Pq Event 51, Counter 0
Fill store buffer more than one half full.
.It Li FSB_FULL_PIPELINE_STALLS
.Pq Event 51, Counter 1
Cycles where the pipeline is stalled because the Fill Store Buffer in
the LSU is full.
.It Li LDQ_LT_QUARTER
.Pq Event 52, Counter 0
Load data queue less than one quarter full.
.It Li LDQ_QUARTER_TO_HALF
.Pq Event 52, Counter 1
Load data queue between one quarter and one half full.
.It Li LDQ_GT_HALF
.Pq Event 53, Counter 0
Load data queue more than one half full.
.It Li LDQ_FULL_PIPELINE_STALLS
.Pq Event 53, Counter 1
Cycles where the pipeline is stalled because the Load Data Queue in the
LSU is full.
.It Li WBB_LT_QUARTER
.Pq Event 54, Counter 0
Write back buffer less than one quarter full.
.It Li WBB_QUARTER_TO_HALF
.Pq Event 54, Counter 1
Write back buffer between one quarter and one half full.
.It Li WBB_GT_HALF
.Pq Event 55, Counter 0
Write back buffer more than one half full.
.It Li WBB_FULL_PIPELINE_STALLS
.Pq Event 55, Counter 1
Cycles where the pipeline is stalled because the write back buffer is full.
.It Li REQUEST_LATENCY
.Pq Event 61, Counter 0
Measures latency from miss detection until critical dword of response
is returned.
Only counts for cacheable reads.
.It Li REQUEST_COUNT
.Pq Event 61, Counter 1
Counts number of cacheable read requests used for previous latency counter.
.El
.Ss Event Name Aliases
The following table shows the mapping between the PMC-independent
aliases supported by
.Lb libpmc
and the underlying hardware events used.
.Bl -column "branch-mispredicts" "cpu_clk_unhalted.core_p"
.It Em Alias Ta Em Event
.It Li instructions Ta Li INSTR_EXECUTED
.It Li branches Ta Li BRANCH_COMPLETED
.It Li branch-mispredicts Ta Li BRANCH_MISPRED
.El
.Sh SEE ALSO
.Xr pmc 3 ,
.Xr pmc.atom 3 ,
.Xr pmc.core 3 ,
.Xr pmc.iaf 3 ,
.Xr pmc.k7 3 ,
.Xr pmc.k8 3 ,
.Xr pmc.octeon 3 ,
.Xr pmc.p4 3 ,
.Xr pmc.p5 3 ,
.Xr pmc.p6 3 ,
.Xr pmc.soft 3 ,
.Xr pmc.tsc 3 ,
.Xr pmc_cpuinfo 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
.An -nosplit
The
.Lb libpmc
library was written by
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org .
MIPS support was added by
.An George Neville-Neil Aq Mt gnn@FreeBSD.org .