#
# Unit masks for the Intel "Haswell" micro architecture
#
# See http://ark.intel.com/ for help in identifying Haswell based CPUs
#
include:i386/arch_perfmon
name:x02 type:mandatory default:0x2
	0x2 No unit mask
name:x03 type:mandatory default:0x3
	0x3 No unit mask
name:x10 type:mandatory default:0x10
	0x10 No unit mask
name:x1f type:mandatory default:0x1f
	0x1f No unit mask
name:x20 type:mandatory default:0x20
	0x20 No unit mask
name:x50 type:mandatory default:0x50
	0x50 No unit mask
name:ld_blocks type:exclusive default:0x2
	0x2 extra: store_forward This event counts loads that followed a store to the same address, where the data could not be forwarded inside the pipeline from the store to the load. The most common reason why store forwarding would be blocked is when a load's address range overlaps with a preceding smaller uncompleted store. The penalty for blocked store forwarding is that the load must wait for the store to write its value to the cache before it can be issued.
	0x8 extra: no_sr The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use
name:misalign_mem_ref type:exclusive default:0x1
	0x1 extra: loads Speculative cache line split load uops dispatched to L1 cache
	0x2 extra: stores Speculative cache line split STA uops dispatched to L1 cache
name:dtlb_load_misses type:exclusive default:0x1
	0x1 extra: miss_causes_a_walk Load misses in all DTLB levels that cause page walks
	0x2 extra: walk_completed_4k Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes (4K).
	0x4 extra: walk_completed_2m_4m Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes (2M/4M).
	0x10 extra: walk_duration This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB load misses.
	0x20 extra: stlb_hit_4k This event counts load operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
	0x40 extra: stlb_hit_2m This event counts load operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
	0x80 extra: pde_cache_miss DTLB demand load misses with low part of linear-to-physical address translation missed
	0xe extra: walk_completed Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes of any page size.
	0x60 extra: stlb_hit Load operations that miss the first DTLB level but hit the second and do not cause page walks
name:uops_issued type:exclusive default:any
	0x1 extra: any This event counts the number of uops issued by the Front-end of the pipeline to the Back-end. This event is counted at the allocation stage and will count both retired and non-retired uops.
	0x10 extra: flags_merge Number of flags-merge uops being allocated. Such uops considered perf sensitive; added by GSR u-arch.
	0x20 extra: slow_lea Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not.
	0x40 extra: single_mul Number of Multiply packed/scalar single precision uops allocated
	0x1 extra:cmask=1,inv stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread
	0x1 extra:cmask=1,inv,any core_stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for all threads
name:l2_rqsts type:exclusive default:0x21
	0x21 extra: demand_data_rd_miss Demand Data Read miss L2, no rejects
	0x41 extra: demand_data_rd_hit Demand Data Read requests that hit L2 cache
	0x30 extra: l2_pf_miss L2 prefetch requests that miss L2 cache
	0x50 extra: l2_pf_hit L2 prefetch requests that hit L2 cache
	0xe1 extra: all_demand_data_rd Demand Data Read requests
	0xe2 extra: all_rfo RFO requests to L2 cache
	0xe4 extra: all_code_rd L2 code requests
	0xf8 extra: all_pf Requests from L2 hardware prefetchers
	0x42 extra: rfo_hit RFO requests that hit L2 cache
	0x22 extra: rfo_miss RFO requests that miss L2 cache
	0x44 extra: code_rd_hit L2 cache hits when fetching instructions, code reads.
	0x24 extra: code_rd_miss L2 cache misses when fetching instructions
	0x27 extra: all_demand_miss Demand requests that miss L2 cache
	0xe7 extra: all_demand_references Demand requests to L2 cache
	0x3f extra: miss All requests that miss L2 cache
	0xff extra: references All L2 requests
name:l1d_pend_miss type:exclusive default:pending
	0x1 extra: pending L1D miss outstanding duration in cycles
	0x1 extra:cmask=1 pending_cycles Cycles with L1D load Misses outstanding.
name:dtlb_store_misses type:exclusive default:0x1
	0x1 extra: miss_causes_a_walk Store misses in all DTLB levels that cause page walks
	0x2 extra: walk_completed_4k Store miss in all TLB levels causes a page walk that completes. (4K)
	0x4 extra: walk_completed_2m_4m Store misses in all DTLB levels that cause completed page walks (2M/4M)
	0x10 extra: walk_duration This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB store misses.
	0x20 extra: stlb_hit_4k This event counts store operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
	0x40 extra: stlb_hit_2m This event counts store operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
	0x80 extra: pde_cache_miss DTLB store misses with low part of linear-to-physical address translation missed
	0xe extra: walk_completed Store misses in all DTLB levels that cause completed page walks
	0x60 extra: stlb_hit Store operations that miss the first TLB level but hit the second and do not cause page walks
name:load_hit_pre type:exclusive default:0x1
	0x1 extra: sw_pf Not software-prefetch load dispatches that hit FB allocated for software prefetch
	0x2 extra: hw_pf Not software-prefetch load dispatches that hit FB allocated for hardware prefetch
name:tx_mem type:exclusive default:0x1
	0x1 extra: abort_conflict Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address
	0x2 extra: abort_capacity_write Number of times a transactional abort was signaled due to a data capacity limitation for transactional writes.
	0x4 extra: abort_hle_store_to_elided_lock Number of times a HLE transactional region aborted due to a non XRELEASE prefixed instruction writing to an elided lock in the elision buffer
	0x8 extra: abort_hle_elision_buffer_not_empty Number of times an HLE transactional execution aborted due to NoAllocatedElisionBuffer being non-zero.
	0x10 extra: abort_hle_elision_buffer_mismatch Number of times an HLE transactional execution aborted due to XRELEASE lock not satisfying the address and value requirements in the elision buffer
	0x20 extra: abort_hle_elision_buffer_unsupported_alignment Number of times an HLE transactional execution aborted due to an unsupported read alignment from the elision buffer.
	0x40 extra: hle_elision_buffer_full Number of times HLE lock could not be elided due to ElisionBufferAvailable being zero.
name:move_elimination type:exclusive default:0x1
	0x1 extra: int_eliminated Number of integer Move Elimination candidate uops that were eliminated.
	0x2 extra: simd_eliminated Number of SIMD Move Elimination candidate uops that were eliminated.
	0x4 extra: int_not_eliminated Number of integer Move Elimination candidate uops that were not eliminated.
	0x8 extra: simd_not_eliminated Number of SIMD Move Elimination candidate uops that were not eliminated.
name:cpl_cycles type:exclusive default:ring0
	0x1 extra: ring0 Unhalted core cycles when the thread is in ring 0
	0x2 extra: ring123 Unhalted core cycles when thread is in rings 1, 2, or 3
	0x1 extra:cmask=1,edge ring0_trans Number of intervals between processor halts while thread is in ring 0
name:tx_exec type:exclusive default:0x1
	0x1 extra: misc1 Counts the number of times a class of instructions that may cause a transactional abort was executed. Since this is the count of execution, it may not always cause a transactional abort.
	0x2 extra: misc2 Counts the number of times a class of instructions (e.g., vzeroupper) that may cause a transactional abort was executed inside a transactional region
	0x4 extra: misc3 Counts the number of times an instruction execution caused the transactional nest count supported to be exceeded
	0x8 extra: misc4 Counts the number of times a XBEGIN instruction was executed inside an HLE transactional region.
	0x10 extra: misc5 Counts the number of times an HLE XACQUIRE instruction was executed inside an RTM transactional region
name:rs_events type:exclusive default:empty_cycles
	0x1 extra: empty_cycles This event counts cycles when the Reservation Station (RS) is empty for the thread. The RS is a structure that buffers allocated micro-ops from the Front-end. If there are many cycles when the RS is empty, it may represent an underflow of instructions delivered from the Front-end.
	0x1 extra:cmask=1,inv,edge empty_end Counts end of periods where the Reservation Station (RS) was empty. Could be useful to precisely locate Frontend Latency Bound issues.
name:offcore_requests_outstanding type:exclusive default:demand_data_rd
	0x1 extra: demand_data_rd Offcore outstanding Demand Data Read transactions in uncore queue.
	0x2 extra: demand_code_rd Offcore outstanding code reads transactions in SuperQueue (SQ), queue to uncore, every cycle
	0x4 extra: demand_rfo Offcore outstanding RFO store transactions in SuperQueue (SQ), queue to uncore
	0x8 extra: all_data_rd Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore
	0x1 extra:cmask=1 cycles_with_demand_data_rd Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore
	0x8 extra:cmask=1 cycles_with_data_rd Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore
name:lock_cycles type:exclusive default:0x1
	0x1 extra: split_lock_uc_lock_duration Cycles when L1 and L2 are locked due to UC or split lock
	0x2 extra: cache_lock_duration Cycles when L1D is locked
name:idq type:exclusive default:0x2
	0x2 extra: empty Instruction Decode Queue (IDQ) empty cycles
	0x4 extra: mite_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
	0x8 extra: dsb_uops Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path
	0x10 extra: ms_dsb_uops Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
	0x20 extra: ms_mite_uops Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
	0x30 extra: ms_uops This event counts uops delivered by the Front-end with the assistance of the microcode sequencer. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
	0x30 extra:cmask=1 ms_cycles This event counts cycles during which the microcode sequencer assisted the Front-end in delivering uops. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
	0x4 extra:cmask=1 mite_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path
	0x8 extra:cmask=1 dsb_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path
	0x10 extra:cmask=1 ms_dsb_cycles Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
	0x10 extra:cmask=1,edge ms_dsb_occur Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequencer (MS) is busy
	0x18 extra:cmask=4 all_dsb_cycles_4_uops Cycles Decode Stream Buffer (DSB) is delivering 4 Uops
	0x18 extra:cmask=1 all_dsb_cycles_any_uops Cycles Decode Stream Buffer (DSB) is delivering any Uop
	0x24 extra:cmask=4 all_mite_cycles_4_uops Cycles MITE is delivering 4 Uops
	0x24 extra:cmask=1 all_mite_cycles_any_uops Cycles MITE is delivering any Uop
	0x3c extra: mite_all_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path
	0x30 extra:cmask=1,edge ms_switches Number of switches from DSB (Decode Stream Buffer) or MITE (legacy decode pipeline) to the Microcode Sequencer
name:icache type:exclusive default:0x2
	0x2 extra: misses This event counts Instruction Cache (ICACHE) misses.
	0x4 extra: ifetch_stall Cycles where a code-fetch stalled due to L1 instruction-cache miss or an iTLB miss
name:itlb_misses type:exclusive default:0x1
	0x1 extra: miss_causes_a_walk Misses at all ITLB levels that cause page walks
	0x2 extra: walk_completed_4k Code miss in all TLB levels causes a page walk that completes. (4K)
	0x4 extra: walk_completed_2m_4m Code miss in all TLB levels causes a page walk that completes. (2M/4M)
	0x10 extra: walk_duration This event counts cycles when the page miss handler (PMH) is servicing page walks caused by ITLB misses.
	0x20 extra: stlb_hit_4k Code misses that miss the ITLB and hit the STLB (4K)
	0x40 extra: stlb_hit_2m Code misses that miss the ITLB and hit the STLB (2M)
	0xe extra: walk_completed Misses in all ITLB levels that cause completed page walks
	0x60 extra: stlb_hit Operations that miss the first ITLB level but hit the second and do not cause any page walks
name:ild_stall type:exclusive default:0x1
	0x1 extra: lcp This event counts cycles where the decoder is stalled on an instruction with a length changing prefix (LCP).
	0x4 extra: iq_full Stall cycles because IQ is full
name:br_inst_exec type:exclusive default:0xff
	0xff extra: all_branches Speculative and retired branches
	0x41 extra: nontaken_conditional Not taken macro-conditional branches
	0x81 extra: taken_conditional Taken speculative and retired macro-conditional branches
	0x82 extra: taken_direct_jump Taken speculative and retired macro-unconditional branch instructions excluding calls and indirects
	0x84 extra: taken_indirect_jump_non_call_ret Taken speculative and retired indirect branches excluding calls and returns
	0x88 extra: taken_indirect_near_return Taken speculative and retired indirect branches with return mnemonic
	0x90 extra: taken_direct_near_call Taken speculative and retired direct near calls
	0xa0 extra: taken_indirect_near_call Taken speculative and retired indirect calls
	0xc1 extra: all_conditional Speculative and retired macro-conditional branches
	0xc2 extra: all_direct_jmp Speculative and retired macro-unconditional branches excluding calls and indirects
	0xc4 extra: all_indirect_jump_non_call_ret Speculative and retired indirect branches excluding calls and returns
	0xc8 extra: all_indirect_near_return Speculative and retired indirect return branches.
	0xd0 extra: all_direct_near_call Speculative and retired direct near calls
name:br_misp_exec type:exclusive default:0xff
	0xff extra: all_branches Speculative and retired mispredicted macro conditional branches
	0x41 extra: nontaken_conditional Not taken speculative and retired mispredicted macro conditional branches
	0x81 extra: taken_conditional Taken speculative and retired mispredicted macro conditional branches
	0x84 extra: taken_indirect_jump_non_call_ret Taken speculative and retired mispredicted indirect branches excluding calls and returns
	0x88 extra: taken_return_near Taken speculative and retired mispredicted indirect branches with return mnemonic
	0xc1 extra: all_conditional Speculative and retired mispredicted macro conditional branches
	0xc4 extra: all_indirect_jump_non_call_ret Mispredicted indirect branches excluding calls and returns
	0xa0 extra: taken_indirect_near_call Taken speculative and retired mispredicted indirect calls
name:idq_uops_not_delivered type:exclusive default:core
	0x1 extra: core This event counts the number of undelivered (unallocated) uops from the Front-end to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. The Front-end can allocate up to 4 uops per cycle so this event can increment 0-4 times per cycle depending on the number of unallocated uops. This event is counted on a per-core basis.
	0x1 extra:cmask=4 cycles_0_uops_deliv_core This event counts the number of cycles during which the Front-end allocated exactly zero uops to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. This event is counted on a per-core basis.
	0x1 extra:cmask=3 cycles_le_1_uop_deliv_core Cycles per thread when 3 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled
	0x1 extra:cmask=2 cycles_le_2_uop_deliv_core Cycles with less than 2 uops delivered by the front end.
	0x1 extra:cmask=1 cycles_le_3_uop_deliv_core Cycles with less than 3 uops delivered by the front end.
	0x1 extra:cmask=1,inv cycles_fe_was_ok Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE.
name:uops_executed_port type:exclusive default:port_0
	0x1 extra: port_0 Cycles per thread when uops are executed in port 0
	0x2 extra: port_1 Cycles per thread when uops are executed in port 1
	0x4 extra: port_2 Cycles per thread when uops are executed in port 2
	0x8 extra: port_3 Cycles per thread when uops are executed in port 3
	0x10 extra: port_4 Cycles per thread when uops are executed in port 4
	0x20 extra: port_5 Cycles per thread when uops are executed in port 5
	0x40 extra: port_6 Cycles per thread when uops are executed in port 6
	0x80 extra: port_7 Cycles per thread when uops are executed in port 7
	0x1 extra:any port_0_core Cycles per core when uops are executed in port 0
	0x2 extra:any port_1_core Cycles per core when uops are executed in port 1
	0x4 extra:any port_2_core Cycles per core when uops are dispatched to port 2
	0x8 extra:any port_3_core Cycles per core when uops are dispatched to port 3
	0x10 extra:any port_4_core Cycles per core when uops are executed in port 4
	0x20 extra:any port_5_core Cycles per core when uops are executed in port 5
	0x40 extra:any port_6_core Cycles per core when uops are executed in port 6
	0x80 extra:any port_7_core Cycles per core when uops are dispatched to port 7
name:resource_stalls type:exclusive default:0x1
	0x1 extra: any Resource-related stall cycles
	0x4 extra: rs Cycles stalled due to no eligible RS entry available.
	0x8 extra: sb This event counts cycles during which no instructions were allocated because no Store Buffers (SB) were available.
	0x10 extra: rob Cycles stalled due to re-order buffer full.
name:cycle_activity type:exclusive default:0x1
	0x1 extra:cmask=1 cycles_l2_pending Cycles with pending L2 cache miss loads.
	0x8 extra:cmask=8 cycles_l1d_pending Cycles with pending L1 cache miss loads.
	0x2 extra:cmask=2 cycles_ldm_pending Cycles with pending memory loads.
	0x4 extra:cmask=4 cycles_no_execute This event counts cycles during which no instructions were executed in the execution stage of the pipeline.
	0x5 extra:cmask=5 stalls_l2_pending Execution stalls due to L2 cache misses.
	0x6 extra:cmask=6 stalls_ldm_pending This event counts cycles during which no instructions were executed in the execution stage of the pipeline and there were memory instructions pending (waiting for data).
	0xc extra:cmask=c stalls_l1d_pending Execution stalls due to L1 data cache misses
name:offcore_requests type:exclusive default:0x1
	0x1 extra: demand_data_rd Demand Data Read requests sent to uncore
	0x2 extra: demand_code_rd Cacheable and noncacheable code read requests
	0x4 extra: demand_rfo Demand RFO requests including regular RFOs, locks, ItoM
	0x8 extra: all_data_rd Demand and prefetch data reads
name:uops_executed type:exclusive default:0x2
	0x2 extra: core Number of uops executed on the core. Errata: HSM31
	0x1 extra:cmask=1,inv stall_cycles Counts number of cycles no uops were dispatched to be executed on this thread.
	0x1 extra:cmask=1 cycles_ge_1_uops_exec This event counts the cycles where at least one uop was executed. It is counted per thread. Errata: HSM31
	0x1 extra:cmask=2 cycles_ge_2_uops_exec This event counts the cycles where at least two uops were executed. It is counted per thread. Errata: HSM31
	0x1 extra:cmask=3 cycles_ge_3_uops_exec This event counts the cycles where at least three uops were executed. It is counted per thread. Errata: HSM31
	0x1 extra:cmask=4 cycles_ge_4_uops_exec Cycles where at least 4 uops were executed per-thread Errata: HSM31
name:page_walker_loads type:exclusive default:0x11
	0x11 extra: dtlb_l1 Number of DTLB page walker hits in the L1+FB
	0x21 extra: itlb_l1 Number of ITLB page walker hits in the L1+FB
	0x41 extra: ept_dtlb_l1 Counts the number of Extended Page Table walks from the DTLB that hit in the L1 and FB.
	0x81 extra: ept_itlb_l1 Counts the number of Extended Page Table walks from the ITLB that hit in the L1 and FB.
	0x12 extra: dtlb_l2 Number of DTLB page walker hits in the L2
	0x22 extra: itlb_l2 Number of ITLB page walker hits in the L2
	0x42 extra: ept_dtlb_l2 Counts the number of Extended Page Table walks from the DTLB that hit in the L2.
	0x82 extra: ept_itlb_l2 Counts the number of Extended Page Table walks from the ITLB that hit in the L2.
	0x14 extra: dtlb_l3 Number of DTLB page walker hits in the L3 + XSNP
	0x24 extra: itlb_l3 Number of ITLB page walker hits in the L3 + XSNP
	0x44 extra: ept_dtlb_l3 Counts the number of Extended Page Table walks from the DTLB that hit in the L3.
	0x84 extra: ept_itlb_l3 Counts the number of Extended Page Table walks from the ITLB that hit in the L3.
	0x18 extra: dtlb_memory Number of DTLB page walker hits in Memory
	0x48 extra: ept_dtlb_memory Counts the number of Extended Page Table walks from the DTLB that hit in memory.
	0x88 extra: ept_itlb_memory Counts the number of Extended Page Table walks from the ITLB that hit in memory.
name:tlb_flush type:exclusive default:0x1
	0x1 extra: dtlb_thread DTLB flush attempts of the thread-specific entries
	0x20 extra: stlb_any STLB flush attempts
name:other_assists type:exclusive default:0x8
	0x8 extra: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable. Errata: HSM57
	0x10 extra: sse_to_avx Number of transitions from SSE to AVX-256 when penalty applicable. Errata: HSM57
	0x40 extra: any_wb_assist Number of times any microcode assist is invoked by HW upon uop writeback.
name:uops_retired type:exclusive default:all
	0x1 extra: all Actually retired uops.
	0x1 extra: all_pebs Actually retired uops.
	0x2 extra: retire_slots This event counts the number of retirement slots used each cycle. There are potentially 4 slots that can be used each cycle - meaning, 4 uops or 4 instructions could retire each cycle.
	0x2 extra: retire_slots_pebs This event counts the number of retirement slots used each cycle. There are potentially 4 slots that can be used each cycle - meaning, 4 uops or 4 instructions could retire each cycle.
	0x1 extra:cmask=1,inv stall_cycles Cycles without actually retired uops.
	0x1 extra:cmask=a,inv total_cycles Cycles with less than 10 actually retired uops.
	0x1 extra:cmask=1,inv core_stall_cycles Cycles without actually retired uops.
name:machine_clears type:exclusive default:cycles
	0x1 extra: cycles Cycles there was a Nuke. Account for both thread-specific and All Thread Nukes.
	0x2 extra: memory_ordering This event counts the number of memory ordering machine clears detected. Memory ordering machine clears can result from memory address aliasing or snoops from another hardware thread or core to data inflight in the pipeline. Machine clears can have a significant performance impact if they are happening frequently.
	0x4 extra: smc This event is incremented when self-modifying code (SMC) is detected, which causes a machine clear. Machine clears can have a significant performance impact if they are happening frequently.
	0x20 extra: maskmov This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0.
	0x1 extra:cmask=1,edge count Number of machine clears (nukes) of any type.
name:br_inst_retired type:exclusive default:conditional
	0x1 extra: conditional Conditional branch instructions retired.
	0x1 extra: conditional_pebs Conditional branch instructions retired.
	0x2 extra: near_call Direct and indirect near call instructions retired.
	0x2 extra: near_call_pebs Direct and indirect near call instructions retired.
	0x8 extra: near_return Return instructions retired.
	0x8 extra: near_return_pebs Return instructions retired.
	0x10 extra: not_taken Not taken branch instructions retired.
	0x20 extra: near_taken Taken branch instructions retired.
	0x20 extra: near_taken_pebs Taken branch instructions retired.
	0x40 extra: far_branch Far branch instructions retired.
	0x4 extra:pebs all_branches_pebs All (macro) branch instructions retired.
name:br_misp_retired type:exclusive default:conditional
	0x1 extra: conditional Mispredicted conditional branch instructions retired.
	0x1 extra: conditional_pebs Mispredicted conditional branch instructions retired.
	0x4 extra:pebs all_branches_pebs This event counts all mispredicted branch instructions retired. This is a precise event.
	0x20 extra: near_taken number of near branch instructions retired that were mispredicted and taken.
	0x20 extra: near_taken_pebs number of near branch instructions retired that were mispredicted and taken.
name:hle_retired type:exclusive default:0x1
	0x1 extra: start Number of times an HLE execution started.
	0x2 extra: commit Number of times an HLE execution successfully committed
	0x4 extra: aborted Number of times an HLE execution aborted due to any reasons (multiple categories may count as one).
	0x4 extra: aborted_pebs Number of times an HLE execution aborted due to any reasons (multiple categories may count as one).
	0x8 extra: aborted_misc1 Number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts).
	0x10 extra: aborted_misc2 Number of times an HLE execution aborted due to uncommon conditions
	0x20 extra: aborted_misc3 Number of times an HLE execution aborted due to HLE-unfriendly instructions
	0x40 extra: aborted_misc4 Number of times an HLE execution aborted due to incompatible memory type
	0x80 extra: aborted_misc5 Number of times an HLE execution aborted due to none of the previous 4 categories (e.g. interrupts)
name:rtm_retired type:exclusive default:0x1
	0x1 extra: start Number of times an RTM execution started.
	0x2 extra: commit Number of times an RTM execution successfully committed
	0x4 extra: aborted Number of times an RTM execution aborted due to any reasons (multiple categories may count as one).
	0x4 extra: aborted_pebs Number of times an RTM execution aborted due to any reasons (multiple categories may count as one).
	0x8 extra: aborted_misc1 Number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts)
	0x10 extra: aborted_misc2 Number of times an RTM execution aborted due to various memory events (e.g., read/write capacity and conflicts).
	0x20 extra: aborted_misc3 Number of times an RTM execution aborted due to HLE-unfriendly instructions
	0x40 extra: aborted_misc4 Number of times an RTM execution aborted due to incompatible memory type
	0x80 extra: aborted_misc5 Number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt)
name:fp_assist type:exclusive default:0x1e
	0x1e extra:cmask=1 any Cycles with any input/output SSE or FP assist
	0x2 extra: x87_output Number of X87 assists due to output value.
	0x4 extra: x87_input Number of X87 assists due to input value.
	0x8 extra: simd_output Number of SIMD FP assists due to Output values
	0x10 extra: simd_input Number of SIMD FP assists due to input values
name:mem_uops_retired type:exclusive default:stlb_miss_loads
	0x11 extra: stlb_miss_loads Load uops with true STLB miss retired to architected path. Errata: HSM30
	0x11 extra: stlb_miss_loads_pebs Load uops with true STLB miss retired to architected path. Errata: HSM30
	0x12 extra: stlb_miss_stores Store uops with true STLB miss retired to architected path. Errata: HSM30
	0x12 extra: stlb_miss_stores_pebs Store uops with true STLB miss retired to architected path. Errata: HSM30
	0x21 extra: lock_loads Load uops with locked access retired to architected path. Errata: HSM30
	0x21 extra: lock_loads_pebs Load uops with locked access retired to architected path. Errata: HSM30
	0x41 extra: split_loads Line-split load uops retired to architected path. Errata: HSM30
	0x41 extra: split_loads_pebs Line-split load uops retired to architected path. Errata: HSM30
	0x42 extra: split_stores Line-split store uops retired to architected path. Errata: HSM30
	0x42 extra: split_stores_pebs Line-split store uops retired to architected path. Errata: HSM30
	0x81 extra: all_loads Load uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
	0x81 extra: all_loads_pebs Load uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
	0x82 extra: all_stores Store uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
	0x82 extra: all_stores_pebs Store uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
name:mem_load_uops_retired type:exclusive default:l1_hit
	0x1 extra: l1_hit Retired load uops with L1 cache hits as data sources. Errata: HSM30
	0x1 extra: l1_hit_pebs Retired load uops with L1 cache hits as data sources. Errata: HSM30
	0x2 extra: l2_hit Retired load uops with L2 cache hits as data sources. Errata: HSM30
	0x2 extra: l2_hit_pebs Retired load uops with L2 cache hits as data sources. Errata: HSM30
	0x4 extra: l3_hit Retired load uops which data sources were data hits in L3 without snoops required. Errata: HSM26, HSM30
	0x4 extra: l3_hit_pebs Retired load uops which data sources were data hits in L3 without snoops required. Errata: HSM26, HSM30
	0x8 extra: l1_miss Retired load uops misses in L1 cache as data sources. Errata: HSM30
	0x8 extra: l1_miss_pebs Retired load uops misses in L1 cache as data sources. Errata: HSM30
	0x10 extra: l2_miss Miss in mid-level (L2) cache. Excludes Unknown data-source. Errata: HSM30
	0x10 extra: l2_miss_pebs Miss in mid-level (L2) cache. Excludes Unknown data-source. Errata: HSM30
	0x20 extra: l3_miss Miss in last-level (L3) cache. Excludes Unknown data-source. Errata: HSM26, HSM30
	0x20 extra: l3_miss_pebs Miss in last-level (L3) cache. Excludes Unknown data-source. Errata: HSM26, HSM30
	0x40 extra: hit_lfb Retired load uops which data sources were load uops missed L1 but hit FB due to preceding miss to the same cache line with data not ready. Errata: HSM30
	0x40 extra: hit_lfb_pebs Retired load uops which data sources were load uops missed L1 but hit FB due to preceding miss to the same cache line with data not ready. Errata: HSM30
name:mem_load_uops_l3_hit_retired type:exclusive default:xsnp_miss
	0x1 extra: xsnp_miss Retired load uops which data sources were L3 hit and cross-core snoop missed in on-pkg core cache. Errata: HSM26, HSM30
	0x1 extra: xsnp_miss_pebs Retired load uops which data sources were L3 hit and cross-core snoop missed in on-pkg core cache. Errata: HSM26, HSM30
	0x2 extra: xsnp_hit Retired load uops which data sources were L3 and cross-core snoop hits in on-pkg core cache. Errata: HSM26, HSM30
	0x2 extra: xsnp_hit_pebs Retired load uops which data sources were L3 and cross-core snoop hits in on-pkg core cache. Errata: HSM26, HSM30
	0x4 extra: xsnp_hitm Retired load uops which data sources were HitM responses from shared L3. Errata: HSM26, HSM30
	0x4 extra: xsnp_hitm_pebs Retired load uops which data sources were HitM responses from shared L3. Errata: HSM26, HSM30
	0x8 extra: xsnp_none Retired load uops which data sources were hits in L3 without snoops required. Errata: HSM26, HSM30
	0x8 extra: xsnp_none_pebs Retired load uops which data sources were hits in L3 without snoops required. Errata: HSM26, HSM30
name:mem_load_uops_l3_miss_retired type:exclusive default:local_dram
	0x1 extra: local_dram This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches. Errata: HSM30
	0x1 extra: local_dram_pebs This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches. Errata: HSM30
name:l2_trans type:exclusive default:0x80
	0x80 extra: all_requests Transactions accessing L2 pipe
	0x1 extra: demand_data_rd Demand Data Read requests that access L2 cache
	0x2 extra: rfo RFO requests that access L2 cache
	0x4 extra: code_rd L2 cache accesses when fetching instructions
	0x8 extra: all_pf L2 or L3 HW prefetches that access L2 cache
	0x10 extra: l1d_wb L1D writebacks that access L2 cache
	0x20 extra: l2_fill L2 fill requests that access L2 cache
	0x40 extra: l2_wb L2 writebacks that access L2 cache
name:l2_lines_in type:exclusive default:0x7
	0x7 extra: all This event counts the number of L2 cache lines brought into the L2 cache. Lines are filled into the L2 cache when there was an L2 miss.
	0x1 extra: i L2 cache lines in I state filling L2
	0x2 extra: s L2 cache lines in S state filling L2
	0x4 extra: e L2 cache lines in E state filling L2
name:l2_lines_out type:exclusive default:0x5
	0x5 extra: demand_clean Clean L2 cache lines evicted by demand
	0x6 extra: demand_dirty Dirty L2 cache lines evicted by demand