当前位置: 首页>后端>正文

常用perf命令例子

出自:https://www.brendangregg.com/perf.html

简介

perf是linux c不可缺少的性能分析工具

它是使用方法可以用一本几百页的书本来描述

对于使用者来说,如果有现成的常用命令,可以更便利的利用perf提供的功能

并在对这些常用命令的使用过程中,慢慢深入了解perf的机制和Linux的性能几何

基本命令介绍

  • perf

    perf命令支持以下的选项

    一般来说常用的:

    • list:列出可检测的事件
    • record:收集事件的数据
    • stat:打印进程的性能数据
    • report:报告性能的数据
    # perf
    
     usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
    
     The most commonly used perf commands are:
       annotate        Read perf.data (created by perf record) and display annotated code
       archive         Create archive with object files with build-ids found in perf.data file
       bench           General framework for benchmark suites
       buildid-cache   Manage build-id cache.
       buildid-list    List the buildids in a perf.data file
       config          Get and set variables in a configuration file.
       data            Data file related processing
       diff            Read perf.data files and display the differential profile
       evlist          List the event names in a perf.data file
       inject          Filter to augment the events stream with additional information
       kmem            Tool to trace/measure kernel memory properties
       kvm             Tool to trace/measure kvm guest os
       list            List all symbolic event types
       lock            Analyze lock events
       mem             Profile memory accesses
       record          Run a command and record its profile into perf.data
       report          Read perf.data (created by perf record) and display the profile
       sched           Tool to trace/measure scheduler properties (latencies)
       script          Read perf.data (created by perf record) and display trace output
       stat            Run a command and gather performance counter statistics
       test            Runs sanity tests.
       timechart       Tool to visualize total system behavior during a workload
       top             System profiling tool.
       probe           Define new dynamic tracepoints
       trace           strace inspired tool
    
     See 'perf help COMMAND' for more information on a specific command.
    
  • perf list

    列出所有可检测的事件

    root@ubuntu:~# perf list|wc
       2056    7433  135920
    

    perf可以支持2k中不同事件的检测

    # Listing all currently known events:
    # perf list
    List of pre-defined events (to be used in -e):
      cpu-cycles OR cycles                               [Hardware event]
      instructions                                       [Hardware event]
      cache-references                                   [Hardware event]
      cache-misses                                       [Hardware event]
      branch-instructions OR branches                    [Hardware event]
      branch-misses                                      [Hardware event]
      bus-cycles                                         [Hardware event]
      stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
      stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
      ref-cycles                                         [Hardware event]
      cpu-clock                                          [Software event]
      task-clock                                         [Software event]
      page-faults OR faults                              [Software event]
      context-switches OR cs                             [Software event]
      cpu-migrations OR migrations                       [Software event]
      minor-faults                                       [Software event]
      major-faults                                       [Software event]
      alignment-faults                                   [Software event]
      emulation-faults                                   [Software event]
      L1-dcache-loads                                    [Hardware cache event]
      L1-dcache-load-misses                              [Hardware cache event]
      L1-dcache-stores                                   [Hardware cache event]
    [...]
      rNNN                                               [Raw hardware event descriptor]
      cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
       (see 'man perf-list' on how to encode it)
      mem:<addr>[:access]                                [Hardware breakpoint]
      probe:tcp_sendmsg                                  [Tracepoint event]
    [...]
      sched:sched_process_exec                           [Tracepoint event]
      sched:sched_process_fork                           [Tracepoint event]
      sched:sched_process_wait                           [Tracepoint event]
      sched:sched_wait_task                              [Tracepoint event]
      sched:sched_process_exit                           [Tracepoint event]
    [...]
    

    从上面看,perf可以检测以下几个重点内容

    • cpu 对指令的执行的性能指标
    • CPU cache利用的性能指标
    • 缺页中断的情况
    • 上下文切换的情况
    • 进程的调度情况
  • perf stat

# perf stat gzip file1

 Performance counter stats for 'gzip file1':

       1920.159821 task-clock                #    0.991 CPUs utilized          
                13 context-switches          #    0.007 K/sec                  
                 0 CPU-migrations            #    0.000 K/sec                  
               258 page-faults               #    0.134 K/sec                  
     5,649,595,479 cycles                    #    2.942 GHz                     [83.43%]
     1,808,339,931 stalled-cycles-frontend   #   32.01% frontend cycles idle    [83.54%]
     1,171,884,577 stalled-cycles-backend    #   20.74% backend  cycles idle    [66.77%]
     8,625,207,199 instructions              #    1.53  insns per cycle        
                                             #    0.21  stalled cycles per insn [83.51%]
     1,488,797,176 branches                  #  775.351 M/sec                   [82.58%]
        53,395,139 branch-misses             #    3.59% of all branches         [83.78%]

       1.936842598 seconds time elapsed

通过perf stat 可以把进程或者系统的感兴趣事件统计下来,并进行下一步的分析

例如统计gzip命令的L1 cache的利用率

确认locality的好坏

# perf list | grep L1-dcache
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-dcache-store-misses                             [Hardware cache event]
  L1-dcache-prefetches                               [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
# perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores gzip file1

 Performance counter stats for 'gzip file1':

     1,947,551,657 L1-dcache-loads
                                            
       153,829,652 L1-dcache-misses
         #    7.90% of all L1-dcache hits  
     1,171,475,286 L1-dcache-stores
                                           

       1.538038091 seconds time elapsed
  • perf record

    收集函数栈的数据

    并发现耗时长的函数

    # perf record -F 99 -a -g -- sleep 30
    [ perf record: Woken up 9 times to write data ]
    [ perf record: Captured and wrote 3.135 MB perf.data (~136971 samples) ]
    # ls -lh perf.data
    -rw------- 1 root root 3.2M Jan 26 07:26 perf.data
    

    通过report分析perf.data

    # perf report --stdio
    # ========
    # captured on: Mon Jan 26 07:26:40 2014
    # hostname : dev2
    # os release : 3.8.6-ubuntu-12-opt
    # perf version : 3.8.6
    # arch : x86_64
    # nrcpus online : 8
    # nrcpus avail : 8
    # cpudesc : Intel(R) Xeon(R) CPU X5675 @ 3.07GHz
    # cpuid : GenuineIntel,6,44,2
    # total memory : 8182008 kB
    # cmdline : /usr/bin/perf record -F 99 -a -g -- sleep 30 
    # event : name = cpu-clock, type = 1, config = 0x0, config1 = 0x0, config2 = ...
    # HEADER_CPU_TOPOLOGY info available, use -I to display
    # HEADER_NUMA_TOPOLOGY info available, use -I to display
    # pmu mappings: software = 1, breakpoint = 5
    # ========
    #
    # Samples: 22K of event 'cpu-clock'
    # Event count (approx.): 22751
    #
    # Overhead  Command      Shared Object                           Symbol
    # ........  .......  .................  ...............................
    #
        94.12%       dd  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
                     |
                     --- _raw_spin_unlock_irqrestore
                        |          
                        |--96.67%-- extract_buf
                        |          extract_entropy_user
                        |          urandom_read
                        |          vfs_read
                        |          sys_read
                        |          system_call_fastpath
                        |          read
                        |          
                        |--1.69%-- account
                        |          |          
                        |          |--99.72%-- extract_entropy_user
                        |          |          urandom_read
                        |          |          vfs_read
                        |          |          sys_read
                        |          |          system_call_fastpath
                        |          |          read
                        |           --0.28%-- [...]
                        |          
                        |--1.60%-- mix_pool_bytes.constprop.17
    [...]
    

高频命令列表

  • list 事件

    # Listing all currently known events:
    perf list
    
    # Listing sched tracepoints:
    perf list 'sched:*'
    
  • 统计事件

    # CPU counter statistics for the specified command:
    perf stat command
    
    # Detailed CPU counter statistics (includes extras) for the specified command:
    perf stat -d command
    
    # CPU counter statistics for the specified PID, until Ctrl-C:
    perf stat -p PID
    
    # CPU counter statistics for the entire system, for 5 seconds:
    perf stat -a sleep 5
    
    # Various basic CPU statistics, system wide, for 10 seconds:
    perf stat -e cycles,instructions,cache-references,cache-misses,bus-cycles -a sleep 10
    
    # Various CPU level 1 data cache statistics for the specified command:
    perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores command
    
    # Various CPU data TLB statistics for the specified command:
    perf stat -e dTLB-loads,dTLB-load-misses,dTLB-prefetch-misses command
    
    # Various CPU last level cache statistics for the specified command:
    perf stat -e LLC-loads,LLC-load-misses,LLC-stores,LLC-prefetches command
    
    # Using raw PMC counters, eg, counting unhalted core cycles:
    perf stat -e r003c -a sleep 5 
    
    # PMCs: counting cycles and frontend stalls via raw specification:
    perf stat -e cycles -e cpu/event=0x0e,umask=0x01,inv,cmask=0x01/ -a sleep 5
    
    # Count syscalls per-second system-wide:
    perf stat -e raw_syscalls:sys_enter -I 1000 -a
    
    # Count system calls by type for the specified PID, until Ctrl-C:
    perf stat -e 'syscalls:sys_enter_*' -p PID
    
    # Count system calls by type for the entire system, for 5 seconds:
    perf stat -e 'syscalls:sys_enter_*' -a sleep 5
    
    # Count scheduler events for the specified PID, until Ctrl-C:
    perf stat -e 'sched:*' -p PID
    
    # Count scheduler events for the specified PID, for 10 seconds:
    perf stat -e 'sched:*' -p PID sleep 10
    
    # Count ext4 events for the entire system, for 10 seconds:
    perf stat -e 'ext4:*' -a sleep 10
    
    # Count block device I/O events for the entire system, for 10 seconds:
    perf stat -e 'block:*' -a sleep 10
    
    # Count all vmscan events, printing a report every second:
    perf stat -e 'vmscan:*' -a -I 1000
    
  • 性能分析

    # Sample on-CPU functions for the specified command, at 99 Hertz:
    perf record -F 99 command
    
    # Sample on-CPU functions for the specified PID, at 99 Hertz, until Ctrl-C:
    perf record -F 99 -p PID
    
    # Sample on-CPU functions for the specified PID, at 99 Hertz, for 10 seconds:
    perf record -F 99 -p PID sleep 10
    
    # Sample CPU stack traces (via frame pointers) for the specified PID, at 99 Hertz, for 10 seconds:
    perf record -F 99 -p PID -g -- sleep 10
    
    # Sample CPU stack traces for the PID, using dwarf (dbg info) to unwind stacks, at 99 Hertz, for 10 seconds:
    perf record -F 99 -p PID --call-graph dwarf sleep 10
    
    # Sample CPU stack traces for the entire system, at 99 Hertz, for 10 seconds (< Linux 4.11):
    perf record -F 99 -ag -- sleep 10
    
    # Sample CPU stack traces for the entire system, at 99 Hertz, for 10 seconds (>= Linux 4.11):
    perf record -F 99 -g -- sleep 10
    
    # If the previous command didn't work, try forcing perf to use the cpu-clock event:
    perf record -F 99 -e cpu-clock -ag -- sleep 10
    
    # Sample CPU stack traces for a container identified by its /sys/fs/cgroup/perf_event cgroup:
    perf record -F 99 -e cpu-clock --cgroup=docker/1d567f4393190204...etc... -a -- sleep 10
    
    # Sample CPU stack traces for the entire system, with dwarf stacks, at 99 Hertz, for 10 seconds:
    perf record -F 99 -a --call-graph dwarf sleep 10
    
    # Sample CPU stack traces for the entire system, using last branch record for stacks, ... (>= Linux 4.?):
    perf record -F 99 -a --call-graph lbr sleep 10
    
    # Sample CPU stack traces, once every 10,000 Level 1 data cache misses, for 5 seconds:
    perf record -e L1-dcache-load-misses -c 10000 -ag -- sleep 5
    
    # Sample CPU stack traces, once every 100 last level cache misses, for 5 seconds:
    perf record -e LLC-load-misses -c 100 -ag -- sleep 5 
    
    # Sample on-CPU kernel instructions, for 5 seconds:
    perf record -e cycles:k -a -- sleep 5 
    
    # Sample on-CPU user instructions, for 5 seconds:
    perf record -e cycles:u -a -- sleep 5 
    
    # Sample on-CPU user instructions precisely (using PEBS), for 5 seconds:
    perf record -e cycles:up -a -- sleep 5 
    
    # Perform branch tracing (needs HW support), for 1 second:
    perf record -b -a sleep 1
    
    # Sample CPUs at 49 Hertz, and show top addresses and symbols, live (no perf.data file):
    perf top -F 49
    
    # Sample CPUs at 49 Hertz, and show top process names and segments, live:
    perf top -F 49 -ns comm,dso
    
  • 静态跟踪

    # Trace new processes, until Ctrl-C:
    perf record -e sched:sched_process_exec -a
    
    # Sample (take a subset of) context-switches, until Ctrl-C:
    perf record -e context-switches -a
    
    # Trace all context-switches, until Ctrl-C:
    perf record -e context-switches -c 1 -a
    
    # Include raw settings used (see: man perf_event_open):
    perf record -vv -e context-switches -a
    
    # Trace all context-switches via sched tracepoint, until Ctrl-C:
    perf record -e sched:sched_switch -a
    
    # Sample context-switches with stack traces, until Ctrl-C:
    perf record -e context-switches -ag
    
    # Sample context-switches with stack traces, for 10 seconds:
    perf record -e context-switches -ag -- sleep 10
    
    # Sample CS, stack traces, and with timestamps (< Linux 3.17, -T now default):
    perf record -e context-switches -ag -T
    
    # Sample CPU migrations, for 10 seconds:
    perf record -e migrations -a -- sleep 10
    
    # Trace all connect()s with stack traces (outbound connections), until Ctrl-C:
    perf record -e syscalls:sys_enter_connect -ag
    
    # Trace all accepts()s with stack traces (inbound connections), until Ctrl-C:
    perf record -e syscalls:sys_enter_accept* -ag
    
    # Trace all block device (disk I/O) requests with stack traces, until Ctrl-C:
    perf record -e block:block_rq_insert -ag
    
    # Sample at most 100 block device requests per second, until Ctrl-C:
    perf record -F 100 -e block:block_rq_insert -a
    
    # Trace all block device issues and completions (has timestamps), until Ctrl-C:
    perf record -e block:block_rq_issue -e block:block_rq_complete -a
    
    # Trace all block completions, of size at least 100 Kbytes, until Ctrl-C:
    perf record -e block:block_rq_complete --filter 'nr_sector > 200'
    
    # Trace all block completions, synchronous writes only, until Ctrl-C:
    perf record -e block:block_rq_complete --filter 'rwbs == "WS"'
    
    # Trace all block completions, all types of writes, until Ctrl-C:
    perf record -e block:block_rq_complete --filter 'rwbs ~ "*W*"'
    
    # Sample minor faults (RSS growth) with stack traces, until Ctrl-C:
    perf record -e minor-faults -ag
    
    # Trace all minor faults with stack traces, until Ctrl-C:
    perf record -e minor-faults -c 1 -ag
    
    # Sample page faults with stack traces, until Ctrl-C:
    perf record -e page-faults -ag
    
    # Trace all ext4 calls, and write to a non-ext4 location, until Ctrl-C:
    perf record -e 'ext4:*' -o /tmp/perf.data -a 
    
    # Trace kswapd wakeup events, until Ctrl-C:
    perf record -e vmscan:mm_vmscan_wakeup_kswapd -ag
    
    # Add Node.js USDT probes (Linux 4.10+):
    perf buildid-cache --add `which node`
    
    # Trace the node http__server__request USDT event (Linux 4.10+):
    perf record -e sdt_node:http__server__request -a
    
  • 动态跟踪

    # Add a tracepoint for the kernel tcp_sendmsg() function entry ("--add" is optional):
    perf probe --add tcp_sendmsg
    
    # Remove the tcp_sendmsg() tracepoint (or use "--del"):
    perf probe -d tcp_sendmsg
    
    # Add a tracepoint for the kernel tcp_sendmsg() function return:
    perf probe 'tcp_sendmsg%return'
    
    # Show available variables for the kernel tcp_sendmsg() function (needs debuginfo):
    perf probe -V tcp_sendmsg
    
    # Show available variables for the kernel tcp_sendmsg() function, plus external vars (needs debuginfo):
    perf probe -V tcp_sendmsg --externs
    
    # Show available line probes for tcp_sendmsg() (needs debuginfo):
    perf probe -L tcp_sendmsg
    
    # Show available variables for tcp_sendmsg() at line number 81 (needs debuginfo):
    perf probe -V tcp_sendmsg:81
    
    # Add a tracepoint for tcp_sendmsg(), with three entry argument registers (platform specific):
    perf probe 'tcp_sendmsg %ax %dx %cx'
    
    # Add a tracepoint for tcp_sendmsg(), with an alias ("bytes") for the %cx register (platform specific):
    perf probe 'tcp_sendmsg bytes=%cx'
    
    # Trace previously created probe when the bytes (alias) variable is greater than 100:
    perf record -e probe:tcp_sendmsg --filter 'bytes > 100'
    
    # Add a tracepoint for tcp_sendmsg() return, and capture the return value:
    perf probe 'tcp_sendmsg%return $retval'
    
    # Add a tracepoint for tcp_sendmsg(), and "size" entry argument (reliable, but needs debuginfo):
    perf probe 'tcp_sendmsg size'
    
    # Add a tracepoint for tcp_sendmsg(), with size and socket state (needs debuginfo):
    perf probe 'tcp_sendmsg size sk->__sk_common.skc_state'
    
    # Tell me how on Earth you would do this, but don't actually do it (needs debuginfo):
    perf probe -nv 'tcp_sendmsg size sk->__sk_common.skc_state'
    
    # Trace previous probe when size is non-zero, and state is not TCP_ESTABLISHED(1) (needs debuginfo):
    perf record -e probe:tcp_sendmsg --filter 'size > 0 && skc_state != 1' -a
    
    # Add a tracepoint for tcp_sendmsg() line 81 with local variable seglen (needs debuginfo):
    perf probe 'tcp_sendmsg:81 seglen'
    
    # Add a tracepoint for do_sys_open() with the filename as a string (needs debuginfo):
    perf probe 'do_sys_open filename:string'
    
    # Add a tracepoint for myfunc() return, and include the retval as a string:
    perf probe 'myfunc%return +0($retval):string'
    
    # Add a tracepoint for the user-level malloc() function from libc:
    perf probe -x /lib64/libc.so.6 malloc
    
    # Add a tracepoint for this user-level static probe (USDT, aka SDT event):
    perf probe -x /usr/lib64/libpthread-2.24.so %sdt_libpthread:mutex_entry
    
    # List currently available dynamic probes:
    perf probe -l
    
  • 报告

    # Show perf.data in an ncurses browser (TUI) if possible:
    perf report
    
    # Show perf.data with a column for sample count:
    perf report -n
    
    # Show perf.data as a text report, with data coalesced and percentages:
    perf report --stdio
    
    # Report, with stacks in folded format: one line per stack (needs 4.4):
    perf report --stdio -n -g folded
    
    # List all events from perf.data:
    perf script
    
    # List all perf.data events, with data header (newer kernels; was previously default):
    perf script --header
    
    # List all perf.data events, with customized fields (< Linux 4.1):
    perf script -f time,event,trace
    
    # List all perf.data events, with customized fields (>= Linux 4.1):
    perf script -F time,event,trace
    
    # List all perf.data events, with my recommended fields (needs record -a; newer kernels):
    perf script --header -F comm,pid,tid,cpu,time,event,ip,sym,dso 
    
    # List all perf.data events, with my recommended fields (needs record -a; older kernels):
    perf script -f comm,pid,tid,cpu,time,event,ip,sym,dso
    
    # Dump raw contents from perf.data as hex (for debugging):
    perf script -D
    
    # Disassemble and annotate instructions with percentages (needs some debuginfo):
    perf annotate --stdio
    

https://www.xamrdz.com/backend/35w1934368.html

相关文章: