| Akaros Profiling |
| =========================== |
| |
| Contents: |
| |
| (*) Perf |
| - Setup |
| - Basic Example |
| - More Complicated Examples |
| - Differences From Linux |
| |
| (*) mpstat |
| |
| |
| =========================== |
| PERF |
| =========================== |
| Akaros has limited support for perf_events. perf is a tool which utilizes CPU |
| performance counters for performance monitoring and troubleshooting. |
| |
| Akaros has its own version of perf, similar in spirit to Linux's perf, that |
| produces PERFILE2 ABI compliant perf.data files (if not, file a bug!). The |
| kernel generates traces under the direction of perf. You then copy the traces |
| to a Linux host and process them using Linux's perf. |
| |
| |
| SETUP |
| -------------------- |
| To build Akaros's perf directly: |
| |
| (linux) $ cd tools/dev-libs/elfutils ; make install; cd - |
| (linux) $ cd tools/dev-util/perf ; make install; cd - |
| |
| Or to build it along with all apps: |
| |
| (linux) $ make apps-install |
| |
| You will also need a suitably recent Linux perf to report on the data |
| (something that understands the PERFILE2 format). Unpatched Linux 4.5 perf did |
| the trick. You'll also want libelf, and maybe other libraries, on your Linux |
| machine. |
| |
| First, install libelf according to your distro. On Ubuntu: |
| |
| (linux) $ sudo apt-get install libelf-dev |
| |
| Then try installing perf from your Linux distro, along with any needed |
| dependencies. On Ubuntu, you can install linux-tools-common and whatever else |
| it asks for (something particular to your host kernel). |
| |
| Linux perf changes a lot. Newer versions are usually nicer. I recommend |
| building one of them: Download Linux source, then |
| |
| (linux) $ cd tools/perf/ |
| (linux) $ make |
| |
| Then use your new perf binary. All of this is just installing a recent perf; it |
| has little to do with Akaros at this point. If you run into incompatibilities |
| between our perf.data format and the latest Linux, file a bug. |
| |
| |
| BASIC EXAMPLE |
| -------------------- |
| Perf on Akaros supports record, stat, and a few custom options. |
| |
| You should be able to do the following: |
| |
| / $ perf record ls |
| |
| Then copy perf.data to your Linux machine and run the report there: |
| |
| (linux) $ scp AKAROS_MACHINE:perf.data . |
| (linux) $ perf report --kallsyms=obj/kern/ksyms.map --symfs=kern/kfs/ |
| |
| By default, perf looks on your host machine for the kernel symbol table and for |
| binaries. The --kallsyms and --symfs options override those defaults so that |
| perf uses the Akaros kernel symbols and binaries instead. |
| |
| It can be a hassle to type out the kallsyms and symfs, so we have a script that |
| will automate that. Use scripts/perf in any place that you'd normally use |
| perf. Set your $AKAROS_ROOT (default is ".") and optionally override $PERF_CMD |
| (default is "perf"). For most people, this will just be: |
| |
| (linux) $ ./scripts/perf report |
| |
| The perf.data file is implied, so the above command is equivalent to: |
| |
| (linux) $ ./scripts/perf report -i perf.data |
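
As a rough illustration of what such a wrapper has to do, here is a minimal Python sketch. It is not the actual scripts/perf; the function name build_perf_argv is hypothetical, and the choice to add the flags only for "report" is an assumption. It just combines $AKAROS_ROOT and $PERF_CMD with the --kallsyms and --symfs paths shown above.

```python
import os


def build_perf_argv(args, env=None):
    """Return the argv a wrapper might exec for `perf <args>`.

    Hypothetical helper for illustration only; the real scripts/perf
    may handle more subcommands and options.
    """
    env = os.environ if env is None else env
    root = env.get("AKAROS_ROOT", ".")   # default documented above
    perf = env.get("PERF_CMD", "perf")   # default documented above
    argv = [perf] + list(args)
    # Assumption: only "report" needs the symbol overrides.
    if args and args[0] == "report":
        argv.append("--kallsyms=" + os.path.join(root, "obj/kern/ksyms.map"))
        argv.append("--symfs=" + os.path.join(root, "kern/kfs/"))
    return argv


if __name__ == "__main__":
    print(" ".join(build_perf_argv(["report"], env={})))
```

Passing env={} exercises the defaults, so the sketch reproduces the long-form report command from the previous example with "./" prepended to each path.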
| |
| |
| MORE COMPLICATED EXAMPLES |
| -------------------- |
| First, try perf --help for usage. Then check out |
| https://perf.wiki.kernel.org/index.php/Tutorial. We strive to be mostly |
| compatible with the usage of Linux perf. |
| |
| perf stat runs a command and reports the count of events during the run of the |
| command. perf record runs a command and outputs perf.data, which contains |
| backtrace samples from when the event counters overflowed. For those familiar |
| with other perfmon systems, perf stat is like PAPI and perf record is like |
| Oprofile. |
| |
| perf record and stat both track a set of events with the -e flag. -e takes a |
| comma-separated list of events. Events can be expressed in one of three forms: |
| |
| - Generic events (called "pre-defined" events on Linux) |
| - Libpfm events |
| - Raw events |
| |
| Linux's perf only takes Generic and Raw events, so libpfm4 support is an added |
| bonus. |
| |
| Generic events consist of strings like "cycles" or "cache-misses". Raw events |
| are simple strings of the form "rXXX", where the X's are hex nibbles. The hex |
| codes are passed directly to the PMU. You can actually have 2-4 Xs on Akaros. |
| |
| Libpfm events are strings that correspond to events specific to your machine. |
| Libpfm knows about PMU events for a given machine. It figures out what machine |
| perf is running on and selects events that should be available. Check out |
| http://perfmon2.sourceforge.net/ for more info. |
| |
| To see the list of events available, use `perf list [regex]`, supplying an |
| optional search regex. For example, on a Haswell: |
| |
| / $ perf list unhalted_reference_cycles |
| #----------------------------- |
| IDX : 37748738 |
| PMU name : ix86arch (Intel X86 architectural PMU) |
| Name : UNHALTED_REFERENCE_CYCLES |
| Equiv : None |
| Flags : None |
| Desc : count reference clock cycles while the clock signal on the specific core is running. The reference clock operates at a fixed frequency, irrespective of core frequency changes due to performance state transitions |
| Code : 0x13c |
| Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean) |
| Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean) |
| Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean) |
| Modif-03 : 0x03 : PMU : [i] : invert (boolean) |
| Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer) |
| Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean) |
| #----------------------------- |
| IDX : 322961409 |
| PMU name : hsw_ep (Intel Haswell EP) |
| Name : UNHALTED_REFERENCE_CYCLES |
| Equiv : None |
| Flags : None |
| Desc : Unhalted reference cycles |
| Code : 0x300 |
| Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean) |
| Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean) |
| Modif-02 : 0x05 : PMU : [t] : measure any thread (boolean) |
| |
| There are two different events for UNHALTED_REFERENCE_CYCLES (case |
| insensitive). libpfm will select the most appropriate one. You can override |
| this selection by specifying a PMU: |
| |
| / $ perf stat -e ix86arch::UNHALTED_REFERENCE_CYCLES ls |
| |
| Here's how to specify multiple events: |
| |
| / $ perf record -e cycles,instructions ls |
| |
| Events also take a set of modifiers. For instance, you can specify running |
| counters only in kernel mode or user mode. Modifiers are separated by a ':'. |
| |
| This will track only user cycles (default is user and kernel): |
| |
| / $ perf record -e cycles:u ls |
| |
| To use a raw event, you need to know the event number. You can either look in |
| your favorite copy of the SDM, or you can ask libpfm. Though if you ask |
| libpfm, you might as well just use its string processing. For example: |
| |
| / $ perf list FLUSH |
| #----------------------------- |
| IDX : 322961462 |
| PMU name : hsw_ep (Intel Haswell EP) |
| Name : TLB_FLUSH |
| Equiv : None |
| Flags : None |
| Desc : TLB flushes |
| Code : 0xbd |
| Umask-00 : 0x01 : PMU : [DTLB_THREAD] : None : Count number of DTLB flushes of thread-specific entries |
| Umask-01 : 0x20 : PMU : [STLB_ANY] : None : Count number of any STLB flushes |
| Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean) |
| Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean) |
| Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean) |
| Modif-03 : 0x03 : PMU : [i] : invert (boolean) |
| Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer) |
| Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean) |
| Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean) |
| Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean) |
| |
| The raw code is 0xbd. So the following are equivalent (but slightly buggy!): |
| |
| / $ perf stat -e TLB_FLUSH ls |
| / $ perf stat -e rbd ls |
| |
| If you actually run those, rbd will have zero hits, and TLB_FLUSH will give you |
| the error "Failed to parse event string TLB_FLUSH". |
| |
| Some events are rather particular about their Umasks, and TLB_FLUSH is one of |
| them. TLB_FLUSH wants a Umask. Umasks are selectors for specific sub-types of |
| events. In the case of TLB_FLUSH, we can choose between DTLB_THREAD and |
| STLB_ANY. Umasks are not always required - they just happen to be on my |
| Haswell for TLB_FLUSH. That being said, we can ask for the event like so: |
| |
| / $ perf stat -e TLB_FLUSH:STLB_ANY ls |
| / $ perf stat -e r20bd ls |
| |
| Note that the Umask is placed before the Code. These 16 bits are passed |
| directly to the PMU, and on Intel the format is "umask:event". |
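
The encoding above can be sketched in a few lines of Python. This is only an illustration of the "umask:event" layout described here, not code from Akaros perf; the helper name raw_event is made up for the example.

```python
def raw_event(code, umask=0):
    """Build an "rXXXX" raw event string.

    The umask goes in the high byte and the event code in the low byte,
    matching the Intel "umask:event" layout described above.
    """
    if not 0 <= code <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("code and umask must each fit in one byte")
    value = (umask << 8) | code
    # Two hex digits are enough without a umask; otherwise emit four.
    if umask == 0:
        return "r{:02x}".format(value)
    return "r{:04x}".format(value)


if __name__ == "__main__":
    print(raw_event(0xBD, 0x20))  # TLB_FLUSH:STLB_ANY
    print(raw_event(0xBD))        # TLB_FLUSH with no umask
```

With the Code and Umask values from the perf list output, raw_event(0xBD, 0x20) yields the same r20bd string used in the example above.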
| |
| perf record is based on recording samples when event counters overflow. The |
| number of events required to trigger a sample is referred to as the |
| sample_period. You can set it with -c, e.g. |
| |
| / $ perf record -c 10000 ls |
| |
| |
| DIFFERENCES FROM LINUX |
| -------------------- |
| For the most part, Akaros perf is similar to Linux. A few things are |
| different. |
| |
| The biggest difference is that our perf does not follow processes around. We |
| count events for cores, not processes. You can specify certain cores, but not |
| certain processes. Any options related to tracking specific processes are |
| unsupported. |
| |
| The -F option (frequency) is loosely supported. The kernel cannot adjust the |
| sampling count dynamically to meet a certain frequency. Instead, we guess |
| that -F is used with cycles, and pick a sample period that will generate |
| samples at the desired frequency if the core is unhalted. YMMV. |
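
The arithmetic behind that guess is roughly the following. This is a sketch under stated assumptions, not the code Akaros perf uses: it assumes the event is cycles, that the core's clock rate is known, and that the core never halts. The function name and the 2.5 GHz figure are illustrative only.

```python
def period_for_frequency(core_hz, samples_per_sec):
    """Pick a fixed sample period approximating a -F style frequency.

    Assumes the counted event is cycles and the core stays unhalted,
    so cycles per second equals the core clock rate.
    """
    if core_hz <= 0 or samples_per_sec <= 0:
        raise ValueError("core_hz and samples_per_sec must be positive")
    # One sample every (cycles per second / samples per second) cycles.
    return max(1, core_hz // samples_per_sec)


if __name__ == "__main__":
    # Hypothetical 2.5 GHz core, 1000 samples per second requested.
    print(period_for_frequency(2500000000, 1000))
```

If the core spends time halted, fewer cycles elapse per second than assumed and the real sample rate drops below the requested one, which is why the result is only approximate.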
| |
| Akaros currently supports only PMU events. In the future, we may add events |
| like context-switches. |
| |
| |
| =========================== |
| mpstat |
| =========================== |
| Akaros has basic support for mpstat. mpstat gives a high-level glance at where |
| each core is spending its time. |
| |
| For starters, bind kprof somewhere. The basic ifconfig script binds it to |
| /prof. |
| |
| To see the CPU usage, cat mpstat: |
| |
| / $ cat /prof/mpstat |
| CPU: irq kern user idle |
| 0: 1.707136 ( 0%), 24.978659 ( 0%), 0.162845 ( 0%), 13856.233909 ( 99%) |
| |
| To reset the count: |
| |
| / $ echo reset > /prof/mpstat |
| |
| To see the output for a particular command: |
| |
| / $ echo reset > /prof/mpstat ; COMMAND ; cat /prof/mpstat |
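
If you copy the mpstat output to another machine, a short script can turn it into per-core busy percentages. The Python sketch below is not part of Akaros. It assumes the row format shown in the sample output above (a core number, then four "value ( pct%)" pairs in irq, kern, user, idle order), and the function names are made up for the example.

```python
import re

FIELDS = ("irq", "kern", "user", "idle")
ROW = re.compile(r"^\s*(\d+):(.*)$")
VALUE = re.compile(r"(\d+(?:\.\d+)?)\s*\(\s*\d+%\)")


def parse_mpstat(text):
    """Map core number -> {field: time value} for each data row."""
    cores = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header line or blank line
        values = [float(v) for v in VALUE.findall(m.group(2))]
        if len(values) == len(FIELDS):
            cores[int(m.group(1))] = dict(zip(FIELDS, values))
    return cores


def busy_percent(times):
    """Percentage of accounted time that was not idle."""
    total = sum(times.values())
    if total == 0:
        return 0.0
    return 100.0 * (total - times["idle"]) / total


if __name__ == "__main__":
    sample = (
        "CPU: irq kern user idle\n"
        "0: 1.707136 ( 0%), 24.978659 ( 0%), 0.162845 ( 0%), 13856.233909 ( 99%)\n"
    )
    for core, times in sorted(parse_mpstat(sample).items()):
        print("core {}: {:.3f}% busy".format(core, busy_percent(times)))
```

Computing the percentage from the raw values gives more precision than the whole-number percentages that mpstat prints, which is useful when a core is almost entirely idle.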