| Akaros Profiling | 
 | =========================== | 
 |  | 
 | Contents: | 
 |  | 
 |  (*) Perf | 
 |      - Setup | 
 |      - Basic Example | 
 |      - More Complicated Examples | 
 |      - Differences From Linux | 
 |  | 
 |  (*) mpstat | 
 |  | 
 |  | 
 | =========================== | 
 | PERF | 
 | =========================== | 
 | Akaros has limited support for perf_events.  perf is a tool which utilizes CPU | 
 | performance counters for performance monitoring and troubleshooting. | 
 |  | 
 | Akaros has its own version of perf, similar in spirit to Linux's perf, that | 
 | produces PERFILE2 ABI compliant perf.data files (if not, file a bug!).  The | 
 | kernel generates traces, under the direction of perf.  You then copy the traces | 
 | to a Linux host and process them using Linux's perf. | 
 |  | 
 |  | 
 | SETUP | 
 | -------------------- | 
 | To build Akaros's perf directly: | 
 |  | 
 | (linux)$ cd tools/dev-libs/elfutils ; make install; cd - | 
 | (linux)$ cd tools/dev-util/perf ; make install; cd - | 
 |  | 
 | Or to build it along with all apps: | 
 |  | 
 | (linux)$ make apps-install | 
 |  | 
 | You will also need a suitably recent Linux perf to report the data (something | 
 | that understands the PERFILE2 format).  Unpatched Linux 4.5 perf did the | 
 | trick.  You'll also want libelf and maybe other libraries on your Linux | 
 | machine. | 
 |  | 
 | First, install libelf according to your distro.  On Ubuntu: | 
 | (linux) $ sudo apt-get install libelf-dev | 
 |  | 
 | Then try to just install perf using your Linux distro, and install any needed | 
 | dependencies.  On Ubuntu, you can install linux-tools-common and whatever else | 
 | it asks for (something particular to your host kernel). | 
 |  | 
 | Linux perf changes a lot.  Newer versions are usually nicer.  I recommend | 
 | building one of them:  Download Linux source, then | 
 |  | 
 | (linux) $ cd tools/perf/ | 
 | (linux) $ make | 
 |  | 
 | Then use your new perf binary.  All of this is just installing a recent perf - | 
 | it has little to do with Akaros at this point.  If you run into | 
 | incompatibilities between our perf.data format and the latest Linux perf, file | 
 | a bug. | 
 |  | 
 |  | 
 | BASIC EXAMPLE | 
 | -------------------- | 
 | Perf on Akaros supports record, stat, and a few custom options. | 
 |  | 
 | You should be able to do the following: | 
 |  | 
 | / $ perf record ls | 
 |  | 
 | Then scp perf.data to your Linux machine and run the report there: | 
 |  | 
 | (linux) $ scp AKAROS_MACHINE:perf.data . | 
 | (linux) $ perf report --kallsyms=obj/kern/ksyms.map --symfs=kern/kfs/ | 
 |  | 
 | Perf will look on your host machine for the kernel symbol table and for | 
 | binaries.  We need to tell it kallsyms and symfs to override those settings. | 
 |  | 
 | It can be a hassle to type out the kallsyms and symfs, so we have a script that | 
 | will automate that.  Use scripts/perf in any place that you'd normally use | 
 | perf.  Set your $AKAROS_ROOT (default is ".") and optionally override $PERF_CMD | 
 | (default is "perf").  For most people, this will just be: | 
 |  | 
 | (linux) $ ./scripts/perf report | 
 |  | 
 | The perf.data file is implied, so the above command is equivalent to: | 
 |  | 
 | (linux) $ ./scripts/perf report -i perf.data | 
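
As a rough illustration of what the wrapper saves you from typing, here is a
minimal Python sketch that builds the same command line.  It is not the actual
scripts/perf.  The function name perf_argv is made up for this example, and the
sketch assumes the extra options are only added for "report", using the paths
from the report command above relative to $AKAROS_ROOT.

```python
import os


def perf_argv(args, env=None):
    """Return the argv a wrapper like scripts/perf might run (illustrative only)."""
    env = os.environ if env is None else env
    root = env.get("AKAROS_ROOT", ".")    # default documented above
    perf = env.get("PERF_CMD", "perf")    # default documented above
    argv = [perf] + list(args)
    # Assumption: only 'report' needs the symbol overrides.
    if args and args[0] == "report":
        kallsyms = os.path.join(root, "obj/kern/ksyms.map")
        symfs = os.path.join(root, "kern/kfs/")
        argv += ["--kallsyms=" + kallsyms, "--symfs=" + symfs]
    return argv


if __name__ == "__main__":
    print(" ".join(perf_argv(["report"], env={})))
```

With no environment overrides, the sketch reproduces the long form of the
report command shown earlier, with "./" prefixed to the two paths.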
 |  | 
 |  | 
 | MORE COMPLICATED EXAMPLES | 
 | -------------------- | 
 | First, try perf --help for usage.  Then check out | 
 | https://perf.wiki.kernel.org/index.php/Tutorial.  We strive to be mostly | 
 | compatible with the usage of Linux perf. | 
 |  | 
 | perf stat runs a command and reports the count of events during the run of the | 
 | command.  perf record runs a command and outputs perf.data, which contains | 
 | backtrace samples from when the event counters overflowed.  For those familiar | 
 | with other perfmon systems, perf stat is like PAPI and perf record is like | 
 | Oprofile. | 
 |  | 
 | perf record and stat both track a set of events with the -e flag.  -e takes a | 
 | comma-separated list of events.  Events can be expressed in one of three forms: | 
 |  | 
 | - Generic events (called "pre-defined" events on Linux) | 
 | - Libpfm events | 
 | - Raw events | 
 |  | 
 | Linux's perf only takes Generic and Raw events, so libpfm4 support is an added | 
 | bonus. | 
 |  | 
 | Generic events consist of strings like "cycles" or "cache-misses".  Raw events | 
 | are simple strings of the form "rXXX", where the X's are hex nibbles.  The hex | 
 | codes are passed directly to the PMU.  On Akaros, a raw event can have two to | 
 | four X's. | 
 |  | 
 | Libpfm events are strings that correspond to events specific to your machine. | 
 | Libpfm knows about PMU events for a given machine.  It figures out what machine | 
 | perf is running on and selects events that should be available.  Check out | 
 | http://perfmon2.sourceforge.net/ for more info. | 
 |  | 
 | To see the list of events available, use `perf list [regex]`, supplying an | 
 | optional search regex.  For example, on a Haswell: | 
 |  | 
 | / $ perf list unhalted_reference_cycles | 
 | #----------------------------- | 
 | IDX      : 37748738 | 
 | PMU name : ix86arch (Intel X86 architectural PMU) | 
 | Name     : UNHALTED_REFERENCE_CYCLES | 
 | Equiv    : None | 
 | Flags    : None | 
 | Desc     : count reference clock cycles while the clock signal on the specific core is running. The reference clock operates at a fixed frequency, irrespective of core frequency changes due to performance state transitions | 
 | Code     : 0x13c | 
 | Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean) | 
 | Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean) | 
 | Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean) | 
 | Modif-03 : 0x03 : PMU : [i] : invert (boolean) | 
 | Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer) | 
 | Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean) | 
 | #----------------------------- | 
 | IDX      : 322961409 | 
 | PMU name : hsw_ep (Intel Haswell EP) | 
 | Name     : UNHALTED_REFERENCE_CYCLES | 
 | Equiv    : None | 
 | Flags    : None | 
 | Desc     : Unhalted reference cycles | 
 | Code     : 0x300 | 
 | Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean) | 
 | Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean) | 
 | Modif-02 : 0x05 : PMU : [t] : measure any thread (boolean) | 
 |  | 
 | There are two different events for UNHALTED_REFERENCE_CYCLES (case | 
 | insensitive).  libpfm will select the most appropriate one.  You can override | 
 | this selection by specifying a PMU: | 
 |  | 
 | / $ perf stat -e ix86arch::UNHALTED_REFERENCE_CYCLES ls | 
 |  | 
 | Here's how to specify multiple events: | 
 |  | 
 | / $ perf record -e cycles,instructions ls | 
 |  | 
 | Events also take a set of modifiers.  For instance, you can specify running | 
 | counters only in kernel mode or user mode.  Modifiers are separated by a ':'. | 
 |  | 
 | This will track only user cycles (default is user and kernel): | 
 |  | 
 | / $ perf record -e cycles:u ls | 
 |  | 
 | To use a raw event, you need to know the event number.  You can either look in | 
 | your favorite copy of the SDM, or you can ask libpfm.  Though if you ask | 
 | libpfm, you might as well just use its string processing.  For example: | 
 |  | 
 | / $ perf list FLUSH | 
 | #----------------------------- | 
 | IDX      : 322961462 | 
 | PMU name : hsw_ep (Intel Haswell EP) | 
 | Name     : TLB_FLUSH | 
 | Equiv    : None | 
 | Flags    : None | 
 | Desc     : TLB flushes | 
 | Code     : 0xbd | 
 | Umask-00 : 0x01 : PMU : [DTLB_THREAD] : None : Count number of DTLB flushes of thread-specific entries | 
 | Umask-01 : 0x20 : PMU : [STLB_ANY] : None : Count number of any STLB flushes | 
 | Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean) | 
 | Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean) | 
 | Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean) | 
 | Modif-03 : 0x03 : PMU : [i] : invert (boolean) | 
 | Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer) | 
 | Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean) | 
 | Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean) | 
 | Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean) | 
 |  | 
 | The raw code is 0xbd.  So the following are equivalent (but slightly buggy!): | 
 |  | 
 | / $ perf stat -e TLB_FLUSH ls | 
 | / $ perf stat -e rbd ls | 
 |  | 
 | If you actually run those, rbd will have zero hits, and TLB_FLUSH will give you | 
 | the error "Failed to parse event string TLB_FLUSH". | 
 |  | 
 | Some events are rather particular about their Umasks, and TLB_FLUSH is one of | 
 | them.  TLB_FLUSH wants a Umask.  Umasks are selectors for specific sub-types of | 
 | events.  In the case of TLB_FLUSH, we can choose between DTLB_THREAD and | 
 | STLB_ANY.  Umasks are not always required - they just happen to be on my | 
 | Haswell for TLB_FLUSH.  That being said, we can ask for the event like so: | 
 |  | 
 | / $ perf stat -e TLB_FLUSH:STLB_ANY ls | 
 | / $ perf stat -e r20bd ls | 
 |  | 
 | Note that in the raw string the Umask (0x20) is placed before the Code (0xbd). | 
 | These 16 bits are passed directly to the PMU, and on Intel the format is | 
 | "umask:event". | 
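
The raw string can be derived mechanically from the Code and Umask values that
perf list prints.  The following Python sketch shows the arithmetic.  The
helper name raw_event is hypothetical (it is not part of perf or libpfm), and
it assumes the Intel layout described here: Umask in the high byte, event code
in the low byte.

```python
def raw_event(code, umask=0):
    """Build an 'rXXXX' raw event string from a libpfm Code and Umask.

    Assumes the Intel "umask:event" layout: the Umask occupies bits 8-15
    and the event code occupies bits 0-7.
    """
    value = code | (umask << 8)
    if not 0 <= value <= 0xFFFF or not 0 <= umask <= 0xFF:
        raise ValueError("raw event must fit in 16 bits")
    # At least two hex digits, at most four.
    return "r{:02x}".format(value)


if __name__ == "__main__":
    print(raw_event(0xBD, 0x20))  # TLB_FLUSH:STLB_ANY
    print(raw_event(0xBD))        # TLB_FLUSH with no Umask
```

For the TLB_FLUSH listing above, this yields r20bd for STLB_ANY and rbd when no
Umask is given, matching the strings used in the examples.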
 |  | 
 | perf record is based on recording samples when event counters overflow.  The | 
 | number of events required to trigger a sample is referred to as the | 
 | sample_period.  You can set it with -c, e.g. | 
 |  | 
 | / $ perf record -c 10000 ls | 
 |  | 
 |  | 
 | DIFFERENCES FROM LINUX | 
 | -------------------- | 
 | For the most part, Akaros perf is similar to Linux perf.  A few things are | 
 | different. | 
 |  | 
 | The biggest difference is that our perf does not follow processes around.  We | 
 | count events for cores, not processes.  You can specify certain cores, but not | 
 | certain processes.  Any options related to tracking specific processes are | 
 | unsupported. | 
 |  | 
 | The -F option (frequency) is loosely supported.  The kernel cannot adjust the | 
 | sampling count dynamically to meet a certain frequency.  Instead, we guess | 
 | that -F is used with cycles, and pick a sample period that will generate | 
 | samples at the desired frequency if the core is unhalted.  YMMV. | 
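
The arithmetic behind that guess can be sketched in Python as follows.  This is
not the code Akaros uses.  The name sample_period and its parameters are
hypothetical, and the sketch assumes you know roughly how many cycles per
second an unhalted core executes.

```python
def sample_period(cycles_per_sec, samples_per_sec):
    """Estimate a fixed sample period (in cycles) for a desired sample rate.

    Assumes the counted event is cycles and the core never halts, so the
    event rate is simply the core's cycle rate.
    """
    if samples_per_sec <= 0:
        raise ValueError("sample frequency must be positive")
    # A period below 1 makes no sense, so clamp at one event per sample.
    return max(1, cycles_per_sec // samples_per_sec)


if __name__ == "__main__":
    # Illustrative numbers: a 2.4 GHz core sampled 1000 times per second.
    print(sample_period(2_400_000_000, 1000))
```

A period computed this way is the kind of value you could also pass explicitly
with -c.  If the core halts for part of the run, fewer cycles are counted and
the real sample rate falls below the requested one.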
 |  | 
 | Akaros currently supports only PMU events.  In the future, we may add events | 
 | like context-switches. | 
 |  | 
 |  | 
 | =========================== | 
 | mpstat | 
 | =========================== | 
 | Akaros has basic support for mpstat.  mpstat gives a high-level glance at where | 
 | each core is spending its time. | 
 |  | 
 | For starters, bind kprof somewhere.  The basic ifconfig script binds it to | 
 | /prof. | 
 |  | 
 | To see the CPU usage, cat mpstat: | 
 |  | 
 | / $ cat /prof/mpstat | 
 |  CPU:             irq             kern              user                 idle | 
 |    0: 1.707136 (  0%), 24.978659 (  0%), 0.162845 (  0%), 13856.233909 ( 99%) | 
 |  | 
 | To reset the count: | 
 |  | 
 | / $ echo reset > /prof/mpstat | 
 |  | 
 | To see the output for a particular command: | 
 |  | 
 | / $ echo reset > /prof/mpstat ; COMMAND ; cat /prof/mpstat |
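
The percentages in the mpstat output are rounded to whole numbers, so small
shares all show up as 0%.  If you copy the output to another machine, a short
script can recompute the shares with more precision.  The Python sketch below
is one way to do that.  It assumes the four-column layout shown above (irq,
kern, user, idle), and parse_mpstat is a made-up helper name.

```python
import re

FIELDS = ("irq", "kern", "user", "idle")
LINE_RE = re.compile(r"^\s*(\d+):(.*)$")                 # "<core>: ..."
VALUE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*\(\s*\d+%\)")  # "1.707136 (  0%)"


def parse_mpstat(text):
    """Map core id -> {field: share of total time} from mpstat text."""
    shares = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # header or blank line
        values = [float(v) for v in VALUE_RE.findall(match.group(2))]
        if len(values) != len(FIELDS):
            continue  # unexpected layout, skip rather than guess
        total = sum(values)
        shares[int(match.group(1))] = {
            name: (value / total if total else 0.0)
            for name, value in zip(FIELDS, values)
        }
    return shares


if __name__ == "__main__":
    sample = (
        " CPU:             irq             kern              user                 idle\n"
        "   0: 1.707136 (  0%), 24.978659 (  0%), 0.162845 (  0%), 13856.233909 ( 99%)\n"
    )
    for core, share in parse_mpstat(sample).items():
        print(core, {name: round(value, 6) for name, value in share.items()})
```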