Akaros Profiling
===========================
Contents:
(*) Perf
- Setup
- Basic Example
- More Complicated Examples
- Differences From Linux
(*) mpstat
===========================
PERF
===========================
Akaros has limited support for perf_events. perf is a tool which utilizes CPU
performance counters for performance monitoring and troubleshooting.
Akaros has its own version of perf, similar in spirit to Linux's perf, that
produces PERFILE2 ABI compliant perf.data files (if not, file a bug!). The
kernel generates traces, under the direction of perf. You then copy the traces
to a Linux host and process them using Linux's perf.
SETUP
--------------------
To build Akaros's perf directly:
(linux)$ cd tools/dev-libs/elfutils ; make install; cd -
(linux)$ cd tools/dev-util/perf ; make install; cd -
Or to build it along with all apps:
(linux)$ make apps-install
You will also need a suitably recent Linux perf to report on the data
(something that understands the PERFILE2 format). Unpatched Linux 4.5 perf did the
trick. You'll also want libelf and maybe other libraries on your Linux
machine.
First, install libelf according to your distro. On Ubuntu:
(linux) $ sudo apt-get install libelf-dev
Then try to just install perf using your Linux distro, and install any needed
dependencies. On Ubuntu, you can install linux-tools-common and whatever else
it asks for (something particular to your host kernel).
Linux perf changes a lot. Newer versions are usually nicer. I recommend
building one of them: Download Linux source, then
(linux) $ cd tools/perf/
(linux) $ make
Then use your new perf binary. This all is just installing a recent perf - it
has little to do with Akaros at this point. If you run into incompatibilities
between our perf.data format and the latest Linux, file a bug.
BASIC EXAMPLE
--------------------
Perf on Akaros supports record, stat, and a few custom options.
You should be able to do the following:
/ $ perf record ls
Then scp perf.data to Linux
(linux) $ scp AKAROS_MACHINE:perf.data .
(linux) $ perf report --kallsyms=obj/kern/ksyms.map --symfs=kern/kfs/
Perf will look on your host machine for the kernel symbol table and for
binaries. We need to tell it kallsyms and symfs to override those settings.
It can be a hassle to type out the kallsyms and symfs, so we have a script that
will automate that. Use scripts/perf in any place that you'd normally use
perf. Set your $AKAROS_ROOT (default is ".") and optionally override $PERF_CMD
(default is "perf"). For most people, this will just be:
(linux) $ ./scripts/perf report
The perf.data file is implied, so the above command is equivalent to:
(linux) $ ./scripts/perf report -i perf.data
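As a rough illustration of what such a wrapper does, here is a minimal sketch
that only prints the command line it would run. akaros_perf_cmdline is a
hypothetical helper written for this document; it is not the real scripts/perf,
which may handle subcommands and options differently. The paths are the ones
from the perf report example above.

```shell
# Sketch only: print the perf command line with the Akaros symbol overrides.
# akaros_perf_cmdline is a hypothetical name, not part of the Akaros tree.
akaros_perf_cmdline() {
    root=${AKAROS_ROOT:-.}      # default root is the current directory
    cmd=${PERF_CMD:-perf}       # default perf binary is "perf"
    echo "$cmd $* --kallsyms=$root/obj/kern/ksyms.map --symfs=$root/kern/kfs/"
}

akaros_perf_cmdline report -i perf.data
```

Setting PERF_CMD to the path of a perf binary you built yourself would swap
that binary into the printed command.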
MORE COMPLICATED EXAMPLES
--------------------
First, try perf --help for usage. Then check out
https://perf.wiki.kernel.org/index.php/Tutorial. We strive to be mostly
compatible with the usage of Linux perf.
perf stat runs a command and reports the count of events during the run of the
command. perf record runs a command and outputs perf.data, which contains
backtrace samples from when the event counters overflowed. For those familiar
with other perfmon systems, perf stat is like PAPI and perf record is like
Oprofile.
perf record and stat both track a set of events with the -e flag. -e takes a
comma-separated list of events. Events can be expressed in one of three forms:
- Generic events (called "pre-defined" events on Linux)
- Libpfm events
- Raw events
Linux's perf only takes Generic and Raw events, so libpfm4 support is an added
bonus.
Generic events consist of strings like "cycles" or "cache-misses". Raw events
are simple strings of the form "rXXX", where the X's are hex nibbles. The hex
codes are passed directly to the PMU. You can actually have 2-4 Xs on Akaros.
Libpfm events are strings that correspond to events specific to your machine.
Libpfm knows about PMU events for a given machine. It figures out what machine
perf is running on and selects events that should be available. Check out
http://perfmon2.sourceforge.net/ for more info.
To see the list of events available, use `perf list [regex]`, supplying an
optional search regex. For example, on a Haswell:
/ $ perf list unhalted_reference_cycles
#-----------------------------
IDX : 37748738
PMU name : ix86arch (Intel X86 architectural PMU)
Name : UNHALTED_REFERENCE_CYCLES
Equiv : None
Flags : None
Desc : count reference clock cycles while the clock signal on the specific core is running. The reference clock operates at a fixed frequency, irrespective of core frequency changes due to performance state transitions
Code : 0x13c
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
#-----------------------------
IDX : 322961409
PMU name : hsw_ep (Intel Haswell EP)
Name : UNHALTED_REFERENCE_CYCLES
Equiv : None
Flags : None
Desc : Unhalted reference cycles
Code : 0x300
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x05 : PMU : [t] : measure any thread (boolean)
There are two different events for UNHALTED_REFERENCE_CYCLES (case
insensitive). libpfm will select the most appropriate one. You can override
this selection by specifying a PMU:
/ $ perf stat -e ix86arch::UNHALTED_REFERENCE_CYCLES ls
Here's how to specify multiple events:
/ $ perf record -e cycles,instructions ls
Events also take a set of modifiers. For instance, you can specify running
counters only in kernel mode or user mode. Modifiers are separated by a ':'.
This will track only user cycles (default is user and kernel):
/ $ perf record -e cycles:u ls
To use a raw event, you need to know the event number. You can either look in
your favorite copy of the SDM, or you can ask libpfm. Though if you ask
libpfm, you might as well just use its string processing. For example:
/ $ perf list FLUSH
#-----------------------------
IDX : 322961462
PMU name : hsw_ep (Intel Haswell EP)
Name : TLB_FLUSH
Equiv : None
Flags : None
Desc : TLB flushes
Code : 0xbd
Umask-00 : 0x01 : PMU : [DTLB_THREAD] : None : Count number of DTLB flushes of thread-specific entries
Umask-01 : 0x20 : PMU : [STLB_ANY] : None : Count number of any STLB flushes
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean)
Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean)
The raw code is 0xbd. So the following are equivalent (but slightly buggy!):
/ $ perf stat -e TLB_FLUSH ls
/ $ perf stat -e rbd ls
If you actually run those, rbd will have zero hits, and TLB_FLUSH will give you
the error "Failed to parse event string TLB_FLUSH".
Some events are actually rather particular about their Umasks, and TLB_FLUSH is one of
them. TLB_FLUSH wants a Umask. Umasks are selectors for specific sub-types of
events. In the case of TLB_FLUSH, we can choose between DTLB_THREAD and
STLB_ANY. Umasks are not always required - they just happen to be on my
Haswell for TLB_FLUSH. That being said, we can ask for the event like so:
/ $ perf stat -e TLB_FLUSH:STLB_ANY ls
/ $ perf stat -e r20bd ls
Note that the Umask is placed before the Code. These 16 bits are passed
directly to the PMU, and on Intel the format is "umask:event".
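The "umask:event" layout can be checked with a little shell arithmetic on your
Linux host. The sketch below builds the raw event string from the Umask and
Code values that perf list prints. raw_event is a hypothetical helper for
illustration, not a perf or libpfm command.

```shell
# Build a raw event string "rUUCC" from a Umask (UU) and an event Code (CC).
# raw_event is a hypothetical helper; arguments are hex values such as 0x20.
raw_event() {
    printf 'r%02x%02x\n' "$1" "$2"
}

raw_event 0x20 0xbd     # TLB_FLUSH:STLB_ANY    -> r20bd
raw_event 0x01 0xbd     # TLB_FLUSH:DTLB_THREAD -> r01bd
```

The resulting string can then be passed to -e, as in the r20bd example above.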
perf record is based on recording samples when event counters overflow. The
number of events required to trigger a sample is referred to as the
sample_period. You can set it with -c, e.g.
/ $ perf record -c 10000 ls
DIFFERENCES FROM LINUX
--------------------
For the most part, Akaros perf is similar to Linux. A few things are
different.
The biggest difference is that our perf does not follow processes around. We
count events for cores, not processes. You can specify certain cores, but not
certain processes. Any options related to tracking specific processes are
unsupported.
The -F option (frequency) is loosely supported. The kernel cannot adjust the
sampling count dynamically to meet a certain frequency. Instead, we guess
that -F is used with cycles, and pick a sample period that will generate
samples at the desired frequency if the core is unhalted. YMMV.
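To get a feel for the numbers involved, you can estimate such a period yourself
and pass it with -c. The sketch below assumes the period is simply the core's
cycle rate divided by the desired sample frequency; that is an assumption for
illustration, not necessarily how Akaros computes it. period_for_freq is a
hypothetical helper, and the 2.4 GHz clock rate is a made-up example value.

```shell
# Estimate a sample period: cycles per second / desired samples per second.
# period_for_freq is a hypothetical helper; both arguments are integers.
period_for_freq() {
    echo $(( $1 / $2 ))
}

period_for_freq 2400000000 1000     # assumed 2.4 GHz core, 1000 samples/sec
```

With those example values the result is 2400000, which you could use as
"perf record -c 2400000 ls".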
Akaros currently supports only PMU events. In the future, we may add events
like context-switches.
===========================
mpstat
===========================
Akaros has basic support for mpstat. mpstat gives a high-level glance at where
each core is spending its time.
For starters, bind kprof somewhere. The basic ifconfig script binds it to
/prof.
To see the CPU usage, cat mpstat:
/ $ cat /prof/mpstat
CPU: irq kern user idle
0: 1.707136 ( 0%), 24.978659 ( 0%), 0.162845 ( 0%), 13856.233909 ( 99%)
To reset the count:
/ $ echo reset > /prof/mpstat
To see the output for a particular command:
/ $ echo reset > /prof/mpstat ; COMMAND ; cat /prof/mpstat
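If you save the mpstat output to a file, a short awk script can reduce each
core's line to a single busy percentage. This is a minimal sketch that assumes
the four-column layout shown above (irq, kern, user, idle, each followed by a
parenthesized percentage) and a machine with awk available. mpstat_busy is a
hypothetical helper, not an Akaros tool.

```shell
# Summarize mpstat output read from stdin as "<cpu>: <pct>% busy".
# mpstat_busy is a hypothetical helper; it assumes the column layout above.
mpstat_busy() {
    awk '
        $1 == "CPU:" { next }           # skip the header line
        {
            gsub(/\([^)]*\),?/, "")     # drop the "( N%)" columns
            busy = $2 + $3 + $4         # irq + kern + user
            total = busy + $5           # plus idle
            if (total > 0)
                printf "%s %.2f%% busy\n", $1, 100 * busy / total
        }
    '
}
```

For example, after copying the output into a file named mpstat.txt (an assumed
name), run "mpstat_busy < mpstat.txt".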