I've been
playing around with several versions of the Windows Performance Toolkit for
some time now and decided to gather all those little annoying things that can
make your life miserable. This way, all known issues (and hopefully their
solutions) will be mentioned in one place and you don't need to spend hours of
googling just to find the answer amongst the comments of some long abandoned
forum.
I also find
it important to highlight that for most (but not all) problems below it was not
me who found the solution. I merely try to collect everything in one place and
this will take some time. The credit shall mainly go to Alois Kraus and Bruce
Dawson. These guys did an awesome job in documenting the various strange
behavior of the tools involved, like wpr, wprui, xperf & wpa. I definitely
recommend you to check out the blogs of these guys.
The issues:
- x64 ETW
stackwalking is broken under Windows 7
- Data for
CLR event providers are not decoded correctly in WPA (Generic Events)
- Data for
custom event providers are not decoded correctly in WPA (Generic Events)
- Symbol
decoding does not work for native images in WPA (?!?)
- Symbol
decoding does not work for managed assemblies in WPA (?!?)
- .etl file
taken on Windows 8(.1?) machine requires at least Win 8(.1) for analysis
- xperf /
wpr won't start if NT Kernel Logger session is already runnning (procexp,
procmon etc) under Windows 7
x64 ETW stackwalking is broken under Windows 7
I read a few
articles mentioning this issue before actually bumping into it. The problem is
that stack walking stops as soon as it detects being inside a JIT-ted method.
The result is that you get broken call stacks for all your managed stuff. It's
very easy to detect this error, as many methods hit in the problematic assembly
are shown as a direct child of the [Root] element in the tree. Make no mistake,
the resulting .etl file is corrupted here, so you can't fix the callstacks
without taking the appr. corrective action and re-profiling.
<img
placeholder>
<Figure x. No method call hierarchy, only a flat list>
However this
works fine for x86 code. As you can see, canvas1_Loaded() is now deep within
the call hierarchy. We should see the same for x64, should there be no bug in
the ETW stack walker.
<img
placeholder>
<Figure x.
Expected method call hierarchy>
Solution:
either switch to x86 for profiling or NGEN your x64 assemblies and repeat
profiling.
Data for CLR event providers are not decoded correctly in WPA (Generic
Events)
Now this is
a real life ruiner. I was experimenting with the CLR Runtime / Rundown
Provider(s) the other day and after spending hours to map hexadecimal values to
CLR ETW keywords, levels and events in WPA that actually make sense I finally
gave up and asked myself the question "Is this really worth it?".
There had to be a way to actually see fields that make sense. Column names were
also totally useless.
Values and column names missing or make no sense |
Solution:
reinstall the CLR manifest, by running the following commands:
wevtutil um
c:\Windows\Microsoft.NET\Framework64\v4.0.30319\CLR-ETW.man
wevtutil im
c:\Windows\Microsoft.NET\Framework64\v4.0.30319\CLR-ETW.man
Note: for a 32-bit OS, just use the
"Framework" folder instead of "Framework64".
Unfortunately,
you have to repeat your profiling scenario, because once again, it's the .etl
file that contains invalid information. After fixing the issue, it takes some
time to get used to the dynamic column headers, but that should be your biggest
problem in life. :-)
All values / column names are shown correctly |
Data for custom event providers are not decoded correctly in WPA (Generic
Events)
My other
favorite. If you happen to use System.Diagnostics.Tracing.EventSource to create
your own ETW event provider you might bump into this. To keep it short, avoid
APi names like Start or Stop. WPA (even the one in Creators Update SDK) seems
to perform some arcane magic while reading / decoding the information in the
.etl file and this unfortunately seems to conflict with certain (let's call 'em
"reserved") names. I might write a separate article about this
problem in the future.
Values and column names are not decoded properly |
Perfview can decode everything properly |
This is how it should be |
Solution #1:
just try to name your methods in your custom event provider in a standard way,
like when you work with a manifest-based generated class, i.e. EventWrite* will
do.
Solution #2:
if you're targeting .NET 4.6+ you can opt-in to use a self describing event
format for your events. Just make sure you new up EventSource with the
appropriate constructor and then WPA will decode everything correctly.
Symbol decoding does not work for native images in WPA (?!?)
In this
case, WPA can typically decode the module but not the function name in a
callstack.
Comparing two WPA versions |
Solution: make sure Visual Studio is closed during data collection, as it seems ngen createpdb fails while it is open.
Symbol decoding does not work for managed assemblies in WPA (?!?)
This problem
can be recognized by the ?!? symbols in the callstacks:
Module and function symbols are not resolved |
Solution:
make sure Visual Studio is closed while capturing traces.
There you go |
To be
continued...