Software Engineering Crunch and more...: December 2014

OK, so over the last year I've occasionally been spending time on performance tweaks to a multi-core application. This can be a huge time sink. What works best for me is to gather data, try some obvious changes, then get away from the computer and stew on the problem for a bit.

Obviously, in the multi-core world the main goal is to keep scaling as more cores are thrown at a problem. For performance tweaking, that means:

1. Avoid locking of any kind, otherwise performance won't scale as more cores are thrown into the stewpot.
2. Minimize cache misses and hot cache reloads, and increase cache-coherency.
3. Old-fashioned instruction tweaking (i.e. reducing instruction costs).

The above are listed in their approximate order of importance.
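Items 1 and 2 often meet in something as simple as a shared counter. Here is a minimal sketch, not taken from the application above, of per-thread counters that need no lock and that each sit on their own cache line, so cores don't keep stealing the line from one another. The names `padded_counter`, `worker_arg` and `count_in_parallel` are made up for illustration, and the 64-byte `CACHE_LINE` is an assumption (typical for x86, but check your hardware).

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define CACHE_LINE  64   /* assumed cache line size; typical on x86 */
#define MAX_THREADS 64   /* arbitrary limit for this sketch */

/* One counter per thread, aligned so each occupies its own cache line. */
struct padded_counter {
    _Alignas(CACHE_LINE) uint64_t value;
};

static struct padded_counter counters[MAX_THREADS];

struct worker_arg {
    int slot;             /* which counter this thread owns */
    uint64_t iterations;  /* how many increments to perform */
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;

    /* Only this thread touches its slot: no lock, no shared line. */
    for (uint64_t i = 0; i < arg->iterations; i++)
        counters[arg->slot].value++;
    return NULL;
}

/* Runs nthreads workers and returns the sum of their private counters. */
uint64_t count_in_parallel(int nthreads, uint64_t iterations)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    uint64_t total = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    for (int i = 0; i < nthreads; i++) {
        counters[i].value = 0;
        args[i].slot = i;
        args[i].iterations = iterations;
        int rc = pthread_create(&tids[i], NULL, worker, &args[i]);
        assert(rc == 0);
        (void)rc;  /* silence unused warning when NDEBUG is set */
    }

    /* pthread_join orders each worker's writes before our read. */
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
        total += counters[i].value;
    }
    return total;
}
```

Calling `count_in_parallel(4, 100000)` returns 400000. The design choice is to pay for the aggregation once, at read time, rather than paying for a lock or a bouncing cache line on every increment.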

I highly recommend watching the videos listed on this posting, as they point out that #2 is often more important than #3 in performance tweaking.

Locking can often be avoided by using userspace RCU, or similar tricks.
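As one example of a "similar trick", here is a rough sketch of the publish side of the RCU idea using plain C11 atomics. To be clear, this is not the userspace RCU library and uses none of its API: readers take an immutable snapshot with a single atomic load, and a writer builds a new copy and swaps the pointer. What it leaves out is exactly what RCU provides, namely knowing when no reader can still be holding the old copy, so reclamation is pushed onto the caller. `struct config`, `config_get` and `config_publish` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical read-mostly data shared between threads. */
struct config {
    int max_conn;
    int timeout_ms;
};

/* Zero-initialized, so it starts out as NULL (nothing published yet). */
static _Atomic(struct config *) current_config;

/* Reader: one acquire load, no lock. Treat the snapshot as read-only. */
const struct config *config_get(void)
{
    return atomic_load_explicit(&current_config, memory_order_acquire);
}

/*
 * Writer: copy the new values into fresh memory and publish them with
 * a single atomic exchange. Returns 0 on success, -1 if malloc fails.
 * The previous version (or NULL) is handed back through old_out; the
 * caller may free it only once no reader can still be using it. That
 * waiting step is what an RCU grace period gives you and is NOT
 * implemented here.
 */
int config_publish(const struct config *new_values, struct config **old_out)
{
    assert(new_values != NULL && old_out != NULL);

    struct config *next = malloc(sizeof *next);
    if (next == NULL)
        return -1;
    *next = *new_values;

    *old_out = atomic_exchange_explicit(&current_config, next,
                                        memory_order_acq_rel);
    return 0;
}
```

Readers never block and never write to shared memory, which helps with both the locking and the cache concerns above. The cost moves to the writer, which copies instead of updating in place.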

Other great resources:

Performance bit twiddling
Awesome parallel programming reference
Detailed Assembly/C/C++ x86 Optimizations

Obviously, one of the great tools is plain perf top. A great deal of insight can be gained just by looking at the results the command below produces:

sudo /usr/bin/perf top -p <pid>

Pretty much any hardware or software event that perf supports can be profiled (pick one with -e), but by default perf top samples CPU cycles and reports the samples per function.

There are a ton of tools out there to help evaluate performance. Just make sure that you understand how the data is being captured and presented, otherwise you risk getting sucked down the rabbit hole of false assumptions...

Sunday, December 28, 2014

Random bit slogging notes through some performance issues