I regularly use a tool called compare-perf to test performance changes to code. I've come to realize that it gives a better process than what many software developers use for performance comparisons, so I'm posting it so that others can do the same.
You can see the slides for the lightning talk I gave at LCA 2013.
It's not perfect. Far from it. There are still major issues: it will lead to publication bias, normal distributions are rarely the right thing to assume for the kinds of measurements we're taking, and no attempt is made to take advantage of pairing for increased test power. The point is not to give you the best tool I can imagine (I don't have it), but a really simple tool that's better than what you've been doing. Because it's so convenient, you'll end up taking more samples than you need, and then these issues won't matter.
The command script should run the test you're trying to measure and print a single number (and nothing else). That output is collected in the "before" and "after" files in the current directory.
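As an illustration, here is a minimal command script for the case where the benchmark doesn't report a number itself and wall-clock time is what you care about. WORKLOAD is a hypothetical placeholder for your real test, and the %N format assumes GNU date. The important part is that exactly one number reaches stdout.

```shell
#!/bin/sh
# Hypothetical command script: time one run of a workload and print only the
# elapsed wall-clock time in seconds.
# WORKLOAD is a placeholder for your real benchmark (an assumption).
# date +%N (nanoseconds) is a GNU coreutils extension.
WORKLOAD=${WORKLOAD:-"sleep 0.1"}

start_ns=$(date +%s%N)
# Discard the workload's own output; only the final number may reach stdout.
# $WORKLOAD is left unquoted on purpose so it splits into command + arguments.
$WORKLOAD > /dev/null 2>&1
end_ns=$(date +%s%N)

# Integer nanosecond difference, then format it as seconds.
elapsed_ns=$((end_ns - start_ns))
awk -v ns="$elapsed_ns" 'BEGIN { printf "%.3f\n", ns / 1e9 }'
```

If your benchmark already prints a score, the script can instead extract that one value from its output.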
The beforescript and afterscript should set up the environment in whatever way is necessary to make the command run the old or new code, respectively. Here's an example script I use for Mesa development:
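#!/bin/sh
# Run the command passed as arguments with the library and DRI driver search
# paths pointing at the mesa-clean build. ${VAR:+...} avoids a leading ':'
# (which would mean "current directory") when LD_LIBRARY_PATH is unset.
env \
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/home/anholt/src/mesa-clean/lib:/home/anholt/src/prefix/lib \
    LIBGL_DRIVERS_PATH=/home/anholt/src/mesa-clean/lib \
    DISPLAY=:0 \
    "$@"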
That's it! Run it, then go get a cup of coffee, take a nap, browse Tumblr, or whatever, and come back, hit ^C, and accept the results. Or let it run for longer, and use that time to read about publication bias and confirmation bias and to think seriously about whether what you're doing is going to lead you to the truth. I'm not saying it's wrong, just that you should keep it in mind.
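To make the mechanics concrete, here is a rough sketch of the kind of sampling loop a tool like this runs. It is not compare-perf's actual code: alternating before and after runs, truncating the files at start, the RUNS bound standing in for ^C, and the BEFORE, AFTER, and COMMAND variables are all assumptions. The defaults only exist so the sketch runs as-is; in practice they would point at your beforescript, afterscript, and command script. ministat compares the two sample files with Student's t-test.

```shell
#!/bin/sh
# Sketch of a compare-perf-style sampling loop (NOT the real tool).
# BEFORE, AFTER and COMMAND are hypothetical placeholders; the defaults use
# env(1) as a no-op wrapper and a dummy command so the sketch runs as-is.
BEFORE=${BEFORE:-env}
AFTER=${AFTER:-env}
COMMAND=${COMMAND:-"echo 1"}
RUNS=${RUNS:-5}   # assumption: the real tool keeps going until ^C

# Start from empty sample files (an assumption about the real tool).
: > before
: > after

i=0
while [ "$i" -lt "$RUNS" ]; do
    # Alternating old and new code helps spread slow drift (e.g. thermal
    # throttling) across both samples. Unquoted on purpose for word splitting.
    $BEFORE $COMMAND >> before
    $AFTER $COMMAND >> after
    i=$((i + 1))
done

# Compare the two samples if ministat is installed.
if command -v ministat > /dev/null 2>&1; then
    ministat before after
fi
```

With the real scripts, you would set something like BEFORE=./before.sh, AFTER=./after.sh and COMMAND=./command.sh (hypothetical file names).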
The results are saved in the "before" and "after" files in the current directory, so if you have outliers that you've identified as unrelated to your changes (such as runs affected by thermal throttling on CPUs), you can trim them out and run ministat again.
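For example, trimming can be as simple as deleting the offending lines with sed and pointing ministat at the trimmed copy. The sample values and the choice of line 3 below are made up for illustration; in practice you'd use the line numbers of the samples you actually identified.

```shell
#!/bin/sh
# Made-up "after" samples; suppose line 3 was a run hit by thermal throttling.
printf '10.1\n10.3\n14.9\n10.2\n' > after.example

# Delete the identified outlier line(s) into a new file, leaving the raw
# samples untouched in case you change your mind. Add more -e 'Nd' as needed.
sed -e '3d' after.example > after.trimmed

# Re-run the comparison against the untouched "before" samples, if available.
if [ -f before ] && command -v ministat > /dev/null 2>&1; then
    ministat before after.trimmed
fi
```

Keeping the untrimmed file around makes it easy to check that the conclusion doesn't hinge on which samples you chose to drop.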