I don’t know how I got any performance tuning done before callgrind.
Actually, I do know: I used -pg with gprof.