Why Profile?
Because your assumptions of how things work behind the scenes are not always correct. By profiling your code you can identify where the bottlenecks are quantitatively.

How?
PECL to the rescue!

www:~> pear install apd
downloading apd-0.4p1.tgz ...
...done: 39,605 bytes
16 source files, building
running: phpize
PHP Api Version        : 20020918
Zend Module Api No     : 20020429
Zend Extension Api No  : 20021010
building in /var/tmp/pear-build-root/apd-0.4p1
running: /tmp/tmprFlAqf/apd-0.4p1/configure
running: make
apd.so copied to /tmp/tmprFlAqf/apd-0.4p1/apd.so
install ok: apd 0.4p1
Then in your php.ini file:

zend_extension = "/usr/local/lib/php/apd.so"
apd.dumpdir = /tmp
It isn't completely transparent. You need to tell the profiler when to start profiling. At the top of a script you want to profile, add this call:

<?php
apd_set_pprof_trace
();
?>
The use the command-line tool called pprofp:

wwww: ~> pprofp
pprofp <flags> <trace file>
    Sort options
    -a          Sort by alphabetic names of subroutines.
    -l          Sort by number of calls to subroutines
    -m          Sort by memory used in a function call.
    -r          Sort by real time spent in subroutines.
    -R          Sort by real time spent in subroutines (inclusive of child calls).
    -s          Sort by system time spent in subroutines.
    -S          Sort by system time spent in subroutines (inclusive of child calls).
    -u          Sort by user time spent in subroutines.
    -U          Sort by user time spent in subroutines (inclusive of child calls).
    -v          Sort by average amount of time spent in subroutines.
    -z          Sort by user+system time spent in subroutines. (default)

    Display options
    -c          Display Real time elapsed alongside call tree.
    -i          Suppress reporting for php builtin functions
    -O <cnt>    Specifies maximum number of subroutines to display. (default 15)
    -t          Display compressed call tree.
    -T          Display uncompressed call tree.
% pprofp -z /tmp/pprof.48478
 
Trace for /home/y/share/htdocs/bench_main.php
Total Elapsed Time =    0.01
Total System Time  =    0.01
Total User Time    =    0.01
 
 
         Real         User        System             secs/    cumm
%Time (excl/cumm)  (excl/cumm)  (excl/cumm) Calls    call    s/call  Memory Usage Name
--------------------------------------------------------------------------------------
100.0  0.00  0.00   0.01  0.01   0.01  0.01   200   0.0001    0.0001            0 foo
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000            0 test_class->set_var2
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000            0 test_class->set_var1
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000            0 test_class->set_var3
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000            0 test_class->set_var4
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000            0 var_dump
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000            0 test_class->disp
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000            0 test_class->test_class
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000            0 main
  0.0  0.00  0.00   0.00  0.00   0.00  0.00    26   0.0000    0.0000            0 strtoupper
  0.0  0.00  0.00   0.00  0.00   0.00  0.00    52   0.0000    0.0000            0 substr
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000            0 implode
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     2   0.0000    0.0000            0 include
  0.0  0.01  0.01   0.00  0.00   0.00  0.00     2   0.0000    0.0000            0 range
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000            0 explode
Trace for /home/rasmus/phpweb/index.php
Total Elapsed Time =    0.69
Total System Time  =    0.01
Total User Time    =    0.08

         Real         User        System             secs/    cumm
%Time (excl/cumm)  (excl/cumm)  (excl/cumm) Calls    call    s/call  Memory Usage Name
--------------------------------------------------------------------------------------
 33.3  0.11  0.13   0.02  0.03   0.01  0.01     7   0.0043    0.0057       298336 require_once
 22.2  0.02  0.02   0.02  0.02   0.00  0.00   183   0.0001    0.0001       -33944 feof
 11.1  0.01  0.01   0.01  0.01   0.00  0.00     3   0.0033    0.0033       -14808 define
 11.1  0.04  0.04   0.01  0.01   0.00  0.00   182   0.0001    0.0001       112040 fgetcsv
 11.1  0.25  0.25   0.01  0.01   0.00  0.00     6   0.0017    0.0017         3768 getimagesize
 11.1  0.01  0.01   0.01  0.01   0.00  0.00    55   0.0002    0.0002         2568 sprintf
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     7   0.0000    0.0000         -136 printf
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000          136 htmlspecialchars
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000          -16 mirror_provider_url
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     7   0.0000    0.0000          112 spacer
  0.0  0.00  0.00   0.00  0.00   0.00  0.00    10   0.0000    0.0000         -552 delim
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000          112 mirror_provider
  0.0  0.00  0.00   0.00  0.00   0.00  0.00    20   0.0000    0.0000         -624 print_link
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000           24 have_stats
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000          -72 make_submit
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     2   0.0000    0.0000          112 strrchr
  0.0  0.08  0.08   0.00  0.00   0.00  0.00     2   0.0000    0.0000          168 filesize
  0.0  0.00  0.00   0.00  0.00   0.00  0.00     1   0.0000    0.0000          -16 commonfooter
  0.0  0.00  0.11   0.00  0.00   0.00  0.00     2   0.0000    0.0000            0 download_link
  0.0  0.00  0.25   0.00  0.01   0.00  0.00     6   0.0000    0.0017          208 make_image
KCacheGrind Output
www:~> cd /tmp
www:/tmp> ls -lat pprof.*
-rw-r--r--  1 nobody 4294967295  62771 Apr 17 07:36 pprof.02466.0
-rw-r--r--  1 nobody 4294967295 269311 Apr 17 07:36 pprof.02465.0
-rw-r--r--  1 nobody 4294967295  62779 Apr 17 07:34 pprof.02464.0
www:/tmp> pprof2calltree -f pprof.02465.0
Writing kcachegrind compatible output to cachegrind.out.pprof.02465.0
www:/tmp> kcachegrind cachegrind.out.pprof.02465.0