Bye bye, regular expressions!
Thanks to the tidy parser, you no longer have to rely on regular expressions to extract content from an HTML document. Instead, you can walk the parsed document tree, which is a much more maintainable solution!

In the following example (found in the ext/tidy/examples/ directory), I use the tidy parser to extract all of the links from an arbitrary HTML document:

Grabbing URLs from an HTML document
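<?php
    // Recursively walk the tidy node tree and collect the href of every <a> tag.
    function dump_nodes(tidyNode $node, &$urls = NULL) {

        // The first call passes no array, so start with an empty one.
        $urls = (is_array($urls)) ? $urls : array();

        if(isset($node->id)) {
            if($node->id == TIDY_TAG_A) {
                $urls[] = $node->attribute['href'];
            }
        }

        if($node->hasChildren()) {

            foreach($node->child as $c) {
                // $urls is passed by reference, so children append to the same array.
                dump_nodes($c, $urls);
            }

        }

        return $urls;
    }

    // Parse the file named on the command line, repair it, and print its links.
    $a = tidy_parse_file($_SERVER['argv'][1]);
    $a->cleanRepair();
    print_r(dump_nodes($a->html()));
?>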
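
The same traversal also works on markup held in a string, which can be handy for trying the technique without a file on disk. The following is only a minimal sketch, assuming the tidy extension is loaded: the function name collect_links and the sample markup are made up for illustration and are not part of ext/tidy/examples/. Unlike the listing above, it returns its results by value and skips anchors that have no href attribute (for example, named anchors).

```php
<?php
// Hypothetical helper: returns the href of every <a> element below $node.
function collect_links(tidyNode $node): array
{
    $urls = [];

    // Only record anchors that actually carry an href attribute.
    if (isset($node->id) && $node->id === TIDY_TAG_A
        && isset($node->attribute['href'])) {
        $urls[] = $node->attribute['href'];
    }

    // $node->child is null for leaf nodes, so fall back to an empty array.
    foreach ($node->child ?? [] as $child) {
        $urls = array_merge($urls, collect_links($child));
    }

    return $urls;
}

// Illustrative markup only; any HTML string could be used here.
$html = '<html><body>'
      . '<a href="http://example.com/">one</a>'
      . '<a name="top">anchor</a>'
      . '<p><a href="/docs">two</a></p>'
      . '</body></html>';

$doc  = tidy_parse_string($html, [], 'utf8');
$root = $doc->html();

// html() may return null if no <html> node is found.
$links = $root ? collect_links($root) : [];
print_r($links);
?>
```

Returning the array by value avoids the by-reference parameter of the original listing, at the cost of an array_merge() call per child node, which is usually acceptable for ordinary pages.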