Breaking text into parts (tokens) for indexing.

Text
"This standard was developed from ISO/IEC 9075:1989"

Whitespace:
"This" "standard" "was" "developed" "from" "ISO/IEC" "9075:1989"
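
A minimal Python sketch of the whitespace split above. The function name whitespace_tokens is made up for illustration; str.split() with no arguments splits on any run of whitespace.

```python
def whitespace_tokens(text):
    # Split on runs of whitespace; punctuation stays attached to the words.
    return text.split()

text = "This standard was developed from ISO/IEC 9075:1989"
print(whitespace_tokens(text))
```

Note that "ISO/IEC" and "9075:1989" each stay in one piece, since neither contains whitespace.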

Continuous letters:
"This" "standard" "was" "developed" "from" "ISO" "IEC"
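
One possible way to get the continuous-letter split above, sketched with a regular expression. It assumes "letters" means ASCII A-Z and a-z only; the function name letter_tokens is made up for illustration.

```python
import re

def letter_tokens(text):
    # Keep each maximal run of ASCII letters; digits and punctuation
    # act as separators and are dropped.
    return re.findall(r"[A-Za-z]+", text)

text = "This standard was developed from ISO/IEC 9075:1989"
print(letter_tokens(text))
```

Under this rule "9075:1989" yields no token at all, because it contains no letters.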
    

HTML
"<li><em>If it exists</em>, the STATUS of the W3C document.</li>"

Tags removed, split into runs of letters and digits, lowercased:
"If" "it" "exists" "the" "status" "of" "the" "w3c" "document"
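
A rough sketch of how the HTML example could be tokenized, using Python's standard html.parser module. The class name TextExtractor and function name html_tokens are made up for illustration, and the rule (drop tags, lowercase, keep runs of ASCII letters and digits) is an assumption read off the output above, not a fixed definition.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_tokens(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Join with spaces so words on either side of a tag do not merge.
    text = " ".join(parser.chunks).lower()
    # Letters and digits together, so "W3C" survives as "w3c".
    return re.findall(r"[a-z0-9]+", text)

markup = "<li><em>If it exists</em>, the STATUS of the W3C document.</li>"
print(html_tokens(markup))
```

Digits are kept here, unlike in the continuous-letter split, so that "w3c" comes out as a single token.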