Tuesday, August 7, 2007

Hpricot Limitations

Well, after fighting with Cygwin and Hpricot for a few days, I've given in and dropped Cygwin from my stack for the Ruby/Windows testing for now.

Within five minutes of doing that, I was able to get something done with Hpricot, and bump into its limitations within a minute or so after that.

For example:


irb(main):001:0> require 'rubygems'
=> false

irb(main):002:0> require 'hpricot'
=> true

irb(main):003:0> require 'open-uri'
=> true

irb(main):004:0> doc = Hpricot(open("http://code.whytheluckystiff.net/hpricot/wiki/HpricotBasics"))
=> #<Hpricot::Doc {doctype "<!DOCTYPE html\n" " PUBLIC \"-//W3C//DTD XHTML 1.

...

"> "http://trac.edgewall.com/" </a>} "\n " </p>} "\n" </div>} "\n\n\n\n "} </body>} "\n" </html>} "\n\n">

irb(main):005:0> (doc/"//img")
=> #<Hpricot::Elements[{emptyelem <img src="/hpricot/chrome/site/images/hpricot-small.png" alt="hpricot">}, {emptyelem <img src="/hpricot/chrome/common/trac_logo_mini.png" height="30" alt="Trac Powered" width="107">}]>

irb(main):006:0> (doc/"//img[@alt='hpricot'")
=> #<Hpricot::Elements[{emptyelem <img src="/hpricot/chrome/site/images/hpricot-small.png" alt="hpricot">}, {emptyelem <img src="/hpricot/chrome/common/trac_logo_mini.png" height="30" alt="Trac Powered" width="107">}]>

irb(main):007:0> (doc/"//img[@alt='hpricot']")
=> #<Hpricot::Elements[{emptyelem <img src="/hpricot/chrome/site/images/hpricot-small.png" alt="hpricot">}]>

irb(main):008:0> (doc/"//a/img[@alt='hpricot']")
=> #<Hpricot::Elements[{emptyelem <img src="/hpricot/chrome/site/images/hpricot-small.png" alt="hpricot">}]>

irb(main):009:0> (doc/"//a[img/@alt='hpricot']")
=> #<Hpricot::Elements[]>


There's nothing wrong with the last query: //a[img/@alt='hpricot'] is a valid XPath expression that should return the link wrapping the hpricot logo. It's just not supported by jQuery, which means it isn't supported by Hpricot. Too bad; I guess I'll have to work around the syntax limitations.
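One possible workaround is to select the img by its alt attribute and then step up to its parent, instead of putting the child test inside the predicate. The sketch below swaps in Ruby's REXML (not Hpricot) so that it runs on its own, and the markup is a hypothetical stand-in for the page fetched above. Because REXML implements XPath 1.0, it also accepts the original predicate, which makes a handy cross-check. With Hpricot the equivalent would presumably be mapping parent over (doc/"//a/img[@alt='hpricot']"), assuming its elements expose a parent accessor; that part is untested here.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the HpricotBasics page fetched in the irb session.
HTML = <<~XML
  <html>
    <body>
      <a href="/hpricot/"><img src="/hpricot/chrome/site/images/hpricot-small.png" alt="hpricot"/></a>
      <a href="http://trac.edgewall.com/"><img src="/hpricot/chrome/common/trac_logo_mini.png" alt="Trac Powered"/></a>
    </body>
  </html>
XML

doc = REXML::Document.new(HTML)

# Full XPath: <a> elements that have an <img alt="hpricot"> child.
direct = REXML::XPath.match(doc, "//a[img/@alt='hpricot']")

# Workaround for engines that lack child paths inside predicates:
# match the <img> itself, then step up to its parent <a>.
via_parent = REXML::XPath.match(doc, "//a/img[@alt='hpricot']").map(&:parent).uniq

puts direct.map { |a| a.attributes["href"] }.inspect
puts via_parent.map { |a| a.attributes["href"] }.inspect
```

Both lines should print the same single href, the link around the hpricot logo. The parent-step version only relies on a plain path plus an attribute test, which is the subset the irb session shows Hpricot handling correctly.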
