Nokogiri vs Hpricot
I ran into a performance problem while using the Mechanize library to scrape an HTML page.
After fetching the page:
@agent = WWW::Mechanize.new
@page = @agent.get('http://www.webometrics.info/top4000.asp')
#.....
I loop 50 times to extract the data I need, calling the search() method on each iteration as follows:
50.times do |i|
  institute = Institute.new
  institute.name = @page.search("/table/tr[#{i}]/td[2]/a").inner_html
end
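One detail of the loop above is worth a closer look: 50.times yields 0 through 49, while standard XPath positions start at 1, so in a standards-following XPath engine tr[0] selects nothing and row 50 is never reached. The post does not say how Hpricot treats tr[0], so take this as a sketch under that assumption rather than a correction of the original run. The helper name row_xpath is made up for illustration.

```ruby
# Builds the XPath for the link in the second cell of a given table row.
# row_xpath is a hypothetical helper, not part of Mechanize or Nokogiri.
def row_xpath(row)
  "/table/tr[#{row}]/td[2]/a"
end

# What 50.times produces: indices 0..49.
zero_based = Array.new(50) { |i| row_xpath(i) }

# 1-based alternative covering rows 1..50.
one_based = (1..50).map { |row| row_xpath(row) }

puts zero_based.first  # /table/tr[0]/td[2]/a
puts one_based.first   # /table/tr[1]/td[2]/a
puts one_based.last    # /table/tr[50]/td[2]/a
```

If the rows of interest really are the first 50, iterating with 1.upto(50) keeps the index and the XPath position in step.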
Since the loop seemed slow, I added the following to see exactly how long it took:
puts Time.now
50.times do |i|
  institute = Institute.new
  institute.name = @page.search("/table/tr[#{i}]/td[2]/a").inner_html
end
puts Time.now
And this is what I got:
Thu Dec 18 13:09:16 +0200 2008
Thu Dec 18 13:09:28 +0200 2008
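Timestamps like these only resolve to whole seconds. A minimal sketch of a finer-grained measurement, using Benchmark.realtime from Ruby's standard library, could look like the following. Here fake_search and timed_lookups are hypothetical names, and fake_search merely stands in for @page.search(...).inner_html so that the example runs without Mechanize or network access.

```ruby
require 'benchmark'

# Hypothetical stand-in for @page.search(...).inner_html.
def fake_search(row)
  "Institute #{row}"
end

# Runs `count` lookups and returns the collected names together with the
# elapsed wall-clock time in seconds (a Float).
def timed_lookups(count)
  names = []
  elapsed = Benchmark.realtime do
    1.upto(count) { |row| names << fake_search(row) }
  end
  [names, elapsed]
end

names, elapsed = timed_lookups(50)
puts format('%d lookups took %.4f seconds', names.size, elapsed)
```

Swapping the body of fake_search for the real search call would time the actual XPath lookups, with sub-second precision.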
The loop took 12 seconds! While looking for a way to speed it up, I found that Mechanize uses the Hpricot library to parse HTML. I went looking for a parser faster than Hpricot and found a Hpricot vs Nokogiri benchmark showing that Nokogiri is much faster at searching by XPath. So I gave it a try, and the results were surprising.
All I needed to do to make it work in my code was to add the following:
require 'nokogiri'
WWW::Mechanize.html_parser = Nokogiri::HTML
@agent = WWW::Mechanize.new
#.........
puts Time.now
50.times do |i|
  institute = Institute.new
  institute.name = @page.search("/table/tr[#{i}]/td[2]/a").inner_html
end
puts Time.now
Running that gave the following:
Thu Dec 18 13:19:20 +0200 2008
Thu Dec 18 13:19:20 +0200 2008
That means it took less than a second!
This shows how much faster Nokogiri is than Hpricot when it comes to searching by XPath.
Written By:
Alfred Nagy

