I’m still playing around with the ‘irrelevant’ in an effort to stay ‘relevant’!
Back to the SEDI site for some more inspiration, and this time to strip out a listing of all 7,236 Canadian issuers listed there. You could bash around the site's search form, working through the alphabet and numerals one query at a time to get at the data, but there is a better way!
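Working through the alphabet and numerals only pays off if the overlapping hits get merged, because one issuer can turn up under several search terms. Here is a minimal sketch of that merge step. It assumes a hypothetical `search` block that stands in for a single SEDI form submission and returns an array of issuer names. `collect_issuers` is my own illustrative name, and the canned results are made up, not real SEDI data, so it runs offline:

```ruby
# Collect issuer names by running one search per term (A-Z, then 0-9).
# `search` is a hypothetical stand-in for the Mechanize form submission:
# it receives a one-character term and returns an array of names.
def collect_issuers(terms = [*'A'..'Z', *'0'..'9'], &search)
  # One search per term; the same name can come back under several terms.
  names = terms.flat_map { |term| search.call(term) }
  # Tidy scraped whitespace, drop blank cells, de-duplicate and sort.
  names.map(&:strip).reject(&:empty?).uniq.sort
end

# Usage with canned results instead of live requests:
fake_results = {
  'A' => ['Acme Mining Corp.', ' Alpha Gold Inc. '],
  'B' => ['Beta Energy Ltd.', 'Acme Mining Corp.']
}
names = collect_issuers(%w[A B]) { |term| fake_results.fetch(term, []) }
puts names # prints the three unique names, one per line, sorted
```

Passing the search in as a block keeps the merge logic separate from the network code, so the de-duplication can be checked without hitting the site.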
No need for Watir this time, as I am able to bypass the JavaScript elements and use just the Mechanize and Nokogiri gems (very addictive!) for a lightning-fast strip-and-parse of this otherwise ‘clunky’ site.
require 'mechanize' # drives the search form
require 'nokogiri'  # parses the result pages

agent = Mechanize.new
ary_of_firms = []
rows = []

### Adds a helper to Array for HTML table output ###
class Array
  # Wraps each element in the given tag, e.g. ['a'].to_cells('td') gives "<td>a</td>"
  def to_cells(tag)
    map { |c| "<#{tag}>#{c}</#{tag}>" }.join
  end
end

# Search once per letter and digit; between them they cover every issuer name
[*('A'..'Z'), *('0'..'9')].each do |letter|
  page = agent.get('https://www.sedi.ca/sedi/SVTSelectSediIssuer?menukey=15.02.00&locale=en_CA')
  sedi_form = page.form('form1')
  sedi_form['ISSUER_NAME'] = letter
  button = sedi_form.button_with(:value => 'Search')
  page = agent.submit(sedi_form, button)
  doc = Nokogiri::HTML(page.body)

  # The results sit in the tenth table on the page; skip this search if it is missing
  table = doc.css('table')[9]
  next unless table

  # Issuer names are in the third column; drop(1) skips the first match (the header cell)
  table.css('td:nth-child(3) font').drop(1).each do |firm|
    ary_of_firms << firm.text.strip
  end
end

ary_of_firms.sort.uniq.each do |company|
  puts company
  rows << { 'SEDI Issuer' => company }
end

abort 'No issuers found' if rows.empty?

### Rolls HTML Table output ###
headers = "<tr>#{rows[0].keys.to_cells('th')}</tr>"
cells = rows.map { |row| "<tr>#{row.values.to_cells('td')}</tr>" }.join("\n  ")
table = "<table border=\"1\">\n  #{headers}\n  #{cells}\n</table>"

# File.write opens, writes and closes the file in one go
File.write('./SEDI_Issuers.html', table)
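The table-rolling step reopens Array and writes issuer names into the HTML unescaped, so a name containing `&` or `<` yields invalid markup. A minimal sketch of the same step as two plain methods, with escaping via `ERB::Util.html_escape` from Ruby's standard library. The method names `html_cells` and `html_table` are my own (hypothetical, not from the script), and the two sample rows are invented:

```ruby
require 'erb'

# Wrap each value in <tag>...</tag>, escaping &, <, >, " and ' on the way.
def html_cells(values, tag)
  values.map { |v| "<#{tag}>#{ERB::Util.html_escape(v.to_s)}</#{tag}>" }.join
end

# Build a bordered HTML table from an array of hashes that share the same keys.
def html_table(rows)
  return '<table border="1"></table>' if rows.empty?

  header = "<tr>#{html_cells(rows.first.keys, 'th')}</tr>"
  body = rows.map { |row| "<tr>#{html_cells(row.values, 'td')}</tr>" }
  ['<table border="1">', header, *body, '</table>'].join("\n")
end

rows = [{ 'SEDI Issuer' => 'Acme Mining Corp.' }, { 'SEDI Issuer' => 'Smith & Co.' }]
html = html_table(rows)
puts html
```

In the script, the output step would then shrink to `File.write('./SEDI_Issuers.html', html_table(rows))`. Keeping it as plain methods avoids patching a core class and lets the table builder be checked without scraping anything.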