I’m still playing around with the ‘irrelevant’ in an effort to stay ‘relevant’!
Back to the SEDI site for some more inspiration, and this time to strip out a list of all 7,236 Canadian issuers listed on the site. You could bash around the search page, working through the alphabet and numerals one query at a time to get at the data, but there is a better way!
No need for Watir this time, as I am able to bypass the JavaScript elements and use just the Mechanize and Nokogiri gems (v. addictive!) for a lightning-fast strip-and-parse of this otherwise ‘clunky’ site.
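Bypassing the JavaScript works because the issuer search is, underneath, an ordinary HTML form submission: the script only has to fill in the ISSUER_NAME field and submit it. As a rough illustration using only Ruby's standard library, the sketch below builds (but does not send) that kind of URL-encoded request. It assumes form1 is submitted by POST to the same URL the script first fetches; the real form's action, method and any hidden fields may differ, and Mechanize picks those details up from the live page by itself. `build_search_request` and `SEARCH_URL` are hypothetical names.

```ruby
require 'net/http'
require 'uri'

# Assumption: form1 is submitted by POST back to the page the script first fetches.
# The real form action (and any hidden fields it carries) may differ.
SEARCH_URL = URI('https://www.sedi.ca/sedi/SVTSelectSediIssuer?menukey=15.02.00&locale=en_CA')

# Build, but do not send, the form request for one search term.
def build_search_request(term)
  request = Net::HTTP::Post.new(SEARCH_URL)
  # ISSUER_NAME is the field the script fills in; set_form_data URL-encodes it
  # and sets the Content-Type to application/x-www-form-urlencoded.
  request.set_form_data('ISSUER_NAME' => term)
  request
end

request = build_search_request('A')
puts request.method          # POST
puts request['Content-Type'] # application/x-www-form-urlencoded
puts request.body            # ISSUER_NAME=A

# Sending it would look like this (not run here):
# Net::HTTP.start(SEARCH_URL.host, SEARCH_URL.port, use_ssl: true) { |http| http.request(request) }
```

Seen this way, the 36 searches in the script are just 36 small form submissions, which is why no browser automation is needed.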
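require 'mechanize'  # fetches pages and submits the search form
require 'nokogiri'   # parses the returned HTML

agent = Mechanize.new
ary_of_firms = []  # every issuer name scraped, across all search terms
rows = []          # one hash per issuer, used to build the HTML table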
### Extends Array with a helper that wraps each element in an HTML tag ###
class Array
  def to_cells(tag)
    map { |c| "<#{tag}>#{c}</#{tag}>" }.join
  end
end
file = File.open('./SEDI_Issuers.html', 'w')

# Run one search per letter and digit, collecting the issuer names returned
[*('A'..'Z'), *('0'..'9')].each do |letter|
  page = agent.get('https://www.sedi.ca/sedi/SVTSelectSediIssuer?menukey=15.02.00&locale=en_CA')
  sedi_form = page.form('form1')
  sedi_form.ISSUER_NAME = letter
  button = sedi_form.button_with(:value => 'Search')
  page = agent.submit(sedi_form, button)

  doc = Nokogiri::HTML(page.body)
  table = doc.css('table')[9]  # the results are in the 10th table on the page
  next if table.nil?           # skip this letter if the page layout is not as expected

  # Issuer names sit in the third column; skip the first match, which is not an issuer
  table.css('td:nth-child(3) font').drop(1).each do |firm|
    ary_of_firms << firm.text.strip
  end
end
# Sort, de-duplicate, echo to the console and store one row per issuer
ary_of_firms.sort.uniq.each do |company|
  puts company
  rows << { 'SEDI Issuer' => company }
end
### Builds the HTML table and writes it to SEDI_Issuers.html ###
headers = "<tr>#{rows[0].keys.to_cells('th')}</tr>"
cells = rows.map do |row|
  "<tr>#{row.values.to_cells('td')}</tr>"
end.join("\n")
table = <<~HTML
  <table border="1">
  #{headers}
  #{cells}
  </table>
HTML
file.puts table
file.close
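One caveat with the table-building step: to_cells drops each name into the markup verbatim, so an issuer name containing &, < or > yields technically invalid HTML, and a stray < can break the table outright. A minimal sketch of an escaping variant, using ERB::Util.html_escape from Ruby's standard library; `to_escaped_cells` and `issuer_table` are hypothetical names, and the sample issuer names are made up.

```ruby
require 'erb'

# Hypothetical variant of Array#to_cells that HTML-escapes each value,
# so a name like "Alpha & Omega Ltd" cannot break the markup.
class Array
  def to_escaped_cells(tag)
    map { |c| "<#{tag}>#{ERB::Util.html_escape(c.to_s)}</#{tag}>" }.join
  end
end

# Build the same one-column table as the script from a plain list of names.
def issuer_table(names, header = 'SEDI Issuer')
  rows = names.sort.uniq.map { |name| "<tr>#{[name].to_escaped_cells('td')}</tr>" }
  ['<table border="1">', "<tr>#{[header].to_escaped_cells('th')}</tr>", *rows, '</table>'].join("\n")
end

# Made-up sample names: the duplicate is dropped and the & becomes &amp;
puts issuer_table(['Zeta Corp', 'Alpha & Omega Ltd', 'Zeta Corp'])
```

Because the header text is passed in rather than read from rows[0], an empty result list still produces a valid, header-only table instead of raising an error.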