Tag Archives: Nokogiri

SEDI Issuers

I’m still playing around with the ‘irrelevant’ in an effort to stay ‘relevant’!

Back to the SEDI site for some more inspiration, this time to strip out a listing of all 7,236 Canadian issuers listed there. You could bash around the site by hand, working through the alphabet and numerals to get at the data, but there is a better way!

No need for Watir this time, as I am able to bypass the JavaScript elements and use just the Mechanize and Nokogiri gems (very addictive!) for a lightning-fast strip-and-parse of this otherwise ‘clunky’ site.

 

require 'nokogiri'
require 'mechanize'

agent = Mechanize.new
ary_of_firms = []
rows = []

### Defines Array Class for HTML Table output ###
class Array
  # Wraps each element in the given tag, e.g. <td>value</td>
  def to_cells(tag)
    map { |c| "<#{tag}>#{c}</#{tag}>" }.join
  end
end

# Run one search for each letter and digit
[*('A'..'Z'), *('0'..'9')].each do |letter|
  page = agent.get('https://www.sedi.ca/sedi/SVTSelectSediIssuer?menukey=15.02.00&locale=en_CA')
  sedi_form = page.form('form1')
  sedi_form.ISSUER_NAME = letter
  button = sedi_form.button_with(:value => "Search")
  page = agent.submit(sedi_form, button)

  doc = Nokogiri::HTML(page.body)
  # The tenth table on the page holds the search results
  table = doc.css('table')[9]
  next if table.nil? # no results table for this search term

  # Issuer names sit in the third column; skip the first match,
  # which is not an issuer (presumably the column header)
  table.css('td:nth-child(3) font').drop(1).each do |firm|
    ary_of_firms << firm.text.strip
  end
end

ary_of_firms.sort.uniq.each do |company|
  puts company
  rows << {"SEDI Issuer" => company}
end

### Rolls HTML Table output ###
headers = "<tr>#{rows[0].keys.to_cells('th')}</tr>"
cells = rows.map do |row|
  "<tr>#{row.values.to_cells('td')}</tr>"
end.join("\n ")
table = "<table border=\"1\">
 #{headers}
 #{cells}
</table>"
# The block form closes the file, so the output is flushed to disk
File.open('./SEDI_Issuers.html', "w") { |file| file.puts table }
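The table-building step at the end of the script reopens the core Array class and writes the scraped text into the HTML unescaped, and `rows[0].keys` raises an error if nothing was scraped. As a hedged alternative, here is a minimal sketch using plain helper methods and the standard library's `CGI.escapeHTML`. The helper names `to_cells` and `html_table` and the sample company name are hypothetical, not part of the original script.

```ruby
require 'cgi'

# Hypothetical helper: wraps each value in the given tag,
# HTML-escaping the value first (names may contain & or <).
def to_cells(values, tag)
  values.map { |v| "<#{tag}>#{CGI.escapeHTML(v.to_s)}</#{tag}>" }.join
end

# Hypothetical helper: builds a bordered table from an array of
# hashes that all share the same keys.
def html_table(rows)
  return "<table border=\"1\"></table>" if rows.empty?

  header = "<tr>#{to_cells(rows.first.keys, 'th')}</tr>"
  body = rows.map { |row| "<tr>#{to_cells(row.values, 'td')}</tr>" }
  (["<table border=\"1\">", header] + body + ["</table>"]).join("\n")
end

# Made-up sample data for illustration only
rows = [{ "SEDI Issuer" => "Smith & Jones Ltd." }]
puts html_table(rows)
```

Keeping the helpers outside Array avoids changing a core class for every other library loaded in the same process.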

FINRA List of Members – A “True” Listing of FINRA Members

My last post, which scraped data on FINRA registrants for a particular financial institution, got me thinking about getting a listing of all the firms that FINRA regulates.

FINRA does provide a site with a “listing” of all the firms it regulates, but I wanted all the raw data for use in a spreadsheet (or wherever), and this was the challenge.

This was a great opportunity to use the Ruby Mechanize gem alongside the Nokogiri gem for parsing the output. Together these two gems are very powerful: they crawl and mine data with beautiful efficiency, grabbing the desired data and getting me my “true” list of all FINRA members.

require 'nokogiri'
require 'mechanize'

ary_of_members = []
rows = []
agent = Mechanize.new

### Defines Array Class for HTML Table output ###
class Array
  # Wraps each element in the given tag, e.g. <td>value</td>
  def to_cells(tag)
    map { |c| "<#{tag}>#{c}</#{tag}>" }.join
  end
end

page = agent.get('http://www.finra.org/AboutFINRA/MemberFirms/ListOfMembers/p012909')
doc = Nokogiri::HTML(page.body)

# Follow each alphabetical index link and collect the spans on its page
doc.css('.FNRW_Alphabetical_DL-result').each do |linkz|
  page = agent.get('http://www.finra.org' + linkz['href'])
  doc = Nokogiri::HTML(page.body)
  ##### Search for nodes by css
  doc.css('#col2cont span').each do |firm|
    # Skip spans matching the exclusion pattern (e.g. the "Mailing Address" labels)
    next if firm.to_s =~ /Mailing Address|10pt'>.*?<\/p><\/span>/
    # Keep only spans that contain some text
    ary_of_members << firm.text if firm.text =~ /\w+/
  end
end

# The collected spans alternate name, address, name, address, ...
ary_of_members.each_slice(2) do |name, address|
  rows << {"FINRA Member" => name, "FINRA Member Address" => address}
end

### Rolls HTML Table output ###
headers = "<tr>#{rows[0].keys.to_cells('th')}</tr>"
cells = rows.map do |row|
  "<tr>#{row.values.to_cells('td')}</tr>"
end.join("\n ")
table = "<table border=\"1\">
 #{headers}
 #{cells}
</table>"
# The block form closes the file, so the output is flushed to disk
File.open('./finra_members.html', "w") { |file| file.puts table }
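The pairing step relies on the scraped list alternating between a member name and its address. Here is a minimal sketch with made-up data that compares two ways of pairing such a flat list: stepping through indices with an inclusive range, which runs one step past the end and yields a trailing empty pair, and `each_slice(2)`, which stops at the end of the array.

```ruby
# Made-up sample data standing in for the scraped spans
flat = ["Firm A", "1 Main St", "Firm B", "2 Side St"]

# An inclusive range 0..length visits 0, 2 and 4; index 4 is past the end
stepped = (0..flat.length).step(2).map { |n| [flat[n], flat[n + 1]] }

# each_slice pairs neighbours and stops at the end of the array
paired = flat.each_slice(2).to_a

p stepped.last # the extra pair produced by the inclusive range
p paired
```

An exclusive range (`0...flat.length`) would also avoid the extra pair; `each_slice` is simply shorter and harder to get wrong.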

Quick Query for FINRA BrokerCheck

The task was to generate a listing of active FINRA registrants for a particular FI from FINRA's BrokerCheck website, and once again a Ruby script with the Watir and Nokogiri gems was the go-to.

 

require 'rubygems' # only needed on Ruby 1.8; must come before the gems
require 'nokogiri'
require 'watir-webdriver'

rows = []
dupes = []

### Defines Array Class for HTML Table output ###
class Array
  # Wraps each element in the given tag, e.g. <td>value</td>
  def to_cells(tag)
    map { |c| "<#{tag}>#{c}</#{tag}>" }.join
  end
end

browser = Watir::Browser.new :chrome
browser.goto 'http://brokercheck.finra.org/Search/Search.aspx'

('A'..'Z').each do |letter|
  # Wildcard search on the individual's name, restricted to the firm
  browser.text_field(:name => 'ctl00$phContent$ucUnifiedSearch$txtIndvl').set "#{letter}*"
  browser.text_field(:name => 'ctl00$phContent$ucUnifiedSearch$txtFirm').set 'BMO'
  browser.input(:name => 'ctl00$phContent$ucUnifiedSearch$lbtnFreeFormSearch').click

  # Walk the result pages until there is no "next" link
  loop do
    results = browser.table(:id, 'ctl00_phContent_gvBrokerTable')
    # Parse the HTML once per results page rather than once per row
    doc = Nokogiri::HTML.parse(browser.html)

    results.rows.length.times do |i|
      name = results[i][0].div(:class, 'gvListItemStyle').span.text
      lic_status = results[i][1].text
      status = results[i][0].div(:class, 'gvListItemStyle').text

      # Keep only licensed individuals whose entry mentions BMO, once per name
      next if lic_status =~ /Not Licensed/
      next unless status =~ /BMO/
      next if dupes.include?(name)

      source = doc.css('.GrayTextShade:nth-child(3)')[i].text
      puts "#{name}\t#{source}"
      rows << {"Name" => name, "Registration" => source}
      dupes << name
    end

    next_link = browser.link(:id => 'ctl00_phContent_navPager_lbNext')
    if next_link.exists?
      next_link.click
    else
      # Last page: return to the search form for the next letter
      browser.goto 'http://brokercheck.finra.org/Search/Search.aspx'
      break
    end
  end
end

### Rolls HTML Table output ###
headers = "<tr>#{rows[0].keys.to_cells('th')}</tr>"
cells = rows.map do |row|
  "<tr>#{row.values.to_cells('td')}</tr>"
end.join("\n ")
table = "<table border=\"1\">
 #{headers}
 #{cells}
</table>"
puts table
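The filtering rules inside the loop (skip anyone marked "Not Licensed", require the firm name in the status text, and keep each name only once) can be checked without a browser. Here is a minimal sketch that assumes the scraped values have already been collected into records. The `Registrant` struct, the `active_registrants` method and all sample values are hypothetical, and a `Set` stands in for the array-based duplicate check.

```ruby
require 'set'

# Hypothetical record for one scraped row (not part of the original script)
Registrant = Struct.new(:name, :lic_status, :status, :source)

# Returns the licensed registrants whose status mentions the firm,
# keeping only the first record for each name.
def active_registrants(records, firm)
  seen = Set.new
  records.select do |r|
    next false if r.lic_status =~ /Not Licensed/
    next false unless r.status.include?(firm)
    seen.add?(r.name) # returns nil (falsy) when the name was already seen
  end
end

# Made-up sample data for illustration only
records = [
  Registrant.new("DOE, JANE", "Licensed", "DOE, JANE BMO EXAMPLE FIRM", "source A"),
  Registrant.new("DOE, JANE", "Licensed", "DOE, JANE BMO EXAMPLE FIRM", "source A"),
  Registrant.new("POE, JOHN", "Not Licensed", "POE, JOHN BMO EXAMPLE FIRM", "source B"),
  Registrant.new("ROE, RICHARD", "Licensed", "ROE, RICHARD BMO EXAMPLE FIRM", "source C"),
  Registrant.new("LEE, ANN", "Licensed", "LEE, ANN OTHER FIRM", "source D")
]

active_registrants(records, "BMO").each do |r|
  puts "#{r.name}\t#{r.source}"
end
```

A `Set` lookup stays fast as the list of names grows, whereas `Array#include?` scans the whole array on every check.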