Mining Edgar Tender Offers

This paper describes how to use the HTMLEditorKit to perform web data mining on EDGAR (Electronic Data-Gathering, Analysis, and Retrieval system). EDGAR is the U.S. Securities and Exchange Commission's (SEC) means of automating the collection, validation, indexing, acceptance, and forwarding of submissions. Entities regulated by the SEC (e.g., publicly traded firms) are required by law to file with it. Our focus is on using EDGAR to obtain information about company offers to purchase stock, known as tender offers. These filings are associated with companies through their Central Index Key (CIK), which the SEC's computer system uses to identify corporations and individuals who have filed a disclosure with the SEC. We show how to map a stock ticker symbol to a CIK and how to extract tender offer data. Our example shows how to extract the number of shares tendered, the price range for the auction, the honoring of "odd lots", and the initial termination date for the auction. The methodology for converting the web data source into internal data structures uses HTML as input to a parser-callback facility that builds up a data structure from context-sensitive table data. Screen scraping is a popular means of data entry, but the unstructured nature of HTML pages makes this a challenge.
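
As an illustration of the parser-callback approach, the following is a minimal sketch and not the implementation described in this paper. It subclasses `HTMLEditorKit.ParserCallback` and uses the enclosing tag as context: text is collected only while the parser is inside a table cell, and cells are grouped by row. The class name `TableScraper`, the method `parse`, and the sample HTML are hypothetical; real EDGAR filings are far less regular than this input.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text of every table cell, row by row.
final class TableScraper extends HTMLEditorKit.ParserCallback {
    private final List<List<String>> rows = new ArrayList<>();
    private List<String> currentRow;   // non-null only while inside a <tr>
    private StringBuilder currentCell; // non-null only while inside a <td>/<th>

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TR) {
            currentRow = new ArrayList<>();
        } else if ((t == HTML.Tag.TD || t == HTML.Tag.TH) && currentRow != null) {
            currentCell = new StringBuilder();
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Context check: text outside a table cell is ignored.
        if (currentCell != null) {
            currentCell.append(data);
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if ((t == HTML.Tag.TD || t == HTML.Tag.TH) && currentCell != null) {
            currentRow.add(currentCell.toString().trim());
            currentCell = null;
        } else if (t == HTML.Tag.TR && currentRow != null) {
            rows.add(currentRow);
            currentRow = null;
        }
    }

    // Parses an HTML string and returns its table rows as lists of cell text.
    static List<List<String>> parse(String html) {
        TableScraper callback = new TableScraper();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.rows;
    }

    public static void main(String[] args) {
        // Made-up input for illustration only; not taken from an actual filing.
        String html = "<html><body><table>"
                + "<tr><td>Shares sought</td><td>1,000,000</td></tr>"
                + "<tr><td>Odd lots</td><td>Yes</td></tr>"
                + "</table></body></html>";
        for (List<String> row : parse(html)) {
            System.out.println(row);
        }
    }
}
```

Each returned row is a list of cell strings, so a label cell such as "Shares sought" can be paired with the value cell beside it. Nested tables and cells that span rows are not handled in this sketch.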

[1] Douglas A. Lyon, Multi-threaded data mining of EDGAR CIKs (Central Index Keys) from ticker symbols, 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008.