ALGOMINE: A SYSTEM FOR EXTRACTING AND SEARCHING FOR ALGORITHMS IN SCHOLARLY BIG DATA
Abstract
Algorithms are commonly published in scholarly articles, especially in the computational sciences and related fields. The ability to automatically find and retrieve these algorithms from the growing collection of scholarly digital articles would enable algorithm discovery, indexing, analysis, and search. Recently, AlgoMine, a search engine for algorithms, has been investigated as a module of CiteSeerX with the intent of serving a large algorithm database. To date, over 200,000 algorithms have been mined from over 2 million scholarly articles. This paper proposes a new set of scalable techniques used by AlgoMine to search for and retrieve algorithm representations in a diverse pool of scholarly articles. Specifically, hybrid machine learning approaches are proposed to discover algorithm representations. Then, techniques to extract textual metadata for each algorithm are discussed. Finally, a demonstration version of AlgoMine built on the Solr/Lucene open-source indexing and search system is presented.
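As a rough illustration of the extract-then-search pipeline outlined in the abstract, the sketch below detects algorithm captions with a regular expression and answers keyword queries from a small in-memory inverted index. It is not AlgoMine's method: the caption pattern, the function names (`extract_captions`, `build_index`, `search`), and the sample text are all assumptions made for this example, the regex stands in for the hybrid machine learning detectors, and the dictionary-based index stands in for Solr/Lucene.

```python
import re
from collections import defaultdict

# Hypothetical caption pattern (an assumption, not AlgoMine's rule set):
# matches lines such as "Algorithm 1: Title" or "Algorithm 2. Title".
CAPTION_RE = re.compile(r"^\s*Algorithm\s+(\d+)\s*[:.]\s*(.+?)\s*$", re.IGNORECASE)


def extract_captions(lines):
    """Return (number, title) pairs for lines that look like algorithm captions."""
    found = []
    for line in lines:
        match = CAPTION_RE.match(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found


def build_index(records):
    """Map each lowercased title token to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, (_, title) in enumerate(records):
        for token in re.findall(r"[a-z0-9]+", title.lower()):
            index[token].add(rec_id)
    return index


def search(index, records, query):
    """Return records whose titles contain every query token (boolean AND)."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return [records[i] for i in sorted(hits)]


# Made-up sample text standing in for lines extracted from an article.
page = [
    "We now describe our method.",
    "Algorithm 1: Greedy Set Cover",
    "  Algorithm 2. Dijkstra shortest path",
    "See Algorithm 1 for details.",
]
records = extract_captions(page)
index = build_index(records)
print(search(index, records, "shortest path"))
```

In-text mentions such as "See Algorithm 1 for details." are skipped because the pattern is anchored at the start of the line and requires a colon or period after the number. A production system would index richer metadata than the caption title and delegate ranking to the search engine.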