All posts by Philipp Zumstein

Two papers accepted for conferences (JCDL 2018, UMAP 2018)

The paper "Linked Open Citation Database: Enabling Libraries to Contribute to an Open and Interconnected Citation Graph" was accepted at JCDL 2018 and will be presented in Fort Worth, Texas, in June.

The paper "Multi-Model Adversarial Autoencoders for Recommendations of Citations and Subject Labels" was accepted at UMAP 2018 and will be presented in Singapore in July.



New Dataset: Labeled Reference Lists for Image Segmentation

For our Linked Open Citation Database, we are developing new approaches to extract reference data from reference lists. One step in this process is the segmentation of such lists into individual references, i.e., a bounding box is determined for each reference.

For training and evaluation purposes, we labeled 515 pages containing references from books and chapters. For each page, we manually annotated a box for each reference on that page, resulting in a total of 10,722 boxes and their coordinates in XML files, e.g.

where the first box is saved as:


See here for the complete XML of this page file with all boxes.
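
As a rough illustration of how such box annotations could be read for training or evaluation, here is a minimal Python sketch. The XML layout in it (a `page` element containing `box` elements with `x`, `y`, `width`, and `height` attributes) is purely hypothetical and is not the dataset's actual schema, so the element and attribute names would need to be adapted to the real files. The `Box` class and `read_boxes` function are likewise made up for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Hypothetical annotation file: the element and attribute names below are
# assumptions for illustration only, NOT the dataset's real schema.
SAMPLE_XML = """
<page file="example.png">
  <box x="100" y="200" width="800" height="60"/>
  <box x="100" y="270" width="800" height="90"/>
</page>
"""


@dataclass
class Box:
    """Axis-aligned bounding box around one reference, in pixels."""
    x: int
    y: int
    width: int
    height: int

    @property
    def area(self) -> int:
        return self.width * self.height


def read_boxes(xml_text: str) -> List[Box]:
    """Parse all <box> elements of one page into Box objects."""
    root = ET.fromstring(xml_text.strip())
    return [
        Box(
            x=int(el.attrib["x"]),
            y=int(el.attrib["y"]),
            width=int(el.attrib["width"]),
            height=int(el.attrib["height"]),
        )
        for el in root.iter("box")
    ]


boxes = read_boxes(SAMPLE_XML)
for i, box in enumerate(boxes, start=1):
    print(f"reference {i}: top-left=({box.x}, {box.y}), area={box.area}")
```

Keeping each reference as a small `Box` object makes it easy to compute derived values such as areas or overlaps when comparing predicted boxes against the manual annotations.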

The complete dataset can be downloaded from MADATA, together with the bibliographic information you need to create data citations: