Sunday, August 21, 2016

Weekly Report 6: ContentMine output to JSON to HTML

The "small program" proved more of a challenge than it first seemed. Making a program to generate the JSON (link) was fairly easy: loop through directories, find files, loop through files, collect XML data, save all collected data as JSON in a file. It took a while, but I think I spent most of that time setting up the logistics, i.e. a nice logger, a file-system reader and an argument processor.
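For anyone who wants to roll their own, here's a rough sketch of that loop in Python, with the logger and argument processor included. It is not my actual program. The directory layout (`<PMCID>/results/<plugin>/<type>/results.xml`), the `<result exact="...">` element and the names `collect_results`/`main` are all assumptions for illustration, so adjust them to whatever your ContentMine output really looks like.

```python
import argparse
import json
import logging
import xml.etree.ElementTree as ET
from pathlib import Path

log = logging.getLogger("cm2json")


def collect_results(cproject: Path) -> dict:
    """Gather every matched term per paper from a ContentMine project directory.

    Assumed (hypothetical) layout, adjust to your own ami output:
        <cproject>/<PMCID>/results/<plugin>/<type>/results.xml
    where each results.xml holds <result exact="..."/> elements.
    """
    papers = {}
    # Loop through directories: every sub-directory is treated as one paper.
    for paper_dir in sorted(p for p in cproject.iterdir() if p.is_dir()):
        found = {}
        # Find files: one results.xml per plugin/type combination.
        for xml_file in sorted(paper_dir.glob("results/*/*/results.xml")):
            result_type = xml_file.parent.name  # e.g. "genus" or "binomial"
            root = ET.parse(xml_file).getroot()
            # Collect XML data: keep the matched text of each <result>.
            terms = [r.get("exact") for r in root.iter("result") if r.get("exact")]
            found.setdefault(result_type, []).extend(terms)
        log.info("%s: %d result types", paper_dir.name, len(found))
        papers[paper_dir.name] = found
    return papers


def main(argv=None) -> None:
    # Argument processor: input project directory and output JSON path.
    parser = argparse.ArgumentParser(description="ContentMine results to JSON")
    parser.add_argument("cproject", type=Path)
    parser.add_argument("output", type=Path)
    args = parser.parse_args(argv)
    logging.basicConfig(level=logging.INFO)
    data = collect_results(args.cproject)
    # Save all collected data as JSON in a single file.
    args.output.write_text(json.dumps(data, indent=2), encoding="utf-8")


if __name__ == "__main__":
    main()
```

Something like `python cm2json.py my_project/ output.json` would then write one JSON object keyed by paper directory name.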

The generated JSON was around 11 MB for 250 papers, so I didn't put it on GitHub, but it's fairly easy to reproduce. Here's a step-by-step guide. After you generate the data, serve the JSON file and html/card_c03.html from your localhost (the HTML can't load the JSON otherwise) and open the latter in a browser, preferably Chrome/Chromium (I haven't tested it in other browsers). You may need to change the file path at line 459 to wherever you stored the JSON file, but that shouldn't be much of a problem. Also, each column is capped at 50 items. You can change the cap for papers, genus and species at lines 327, 369 and 419 respectively.
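If you don't have a local web server at hand, Python's standard library is enough. The snippet below is a minimal sketch, assuming you've copied card_c03.html and the JSON file into one folder and run it from there; `make_server` is just a hypothetical helper name. The `directory` argument of `SimpleHTTPRequestHandler` needs Python 3.7 or newer.

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(directory: str, port: int = 8000) -> HTTPServer:
    """Serve `directory` over http://127.0.0.1:<port>/ (port 0 picks a free one)."""
    # SimpleHTTPRequestHandler accepts a `directory` argument since Python 3.7.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    # Serve the current folder, assumed to hold card_c03.html and the JSON file.
    server = make_server(".")
    print("Open http://127.0.0.1:8000/card_c03.html")
    server.serve_forever()
```

Running `python3 -m http.server` in that folder does essentially the same thing from the command line.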

If you don't have the time to reproduce the data, here is a static demo (GUI still under development). Click a "card" (item) to expand it. The items are again capped at 50 per column. The papers are sorted by PMCID, and the genera and species are listed in order of appearance in the papers. The extra information at the bottom of the cards in the genus and species columns shows which papers mention each name, and how often. The info at the bottom of the article cards should be self-explanatory.
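The grouping behind the genus and species columns can be sketched roughly like this. It's an illustration, not the code in card_c03.html (which does this in the browser). I'm assuming the input maps each PMCID to lists of terms per result type, and `index_terms` is a made-up name. Note that PMCIDs are sorted as plain strings here, so "PMC10" comes before "PMC9".

```python
from collections import Counter


def index_terms(papers: dict, result_type: str, cap: int = 50) -> list:
    """Build one column of cards: each term with the papers that mention it.

    `papers` is assumed (hypothetically) to map PMCID -> {result_type: [terms...]}.
    Papers are visited in PMCID order (string sort) and terms keep the order in
    which they are first seen, so the column lists them by order of appearance.
    """
    mentions = {}  # term -> Counter(PMCID -> number of mentions)
    for pmcid in sorted(papers):
        for term in papers[pmcid].get(result_type, []):
            mentions.setdefault(term, Counter())[pmcid] += 1
    cards = [{"name": term, "papers": dict(counts)} for term, counts in mentions.items()]
    # Cap the column, like the 50-item limit in the HTML.
    return cards[:cap]
```

A dict plus `Counter` keeps both the first-appearance order and the per-paper counts in a single pass over the data.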

Current GUI

Finishing the GUI will take longer than making the JSON, mostly because CSS can be pretty annoying when you're trying to make nice things without too much JavaScript. I'll have to rethink the design of the cards, because their content doesn't fit right now, find a nicer way to display the columns, and much more. All this might take a while, as there are lots of features I would like to try to implement.

The blog post about "Activation of defence pathways in Scots pine bark after feeding by pine weevil (Hylobius abietis)" is postponed. I'll work on it when I'm done with the project above.
