Sunday, September 4, 2016

Weekly Report 7: Interactive ContentMine output

This weekly report covers the past two weeks. I blogged twice last week, and I figured that was enough.

Last week I blogged about word clouds from ContentMine output. I also blogged about ctj. This week, I have combined both into interactive lists, as seen here and in the images below.

List overview. From left to right: articles, and genus/genera
and species that were mentioned in the articles.

Search results. Here one paper (doi:10.1186/1471-2164-13-589)

I made a NodeJS JSON-to-JSON converter (here). It takes the ctj output, strips all information that I don't use, generates some lists, and outputs a minified JSON file. I load this in the HTML file and generate the "cards" in the browser. I'll probably move that card generation to a NodeJS file as well. This will cause a larger file size, but hopefully a shorter loading time. I also need to make the scrolling more efficient; I don't need to load cards people don't view.
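To give an idea of what such a converter does, here is a minimal sketch. It is not the actual converter: the input shape (`articles`, `title`, `fulltext`, `species`, `genus`) and the function name `convert` are assumptions made up for illustration, and the real ctj output may be structured differently.

```javascript
// Hypothetical input shape; the real ctj output may differ.
const sampleInput = {
  articles: {
    PMC0000001: {
      title: "Example article one",
      fulltext: "Long body text that the cards do not need",
      species: ["Pinus sylvestris", "Picea abies"],
      genus: ["Pinus", "Picea"]
    },
    PMC0000002: {
      title: "Example article two",
      fulltext: "More body text that the cards do not need",
      species: ["Pinus sylvestris"],
      genus: ["Pinus"]
    }
  }
};

// Keep only the fields the page uses and build lookup lists
// (name -> ids of the articles that mention it).
function convert(input) {
  const articles = [];
  const species = {};
  const genus = {};

  for (const [id, article] of Object.entries(input.articles)) {
    const articleSpecies = article.species || [];
    const articleGenus = article.genus || [];

    // Everything not listed here (e.g. fulltext) is dropped.
    articles.push({ id, title: article.title, species: articleSpecies, genus: articleGenus });

    for (const name of articleSpecies) {
      (species[name] = species[name] || []).push(id);
    }
    for (const name of articleGenus) {
      (genus[name] = genus[name] || []).push(id);
    }
  }

  return { articles, species, genus };
}

// JSON.stringify without an indent argument produces minified output.
const minified = JSON.stringify(convert(sampleInput));
console.log(minified);
```

In a real script the result would be written to disk, for example with `fs.writeFileSync`, instead of being logged.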

The "generate word cloud" button doesn't work yet, because it currently needs to load data from a file that's too big to put on GitHub efficiently. I'll fix this later.

In the next few weeks I'll fix the issues above and start to see how I can extract more "facts". Currently I only know what is mentioned where, and "what" is limited to species, genus, words, human genes, and regex matches. In the future I want to find metabolites and chemicals, and the relations between these and conifer species.

1 comment:

  1. Just wondering about the provenance of how you got to this data. Also with "open notebook science" in mind... Do the Content Mine tools record what they have done in the output files, so that at any later stage you have enough information (parameters, etc) in that provenance output to repeat the exact same steps? If so, which would be really cool, and if this was in any way machine readable, this HTML page could have a "back" page with the commands to run to create this HTML yourself... (OK, just a thought, while waiting at Schiphol for the boarding to start... :)