Sunday, September 11, 2016

Weekly Report 8: Visualising Zika articles

Last week I wanted to look into extracting more facts, as well as the relation between the species and compounds found in an article. This would be done by extending ami. However, it became clear that ami will get big improvements in the future, and integrating tools like ChemicalTagger and OSCAR is already planned. It's better to wait for that work to finish before extending ami for my own purposes.

Instead, I improved the card page for future use. I didn't have much time this week, so I mainly wanted to demonstrate how it can be used with other data.

Article page

Here it is. It's very similar to the card page, of course: it has the same design and a comparable data structure. Word clouds now work. You can view them by opening an article and clicking "Click to load word cloud". It uses a custom API built on cloud.js (repo, license). The API takes two URL parameters: file (the URL of a file with the ctj output structure containing the word data) and pmcid (the PubMed Central ID).
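To link to a word cloud for a specific article, you only need to put those two parameters in the query string. Below is a minimal Python sketch of building such a link and reading the parameters back. The page URL, the data file URL and the PMCID are hypothetical placeholders; the only thing taken from the post is that the parameters are called file and pmcid.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical location of the word cloud page; replace with wherever you host it.
WORDCLOUD_PAGE = "https://example.org/wordcloud.html"


def wordcloud_url(data_file: str, pmcid: str, page: str = WORDCLOUD_PAGE) -> str:
    """Build a link to the word cloud page with its two URL parameters."""
    # urlencode percent-encodes the file URL, so its own ':' and '/'
    # stay inside a single parameter value.
    query = urlencode({"file": data_file, "pmcid": pmcid})
    return f"{page}?{query}"


def read_params(url: str) -> tuple[str, str]:
    """Recover the file and pmcid parameters, as the page itself has to."""
    params = parse_qs(urlsplit(url).query)
    return params["file"][0], params["pmcid"][0]


if __name__ == "__main__":
    # Both values below are made-up examples.
    link = wordcloud_url("https://example.org/data/words.json", "PMC1234567")
    print(link)
    print(read_params(link))
```

Encoding the file URL matters because it contains characters that would otherwise be confused with the outer URL's own structure.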

Next, I'll go over the process of getting your own data to display this way. There is a command dump below, but it doesn't cover the custom programs. First, you fetch papers with getpapers. I used the query 'ABSTRACT:zika OR ABSTRACT:dengue OR ABSTRACT:spondweni'. There is nothing special about it: restricting matches to the abstract with ABSTRACT: helps ensure that the article actually covers the topic, at least remotely, and the rest are just the topics. You can replace these with anything you want. For now, you can limit the download to 500 papers.
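The query is just the chosen topics joined with OR, each prefixed with ABSTRACT:. Here is a small Python sketch that generates a query in that style from your own topic list. The helper name build_query is hypothetical, and wrapping multi-word topics in double quotes (so they are matched as a phrase) is my own assumption about what you would want; the resulting string is what you would pass to getpapers as the query.

```python
def build_query(topics, field="ABSTRACT"):
    """Join topics into an OR query restricted to one field, e.g. the abstract."""
    parts = []
    for topic in topics:
        # Quote multi-word topics so they are searched as a phrase.
        term = f'"{topic}"' if " " in topic else topic
        parts.append(f"{field}:{term}")
    return " OR ".join(parts)


if __name__ == "__main__":
    # Reproduces the query used for this post.
    print(build_query(["zika", "dengue", "spondweni"]))
    # A different, hypothetical topic list.
    print(build_query(["chikungunya", "west nile"]))
```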

Then you run the papers through the ContentMine pipeline (i.e. norma and ami), using the ami plugins ami2-species, ami2-words and ami2-sequence. This produces a directory tree as output, which you can convert to JSON with ctj. Next, you reduce the file size by removing all the data you don't use with c05.js, which I'll document later. Its file paths are hard-coded, but if you stick to the file structure I've used in the command dump it should work. Finally, you change the file paths in card_c05.html to point to your own files.
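The minifying step boils down to keeping only the fields the page actually reads. The sketch below shows that idea in Python; it is not what c05.js does line by line. It assumes the ctj JSON is an object keyed by PMCID whose values contain nested sections, and the section and field names in KEEP are placeholders, since the real names depend on ctj's output.

```python
import json

# Placeholder whitelist: section -> fields to keep. Adjust to the real ctj names.
KEEP = {"metadata": {"title", "authors"}, "results": {"species", "sequence"}}


def prune(article: dict, keep: dict = KEEP) -> dict:
    """Return a copy of one article record containing only whitelisted fields."""
    slim = {}
    for section, fields in keep.items():
        value = article.get(section)
        if isinstance(value, dict):
            slim[section] = {k: v for k, v in value.items() if k in fields}
    return slim


def prune_file(src: str, dst: str, keep: dict = KEEP) -> None:
    """Read the ctj JSON, prune every article, and write compact JSON."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    slim = {pmcid: prune(record, keep) for pmcid, record in data.items()}
    with open(dst, "w", encoding="utf-8") as f:
        # Compact separators drop the whitespace json.dump adds by default.
        json.dump(slim, f, separators=(",", ":"))
```

Whitelisting rather than blacklisting means new fields added to ctj's output later won't silently bloat the file again.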

To make the word cloud API work, you use c05-words.js. The file paths are hard-coded in this file as well, so watch out for that: it may try to save a file in a directory that doesn't exist. I'll fix this at some point. Change the file path at line 208 to the output of c05-words.js, and you should be done. Note that you can't load the files over the file:// protocol, so you may have to host them somewhere.
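Until the hard-coded paths are fixed, the missing-directory problem can be avoided by creating the parent directories before writing. A minimal Python sketch of that fix follows; the helper name save_json is hypothetical. In c05-words.js itself, the Node.js equivalent would be calling fs.mkdirSync(dir, { recursive: true }) before writing the file.

```python
import json
from pathlib import Path


def save_json(data, path) -> Path:
    """Write data as JSON, creating any missing parent directories first."""
    out = Path(path)
    # parents=True creates intermediate directories;
    # exist_ok=True makes reruns safe when they already exist.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(data), encoding="utf-8")
    return out
```

For the file:// issue, a quick local option is running `python3 -m http.server 8000` in the directory that contains card_c05.html and then opening http://localhost:8000/card_c05.html in the browser.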

Commands used

Next week I'll probably add a better search function and similar things, and see if I can help with extending ami.
