Yesterday I published a blog post in which I talked about ctj, and about how and why to convert ContentMine's CProjects to JSON. At the end, I mentioned this post, where I would talk about how to use that JSON in different programs and with d3.js. So here we go. For starters, let's make the data about word frequencies look nice. Not readable (then we would use a table), but visually pleasing. Let's make a word cloud. Skip to the part where I talk about converting the data.
Figure 1: Word Cloud (see text below)
Most Google results for "d3 js word cloud" point to cloud.js (repo, license). The problem was, I could not find the source code. Both index.js and build/d3.layout.cloud.js use require() in one of the first lines, and therefore I assumed it was intended for NodeJS.
Figure 2: Different font size scalings: log n, √n, n (where n is the number of occurrences of a word)
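The three scalings compared in Figure 2 can be sketched as plain functions of the occurrence count n. This is only an illustration: the names scalings and fontSizeFor and the factor multiplier are hypothetical, and the exact values used for the figure are not shown here.

```javascript
// Three ways to turn an occurrence count n into a font size (see Figure 2).
// The names and the `factor` multiplier are hypothetical; tune them to your design.
var scalings = {
  log: function ( n ) { return Math.log( n ); },
  sqrt: function ( n ) { return Math.sqrt( n ); },
  linear: function ( n ) { return n; }
};

// Note: Math.log( 1 ) is 0, so with 'log' a word that occurs once gets size 0
// unless you add an offset of your own.
function fontSizeFor ( count, scaling, factor ) {
  return factor * scalings[ scaling ]( count );
}
```

With cloud.js, something like this would go in the layout's font size accessor, e.g. .fontSize( function ( d ) { return fontSizeFor( d.size, 'sqrt', 10 ); } ), assuming each word object carries its count in size.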
Here is the static result. For the live demo you need the input data and a localhost. Here is a guide on how to get the input data. To apply it, change the path on line 17 and change the PMCID on line 19 to the one you want to show. Of course, this needs to be the PMCID of an article that exists in your data. For jQuery to be able to fetch the JSON, you need a server-like thing, because a page opened as a local file that fetches another local file, even one in the same folder, still counts as a request to a different domain.
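If you have no "server-like thing" at hand, one possible option is a tiny static file server. The following is a minimal sketch, assuming Node.js is installed; it is not part of the demo itself, and the names resolveRequestPath and createStaticServer and the port 8000 are hypothetical. It sends a JSON content type for .json files, which is what lets jQuery hand the callback parsed data.

```javascript
// Minimal static file server sketch (assumes Node.js is installed).
// All names here (resolveRequestPath, createStaticServer) are hypothetical.
var http = require( 'http' );
var fs = require( 'fs' );
var path = require( 'path' );

var MIME_TYPES = {
  '.html': 'text/html',
  '.js': 'text/javascript',
  '.css': 'text/css',
  '.json': 'application/json'
};

// Map a request URL to a file inside rootDir, or null if it would escape rootDir.
function resolveRequestPath ( rootDir, requestUrl ) {
  var root = path.resolve( rootDir );
  var pathname;
  try {
    pathname = decodeURIComponent( requestUrl.split( '?' )[ 0 ] );
  } catch ( e ) {
    return null; // malformed percent-encoding
  }
  if ( pathname.indexOf( '\0' ) !== -1 ) {
    return null; // null bytes are never valid in a file path
  }
  var filePath = path.resolve( root, '.' + pathname );
  var inside = filePath === root || filePath.indexOf( root + path.sep ) === 0;
  return inside ? filePath : null;
}

// Build (but do not start) a server that serves files below rootDir.
function createStaticServer ( rootDir ) {
  return http.createServer( function ( req, res ) {
    var filePath = resolveRequestPath( rootDir, req.url );
    if ( filePath === null ) {
      res.writeHead( 403 );
      return res.end( 'Forbidden' );
    }
    fs.readFile( filePath, function ( err, contents ) {
      if ( err ) {
        res.writeHead( 404 );
        return res.end( 'Not found' );
      }
      var type = MIME_TYPES[ path.extname( filePath ) ] || 'application/octet-stream';
      res.writeHead( 200, { 'Content-Type': type } );
      res.end( contents );
    } );
  } );
}

// To serve the current folder on http://localhost:8000/ you could run:
// createStaticServer( '.' ).listen( 8000 );
```

Any other local web server does the same job; the only point is that the page and the JSON are both loaded over http:// from the same host.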
Figure 3: See paragraph right
Now, the interesting part. Once the design is done, you want to feed it words. We fetch the words from the output of ctj. I did it with jQuery, as that is what I normally use. In the callback function, we get the word frequencies of a certain article (data.articles[ "PMC3830773" ].AMIResults.frequencies) and change the format to one cloud.js can handle more easily. The format can be anything, but you need to tell the layout which property holds what (e.g. here), and it is probably better to remove all data that will not be used. Then we add the words to the cloud (layout) and start generating the visual output with .start().

    // Fetch the ctj output; jQuery passes the callback parsed JSON
    // when the server sends a JSON content type
    $.get( 'path/to/data.json', function ( data ) {
      // Word frequencies of a single article, looked up by PMCID
      var frequencies = data.articles[ "PMC3830773" ].AMIResults.frequencies;
      // Keep only what the cloud needs: the word itself and its count
      var words = frequencies.map( function ( v ) {
        return { text: v.word, size: v.count };
      } );
      // Hand the words to the cloud layout (defined earlier) and start rendering
      layout.words( words ).start();
    } );
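To make the format change concrete, here is a small sketch that runs without jQuery or a server. The sample object only mimics the shape the code above reads (articles, then a PMCID, then AMIResults.frequencies with at least word and count); its words and counts are invented, and the helper name toCloudWords is hypothetical.

```javascript
// Hand-made sample in the shape the snippet above reads from ctj's output.
// The words and counts are invented for illustration only.
var sampleData = {
  articles: {
    PMC3830773: {
      AMIResults: {
        frequencies: [
          { word: 'wood', count: 12 },
          { word: 'conifers', count: 7 }
        ]
      }
    }
  }
};

// Hypothetical helper: pick one article and keep only what the cloud needs.
function toCloudWords ( data, pmcid ) {
  var article = data.articles[ pmcid ];
  if ( !article ) {
    // The PMCID has to be one of an article that exists in the data
    throw new Error( 'No article with PMCID ' + pmcid + ' in the data' );
  }
  return article.AMIResults.frequencies.map( function ( v ) {
    return { text: v.word, size: v.count };
  } );
}

var words = toCloudWords( sampleData, 'PMC3830773' );
// words is [ { text: 'wood', size: 12 }, { text: 'conifers', size: 7 } ]
```

Keeping the conversion in its own function also makes it easy to swap the PMCID without touching the fetching code.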
Now that it is generated, we can see which words, stripped of common words such as prepositions and pronouns, are used most. The articles are about pines, so we see lots of words confirming that: "conifers", "wood", etc. We also notice some errors: tokens like "./--" and "[].", variants of the same word that are not merged ("wood", "Wood", "wood." and "wood,"), and CSS (?!): {background, #ffffff;} and px;}. These are all problems with ContentMine's ami2-word plugin and will be fixed. No worries.
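Until then, the punctuation and capitalisation variants could be merged in the browser before building the cloud. This is only a sketch of a possible workaround, not how ami2-word works or will work; the function name mergeVariants and the regular expression are assumptions of this example, and the counts in the usage note are invented.

```javascript
// Hypothetical workaround: merge variants such as "wood", "Wood", "wood." and "wood,".
function mergeVariants ( frequencies ) {
  // No prototype, so words like "constructor" cannot clash with built-in properties
  var totals = Object.create( null );

  frequencies.forEach( function ( v ) {
    // Lowercase, then strip leading and trailing characters that are
    // neither letters nor digits (Unicode-aware because of the u flag)
    var key = v.word.toLowerCase().replace( /^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, '' );
    if ( key === '' ) {
      return; // pure punctuation such as "./--" or "[]." is dropped
    }
    totals[ key ] = ( totals[ key ] || 0 ) + v.count;
  } );

  // Back to the { text, size } format used for the cloud
  return Object.keys( totals ).map( function ( key ) {
    return { text: key, size: totals[ key ] };
  } );
}
```

For example, entries for "wood" (3), "Wood" (2) and "wood." (1) would come out as a single "wood" with size 6. Note that this does not remove the CSS debris: "#ffffff;}" would simply become "ffffff".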
More examples on how to use CProject data as JSON coming soon. Perhaps popular word combinations.
Example articles used:
- Figure 1 and 2: Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms (10.1186/1471-2164-13-589)
- Figure 3 and code block: The Transcriptomics of Secondary Growth and Wood Formation in Conifers (10.1155/2013/974324)