Friday, October 28, 2016

Citation.js Version 0.2: NPM package, CSL and more

In the last two weeks I've been busy with making Version 0.2 of Citation.js. Here I'll explain some of the changes and the reasoning behind them.

In the past months I've updated Citation.js several times, and the changes included a Node.js program for the commandline and better Wikidata input parsing. While I was working with the "old" code, I noticed some annoying issues in it.

One of the biggest things was the internal data format. When Cite(), the main program, parses input, makes it into JSON with a standardised scheme, which is used everywhere else in the program, e.g. for sorting and outputting. The scheme I used was something I made up to accommodate for the features it had back when there was next to no input parsing, and you were expected to input JSON directly, either by file or by webform. It wasn't scalable at all, and some of the methods were patched so much they only worked in specific test cases.

Old interface of Citation.js (pre-v0.1). It would fetch the first paragraph of Wikipedia about a certain formatting style. Many of the supporting methods of this version stayed in v0.1.

Now I use CSL-JSON, the scheme used by, among others, citeproc-js, and the standard described by the Citation Style Language. It is designed by professionals, or at least people more qualified to making standards. It is quite similar to my old scheme, with some exceptions. Big advantages are the way it stores date and person information. Before, I had to hope users provided names in the correct format. Now, it doesn't matter, as it gets parsed to CSL. The same goes for the date. Another advantage is the new output potential. Besides the output of CSL-JSON, it is now possible to use citeproc-js directly, without extra conversion.

Using a new data format also meant a lot of cleanup in the code. Almost all methods had to be re-written to account for it, but this was mostly a chance to write them more properly. Now, only Cite() exists in the global scope, which is good, because it means other parts don't take up variable names, etc. The entire program is now optimised for both browser and Node.js use, although it uses synchronous requests. From the perspective of the program it is necessary to be able to use synchronous requests. However, it is possible for users to bypass this for a big part. It is mostly used for >Wikidata parsing. An example:

Let's take the input "Q21972834". This is a Wikidata Entity ID, and it points to Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data (10.1093/bioinformatics/btt178). If Cite() only has the ID, it has to fetch the corresponding data (JSON). Because Cite() is called as a function, and is expected to return something, it has to make the request synchronously. However, if the user fetches the data asynchronously and calls Cite() in the callback, that is bypassed.

var xhr = new XMLHttpRequest();

xhr.open(
  /* Method */ 'GET',
  /* URL    */ 'https://www.wikidata.org/wiki/Special:EntityData/Q21972834.json',
  /* Async  */ true
)

xhr.addEventListener( 'load', function () {
  var data = this.responseText
  
  var cite = new Cite( data )
  
  // Etc...
} );

xhr.send( null )

The JSON gets to Cite() with only async(hronous) requests. The problem is that this JSON doesn't contain everything. Instead of the name of the journal it's published in, it contains a numeric reference to it. To get that name, I have to make another request, which has to be sync as well. I hope there is some way in the Wikidata API to turn references off and names (or "labels") on, but I haven't found one yet. That being said, I had to search a long time to find the on-switch for cross-domain requests in the Wikidata API as well, so it might be hidden somewhere. If that's the case then sync requests can be bypassed everywhere, which would be nice, as browsers are on the verge of dropping support.

Probably the biggest news is that it is a NPM (Node Package Manager) package (link). This means you can download the code without having to clone the git repository, or copy the source file. It's even available online, although the default npm demo seems to be broken. Luckily, I have a regular demo as well. As of writing, the npm page says the package has been downloaded 234 times already, but that number has been the same for a day, so I guess there is an issue with npm. If not, that's really cool.

No comments:

Post a Comment