Friday, December 22, 2017

Citation.js Version 0.4 Beta: New Docs and Input Plugins

Citation.js Version 0.4 Beta: New Docs and Input Plugins

It’s been a while. Really. But now it’s back, with a new release: v0.4.0-0, the v0.4 beta. Below I explain some of the changes in this release, and then the road-map of Citation.js for v0.4 and v0.5. Also, Citation.js has a DOI now:

DOI

Also, @jsterlibs tweeted about Citation.js.

Release

Input plugins

The main change in this release, is the addition of input plugins. This is the first step towards releasing v0.4, as explained below. Although there’s specific documentation available here and here, I’ll put an example here as well, as the API can be confusing.

The API for registering input plugins will be changed once or twice before the release of v0.4, once to make it less complex and weird, and possibly a second time to incorporate output plugins.

Adding a format

Say you wanted to add the RIS format (I should probably do that sometime). First, let’s define a type, to set things off.

const type = '@ris/text'

I can’t really define the actual parser here, but I’ll add a variable in the example code.

const parse = data => { /* return CSL-JSON or some format supported by any other parser */ }

Testing if a given string is in the RIS format, we’ll use regex. This regex matches if any line starts with two alphanumerical characters, two spaces and a hyphen. That’s a pretty fuzzy match; all lines should start with that sequence. However, this is just an example, and proper regex would be less elegant.

const risRegex = /^\w{2}\ {2}-/gm

Now, let’s define the dataType of RIS input. When using regex to test input, the dataType is automatically determined to be 'String' anyway, but for the sake of being clear:

const dataType = 'String'

Now, to combine it all:

Cite.parse.add(type, {
  parse,
  parseType: risRegex,
  dataType
})

Changing parsers

Now, say someone else wrote that code above, and you need to use it without modifying it, but you want a better regex? That can be arranged:

Cite.parse.add(type, {
  dataType: 'String',
  parseType: /^(?:\w{2}\ {2}-.*\n)+(\w{2}\ {2}-.*)?$/g
})

Because the options dataType, parseType, elementConstraint and propertyConstraint are all treated as one thing, you need to pass everyone of those when replacing the type checker. dataType is still not actually mandatory in this example, but is passed to demonstrate this.

Disabling a format

There is currently no way to remove a format.

In this scenario, some plugin registered a new format to get citation data from github.com URLs. Unfortunately, it doesn’t work. Instead of recognising the URL as a GitHub-specific URL, the type is @else/url, the generic URL type. This is because the generic version was registered earlier, and there is not yet a category for generic types (see #104). If you don’t use the @else/url type checker anyway, you can disable it like this:

Cite.parse.add('@else/url', {dataType: 'String', parseType: () => false})

I’m not sure why the dataType is needed here, as you’re disabling the parser anyway, but it doesn’t seem to work without it.

Docs

Proper CLI docs

Recently, I’ve been updating the documentation on Citation.js, and it has been a pleasant experience, despite some hiccups in the documentation engine. There are tutorials now, and a lot of the JSDoc comments have been improved. I still want to improve some things in the theme, like clickable header links (similar to GitHub and NPM behaviour), and showing sub-tutorials in the navigation.

Backwards compatibility

This release should be largely backwards compatible. If there are any regressions, please report them in the bug tracker.

Road-map

v0.4

v0.4 “is about making it easier to expand on input and output formats, possibly by creating schemes and methods parsing those schemes that can, say, convert BibJSON to CSL JSON based on a JSON file, something that can be stored independently of implementation.”

The idea is to allow for all kinds of plugins, both in input parsing and output formatting, to be registered on the Cite object, and to treat all internal parsers and formatters the same. Currently, input parser plugins are possible (see above), although they will be improved before v0.4. Output parsers will be made soon too, and will be backwards-incompatible, because of the change in the output option format.

v0.5

Currently, the plan is to use JSON-LD as the internal format in v0.5, while still keeping CSL-JSON as the internal scheme. Any input should then be converted to a list (or object) of fields, which will be added to the internal data store. Each field should be transformed to the CSL-JSON scheme individually, and added as a copy. As a result, there won’t be data loss when certain fields aren’t available in CSL-JSON, but are in the input and output schemes. It should also make it easier to write parsers for e.g. BibJSON.

Of course, edge cases should be taken care of. For example, there is no Wikidata property for the CSL-JSON field original-title, but that information can still be derived from the labels combined with the P364 (original language) property.