Tuesday, May 31, 2022

Citation.js Version 0.6: CSL v1.0.2

Since the citation-js npm package was first published, version 0.6 is the first major version of Citation.js that did not start out as a pre-release. Version 0.3 spent almost 6 months in pre-release, but only received updates for less than half a month. Version 0.4 spent more than a year in pre-release and received updates for about 4 months. Version 0.5 takes the cake with one and a half years in pre-release, receiving updates for a year, which also makes it the best-maintained version.

Yellow flowers with lots of little "rays" on greenish brown stems

Tussilago farfara, March 27th, 2022

Version 0.6 is a major version bump because it introduces a number of breaking changes, including raising the minimum Node.js version to 14. Node.js 12 has been end-of-life since April 2022, which led a lot of dependencies to drop support for it. Now, Citation.js does so too. Other changes include the following:

Update data format to CSL v1.0.2

The internal data format is now updated from CSL v1.0.1 to v1.0.2. This introduces the software type and the generic document type, as well as some other types and some new fields. The event field is also renamed to event-title. That, and software now getting its own software type instead of book, make CSL v1.0.2 incompatible with CSL v1.0.1 styles, which makes this a breaking change.

  • CSL data is now automatically upgraded to v1.0.2 on input.
  • Cite#data ((new Cite()).data) now contains CSL v1.0.2 data.
  • Output formatters of plugins now receive CSL v1.0.2 data as input.
  • util (import { util } from '@citation-js/core') now has two functions, downgradeCsl and upgradeCsl, to convert data between the two versions.
  • The data formatter (.format('data')) now takes a version option. When set to '1.0.1', this downgrades the CSL data before outputting.
  • @citation-js/plugin-csl already automatically downgrades CSL to v1.0.1 for compatibility with the style files.
  • Custom fields are now generally put in the custom object, instead of prefixing an underscore to the field name.
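To make the field and type changes above more concrete, here is a minimal sketch of what upgrading and downgrading a single CSL item could look like. The functions upgradeItem and downgradeItem are hypothetical and are not the Citation.js implementation (in Citation.js itself, that is util.upgradeCsl and util.downgradeCsl); the sketch only covers the changes mentioned in this post: event becoming event-title, software falling back to book for v1.0.1, and underscore-prefixed fields moving into custom.

```javascript
// Hypothetical sketch, not the Citation.js implementation: convert a single
// CSL item between v1.0.1 and v1.0.2, covering only the changes named above.

function upgradeItem (item) {
  const result = { ...item }
  // v1.0.2 renames `event` to `event-title`
  if ('event' in result) {
    if (!('event-title' in result)) {
      result['event-title'] = result.event
    }
    delete result.event
  }
  // Move underscore-prefixed custom fields into the `custom` object
  for (const key of Object.keys(result)) {
    if (key.startsWith('_')) {
      result.custom = { ...result.custom, [key.slice(1)]: result[key] }
      delete result[key]
    }
  }
  return result
}

function downgradeItem (item) {
  const result = { ...item }
  // v1.0.1 styles only know `event`
  if ('event-title' in result) {
    result.event = result['event-title']
    delete result['event-title']
  }
  // v1.0.1 has no `software` type, so fall back to `book`
  if (result.type === 'software') {
    result.type = 'book'
  }
  return result
}
```

Note that the upgrade direction leaves the type alone: from v1.0.1 data alone, there is no reliable way to tell which book items were really software.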

The mappings are also updated. Especially the RIS and BibLaTeX mappings were made more complete by the increased capabilities of the CSL data schema. Non-core plugins are also being updated, mainly affecting @citation-js/plugin-software-formats and @citation-js/plugin-zotero-translation-server.

Test coverage

While updating the plugin mappings, the test suites of the plugins were also expanded. This led to the identification of a number of bugs, which were also fixed in this release:

  • BibLaTeX
    • handling of CSL entries without a type
    • handling of bookpagination
    • handling of masterthesis
  • RIS
    • RegExp pattern for ISSNs
    • Name parsing of single-component names
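For context on the ISSN fix: an ISSN consists of seven digits plus a check character (a digit or X), usually written as two groups of four separated by a hyphen. A stricter check could look like the sketch below; isValidIssn and its pattern are hypothetical and not the pattern the RIS plugin actually uses.

```javascript
// Hypothetical ISSN check (not the RIS plugin's actual pattern): four digits,
// an optional hyphen, three digits, and a check character (digit or X).
const issnPattern = /^(\d{4})-?(\d{3}[\dX])$/i

function isValidIssn (value) {
  const match = issnPattern.exec(value.trim())
  if (!match) return false
  const chars = (match[1] + match[2]).toUpperCase()
  // Weighted sum of the first seven digits, with weights 8 down to 2
  let sum = 0
  for (let i = 0; i < 7; i++) {
    sum += Number(chars[i]) * (8 - i)
  }
  // The check character makes the full weighted sum divisible by 11,
  // with X standing for 10
  const check = (11 - (sum % 11)) % 11
  return chars[7] === (check === 10 ? 'X' : String(check))
}
```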

Closing issues

A number of issues were also fixed in this release:

  • Adding full support for the Bib(La)TeX crossref field
  • Mapping BibLaTeX eid to number instead of page
  • Adding a mapping for the custom BibLaTeX s2id field
  • In Wikidata, getting issue/volume/page/publication date info from qualifiers as well as top-level properties.
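The idea behind the crossref field is that a child entry (say, a paper) inherits the fields it lacks from a parent entry (say, the proceedings). The sketch below shows that idea on already-parsed entries shaped like { key, type, fields }; the shape and the name resolveCrossrefs are hypothetical, and real BibLaTeX inheritance has many more type-dependent renaming rules than the single one shown here.

```javascript
// Simplified, hypothetical sketch of crossref inheritance on already-parsed
// entries shaped like { key, type, fields }. A child copies every field it
// lacks from the entry named in its `crossref` field. Only one renaming rule
// is shown (parent `title` becomes child `booktitle`), and nested crossrefs
// are not followed.
function resolveCrossrefs (entries) {
  const byKey = new Map(entries.map(entry => [entry.key, entry]))
  return entries.map(entry => {
    const parent = byKey.get(entry.fields.crossref)
    if (!parent) return entry
    const fields = { ...entry.fields }
    for (const [name, value] of Object.entries(parent.fields)) {
      const target = name === 'title' ? 'booktitle' : name
      // Fields set on the child always win over inherited ones
      if (!(target in fields)) fields[target] = value
    }
    // The link has been resolved, so drop it from the result
    delete fields.crossref
    return { ...entry, fields }
  })
}
```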

CSL styles

The bundled styles (apa, vancouver, and harvard1) were updated. Note that harvard1 is now an alias for harvard-cite-them-right. Quoting the documentation:

The original “harvard1.csl” independent CSL style was not based on any style guide, but it nevertheless remained popular because it was included by default in Mendeley Desktop. We have now taken one step to remove this style from the central CSL repository by turning it into a dependent style that uses the Harvard author-date format specified by the popular “Cite Them Right” guide. This dependent style will likely be removed from the CSL repository entirely at some point in the future.
http://www.zotero.org/styles/harvard1, CC-BY-SA-3.0

Looking forward

Some breaking changes are still pending, mainly changes to the plugin API and the removal of some left-over APIs. However, I also want to work on a more comprehensive format for machine-readable mappings, a format for mappings for linked data, and of course implementing more mappings in general!

Monday, May 30, 2022

A story about a university login with a broken security configuration, and a mildly uncooperative help desk

Last semester I took some courses at a different university, and went through the process of collecting login credentials and multi-factor authentication tokens and familiarizing myself with a network of university systems all over again. Most (but not all) of those systems use the main single sign-on login process of the university, at https://login.universityfoo.nl.

Note
The university has two main domains, let’s call them universityfoo.nl and universitybar.nl.

One of those systems is Brightspace, used by course coordinators to communicate course information, syllabi, and additional documents to students. Very important for someone new to the university, especially someone who did not go through the normal process of introduction weeks and tutorials. But when I logged in at https://login.universityfoo.nl, I was met with a blank screen. Other systems worked fine, including those to set up my email, but Brightspace did not.

Naturally I opened the trusted Chrome DevTools and saw the following error:

Refused to navigate to 'https://brightspace.universitybar.nl/d2l/lp/auth/login/samlLogin.d2l' because it violates the following Content Security Policy directive: "navigate-to 'self' https://*.universityfoo.nl:443 https://*.services.universitybar.nl:443".

That was already pretty clear: one of the Content Security Policy directives was simply blocking any navigation to any domains other than a short list of exceptions, which did not include the domain that Brightspace was on. But that seems like a major problem, one that would have been caught already unless I had some incredibly (un)lucky timing.
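Neither wildcard source in that directive covers brightspace.universitybar.nl: *.universityfoo.nl is a different domain altogether, and *.services.universitybar.nl only covers subdomains of services.universitybar.nl. A simplified, hypothetical matcher for such host-sources illustrates this; it is a sketch, not the browser's actual implementation.

```javascript
// Simplified, hypothetical check of a URL against a CSP host-source such as
// "https://*.universityfoo.nl:443". It ignores paths, 'self', and the CSP
// rules that also allow upgrading http: to https:.
function matchesHostSource (urlString, source) {
  const match = /^(https?):\/\/(\*\.)?([^:/]+)(?::(\d+))?$/.exec(source)
  if (!match) return false
  const [, scheme, wildcard, host, port] = match
  const url = new URL(urlString)
  if (url.protocol !== `${scheme}:`) return false
  // URL#port is empty for default ports, so fill in the scheme's default
  const defaultPort = scheme === 'https' ? '443' : '80'
  if ((url.port || defaultPort) !== (port || defaultPort)) return false
  // "*.example.org" matches subdomains of example.org, not example.org itself
  return wildcard
    ? url.hostname.endsWith(`.${host}`)
    : url.hostname === host
}
```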

In the background, a Chrome window with a single tab showing a blank page. In the foreground, a Chrome DevTools showing the error mentioned above.

It turned out, however, that the navigate-to directive specifically was not supported at all yet, in any browser, at least according to MDN. In the Chromium code, though, the following could be found:

// Content counterpart of ExperimentalContentSecurityPolicyFeatures in
// third_party/blink/renderer/platform/runtime_enabled_features.json5. Enables
// experimental Content Security Policy features ('navigate-to' and
// 'prefetch-src').
public static final String EXPERIMENTAL_CONTENT_SECURITY_POLICY_FEATURES = "ExperimentalContentSecurityPolicyFeatures";

Turns out I had the #enable-experimental-web-platform-features flag enabled, for some reason, and that flag probably included the EXPERIMENTAL_CONTENT_SECURITY_POLICY_FEATURES. I probably enabled the flag for development at some point? I do not even remember. But that meant the navigate-to directive was actually enforced in my browser, and it was simply misconfigured.

Since I did not want to disable the flag (and was not sure whether that would help), I instead turned to ModHeader: a Chrome web extension to modify requests and responses in the browser. I mainly use it to view DOI content negotiation requests in the browser instead of using cURL. With that I could modify the navigate-to part of the Content-Security-Policy header to the following (line breaks and [...] mine):

Content-Security-Policy: [...] navigate-to 'self'
https://*.universityfoo.nl:443
https://*.services.universitybar.nl:443
https://brightspace.universitybar.nl:443; [...]

This finally allowed me to log in to Brightspace.

Naturally I wanted to share my findings, especially since the Brightspace login will break for everyone once navigate-to is supported without experimental flags. So I went to the online university helpdesk. There, I was also met with a blank page. Imagine that: suddenly logging in to Brightspace does not work anymore, and all the students going to the digital helpdesk are met with a blank page as well. Students panicking, the IT department (maybe) panicking because they were not doing any upgrades or maintenance or anything. Good thing I got a sneak preview of the problem, so I could warn them. First, bypassing navigate-to for the helpdesk as well:

Content-Security-Policy: [...] navigate-to 'self'
https://*.universityfoo.nl:443
https://*.services.universitybar.nl:443
https://brightspace.universitybar.nl:443
https://helpdesk.universitybar.nl:443; [...]
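The header edits above were made by hand in ModHeader, but the transformation itself is simple string manipulation. A sketch follows; addCspSource is a hypothetical name, the example header in the usage is made up (the real one is elided with [...] above), and this is not how ModHeader works internally.

```javascript
// Hypothetical helper: add a source expression to one directive of a
// Content-Security-Policy header value, keeping the other directives as-is.
function addCspSource (header, directiveName, source) {
  const wanted = directiveName.toLowerCase()
  let found = false
  const directives = header
    .split(';')
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .map(part => {
      const [name, ...sources] = part.split(/\s+/)
      if (name.toLowerCase() !== wanted) return part
      found = true
      if (!sources.includes(source)) sources.push(source)
      return [name, ...sources].join(' ')
    })
  // Add the directive if the header did not contain it yet
  if (!found) directives.push(`${directiveName} ${source}`)
  return directives.join('; ')
}
```

Calling it once with https://brightspace.universitybar.nl:443 and once with https://helpdesk.universitybar.nl:443 on the navigate-to directive reproduces the two edits above; adding a source that is already present leaves the header unchanged.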

However, when I sent a message detailing the problem, I was met with “can you try clearing your cache?” I did, even though I knew that was not the problem, and it did not help. I did know what would help, but they clearly did not care: I am reproducing the problem while writing this blog post, almost 9 months later. When I confirmed that clearing the cache did not help, I was asked to disable #enable-experimental-web-platform-features. Which, sure, but that was not really the point. I guess they will find out in time anyway, but I was still a bit disappointed.

Friday, May 27, 2022

Citation.js Version 0.5 and a 2022 update

Version 0.5.0

Version 0.5.0 of Citation.js was released on April 1st, 2021.

BibTeX and BibLaTeX

After the update to the Bib(La)TeX file parser, described in the earlier BibTeX Rework: Syntax Update blog post, the mapping of BibTeX and BibLaTeX data to CSL-JSON was also updated. The mapping is now split in two, one for BibLaTeX (which is backwards-compatible with BibTeX) and one for BibTeX. The output formats were also updated to output either BibTeX-compatible files or BibLaTeX-compatible files. The most common difference there is the use of year and month versus date respectively. In addition, a number of updates were made to the file parser.

Core changes

In the Grammar utility class, bugs were fixed and behavior was updated to better account for the Bib(La)TeX parser. Some of the code for correcting CSL-JSON was also updated, including moving the code correcting results of the Crossref API from the DOI plugin to the core module, as CSL-JSON from the API may end up in Citation.js through routes other than the DOI plugin. Earlier in 0.5 development, some of the HTTP handling code was also updated for increased stability.

2022 update

v0.5.1–v0.5.7

The versions released since v0.5.0 mostly contain bug fixes and small enhancements. The latter include some more descriptive errors in certain places, as well as mappings for some non-standard fields in Bib(La)TeX and RIS.

New site design

The design of the Citation.js site was updated for the first time since 2018. The changes were detailed in the recent Citation.js: New site blog post.

New plugins

New plugins for the refer file format (plugin-refer) and the RefWorks tagged format (plugin-refworks) were released.

More coming

More changes are expected, including more long-awaited output formats, better mappings for software and datasets, and more work on machine-readable mappings.

Citation.js: New site


I recently updated the website of Citation.js. This involved getting rid of the Material Design Lite framework, simplifying and refreshing the site design, and modernizing some of the code behind it. Additionally, I updated the content of the homepage, and added some functionality to the interface of the blog page and the demo.

Homepage

The old layout of the homepage had a dark grey background, with a grid of four cards containing the main content of the site in the middle, and the Citation.js banner-variant logo between the top and bottom rows. The grid of cards had a background of syntax-highlighted source code. This is actually the start of the Citation.js v2 code, which at that point still consisted of a single file. At the very top of the page was a yellow header, and at the bottom a thin black footer.

The new layout incorporates a lot of the design elements of the first design, but in a way that hopefully improves the readability and feel of the page. The yellow header remains, but the links are centered instead of right-aligned. The footer is full-width (though the text is still centered) and has a larger font size and vertical padding. The grid is gone; instead, the top of the page has a full-width code background with the banner logo and some introductory text. The other content is now aligned in a single row, and the cards are replaced with plain text, although the headers still have white text with a slight shadow on a dark background.

Blog

The blog page had the same header, footer, and dark grey background in the old layout, with individual blog posts as cards and the introductory text and search bar as a slightly wider card.

The new layout mirrors the changes to the homepage, especially the white background and full-width code background and the changing of cards to plain text. To the right of the blog content is now a sidebar listing the blog posts per year, which moves to the bottom of the page on narrow screens. Below the search bar is now a clickable list of tags.

Demo page

The design of the demo page had not been updated since I made it in April 2016: it was more or less plain text, with paragraphs limited in width and centered.

The new design adds the header, footer, and code background from the homepage as well as some styles for the headers. The interface of the demo is simplified at the cost of easy-to-read code. That also means that the live view of the code is removed.

API documentation

The styles of the home page now also apply to the API documentation.

Friday, March 26, 2021

GitHub pages 404 redirection

Recently I moved the Citation.js API documentation from /api to /api/0.3, to put the new documentation on /api/0.5. I fixed all the links to the documentation, but I still got an issue report regarding a 404 error after just a few days. In short, I had to redirect pages from /api/* to /api/0.3/* while all these pages are hosted as static files on GitHub Pages.

There are three ways I found to do this:

  1. I make otherwise empty HTML files in /api/* that redirect to /api/0.3/* via JavaScript or a <meta> tag.
  2. I make use of jekyll-redirect-from. This is equivalent to option 1, I think.

Option 1 seemed like a hassle and I do not use Jekyll so option 2 seemed out of the question as well. However, we still have option 3 to consider:

  3. I add a 404.html to the repository which gets served automatically on a 404. It then redirects to /api/0.3/* with JavaScript, and gives guidance on how to find the new URL manually if JavaScript is disabled.

404.html is just a normal 404 page with 4 lines of JavaScript:

var docsPattern = /(\/api)(\/(?!0\.[35]\/)|$)/

if (docsPattern.test(location.pathname)) {
    location.pathname = location.pathname.replace(docsPattern, '$1/0.3$2')
}

Breaking down the RegExp pattern:

  • (\/api) matches “/api” in the URL
  • (\/(?!0\.[35]\/)|$) matches one of two things, immediately after “/api”
    • Either $, the end of the string (like “https://citation.js.org/api” without the trailing slash)
    • Or \/(?!0\.[35]\/), which matches a forward slash (“/api/”) that is not followed by “0.3/” or “0.5/”.

Requiring a slash or the end of the string after “/api” avoids matching paths like “/apical/”. The lookahead avoids touching pages that are already versioned, such as “/api/0.3/does-not-exist”, which would otherwise be redirected to “/api/0.3/0.3/does-not-exist”, and so on, in an endless loop.
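To try the behavior described above outside the browser, the same pattern (with the dot escaped, so it only matches a literal “.”) can be wrapped in a pure function. The name redirectTarget is hypothetical; the actual 404.html assigns to location.pathname directly.

```javascript
// The redirect pattern from 404.html, wrapped in a pure function. Returns
// the new pathname, or null when no redirect should happen.
const docsPattern = /(\/api)(\/(?!0\.[35]\/)|$)/

function redirectTarget (pathname) {
  return docsPattern.test(pathname)
    ? pathname.replace(docsPattern, '$1/0.3$2')
    : null
}

console.log(redirectTarget('/api'))                    // "/api/0.3"
console.log(redirectTarget('/api/Cite.html'))          // "/api/0.3/Cite.html"
console.log(redirectTarget('/apical/'))                // null
console.log(redirectTarget('/api/0.3/does-not-exist')) // null
```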

This is not the neatest solution but I like it conceptually. It shows a bit of potential for Single-Page Applications as well: you can serve the same HTML+JavaScript for every possible path without having to rely on URLs like https://example.org/#/path/page. The problem is that you still get the 404 HTTP status (as you should), so if a browser or search crawler decides to care you have a problem.

Try it out now: https://citation.js.org/api/

The new "Page not Found" page in the same style as the homepage.

Wednesday, February 24, 2021

Mid-week effect of Dutch COVID-19 case reporting

Every week since the start of January I heard headlines like “3963 new infections, a bit more than average”. In context, that meant that the number of positive COVID-19 tests that day was higher than the average daily number of cases in the previous seven days. Even though we heard that every week, overall the cases were going down. What was going on? Well, somewhat unsurprisingly, the number of reported cases was higher in the middle of the week than on Mondays and in the weekend. The news site I linked above mentions the midweek-effect as well, but on Twitter and on the radio you mainly hear the headline, not the caveats.

Anyway, I wanted to see what the midweek-effect actually looked like. The daily infection data is available from the RIVM website, RIVM being the Dutch National Institute for Public Health and the Environment. I started by reproducing the graph they made, partly to check the file and partly for fun (see Fig. 1).

Amount of new cases on each GGD notification date
Figure 1: Amount of positive COVID-19 tests since the start of December. The date used is the date when the test result was reported to the GGD, the municipal health service. In yellow are the test results RIVM got this week, in purple previous data.

To decompose the various effects in this data, I used R’s stats::stl() function on a time series with frequency=7. This took a while to figure out, as I initially misconfigured the time series, which led STL to interpret each day as a season instead of each week. Because the seasonal effect is multiplicative rather than additive, I had to log-transform the data as well. This means the seasonal component has a higher amplitude when the overall number of cases is higher (see Fig. 2).
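The reasoning behind the log transform can be written out as follows, assuming a purely multiplicative model with trend $T_t$, weekly component $S_t$ (period 7), and remainder $R_t$. This is the standard formulation, not necessarily the exact model behind the figures:

```latex
\begin{aligned}
y_t &= T_t \, S_t \, R_t, \qquad S_{t+7} = S_t \\
\log y_t &= \log T_t + \log S_t + \log R_t
\end{aligned}
```

STL then decomposes $\log y_t$ additively. Transformed back, trend and season combine to $T_t S_t$, so the absolute weekly swing, $T_t (S_t - 1)$, grows and shrinks with the trend $T_t$.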

Seasonal component and trend of data previously shown
Figure 2: Weekly component and general trend of positive COVID-19 tests, overlaid on daily data. The solid line indicates the overall trend, while the dashed line combines the overall trend with the seasonal component. Yellow still indicates test results from this week, purple the previous data.

In the end, this does not mean much at all. The accepted explanation of the midweek-effect has to do with when tests are carried out and reported; not when infections actually take place. If we look at the cases where the date of disease onset is available (source, Public Domain Mark 1.0), the weekly trend is less present (see Fig. 3).

Seasonal component and trend of new cases per day of disease onset
Figure 3: Number of new confirmed cases per day of disease onset. The solid line indicates the overall trend, while the dashed line combines the overall trend with the seasonal component. The gray area shows daily cases.

Comparing the two weekly components, the midweek-effect of reported cases is larger than the weekly effect in the disease onset of cases. In addition, the latter peaks on Mondays and then declines throughout the week, unlike the midweek-effect (see Fig. 4). Whether this effect is actually present or another artifact of data reporting is unclear to me.
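STL itself is too involved to reproduce here, but a simpler technique, classical decomposition with a centered 7-day moving average on log-transformed counts, gives a rough weekly profile comparable to the ones in Fig. 4. This is a stand-in sketch, not the R code behind the figures; weeklyProfile and its firstWeekday parameter are hypothetical.

```javascript
// Classical (moving-average) decomposition, used as a simple stand-in for
// STL: estimate the weekly pattern of daily counts on a log scale.
// Assumes `counts` holds positive daily counts, starting on the weekday
// given by `firstWeekday` (0 = Monday, ..., 6 = Sunday).
function weeklyProfile (counts, firstWeekday = 0) {
  const logs = counts.map(Math.log)
  const sums = new Array(7).fill(0)
  const ns = new Array(7).fill(0)
  // A centered 7-day moving average estimates the trend at day i
  for (let i = 3; i < logs.length - 3; i++) {
    let trend = 0
    for (let j = i - 3; j <= i + 3; j++) trend += logs[j]
    trend /= 7
    const weekday = (firstWeekday + i) % 7
    sums[weekday] += logs[i] - trend
    ns[weekday] += 1
  }
  const effects = sums.map((s, d) => (ns[d] ? s / ns[d] : 0))
  // Center the effects so they sum to zero, then turn them back into factors
  const mean = effects.reduce((a, b) => a + b, 0) / 7
  return effects.map(e => Math.exp(e - mean))
}
```

A factor of 1.2 for a weekday would mean that day typically has about 20% more cases than the trend alone suggests.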

Weekly pattern of GGD test results and reported disease onsets
Figure 4: Weekly component of reported cases and disease onsets. The solid line shows the pattern for the day of disease onset, the dashed line for the day when the test result is reported.

Monday, March 23, 2020

Economics of open source versus open science


Common postman (Heliconius melpomene) on a Lantana

Almost two years ago I started participating on the then-new (now-archived) npm forum. I had been using npm for a few years at that point, and I had some free time to spend providing technical support, for fun. I fixed a number of bugs in the CLI, and users thanked me for those. My impact was limited, but the work was fulfilling. That is, until the developers at npm I had been working with got laid off.

Later in 2019 came the second hit: shortly after a popular JavaScript library raised a ruckus on Twitter by displaying ads in the terminal to fund its maintainers, npm started banning terminal ads. The ensuing chaos was a wake-up call for me. Lots of people started talking about the economics of open-source development, suggesting that open source is a fake ideology propagated by tech companies in Silicon Valley to generate value at no cost (to the companies, that is).

We were putting hours and hours of work into some ideology, and the corporations that profited from our open-source libraries gave us nothing in return. Everyone keeps laughing about the enormous dependency trees of Node.js projects, but that also means every project depends on a lot of other open-source projects, mostly by unpaid maintainers. Similarly, the bugs that I fixed for the npm CLI had a very small impact in the grand scheme of things, but npm is used by almost every company that uses JavaScript — most likely including Google, Amazon, Apple and Facebook. And a small percentage multiplied by almost all the tech capital in the world is still quite a lot.

This contrasts with what I have been taught about Open Science: ideally, all aspects of all science should be open to everyone, to allow small players to take part. The more small players can take part, the better the science is, both morally and in quality & quantity.

While in the tech world, a small but vocal group is trying to bring about a revolution to rethink open source to help the individual, at the same time the science community has just gotten into the idea of expanding open source — again, to help the individual. Is open science just a few years behind open source?

One important thing to note is that both revolutions are trying to bring about the same thing: fair representation. In fair open source, this is about maintainers of public infrastructure (in the form of libraries) getting part of the profit generated by companies using it. In open science, this is about letting everyone take part in science, from people without affiliation to people whose institution cannot or does not want to pay for access, and lowering the barrier by making source code and data available.

The main difference is probably that scientists usually get paid, at which point it is easier to choose whether to make your work open or not: not making it open would be a waste. Additionally, there is the notion that any science is good for science (and the world) as a whole: even if commercial pharmaceutical companies get to use open research (and open source software) from researchers that they did not fund, advances in pharmaceutics are good for everyone. Plus, open science helps the smaller players, which would be beneficial for competition and therefore for prices (if market forces finally follow through).

In the middle of this is me. I maintain an open-source project (Citation.js) aimed at people who care about bibliographical data — e.g. scientists and librarians. It has 142 stars on GitHub. I am proud of it. Neither side really applies to me: I cannot think of any commercial application that needs my library, nor do I receive funding for working on a (very small) part of the scientific community. So, which revolution should I follow? Fair open source or open science?

At the moment, I am fine with keeping it as it is. Though tiresome, it is also fulfilling, and right now I can still use the Exposure™. For the longer term, I guess I will naively carry on until I burn out or someone convinces me otherwise.


Note: a proposed solution for fair open source is the Parity Public License: it allows people to use it in private without limitations, and otherwise it requires the project using it to be open-source as well. Additionally, it is possible to buy licenses for closed-source work. To me, it seems a bit limited. Licenses like this can quickly become complex to use. Do I want people to be able to use Citation.js on their personal website without making the website open source? I do not think that would be possible with this license, without personally giving permission to people who would want that.

There are probably better blog posts to be read about the trade-offs of such licenses. If you find any, I will add them here.