Friday, August 16, 2019

Debugging the Karmabug

For reasons I will not go through right now, I needed a new library for making synchronous HTTP requests in Node.js. I know what you are saying, “But that’s one of the seven deadly sins of JavaScript!” 1 Well, just know I had my reasons, and I wanted to replace sync-request.

Since I was already using the Fetch API with node-fetch in the async part of my library, I thought: why not build a sync-fetch, using node-fetch under the hood the way sync-request uses (then-)request? Two days later it was actually working, as far as I could tell. However, if I wanted to publish this horror, I needed some actual tests.
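
To give an idea of the general approach (a simplified sketch, not necessarily how sync-fetch is actually implemented): one way to make a request synchronously in Node.js is to run the real, asynchronous fetch in a child process and block on its output with execFileSync. The worker.js file and the fetchSync name below are made up for this example.

// worker.js (hypothetical): does the actual request with node-fetch
// and writes the response body to stdout.
const fetch = require('node-fetch')
fetch(process.argv[2])
  .then(res => res.text())
  .then(body => process.stdout.write(body))

// index.js (hypothetical): blocks the calling process until the worker exits.
const { execFileSync } = require('child_process')

function fetchSync (url) {
  return execFileSync(process.execPath, ['worker.js', url], { encoding: 'utf8' })
}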

Luckily, node-fetch has a nice test suite; I only needed to convert 194 test cases to use the synchronous API. Not fun work, but maybe worthwhile. Anyway, the first test cases worked, but then it got stuck on the first actual request.

This is where I have to introduce you to the Karmabug. You see, after some testing I figured out that it was only the combination of my sync-fetch and the test server that just… stopped. The arguments were correct, my fetch worked against https://example.org, and the test server worked with node-fetch, but the two together simply did not. Investigating either one pointed to the other, and I had no idea what to do next.

That would make a good tweet, I thought. “Karma for making sync HTTP requests I guess.” Literally two minutes later it hit me: that was exactly what was going on. The test server could not respond to the requests because the request itself was blocking the event loop.
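
Here is a tiny, self-contained illustration of that failure mode (not the actual test setup; the busy-wait below stands in for the synchronous request): when the server and a blocking operation share one process, the server never gets a turn on the event loop to answer.

const http = require('http')

const server = http.createServer((req, res) => res.end('hello'))
server.listen(3000, () => {
  // Anything synchronous here freezes the event loop. Incoming connections
  // queue up, but the request handler above never gets to run, so a
  // synchronous fetch against this very server would wait forever.
  const start = Date.now()
  while (Date.now() - start < 5000) {} // stand-in for a blocking request
  server.close()
})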

1 Incidentally, the seven deadly sins of JavaScript all happen to be Sloth.

Monday, August 5, 2019

Citation.js: RIS Rework Pt. 2

In the last post I explained how I started implementing the RIS specification that I found in the Internet Archive, only to discover that there is an older specification, which seems to be more common at times.

Now, I have implemented the old spec, which luckily was not nearly as complex. One thing that came to my attention was that there were a lot of redundant tags: for the title, there are TI, T1, CT and usually BT; for journal names there are JO & JF, plus JA, J1 & J2 for abbreviated journal names. While I can imagine some nuanced differences in meaning between those tags, those meanings are not documented, and not trivial to figure out either. Anyway, that is not a problem for the implementation.
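
For a rough idea of how that redundancy can be flattened (an illustrative sketch, not the actual Citation.js mapping code; the pickFirst helper and the preference order are made up):

// Redundant RIS tags, in an assumed order of preference.
const titleTags = ['TI', 'T1', 'CT', 'BT']
const journalTags = ['JO', 'JF']
const journalAbbreviationTags = ['JA', 'J1', 'J2']

// Return the value of the first tag that is present in the entry.
function pickFirst (entry, tags) {
  for (const tag of tags) {
    if (entry[tag] != null) return entry[tag]
  }
}

// pickFirst({ T1: 'On computable numbers…' }, titleTags)
// => 'On computable numbers…'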

I also updated the implementation of the new spec, to fix some mistakes and add some more mappings. In addition, because in real life some implementations seem to export a mix of the two specifications, I created an implementation based on the new spec that can, where needed, defer to the old one, and to some stray properties that Wikipedia and Zotero have picked up somewhere and that appear in neither spec.
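
The lookup order of that mixed implementation could be pictured roughly like this (again a sketch with made-up names and heavily trimmed mapping tables, not the actual code):

// Hypothetical, heavily trimmed mapping tables from RIS tags to CSL variables.
const newSpecMappings = { TI: 'title', SP: 'page' }
const oldSpecMappings = { JO: 'container-title' }
const strayTagMappings = { /* the odd tags from Wikipedia and Zotero exports */ }

// Prefer the new spec, fall back to the old one, then to the stray tags.
function resolveTag (tag) {
  return newSpecMappings[tag] || oldSpecMappings[tag] || strayTagMappings[tag]
}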

How do the results look? First of all, the example that was giving me issues in the last post looks a lot nicer now:

{ issue: '1',
  page: '230-265',
  type: 'article-journal',
  volume: '47',
  title: 'On computable numbers, with an application to the Entscheidungsproblem',
  author: [ { family: 'Turing', given: 'Alan Mathison' } ],
  issued: { 'date-parts': [ [ 1937 ] ] },
  'container-title': 'Proc. of London Mathematical Society' }

The only thing that was missing when I first tried this out was the end of the page range, because SP in the new spec is the entire page range, while in the old spec you need both SP and EP. I had to fix that manually — not a problem, just something to keep in mind when re-running the scripts.
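
The difference boils down to something like this (a sketch; the function names are made up):

// New spec: SP holds the entire page range, e.g. SP: '230-265'.
function pageFromNewSpec (entry) {
  return entry.SP
}

// Old spec: SP and EP hold the start and end of the range separately,
// e.g. SP: '230' and EP: '265'.
function pageFromOldSpec (entry) {
  return entry.EP ? entry.SP + '-' + entry.EP : entry.SP
}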

One other thing to check was how the mappings look from above, without all the type-specific shenanigans. I keep a (public) spreadsheet with mappings from CSL-JSON to all kinds of different formats, so I added the RIS mappings there. So, a sanity check: does it make sense?

[Figure: RIS mappings]

No, not at all! The RIS tag SE is mapped to 10 different CSL variables, and the CSL number variable is mapped to 9 different RIS tags. To me, that does not make any sense, even accounting for the fact that I know there is some variation between entry types.

The question that remains is: does it work? Even if it does not look like it makes sense, the output could still make sense, as long as other implementations follow the specification to a similar degree. I know Zotero does not follow it entirely: the spec anomalies it does implement are attributed to weird EndNote quirks, not to quirks of the spec itself.

That made me wonder to what degree the EndNote implementation follows the specification. However, I do not have EndNote, so this is a call to action! Can you help me clear up the RIS cloud by submitting your RIS exports to a CC0-licensed repo? Preferably with all kinds of reference types: articles, books, chapters, conference papers, webpages, software, patents, bills, maps, artworks, whatever you can find. For legal reasons, please replace abstracts and other copyrightable content with [... omitted].

In the meantime, I will be collecting RIS exports from other sources, such as Zotero and Mendeley, and from websites like Google Scholar, BMC and PubMed Central. If you know of any other sources, please let me know!