Tuesday, December 31, 2024

Citation.js: 2024 in review

This past year was relatively quiet for Citation.js as well.


Ulex europaeus, observed December 24th, 2024, Vlieland, The Netherlands.

Changes

  • BibTeX: output of non-ASCII characters was improved.
  • BibLaTeX: support for data annotations was added!
  • DOI: the DOI pattern was broadened to include non-standard DOI formats.
  • Support for ORCIDs was improved, making it possible to map authors’ ORCIDs to different formats.

New Year’s Eve tradition

After the releases on New Year’s Eve of 2016, 2017, 2021, 2022, and 2023, this New Year’s Eve also brings the new v0.7.17 release. The CSL field publisher is now mapped to the BibTeX field organization for paper-conference (inproceedings) entries.

Happy New Year!

Tuesday, October 15, 2024

Next.js, SWC, and citeproc-js

Last year I got a bug report that, for unclear reasons, Citation.js was not working when built in a Next.js production environment. Next.js is a popular server framework for making web applications with React, and by default transforms all JavaScript files and their dependencies into “chunks” to improve page load times. In production environments, Next.js uses the Rust-based “Speedy Web Compiler” SWC to optimize and minify JavaScript code.

I was able to figure out that somewhere, this process transformed an already difficult-to-grok function (makeRegExp) in the citeproc dependency into actually broken code. After some trial and error I found the following MCVE (Minimal Complete Verifiable Example):

function foo (bar) {
    var bar = bar.slice()
    return bar
}

foo(["bar"])

// equivalent to

function foo (bar) {
    var bar // a no-op in this case, apparently
    bar = bar.slice()
    return bar
}

foo(["bar"])

But then, in the chunks generated by Next.js, the argument bar gets optimized away from foo(), generating the following code (it also inlines the function).

var bar;
bar = bar.slice();

Now, this is a simple mistake to make. If you expect var bar to actually re-declare the bar argument, the argument is clearly unused and can be removed. Due to the quirks of JavaScript that is not the case though, and the incorrect assumption leads to incorrect code.
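
The quirk is easy to check in isolation. The following is a minimal sketch with made-up function names (they are not taken from citeproc or SWC): re-declaring a parameter with var leaves its value intact, whereas a block-scoped let creates a genuinely new binding, which is roughly the behaviour the optimization seems to assume.

```javascript
// Hypothetical names, for illustration only.

function withVarRedeclaration (bar) {
  var bar // a no-op: the parameter binding and its value are kept
  return bar
}

function withLetShadowing (bar) {
  {
    let bar // a new, block-scoped binding that starts out undefined
    return bar
  }
}

console.log(withVarRedeclaration(['bar']))
console.log(withLetShadowing(['bar']))
```

Only in the second function is the argument actually unused, so only there would it be safe to remove it.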

This is not a one-off thing though: last August I got another, similar bug report with the same cause: some slightly non-idiomatic code (CSL.parseXml) from citeproc got mis-compiled by SWC. I found another MCVE:

function foo (arg) {
    const a = { b: [] }
    const c = [a.b]
    c[0].push(arg)
    return a.b[0]
}

The compiler misses that c[0] refers to the same object as a.b and thinks that makes the function a no-op, though it does not optimize it away fully, instead producing the following:

function (n) {
    return [[]][0].push(n), [][0]
}
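
To make the difference concrete, here is a small sketch that puts the MCVE next to the compiler output shown above. The names original and miscompiled are my own labels for this comparison; the function bodies are copied from the two snippets.

```javascript
// The MCVE: c[0] and a.b are the same array, so the pushed value is visible via a.b
function original (arg) {
  const a = { b: [] }
  const c = [a.b]
  c[0].push(arg)
  return a.b[0]
}

// The compiler output: the push goes into one fresh array, the read uses another
const miscompiled = function (n) {
  return [[]][0].push(n), [][0]
}

console.log(original('x'))
console.log(miscompiled('x'))
```

The first call returns the argument that was passed in, while the second always returns undefined, because the comma expression evaluates to the first element of a new, empty array.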

This was apparently already noticed and fixed last May, though the SWC patch still has to land in a stable version of Next.js. Interestingly, the patch includes a test fixture that uses CSL.parseXml as example code; apparently citeproc is a good stress-test of JavaScript compilers.

This is all fine with me; I am not going to blame the maintainers of a complex open-source project like SWC for occasional bugs. However, I would like to see a popular framework like Next.js, with 6.5 million downloads per week and corporate backing, do more testing of such essential parts of its infrastructure. I also do not see them among the sponsors of SWC.

Edited 2024-10-15 at 17:26: Actually, the creator of SWC is also a maintainer of Next.js, though I do not know in which order. Given that, it makes more sense that they switched away from the well-tested but slower BabelJS in version 12, but it is more confusing that they did not test it a bit more thoroughly.

Tuesday, March 5, 2024

Citation.js: BibLaTeX Data Annotations support

Version 0.7.9 of Citation.js comes with a new feature: plugin-bibtex now supports the import and export of Data Annotations in BibLaTeX files. This means ORCID identifiers from DOI, Wikidata, CFF, and other sources can now be exported to BibLaTeX. Combined with a BibLaTeX style that displays ORCID identifiers, you can now quickly improve your reference lists with ORCIDs.

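// Load the core, plus the plugins for BibLaTeX output and DOI input
const { Cite } = require('@citation-js/core')
require('@citation-js/plugin-bibtex')
require('@citation-js/plugin-doi')

// Fetch the metadata for the DOI and format it as BibLaTeX
Cite
  .async('10.1111/icad.12730')
  .then(cite => cite.format('biblatex'))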

This produces the following BibLaTeX file (note the author+an:orcid field):

@article{Willighagen2024Mapping,  
  author = {Willighagen, Lars G. and Jongejans, Eelke},  
  author+an:orcid = {1="http://orcid.org/0000-0002-4751-4637"; 2="http://orcid.org/0000-0003-1148-7419"},  
  journaltitle = {Insect Conservation and Diversity},  
  shortjournal = {Insect Conserv Diversity},  
  doi = {10.1111/icad.12730},  
  issn = {1752-458X},  
  date = {2024-03-02},  
  language = {en},  
  publisher = {Wiley},  
  title = {Mapping wing morphs of \textit{{Tetrix} subulata} using citizen science data: Flightless groundhoppers are more prevalent in grasslands near water},  
  url = {http://dx.doi.org/10.1111/icad.12730},  
}
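
To illustrate how the item annotations in the author+an:orcid field line up with the list of authors, here is a minimal sketch of a parser for the simple form shown above (an item number followed by a quoted literal). The function parseItemAnnotations is hypothetical and not part of the Citation.js API; it also ignores the field-level and name-part annotations that the BibLaTeX manual describes.

```javascript
// Hypothetical helper, for illustration only (not the plugin-bibtex implementation).
// Handles only item-level literal annotations of the form: 1="..."; 2="..."
function parseItemAnnotations (value) {
  const annotations = new Map()
  for (const match of value.matchAll(/(\d+)\s*=\s*"([^"]*)"/g)) {
    // Item numbers are 1-based and refer to the position in the name list
    annotations.set(Number(match[1]), match[2])
  }
  return annotations
}

const authors = ['Willighagen, Lars G.', 'Jongejans, Eelke']
const orcids = parseItemAnnotations(
  '1="http://orcid.org/0000-0002-4751-4637"; 2="http://orcid.org/0000-0003-1148-7419"'
)

authors.forEach((author, index) => {
  console.log(author, orcids.get(index + 1))
})
```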

Saturday, March 2, 2024

Including ORCID identifiers in BibLaTeX (and using them)

On the Fediverse, @petrichor@digipres.club posed the question of how to include identifiers for authors in Bib(La)TeX-based bibliographies:

Any Bib(La)TeX/biber users have a preferred way to include author identifiers like ORCID or ISNI in your .bib file? Ideally supported by a citation style that will include the identifiers and/or hyperlink the authors.

https://digipres.club/@petrichor/112020378570169913

I have wanted to try including ORCIDs in bibliographies for a while now, and while CSL-JSON makes it nearly trivial to encode, neither CSL styles nor CSL processors are at the point where those can actually be inserted in the formatted bibliography. However, BibLaTeX may grant more opportunities, so this piqued my interest.

I first thought of the Extended Name Format (Kime et al., 2023, §3.4), which allows breaking names up in key-value pairs. Normally, those are reserved for name parts (family, given, etc.), but I believed I had seen a way to define additional “name parts”, one of which could be used for specifying the ORCID. However, in the process of figuring that out, I found the actual, intended, proper solution.

BibLaTeX has, exactly for things like this, Data Annotations (Kime et al., 2023, §3.7). For every field, or every item of every field in the case of list fields, additional annotations can be provided. (There are some additional features and nuances; for a full explanation see the manual.) For ORCIDs, data annotations could look like this:

@software{willighagen_2022_7017208,
  author          = {Willighagen, Lars and
                     Willighagen, Egon},
  author+an:orcid = {1="0000-0002-4751-4637"; 2="0000-0001-7542-0286"},
  title           = {ISAAC Chrome Extension},
  month           = aug,
  year            = 2022,
  publisher       = {Zenodo},
  version         = {v1.4.0},
  doi             = {10.5281/zenodo.7017208}
}

Now, implementing it in a BibLaTeX style proved more difficult than I hoped, but that might have been due to my inexperience with argument expansion and Biber internals. I started with the authoryear style and looked for the default name format that it uses in bibliographies; this turned out to be family-given/given-family. I copied that definition, and amended it to include the ORCID icon after each name (when available). To insert the icon, I used the orcidlink package. This part was tricky, as \getitemannotation does not work in an argument to \orcidlink or \href, but I ended up with the following.

\DeclareNameFormat{family-given/given-family}{%
  % ...
  \hasitemannotation[\currentname][orcid]
    {~\orcidlink{\expandafter\csname abx@annotation@literal@item@\currentname @orcid@\the\value{listcount}\endcsname}}
    {}%
  % ...
  }

References. Willighagen, Lars [ORCID icon with cyan outline] and Egon Willighagen [ORCID icon with cyan outline] (Aug. 2022). ISAAC Chrome Extension. Version v1.4.0. DOI: 10.5281/zenodo.7017208

You could repeat the same with ISNI links, or Wikidata, VIAF, you get the idea. Then you could put the \DeclareNameFormat in a new authoryear-orcid.bbx file so that the changes do not show up in the in-text citations, and set the bibliography style like so:

\usepackage[bibstyle=authoryear-orcid]{biblatex}

This can all be seen in action on Overleaf: https://www.overleaf.com/read/gvxqmrqmwswh#f156b5

References

Friday, February 2, 2024

Three new userscripts for Wikidata

Today I worked on three user scripts for Wikidata. Together, these tools hopefully make the data in Wikidata more accessible and make it easier to navigate between items. To enable these, include one or more of the following lines in your common.js (depending on which script(s) you want):

mw.loader.load('//www.wikidata.org/w/index.php?title=User:Lagewi/properties.js&oldid=2039401177&action=raw&ctype=text/javascript');
mw.loader.load('//www.wikidata.org/w/index.php?title=User:Lagewi/references.js&oldid=2039248554&action=raw&ctype=text/javascript');
mw.loader.load('//www.wikidata.org/w/index.php?title=User:Lagewi/bibliography.js&oldid=2039245516&action=raw&ctype=text/javascript');

User:Lagewi/properties.js

Note: It turns out there is a Gadget, EasyQuery, that does something similar to this user script.

Inspired by the interface for observation fields on iNaturalist, I wanted to easily find entities that also used a certain property, or that had a specific property-value combination. This script adds two types of links:

  • For each property, a link to query results listing entities that have that property.
  • For each statement, a link to query results listing entities that have that claim, i.e. this property-value combination. (This does not account for qualifiers.)

These queries are certainly not useful for all properties: listing instance of (P31) human (Q5) is not particularly meaningful in the context of a specific person. However, sometimes it just helps to find other items that use a property, and listing other compounds found in taxon (P703) Pinus sylvestris (Q133128) is interesting.
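
As a rough idea of what such links could look like, the sketch below builds a Wikidata Query Service URL for either a property alone or a property-value combination. The function name buildQueryUrl, the exact SPARQL, and the LIMIT are assumptions made for illustration; the actual user script may construct its queries differently.

```javascript
// Hypothetical helper, for illustration only (not the code of the user script).
// Without a value, the query matches any entity that has a statement for the property.
function buildQueryUrl (property, value) {
  const object = value ? `wd:${value}` : '[]'
  const query = `SELECT DISTINCT ?item ?itemLabel WHERE {
  ?item wdt:${property} ${object} .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 100`
  // The query service interface reads the query from the URL fragment
  return 'https://query.wikidata.org/#' + encodeURIComponent(query)
}

// Entities found in taxon (P703) Pinus sylvestris (Q133128)
console.log(buildQueryUrl('P703', 'Q133128'))
// Entities with any found in taxon (P703) statement
console.log(buildQueryUrl('P703'))
```

Like the links in the script, this only uses the main value of the statement and does not account for qualifiers.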

https://www.wikidata.org/wiki/User:Lagewi/properties.js

Screenshot of three claims on a Wikidata page. Below each property is a link titled "List entities with this property", and below each value a link titled "List entities with this claim".

User:Lagewi/references.js

Sometimes, the data on Wikidata does not answer all your questions. Some types of information are difficult to encode in statements, or simply have not been encoded on Wikidata yet. In such cases, it might be useful to go through the references attached to claims of the entity, for additional information. To simplify this process, this user script lists all unique references based on stated in (P248) and reference URL (P854). The references are listed in a collapsible list below the table of labels and descriptions, collapsed by default to not be obtrusive.
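
The collection step could be sketched roughly as follows, assuming the entity is available in the standard Wikibase JSON format (claims grouped by property, each statement with an optional list of references that hold snaks per property). The function collectReferences is a hypothetical illustration, not the code of the user script.

```javascript
// Hypothetical helper, for illustration only (not the code of the user script).
// Gathers the unique "stated in" items and reference URLs from all statements.
function collectReferences (entity) {
  const items = new Set()
  const urls = new Set()

  for (const statements of Object.values(entity.claims || {})) {
    for (const statement of statements) {
      for (const reference of statement.references || []) {
        // stated in (P248): the value is an entity, identified by its Q-number
        for (const snak of reference.snaks.P248 || []) {
          if (snak.snaktype === 'value') items.add(snak.datavalue.value.id)
        }
        // reference URL (P854): the value is a plain string
        for (const snak of reference.snaks.P854 || []) {
          if (snak.snaktype === 'value') urls.add(snak.datavalue.value)
        }
      }
    }
  }

  return { items: [...items], urls: [...urls] }
}
```

Using sets means that a source cited by several statements is only listed once.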

https://www.wikidata.org/wiki/User:Lagewi/references.js

Screenshot of the Wikidata page of Sylvie Deleurance (Q122350848) with, below the list of labels, descriptions, and aliases, a collapsible list titled "Other resources", containing links.

User:Lagewi/bibliography.js

Going a step further than the previous script, this user script appends a full bibliography at the bottom of the page. This uses the (relatively) new citation-js tool hosted on Toolforge (made using Citation.js). Every reference in the list of claims has a footnote link to its entry in the bibliography, and every entry in the bibliography has a list of back-references to the claims where the reference is used, labeled by the property number.

https://www.wikidata.org/wiki/User:Lagewi/bibliography.js

Screenshot of two claims on Wikidata, each with a reference. Both references have a darker grey header containing a link, respectively "[4]" and "[5]".

Screenshot of a numbered list titled "Bibliography". Each item in the list is a formatted reference, and below each reference is a series of links titled with a property number.

Sunday, December 31, 2023

Citation.js: 2023 in review

This past year was relatively quiet for Citation.js, as changes to the more prominent components (BibTeX, RIS, Wikidata) start to slow down. I believe this is a good sign, and that it indicates the quality of the mappings is high. Nonetheless, following the reworks of BibTeX and RIS in the past couple of years, some room for improvement still came up.


Tytthaspis sedecimpunctata, observed May 9th, 2021, Sint-Oedenrode, The Netherlands.

Changes

  • BibTeX: The mappings of the fields howpublished, langid, and addendum are improved. Plain BibTeX now allows doi, url, and more (see below).
  • RIS: PY (publication year) is now always exported, and imported more resiliently.
  • Wikidata: Names of institutions, and author names that differ from the person name, are now handled better.
  • CSL: a single citation entry can now be exported as documented.

New plugins

New users

  • I started working on a publicly available API running Citation.js on Wikimedia Toolforge. This API can be used to extract bibliographical data from Wikidata items, or to import bibliographical data into Wikidata. Available at https://citation-js.toolforge.org/.
  • I found out that Codeberg uses Citation.js for the “Cite this repository” feature.

New Year’s Eve tradition

After the releases on New Year’s Eve of 2016, 2017, 2021, and 2022, this New Year’s Eve also brings the new v0.7.5 release. It turns out that plain BibTeX has more fields than documented in the manual! At some point, natbib introduced the plainnat styles which include doi, eid, isbn, issn, and url. These are now supported in bibtex export, as well as strict BibTeX import. (The default, non-strict BibTeX import is basically just BibLaTeX, for which these fields were already supported.) Thank you to @jheer and @kc9jud for bringing this up!

Happy New Year!

Saturday, August 12, 2023

Finding shield bug nymphs on iNaturalist

Working on translating a key to the European shield bug nymphs (Puchkov, 1961), I thought I would look for pictures of the earlier life stages (nymphs, Fig. 1) of shield bugs (Pentatomoidea) on iNaturalist, and found that few observations actually had the life stage annotation. I do not have the exact numbers for Europe as a whole at that point in time, but Denmark currently has around 19.8% and the United Kingdom has around 29.4% of the observations annotated (GBIF.org, 2023).

Figure 1: Fourth instar nymph of Nezara viridula (Linnaeus, 1758). 2023.vi.22, Bad Bellingen, Germany.

So I set out to add those annotations myself instead, starting with the Netherlands, followed by the rest of the Benelux, Germany, and Ireland. Last Monday, I finished annotating the observations from France. These regions total about 80 000 observations, of which I annotated a bit more than 40 000 (again, I do not have the exact numbers from before I started).

Methods

I made these annotations on the iNaturalist Identify tool which has plenty of keyboard shortcuts that I found after using the mouse for 2000 observations. This allowed me to develop some muscle memory, and I ended up annotating a single page of 30 observations in around 60 seconds, so 2 seconds per observation. Most of that time was usually spent waiting for the images to load, and there were plenty of small glitches in the interface to further slow me down (including a memory leak requiring me to reload every 10-ish pages).

I was not able to annotate 715 of the verifiable observations (i.e. those with pictures, a location, and a time). In some cases, the pictures were simply not clear enough (or taken too closely) for me to determine the life stage with certainty. Another issue I had to work around was observations of multiple individuals at different life stages. Common were observations of egg clusters and just-hatched nymphs of Halyomorpha halys (Stål, 1855); the “parent bug” Elasmucha grisea (Linnaeus, 1758) doing parenting; kale plants infested with adults and nymphs of Eurydema; and adults of various species in the process of laying eggs. However, there were also many observations containing multiple pictures where one was of an adult and a second of a nymph, with no indication that it was the same individual at different times. There is currently no way to annotate multiple life stages on a single observation on iNaturalist except through non-standard observation fields, which are a lot more laborious to use and can be disabled by users.

Results

Coloring the observations by life stage on a map clearly shows the effect of the work, with the aforementioned countries covered in red, and most of the rest of Europe in blue (Fig. 2). (There are two other notable red patches, in Abruzzo, Italy and in Granada, Spain. These are not my doing, and seem to be caused by two prolific observers annotating their own observations, respectively esant and aggranada.)

Figure 2: Map of research-grade iNaturalist observations of Pentatomoidea in Europe, colored by whether or not they have a life stage annotation.

These annotations mean additional data is available on the seasonality of these species. For example, looking at the four most observed species already reveals that Pentatoma rufipes (Linnaeus, 1758) overwinters as nymphs, whereas the other three species overwinter as adults (Fig. 3). The larger volume of data also means that more detailed analyses with more explanatory variables can be carried out, for example on the effect of climate change on the life cycle of invasive species like H. halys.

Figure 3: Seasonality of nymphs and adults of the four most observed shield bug species.

In addition, for less common species the classification of life stages makes it possible to find more about the morphology of the earlier life stages of these species. This is useful for individuals who are working on keys (such as me), but perhaps also for computer vision models. Classifying the not-yet identified observations of nymphs as such also allows for more targeted searches by identifiers, potentially leading to even more research-grade observations of rarer species.

It should be said, though, that even Chlorochroa pinicola (Mulsant & Rey, 1852), which is not particularly common in Western Europe, still has many more validated pictures on Waarneming.nl than on iNaturalist. In fact, nearly half (43.2%) of all observations with images of Pentatomoidea in Europe are in the Netherlands. These are not all annotated with a life stage though, and the Observation.org platform (which Waarneming.nl is part of) seemingly only allows curators and observers to add life stage annotations to an observation.

Luckily, iNaturalist does allow for this and enables me to contribute hopefully valuable data to GBIF for further analysis, by myself or by others. I will continue adding annotations — I have now started on the observations from Switzerland, luckily a lot fewer than those from France. At the same time, I am maintaining the high rate of annotation in the countries I have already annotated. In August, this means annotating about 200 observations per day (10–15 minutes) which is entirely doable. It does quickly start to add up if you are on holiday for a week, as you do in August, but that is still fewer observations than the entirety of France. Still, for this reason I hope other identifiers (or even better, observers) start annotating more as well.

References

