Tuesday, October 21, 2025

Updates to taxonomic coverage and search result scoring

Two parts of the Library of Identification Resources have gotten major updates in the last year. First, the taxon coverage field (also labeled “For identifying …” in some places) is now linked to external databases, namely Wikidata and GBIF. Second, the scoring and sorting of search results in the Find resources tool was made more transparent and visible.

Taxon coverage

The taxon coverage field specifies the taxa to which the identification resource applies. For example, for the Key to the British Scathophagidae (Diptera) by Stuart G. Ball (B17) that would be the family Scathophagidae.

Previously, this was a plain-text field. Now, the values are all linked to a separate table. In this table, the rank of the taxon is given, and mappings are made with Wikidata and GBIF. The main taxonomy data in GBIF does not include minor ranks such as superfamilies, subfamilies, and subgenera; for those taxa, the GBIF identifiers of their children are recorded instead. The same goes for outdated taxa which are now considered paraphyletic, or were synonymized or split up.

Additionally, the parent taxa of each taxon are recorded, allowing statistics on the number of resources in larger groups. This is also shown on the taxon pages, such as that of Animalia (T55):

Screenshot of page about the taxonomic kingdom Animalia, with a permalink, GBIF identifier, links to Wikidata and Scholia, classification ("Biota > Animalia"), and a donut chart labelled "Children" with sections "Arthropoda" (~85%), "Chordata" (~7%), etc.
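To illustrate how recording the parent taxa makes such statistics possible, here is a minimal sketch. It is not the actual implementation behind the taxon pages: the field names (rank, parents, taxon), the resource identifiers, and the abbreviated classification are all assumptions made for this example.

```javascript
// Hypothetical taxon table: each taxon lists all of its parent taxa.
// The classification is abbreviated (e.g. Insecta and Diptera are left out).
const taxa = new Map([
  ['Animalia', { rank: 'kingdom', parents: ['Biota'] }],
  ['Arthropoda', { rank: 'phylum', parents: ['Biota', 'Animalia'] }],
  ['Chordata', { rank: 'phylum', parents: ['Biota', 'Animalia'] }],
  ['Scathophagidae', { rank: 'family', parents: ['Biota', 'Animalia', 'Arthropoda'] }]
])

// Hypothetical resources, each linked to one taxon in the table.
const resources = [
  { id: 'example-1', taxon: 'Scathophagidae' },
  { id: 'example-2', taxon: 'Arthropoda' },
  { id: 'example-3', taxon: 'Chordata' }
]

// Count every resource once for its own taxon and once for each parent taxon.
function countResourcesPerTaxon (taxa, resources) {
  const counts = new Map()
  for (const resource of resources) {
    const taxon = taxa.get(resource.taxon)
    if (!taxon) continue // skip resources with an unknown taxon
    for (const name of [resource.taxon, ...taxon.parents]) {
      counts.set(name, (counts.get(name) || 0) + 1)
    }
  }
  return counts
}

const counts = countResourcesPerTaxon(taxa, resources)
console.log(counts.get('Animalia'), counts.get('Arthropoda'))
```

Because each record already lists all of its parents, the count for a larger group needs only a single pass over the resources, without walking the tree.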

Search result sorting

In the Find resources tool, available at identification-resources.github.io/find-resources, search results are scored on a number of different factors. The score is now visually displayed next to the taxonomic completeness. When clicking on the scores, their factors are now shown in three groups which are explained in more detail.

In addition, results can now be sorted by those specific groups of factors, instead of only by the total score. Results can now also be filtered by language or by characteristics of observations and/or organisms, including keys specifically for females or males, or keys for nests, galls, eggs, nymphs, etc.

Sunday, September 28, 2025

10 years of Citation.js

10 years ago, on September 28th, 2015, I pushed the first commit of “C[1]”, which would later form the basis of Citation.js. Back then, it was a simple webapp that took bibliographical data from manual input from a form and converted it to APA. I had not learned about Citation Style Language (CSL) or CSL-JSON yet, so the implementation was not particularly interoperable, but it served its purpose: allowing me and my classmates to format bibliographies without stressing over the correct punctuation.

In April of 2016 I created the larsgw/citation.js repository on GitHub, containing a JavaScript file for browser usage of Citation.js. In September of the same year, I added support for Node.js, including a CLI. Next, in November of 2018 I moved most of the code to a new repository under the new citation-js GitHub organization. Finally, in 2019 I published an article in PeerJ Computer Science (doi:10.7717/peerj-cs.214) on the software and mappings. Since then, development has been relatively stable.

During that time, it has been used in several projects. We have used it in Scholia for importing metadata from DOIs and ISBNs and exporting citations. At the Cuneiform Digital Library Initiative we use it to provide page citations and for importing BibTeX metadata. Additionally, it is used in Forgejo (and Gitea) to implement CITATION.cff support, which can also be seen on Codeberg. It is also used in several blogs and personal websites. If you know of any other cool uses, please let me know!

On to the next 10 years!

Striped fly (Stomorhina lunata) on the stalk of a yellow composite flower, with a diffuse beige background
Stomorhina lunata, 9.vi.2025, Noorbeek, NL

Monday, September 1, 2025

New paper: "Library of Identification Resources: a FAIR overview of taxonomic keys"

Biodiversity research is supported by an ever-increasing volume of citizen science observations, on platforms such as Waarneming.nl/Observation.org and iNaturalist.org. Taxonomic expertise is essential to sustain these platforms, but can be difficult to spread due to the decentralized nature of many citizen science projects. In our new scientific article in Biodiversity Data Journal we describe how and why to record information resources for the taxonomic identification of organisms in a FAIR database, and how to query that data to find applicable resources for an observation.

So I created the Library of Identification Resources (LoIR) which so far contains 2,158 records of such information resources, 54% of which are freely available online. At the moment, most resources are meant for groups of insects in parts of Northwestern Europe, but anyone can help by adding more resources!

Fig. 1: Geographical and taxonomic focus of the resources currently included in the Library of Identification Resources. (A) Choropleth of the geographic scopes of resources in the catalog. 460 publications with a geographic scope that cannot be expressed in administrative borders were omitted. (B) Breakdown of publications by the taxonomic group and continent. Publications spanning multiple continents and/or multiple taxonomic groups are counted for the category “Other”.

A major feature of the LoIR is a special search engine, where someone can enter an observation of an organism, for example a hoverfly in Nijmegen, The Netherlands, and it returns the most applicable resources for that observation. It works by comparing the list of expected species of hoverflies in The Netherlands to the different available resources. Try it out!
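The comparison can be pictured roughly as follows. This is only a sketch under assumptions, not the actual search engine: the resource records, their species lists, and the function names are hypothetical, and the real scoring takes more factors into account.

```javascript
// Shortened, illustrative list of species expected for the observation.
const expectedSpecies = [
  'Episyrphus balteatus',
  'Eristalis tenax',
  'Syrphus ribesii',
  'Volucella zonaria'
]

// Hypothetical resources with the species they cover.
const resources = [
  { title: 'Hypothetical key B', species: ['Eristalis tenax', 'Bombus terrestris'] },
  { title: 'Hypothetical key A', species: ['Episyrphus balteatus', 'Eristalis tenax', 'Syrphus ribesii'] }
]

// Fraction of the expected species that the resource covers.
function coverage (resource, expected) {
  if (expected.length === 0) return 0
  const included = new Set(resource.species)
  const covered = expected.filter(species => included.has(species))
  return covered.length / expected.length
}

// Sort resources from most to least applicable to the observation.
function rankResources (resources, expected) {
  return resources
    .map(resource => ({ title: resource.title, coverage: coverage(resource, expected) }))
    .sort((a, b) => b.coverage - a.coverage)
}

const ranked = rankResources(resources, expectedSpecies)
console.log(ranked)
```

In this sketch a resource that covers three of the four expected species ranks above one that covers a single species, even if the latter also includes taxa that are irrelevant to the observation.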

As the database and search engine grow, more and more citizen scientists should be able to find the resources needed to continue their extensive work.

The article, written with Eelke Jongejans, can be found here: https://doi.org/10.3897/BDJ.13.e161726

Monday, March 24, 2025

New paper: Identification of Cholevinae larvae

In 2022, I started my Master’s in Biology, at Radboud University in the Netherlands where I had just finished my Bachelor’s degree. The Master’s programme includes two research internships of 36 EC (approx. 6 months), both of which include writing a thesis. As I had been working on a database of identification keys, I was interested in a project focused on taxonomy for my first research internship.

Thanks to Henk Siepel I ended up contacting Menno Schilthuizen at Naturalis, who suggested I work on Cholevinae larvae. Schilthuizen had been collecting Cholevinae larvae since the 1980s, and had also received material from Peter Zwick who started collecting larvae in different areas of Germany in the 1960s. The challenge was to use this material to make an identification key based on these specimens.

Although the first description of a larva of Cholevinae was published back in 1861 by J. C. Schiødte, descriptions have since been relatively few and far between. This also means that there are almost no existing identification keys for the larval Cholevinae. Making these descriptions and keys is difficult, as you need larvae from a known species. This is only possible if the larvae are cultured from adults, which takes time and effort; if molts are collected and the emerged adult is identified; or if DNA barcoding can be used. The specimens collected by Zwick and Schilthuizen were mainly obtained using the first method.

However, there happened to be a recent, detailed description of Sciodrepoides watsoni, a species for which I also had specimens. I started by comparing the larvae of S. watsoni (as well as a few of the related S. fumatus) to the drawings and descriptions made by Kilian and Mądra. From there, I could start looking at different species and identify potential areas and types of characteristics that are consistent enough within a species, but that differ between separate species. To illustrate these differences I also made schematic drawings (Fig. 1) of different sets of characteristic features. Finally, I measured certain parts of the larvae, where possible for specimens preserved in microscope slides.


Figure 1: Illustrations of Cholevinae larvae

At the end of the 6 months, I had a complete key to all species for which specimens were available, but only for the 1st instar. When the larvae molt for the first time, they gain secondary bristles, grow in size, and more, meaning the identifying characteristics cannot always be used for both the 1st instar, and the 2nd and 3rd instars. I ended up spending another year or so to finalize the key for all instars. This includes 28 of the 39 species of Cholevinae occurring in the Netherlands, and a lot of species for which no (detailed) description was available previously. In a true full circle moment, I could add my own work to the aforementioned database of identification keys (as B1860).

Ultimately, collaborating with Schilthuizen, Siepel, and Zwick, this culminated in an article, Comparative morphology of the larval stages of Cholevinae (Coleoptera: Leiodidae), with special reference to those in the Netherlands. We were able to publish this in the final issue of Tijdschrift voor Entomologie, which is unfortunately being discontinued after 167 volumes. Again, many thanks to Menno Schilthuizen, Peter Zwick, and Henk Siepel for this great opportunity. Check it out!

References

  • Willighagen, L. (2022, August 6). Library of Identification Resources. Syntaxus Baccata. https://doi.org/10.59350/h8qka-z4a05
  • Schiødte, J. C. (1861). De metamorphosi eleutheratorum observationes: Bidrag til insekternes udviklingshistorie (pp. 1–558). Thieles Bogtrykkeri. https://doi.org/10.5962/bhl.title.8797
  • Kilian, A., & Mądra, A. (2015). Comments on the biology of Sciodrepoides watsoni watsoni (Spence, 1813) with descriptions of larvae and pupa (Coleoptera: Leiodidae: Cholevinae). Zootaxa, 3955(1), 45–64. https://doi.org/10.11646/zootaxa.3955.1.2
  • Willighagen, L. G., Schilthuizen, M., Siepel, H., & Zwick, P. (2025). Comparative morphology of the larval stages of Cholevinae (Coleoptera: Leiodidae), with special reference to those in the Netherlands. Tijdschrift Voor Entomologie, 167, 59–101. https://doi.org/10.1163/22119434-bja10033


Tuesday, December 31, 2024

Citation.js: 2024 in review

This past year was relatively quiet for Citation.js as well.


Ulex europaeus, observed December 24th, 2024, Vlieland, The Netherlands.

Changes

  • BibTeX: output of non-ASCII characters was improved.
  • BibLaTeX: support for data annotations was added!
  • DOI: the DOI pattern was broadened to include non-standard DOI formats.
  • Support for ORCIDs was improved, making it possible to map authors’ ORCIDs to different formats.

New Year’s Eve tradition

After the releases on New Year’s Eve of 2016, 2017, 2021, 2022, and 2023, this New Year’s Eve also brings the new v0.7.17 release. The CSL field publisher is now mapped to the BibTeX field organization for paper-conference (inproceedings) entries.
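As a rough illustration of what such a type-dependent mapping looks like, here is a small sketch. It is not the code from plugin-bibtex: the function name and the example item are made up, and it assumes that all other entry types simply keep the publisher field, which simplifies the real mapping.

```javascript
// Hypothetical helper: pick the BibTeX field for the CSL `publisher` value.
function mapPublisher (cslItem) {
  if (!cslItem.publisher) return {}
  // Assumption: only paper-conference (inproceedings) entries use `organization`.
  const field = cslItem.type === 'paper-conference' ? 'organization' : 'publisher'
  return { [field]: cslItem.publisher }
}

const conferenceFields = mapPublisher({ type: 'paper-conference', publisher: 'Example Society' })
const bookFields = mapPublisher({ type: 'book', publisher: 'Example Press' })
console.log(conferenceFields, bookFields)
```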

Happy New Year!

Tuesday, October 15, 2024

Next.js, SWC, and citeproc-js

Last year I got a bug report that Citation.js was not working when built in a Next.js production environment, for unclear reasons. Next.js is a popular server framework to make web applications with React, and by default transforms all JavaScript files and their dependencies into “chunks” to improve page load times. In production environments, Next.js uses the Rust-based “Speedy Web Compiler” (SWC) to optimize and minify JavaScript code.

I was able to figure out that somewhere, this process transformed an already difficult-to-grok function (makeRegExp) in the citeproc dependency into actually broken code. After some trial and error I found the following MCVE (Minimal Complete Verifiable Example):

function foo (bar) {
    var bar = bar.slice()
    return bar
}

foo(["bar"])

// equivalent to

function foo (bar) {
    var bar // a no-op in this case, apparently
    bar = bar.slice()
    return bar
}

foo(["bar"])

But then, in the chunks generated by Next.js, the argument bar gets optimized away from foo(), generating the following code (it also inlines the function).

var bar;
bar = bar.slice();

Now, this is a simple mistake to make. If you expect var bar to actually re-declare the bar argument, the argument is clearly unused and can be removed. Due to the quirks of JavaScript that is not the case though, and the incorrect assumption leads to incorrect code.
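The difference is easy to check by running both variants side by side. In the sketch below, the second function is a hand-written stand-in for the mis-compiled output (the function names are mine, not something SWC generates).

```javascript
// Original form: `var bar` re-declares the parameter but does not reset it,
// so `bar` still holds the argument when `.slice()` is called.
function original (bar) {
  var bar = bar.slice()
  return bar
}

// Stand-in for the mis-compiled output: without the parameter,
// `var bar` starts out as undefined and `.slice()` throws.
function miscompiled () {
  var bar
  bar = bar.slice()
  return bar
}

const input = ['bar']
const copy = original(input)

let error = null
try {
  miscompiled()
} catch (e) {
  error = e
}

console.log(copy, error instanceof TypeError)
```

The original returns a shallow copy of the array, while the stand-in fails with a TypeError because it calls slice() on undefined.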

This is not a one-off thing though: last August I got another, similar bug report with the same cause: some slightly non-idiomatic code (CSL.parseXml) from citeproc got mis-compiled by SWC. I found another MCVE:

function foo (arg) {
    const a = { b: [] }
    const c = [a.b]
    c[0].push(arg)
    return a.b[0]
}

The compiler misses that c[0] refers to the same object as a.b and thinks that makes the function a no-op, though it does not optimize it away fully, instead producing the following:

function (n) {
    return [[]][0].push(n), [][0]
}
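Again, running the two versions next to each other shows the change in behavior. The second function below is the compiler output from above with a name added for the example; the names original and miscompiled are mine.

```javascript
// Original: c[0] and a.b are the same array, so the pushed value is returned.
function original (arg) {
  const a = { b: [] }
  const c = [a.b]
  c[0].push(arg)
  return a.b[0]
}

// Compiler output from above: the push lands in a throwaway array,
// and the return value is read from a different, empty array.
function miscompiled (n) {
  return [[]][0].push(n), [][0]
}

const expected = original('value')
const actual = miscompiled('value')
console.log(expected, actual)
```

The original returns the argument it was given, whereas the compiled version always returns undefined.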

This was apparently already noticed and fixed last May, though the SWC patch still has to land in a stable version of Next.js. Interestingly, the patch includes a test fixture that uses CSL.parseXml as example code; apparently citeproc is a good stress-test of JavaScript compilers.

This is all fine with me; I am not going to blame the maintainers of a complex open-source project like SWC for occasional bugs. However, I would like to see a popular framework like Next.js, with 6.5 million downloads per week and corporate backing, do more testing for such essential parts of their infrastructure. I also do not see them among the sponsors of SWC.

Edited 2024-10-15 at 17:26: Actually, the creator for SWC is also a maintainer for Next.js, though I do not know in which order. Given that, it makes more sense that they switched away from the well-tested but slower BabelJS in version 12, and more confusing why they did not test it a bit more thoroughly.

Tuesday, March 5, 2024

Citation.js: BibLaTeX Data Annotations support

Version 0.7.9 of Citation.js comes with a new feature: plugin-bibtex now supports the import and export of Data Annotations in BibLaTeX files. This means ORCID identifiers from DOI, Wikidata, CFF, and other sources can now be exported to BibLaTeX. Combined with a BibLaTeX style that displays ORCID identifiers, you can now quickly improve your reference lists with ORCIDs.

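// Load the core, plus the plugins for BibTeX/BibLaTeX output and DOI input
const { Cite } = require('@citation-js/core')
require('@citation-js/plugin-bibtex')
require('@citation-js/plugin-doi')

// Fetch the metadata for the DOI, then print it as BibLaTeX
Cite
  .async('10.1111/icad.12730')
  .then(cite => console.log(cite.format('biblatex')))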

This produces the following BibLaTeX file (note the author+an:orcid field):

@article{Willighagen2024Mapping,  
  author = {Willighagen, Lars G. and Jongejans, Eelke},  
  author+an:orcid = {1="http://orcid.org/0000-0002-4751-4637"; 2="http://orcid.org/0000-0003-1148-7419"},  
  journaltitle = {Insect Conservation and Diversity},  
  shortjournal = {Insect Conserv Diversity},  
  doi = {10.1111/icad.12730},  
  issn = {1752-458X},  
  date = {2024-03-02},  
  language = {en},  
  publisher = {Wiley},  
  title = {Mapping wing morphs of \textit{{Tetrix} subulata} using citizen science data: Flightless groundhoppers are more prevalent in grasslands near water},  
  url = {http://dx.doi.org/10.1111/icad.12730},  
}
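To show how the annotation lines up with the author list, here is a small sketch that reads the value of the author+an:orcid field shown above. It is not the parser used in plugin-bibtex: it only handles the simple form in this example (an item number followed by a quoted literal), and the function name is hypothetical.

```javascript
// Parse item-level literal annotations such as `1="..."; 2="..."`.
// Limitation of this sketch: values containing a semicolon are not supported.
function parseItemAnnotations (value) {
  const annotations = new Map()
  for (const part of value.split(';')) {
    const match = part.trim().match(/^(\d+)="(.*)"$/)
    if (match) {
      annotations.set(Number(match[1]), match[2])
    }
  }
  return annotations
}

const authors = ['Willighagen, Lars G.', 'Jongejans, Eelke']
const orcids = parseItemAnnotations(
  '1="http://orcid.org/0000-0002-4751-4637"; 2="http://orcid.org/0000-0003-1148-7419"'
)

// Item numbers in the annotation are 1-based positions in the name list.
const authorsWithOrcid = authors.map((name, index) => ({
  name,
  orcid: orcids.get(index + 1)
}))
console.log(authorsWithOrcid)
```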

References