Syntaxus baccata: BibTeX

Showing posts with label BibTeX. Show all posts

Tuesday, March 5, 2024

Citation.js: BibLaTeX Data Annotations support

Version 0.7.9 of Citation.js comes with a new feature: plugin-bibtex now supports the import and export of Data Annotations in BibLaTeX files. This means ORCID identifiers from DOI, Wikidata, CFF, and other sources can now be exported to BibLaTeX. Combined with a BibLaTeX style that displays ORCID identifiers, you can now quickly improve your reference lists with ORCIDs.

const { Cite } = require('@citation-js/core')  
require('@citation-js/plugin-bibtex')  
require('@citation-js/plugin-doi')

Cite
  .async('10.1111/icad.12730')
  .then(cite => cite.format('biblatex'))

This produces the following BibLaTeX file (note the author+an:orcid field):

@article{Willighagen2024Mapping,  
  author = {Willighagen, Lars G. and Jongejans, Eelke},  
  author+an:orcid = {1="http://orcid.org/0000-0002-4751-4637"; 2="http://orcid.org/0000-0003-1148-7419"},  
  journaltitle = {Insect Conservation and Diversity},  
  shortjournal = {Insect Conserv Diversity},  
  doi = {10.1111/icad.12730},  
  issn = {1752-458X},  
  date = {2024-03-02},  
  language = {en},  
  publisher = {Wiley},  
  title = {Mapping wing morphs of \textit{{Tetrix} subulata} using citizen science data: Flightless groundhoppers are more prevalent in grasslands near water},  
  url = {http://dx.doi.org/10.1111/icad.12730},  
}

References

Willighagen, L. (2024). Including ORCID identifiers in BibLaTeX (and using them). Syntaxus Baccata. https://doi.org/10.59350/bk8yd-b1307

Saturday, March 2, 2024

Including ORCID identifiers in BibLaTeX (and using them)

On the Fediverse, @petrichor@digipres.club posited the question how to include identifiers for authors in Bib(La)TeX-based bibliographies:

Any Bib(La)TeX/biber users have a preferred way to include author identifiers like ORCID or ISNI in your .bib file? Ideally supported by a citation style that will include the identifiers and/or hyperlink the authors.

https://digipres.club/@petrichor/112020378570169913

I have wanted to try including ORCIDs in bibliographies for a while now, and while CSL-JSON makes it nearly trivial to encode, neither CSL styles nor CSL processors are at the point where those can actually be inserted in the formatted bibliography. However, BibLaTeX may grant more opportunities, so this piqued my interest.

I first thought of the Extended Name Format (Kime et al., 2023, §3.4), which allows breaking names up in key-value pairs. Normally, those are reserved for name parts (family, given, etc.), but I believed I had seen a way to define additional “name parts”, one of which could be used for specifying the ORCID. However, in the process of figuring that out, I found the actual, intended, proper solution.

BibLaTeX has, exactly for things like this, Data Annotations (Kime et al., 2023, §3.7). For every field, or every item of every field in the case of list fields, additional annotations can be provided. (There are some additional features and nuances; for a full explanation see the manual.) For ORCIDs, data annotations could look like this:

@software{willighagen_2022_7017208,
  author          = {Willighagen, Lars and
                     Willighagen, Egon},
  author+an:orcid = {1="0000-0002-4751-4637"; 2="0000-0001-7542-0286"},
  title           = {ISAAC Chrome Extension},
  month           = aug,
  year            = 2022,
  publisher       = {Zenodo},
  version         = {v1.4.0},
  doi             = {10.5281/zenodo.7017208}
}

Now, implementing it in a BibLaTeX style proved more difficult than I hoped, but that might have been due to my inexperience with argument expansion and Biber internals. I started with the authoryear style and looked for the default name format that it uses in bibliographies; this turned out to be family-given/given-family. I copied that definition, and amended it to include the ORCID icon after each name (when available). To insert the icon, I used the orcidlink package. This part was tricky, as \getitemannotation does not work in an argument to \orcidlink or \href, but I ended up with the following.

\DeclareNameFormat{family-given/given-family}{%
  % ...
  \hasitemannotation[\currentname][orcid]
    {~\orcidlink{\expandafter\csname abx@annotation@literal@item@\currentname @orcid@\the\value{listcount}\endcsname}}
    {}%
  % ...
  }

References. Willighagen, Lars [ORCID icon with cyan outline] and Egon Willighagen [ORCID icon with cyan outline] (Aug. 2022). ISAAC Chrome Extension. Version v1.4.0. DOI: 10.5281/zenodo.7017208

You could repeat the same with ISNI links, or Wikidata, VIAF, you get the idea. Then you could put the \DeclareNameFormat in a new authoryear-orcid.bbx file so that the changes do not show up in the in-text citations, and set the bibliography style like so:

\usepackage[bibstyle=authoryear-orcid]{biblatex}

This can all be seen in action on Overleaf: https://www.overleaf.com/read/gvxqmrqmwswh#f156b5

References

Philip Kime, Moritz Wemheuer, Philipp Lehman (March 5, 2023). The biblatex Package. Programmable Bibliographies and Citations. Version 3.19. http://mirrors.ctan.org/macros/latex/contrib/biblatex/doc/biblatex.pdf

Friday, May 27, 2022

Citation.js Version 0.5 and a 2022 update

Version 0.5.0

Version 0.5.0 of Citation.js was released on April 1st, 2021.

BibTeX and BibLaTeX

After the update to the Bib(La)TeX file parser, described in the earlier BibTeX Rework: Syntax Update blog post, the mapping of BibTeX and BibLaTeX data to CSL-JSON was also updated. The mapping is now split in two, one for BibLaTeX (which is backwards-compatible with BibTeX) and one for BibTeX. The output formats were also updated to output either BibTeX-compatible files or BibLaTeX-compatible files. The most common difference there is the use of year and month versus date respectively. In addition, a number of updates were made to the file parser.

Core changes

In the Grammar utility class, bugs were fixed an behavior was updated to better account for the Bib(La)TeX parser. Some of the code for correcting CSL-JSON was also updated, including moving the code correcting results of the Crossref API from the DOI plugin to the core module as CSL-JSON from the API may end up in Citation.js through other methods than the DOI plugin. Earlier in 0.5 development, some of the HTTP handling code was also updated for increased stability.

2022 update

`v0.5.1`–`v0.5.7`

The version released since v0.5.0 mostly contain bug fixes and small enhancements. The latter includes some more descriptive errors in certain places, as well as mapping some non-standard fields in Bib(La)TeX and RIS.

New site design

The design of the Citation.js site was updated for the first time since 2018. The changes were detailed in the recent Citation.js: New site blog post.

New plugins

New plugins for the refer file format (plugin-refer) and the RefWorks tagged format (plugin-refworks) were released.

More coming

More changes are expected, including more long-awaited output formats, better mappings for software and datasets, and more work on machine-readable mappings.

Friday, October 11, 2019

BibTeX Rework: Syntax Update

A rework of the BibTeX parser has been on the backlog since at least August 15, 2017, and recently I started working on actually carrying it out — systematically. There were a number of things to be improved:

Complete syntax support: again, supporting BibTeX by looking at examples leads in a lack of support for less seen features like @string, @preamble and parentheses for enclosing entries instead of braces.
More complete mappings: since I did not have any specifications when making the BibTeX parser, I could not find a complete list of fields, hence no complete mapping.
Distinction between BibTeX and BibLaTeX: although there may not be any problems when importing, using year/month/day or date matters a lot if you have to output either.
Proper schema validation: BibTeX defines required fields, but Citation.js does not check if those all exist.

In this blogpost, I will describe how I went about solving the first point: complete syntax support. Part of the problem was that I was running a bad parser, which was difficult to extend and not performing that well.

To improve this, I collected a number of BibTeX parsers to compare them on a number of criteria: performance, syntax support, build steps, and ease of maintaining. I used two single-entry BibTeX files for debugging, and a longer BibTeX file (5.2 MiB, 3345 entries) for some rough performance testing. The outcomes:

	Using	Time (single entry)	Time (3345 entries)	Syntax
Current	`TokenStack`	~8ms	~1800ms	old
Idea	moo, `Grammar`	~2ms	~1150ms	old
Idea (new)	moo, `Grammar`	~3ms	~750ms	new
Generator	moo, nearley	~20ms	N/A	new
astrocite	PEG.js	~9ms	~1670ms	new
fiduswriter	biblatex-csl-converter	~160ms	~119000ms	new
Zotero	BibTeX translator	~180ms	~31000ms	old

So, the current parser was performing pretty well actually, especially compared to astrocite which I still consider a good target to aim for. TokenStack, however, was an unnecessarily complex part resulting in poor performance — and poor maintainability.

I had some trouble with PEG.js so I turned to other approaches. One thing I came across was nearley. However, this would introduce both an extra build step and an extra run-time dependency, and as the table shows did not perform very well. I assume that is on me, and my grammar-writing capabilities. One good thing that did come out of it was the use of a tokenizer or lexer, like moo.

After nearly finishing an approach using moo and Grammar, a simplified version of TokenStack with built-in support of rules, something else came up and I dropped the subject for about a year. However, recently I started over, saw my old approach and copied some stuff from there. This resulted in a even more simplified Grammar, with only matchToken, consumeToken and consumeRule support — no backtracking was needed in the end. Also, the performance was pretty good, and it was easier to implement the new syntax.

nearley grammar diagram

To make sure I had good results, I took some other parsers: Fidus Writer’s biblatex-csl-converter package and the Zotero BibTeX translator. The former was easy to set up, as it was just an npm package, while the latter involved quite some tricks: installing the Translation Server directly from GitHub, pointing an ENV variable to its config directory and running a piece of setup code, collecting all the translators I presume. Neither seemed to perform well in comparison to either my old parser, my new parser or astrocite, and I stress-tested all of them in terms of syntax:

@String {maintainer = "Xavier D\\'ecoret"}

@
  %a
preamble
  %a
{ "Maintained by " # maintainer }

@String(stefan = "Stefan Swe{\\i}g")
@String(and = " and ")

@Book{sweig42,
  Author =	 stefan # and # maintainer,
  title =	 { The {impossible} TEL---book },
  publisher =	 { D\\"ead Po$_{e}$t Society},
  year =	 1942,
  month =        mar
}

One area of expansion is all the ways BibTeX has to escape Unicode characters. Besides diacritics, which I should support completely, I think Zotero and astrocite are ahead in terms of completeness of symbols like \copyright. Then again, there is a great, really big list of LaTeX symbols, and not everyone needs every symbol — nor is everything represented in Unicode. I think the best way to do this is to expose function in the configuration to expand the default supported macros, but let me know if something else comes to mind.

The new parser, in its current form, has been published as part of Citation.js v0.5.0-alpha.3.

Sunday, November 20, 2016

Citation.js Version 0.2.10: BibTeX and Travis

The new update (Citation.js v0.2.10) doesn't have a big impact on the API, but a lot has changed in the back end. ("back end" as in the helper functions that are called when the API is used.) First of all, there is Travis build tests now. It doesn't have edge test cases yet, but it covers the basics.

Testing the basic specs

The main thing is the new BibTeX I/O system. It isn't completely new, but a lot has changed. Before, BibTeX files were parsed with RegEx, you know, the thing you don't want to be parsing important things with. This worked fine for parsing the type of BibTeX I use in the test cases, but it failed in unexpected ways in case of a different syntax and it was a tedious piece of code as RegEx usually is.

Now I have a character-by-character parser (taking escaped character into consideration), that outputs clear error messages and recovers already parsed data when syntax errors occur. This parser converts BibTeX files into its JSON representation (which I call BibTeX-JSON), which then can be converted to CSL-JSON. There are some improvements here as well, but not as many.

The output is done by first converting CSL-JSON to BibTeX-JSON. This process is improved a bit as well. The BibTeX-JSON can then be converted to BibTeX. This now supports the escaping of syntax characters (e.g. '|' into '{\textbar}').

Tuesday, March 5, 2024

Citation.js: BibLaTeX Data Annotations support

References

Saturday, March 2, 2024

Including ORCID identifiers in BibLaTeX (and using them)

References

Friday, May 27, 2022

Citation.js Version 0.5 and a 2022 update

Version 0.5.0

BibTeX and BibLaTeX

Core changes

2022 update

v0.5.1–v0.5.7

New site design

New plugins

More coming

Friday, October 11, 2019

BibTeX Rework: Syntax Update

Sunday, November 20, 2016

Citation.js Version 0.2.10: BibTeX and Travis

`v0.5.1`–`v0.5.7`