Thursday, December 2, 2021

Re-implementing the upload of images for the LaTeX→HTML converter

The CDLI is developing a new website. That website’s admin interface for its journals contains a page where a LaTeX source file, following a specific template, is converted to an HTML page. For this, apart from the LaTeX file itself, two additional components are needed: a BibTeX file, containing metadata of the references; and image files.

Current implementation

The current implementation involves uploading images separately from the form that creates the article. This can lead to “orphan” images if such a form is abandoned after uploading the images. A second problem is that all files are saved in a single directory, so an image with the same file name as an earlier one (say, Figure1.jpg) overwrites it.

Client → Server: GET /admin/articles/add/cdlj
Server → Client: 200 (add page)
Client → Server: POST /admin/articles/convert-latex
Server → Client: 200 (converted HTML, list of images)
Client → Server: POST /admin/articles/image 'Figure1.jpg'
Server: Saves image in 'Figure1.jpg'
Server → Client: 200
Client → Server: POST /admin/articles/add/cdlj
Server → Client: 300 /articles/<sequential ID>

New implementation

The rest of the forms are being re-implemented, which simplifies a lot of the code. The new implementation also attempts to resolve the issues with the image uploads.

Saving images together with the metadata

To avoid the problem of “orphan” images resulting from abandoned forms, one could submit the images in the same form that creates the article. If that form is not submitted, or contains invalid data and so cannot be saved, the images will not be uploaded.
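As a rough illustration of that idea, here is a minimal Python sketch. The `save_article` handler, the field names and the validation rule are all hypothetical and are not taken from the CDLI code; the only point is that nothing is written to disk unless the metadata is valid.

```python
from pathlib import Path


def save_article(fields: dict, images: dict, image_root: Path) -> bool:
    """Write images only if the article metadata is valid (hypothetical handler)."""
    # Illustrative validation rule (an assumption): a title and a numeric year.
    if not fields.get("title") or not str(fields.get("year", "")).isdigit():
        return False  # nothing was written, so no orphan images

    image_root.mkdir(parents=True, exist_ok=True)
    for name, data in images.items():
        # Path(name).name drops any directory part of the client-supplied name.
        (image_root / Path(name).name).write_bytes(data)
    return True
```

Because the images travel in the same request as the metadata, an abandoned or rejected form leaves no files behind.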

Saving images according to metadata

The second problem could be solved immediately by saving the images in subdirectories according to the metadata of the article, e.g. 2021-01/ where 2021 is the year the article is published and 01 is the article sequence number within that year. This assumes, however, that the metadata does not change after the initial submission.
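A small Python sketch of such a naming scheme follows. The helper name `image_path` and the two-digit zero padding are assumptions based on the 2021-01/ example, not a description of the actual implementation.

```python
from pathlib import PurePosixPath


def image_path(year: int, sequence: int, filename: str) -> PurePosixPath:
    """Build e.g. 2021-01/Figure1.jpg from article metadata (hypothetical helper)."""
    # Keep only the base name so a client-supplied name cannot escape the directory.
    base = PurePosixPath(filename).name
    # Assumption: the sequence number is zero-padded to two digits, as in "2021-01".
    return PurePosixPath(f"{year}-{sequence:02d}") / base
```

With this layout, two articles can each have a Figure1.jpg without overwriting one another, as long as the year and sequence number stay fixed.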

Both solutions, however, constrain the rest of the design: the images can only be saved after the user submits the main form, that is, after the HTML containing <img> elements with references to the image locations has been generated. Somehow, those image locations must identify the article before any information about that article is known:

Client → Server: GET /admin/articles/add/cdlj
Server → Client: 200 (add page)
Client → Server: POST /admin/articles/convert-latex
Note: At this point the HTML should contain links to the permanent locations of the images
Server → Client: 200 (converted HTML, list of images)
Client → Server: POST /admin/articles/add/cdlj
Note: At this point the images are saved in a permanent location
Server → Client: 300 /articles/<sequential ID>

So, what is the solution? I propose the following:

Client → Server: GET /admin/articles/add/cdlj
Server: Generate random article ID
Server → Client: 200 (add page with embedded article id)
Client → Server: POST /admin/articles/convert-latex
Server: Generate image URLs according to random article ID
Server → Client: 200 (converted HTML, list of images)
Client → Server: POST /admin/articles/add/cdlj
Server: Verify random article ID, save images, generate sequential ID
Server → Client: 300 /articles/<sequential ID>
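The proposal does not say how the random article ID is verified. One possible approach, sketched here in Python purely as an assumption, is to sign the ID with a server-side secret (HMAC) when the add page is generated and to check the signature when the form is submitted. The function names, the token format and the /images/ URL layout are all hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Assumption: a server-side secret; in practice this would come from configuration.
SECRET_KEY = secrets.token_bytes(32)


def _sign(article_id: str) -> str:
    return hmac.new(SECRET_KEY, article_id.encode(), hashlib.sha256).hexdigest()


def new_article_token() -> str:
    """Generate a random article ID and sign it, for embedding in the add page."""
    article_id = secrets.token_hex(8)
    return f"{article_id}.{_sign(article_id)}"


def verify_article_token(token: str) -> Optional[str]:
    """Return the random article ID if the signature matches, else None."""
    article_id, _, signature = token.partition(".")
    # Compare as bytes in constant time; a tampered token simply fails the check.
    if hmac.compare_digest(signature.encode(), _sign(article_id).encode()):
        return article_id
    return None


def image_url(article_id: str, filename: str) -> str:
    """Image URL derived from the random article ID (URL layout is hypothetical)."""
    return f"/images/{article_id}/{filename}"
```

In this sketch the converter would call `image_url` with the ID taken from the token, and the final POST would call `verify_article_token` before saving any images. A server-side store of issued IDs would be an equally plausible way to do the verification.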