
Saturday, August 12, 2023

Finding shield bug nymphs on iNaturalist

While working on a translation of a key to the European shield bug nymphs (Puchkov, 1961), I thought I would look on iNaturalist for pictures of the earlier life stages (nymphs, Fig. 1) of shield bugs (Pentatomoidea). I found that few observations actually had a life stage annotation. I do not have exact numbers for Europe as a whole at that point in time, but currently only around 19.8% of the observations from Denmark and around 29.4% of those from the United Kingdom are annotated (GBIF.org, 2023).
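The percentages above are simply the share of observations that have a life stage value. Here is a minimal sketch of how such a share could be computed per country. It assumes the occurrence records have already been reduced to Python dicts with the Darwin Core field names `countryCode` and `lifeStage`, where an empty or missing value means "not annotated". The helper `annotated_share` is hypothetical, and the records are made up rather than taken from GBIF.

```python
from collections import defaultdict

def annotated_share(records):
    """Return {countryCode: percentage of records with a non-empty lifeStage}."""
    totals = defaultdict(int)
    annotated = defaultdict(int)
    for rec in records:
        country = rec["countryCode"]
        totals[country] += 1
        if rec.get("lifeStage"):  # None or "" counts as unannotated
            annotated[country] += 1
    return {c: 100 * annotated[c] / totals[c] for c in totals}

# Made-up example records, not real GBIF data.
records = [
    {"countryCode": "DK", "lifeStage": "Adult"},
    {"countryCode": "DK", "lifeStage": ""},
    {"countryCode": "DK", "lifeStage": None},
    {"countryCode": "DK"},
    {"countryCode": "GB", "lifeStage": "Nymph"},
    {"countryCode": "GB", "lifeStage": None},
]
print(annotated_share(records))  # {'DK': 25.0, 'GB': 50.0}
```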

Figure 1: Fourth instar nymph of Nezara viridula (Linnaeus, 1758). 2023.vi.22, Bad Bellingen, Germany.

So I set out to add those annotations myself. I started with the Netherlands, followed by the rest of the Benelux, Germany, and Ireland. Last Monday, I finished annotating the observations from France. These regions total about 80 000 observations, of which I annotated a bit more than 40 000 (again, I do not have exact numbers from before I started).

Methods

I made these annotations in the iNaturalist Identify tool. It has plenty of keyboard shortcuts, which I only discovered after using the mouse for the first 2000 observations. The shortcuts allowed me to develop some muscle memory, and I ended up annotating a single page of 30 observations in around 60 seconds, i.e. 2 seconds per observation. Most of that time was usually spent waiting for the images to load. Plenty of small glitches in the interface slowed me down further, including a memory leak that forced me to reload every 10 or so pages.
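The same rate gives a rough estimate of the total effort. This trivial sketch assumes that every one of the roughly 40 000 annotated observations took the 2 seconds measured above. That ignores reloads and the slower, mouse-driven start, so the result is a lower bound.

```python
# Back-of-the-envelope estimate from the figures quoted above.
SECONDS_PER_PAGE = 60
OBSERVATIONS_PER_PAGE = 30
ANNOTATED = 40_000  # "a bit more than 40 000", so this is a lower bound

seconds_per_obs = SECONDS_PER_PAGE / OBSERVATIONS_PER_PAGE
total_hours = ANNOTATED * seconds_per_obs / 3600
print(f"{seconds_per_obs:g} s per observation, about {total_hours:.0f} h in total")
```

This comes to roughly 22 hours of pure annotation time.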

I was not able to annotate 715 of the verifiable observations (i.e. those with pictures, a location, and a time). In some cases, the pictures were simply not clear enough (or were taken too close up) for me to determine the life stage with certainty. Another issue I had to work around was observations of multiple individuals at different life stages. Common cases were:

- egg clusters and just-hatched nymphs of Halyomorpha halys (Stål, 1855);
- the “parent bug” Elasmucha grisea (Linnaeus, 1758) caring for its young;
- kale plants infested with adults and nymphs of Eurydema;
- adults of various species in the process of laying eggs.

There were also many observations with multiple pictures, one of an adult and another of a nymph, with no indication that they showed the same individual at different times. Currently, the only way to annotate multiple life stages on a single iNaturalist observation is through non-standard observation fields. These are a lot more laborious to use, and users can disable them.

Results

Coloring the observations on a map by whether they have a life stage annotation clearly shows the effect of this work. The countries mentioned above are covered in red, while most of the rest of Europe is blue (Fig. 2). There are two other notable red patches, in Abruzzo, Italy, and in Granada, Spain. These are not my doing. They seem to come from two prolific observers annotating their own observations, esant and aggranada respectively.

Figure 2: Map of research-grade iNaturalist observations of Pentatomoidea in Europe, colored by whether or not they have a life stage annotation.

These annotations mean that more data is now available on the seasonality of these species. For example, the four most observed species already show that Pentatoma rufipes (Linnaeus, 1758) overwinters as nymphs, whereas the other three species overwinter as adults (Fig. 3). The larger volume of data also makes more detailed analyses with more explanatory variables possible. One example would be the effect of climate change on the life cycle of invasive species like H. halys.
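The overwintering stage can be read off from which life stage is observed during the winter months. Below is a naive sketch of that idea. It assumes observations have been reduced to (month, lifeStage) pairs and treats December to February as winter, which is a Northern Hemisphere assumption. The helper names and data are hypothetical. A real analysis would look at the full seasonal curve rather than a single majority vote.

```python
from collections import Counter

WINTER_MONTHS = {12, 1, 2}  # assumption: Northern Hemisphere winter

def monthly_counts(observations):
    """Count observations per (month, lifeStage) pair, i.e. the data behind a seasonality plot."""
    return Counter(observations)

def winter_stage(observations):
    """Life stage most often observed in winter months, or None if there are none."""
    winter = Counter(stage for month, stage in observations if month in WINTER_MONTHS)
    return winter.most_common(1)[0][0] if winter else None

# Hypothetical (month, lifeStage) pairs, not real iNaturalist data.
species_a = [(1, "Nymph"), (2, "Nymph"), (12, "Nymph"), (7, "Adult"), (8, "Adult"), (8, "Adult")]
species_b = [(1, "Adult"), (3, "Adult"), (6, "Nymph"), (7, "Nymph"), (12, "Adult")]
print(monthly_counts(species_a)[(8, "Adult")])  # 2
print(winter_stage(species_a), winter_stage(species_b))  # Nymph Adult
```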

Figure 3: Seasonality of nymphs and adults of the four most observed shield bug species.

For less common species, the life stage annotations also make it possible to learn more about the morphology of their earlier life stages. This is useful for people working on keys (such as me), and perhaps also for computer vision models. Annotating not-yet-identified observations of nymphs as such also lets identifiers search for them more specifically. That could lead to even more research-grade observations of rarer species.

It should be said, though, that even Chlorochroa pinicola (Mulsant & Rey, 1852), which is not particularly common in Western Europe, still has many more validated pictures on Waarneming.nl than on iNaturalist. In fact, nearly half (43.2%) of all observations of Pentatomoidea in Europe that include images are from the Netherlands. Not all of these are annotated with a life stage, though. The Observation.org platform (which Waarneming.nl is part of) seemingly only allows curators and observers to add life stage annotations to an observation.

Luckily, iNaturalist does allow this. That lets me contribute what is hopefully valuable data to GBIF for further analysis, by myself or by others. I will continue adding annotations. I have now started on the observations from Switzerland, which are luckily a lot fewer than those from France. At the same time, I am keeping up with new observations in the countries I have already covered, so that the share of annotated observations there stays high. In August, this means annotating about 200 observations per day (10–15 minutes), which is entirely doable. It does add up quickly if you go on holiday for a week, as one does in August, although that is still fewer observations than the whole of France. For this reason, I hope other identifiers (or, even better, observers) start annotating more as well.

References



Wednesday, February 24, 2021

Mid-week effect of Dutch COVID-19 case reporting

Every week since the start of January, I have heard headlines like “3963 new infections, a bit more than average”. In context, that meant the number of positive COVID-19 tests that day was higher than the average daily number of cases over the previous seven days. We heard this every week, yet overall the cases were going down. What was going on? Somewhat unsurprisingly, the number of reported cases was higher in the middle of the week than on Mondays and at weekends. The news site I linked above mentions the midweek-effect as well, but on Twitter and on the radio you mainly hear the headline, not the caveats.
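A toy example shows why such a headline can come back every week while cases decline. The series below is made up. It falls 10% per week and is multiplied by a hypothetical Monday-to-Sunday reporting pattern. Each day is then compared with the mean of the seven days before it, which is the comparison the headlines make. The factors and helper names are assumptions for illustration only.

```python
# Hypothetical Mon..Sun reporting factors (they average to 1); made up.
WEEKDAY_FACTOR = [0.85, 1.15, 1.15, 1.1, 1.05, 0.9, 0.8]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def daily_cases(n_days, start=5000.0, weekly_decline=0.9):
    """Synthetic counts: steady exponential decline times a weekday factor.

    Day 0 is a Monday; the underlying level drops by 10% per week.
    """
    daily_rate = weekly_decline ** (1 / 7)
    return [start * daily_rate**d * WEEKDAY_FACTOR[d % 7] for d in range(n_days)]

def above_previous_week(cases):
    """Indices of days whose count exceeds the mean of the 7 days before them."""
    return [d for d in range(7, len(cases)) if cases[d] > sum(cases[d - 7:d]) / 7]

cases = daily_cases(28)
flagged = sorted({DAYS[d % 7] for d in above_previous_week(cases)}, key=DAYS.index)
print("Weekly totals:", [round(sum(cases[i:i + 7])) for i in range(0, 28, 7)])
print("Days 'above average':", flagged)
```

The weekly totals keep falling, yet Tuesday, Wednesday, and Thursday beat the trailing average every single week.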

Anyway, I wanted to see what the midweek-effect actually looked like. The daily infection data is available from the RIVM website, RIVM being the Dutch National Institute for Public Health and the Environment. I started by reproducing the graph they made, partly to check the file and partly for fun (see Fig. 1).

Figure 1: Number of positive COVID-19 tests since the start of December. The date used is the date on which the test result was reported to the GGD, the municipal health service. Yellow shows the test results RIVM received this week; purple shows previous data.

To decompose the various effects in this data, I used R’s stats::stl() function on a time series with frequency=7. Getting this right took a while: I had misconfigured the time series, which led STL to interpret each day as a season instead of each week. Because the seasonal effect is multiplicative rather than additive, I also had to log-transform the data. This means that, on the original scale, the seasonal component has a higher amplitude when the overall number of cases is higher (see Fig. 2).
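The idea behind the log transform can be illustrated without R. This sketch is not STL, which smooths iteratively with loess. It uses the simpler classical decomposition instead, so that it runs without dependencies. The trend is a centred 7-day moving average of log(cases), and the seasonal component is the mean deviation from that trend for each day of the week. Setting the period to 7 plays the same role as frequency=7 in the R time series. The decomposition is additive on the log scale, so exp(trend + seasonal) equals trend × weekday factor on the original scale. That is why the weekly swings grow with the number of cases. The helper `decompose_log` and the input series are made up.

```python
import math

def decompose_log(cases, period=7):
    """Classical decomposition of log(cases) for an odd period.

    Returns (trend, seasonal) on the log scale. trend[d] is None for the
    first and last period // 2 days, where the centred average is undefined.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    y = [math.log(c) for c in cases]
    half = period // 2
    trend = [None] * len(y)
    for d in range(half, len(y) - half):
        trend[d] = sum(y[d - half:d + half + 1]) / period  # centred moving average
    # Average the detrended values per position in the cycle (day of the week).
    buckets = [[] for _ in range(period)]
    for d, t in enumerate(trend):
        if t is not None:
            buckets[d % period].append(y[d] - t)
    seasonal = [sum(b) / len(b) for b in buckets]
    offset = sum(seasonal) / period  # centre so the weekly effects sum to zero
    return trend, [s - offset for s in seasonal]

# Made-up input: 10% decline per week times a fixed weekday factor (day 0 = Monday).
factors = [0.85, 1.15, 1.15, 1.1, 1.05, 0.9, 0.8]
cases = [5000 * 0.9 ** (d / 7) * factors[d % 7] for d in range(42)]
trend, seasonal = decompose_log(cases)
weekly = [math.exp(s) for s in seasonal]  # multiplicative weekday factors
print([round(w, 3) for w in weekly])
```

The log-scale trend of this synthetic input is exactly linear, so the recovered weekday factors match the made-up ones up to a common scale. On real data the fit is only approximate.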

Figure 2: Weekly component and general trend of positive COVID-19 tests, overlaid on daily data. The solid line indicates the overall trend, while the dashed line combines the overall trend with the seasonal component. Yellow still indicates test results from this week, purple the previous data.

In the end, this does not mean much at all. The accepted explanation of the midweek-effect concerns when tests are carried out and reported, not when infections actually take place. If we look at the cases for which the date of disease onset is available (source, Public Domain Mark 1.0), the weekly pattern is less pronounced (see Fig. 3).

Figure 3: Number of new confirmed cases per day of disease onset. The solid line indicates the overall trend, while the dashed line combines the overall trend with the seasonal component. The gray area shows daily cases.

Comparing the two weekly components, the midweek-effect in reported cases is larger than the weekly effect in the disease onset of cases. In addition, unlike the midweek-effect, the weekly effect in disease onset peaks on Mondays and declines throughout the week (see Fig. 4). Whether this effect is real or yet another artifact of data reporting is unclear to me.
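One way to make "larger" concrete is the peak-to-trough ratio of the weekly factors, exp(max − min) of the log-scale component, together with the day on which the component peaks. This small sketch uses hypothetical component values shaped like the description above. They are not the actual RIVM results, and `summarize_weekly` is a hypothetical helper.

```python
import math

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def summarize_weekly(seasonal_log):
    """Return (peak day, peak-to-trough ratio) for a log-scale weekly component."""
    peak = max(range(7), key=lambda w: seasonal_log[w])  # first maximum wins ties
    ratio = math.exp(max(seasonal_log) - min(seasonal_log))
    return DAYS[peak], ratio

# Hypothetical weekly factors (Mon..Sun), NOT the RIVM results.
reported = [math.log(f) for f in [0.85, 1.15, 1.15, 1.1, 1.05, 0.9, 0.8]]
onset = [math.log(f) for f in [1.08, 1.04, 1.01, 0.99, 0.97, 0.96, 0.95]]
for name, component in [("reported", reported), ("onset", onset)]:
    day, ratio = summarize_weekly(component)
    print(f"{name}: peaks on {day}, peak/trough ratio {ratio:.2f}")
```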

Figure 4: Weekly component of reported cases and disease onsets. The solid line shows the pattern for the day of disease onset, the dashed line for the day when the test result is reported.