Generate Dataset

The Mothbot_CreateDataset.py script creates a user-friendly, editable dataset for you. It uses a program called FiftyOne to create the user interface.

Inputs

We generally edit only one night of data at a time.

Pre-processing Thumbnails

The first time you run the CreateDataset script on a night’s data, it first needs to create a thumbnail for each of the creatures it detected.

This can take a while, and the terminal will show a progress bar.

These thumbnail (patch) images are stored in a small folder called “patches” alongside that night’s data.

Whenever you run this script again, however, it should go much faster, because the thumbnails already exist and don’t have to be created again.
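The speed-up on later runs comes from reusing thumbnails that are already on disk. Below is a minimal sketch of that caching idea, not the code inside Mothbot_CreateDataset.py: it assumes one image file per detection inside the “patches” folder, and the helper name and file names are hypothetical.

```python
import tempfile
from pathlib import Path


def patches_to_create(patch_names, patches_dir):
    """Return only the thumbnail names that are not yet saved in patches_dir.

    Hypothetical helper: assumes one image file per detection.
    """
    patches_dir = Path(patches_dir)
    patches_dir.mkdir(parents=True, exist_ok=True)  # make "patches" on the first run
    # Anything already on disk from an earlier run is skipped
    return [name for name in patch_names if not (patches_dir / name).exists()]


with tempfile.TemporaryDirectory() as night_folder:
    patches = Path(night_folder) / "patches"
    print(patches_to_create(["det1.jpg", "det2.jpg"], patches))  # ['det1.jpg', 'det2.jpg']
    (patches / "det1.jpg").touch()  # pretend det1's thumbnail was made on a previous run
    print(patches_to_create(["det1.jpg", "det2.jpg"], patches))  # ['det2.jpg']
```

Only the missing thumbnails are generated, which is why a second run over the same night is much quicker.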

Results

After the script finishes processing, a couple of things happen.

Datasets stored to disk

First, the script saves three files to that night’s folder:

  • samples.json and metadata.json
    • These store a consolidated set of all the samples the automated process created.
  • A .csv file labeled with the export date
    • A convenience file that gives an easy way to look at all the data, one detection per line, in a format that platforms such as GBIF accept.
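Because the CSV holds one detection per line, standard tools can summarize it directly. A minimal sketch using Python’s built-in csv module; the column name passed in (here a hypothetical “order”) is an assumption, so check the header row of your own export first.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count detections (one per row) grouped by the value in `column`."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Demo with a small in-memory CSV; the "order" column is hypothetical
example = io.StringIO(
    "filepath,order\n"
    "img1.jpg,Lepidoptera\n"
    "img2.jpg,Lepidoptera\n"
    "img3.jpg,Orthoptera\n"
)
counts = count_by_column(example, "order")
print(counts.most_common())  # [('Lepidoptera', 2), ('Orthoptera', 1)]
```

For a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of the in-memory example.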

Dataset Opens in Web Browser

Your computer will then also launch an interface in your web browser. This interface still reads your data locally (nothing is in the “cloud”), so you don’t need an internet connection.

Using the Interface

The interface lets you filter your detections by their identifications, so you can check how accurate the automated detections were.

Most importantly, this interface lets you edit the tags on the dataset to:

  • Correct any mislabels
  • Note any errors (e.g., a raindrop misidentified as an insect)
  • Provide deeper labels

Editing Tags

When the interface first opens, you will probably see a view something like this:

It is already sorted by image size, with the smallest detections shown first, because most errors tend to happen on the really small insects.
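Sorting smallest-first is easy to reproduce if you ever want to review detections outside the interface. A sketch assuming “size” means the thumbnail’s pixel area and that each detection records its width and height; the Detection class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    name: str
    width: int   # thumbnail width in pixels (hypothetical field)
    height: int  # thumbnail height in pixels (hypothetical field)

    @property
    def area(self) -> int:
        return self.width * self.height


def smallest_first(detections):
    """Return detections ordered by pixel area, smallest first."""
    return sorted(detections, key=lambda d: d.area)


dets = [Detection("moth", 120, 90), Detection("speck", 8, 9), Detection("beetle", 40, 30)]
print([d.name for d in smallest_first(dets)])  # ['speck', 'beetle', 'moth']
```

Putting the smallest items first front-loads the detections most likely to need correcting.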

On the left side of the interface, you can filter detections. Click on “Sample Tags.” You can see that on this night we detected about 5,000 arthropods!

You can type in the filter area to select particular taxa, for instance Lepidoptera. Note that for now this filter may be case-sensitive: “Lepi…” works, but “lepi…” does not.
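Since the built-in filter may be case-sensitive, it can help to see what a case-insensitive match looks like. A plain-Python sketch, assuming each sample carries tags such as “ORDER_Lepidoptera” (the tag style used later in this guide); this is not how FiftyOne implements its filter, and the sample data is made up.

```python
def filter_by_tag(samples, query):
    """Keep samples with any tag containing `query`, ignoring case."""
    q = query.casefold()
    return [s for s in samples if any(q in tag.casefold() for tag in s["tags"])]


samples = [
    {"id": 1, "tags": ["KINGDOM_Animalia", "ORDER_Lepidoptera"]},
    {"id": 2, "tags": ["KINGDOM_Animalia", "ORDER_Coleoptera"]},
]
print([s["id"] for s in filter_by_tag(samples, "lepi")])  # [1]
```

`casefold()` lowercases both sides before comparing, so “lepi” and “Lepi” match the same samples.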

Now the interface shows only the samples that have been categorized as Lepidoptera.

You can also click the checkbox to toggle showing all the ID tags on each sample.

Changing Tags

You can select a set of samples. For instance, these grasshoppers were categorized incorrectly.

You can click the checkbox on each sample, or hold SHIFT and click to select a range.

Now we need to change their tags, because these are not Lepidoptera. While they are selected, click the “Tag” button.

Now scroll through the tags and UNCHECK the erroneous ones (these samples are still KINGDOM_Animalia, but not ORDER_Lepidoptera).

Next, find the correct classification for these. I don’t know what family these crickets belong to, but I am pretty sure they are ORDER_Orthoptera, so I check that tag. Then hit “Apply.”

Now if we change our view to “Orthoptera,” we can see our reclassified crickets there!

Keep doing this for ALL incorrect labels!
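The retagging steps above come down to removing the wrong tags and adding the right ones on only the selected samples. A plain-Python sketch of that logic, assuming each sample is a dict with an “id” and a “tags” list; the retag function and sample data are hypothetical and stand in for what the Tag dialog does for you.

```python
def retag(samples, selected_ids, remove, add):
    """On selected samples only, drop tags in `remove` and append tags in `add`."""
    for sample in samples:
        if sample["id"] not in selected_ids:
            continue  # unselected samples keep their tags
        tags = [t for t in sample["tags"] if t not in remove]
        tags += [t for t in add if t not in tags]  # don't duplicate tags already present
        sample["tags"] = tags
    return samples


samples = [
    {"id": 1, "tags": ["KINGDOM_Animalia", "ORDER_Lepidoptera"]},
    {"id": 2, "tags": ["KINGDOM_Animalia", "ORDER_Lepidoptera"]},
]
retag(samples, {2}, remove={"ORDER_Lepidoptera"}, add=["ORDER_Orthoptera"])
print(samples[1]["tags"])  # ['KINGDOM_Animalia', 'ORDER_Orthoptera']
```

Higher-level tags such as KINGDOM_Animalia stay untouched, mirroring the advice to uncheck only the tags that are actually wrong.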

Save the Corrected Dataset

We want to make sure these edits don’t disappear, so there are a couple of things we need to do to save this work.

Correct the IDby Tag

By default, the creatures were all identified by BioCLIP (tagged “IDby_Bioclip”). But now that you have verified all the IDs, we need to change this tag. First, click on just this tag; it should select all the samples in your dataset.

Next, create a new tag showing who did the IDs. Make sure that no samples are selected (this makes any change apply to ALL the samples). Click the “tag” button at the top and type in your new IDby tag, using your own name. For instance, I write “IDby_Quitmeyer”. Hit “add…”, then click “apply.”

Now remove the “IDby_Bioclip” tag from all of them. Again, make sure that no samples are selected and click the “tag” button. Then uncheck the checkbox next to “IDby_Bioclip.” This removes the tag from all the samples.
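After the swap, every sample should carry your own IDby tag and none should still carry “IDby_Bioclip”. A small sketch of that sanity check, assuming samples are dicts with a “tags” list (for example, as read from an export); the function name and the default tag value are hypothetical, so substitute your own name.

```python
def idby_problems(samples, expected="IDby_Quitmeyer"):
    """Return ids of samples whose IDby_ tags are not exactly [expected]."""
    problems = []
    for sample in samples:
        idby_tags = [t for t in sample["tags"] if t.startswith("IDby_")]
        if idby_tags != [expected]:
            problems.append(sample["id"])
    return problems


samples = [
    {"id": 1, "tags": ["ORDER_Lepidoptera", "IDby_Quitmeyer"]},
    {"id": 2, "tags": ["ORDER_Orthoptera", "IDby_Bioclip", "IDby_Quitmeyer"]},
]
print(idby_problems(samples))  # [2]
```

An empty list means the swap reached every sample; any ids returned still have a leftover or missing IDby tag.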

Export the Dataset

Now we need to save this corrected dataset. Click the “Browse Operations” button.

Select “Export Samples.”

Now make sure to select:

  • Entire Dataset
  • Labels only
  • FiftyOne Dataset

Finally, enter a file path for a new folder to save the export in. On Mac or Linux this is easy: just paste a file path.

On Windows it’s a little trickier, because Windows writes file paths with backslashes (“\”) instead of forward slashes.

If you copy a path from a Windows window such as File Explorer, it will look something like this:

C:\Users\andre\Desktop\Mothbox data\PEA_PeaPorch_AdeptTurca_2024-09-01\2024-09-01\QuitmeyerID

However, if you paste that into FiftyOne, a bug means it still cannot handle the Windows-style file path.

So paste in the file path plus the name of a new folder to store your new dataset in (for instance, my new folder is “QuitmeyerID”), and then change every “\” to “/”:

C:/Users/andre/Desktop/Mothbox data/PEA_PeaPorch_AdeptTurca_2024-09-01/2024-09-01/QuitmeyerID
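If you would rather not swap the slashes by hand, Python’s standard pathlib module can do the conversion on any operating system. A minimal sketch using the example path from above; swap in your own path and folder name.

```python
from pathlib import PureWindowsPath

# A path copied from Windows, using backslashes
windows_path = r"C:\Users\andre\Desktop\Mothbox data\PEA_PeaPorch_AdeptTurca_2024-09-01\2024-09-01\QuitmeyerID"

# as_posix() rewrites the separators as forward slashes
print(PureWindowsPath(windows_path).as_posix())
# C:/Users/andre/Desktop/Mothbox data/PEA_PeaPorch_AdeptTurca_2024-09-01/2024-09-01/QuitmeyerID
```

The raw string (`r"..."`) keeps Python from treating the backslashes as escape characters.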

Now FiftyOne will let you click “execute.”

Now you have a new folder with your new data in it!

Export a CSV file of your new Dataset

Finally, if you want a new CSV file of this corrected data, there’s just one more script to run. Open Mothbot_ConvertDatasettoCSV.py and change the input path to your new dataset folder.

Hit the Run button.

Now that folder also contains a new CSV file of all your data!
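To give a rough idea of what such a conversion involves, here is a minimal sketch; it is not the code in Mothbot_ConvertDatasettoCSV.py, and its columns will differ. It assumes the exported folder contains FiftyOne’s samples.json with a top-level “samples” list whose entries have “filepath” and “tags” keys; the function name, output file name, and columns are hypothetical.

```python
import csv
import json
from pathlib import Path


def samples_json_to_csv(dataset_dir, out_name="samples_export.csv"):
    """Write one CSV row per sample: its filepath and its tags joined by ';'."""
    dataset_dir = Path(dataset_dir)
    with open(dataset_dir / "samples.json", encoding="utf-8") as f:
        samples = json.load(f)["samples"]  # assumed top-level key

    out_path = dataset_dir / out_name
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filepath", "tags"])
        for sample in samples:
            writer.writerow([sample.get("filepath", ""), ";".join(sample.get("tags", []))])
    return out_path
```

For example, passing the QuitmeyerID folder from the export step would write samples_export.csv next to samples.json, with your corrected tags (including your IDby tag) on each row.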

