Machine learning challenge

The first step for an automatic system watching the sky for meteors is to look for moving objects. But meteors are not the only objects that move across the sky. Quite often aircrafts, birds, clouds, snow flurries, insects and so on trigger a detection. Some of these detections are immediately discarded if the speed doesn’t match that typical of a meteor, of if it doesn’t travel in a straight path. Still, many false detections remain, in particular from aircrafts flying at low altitude and therefore at an apparent high speed like meteors. It’s often easy to discard such detections by visual inspection, but this is tedious work. It is now often said, however, that state of the art machine learning is capable to match or even surpass the skill of a human at image classification, and meteor detection is a quite simple binary classification problem: Either it’s a meteor, or it’s not.

The key to machine learning is training data. So in order to improve the detection process I’ve collected a dataset of 2726 meteors and 1131 false detections, hopefully sufficient for a useful classifier. My experience with the machine learning modules for Python is limited. My first approach was to use an example found on the Internet. That gave moderately promising results. But I’ve been unable to improve the initial results in any significant way. In particular, there are a bit too many false negatives, i.e. actual meteors classified as non-meteors. This is more serious than false positives. Ideally we want few false positives and few false negatives, but it’s much better to trade fewer false negatives with more false positives.

Since my experience with machine learning is limited, like the amount of time I can devote to learning it, my second approach is now to ask for assistance. I offer the training data to anyone who would like to have a go at making a classifier. It may an easy task for anyone with experience to improve what I have so far. And for those with less experience, the dataset might work as an exercise and good solutions will be very helpful!

The dataset is available for download here (146 MB).

It contains a Python program learn.py which is what I have so far and hope to improve. Then there are the directories training_set and test_set which both include the subdirectories meteors and wrongs containing pictures of meteors and non-meteors respectively. Roughly 80% of the images are in the training set, but all files in training_set and test_set have distinct names so the balance can be easily changed. Then there are another dataset training_set_raw and test_set_raw which contains (nearly) the same images, except that they lack some processing. I don’t know which dataset is the better for machine learning.

The images only contain the meteor, or what may be a meteor, rotated so that the meteor starts at the left edge and end at the right edge. This crop and rotate operation isn’t always successful, though, but should remove most of the irrelevant features of the original images.

Here are a few examples of actual meteors:

And here are some examples of false detections:

A human can easily classify these correctly (but there also exist images which are not so obvious, and again we can tolerate false positives, but less so false negatives).

If you think you’ve improved the classification, or need help with the data in any way, do not hesitate to e-mail me at steinar@norskmeteornettverk.no. I can only offer many thanks and an improved service at norskmeteornettverk.no in return.

3 kommentarer til «Machine learning challenge»

  1. Morsomt. Hvorfor er dette innlegget på engelsk? Så vidt jeg kunne se, er alle de andre på norsk.
    Aner dere muligens at dere vil trenge utenlandsk kompetanse, akkurat som med ALT som er anstrengende og litt krevende? Det å ikke få noe igjen for innsatsen er normalt for utlendinger i Norge, så dette går greit.

    1. Denne kommentaren fortjener egentlig ikke noe svar. Alt arbeid på norskmeteornettverk.no er gjort på frivillig, ubetalt basis og er en gratis tjeneste. Jeg skal gjerne utvikle alt som skal til for å gjøre denne tjenesten bedre, men da jeg har jobb, familie og andre interesser, vil det ta tid. I dette tilfellet har jeg valgt å gå ut med et spørsmål om noen ønsker å hjelpe og gjøre tjenesten bedre tidligere enn det jeg kan gjøre. Det har aldri vært et mål at utviklinga skal være et enmannsprosjekt, og dette nettstedet hadde vært langt mer primitivt uten hjelp fra både norske entusiaster og likesinna især fra våre naboland (Sverige, Danmark, Finland).

      Akkurat denne artikkelen er skrevet på engelsk fordi den er delt også i utlandet og det publiserte datamaterialet kan ha interesse ikke bare i Norge. Unntaksvis har artikler blitt publisert på engelsk når de kan ha internasjonal interesse.

Det er stengt for kommentarer.