Machine learning challenge

The first step for an automatic system watching the sky for meteors is to look for moving objects. But meteors are not the only objects that move across the sky. Quite often aircrafts, birds, clouds, snow flurries, insects and so on trigger a detection. Some of these detections are immediately discarded if the speed doesn’t match that typical of a meteor, of if it doesn’t travel in a straight path. Still, many false detections remain, in particular from aircrafts flying at low altitude and therefore at an apparent high speed like meteors. It’s often easy to discard such detections by visual inspection, but this is tedious work. It is now often said, however, that state of the art machine learning is capable to match or even surpass the skill of a human at image classification, and meteor detection is a quite simple binary classification problem: Either it’s a meteor, or it’s not.

The key to machine learning is training data. So in order to improve the detection process I’ve collected a dataset of 2726 meteors and 1131 false detections, hopefully sufficient for a useful classifier. My experience with the machine learning modules for Python is limited. My first approach was to use an example found on the Internet. That gave moderately promising results. But I’ve been unable to improve the initial results in any significant way. In particular, there are a bit too many false negatives, i.e. actual meteors classified as non-meteors. This is more serious than false positives. Ideally we want few false positives and few false negatives, but it’s much better to trade fewer false negatives with more false positives.

Since my experience with machine learning is limited, like the amount of time I can devote to learning it, my second approach is now to ask for assistance. I offer the training data to anyone who would like to have a go at making a classifier. It may an easy task for anyone with experience to improve what I have so far. And for those with less experience, the dataset might work as an exercise and good solutions will be very helpful!

The dataset is available for download here (146 MB).

It contains a Python program which is what I have so far and hope to improve. Then there are the directories training_set and test_set which both include the subdirectories meteors and wrongs containing pictures of meteors and non-meteors respectively. Roughly 80% of the images are in the training set, but all files in training_set and test_set have distinct names so the balance can be easily changed. Then there are another dataset training_set_raw and test_set_raw which contains (nearly) the same images, except that they lack some processing. I don’t know which dataset is the better for machine learning.

The images only contain the meteor, or what may be a meteor, rotated so that the meteor starts at the left edge and end at the right edge. This crop and rotate operation isn’t always successful, though, but should remove most of the irrelevant features of the original images.

Here are a few examples of actual meteors:

And here are some examples of false detections:

A human can easily classify these correctly (but there also exist images which are not so obvious, and again we can tolerate false positives, but less so false negatives).

If you think you’ve improved the classification, or need help with the data in any way, do not hesitate to e-mail me at I can only offer many thanks and an improved service at in return.