Lessons Learned Building a UFO Classifier

My first approach to analyzing 152,000 UFO sightings had a 34% accuracy rate. I almost published it anyway. Here's what went wrong with semantic embeddings, why I pivoted to LLM extraction, and the methodology that actually worked.

First, thank you, Insiders, for supporting my work and making this type of deep-dive technical content possible.

In my first article, I walked through patterns I found across 152,000 UFO sightings: silent triangles, disk-entity correlations, the emotional aspect of encounters. In my second article, I did a deep dive on entity encounters. In the third, I provided the sighting ID index so you could verify my claims against the original NUFORC data.

This article is for those who want the technical nitty-gritty: the taxonomy, the prompts, the code, and what worked and what failed along the way.

PSA: this one is long AND technical. If that's not your idea of a good time, believe me, I get it. But if you want to reproduce this type of analysis, adapt it for your own research, or just understand the methodology behind the findings, read on.

Here's what you'll find:

The approach that didn't work (and why it failed)
The pivot that saved the project
The full taxonomy specification (35 dimensions for classifying sightings)
The extraction prompts I used with AI
Cost breakdowns and processing details
Code you can adapt for your own projects

What I'm NOT providing: Access to the source data. The NUFORC database is proprietary. If you want the raw sightings, I'd suggest reaching out to the great folks at NUFORC directly.

The full methodology, code, and lessons learned are only available to Insiders. If you're interested in reproducing this analysis or applying these techniques to your own research, consider joining to get access to this article and everything else behind the curtain.

