Revisiting the Tinley Park Lights
175 strangers filed independent reports about the same event. 96% agreed on the color. 100% agreed on the sound. I ran the numbers on the Tinley Park Lights.
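For context, agreement figures like these come from comparing a structured field (color, sound) across independent reports and asking what fraction match the most common value. A minimal sketch of that computation; the field names and report values here are hypothetical, not the actual NUFORC data:

```python
from collections import Counter

def agreement_rate(values):
    """Fraction of non-missing reports matching the most common value."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return 0.0
    top_count = counts.most_common(1)[0][1]
    return top_count / sum(counts.values())

# Hypothetical extracted fields from four independent reports
reports = [
    {"color": "red", "sound": "silent"},
    {"color": "red", "sound": "silent"},
    {"color": "orange", "sound": "silent"},
    {"color": "red", "sound": "silent"},
]
print(agreement_rate([r["color"] for r in reports]))  # 0.75
print(agreement_rate([r["sound"] for r in reports]))  # 1.0
```

Run over 175 reports, this yields the kind of consensus percentages quoted above.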
My first approach to analyzing 152,000 UFO sightings had a 34% accuracy rate. I almost published it anyway. Here's what went wrong with semantic embeddings, why I pivoted to LLM extraction, and the methodology that actually worked.
First, thank you Insiders for supporting my work and making this type of deep-dive technical content possible.
In my first article, I walked through patterns I found across 152,000 UFO sightings: silent triangles, disk-entity correlations, the emotional aspect of encounters. In my second article, I did a deep dive on entity encounters. In the third, I provided the sighting ID index so you could verify my claims against the original NUFORC data.
This article is for those who want the technical nitty-gritty: the taxonomy, the prompts, the code, and what worked and what failed along the way.
PSA: this is a long one AND technical. If that's not your idea of a good time, believe me, I get it. But if you want to reproduce this type of analysis, adapt it for your own research, or just understand the methodology behind the findings, read on.
Here's what you'll find:
What I'm NOT providing: Access to the source data. The NUFORC database is proprietary. If you want the raw sightings, I'd suggest reaching out to the great folks at NUFORC directly.
The full methodology, code, and lessons learned are only available to Insiders. If you're interested in reproducing this analysis or applying these techniques to your own research, consider joining to get access to this article and everything else behind the curtain.