My first approach to analyzing 152,000 UFO sightings had a 34% accuracy rate. I almost published it anyway. Here's what went wrong with semantic embeddings, why I pivoted to LLM extraction, and the methodology that actually worked.
I claimed that triangles are nocturnal. That disks are 5x more likely to involve entities. That shadow beings terrify while light beings inspire awe. Here's the sighting ID index so you can verify those claims yourself.
I analyzed entity encounters in the NUFORC sightings database. Shadowy beings: 68% of witnesses report fear. Greys: 38% involve abduction. Light beings: more awe than fear.