Finding Patterns in 152,000 UFO/UAP Sightings

I analyzed 152,000 UFO sighting reports. Some patterns: triangles are rarely reported during the day, disks are about 5x more likely than spheres to involve entity encounters, and the majority of sightings are silent.

Distribution of UFO/UAP sightings by shape and observed characteristics (sound, time of day, movement pattern). Larger boxes indicate more reports.

First-hand UFO (or UAP, if you prefer) sighting reports are great to read, but they're impossible to analyze at scale. Witness narratives are inherently unstructured and messy. People routinely describe the same experience in completely different ways, using different wording and focusing on different details.

So when I got access to the National UFO Reporting Center's database of 152,000+ sighting reports, I wanted to find a systematic way to extract what witnesses actually described. I developed a system using AI to analyze each narrative and pull out specific features. This let me ask questions across the entire database: What shapes appear most often? Which sightings are silent? How often do witnesses report physical effects or see occupants?

I analyzed every report across 35 distinct dimensions grouped into 8 categories. Some of the patterns that emerged were genuinely surprising.

First off: Thank You NUFORC

Before I dive into the analysis, I want to thank the great folks at the National UFO Reporting Center (NUFORC) for graciously allowing me to access their data. For decades they've been maintaining one of the most comprehensive databases of UFO sighting reports anywhere. Without their meticulous work documenting these reports, this kind of analysis simply could not have been done. If you've never explored their database, I highly recommend checking out NUFORC's searchable report database.


Let's start with why I wanted to find patterns and cluster the UFO sightings. As a researcher, I have to accept that 152,000+ reports are far too time-consuming to analyze by hand. There are going to be patterns across these sighting reports, but grouping them is difficult because the text is unstructured: each one contains a first-person narrative describing something unusual in the sky, but there is zero consistency in formatting and reporting. Some are just a sentence or two; others are full-page accounts. Reading them individually might give you some one-off insight, but it's not going to reveal the big picture.

I tried a few approaches. Traditional text clustering using ML didn't work well, so I pivoted to using an AI model to reason through and categorize every single report. The result is a dataset where each sighting has a consistent profile that can be queried and compared. To my knowledge, this kind of systematic feature extraction hasn't been done on the NUFORC database at this scale before.

More details on methodology are below for those interested. For now, let's look at what the data reveals.


Here are the findings I found most interesting.

The Sound of Silence

Most sightings are silent. Among the five most common shapes, the silent share ranges from about 57% to 77%:

| Shape | Total Sightings | Silent | Percentage |
|---|---|---|---|
| Light only | 22,967 | 17,719 | 77.1% |
| Cylinder/cigar | 7,561 | 5,135 | 67.9% |
| Triangle | 12,816 | 8,638 | 67.4% |
| Sphere/orb | 32,786 | 20,694 | 63.1% |
| Disk/saucer | 8,846 | 5,018 | 56.7% |

Disk/saucer is the "noisiest" shape, but even disks are silent more than half the time. That's not what you'd expect from conventional aircraft.
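Each percentage in the table is simply the silent count divided by the total for that shape. Here is a minimal Python sketch that recomputes them from the counts above. The variable and function names are illustrative only and are not taken from the original analysis code.

```python
# Counts copied from the table above: shape -> (total sightings, silent sightings).
counts = {
    "Light only": (22_967, 17_719),
    "Cylinder/cigar": (7_561, 5_135),
    "Triangle": (12_816, 8_638),
    "Sphere/orb": (32_786, 20_694),
    "Disk/saucer": (8_846, 5_018),
}


def silent_rate(total: int, silent: int) -> float:
    """Return the silent share as a percentage, rounded to one decimal place."""
    return round(100 * silent / total, 1)


rates = {shape: silent_rate(total, silent) for shape, (total, silent) in counts.items()}

# Print the shapes from most silent to least silent.
for shape, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{shape:<15} {rate:.1f}%")
```

Recomputing the shares this way is a quick sanity check that the table's counts and percentages agree.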

Disks are Different

More on classic disks/saucers - they stand out in several ways. They're more likely to be seen during the day than other shapes. They have the highest rate of entity encounters. And they're most associated with abduction reports.

Here are the top entity encounter rates by shape:

| Shape | Total Sightings | Entity Encounters | Rate |
|---|---|---|---|
| Disk/saucer | 8,846 | 300 | 3.4% |
| Sphere/orb | 32,786 | 211 | 0.6% |
| Triangle | 12,816 | 125 | 1.0% |

Disks are five to six times more likely to involve entity sightings than spheres. The same pattern holds for abductions: disks have a 1.3% abduction rate compared to 0.23% for spheres.
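The "five to six times" figure is a ratio of two rates. Below is a small sketch of that arithmetic, using the entity counts from the table and the abduction percentages quoted in the text. The article does not give raw abduction counts, so those two rates are used as quoted; the variable names are illustrative.

```python
# Entity-encounter rates, computed from the counts in the table above.
disk_entity_rate = 300 / 8_846
sphere_entity_rate = 211 / 32_786
entity_ratio = disk_entity_rate / sphere_entity_rate

# Abduction rates in percent, as quoted in the text (raw counts are not given).
disk_abduction_pct = 1.3
sphere_abduction_pct = 0.23
abduction_ratio = disk_abduction_pct / sphere_abduction_pct

print(f"Entity encounters: disks vs spheres = {entity_ratio:.1f}x")
print(f"Abductions: disks vs spheres = {abduction_ratio:.1f}x")
```

Both ratios land between five and six, which is where the range in the text comes from.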

Why would this be? One possibility is that disks tend to be observed at closer range than other shapes. Another is that the old-school saucer shape has been historically or culturally associated with occupants, which might influence how witnesses interpret and report their experiences. I'll leave it to readers to draw their own conclusions from these reports.

Triangles, Like Raccoons, are Nocturnal

Have you ever seen a triangle UFO during the day? Probably not. Triangle sightings are heavily skewed toward nighttime. The ratio is 8,849 night sightings to just 1,196 during the day. This one I found really interesting - that's a 7.4x ratio.

Compare this to spheres/orbs, which show a 3.9x night-to-day ratio, or disks at only 1.5x. Triangles are disproportionately a nighttime phenomenon.
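To put the triangle numbers in context, here is a short sketch of the ratio and the daytime share. It assumes the 12,816 triangle total from the silence table covers the same set of reports as the night and day counts. The article does not say how the remaining reports are classified, so the remainder is labeled only as "other or unknown" here.

```python
triangle_total = 12_816  # total triangle sightings, from the silence table
triangle_night = 8_849   # triangle sightings at night
triangle_day = 1_196     # triangle sightings during the day

night_to_day = triangle_night / triangle_day
day_share_pct = 100 * triangle_day / triangle_total

# Assumption: whatever is not coded as night or day falls in some other
# time-of-day bucket (the article does not name it).
other_or_unknown = triangle_total - triangle_night - triangle_day

print(f"Night-to-day ratio: {night_to_day:.1f}x")
print(f"Daytime share of triangle reports: {day_share_pct:.1f}%")
print(f"Other or unknown time of day: {other_or_unknown:,}")
```

Under that assumption, roughly one triangle report in eleven is a daytime sighting.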

Perhaps triangular shapes are just harder to detect in daylight (because you need contrast against a dark sky to make out the shape). Or perhaps this suggests there's something more to it.

Physical Effects from Sightings Are Rare - But They Happen

Physical effects on witnesses are uncommon, but they do appear in the data:

| Effect | Count | Percentage of All Sightings |
|---|---|---|
| Electrical interference | 3,645 | 2.4% |
| Paralysis | 1,497 | 1.0% |
| Heat | 368 | 0.2% |
| Nausea | 308 | 0.2% |

Electrical interference is the most commonly reported effect. This makes sense because it's also the easiest to notice. Car radios shutting off, phones going dead, or lights flickering are hard to miss. More subtle physical sensations might just go unreported.


Entity Encounters = "Close" Encounters

When entities are reported, they tend to be up close and personal. Of the roughly 2,500 sightings that mention entities:

| Interaction Level | Count |
|---|---|
| Direct contact | 781 |
| Abduction | 609 |
| Passive observation | 586 |
| Physical effects | 359 |
| Object reactive | 136 |

More than half of entity sightings involve direct contact or abduction. Passive observation from a distance is actually the minority. When people see beings, it's usually not from far away.


High Level Archetypes

When I looked for the most common combinations of features, distinct patterns emerged. I'm calling these "archetypes" because they represent recurring themes that appear again and again in the data.

The top archetypes, ranked by frequency of occurrence:

| Archetype | Description | Count |
|---|---|---|
| The Silent Night Cruiser | Light or sphere, silent, nighttime, straight-line movement | ~12,800 |
| The Silent Night Hoverer | Light or sphere, silent, nighttime, hovering | ~8,600 |
| The Black Triangle | Triangle, silent, nighttime, straight-line or hovering | ~5,600 |
| The Daylight Orb | Sphere, silent, daytime, straight-line or hovering | ~2,600 |
| The Night Disk | Disk/saucer, silent, nighttime, hovering | ~1,300 |
| The Erratic Light | Light only, silent, nighttime, erratic movement | ~1,150 |
| The Classic Saucer | Disk/saucer, silent, daytime, hovering | ~720 |

These seven archetypes alone account for roughly 33,000 sightings, about 22% of the entire dataset. The remaining 78% are more varied combinations, but these are the core patterns that repeat most often.
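One simple way to surface recurring feature combinations like these is to count how often each combination occurs. The sketch below does that with Python's `collections.Counter` on a tiny made-up sample. The field names, values, and sample records are hypothetical, and this is not necessarily how the archetypes in the table were derived.

```python
from collections import Counter

# Hypothetical extracted profiles; field names and values are illustrative only.
sightings = (
    [{"shape": "light", "sound": "silent", "time": "night", "movement": "straight"}] * 3
    + [{"shape": "triangle", "sound": "silent", "time": "night", "movement": "hovering"}] * 2
    + [{"shape": "disk", "sound": "humming", "time": "day", "movement": "hovering"}]
)

FEATURES = ("shape", "sound", "time", "movement")


def combination(profile: dict) -> tuple:
    """Reduce a profile to the tuple of features used to define a combination."""
    return tuple(profile.get(name, "unknown") for name in FEATURES)


combo_counts = Counter(combination(s) for s in sightings)

# The most frequent combinations are the archetype candidates.
top = combo_counts.most_common(2)
for combo, count in top:
    print(count, combo)
```

With real data, related combinations (for example "light" and "sphere") would still need to be merged by hand into a named archetype.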

Now for the technical part. Read on to learn how this analysis was done.


Analyzing the Data

For those curious about methodology: traditional machine learning approaches didn't work well here. I initially tried using semantic embeddings to cluster similar report narratives together, but the clustering tended to group sightings by witness writing style rather than content. Reports ended up grouped because they used similar sentence structures, not because witnesses observed similar phenomena.

So I took a different approach. I defined 35 specific dimensions across eight categories (object characteristics, behavior, lights, sensory effects, physical effects, encounter details, witness information, and subsequent activity) and used Claude (Anthropic's AI) to read each of the 152,000 narratives to extract and populate these features systematically.
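To give a feel for what a "consistent profile" means in practice, here is a minimal sketch of the post-processing step: taking a JSON reply from a model and coercing it into a fixed taxonomy. The dimension names, the allowed values, and the example reply are all hypothetical. The model call itself is omitted, and this is not the actual taxonomy or code used for the analysis.

```python
import json

# Hypothetical slice of a taxonomy: dimension name -> allowed values.
ALLOWED = {
    "shape": {"light", "sphere", "triangle", "disk", "cylinder", "other", "unknown"},
    "sound": {"silent", "humming", "roaring", "other", "unknown"},
    "time_of_day": {"day", "night", "dusk_dawn", "unknown"},
    "movement": {"hovering", "straight", "erratic", "other", "unknown"},
}


def parse_profile(raw_reply: str) -> dict:
    """Parse a JSON reply and coerce it to the taxonomy.

    Missing dimensions and values outside the allowed set become "unknown",
    so every record ends up with exactly the same keys.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    profile = {}
    for dimension, allowed in ALLOWED.items():
        value = str(data.get(dimension, "unknown")).strip().lower()
        profile[dimension] = value if value in allowed else "unknown"
    return profile


# Hypothetical model reply with one out-of-taxonomy value ("zigzag").
example_reply = '{"shape": "Triangle", "sound": "silent", "time_of_day": "night", "movement": "zigzag"}'
profile = parse_profile(example_reply)
print(profile)
```

Forcing every reply through a fixed set of keys and values is what makes the records comparable afterward, whatever the narratives looked like.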

Running an LLM across 152,000 records did mean some compute time and cost. But the result is a more structured dataset where every sighting has a consistent profile that can be queried, compared, and analyzed using standard programming tools.

Ultimately, the extraction process was not perfect. Spot-checking showed a high level of accuracy (the validated samples matched the narratives closely), but there was a small percentage of records that contained errors. I'm confident that the aggregate patterns should be correct, but any individual record should be verified against the original narrative before drawing any conclusions.
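Spot-checking of this kind can be sketched as two steps: draw a random sample of record IDs, label them by hand, then measure how often the extracted value agrees. The code below is an illustrative sketch with made-up IDs and labels, not the validation procedure actually used.

```python
import random


def draw_sample(record_ids, k: int, seed: int = 0) -> list:
    """Pick k record IDs at random; a fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(sorted(record_ids), k)


def spot_check(extracted: dict, hand_labels: dict, field: str) -> float:
    """Return the share of hand-labelled records where the extracted value matches."""
    ids = [record_id for record_id in hand_labels if record_id in extracted]
    if not ids:
        raise ValueError("no overlapping records to compare")
    matches = sum(extracted[i][field] == hand_labels[i][field] for i in ids)
    return matches / len(ids)


# Hypothetical extracted values and hand labels, keyed by record ID.
extracted = {
    101: {"shape": "triangle"},
    102: {"shape": "sphere"},
    103: {"shape": "disk"},
    104: {"shape": "light"},
    105: {"shape": "sphere"},
}
hand_labels = {
    101: {"shape": "triangle"},
    102: {"shape": "light"},  # the reviewer disagrees with the extraction here
    103: {"shape": "disk"},
    104: {"shape": "light"},
}

sample = draw_sample(extracted.keys(), 3)
accuracy = spot_check(extracted, hand_labels, "shape")
print(f"Sampled IDs: {sample}")
print(f"Agreement on shape: {accuracy:.0%}")
```

Running the check per field, rather than per record, shows which dimensions are extracted reliably and which need a closer look.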

Not surprisingly, the dominance of silent, nighttime sightings is the consistent theme. If someone were asked to describe the "typical" UFO sighting based on this data, the response would be: a silent light or sphere moving across the night sky in a straight line.

Having 152,000 sightings coded across 35 dimensions opens up research possibilities that weren't practical before:

Pattern queries. Want to find all triangle sightings that were silent, at low altitude, and involved hovering? That's now a simple database query.

Temporal analysis. Do certain shapes appear more in certain decades? Has the distribution of colors changed over time?

Geographic patterns. Are triangles concentrated in certain regions? Do coastal areas see more sphere sightings?

Correlation studies. Which shapes correlate with physical effects? Which are most likely to involve multiple witnesses?

Anomaly detection. Sightings that don't fit into any common archetype might be the most interesting ones to investigate.
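As a rough illustration of the first and last ideas above, here is a sketch using Python's built-in `sqlite3` module. It runs a pattern query for silent, low-altitude, hovering triangles, then lists records whose feature combination appears only once as candidates for a closer look. The table layout, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sightings (
           id INTEGER PRIMARY KEY,
           shape TEXT, sound TEXT, altitude TEXT, movement TEXT, time_of_day TEXT
       )"""
)

# Hypothetical rows standing in for extracted sighting profiles.
rows = [
    (1, "triangle", "silent", "low", "hovering", "night"),
    (2, "triangle", "silent", "high", "straight", "night"),
    (3, "sphere", "silent", "low", "hovering", "night"),
    (4, "triangle", "silent", "low", "hovering", "night"),
    (5, "disk", "humming", "low", "erratic", "day"),
]
conn.executemany("INSERT INTO sightings VALUES (?, ?, ?, ?, ?, ?)", rows)

# Pattern query: silent, low-altitude, hovering triangles.
pattern_ids = [
    row[0]
    for row in conn.execute(
        """SELECT id FROM sightings
           WHERE shape = ? AND sound = ? AND altitude = ? AND movement = ?
           ORDER BY id""",
        ("triangle", "silent", "low", "hovering"),
    )
]

# Anomaly candidates: feature combinations that occur exactly once.
rare_ids = [
    row[0]
    for row in conn.execute(
        """SELECT MIN(id) FROM sightings
           GROUP BY shape, sound, altitude, movement, time_of_day
           HAVING COUNT(*) = 1
           ORDER BY MIN(id)"""
    )
]

print("Pattern matches:", pattern_ids)
print("Rare combinations:", rare_ids)
conn.close()
```

On a full dataset the "exactly once" threshold would be far too strict; a cutoff relative to the dataset size would be a more sensible starting point.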

A Note about Interpretation

I should be clear about what this analysis can and cannot tell us.

It tells us what witnesses reported. It does not tell us what actually happened. There's a difference.

A witness might describe something as silent because they didn't notice a sound, not because there wasn't one. A shape might be coded as "unknown" because the witness focused on other details, not because the object had no discernible shape.

What was unique about this analysis is that I also used the AI to infer meaning where possible. For example, if someone said the object "was as large as a bus", I had the LLM categorize it as "medium" in size. Having a structured taxonomy of dimensions makes future analysis easier at scale. To keep the reporting accurate, I tagged the fields that were populated based on inference.
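One way to represent that tagging is to store each field as a value plus an "inferred" flag, so that inferred values can be excluded when a stricter analysis is wanted. The sketch below assumes a simple structure of this kind. The class, field names, record IDs, and evidence strings (other than the bus example) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldValue:
    value: str
    inferred: bool      # True when the value was inferred rather than stated outright
    evidence: str = ""  # the phrase from the narrative that supports the value


# Hypothetical profiles keyed by a made-up record ID.
profiles = {
    "A": {
        "size": FieldValue("medium", True, "as large as a bus"),
        "shape": FieldValue("disk", False, "saucer-shaped"),
    },
    "B": {
        "size": FieldValue("small", False, "small"),
    },
}


def stated_only(profile: dict) -> dict:
    """Keep only the values the witness stated directly."""
    return {name: field.value for name, field in profile.items() if not field.inferred}


for record_id, profile in profiles.items():
    print(record_id, stated_only(profile))
```

Keeping the supporting phrase alongside the flag also makes it easier to verify an individual record against its original narrative.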

Also - this is descriptive analysis, not causal analysis. Finding that "disks are more associated with entity encounters" tells us something about the data, but it doesn't explain why. That interpretive work is left to the reader.


Up Next...

In my next article (available to free members), I'll share the mapping of sighting IDs to a high level set of extracted features. Think of it as an index — you can find sightings that match specific patterns, then look up the full narratives in the official NUFORC database.

For Insiders, I'll share the complete methodology: the feature taxonomy specification, the extraction prompts, the code used to process 152,000 records, and the lessons learned along the way. If you want to reproduce this analysis or adapt it for your own research, that article will give you everything you need.


