Analyzing UFO Sightings Databases with Generated Code

Editor's Note: Thom Hastings is a UFO researcher and data enthusiast who has been collaborating behind the scenes at Enigmatic Ideas, and I'm excited to share his first contribution to the site.

What can you learn when you use Generative AI to analyze UFO sightings?

I have been hoarding UFO sightings databases for the last several years. Collecting all that data and converting it to CSV could be its own post. Last week, I got sick, and being homebound gave me the right conditions to sit down and analyze it all.

This story starts last year, when I read J. Allen Hynek's The UFO Experience (1972). In it, he laments not having sightings data in a "machine-readable" format. He presents several questions that could be answered with such a capability, one of which is "Do Daylight Discs and Nocturnal Lights have the same proportion of rapid takeoffs, hoverings, and sharp turns?" I had scrapes of MUFON and NUFORC, and used ChatGPT at the time to generate some Python code to answer that question by searching for synonyms of the three types of movement.

It looks like the answer to Hynek's question is "yes!" That gave me a sense of the kind of analysis now open to us with large machine-readable databases.
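The synonym search described above can be sketched roughly as follows. This is not the script ChatGPT generated; the synonym lists, the `movement_proportions` function, and the sample rows are all assumptions made up for illustration. It counts, per Hynek category, the share of reports whose text mentions each type of movement at least once.

```python
import re
from collections import defaultdict

# Hypothetical synonym lists; the ones actually used are not given in the post.
MOVEMENT_SYNONYMS = {
    "rapid_takeoff": ["took off", "shot away", "shot off", "sped away", "accelerated"],
    "hovering": ["hover", "hovered", "hovering", "stationary", "motionless"],
    "sharp_turn": ["sharp turn", "right angle", "abrupt turn", "zigzag"],
}

# One compiled, case-insensitive, whole-word pattern per movement type.
PATTERNS = {
    movement: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    for movement, terms in MOVEMENT_SYNONYMS.items()
}


def movement_proportions(reports):
    """reports: iterable of (category, description) pairs.

    Returns {category: {movement: share of that category's reports
    whose description mentions the movement at least once}}.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for category, text in reports:
        totals[category] += 1
        for movement, pattern in PATTERNS.items():
            if pattern.search(text):
                hits[category][movement] += 1
    return {
        category: {m: hits[category][m] / totals[category] for m in PATTERNS}
        for category in totals
    }


# Made-up rows, only to show the call shape (not real sighting data).
sample = [
    ("Daylight Disc", "The disc hovered over the field, then shot away."),
    ("Daylight Disc", "A silver object made a sharp turn to the north."),
    ("Nocturnal Light", "An orange light was hovering, motionless, for minutes."),
    ("Nocturnal Light", "The light drifted slowly east."),
]
proportions = movement_proportions(sample)
```

Comparing `proportions["Daylight Disc"]` with `proportions["Nocturnal Light"]` then gives a first look at Hynek's question. Keyword matching like this is crude: it misses paraphrases and cannot tell "hovered" from "did not hover".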

In 2023, there were 3 good meta-databases of UFO sightings:

UFOCAT from CUFOS (13 sources)
UFO-search.com (19 sources)
UPDB.app (12 sources)

Note: UPDB contains Capella!

UPDB includes most of the same sources as the AAWSAP BAASS "Capella" Data Warehouse compiled by Jacques Vallee. From Skinwalkers at the Pentagon, p. 167:

The eleven databases included in the AAWSAP BAASS Data Warehouse were: (1) NIDS Database, (2) Airline and Military Pilot Database, (3) Project Sign/Grudge/Blue Book Database, (4) UFOCAT Database, (5) MUFON Case Management System Database, (6) Project Colares Database, (7) Canadian Government UAP Release Database, (8) United Kingdom Government UAP Release Database, (9) AAWSAP BAASS Database of cases investigated internally 2008-2010, (10) Skinwalker Ranch Database, and (11) An “eyes only” database documenting physiological effects from individuals who had spent time on the Skinwalker Ranch.

These are the sources of UPDB, with the matching Capella database number in parentheses:

NUFORC, MUFON (5), UFODNA, BLUEBOOK (3), NICAP, UKTNA (8), CANADAGOV (7), NIDS (1), BRAZILGOV (6), SKINWALKER (10), PILOTS (2), BAASS (9)

As you can see, almost everything is accounted for. Of the eleven, only UFOCAT (4) and the "eyes only" database on physiological effects (11) are absent from UPDB, and you can get UFOCAT from CUFOS for $10.

In 2026, I decided to analyze those 3 meta-databases (UFOCAT, UFO-search, and UPDB) with Claude Code Sonnet 4.6.

Switching from ChatGPT to Claude Code was like night and day. The development loop with ChatGPT went back and forth with prompts in the browser, and every change resulted in a new Python script to download. With Claude Code, the whole loop happens in the CLI: it makes multiple edits, re-runs the analysis, and debugs errors automatically, all in a much more streamlined way.

Inspired by the work of a Reddit user by the name of VerbaGPT, I decided to examine the meta-databases for keywords that signify certain emotions.

This was my prompt to Claude:

Use Natural Language Processing and sentiment analysis to generate insights about what emotions people feel when they see a UFO, by analyzing the CSV datasets in this directory.

Despite the largely neutral, historical tone of UFO-search and UFOCAT, overall sentiment analysis using a method called VADER indicates more positive than negative language, most notably in UPDB.

Knowing that UPDB largely consists of MUFON and NUFORC, I decided to focus on those two databases in particular, with up-to-date data. I used my latest scrapes and the same prompt.

I later shared this with Reddit. I found that report language was more than twice as likely to be positive as negative, and that emotions like curiosity and calm outweighed fear.
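A positive-to-negative ratio like this one can be computed from per-report VADER compound scores. The sketch below assumes the cutoffs conventionally used with VADER (0.05 and -0.05); the post does not say which thresholds the generated code used, and the function names and scores here are made up for illustration.

```python
from collections import Counter

# Cutoffs conventionally used with VADER's compound score (an assumption here).
POSITIVE_CUTOFF = 0.05
NEGATIVE_CUTOFF = -0.05


def label(compound):
    """Turn a compound score in [-1, 1] into a sentiment label."""
    if compound >= POSITIVE_CUTOFF:
        return "positive"
    if compound <= NEGATIVE_CUTOFF:
        return "negative"
    return "neutral"


def sentiment_summary(compound_scores):
    """Return (share of each label, positive-to-negative ratio)."""
    counts = Counter(label(score) for score in compound_scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scores to summarize")
    shares = {name: counts[name] / total for name in ("positive", "neutral", "negative")}
    # The ratio is undefined when there are no negative reports.
    ratio = counts["positive"] / counts["negative"] if counts["negative"] else None
    return shares, ratio


# Made-up scores; in practice each would come from running VADER on the
# text of one report and reading its "compound" value.
scores = [0.62, 0.0, -0.34, 0.41, 0.02, 0.77, -0.05, 0.10]
shares, ratio = sentiment_summary(scores)
```

Keeping the neutral band separate matters for datasets like UFOCAT, where terse catalog entries would otherwise be forced into one camp or the other.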

I had a brief excursion working with a dataset previously posted to HuggingFace, but upon learning that it was based on the ~80k NUFORC records that have been floating around for years, I lost interest. I did, however, learn something valuable. This time, when I used a variation of my prompt, Claude Code decided to use something called the NRC Lexicon to do the emotion analysis. I was unfamiliar with this prior work on Natural Language Processing (NLP) from the National Research Council Canada (NRC). Later, I went back and redid the earlier 288k analysis with this more formally defined emotion lexicon.
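For readers who, like me, had not met lexicon-based emotion analysis before, here is a minimal sketch of the idea. It assumes the word-level NRC file layout of one `word<TAB>emotion<TAB>flag` row per line; the `load_lexicon` and `emotion_counts` helpers and the few sample rows are illustrative stand-ins, not the real lexicon and not the code Claude Code wrote.

```python
import io
import re
from collections import Counter, defaultdict


def load_lexicon(lines):
    """Parse 'word<TAB>emotion<TAB>flag' rows into {word: {emotions}}."""
    lexicon = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        word, emotion, flag = parts
        if flag == "1":  # keep only words associated with the emotion
            lexicon[word].add(emotion)
    return dict(lexicon)


def emotion_counts(text, lexicon):
    """Count how many emotion-bearing word hits a report contains."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        counts.update(lexicon.get(token, ()))
    return counts


# A few illustrative rows in the assumed format, NOT the real lexicon file.
SAMPLE_ROWS = io.StringIO(
    "afraid\tfear\t1\n"
    "afraid\tjoy\t0\n"
    "calm\ttrust\t1\n"
    "amazing\tsurprise\t1\n"
    "amazing\tjoy\t1\n"
)
lexicon = load_lexicon(SAMPLE_ROWS)
counts = emotion_counts("I was not afraid, just calm. It was amazing, amazing!", lexicon)
```

Note that the sample sentence says "not afraid" and still scores a fear hit. Plain word counting ignores negation, which is worth keeping in mind when reading emotion totals from this kind of analysis.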

I then turned my attention back to UPDB, this time drawing from the original PostgreSQL database rather than an exported CSV, and looking for broader insight than just sentiment and emotions.

By this time, my prompt had evolved to:

Analyze the compressed PostgreSQL database in this folder. Break down the sources of the data. Do some basic analysis of this meta-database for geography and time distribution. Then, use Natural Language Processing (NRC Lexicon) and sentiment analysis (VADER) to generate insights about what emotions people feel when they see a UFO. Do not sample down any datasets and be sure to analyze the entire databases. Generate some graphs and diagrams to illustrate the resulting data. Look at the subset of reports that concern entities and analyze those. Do any other analysis that seems interesting.

I posted the results of this analysis to my GitHub. Some of the trends I've seen in UFO data for years are again present. California has the most sightings. July has the most sightings. Most sightings occur in the evening (peaking around 9 pm) and on weekends (Saturday), when people are outside looking up.
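The month, hour, and weekday breakdowns are simple to reproduce once report timestamps are parsed. The sketch below uses only the standard library; the timestamp format, the `time_distribution` name, and the sample values are assumptions, since each source database stores dates differently.

```python
from collections import Counter
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def time_distribution(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count reports per month name, hour of day, and weekday name."""
    months, hours, weekdays = Counter(), Counter(), Counter()
    for raw in timestamps:
        try:
            moment = datetime.strptime(raw, fmt)
        except (ValueError, TypeError):
            continue  # skip rows with missing or malformed dates
        months[MONTHS[moment.month - 1]] += 1
        hours[moment.hour] += 1
        weekdays[WEEKDAYS[moment.weekday()]] += 1  # weekday(): Monday is 0
    return months, hours, weekdays


# Made-up timestamps, only to show the call shape.
sample = ["2015-07-04 21:15", "2015-07-11 21:40", "2016-01-05 06:30", "not a date"]
months, hours, weekdays = time_distribution(sample)
```

Calling `months.most_common(1)`, `hours.most_common(1)`, and `weekdays.most_common(1)` on a full dataset gives the peak month, hour, and day. Rows that fail to parse are silently skipped here, so it is worth counting them separately before trusting the totals.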

Despite the U.S. and English-speaking bias in report gathering (e.g., the National UFO Reporting Center), it's still apparent that this is a global phenomenon.

I had done some prior work with ChatGPT on UPDB and shapes, where it's interesting to see the most-reported shape change over time from disc, briefly to triangle, and then to sphere. I have to admit, I wish the chart didn't default to light mode.

Shapes as a proportion of total reports.
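A chart of shapes as a proportion of total reports boils down to a per-period share calculation. Here is a minimal sketch that groups by decade; the grouping interval, the function names, and the sample rows are assumptions for illustration, not the code behind the chart.

```python
from collections import Counter, defaultdict


def shape_shares_by_decade(rows):
    """rows: iterable of (year, shape). Returns {decade: {shape: share}}."""
    per_decade = defaultdict(Counter)
    for year, shape in rows:
        decade = (year // 10) * 10
        # Normalize case and whitespace so "Disc" and "disc " count together.
        per_decade[decade][shape.strip().lower()] += 1
    shares = {}
    for decade, counts in sorted(per_decade.items()):
        total = sum(counts.values())
        shares[decade] = {shape: n / total for shape, n in counts.items()}
    return shares


def top_shape_by_decade(shares):
    """Pick the shape with the largest share in each decade."""
    return {decade: max(s, key=s.get) for decade, s in shares.items()}


# Made-up rows that only mimic the disc, triangle, sphere pattern.
sample = [
    (1952, "Disc"), (1957, "disc"), (1954, "light"),
    (1991, "triangle"), (1994, "triangle"), (1998, "disc"),
    (2019, "sphere"), (2015, "sphere"), (2012, "triangle"),
]
shares = shape_shares_by_decade(sample)
top = top_shape_by_decade(shares)
```

Using shares rather than raw counts keeps the comparison fair across decades, since the number of reports per decade varies enormously between sources.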

In conversation, Brian Decker mentioned another approach to sentiment analysis based on BERT (Bidirectional Encoder Representations from Transformers). So, I asked Claude Code to compare this type of analysis with what I had been doing, VADER (Valence Aware Dictionary and sEntiment Reasoner). The BERT run took several hours of 100% GPU usage on my Apple Silicon machine and yielded stronger sentiment extremes, with a similar positive-to-negative balance on average.
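One way to put the two methods side by side is to map both onto the same -1 to 1 scale and compare simple summary statistics. The sketch below is an assumption about how such a comparison could be done, not a description of what Claude Code produced: it supposes the BERT-style classifier returns a label ("POSITIVE" or "NEGATIVE") with a confidence, and every number in it is made up.

```python
from statistics import mean


def signed_score(label, confidence):
    """Map a (label, confidence) classifier output onto [-1, 1]."""
    return confidence if label.upper() == "POSITIVE" else -confidence


def summarize(scores, extreme=0.8):
    """Mean score, share of positive scores, and share of extreme scores."""
    return {
        "mean": mean(scores),
        "positive_share": sum(s > 0 for s in scores) / len(scores),
        "extreme_share": sum(abs(s) >= extreme for s in scores) / len(scores),
    }


# Made-up scores for the same four hypothetical reports.
vader_scores = [0.40, 0.10, -0.30, 0.60]
classifier_outputs = [
    ("POSITIVE", 0.98), ("POSITIVE", 0.91),
    ("NEGATIVE", 0.95), ("POSITIVE", 0.99),
]
bert_scores = [signed_score(lab, conf) for lab, conf in classifier_outputs]

vader_summary = summarize(vader_scores)
bert_summary = summarize(bert_scores)
```

In this toy example both methods agree on the share of positive reports, while the classifier's scores sit much closer to the ends of the scale. That is the same qualitative pattern described above, though the extreme cutoff of 0.8 is an arbitrary choice.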

I've found other models to try for this type of analysis in the future, as well as additional emotion lexicons that list up to 28 emotions.

In the meantime, I've begun working with a friend on merging and deduplicating all the databases and meta-databases that I have obtained over the years... stay tuned!
