
Self-Custody Genealogy: How AI Unlocked 20 Years of Family History

My grandfather, a Presbyterian pastor from Independence, Missouri, incited an international diplomatic incident.

As part of the Sister City Committee of Independence, then sister city to Blantyre, Malawi, he'd invited the President of Malawi to visit Independence. Without notifying the White House. Or anybody, really.

President Johnson learned a foreign head of state was in America without his knowledge. Air Force One was dispatched to KCMO to collect the Malawian president AND the pastor who'd caused the mess.

LBJ and President Banda, the latter holding a ceremonial staff (a lion's tail) that he used to bless a group of onlookers at Stephenson's Apple Orchard in Independence, MO.

On June 8, 1967, my grandpa and grandma ate steak for breakfast on the presidential aircraft. My grandmother, who had worked as a telephone operator, remembers hearing a phone ringing and thinking "things have changed since I worked the switchboard!" After landing at Andrews AFB, they boarded Marine One and flew over hundreds of gathered protestors of the Six-Day War (on both sides) to the White House lawn. At the White House, they were treated to a luncheon. People asked my grandfather for autographs because they thought he was a Malawian official. He recalled being "the only person who wasn't black" in attendance. Thurgood Marshall was there, days before his nomination to the Supreme Court.

A letter from President Johnson to my grandfather

I know this full story, today, because I spent the past two weeks digitizing 20 years of my mom's genealogy research.

Here's what I built, what I found, and why AI made it possible.

Where I Started

Over the past two decades, my mom put in work on our family's genealogy. Newspaper clippings, census records, family photos, oral histories recorded on cassette tapes.

Herman and Edith Schneider

I think we all know the experience of wanting to organize our files and failing. My mother did her best, but her files were everywhere. To start this project off, I installed Claude Code on her computer and, from a thousand miles away using remote-access tooling, trawled her local and external drives for potentially related material. I'd have my mother plug in a drive, set Claude loose on it, and come back to results.

Americans take 230 billion photos per year. Only a third back them up. Over half have deleted images due to storage limits. If there's a 10% annual chance of something going wrong, your photos are more likely to be lost than preserved after seven years.
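The seven-year figure comes from compounding the yearly risk. Here is a quick sketch of that arithmetic, assuming each year's 10% chance of loss is independent of the others (a simplification, and the 10% itself is the hypothetical rate from the paragraph above):

```python
# Probability that a photo collection is still intact after n years,
# assuming an independent 10% chance of loss in each year.
ANNUAL_LOSS = 0.10

def survival_probability(years: int, annual_loss: float = ANNUAL_LOSS) -> float:
    """Chance that nothing has gone wrong after `years` years."""
    return (1 - annual_loss) ** years

for years in (1, 5, 6, 7, 10):
    print(f"{years:>2} years: {survival_probability(years):.1%} chance of survival")
```

Under that assumption, year seven is the first year in which survival drops below one half (0.9^7 is about 0.478).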

Preservation calls for considered custodianship, not casual collection.

And even when my mom was particularly organized, the tools she was forced into for this work were quietly hostile, particularly to aging boomers who find the gap between their mental model of computers and the current computing experience widening as the years go on. Ancestry.com stinks.

  • No Public API: They promised to release one years ago. They never did. Everything goes through their UI. Want to download your photos? That's 2-3 clicks per file. For thousands of photos, you're looking at weeks of manual labor.
  • Your Data is Theirs: When you download your tree as a GEDCOM file, you get the text. Names. Dates. Places. You don't get any of the images you uploaded. You don't get the source documents you attached. All that work to substantiate the "Names. Dates. Places." stays locked behind a subscription.
  • Dead Links Abound: Ancestry doesn't archive the web sources you cite. When that site goes offline (and the average lifespan of a web page is 100 days), you're left with a broken URL and whatever text you copied. 49% of URLs in Supreme Court decisions are now dead. Your family tree citations fare no better.

I needed something I could own. Something where every document, every photo, every citation lived in a folder on a hard drive I controlled.

Self-Custody Genealogy

Two weeks. Git version control. Everything in a single repository, copied to two physically distant hard drives, with a third copy in a deep-storage service.

No subscription. No vendor. No API rate limits. If GitHub disappeared tomorrow, the archive would still work; it's just files.

The Anti-GEDCOM: Ancestry exports text. I export everything. 3,600 photos. 400+ OCR'd documents. 131 people tracked in a SQLite database with relationships, marriages, children. Every photo tagged with bounding boxes describing who's in it. Every claim linked to the source(s) that support(s) it. When I download my family tree, I get the actual family tree.

Dead-Link Insurance: Every source document lives in the archive. That obscure genealogy website from 2003 that had the one newspaper clipping proving great-great-grandpa was a beekeeper? I don't link to it. I saved the PDF. When that site dies (and it will) my citation still works.
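To make "every claim linked to its sources" concrete, here is a minimal SQLite sketch. The post does not show the real schema, so every table name, column name, and row below is an assumption for illustration, not the archive's actual layout.

```python
import sqlite3

# Hypothetical schema: all names here are illustrative assumptions.
SCHEMA = """
CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE sources (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE      -- a file inside the repository, not a URL
);
CREATE TABLE claims (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES people(id),
    statement TEXT NOT NULL
);
CREATE TABLE claim_sources (       -- a claim may cite several sources
    claim_id INTEGER NOT NULL REFERENCES claims(id),
    source_id INTEGER NOT NULL REFERENCES sources(id),
    PRIMARY KEY (claim_id, source_id)
);
CREATE TABLE photo_tags (          -- one bounding box per face in a photo
    photo_path TEXT NOT NULL,
    person_id INTEGER REFERENCES people(id),  -- NULL until the box is labeled
    x REAL NOT NULL, y REAL NOT NULL,
    width REAL NOT NULL, height REAL NOT NULL
);
"""

def sources_for(conn, person_name):
    """Return (statement, source path) pairs backing claims about a person."""
    rows = conn.execute(
        """
        SELECT c.statement, s.path
        FROM claims c
        JOIN people p ON p.id = c.person_id
        JOIN claim_sources cs ON cs.claim_id = c.id
        JOIN sources s ON s.id = cs.source_id
        WHERE p.name = ?
        ORDER BY s.path
        """,
        (person_name,),
    )
    return rows.fetchall()

# Placeholder rows, purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO people (id, name) VALUES (1, 'Example Ancestor')")
conn.execute("INSERT INTO sources (id, path) VALUES (1, 'sources/newspaper-clipping.pdf')")
conn.execute("INSERT INTO claims (id, person_id, statement) VALUES (1, 1, 'Kept bees')")
conn.execute("INSERT INTO claim_sources (claim_id, source_id) VALUES (1, 1)")
print(sources_for(conn, "Example Ancestor"))
```

The design point is that `sources.path` points at a file in the repository rather than at a URL, so a citation keeps resolving after the original website is gone.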

The Bear Story

Growing up, there was this weird story in my family. A little girl (a Flynn, one of our ancestors) got lost in the Michigan woods in the 1800s. Search parties looked for days. When they found her, she was being guarded by a bear. She introduced the bear to the group, calling it "Doggy."

That's it. That's all we had. We didn't know which Flynn. We didn't know when. We didn't even know if it was true. It sounded like the kind of thing families make up.

But I wanted to know.

I'm interested in paranormal reporting. Not as a believer, necessarily (or maybe as one, but that's none of your business), but as someone fascinated by the patterns in what people report. And there's this thread running through American wilderness history that our family story connected to.

David Paulides, a former police officer, has spent years documenting what he calls "Missing 411" — cases of people, often children, who vanish in national parks and wilderness areas under strange circumstances. One recurring pattern: children who go missing and are later found alive sometimes report being protected or cared for by animals, often describing a "bear" or a "big furry friend."

  • Ida Mae Curtis, two years old, disappeared in Kootenai National Forest in 1955. Her mother saw her carried off by what looked like a bear "cradling" her. Found two days later in a small shelter of sticks, the girl claimed the bear had fed her.
  • Casey Hathaway, three years old, went missing in North Carolina and was found two days later. He told his aunt he'd "spent two days with a bear that was taking care of him." It was winter. Bears in that region should have been hibernating.

Anyway, here's what happened: I had set Claude loose, through Claude in Chrome, on a click-by-click download of everything in my mom's Ancestry tree, autonomously navigating through her photos, documents, and sources. When Claude hit a dead link in one of her citations, it went to the Wayback Machine, found the right archived version, and asked me to print it as a PDF.

That PDF is now in the archive. The original site is dead. My citation still works.
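Claude did this lookup through the browser. For anyone who would rather script it, the sketch below queries the Wayback Machine's public availability endpoint using only the standard library. The function names are my own, and this is an assumed equivalent of the browser workflow, not the code used for this project.

```python
import json
import urllib.parse
import urllib.request

# Public Wayback Machine availability endpoint.
AVAILABILITY_API = "https://archive.org/wayback/available"

def availability_url(dead_url, timestamp=None):
    """Build the lookup URL; `timestamp` (e.g. YYYYMMDD) asks for the nearest capture."""
    params = {"url": dead_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urllib.parse.urlencode(params)

def closest_snapshot(payload):
    """Return the archived URL from an availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

def find_snapshot(dead_url, timestamp=None):
    """Ask the Wayback Machine for the closest capture (makes a network call)."""
    request_url = availability_url(dead_url, timestamp)
    with urllib.request.urlopen(request_url, timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling `find_snapshot("http://example.com/page", "20030101")` returns the closest archived URL, or `None` when there is no capture. Either way, the step that matters is the next one: saving a copy into the archive instead of keeping another link.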

The story in the Wayback snapshot matched our family legend. Six-year-old Katie Flynn, circa 1880, lost near Custer, Michigan. Found after two days, guarded by a great black bear she called "Doggy."

Now, when I run a semantic search in my personal family history for a Paulides-esque string, I get this:

$ python3 vector_index.py search "little girl lost in woods protected by bear"
[0.9776] family-fable-katie-flynn-bear-story.md
Title: Family Fable...More Evidence — Katie Flynn Bear Story
Source: Screenshot from Ancestry.com family tree

[0.8127] 1940-02-15-mason-county-press-amber-township-history.md
Title: Amber Township History — As Described by Old Settlers
Source: Mason County Press, February 15, 1940

[0.7721] 1942-02-26-beatrice-ohearn-scottville-history.md
Title: How Scottville Came to Be
Source: Mason County Public Library local history collection

Our family story and two independent sources.

All three tell the same story. Same girl. Same age. Same town. Same bear. Same "Doggy."

The vector search finds them because the concepts match: "little girl lost protected by bear" surfaces documents about "six-year-old Katie Flynn guarded by a great black bear" even though the words are different.
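The idea can be shown with a toy example. This is not `vector_index.py`: a real index gets its vectors from a trained embedding model, while the sketch below fakes one with a small hand-written word-to-concept table (an assumption, so the code runs with no dependencies). The ranking step, cosine similarity between the query vector and each document vector, is the same in both cases.

```python
import math
from collections import Counter

# Toy stand-in for an embedding model: map surface words to shared concepts.
# This table is a made-up assumption; a real system would call a model here.
CONCEPTS = {
    "girl": "child", "six-year-old": "child", "little": "child",
    "lost": "missing", "missing": "missing",
    "protected": "guard", "guarded": "guard",
    "bear": "bear",
    "woods": "forest", "forest": "forest",
    "township": "place", "settlers": "people",
}

def embed(text):
    """Count the concepts found in the text (a crude 'embedding')."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, documents):
    """Rank document names by similarity to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), name) for name, text in documents.items()]
    return sorted(scored, reverse=True)

docs = {
    "bear-story": "Six-year-old Katie Flynn, guarded by a great black bear",
    "tax-roll": "Township settlers paid taxes on cattle and land",
}
for score, name in search("little girl lost in woods protected by bear", docs):
    print(f"[{score:.4f}] {name}")
```

The query and the bear story share almost no literal words, yet "girl" and "six-year-old" land on the same concept, as do "protected" and "guarded", so the bear story ranks first and the unrelated document scores zero.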

The story was real enough that multiple independent sources documented it within living memory. People in 1940 remembered it. A historian in 1942 wrote it down.

I'm not saying my great-great-great aunt was protected by Bigfoot. I'm not saying anything about what happened in those Michigan woods in 1880.

I'm saying our family had a story. The original source link died. The archive preserved it anyway. And when I searched, I found corroboration I never expected.

That's what a searchable archive does. It turns "huh, weird family legend" into "holy shit, this is documented, and there's a whole genre of this."

OCR for the Otherwise Unsearchable

An 1820 will, handwritten in faded cursive. "I James Ivey of S. State and District being weak in Body but of Sound and Disposing Mind & Memory..." Mistral read it. Now I can search for "Ivey bequeath three Cows & Calves" and find the exact page. Ten years ago, that document was locked behind human eyes and paleography skills.

The James Ivey will: an 1820 handwritten document, now fully searchable

Family oral histories on cassette tapes. My grandmother talking about her mother in 2016. AssemblyAI transcribed them with speaker diarization. I can search "Isaac teach children German" and find: "Isaac would always speak to his children in German, but insisted they respond in English." That quote came from a cassette tape in a shoebox.

Built for Mom

Grandma, Grandpa and Mom

My mom spent twenty years doing the hard work — finding the documents, conducting the interviews, preserving the photos. But she's not technical. She doesn't know what "OCR" means. She doesn't need to.

I configured Claude with split personalities. When my mom opens it, Claude is a patient guide. It has instructions on how to handle edge cases. It knows how to move windows around on her computer so her attention is drawn to what she's looking for. It browses the web with her.

When I open it, Claude switches to technical mode. Same archive, same tools, different interface.

AI didn't do the genealogy. AI made the genealogy usable for the person who actually did the work. And, some day, for my children.

The Melton Family Collection

A React app. 18 chapters (and growing) spanning 1871-2026. Every photo tagged, with each individual in a bounding box that is either labeled or awaiting a label. Every claim sourced, linked, preserved. A museum interface for a lifetime of research.

The footer says: "Made with love for Nancy."

My mom can finally explore her own research.


If You Need Something Like This

We build AI-powered tools at Martian Engineering. This was personal, but the techniques transfer:

  • Corporate archives that nobody can search
  • Legal discovery across thousands of documents
  • Research databases where context matters
  • Any domain where scattered information needs to become coherent knowledge

If you have a mess that needs to make sense, reach out.