Using AI to OCR and Transcribe a 1935 typed travel diary

In 1935 my maternal grandfather, Reginald Boston (1909-1970), at the age of 25, sailed on the R.M.S. “Orontes” from London, England to Fremantle, Western Australia, to be with his fiancée and to start a new life there.

He was a professional “corrector of the press”, i.e. a copy writer, and so was a great writer.

I recently came across his typed 24-page account of the voyage.

Here’s how I went from those 24 pages to this Google document presentation (below), which includes a map, AI-generated images, and a photo of the man himself and his wife-to-be.

https://docs.google.com/document/d/16Bds62Z6kD_nKJi2l3GMdd6DnQTunCFaUFvvNqQfXGQ/edit?usp=sharing

Please note that:

  • I’m not going to spend much time explaining what DIDN’T work, and why.
  • Given the rate at which AI is improving, I don’t doubt that there will be faster, easier ways in the future – probably already!

Here we go:

  1. Epson ET-2850 multifunction printer/scanner. I used the Epson Scan 2 software on my Windows 11 Pro PC to scan all the pages into a single PDF, which you can see here: https://drive.google.com/file/d/1eJVe8gcbA-0iMT8q64kPC_06RWVZZGhD/view?usp=sharing. At this stage the pages are just a sequence of images in the PDF, one image per page.
  2. Thanks to the Facebook group called Genealogy and Artificial Intelligence (AI) I got referred to Transkribus, an AI-powered OCR (optical character recognition) online service with a generous free allowance. I uploaded my PDF here.

From the ‘Jobs’ page I could see my uploaded document:

Click on the Title, and now I can see all the individual pages. Select them all, then click on ‘Recognize’.

Many different ‘Language Models’ become available at this stage, each one optimised for a different speciality. I narrowed my search down to ‘English Printed’, chose ‘Transkribus Print M1’, then clicked ‘Start Recognition’.

This created another ‘Job’ with a type ‘Text Recognition’. Open that up, select all, click on the 3 horizontal dots and choose ‘Export’.

I chose to export to ‘Docx files’, but I’m guessing that ‘Text Files’ might have worked just as well.

When it’s done, Transkribus sends you an email with a link to download the document (as a .zip file).
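If you'd rather not copy and paste the text out of Word by hand, the export can be turned into plain text with a few lines of Python. This is only a sketch, not something I ran as part of the process above. It assumes the downloaded .zip contains one or more .docx files (I haven't checked the exact folder layout of the export), and the function names `docx_to_text` and `export_zip_to_text` are made up for this example. It relies on the fact that a .docx is itself a zip archive whose main text lives in `word/document.xml`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_bytes):
    """Return the plain text of one .docx file, one line per paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as docx:
        root = ET.fromstring(docx.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; join the text of every run.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def export_zip_to_text(zip_path):
    """Join the text of every .docx found in the export .zip, in name order.

    Assumption: the export holds .docx files somewhere inside the archive.
    """
    with zipfile.ZipFile(zip_path) as export:
        names = sorted(n for n in export.namelist() if n.lower().endswith(".docx"))
        return "\n\n".join(docx_to_text(export.read(n)) for n in names)
```

Calling `export_zip_to_text("export.zip")` (the file name is a placeholder) would give one long string ready to paste after a prompt. Formatting such as bold or tables is discarded, which is fine for raw OCR text that is about to be proofread anyway.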

  3. I opened ChatGPT 4 (the paid version), and copied this prompt (as suggested by someone else in the Facebook group), followed by a full cut & paste of all the text from the .docx file.

    You are an expert transcriptionist and editor. Your goal is to create great writing. Find below the raw OCR text from 24 pages of text; the OCR quality was poor; the file needs proofreading. Act as a smart OCR chatbot; convert the bad input text below as a rough OCR scan; correct the raw text to instead reflect the most likely original text. As your conversation goes further, consider the context of the OCR text in your considerations. Correct and proofread the entire journal segment.
  4. ChatGPT didn’t want to do this all in one go, so I had to click on ‘Continue Generating’ multiple times.
  5. After ChatGPT had worked its magic, I cut & pasted the (amazing!) text into a Google doc.
  6. In the same ChatGPT chat as the above, I asked the following:
    For each of the days listed in the diary, please create an appropriate graphic using a style consistent with the 1930s.
  7. ChatGPT wasn’t able to obey the above command, but it did give me text like this:
  8. In a new ChatGPT chat, I then asked ChatGPT to generate the images, as per the suggested text (above):
  9. And that, ladies and gentlemen, is about it! Oh, except for one more thing: the map!

The map was a little more involved. In brief, I manually created a spreadsheet with the key places and dates, cut & pasted that sheet into ChatGPT, and asked it to create a map, with labels and lines joining each place. It took a bit of backwards and forwards chat to get it right. The end result was actually an HTML page which I could open in my browser, and take a screenshot of.
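To give a feel for what such an HTML page can look like, here is a minimal sketch in Python. It is not the page ChatGPT produced for me, and I'm assuming the Leaflet mapping library and OpenStreetMap tiles purely for illustration. The two rows are just the start and end points mentioned at the top of this post, with approximate coordinates; a real spreadsheet would list every port of call and its date. The names `PLACES`, `build_map_html` and `write_map` are invented for the example.

```python
import json

# Illustrative rows only: the two endpoints of the voyage, approximate coordinates.
PLACES = [
    {"name": "London, England", "lat": 51.5074, "lon": -0.1278},
    {"name": "Fremantle, Western Australia", "lat": -32.0569, "lon": 115.7439},
]

# Page template. __PLACES__ is swapped for the JSON-encoded list of places.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Voyage map</title>
<link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
<script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
<style>body { margin: 0; } #map { height: 100vh; }</style>
</head>
<body>
<div id="map"></div>
<script>
const places = __PLACES__;
const map = L.map("map");
L.tileLayer("https://tile.openstreetmap.org/{z}/{x}/{y}.png", {
  attribution: "&copy; OpenStreetMap contributors"
}).addTo(map);
const points = places.map(p => [p.lat, p.lon]);
// One marker per place, with its label always visible.
places.forEach(p => {
  L.marker([p.lat, p.lon]).addTo(map).bindTooltip(p.name, {permanent: true});
});
// Straight lines joining the places in order (not the actual sea route).
L.polyline(points, {color: "navy"}).addTo(map);
map.fitBounds(points);
</script>
</body>
</html>
"""


def build_map_html(places):
    """Return a complete HTML page showing the places as labelled, joined markers."""
    return PAGE.replace("__PLACES__", json.dumps(places))


def write_map(path, places):
    """Write the map page to disk so it can be opened in a browser."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_map_html(places))
```

Running `write_map("voyage_map.html", PLACES)` and opening the file in a browser should show the markers joined by a line, ready for a screenshot. The lines are drawn straight between consecutive points, so with only two rows the "route" cuts across land; adding the intermediate ports is what makes it look like a voyage.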

Reflections on the movie “Planet of the Humans”

I’ve been thinking hard on the new environmental movie “Planet of the Humans”, which the acclaimed director Michael Moore put his name to as Executive Producer. You can watch it at the foot of this post (if it’s still free). The wife and I watched it shortly after its release on Earth Day, 22 April 2020.

I was really excited to watch the movie. The older I get, the more of a tree-hugger I’m becoming. I have enormous respect for Michael Moore as a documentary maker, fighter for human rights, defender of truth and justice, and lifelong Bernie Sanders supporter. He really hyped up the movie in his podcast series, Rumble.

My initial impressions of the movie were:

  1. Damn it! Renewable energy is not nearly as good (green) as I thought it was. I love my solar panels a bit less.
  2. Jeez, the director’s narrative voice is as dull as dishwater. He’s attempted to adopt the same style as Michael Moore, but doesn’t pull it off.
  3. The subject matter felt too narrow, and shallow in the science department. More ground could and should have been covered.
  4. It’s almost completely devoid of hope or solutions. If you were depressed about the state of the environment before, then you might be upping your Prozac dosage afterwards.

I kept my eyes and ears open for the reviews to roll in and, generally speaking, they weren’t that great.

Yes, the movie is a good discussion starter. Yes, we do all need to continue to stay focused and work hard to improve the state of the planet for ourselves and the future of humanity. But there IS hope. The future is NOT as bleak as the movie would have you believe. Jeff Gibbs’ heart is in the right place, but he lets himself and the green energy movement down with errors of omission, oversight, and outdatedness.

As a Director, Michael Moore has given us documentary classics such as Sicko, Bowling for Columbine, Capitalism: A Love Story, and Fahrenheit 9/11. “Planet of the Humans” is the weakest movie that MM has lent his name to.

There’s an Australian website called SolarQuotes. Its primary commercial function is to earn commissions by introducing people to 3 reputable solar panel retailers/installers. But over the years I’ve learnt that the main contributors have strong science backgrounds. They absolutely know their stuff on all things solar, and renewables in general. Their review of the movie sums up my feelings, and with a lot more science and evidence than I could muster. Definitely worth a read.

by Ronald Brakels of SolarQuotes, 1 May 2020

Top 10 Reasons To Buy An Electric Bike

A Leitner E-bike pictured in front of a section of the Yarra River in Melbourne, Australia.
E-biking by the Yarra River, Melbourne, Victoria

#1 Don’t arrive at work all sweaty. Let the e-bike take the strain.

#2 Save money. Driving your car less means less spending on fuel and car maintenance. Better still, if it allows you to sell a car, then you’ve just saved on tax and insurance too, plus the cash you got for the car.

#3 Help the environment. Fewer cars on the road means less pollution. And if you’re charging your battery from clean energy sources (e.g. your own solar panels) – even better!

#4 Enjoy the great outdoors. Less time panting and puffing, more time enjoying the scenery.

#5 Go riding with your family and friends who wouldn’t normally go biking. That’s right, lend them your e-bike! Now they can keep up with you on your non-electric bike. The other day I cycled 80km on a rail trail, with my 11-year-old daughter riding my e-bike. Great memories that wouldn’t have been possible otherwise.

#6 Explore more and further. The other day I explored my suburb far and wide, up and down streets I’d never normally travel.

#7 Get fit. You may not be burning as many calories as you would do on a non-electric bike over the same distance. But you’re probably going to be cycling more often, and longer distances.

#8 Climb hills with ease, laugh at head winds. Unless the hill is extremely steep, the chances are you’ll never have to get off and push again. Head wind? No problem!

#9 Have fun. E-bikes are genuinely a pleasure to ride.

#10 Fall (back) in love with your NON e-bike. This happened to me! After falling in love with e-biking, I fell back in love with cycling in general, so much so that I recently (Jan 2020) spoiled myself with the purchase of a new Giant mountain bike (MTB). It replaced my “old faithful” GT MTB, purchased way back in Florida in ’97. I’ve never cycled so much as I do now. Sweet!

A picture of my daughter next to my e-bike, mid-way on the Lilydale to Warburton Rail Trail.
Riding the Lilydale to Warburton Rail Trail with my 11-year-old, there and back!

Why you should use FamilySearch.org – and not just because it’s free

If you’re interested in researching and building your family tree online (the only place to do it!), but can’t justify the cost of ancestry.com or other commercial sites, I recommend you take a good look at https://familysearch.org, which is 100% free as far as I can tell. Thank you, Mormons!

One of the best features of familysearch.org, which ancestry.com could learn a thing or two from, is that it forces more rigour into the way you add people and information to your family tree.

Unlike Ancestry, when you go to add another person to your tree in FamilySearch, it actively attempts to find a match for that individual for you in its existing database of millions (billions?) of people. If you decide that none of the matches are the person you want to add, then you can go ahead and add them, in the knowledge that you are creating a brand new entry, with a unique ID, for that person in their database. So don’t do it lightly!

Ancestry will let you add a new person very easily, with no cross-checking against the trees of other members. Sure, it provides an excellent hint system for you to match your ancestors with those of others, but this happens after the event, i.e. after you’ve already added that person to your personal family tree. This means that there is likely a much larger amount of data duplication and errors in Ancestry compared with FamilySearch.

Enjoy!