Wednesday, August 24, 2016

Theology, Rheology and some freaky strange search results

Dan Lowry (@DrFriction) tweeted last night "Whenever life seems devoid of meaning or humor, just do a web search on 'theological properties'" (referring to the fact that spellcheckers typically attempt to change "rheological" into "theological"). So I did just that.

Wow. Wow. Wow. Look at this screenshot:
The spellcheckers are winning far more often than I would have ever imagined.

But a little bit of digging suggests that there may be a far more sinister plot, one of revisionist history. I clicked on the first link and found this:
while at the bottom of the page there was this:
So what gives? Was the title later fixed? (That doesn't seem possible as it looks like an image capture, but I'm no expert in these areas.)

But weirder yet is what I found at the fourth hit:
Clearly an image of an original document, with a correct title. But that is not the weird part. The weird part came when I searched the rest of the document for "theol" with Ctrl-F. Every single hit (31 total) pointed to a word correctly spelled as rheol... For instance:

What is going on? I know and expect that, for most people, Google would treat a search for "rheology" (no quotes) as a search for "theology", but for a word finder in a .pdf document to do that?

Again, I am swimming in the deep end. Any insight that someone could offer would be most helpful, as there are things here that are disturbing. I know my Google search results are not neutral and haven't been for years, but for the text search in a PDF to behave like that is not good.


Previous Years

August 24, 2011 - Review: "Social Marketing to the Business Customer"

August 24, 2010 - The Deborah and Weissenberg Numbers

August 24, 2009 - BASF as a hostile takeover target?



1 comment:

Peter said...

PDFs can have a text layer and an image layer. You see the image, but search (of course) runs on the text layer (characters). The same goes for copy/paste: if an image does not have a text layer, you cannot copy/paste text from it in a PDF. The text in the text layer was obtained with OCR from the scanned book. OCR is statistical, and presumably the spell check predominated in this case. A PDF with only a text layer is also possible, I believe, for example if you save a .docx document as PDF.
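
To make the two-layer explanation in the comment above concrete, here is a toy Python sketch. It is not how any real PDF viewer or OCR engine is implemented; the `ScannedPage` class, its field names, and the sample sentence are all hypothetical, invented only to show why a search for "theol" can land on a word that visibly reads "rheol".

```python
from dataclasses import dataclass


@dataclass
class ScannedPage:
    # Hypothetical model of one scanned page, for illustration only.
    visible_image_text: str  # what the reader sees in the scan (pixels, not searchable)
    ocr_text_layer: str      # hidden characters produced by OCR (what search uses)


def find_hits(page: ScannedPage, query: str) -> list[int]:
    """Return the offsets of every match, searching only the hidden text layer."""
    layer = page.ocr_text_layer.lower()
    needle = query.lower()
    hits = []
    start = layer.find(needle)
    while start != -1:
        hits.append(start)
        start = layer.find(needle, start + 1)
    return hits


# Assumed example: the scan shows the correct word, but the OCR step
# "corrected" it to a more common dictionary word in the text layer.
page = ScannedPage(
    visible_image_text="The rheological properties of polymer melts",
    ocr_text_layer="The theological properties of polymer melts",
)

theol_hits = find_hits(page, "theol")
rheol_hits = find_hits(page, "rheol")

for offset in theol_hits:
    # The highlight is drawn at the matching position on the image,
    # so the reader sees the correctly spelled word being highlighted.
    print(offset, repr(page.visible_image_text[offset:offset + 5]))
```

In this toy model, searching for "theol" finds a match while searching for "rheol" finds nothing, even though the visible page says "rheological". That is the same pattern described in the post: the search result is correct for the hidden text layer and only looks wrong because the highlight is drawn over the image.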