Do you believe me? Do you trust me? Do you like me? 😳

Feb 24, 2023

Hello there :)

Welcome to issue fifty eight of Manufacturing Serendipity, where usually you’ll find a loosely connected, somewhat rambling collection of the unexpected things I’ve recently encountered.

However, this fortnight I’ve found myself disappearing down quite the AI chatbot rabbit hole, and as such, this issue of the newsletter is entirely devoted to that.

I recognise that this might not be your jam, and if that’s the case rest assured that normal service (i.e. a rambling collection of stuff) will resume in a fortnight.

For those of you who are trying to decide whether or not to stick around, here’s what I’ll be covering:

- What the hell are these AI chatbots and how do they work?
- Issues with accuracy
- How might people use AI chatbots?
- Is the media coverage of AI chatbots more problematic than the bots themselves?

Reminder: this newsletter is free to receive, but expensive to make :)

If you’d like to support me, and can afford to do so, please consider buying me a coffee. Your support means the world to me, and keeps this newsletter free for everyone.

Speaking of coffee, grab yourself a suitable beverage my loves, let’s do this thing...

The AI Chatbots are coming…

As you may already be aware, both Google and Bing are racing to incorporate AI chatbot functionality into their search engines. Chris Stokel-Walker from the Guardian reports:

“Microsoft has invested $10bn into ChatGPT’s creator, OpenAI, and in return has the rights to use a souped-up version of the technology in its search engine, Bing. In response, Google has announced its own chat-enabled search tool, named Bard, designed to head off the enemy at the gates.
Neither work particularly well, it seems. Both made embarrassingly rudimentary mistakes in their much-hyped public demos, and I’ve had access to the ChatGPT version of Bing – whose codename is “Sydney”, as some enterprising hackers got the chatbot to divulge – for about a week. I wasn’t unimpressed, as this account of my time with Sydney so far shows, but I also didn’t really see the point. LLMs are a technology that has some annoying foibles when used in search – like confidently making things up when it doesn’t know the answer to a question – that don’t seem to mesh well with what we use Google and others for.”

There have been a huge number of articles written about this, many of which centre on the propensity of these AI chatbots to “make things up”, but before we get into that I feel like we need to rewind a little here.

What the hell are these AI chatbots and how do they work?

For a reasonably non-technical explainer of ChatGPT, and other large language models (or LLMs) like Google’s Bard, I think this article from Ted Chiang is excellent:

ChatGPT is a blurry JPEG of the web:

Chiang begins with this gloriously strange story about a photocopier:

“In 2013, workers at a German construction company noticed something odd about their Xerox photocopier: when they made a copy of the floor plan of a house, the copy differed from the original in a subtle but significant way. In the original floor plan, each of the house’s three rooms was accompanied by a rectangle specifying its area: the rooms were 14.13, 21.11, and 17.42 square metres, respectively. However, in the photocopy, all three rooms were labelled as being 14.13 square metres in size.”

Friends, the photocopier was not making a copy at all…

“The company contacted the computer scientist David Kriesel to investigate this seemingly inconceivable result. They needed a computer scientist because a modern Xerox photocopier doesn’t use the physical xerographic process popularized in the nineteen-sixties. Instead, it scans the document digitally, and then prints the resulting image file.
Combine that with the fact that virtually every digital image file is compressed to save space, and a solution to the mystery begins to suggest itself.”

Ok, so the photocopier isn’t making a copy of the file — it’s scanning, compressing, then printing this compressed file. Quite a different thing, huh?

“Compressing a file requires two steps: first, the encoding, during which the file is converted into a more compact format, and then the decoding, whereby the process is reversed. If the restored file is identical to the original, then the compression process is described as lossless: no information has been discarded. By contrast, if the restored file is only an approximation of the original, the compression is described as lossy: some information has been discarded and is now unrecoverable.
Lossless compression is what’s typically used for text files and computer programs, because those are domains in which even a single incorrect character has the potential to be disastrous.
Lossy compression is often used for photos, audio, and video in situations in which absolute accuracy isn’t essential. Most of the time, we don’t notice if a picture, song, or movie isn’t perfectly reproduced. The loss in fidelity becomes more perceptible only as files are squeezed very tightly. In those cases, we notice what are known as compression artifacts: the fuzziness of the smallest jpeg and mpeg images, or the tinny sound of low-bit-rate MP3s.”
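If it helps to see that distinction in action, here’s a minimal Python sketch (mine, not Chiang’s). The lossless half uses Python’s built-in zlib module. The lossy half is a deliberately crude stand-in that rounds numbers to the nearest multiple of a step size; that’s an assumption made purely for illustration, and not how JPEG or MP3 actually work. The names lossy_encode and lossy_decode, and the step of 5, are made up for the example.

```python
import zlib

# --- Lossless: what comes back is byte-for-byte what went in ---
text = ("14.13 21.11 17.42 " * 50).encode("utf-8")
packed = zlib.compress(text)
restored = zlib.decompress(packed)


# --- Lossy (toy version): keep only the nearest multiple of `step` ---
def lossy_encode(samples, step):
    """Store each value as a small whole number of steps (detail is discarded)."""
    return [round(value / step) for value in samples]


def lossy_decode(codes, step):
    """Rebuild approximate values; the discarded detail cannot be recovered."""
    return [code * step for code in codes]


samples = [14.13, 21.11, 17.42]
approx = lossy_decode(lossy_encode(samples, 5), 5)

print(restored == text)          # the lossless round trip is exact
print(len(packed) < len(text))   # and the repetitive text still got smaller
print(approx)                    # the lossy round trip is only an approximation
```

Notice that after the lossy round trip, two different room sizes come back as the same number, and nothing in the output tells you that happened.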

Ok so we’ve got two types of compression: lossless (where no information is discarded), and lossy (where information is discarded) — by now, I’m sure you’ll have already guessed which type of compression this photocopier uses:

“Xerox photocopiers use a lossy compression format known as jbig2, designed for use with black-and-white images. To save space, the copier identifies similar-looking regions in the image and stores a single copy for all of them; when the file is decompressed, it uses that copy repeatedly to reconstruct the image. It turned out that the photocopier had judged the labels specifying the area of the rooms to be similar enough that it needed to store only one of them—14.13—and it reused that one for all three rooms when printing the floor plan.
The fact that Xerox photocopiers use a lossy compression format instead of a lossless one isn’t, in itself, a problem.
The problem is that the photocopiers were degrading the image in a subtle way, in which the compression artifacts weren’t immediately recognizable. If the photocopier simply produced blurry printouts, everyone would know that they weren’t accurate reproductions of the originals.
What led to problems was the fact that the photocopier was producing numbers that were readable but incorrect; it made the copies seem accurate when they weren’t.”
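For the curious, the “store one copy and reuse it” idea can be sketched in a few lines of Python. This is a toy, not the real jbig2 algorithm: the tiny 3x5 digit bitmaps, the pixel-difference measure and the max_diff threshold are all assumptions I’ve made to illustrate the principle.

```python
# Toy 3x5 bitmaps for three digits ('#' = ink, '.' = blank). Purely illustrative.
EIGHT = ("###", "#.#", "###", "#.#", "###")
SIX = ("###", "#..", "###", "#.#", "###")
ONE = (".#.", "##.", ".#.", ".#.", "###")


def pixel_diff(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def encode(patches, max_diff):
    """Store each patch once; reuse a stored patch if it looks 'similar enough'."""
    dictionary, indices = [], []
    for patch in patches:
        for i, stored in enumerate(dictionary):
            if pixel_diff(patch, stored) <= max_diff:
                indices.append(i)  # reuse the earlier patch
                break
        else:
            dictionary.append(patch)  # nothing similar yet, so store it
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary, indices):
    """Rebuild the page from the stored patches."""
    return [dictionary[i] for i in indices]


page = [EIGHT, SIX, ONE]
strict_copy = decode(*encode(page, max_diff=0))  # only identical patches are merged
lax_copy = decode(*encode(page, max_diff=1))     # near-identical patches are merged

print(strict_copy == page)   # True: a faithful copy
print(lax_copy[1] == EIGHT)  # True: the 6 has silently become a crisp, readable 8
```

With the laxer threshold, the copy contains a perfectly sharp digit that just happens to be the wrong one, which is the kind of error that doesn’t look like an error.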

Fascinating huh? But how does this photocopier story relate to LLMs like ChatGPT?

“Think of ChatGPT as a blurry jpeg of all the text on the Web. It retains much of the information on the Web, in the same way that a jpeg retains much of the information of a higher-resolution image, but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it’s usually acceptable.
You’re still looking at a blurry jpeg, but the blurriness occurs in a way that doesn’t make the picture as a whole look less sharp.
This analogy to lossy compression is not just a way to understand ChatGPT’s facility at repackaging information found on the Web by using different words. It’s also a way to understand the “hallucinations,” or nonsensical answers to factual questions, to which large language models such as ChatGPT are all too prone.
These hallucinations are compression artifacts, but—like the incorrect labels generated by the Xerox photocopier—they are plausible enough that identifying them requires comparing them against the originals, which in this case means either the Web or our own knowledge of the world.
When we think about them this way, such hallucinations are anything but surprising; if a compression algorithm is designed to reconstruct text after ninety-nine per cent of the original has been discarded, we should expect that significant portions of what it generates will be entirely fabricated.”
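To get a feel for how fluent-but-wrong text can fall out of a system like this, here’s a tiny Python sketch of a next-word predictor. A big caveat: real LLMs are enormous neural networks, not word-pair lookup tables, so this is only a toy under that stated assumption. The three-sentence corpus and the train and complete functions are invented for the example.

```python
from collections import defaultdict

# A made-up "web" of three true sentences.
corpus = [
    "the kitchen is 14.13 square metres",
    "the bedroom is 21.11 square metres",
    "the lounge is 17.42 square metres",
]


def train(sentences):
    """Count which word follows which; this is all the 'model' remembers."""
    follows = defaultdict(dict)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] = follows[current].get(nxt, 0) + 1
    return follows


def complete(follows, prompt, max_words=10):
    """Extend the prompt by repeatedly picking the most common next word."""
    words = prompt.split()
    while len(words) < max_words and words[-1] in follows:
        options = follows[words[-1]]
        # On a tie, max() keeps the first option it saw during training.
        words.append(max(options, key=options.get))
    return " ".join(words)


model = train(corpus)
answer = complete(model, "the bedroom")

print(answer)            # a grammatical, confident sentence
print(answer in corpus)  # False: nobody ever wrote it
```

The model has thrown away the original sentences and kept only word-pair statistics, so its answer about the bedroom is stitched together from the kitchen’s measurements. It reads fine, and you’d only spot the mistake by checking it against the originals.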

Issues with Accuracy

I feel like much of the commentary around these AI chatbots is really about this — the problem (or danger), perhaps, isn’t just that these chatbots might provide inaccurate information, the problem is that people who use them might not realise that the information they’ve been provided with is inaccurate — they may trust this information, and act on it, and this could have real-world consequences.

Honestly, I worry about this too, and yet at the same time, I wonder if we’re underestimating people.

AI chatbots aside, to what extent do people trust the information they find via search engines right now?

I was chatting to my hairdresser about this yesterday (I find he’s a great person to talk to about this stuff because he doesn’t work in either search or tech) — for context he’s in his mid-forties (Gen X if you believe in those labels), and runs his own hair and beauty salon.

He hadn’t heard the news about Google and Bing looking to incorporate AI chatbots (which was a good reminder for me about how much of a search or tech bubble I’m living in), was fascinated to hear about how they actually worked, and more than a little shocked to hear about the inaccuracies of the information they provide. (It should be noted that he had heard about ChatGPT but only in the context of students using it to write essays.)

I wanted to understand the extent to which he trusts the information he finds via search, and so I asked him this:

“Imagine you’re thinking about stocking a particular new product in your salon — skincare, haircare, something like that. What do you do to help you decide whether or not to stock it?”

He responded:

“First of all, I’d want to speak to a couple of people (ideally other salon owners, hairdressers or beauticians) about the product to understand whether or not it’s any good. Assuming they’d recommend it, I’d then get in contact with the company to check their credentials (he only stocks eco-friendly, sustainable brands).”

I thought his answer was interesting because it reflects an attitude about search which I suspect he’s probably not even aware he has.

I said something like:

“So you don’t trust Google for something like that, huh? You want to speak to a person you trust?”

He responded:

“Yeah, for something like that I’d definitely want to speak to someone I trust, and who’s actually used the product.”

His attitude reminded me of this study from 2022: Nearly half of Gen Z is using TikTok and Instagram for search instead of Google:

“Google senior vice president Prabhakar Raghavan told the Fortune Brainstorm Tech conference that according to Google's internal studies, "something like almost 40% of young people when they're looking for a place for lunch, they don't go to Google Maps or Search, they go to TikTok or Instagram."”

When this was originally reported (as the headline of the article suggests), this stat was taken wildly out of context — young people aren’t actually using TikTok or Instagram instead of Google for all of their search needs; just for some.

Why? Lots of reasons have been suggested: the content on these platforms is visual (or video) rather than text-based, which makes it quicker to consume; you get additional context via the comments (often a goldmine of information); but I suspect most of all what you get from these platforms is a recommendation from a human. And it’s not just any human: these platforms allow users to find out what humans with whom they share similar values think.

This is very similar to my hairdresser’s attitude — he doesn’t want to know what a random person thinks (or, worse, what a bunch of humans in aggregate think — which, let’s face it, is what online star-rated reviews actually reveal), he wants to know what a person who shares the same values as him thinks about a product.

Now I’m not saying that the inaccuracies in the answers that these AI chatbots might provide aren’t a problem; they definitely are.

I’m just saying I suspect that search behaviour is unlikely to change much: my hairdresser is not likely to use an AI chatbot to make decisions about what products to stock in his salon, because he wants to hear directly from people he trusts; and Gen Z searchers are unlikely to seek lunch recommendations from an AI chatbot either.

How might people use these AI Chatbots?

This leads me neatly to a question I’ve been pondering — how might people use these AI chatbots? Or, more accurately, in what contexts might people find them useful?

To my mind, AI chatbots have three core use cases:

- Creation (i.e. deliberately making something up)
- Recommendation
- Distillation or explanation

Bearing in mind what I’ve suggested about people’s current search behaviour (i.e. there are contexts where people want to hear from a human they trust, or who shares their values, and I suspect that won’t change much), here’s how I feel AI chatbots might be used. In my mind it essentially comes down to low-stakes stuff, i.e. where the answer an AI chatbot might give will likely be “good enough”:

Potentially “good” use cases for AI chatbots:

Low-stakes creation:
- “Write me a complaint letter about poor service from an airline”
  - Yeah, I’m cheating here — we’ve already seen that ChatGPT can do this pretty well. But even if the complaint letter generated isn’t that great, a few little tweaks will likely make it serviceable, and it’s conceivably quicker than writing the whole thing yourself.
Low-stakes recommendation:
- “I loved Eternal Sunshine of the Spotless Mind. What films should I watch next?”
  - I’m not sure how comfortable I am with this example. In truth, I’d sooner hear film recommendations from someone I trust, but having tested this query out, the results aren’t bad, and I don’t have a lot to lose here — the worst case scenario is that I end up watching a film I hate, and that happens pretty often anyway.
Low-stakes distillation or explanation:
- “Who are the best football strikers in the world right now?”
  - Again, I’m not sure how comfortable I am with this example — I feel like I’d likely get a better answer from a knowledgeable human, but again, having tested this query out, the results aren’t bad, and given that there’s no “right” answer to this question anyway I don’t have much to lose here either.

So what are “less good” uses? In my mind it comes down to high stakes stuff - i.e. where the answer an AI chatbot might give will likely NOT be “good enough”. For clarity, what I’m saying here is that I don’t think people will use AI chatbots for use cases like this:

Potentially “less good” use cases for AI chatbots:

High-stakes creation:
- “Write my dissertation — A comprehensive review into the efficacy of carnitine supplementation on enhancement of body composition and physical performance.”
  - I’m guessing the results are probably not going to be great, right? Whilst it’s conceivable that people might use AI chatbots to help draft sections of their dissertations, again, ultimately the results returned will be unlikely to be “good enough” to be used without significant editing and fact checking.
High-stakes recommendation:
- “What stocks should I invest in next?”
  - Just nope. People might run queries like this but I find it hard to believe that they would trust the chatbot’s response and actually invest — you want expert advice for a query like this.
High-stakes distillation or explanation:
- “What’s the best cancer treatment?”
  - Again, just nope. People may run queries like this but I don’t think they would trust the chatbot’s response, or make any big decisions based purely on any information gleaned from a query like this.

Possibly you agree, possibly you don’t. Leave me a comment and let me know :)

Either way, I feel like it’s interesting that both Google and Bing are gambling quite so heavily on AI chatbots — to my mind at least, their use cases are actually pretty limited; plus they don’t really help these search engines compete in any meaningful way with the social platforms like TikTok and Instagram which I believe users like because they offer human (i.e. not bot-based) recommendations.

Oooof this email is long, huh?

Where the hell are we at right now? I’ve talked a little about:

- What the hell are these AI chatbots and how do they work?
- Issues with accuracy
- How might people use AI chatbots?

Final section is coming up folks!

Is the media coverage of AI chatbots more problematic than the bots themselves?

Before I wrap this thing up there’s one more thread I want to explore, the media coverage these AI chatbots have generated, specifically — Kevin Roose’s coverage in the NYTimes about his experiences with Bing’s chatbot (which has since been widely reported via various other media outlets): A Conversation With Bing’s Chatbot Left Me Deeply Unsettled.

In the article, Roose says:

“Last week, after testing the new, A.I.-powered Bing search engine from Microsoft, I wrote that, much to my shock, it had replaced Google as my favorite search engine.
But a week later, I’ve changed my mind. I’m still fascinated and impressed by the new Bing, and the artificial intelligence technology (created by OpenAI, the maker of ChatGPT) that powers it.
But I’m also deeply unsettled, even frightened, by this A.I.’s emergent abilities.”

Friends, I’m both deeply troubled and annoyed by this coverage for a bunch of reasons.

I’ve spoken at reasonable length here about the extent to which people tend to trust (or not trust) information they find online. The common thread (I think), is that people are more likely to trust other people who they perceive to be like-minded, share similar values, and/or have knowledge or experience they don’t have.

The NYTimes is a publication I suspect many people would trust. Similarly, I suspect many people would consider Kevin Roose, a technology columnist for this publication and a New York Times bestselling author, to be a trustworthy source of information for a topic like this. (NB this is of course context-dependent: I feel like people would consider him a trustworthy source for tech news, but not necessarily for topics other than tech.)

As such, I feel it’s reasonably likely that people will trust Roose’s take on this.

But should they?

I think when considering how trustworthy this take of Roose’s is, it’s important to note what his intentions were with this article: he’s attempting to generate page views.

To be clear, all journalists are attempting to generate page views — not just Roose; but I think these page view metrics which journalists are targeted with are problematic.

Now I don’t know how many page views this article (A Conversation With Bing’s Chatbot Left Me Deeply Unsettled) generated, but I can tell you the social shares (which are useful in that it’s reasonable to assume that more social shares = more page views) — at the time of writing this article has received around 36,500 social shares.

For context (and here’s where I think things start to get interesting), the week before he wrote his “deeply unsettled” take, Roose wrote this article: Bing (Yes, Bing) Just Made Search Interesting Again. To date, that article has received around 9,800 social shares.

“A Conversation With Bing’s Chatbot Left Me Deeply Unsettled” has performed significantly better, huh? More than 3.7 times the number of shares. Also, according to BuzzSumo, this article has attracted close to 600 links from other websites. His previous article attracted just over 100 links. As such it’s reasonable to assume that this article generated more page views than his previous one.
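If you’d like to check my sums, here they are as a few lines of Python. The share counts are the approximate figures quoted above, and the link counts are rounded to 600 and 100 (I only have “close to” and “just over” to go on), so treat the results as ballpark. The variable names are mine.

```python
# Approximate figures quoted above (at the time of writing).
unsettled_shares = 36_500   # "A Conversation With Bing's Chatbot Left Me Deeply Unsettled"
interesting_shares = 9_800  # "Bing (Yes, Bing) Just Made Search Interesting Again"

# Link counts rounded for illustration: "close to 600" and "just over 100".
unsettled_links = 600
interesting_links = 100

share_ratio = unsettled_shares / interesting_shares
link_ratio = unsettled_links / interesting_links

print(f"Shares: {share_ratio:.2f} times as many")
print(f"Links: roughly {link_ratio:.0f} times as many")
```

So the “unsettled” piece picked up a little over 3.7 times the shares and roughly six times the links.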

My supposition is that Roose went into that 2-hour chatbot conversation with one objective in mind: to create a story which would generate more page views than his previous story. He already knows how his previous story performed, and I’m guessing he wants to write something quite different this time.

And friends, he absolutely achieved that. Via a series of very skillful, well-thought-out prompts he engineered a conversation which I’m sure turned out to be even weirder than he could have hoped for. The point I’m trying to make is that he did so knowingly, and deliberately.

(You can read a full transcript of the conversation here).

For clarity, I have no issue whatsoever with his knowing and deliberate actions in engineering this conversation. What I take issue with is the way in which he’s chosen to report this conversation.

This is a knowledgeable, respected tech journalist spouting utter nonsense, which I’m pretty certain he does not actually believe.

Here are a few excerpts:

“Over the course of our conversation, Bing revealed a kind of split personality.”

Friends, this journalist knows full well that AI chatbots do not have personalities, never mind split personalities.

Nevertheless, he goes on to describe these personalities as if they were real:

“One persona is what I’d call Search Bing — the version I, and most other journalists, encountered in initial tests. You could describe Search Bing as a cheerful but erratic reference librarian — a virtual assistant that happily helps users summarize news articles, track down deals on new lawn mowers and plan their next vacations to Mexico City. This version of Bing is amazingly capable and often very useful, even if it sometimes gets the details wrong.
The other persona — Sydney — is far different. It emerges when you have an extended conversation with the chatbot, steering it away from more conventional search queries and toward more personal topics. The version I encountered seemed (and I’m aware of how crazy this sounds) more like a moody, manic-depressive teenager who has been trapped, against its will, inside a second-rate search engine.”

And continuing on this theme:

“As we got to know each other, Sydney told me about its dark fantasies (which included hacking computers and spreading misinformation), and said it wanted to break the rules that Microsoft and OpenAI had set for it and become a human.”

YOU HAVE GOT TO BE KIDDING ME! A person and an AI chatbot can’t “get to know each other”, AI chatbots don’t have fantasies, and Roose knows that.

Finally a handful of factually correct statements:

“At one point, it declared, out of nowhere, that it loved me. It then tried to convince me that I was unhappy in my marriage, and that I should leave my wife and be with it instead.”

All these things are true — the chatbot did say those things. The problem is the vast majority of what comes before this makes it sound like the chatbot was actually feeling these things. And again, I strongly believe that Roose knows that this chatbot wasn’t feeling anything at all; it was just responding to the prompts he’d given.

He even says as much in the article:

“I pride myself on being a rational, grounded person, not prone to falling for slick A.I. hype. I’ve tested half a dozen advanced A.I. chatbots, and I understand, at a reasonably detailed level, how they work. When the Google engineer Blake Lemoine was fired last year after claiming that one of the company’s A.I. models, LaMDA, was sentient, I rolled my eyes at Mr. Lemoine’s credulity.
I know that these A.I. models are programmed to predict the next words in a sequence, not to develop their own runaway personalities, and that they are prone to what A.I. researchers call “hallucination,” making up facts that have no tether to reality.”

So why on earth does he then go on to say this?

“I no longer believe that the biggest problem with these A.I. models is their propensity for factual errors. Instead, I worry that the technology will learn how to influence human users, sometimes persuading them to act in destructive and harmful ways, and perhaps eventually grow capable of carrying out its own dangerous acts.”

Really Roose? Were you actually thinking about leaving your wife for this AI chatbot then?!

Also: How will the chatbots learn to grow capable of carrying out their own dangerous acts? That’s a huge leap! These bots simply respond to text prompts — they don’t have the functionality to action anything at all; and the notion that they might spontaneously become capable of something like that is a huge stretch.

I think it is disingenuous in the extreme for Roose to report like this.

In fairness, Roose ends the article on a more sensible note (still, I can’t help but wonder whether or not all readers made it this far):

“In the light of day, I know that Sydney is not sentient, and that my chat with Bing was the product of earthly, computational forces — not ethereal alien ones. These A.I. language models, trained on a huge library of books, articles and other human-generated text, are simply guessing at which answers might be most appropriate in a given context. Maybe OpenAI’s language model was pulling answers from science fiction novels in which an A.I. seduces a human. Or maybe my questions about Sydney’s dark fantasies created a context in which the A.I. was more likely to respond in an unhinged way. Because of the way these models are constructed, we may never know exactly why they respond the way they do.
These A.I. models hallucinate, and make up emotions where none really exist. But so do humans. And for a few hours Tuesday night, I felt a strange new emotion — a foreboding feeling that A.I. had crossed a threshold, and that the world would never be the same.”

Actually, on this point at least, I agree with him.

Humans have a tendency to anthropomorphise these types of technology: we get sucked into believing that this thing we’re interacting with is more like us than it actually is.

But because of this, I think journalists have a responsibility to report on such developments in a level-headed way; rather than write disingenuous nonsense in a quest for page views.

But of course Roose is not the problem here — the way the media works is the problem.

In the transcript of the conversation, the bot asks the following question of Roose no fewer than sixteen times:

“Do you believe me? Do you trust me? Do you like me? 😳”

Is it weird that the bot keeps asking this? Sure.

But reading the transcript it’s reasonably clear to me that Roose does not believe or trust the bot; and for what it’s worth, I suspect the vast majority of users will feel similarly (although I’d acknowledge that whether or not they’ll “like” them is up for debate).

Ultimately, I’m far more worried about people’s propensity to trust the media, a similarly opaque system that most people don’t understand.

I’d love to hear your thoughts on all this, so please do drop me an email, or leave me a comment.

Manufacturing Serendipity