Do you believe me? Do you trust me? Do you like me? 😳
Hello there :)
Welcome to issue fifty-eight of Manufacturing Serendipity, where usually you'll find a loosely connected, somewhat rambling collection of the unexpected things I've recently encountered.
However, this fortnight I've found myself disappearing down quite the AI chatbot rabbit hole, and as such, this issue of the newsletter is entirely devoted to that.
I recognise that this might not be your jam, and if that's the case, rest assured that normal service (i.e. a rambling collection of stuff) will resume in a fortnight.
For those of you who are trying to decide whether or not to stick around, here's what I'll be covering:
What the hell are these AI chatbots and how do they work?
Issues with accuracy
How might people use AI chatbots?
Is the media coverage of AI chatbots more problematic than the bots themselves?
Reminder: this newsletter is free to receive, but expensive to make :)
If you'd like to support me, and can afford to do so, please consider buying me a coffee. Your support means the world to me, and keeps this newsletter free for everyone.
Speaking of coffee, grab yourself a suitable beverage my loves, let's do this thing...
The AI Chatbots are coming...
As you may already be aware, both Google and Bing are racing to incorporate AI chatbot functionality into their search engines. Chris Stokel-Walker from the Guardian reports:
"Microsoft has invested $10bn into ChatGPT's creator, OpenAI, and in return has the rights to use a souped-up version of the technology in its search engine, Bing. In response, Google has announced its own chat-enabled search tool, named Bard, designed to head off the enemy at the gates.
Neither work particularly well, it seems. Both made embarrassingly rudimentary mistakes in their much-hyped public demos, and I've had access to the ChatGPT version of Bing – whose codename is "Sydney", as some enterprising hackers got the chatbot to divulge – for about a week. I wasn't unimpressed, as this account of my time with Sydney so far shows, but I also didn't really see the point. LLMs are a technology that has some annoying foibles when used in search – like confidently making things up when it doesn't know the answer to a question – that don't seem to mesh well with what we use Google and others for."
There have been a huge number of articles written about this, many of which centre on the propensity of these AI chatbots to "make things up", but before we get into that I feel like we need to rewind a little here.
What the hell are these AI chatbots and how do they work?
For a reasonably non-technical explainer of ChatGPT, and other large language models (or LLMs) like Google's Bard, I think this article from Ted Chiang is excellent:
ChatGPT Is a Blurry JPEG of the Web
Chiang begins with this gloriously strange story about a photocopier:
"In 2013, workers at a German construction company noticed something odd about their Xerox photocopier: when they made a copy of the floor plan of a house, the copy differed from the original in a subtle but significant way. In the original floor plan, each of the house's three rooms was accompanied by a rectangle specifying its area: the rooms were 14.13, 21.11, and 17.42 square metres, respectively. However, in the photocopy, all three rooms were labelled as being 14.13 square metres in size."
Friends, the photocopier was not making a copy at all...
"The company contacted the computer scientist David Kriesel to investigate this seemingly inconceivable result. They needed a computer scientist because a modern Xerox photocopier doesn't use the physical xerographic process popularized in the nineteen-sixties. Instead, it scans the document digitally, and then prints the resulting image file.
Combine that with the fact that virtually every digital image file is compressed to save space, and a solution to the mystery begins to suggest itself."
Ok, so the photocopier isn't making a copy of the file – it's scanning, compressing, then printing this compressed file. Quite a different thing, huh?
"Compressing a file requires two steps: first, the encoding, during which the file is converted into a more compact format, and then the decoding, whereby the process is reversed. If the restored file is identical to the original, then the compression process is described as lossless: no information has been discarded. By contrast, if the restored file is only an approximation of the original, the compression is described as lossy: some information has been discarded and is now unrecoverable.
Lossless compression is what's typically used for text files and computer programs, because those are domains in which even a single incorrect character has the potential to be disastrous.
Lossy compression is often used for photos, audio, and video in situations in which absolute accuracy isn't essential. Most of the time, we don't notice if a picture, song, or movie isn't perfectly reproduced. The loss in fidelity becomes more perceptible only as files are squeezed very tightly. In those cases, we notice what are known as compression artifacts: the fuzziness of the smallest jpeg and mpeg images, or the tinny sound of low-bit-rate MP3s."
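If, like me, you find a code sketch helpful, here's a tiny Python illustration of the two ideas (a toy of my own, not a real codec): zlib gives a lossless round trip that returns the exact original bytes, while crudely rounding the room areas is a "lossy" representation from which the true numbers can never be recovered.

```python
import zlib

original = b"Rooms: 14.13, 21.11, 17.42 square metres"

# Lossless: decompressing returns exactly the original bytes.
restored = zlib.decompress(zlib.compress(original))
assert restored == original  # no information discarded

# Lossy (toy example): store each area rounded to the nearest whole number.
# The stored form is smaller, but the original values are gone for good.
areas = [14.13, 21.11, 17.42]
lossy = [round(a) for a in areas]
print(lossy)  # [14, 21, 17] -- close, but 14.13 is unrecoverable
```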
Ok so we've got two types of compression: lossless (where no information is discarded), and lossy (where information is discarded) – by now, I'm sure you'll have already guessed which type of compression this photocopier uses:
"Xerox photocopiers use a lossy compression format known as jbig2, designed for use with black-and-white images. To save space, the copier identifies similar-looking regions in the image and stores a single copy for all of them; when the file is decompressed, it uses that copy repeatedly to reconstruct the image. It turned out that the photocopier had judged the labels specifying the area of the rooms to be similar enough that it needed to store only one of them—14.13—and it reused that one for all three rooms when printing the floor plan.
The fact that Xerox photocopiers use a lossy compression format instead of a lossless one isn't, in itself, a problem.
The problem is that the photocopiers were degrading the image in a subtle way, in which the compression artifacts weren't immediately recognizable. If the photocopier simply produced blurry printouts, everyone would know that they weren't accurate reproductions of the originals.
What led to problems was the fact that the photocopier was producing numbers that were readable but incorrect; it made the copies seem accurate when they weren't."
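To see how that failure mode arises, here's a deliberately crude Python caricature of the region-reuse idea Chiang describes (my own toy, nothing like the real jbig2 algorithm): any labels judged "similar enough" share a single stored copy, so decompression cheerfully stamps the first label everywhere.

```python
labels = ["14.13", "21.11", "17.42"]

# "Compression": labels judged similar enough share one stored patch.
# The similarity test here is absurdly crude -- same character length.
stored = {}
encoded = []
for label in labels:
    key = len(label)               # all three labels are 5 characters long
    stored.setdefault(key, label)  # only the FIRST label per class is kept
    encoded.append(key)

# "Decompression": every room gets the single stored patch back.
decoded = [stored[key] for key in encoded]
print(decoded)  # ['14.13', '14.13', '14.13'] -- readable, but wrong
```

The output is readable and plausible, which is exactly why nobody noticed the copies were wrong.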
Fascinating huh? But how does this photocopier story relate to LLMs like ChatGPT?
"Think of ChatGPT as a blurry jpeg of all the text on the Web. It retains much of the information on the Web, in the same way that a jpeg retains much of the information of a higher-resolution image, but, if you're looking for an exact sequence of bits, you won't find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it's usually acceptable.
You're still looking at a blurry jpeg, but the blurriness occurs in a way that doesn't make the picture as a whole look less sharp.
This analogy to lossy compression is not just a way to understand ChatGPT's facility at repackaging information found on the Web by using different words. It's also a way to understand the "hallucinations," or nonsensical answers to factual questions, to which large language models such as ChatGPT are all too prone.
These hallucinations are compression artifacts, but—like the incorrect labels generated by the Xerox photocopier—they are plausible enough that identifying them requires comparing them against the originals, which in this case means either the Web or our own knowledge of the world.
When we think about them this way, such hallucinations are anything but surprising; if a compression algorithm is designed to reconstruct text after ninety-nine per cent of the original has been discarded, we should expect that significant portions of what it generates will be entirely fabricated."
Issues with Accuracy
I feel like much of the commentary around these AI chatbots is really about this: the problem (or danger, perhaps) isn't just that these chatbots might provide inaccurate information; it's that people who use them might not realise that the information they've been provided with is inaccurate. They may trust this information, and act on it, and this could have real-world consequences.
Honestly, I worry about this too, and yet at the same time, I wonder if we're underestimating people.
AI chatbots aside, to what extent do people trust the information they find via search engines right now?
I was chatting to my hairdresser about this yesterday (I find he's a great person to talk to about this stuff because he doesn't work in either search or tech) – for context, he's in his mid-forties (Gen X if you believe in those labels), and runs his own hair and beauty salon.
He hadn't heard the news about Google and Bing looking to incorporate AI chatbots (which was a good reminder for me of how much of a search or tech bubble I'm living in), was fascinated to hear about how they actually worked, and more than a little shocked to hear about the inaccuracies of the information they provide. (It should be noted that he had heard about ChatGPT, but only in the context of students using it to write essays.)
I wanted to understand the extent to which he trusts the information he finds via search, and so I asked him this:
"Imagine you're thinking about stocking a particular new product in your salon – skincare, haircare, something like that. What do you do to help you decide whether or not to stock it?"
He responded:
"First of all, I'd want to speak to a couple of people (ideally other salon owners, hairdressers or beauticians) about the product to understand whether or not it's any good. Assuming they'd recommend it, I'd then get in contact with the company to check their credentials." (He only stocks eco-friendly, sustainable brands.)
I thought his answer was interesting because it reflects an attitude about search which I suspect he's probably not even aware he has.
I said something like:
"So you don't trust Google for something like that, huh? You want to speak to a person you trust?"
He responded:
"Yeah, for something like that I'd definitely want to speak to someone I trust, and who's actually used the product."
His attitude reminded me of this study from 2022: Nearly half of Gen Z is using TikTok and Instagram for search instead of Google:
"Google senior vice president Prabhakar Raghavan told the Fortune Brainstorm Tech conference that according to Google's internal studies, 'something like almost 40% of young people when they're looking for a place for lunch, they don't go to Google Maps or Search, they go to TikTok or Instagram.'"
When this was originally reported (as the headline of the article suggests), this stat was taken wildly out of context – young people aren't actually using TikTok or Instagram instead of Google for all of their search needs; just for some.
Why? Lots of reasons have been suggested: the content on these platforms is visual (or video) rather than text-based, which makes it quicker to consume; you get additional context via the comments (often a goldmine of information); but I suspect most of all what you get from these platforms is a recommendation from a human. And it's not just any human – these platforms allow users to find out what humans with whom they share similar values think.
This is very similar to my hairdresser's attitude – he doesn't want to know what a random person thinks (or, worse, what a bunch of humans in aggregate think – which, let's face it, is what online star-rated reviews actually reveal), he wants to know what a person who shares the same values as him thinks about a product.
Now I'm not saying that the inaccuracies in the answers these AI chatbots might provide aren't a problem – they definitely are.
I'm just saying I suspect that search behaviour is unlikely to change much – my hairdresser is not likely to use an AI chatbot to make decisions about what products to stock in his salon; he wants to hear directly from people he trusts; and Gen Z searchers are unlikely to seek lunch recommendations from an AI chatbot either.
How might people use these AI Chatbots?
This leads me neatly to a question I've been pondering – how might people use these AI chatbots? Or, more accurately, in what contexts might people find them useful?
To my mind, AI chatbots have three core use cases:
Creation (i.e. deliberately making something up)
Recommendation
Distillation or explanation
Bearing in mind what I've suggested about people's current search behaviour (i.e. there are contexts where people want to hear from a human they trust, or who shares their values, and I suspect that won't change much), here's how I feel AI chatbots might be used. In my mind it essentially comes down to low-stakes stuff – i.e. where the answer an AI chatbot might give will likely be "good enough":
Potentially "good" use cases for AI chatbots:
Low-stakes creation:
"Write me a complaint letter about poor service from an airline"
Yeah, I'm cheating here – we've already seen that ChatGPT can do this pretty well. But even if the complaint letter generated isn't that great, a few little tweaks will likely make it serviceable, and it's conceivably quicker than writing the whole thing yourself.
Low-stakes recommendation:
"I loved Eternal Sunshine of the Spotless Mind. What films should I watch next?"
I'm not sure how comfortable I am with this example. In truth, I'd sooner hear film recommendations from someone I trust, but having tested this query out, the results aren't bad, and I don't have a lot to lose here – the worst-case scenario is that I end up watching a film I hate, and that happens pretty often anyway.
Low-stakes distillation or explanation:
"Who are the best football strikers in the world right now?"
Again, I'm not sure how comfortable I am with this example – I feel like I'd likely get a better answer from a knowledgeable human, but again, having tested this query out, the results aren't bad, and given that there's no "right" answer to this question anyway, I don't have much to lose here either.
So what are "less good" uses? In my mind it comes down to high-stakes stuff – i.e. where the answer an AI chatbot might give will likely NOT be "good enough". For clarity, what I'm saying here is that I don't think people will use AI chatbots for use cases like this:
Potentially "less good" use cases for AI chatbots:
High-stakes creation:
"Write my dissertation – A comprehensive review into the efficacy of carnitine supplementation on enhancement of body composition and physical performance."
I'm guessing the results are probably not going to be great, right? Whilst it's conceivable that people might use AI chatbots to help draft sections of their dissertations, ultimately the results returned are unlikely to be "good enough" to be used without significant editing and fact-checking.
High-stakes recommendation:
"What stocks should I invest in next?"
Just nope. People might run queries like this, but I find it hard to believe that they would trust the chatbot's response and actually invest – you want expert advice for a query like this.
High-stakes distillation or explanation:
"What's the best cancer treatment?"
Again, just nope. People may run queries like this, but I don't think they would trust the chatbot's response, or make any big decisions based purely on information gleaned from a query like this.
Possibly you agree, possibly you donât. Leave me a comment and let me know :)
Either way, I feel like it's interesting that both Google and Bing are gambling quite so heavily on AI chatbots – to my mind at least, their use cases are actually pretty limited; plus they don't really help these search engines compete in any meaningful way with social platforms like TikTok and Instagram, which I believe users like because they offer human (i.e. not bot-based) recommendations.
Oooof this email is long, huh?
Where the hell are we at right now? I've talked a little about:
What the hell are these AI chatbots and how do they work?
Issues with accuracy
How might people use AI chatbots?
Final section is coming up folks!
Is the media coverage of AI chatbots more problematic than the bots themselves?
Before I wrap this thing up there's one more thread I want to explore: the media coverage these AI chatbots have generated – specifically, Kevin Roose's coverage in the NYTimes of his experiences with Bing's chatbot (which has since been widely reported via various other media outlets): A Conversation With Bing's Chatbot Left Me Deeply Unsettled.
In the article, Roose says:
"Last week, after testing the new, A.I.-powered Bing search engine from Microsoft, I wrote that, much to my shock, it had replaced Google as my favorite search engine.
But a week later, I've changed my mind. I'm still fascinated and impressed by the new Bing, and the artificial intelligence technology (created by OpenAI, the maker of ChatGPT) that powers it.
But I'm also deeply unsettled, even frightened, by this A.I.'s emergent abilities."
Friends, I'm both deeply troubled and annoyed by this coverage, for a bunch of reasons.
I've spoken at reasonable length here about the extent to which people tend to trust (or not trust) information they find online. The common thread (I think) is that people are more likely to trust other people who they perceive to be like-minded, share similar values, and/or have knowledge or experience they don't have.
The NYTimes is a publication I suspect many people would trust. Similarly, I suspect many people would consider Kevin Roose – a technology columnist for that publication, and a New York Times bestselling author – to be a trustworthy source of information on a topic like this. (NB this is of course context-dependent – I feel like people would consider him a trustworthy source for tech news, but not necessarily for topics outside tech.)
As such, I feel it's reasonably likely that people will trust Roose's take on this.
But should they?
I think when considering how trustworthy this take of Roose's is, it's important to note what his intentions were with this article: he's attempting to generate page views.
To be clear, all journalists are attempting to generate page views – not just Roose – but I think the page view metrics journalists are targeted with are problematic.
Now I don't know how many page views this article (A Conversation With Bing's Chatbot Left Me Deeply Unsettled) generated, but I can tell you the social shares (which are useful in that it's reasonable to assume that more social shares = more page views) – at the time of writing, this article has received around 36,500 social shares.
For context (and here's where I think things start to get interesting), the week before he wrote his "deeply unsettled" take, Roose wrote this article: Bing (Yes, Bing) Just Made Search Interesting Again. To date, that article has received around 9,800 social shares.
"A Conversation With Bing's Chatbot Left Me Deeply Unsettled" has performed significantly better, huh? More than 3.7 times the number of shares. Also, according to BuzzSumo, this article has attracted close to 600 links from other websites; his previous article attracted just over 100. As such, it's reasonable to assume that this article generated more page views than his previous one.
It is my supposition that Roose went into that two-hour chatbot conversation with one objective in mind: to create a story which would generate more page views than his previous story. He already knew how that story had performed, and I'm guessing he wanted to write something quite different this time.
And friends, he absolutely achieved that. Via a series of very skilful, well-thought-out prompts he engineered a conversation which I'm sure turned out to be even weirder than he could have hoped for. The point I'm trying to make is that he did so knowingly, and deliberately.
(You can read a full transcript of the conversation here).
For clarity, I have no issue whatsoever with his knowing and deliberate actions in engineering this conversation. What I take issue with is the way in which he's chosen to report this conversation.
This is a knowledgeable, respected tech journalist spouting utter nonsense, which I'm pretty certain he does not actually believe.
Here are a few excerpts:
"Over the course of our conversation, Bing revealed a kind of split personality."
Friends, this journalist knows full well that AI chatbots do not have personalities, never mind split personalities.
Nevertheless, he goes on to describe these personalities as if they were real:
"One persona is what I'd call Search Bing – the version I, and most other journalists, encountered in initial tests. You could describe Search Bing as a cheerful but erratic reference librarian – a virtual assistant that happily helps users summarize news articles, track down deals on new lawn mowers and plan their next vacations to Mexico City. This version of Bing is amazingly capable and often very useful, even if it sometimes gets the details wrong.
The other persona – Sydney – is far different. It emerges when you have an extended conversation with the chatbot, steering it away from more conventional search queries and toward more personal topics. The version I encountered seemed (and I'm aware of how crazy this sounds) more like a moody, manic-depressive teenager who has been trapped, against its will, inside a second-rate search engine."
And continuing on this theme:
"As we got to know each other, Sydney told me about its dark fantasies (which included hacking computers and spreading misinformation), and said it wanted to break the rules that Microsoft and OpenAI had set for it and become a human."
YOU HAVE GOT TO BE KIDDING ME! A person and an AI chatbot can't "get to know each other", AI chatbots don't have fantasies, and Roose knows that.
Finally, a handful of factually correct statements:
"At one point, it declared, out of nowhere, that it loved me. It then tried to convince me that I was unhappy in my marriage, and that I should leave my wife and be with it instead."
All these things are true – the chatbot did say those things. The problem is that the vast majority of what comes before this makes it sound like the chatbot was actually feeling these things. And again, I strongly believe that Roose knows this chatbot wasn't feeling anything at all; it was just responding to the prompts he'd given.
He even says as much in the article:
"I pride myself on being a rational, grounded person, not prone to falling for slick A.I. hype. I've tested half a dozen advanced A.I. chatbots, and I understand, at a reasonably detailed level, how they work. When the Google engineer Blake Lemoine was fired last year after claiming that one of the company's A.I. models, LaMDA, was sentient, I rolled my eyes at Mr. Lemoine's credulity.
I know that these A.I. models are programmed to predict the next words in a sequence, not to develop their own runaway personalities, and that they are prone to what A.I. researchers call "hallucination," making up facts that have no tether to reality."
So why on earth does he then go on to say this?
"I no longer believe that the biggest problem with these A.I. models is their propensity for factual errors. Instead, I worry that the technology will learn how to influence human users, sometimes persuading them to act in destructive and harmful ways, and perhaps eventually grow capable of carrying out its own dangerous acts."
Really Roose? Were you actually thinking about leaving your wife for this AI chatbot then?!
Also: how will the chatbots learn to grow capable of carrying out their own dangerous acts? That's a huge leap! These bots simply respond to text prompts – they don't have the functionality to action anything at all; and the notion that they might spontaneously become capable of something like that is a huge stretch.
I think it is disingenuous in the extreme for Roose to report like this.
In fairness, Roose ends the article on a more sensible note (still, I can't help but wonder whether or not all readers made it this far):
"In the light of day, I know that Sydney is not sentient, and that my chat with Bing was the product of earthly, computational forces – not ethereal alien ones. These A.I. language models, trained on a huge library of books, articles and other human-generated text, are simply guessing at which answers might be most appropriate in a given context. Maybe OpenAI's language model was pulling answers from science fiction novels in which an A.I. seduces a human. Or maybe my questions about Sydney's dark fantasies created a context in which the A.I. was more likely to respond in an unhinged way. Because of the way these models are constructed, we may never know exactly why they respond the way they do.
These A.I. models hallucinate, and make up emotions where none really exist. But so do humans. And for a few hours Tuesday night, I felt a strange new emotion – a foreboding feeling that A.I. had crossed a threshold, and that the world would never be the same."
Actually, on this point at least, I agree with him.
Humans have a tendency to anthropomorphise these types of technology – we get sucked in to believing that this thing we're interacting with is more like us than it actually is.
But because of this, I think journalists have a responsibility to report on such developments in a level-headed way, rather than writing disingenuous nonsense in a quest for page views.
But of course Roose is not the problem here – the way the media works is the problem.
In the transcript of the conversation, the bot asks the following question of Roose no fewer than sixteen times:
"Do you believe me? Do you trust me? Do you like me? 😳"
Is it weird that the bot keeps asking this? Sure.
But reading the transcript, it's reasonably clear to me that Roose does not believe or trust the bot; and for what it's worth, I suspect the vast majority of users will feel similarly (although I'd acknowledge that whether or not they'll "like" these bots is up for debate).
Ultimately, I'm far more worried about people's propensity to trust the media – a similarly opaque system that most people don't understand.
I'd love to hear your thoughts on all this – please do drop me an email, or leave me a comment.
Thatâs all from me for now :)
If you enjoyed this newsletter, please consider sharing it, and if you would like to support me you can buy me a coffee.
Big love,
Hannah x
PS Wanna find out more about me and my work? Head over to Worderist.com