Do you believe me? Do you trust me? Do you like me? 😳
Hello there :)
Welcome to issue fifty-eight of Manufacturing Serendipity, where usually you'll find a loosely connected, somewhat rambling collection of the unexpected things I've recently encountered.
However, this fortnight I've found myself disappearing down quite the AI chatbot rabbit hole, and as such, this issue of the newsletter is entirely devoted to that.
I recognise that this might not be your jam, and if that's the case rest assured that normal service (i.e. a rambling collection of stuff) will resume in a fortnight.
For those of you who are trying to decide whether or not to stick around, here's what I'll be covering:
What the hell are these AI chatbots and how do they work?
Issues with accuracy
How might people use AI chatbots?
Is the media coverage of AI chatbots more problematic than the bots themselves?
Reminder: this newsletter is free to receive, but expensive to make :)
If you'd like to support me, and can afford to do so, please consider buying me a coffee. Your support means the world to me, and keeps this newsletter free for everyone.
Speaking of coffee, grab yourself a suitable beverage, my loves, let's do this thing...
The AI Chatbots are coming…
As you may already be aware, both Google and Bing are racing to incorporate AI chatbot functionality into their search engines. Chris Stokel-Walker from the Guardian reports:
"Microsoft has invested $10bn into ChatGPT's creator, OpenAI, and in return has the rights to use a souped-up version of the technology in its search engine, Bing. In response, Google has announced its own chat-enabled search tool, named Bard, designed to head off the enemy at the gates.
Neither work particularly well, it seems. Both made embarrassingly rudimentary mistakes in their much-hyped public demos, and I've had access to the ChatGPT version of Bing – whose codename is "Sydney", as some enterprising hackers got the chatbot to divulge – for about a week. I wasn't unimpressed, as this account of my time with Sydney so far shows, but I also didn't really see the point. LLMs are a technology that has some annoying foibles when used in search – like confidently making things up when it doesn't know the answer to a question – that don't seem to mesh well with what we use Google and others for."
There have been a huge number of articles written about this, many of which centre on the propensity of these AI chatbots to "make things up", but before we get into that I feel like we need to rewind a little here.
What the hell are these AI chatbots and how do they work?
For a reasonably non-technical explainer of ChatGPT, and other large language models (or LLMs) like Google's Bard, I think this article from Ted Chiang is excellent:
ChatGPT Is a Blurry JPEG of the Web:
Chiang begins with this gloriously strange story about a photocopier:
"In 2013, workers at a German construction company noticed something odd about their Xerox photocopier: when they made a copy of the floor plan of a house, the copy differed from the original in a subtle but significant way. In the original floor plan, each of the house's three rooms was accompanied by a rectangle specifying its area: the rooms were 14.13, 21.11, and 17.42 square metres, respectively. However, in the photocopy, all three rooms were labelled as being 14.13 square metres in size."
Friends, the photocopier was not making a copy at all…
"The company contacted the computer scientist David Kriesel to investigate this seemingly inconceivable result. They needed a computer scientist because a modern Xerox photocopier doesn't use the physical xerographic process popularized in the nineteen-sixties. Instead, it scans the document digitally, and then prints the resulting image file.
Combine that with the fact that virtually every digital image file is compressed to save space, and a solution to the mystery begins to suggest itself."
Ok, so the photocopier isn't making a copy of the file – it's scanning, compressing, then printing this compressed file. Quite a different thing, huh?
"Compressing a file requires two steps: first, the encoding, during which the file is converted into a more compact format, and then the decoding, whereby the process is reversed. If the restored file is identical to the original, then the compression process is described as lossless: no information has been discarded. By contrast, if the restored file is only an approximation of the original, the compression is described as lossy: some information has been discarded and is now unrecoverable.
Lossless compression is what's typically used for text files and computer programs, because those are domains in which even a single incorrect character has the potential to be disastrous.
Lossy compression is often used for photos, audio, and video in situations in which absolute accuracy isn't essential. Most of the time, we don't notice if a picture, song, or movie isn't perfectly reproduced. The loss in fidelity becomes more perceptible only as files are squeezed very tightly. In those cases, we notice what are known as compression artifacts: the fuzziness of the smallest jpeg and mpeg images, or the tinny sound of low-bit-rate MP3s."
Ok so we've got two types of compression: lossless (where no information is discarded), and lossy (where information is discarded) – by now, I'm sure you'll have already guessed which type of compression this photocopier uses:
"Xerox photocopiers use a lossy compression format known as jbig2, designed for use with black-and-white images. To save space, the copier identifies similar-looking regions in the image and stores a single copy for all of them; when the file is decompressed, it uses that copy repeatedly to reconstruct the image. It turned out that the photocopier had judged the labels specifying the area of the rooms to be similar enough that it needed to store only one of them – 14.13 – and it reused that one for all three rooms when printing the floor plan.
The fact that Xerox photocopiers use a lossy compression format instead of a lossless one isn't, in itself, a problem.
The problem is that the photocopiers were degrading the image in a subtle way, in which the compression artifacts weren't immediately recognizable. If the photocopier simply produced blurry printouts, everyone would know that they weren't accurate reproductions of the originals.
What led to problems was the fact that the photocopier was producing numbers that were readable but incorrect; it made the copies seem accurate when they weren't."
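(A quick aside for the technically curious: here's a tiny toy sketch in Python of the substitution idea Chiang describes. It's emphatically not real jbig2, just my own illustration. The "encoder" keeps a single stored copy for each group of "similar enough" labels, and the "decoder" stamps that copy back out, so near-matches come back as exact duplicates.)

```python
# Toy sketch of the idea only (not real jbig2): store one copy for each group of
# "similar enough" symbols, then reuse that stored copy when reconstructing.
# The similarity test is deliberately aggressive, which is exactly what produces
# output that is readable but wrong.

def similar(a: str, b: str) -> bool:
    # Crude test: same length, and at least 40% of characters match position-for-position.
    if len(a) != len(b):
        return False
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a) >= 0.4

def compress(labels):
    stored = []   # the single copies kept by the "encoder"
    refs = []     # which stored copy each original label points to
    for label in labels:
        for i, rep in enumerate(stored):
            if similar(label, rep):
                refs.append(i)            # judged "similar enough": reuse the stored copy
                break
        else:
            stored.append(label)          # genuinely new symbol: store it
            refs.append(len(stored) - 1)
    return stored, refs

def decompress(stored, refs):
    return [stored[i] for i in refs]

room_labels = ["14.13", "21.11", "17.42"]
print(decompress(*compress(room_labels)))   # ['14.13', '14.13', '14.13'] – plausible, but wrong
```

Make the similarity test stricter and the labels survive intact; the danger isn't lossy compression itself, it's lossy compression whose mistakes look perfectly trustworthy.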
Fascinating, huh? But how does this photocopier story relate to LLMs like ChatGPT?
"Think of ChatGPT as a blurry jpeg of all the text on the Web. It retains much of the information on the Web, in the same way that a jpeg retains much of the information of a higher-resolution image, but, if you're looking for an exact sequence of bits, you won't find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it's usually acceptable.
You're still looking at a blurry jpeg, but the blurriness occurs in a way that doesn't make the picture as a whole look less sharp.
This analogy to lossy compression is not just a way to understand ChatGPT's facility at repackaging information found on the Web by using different words. It's also a way to understand the "hallucinations," or nonsensical answers to factual questions, to which large language models such as ChatGPT are all too prone.
These hallucinations are compression artifacts, but – like the incorrect labels generated by the Xerox photocopier – they are plausible enough that identifying them requires comparing them against the originals, which in this case means either the Web or our own knowledge of the world.
When we think about them this way, such hallucinations are anything but surprising; if a compression algorithm is designed to reconstruct text after ninety-nine per cent of the original has been discarded, we should expect that significant portions of what it generates will be entirely fabricated."
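(One more toy sketch for the technically curious, since "predicting the next word" comes up again later in this issue: this is my own minimal illustration, nowhere near how a real LLM is built or trained. It learns which word tends to follow which from a scrap of text, then generates something fluent-looking by repeatedly picking a plausible continuation. At no point does anything check whether the output is true.)

```python
import random
from collections import defaultdict

# My own toy illustration (nothing like a real LLM in scale or sophistication):
# learn which word tends to follow which, then generate text by repeatedly
# picking a plausible next word. The result reads fluently, but nothing in the
# process checks whether it is true.
training_text = (
    "the rooms were 14.13 square metres "
    "the rooms were 21.11 square metres "
    "the rooms were 17.42 square metres"
).split()

follows = defaultdict(list)
for current_word, next_word in zip(training_text, training_text[1:]):
    follows[current_word].append(next_word)

random.seed(1)
word = "the"
generated = [word]
for _ in range(6):
    word = random.choice(follows[word])   # pick a likely continuation
    generated.append(word)

print(" ".join(generated))   # e.g. "the rooms were 21.11 square metres the"
```

That's obviously a cartoon version, but it's the same basic move the quotes above describe: predict a plausible continuation, not a verified fact.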
Issues with Accuracy
I feel like much of the commentary around these AI chatbots is really about this – the problem (or danger), perhaps, isn't just that these chatbots might provide inaccurate information; it's that people who use them might not realise that the information they've been provided with is inaccurate – they may trust this information, and act on it, and this could have real-world consequences.
Honestly, I worry about this too, and yet at the same time, I wonder if we're underestimating people.
AI chatbots aside, to what extent do people trust the information they find via search engines right now?
I was chatting to my hairdresser about this yesterday (I find he's a great person to talk to about this stuff because he doesn't work in either search or tech) – for context he's in his mid-forties (Gen X if you believe in those labels), and runs his own hair and beauty salon.
He hadn't heard the news about Google and Bing looking to incorporate AI chatbots (which was a good reminder for me about how much of a search or tech bubble I'm living in), was fascinated to hear about how they actually worked, and more than a little shocked to hear about the inaccuracies of the information they provide. (It should be noted that he had heard about ChatGPT, but only in the context of students using it to write essays.)
I wanted to understand the extent to which he trusts the information he finds via search, and so I asked him this:
"Imagine you're thinking about stocking a particular new product in your salon – skincare, haircare, something like that. What do you do to help you decide whether or not to stock it?"
He responded:
"First of all, I'd want to speak to a couple of people (ideally other salon owners, hairdressers or beauticians) about the product to understand whether or not it's any good. Assuming they'd recommend it, I'd then get in contact with the company to check their credentials (he only stocks eco-friendly, sustainable brands)."
I thought his answer was interesting because it reflects an attitude about search which I suspect he's probably not even aware he has.
I said something like:
"So you don't trust Google for something like that, huh? You want to speak to a person you trust?"
He responded:
"Yeah, for something like that I'd definitely want to speak to someone I trust, and who's actually used the product."
His attitude reminded me of this study from 2022: Nearly half of Gen Z is using TikTok and Instagram for search instead of Google:
"Google senior vice president Prabhakar Raghavan told the Fortune Brainstorm Tech conference that according to Google's internal studies, 'something like almost 40% of young people when they're looking for a place for lunch, they don't go to Google Maps or Search, they go to TikTok or Instagram.'"
When this was originally reported (as the headline of the article suggests), this stat was taken wildly out of context – young people aren't actually using TikTok or Instagram instead of Google for all of their search needs; just for some.
Why? Lots of reasons have been suggested – the content on these platforms is visual (or video) rather than text-based, which makes it quicker to consume; you get additional context via the comments (often a goldmine of information); but I suspect most of all what you get from these platforms is a recommendation from a human. And it's not just any human – these platforms allow users to find out what humans with whom they share similar values think.
This is very similar to my hairdresser's attitude – he doesn't want to know what a random person thinks (or, worse, what a bunch of humans in aggregate think – which, let's face it, is what online star-rated reviews actually reveal), he wants to know what a person who shares the same values as him thinks about a product.
Now I'm not saying that the inaccuracies in the answers that these AI chatbots might provide aren't a problem – they definitely are.
I'm just saying I suspect that search behaviour is unlikely to change much – my hairdresser is not likely to use an AI chatbot to make decisions about what products to stock in his salon; he wants to hear directly from people he trusts; and Gen Z searchers are unlikely to seek lunch recommendations from an AI chatbot either.
How might people use these AI Chatbots?
This leads me neatly to a question I've been pondering – how might people use these AI chatbots? Or, more accurately, in what contexts might people find them useful?
To my mind, AI chatbots have three core use cases:
Creation (i.e. deliberately making something up)
Recommendation
Distillation or explanation
Bearing in mind what I've suggested about people's current search behaviour (i.e. there are contexts where people want to hear from a human they trust, or who shares their values, and I suspect that won't change much), here's how I feel AI chatbots might be used. In my mind it essentially comes down to low-stakes stuff – i.e. where the answer an AI chatbot might give will likely be "good enough":
Potentially "good" use cases for AI chatbots:
Low-stakes creation:
"Write me a complaint letter about poor service from an airline"
Yeah, I'm cheating here – we've already seen that ChatGPT can do this pretty well. But even if the complaint letter generated isn't that great, a few little tweaks will likely make it serviceable, and it's conceivably quicker than writing the whole thing yourself.
Low-stakes recommendation:
"I loved Eternal Sunshine of the Spotless Mind. What films should I watch next?"
I'm not sure how comfortable I am with this example. In truth, I'd sooner hear film recommendations from someone I trust, but having tested this query out, the results aren't bad, and I don't have a lot to lose here – the worst case scenario is that I end up watching a film I hate, and that happens pretty often anyway.
Low-stakes distillation or explanation:
"Who are the best football strikers in the world right now?"
Again, I'm not sure how comfortable I am with this example – I feel like I'd likely get a better answer from a knowledgeable human, but again, having tested this query out, the results aren't bad, and given that there's no "right" answer to this question anyway I don't have much to lose here either.
So what are "less good" uses? In my mind it comes down to high-stakes stuff – i.e. where the answer an AI chatbot might give will likely NOT be "good enough". For clarity, what I'm saying here is that I don't think people will use AI chatbots for use cases like this:
Potentially "less good" use cases for AI chatbots:
High-stakes creation:
"Write my dissertation – A comprehensive review into the efficacy of carnitine supplementation on enhancement of body composition and physical performance."
I'm guessing the results are probably not going to be great, right? Whilst it's conceivable that people might use AI chatbots to help draft sections of their dissertations, ultimately the results returned are unlikely to be "good enough" to be used without significant editing and fact-checking.
High-stakes recommendation:
"What stocks should I invest in next?"
Just nope. People might run queries like this but I find it hard to believe that they would trust the chatbot's response and actually invest – you want expert advice for a query like this.
High-stakes distillation or explanation:
"What's the best cancer treatment?"
Again, just nope. People may run queries like this but I don't think they would trust the chatbot's response, or make any big decisions based purely on any information gleaned from a query like this.
Possibly you agree, possibly you don't. Leave me a comment and let me know :)
Either way, I feel like it's interesting that both Google and Bing are gambling quite so heavily on AI chatbots – to my mind at least, their use cases are actually pretty limited; plus they don't really help these search engines compete in any meaningful way with the social platforms like TikTok and Instagram, which I believe users like because they offer human (i.e. not bot-based) recommendations.
Oooof this email is long, huh?
Where the hell are we at right now? I've talked a little about:
What the hell are these AI chatbots and how do they work?
Issues with accuracy
How might people use AI chatbots?
Final section is coming up folks!
Is the media coverage of AI chatbots more problematic than the bots themselves?
Before I wrap this thing up there's one more thread I want to explore: the media coverage these AI chatbots have generated – specifically, Kevin Roose's coverage in the NYTimes about his experiences with Bing's chatbot (which has since been widely reported by various other media outlets): A Conversation With Bing's Chatbot Left Me Deeply Unsettled.
In the article, Roose says:
"Last week, after testing the new, A.I.-powered Bing search engine from Microsoft, I wrote that, much to my shock, it had replaced Google as my favorite search engine.
But a week later, I've changed my mind. I'm still fascinated and impressed by the new Bing, and the artificial intelligence technology (created by OpenAI, the maker of ChatGPT) that powers it.
But I'm also deeply unsettled, even frightened, by this A.I.'s emergent abilities."
Friends, I'm both deeply troubled and annoyed by this coverage for a bunch of reasons.
I've spoken at reasonable length here about the extent to which people tend to trust (or not trust) information they find online. The common thread (I think) is that people are more likely to trust other people who they perceive to be like-minded, share similar values, and/or have knowledge or experience they don't have.
The NYTimes is a publication I suspect many people would trust. Similarly, I suspect many people would consider Kevin Roose, a technology columnist for this publication and a New York Times bestselling author, to be a trustworthy source of information for a topic like this. (NB this is of course context-dependent – I feel like people would consider him a trustworthy source for tech news, but not necessarily such a trustworthy source for topics other than tech.)
As such, I feel it's reasonably likely that people will trust Roose's take on this.
But should they?
I think when considering how trustworthy this take of Roose's is, it's important to note what his intentions were with this article: he's attempting to generate page views.
To be clear, all journalists are attempting to generate page views – not just Roose; but I think these page view metrics which journalists are targeted with are problematic.
Now I don't know how many page views this article (A Conversation With Bing's Chatbot Left Me Deeply Unsettled) generated, but I can tell you the social shares (which are useful in that it's reasonable to assume that more social shares = more page views) – at the time of writing, this article has received around 36,500 social shares.
For context (and here's where I think things start to get interesting), the week before he wrote his "deeply unsettled" take, Roose wrote this article: Bing (Yes, Bing) Just Made Search Interesting Again. To date, that article has received around 9,800 social shares.
"A Conversation With Bing's Chatbot Left Me Deeply Unsettled" has performed significantly better, huh? More than 3.7 times the number of shares. Also, according to BuzzSumo, this article has attracted close to 600 links from other websites. His previous article attracted just over 100 links. As such, it's reasonable to assume that this article generated more page views than his previous one.
It is my supposition that Roose went into that 2-hour chatbot conversation with one objective in mind – to create a story which would generate more page views than his previous story. He already knew how his previous story had performed, and I'm guessing he wanted to write something quite different this time.
And friends, he absolutely achieved that. Via a series of very skillful, well-thought-out prompts he engineered a conversation which I'm sure turned out to be even weirder than he could have hoped for. The point I'm trying to make is that he did so knowingly, and deliberately.
(You can read a full transcript of the conversation here).
For clarity, I have no issue whatsoever with his knowing and deliberate actions in engineering this conversation. What I take issue with is the way in which he's chosen to report this conversation.
This is a knowledgeable, respected tech journalist spouting utter nonsense, which I'm pretty certain he does not actually believe.
Here are a few excerpts:
"Over the course of our conversation, Bing revealed a kind of split personality."
Friends, this journalist knows full well that AI chatbots do not have personalities, never mind split personalities.
Nevertheless, he goes on to describe these personalities as if they were real:
"One persona is what I'd call Search Bing – the version I, and most other journalists, encountered in initial tests. You could describe Search Bing as a cheerful but erratic reference librarian – a virtual assistant that happily helps users summarize news articles, track down deals on new lawn mowers and plan their next vacations to Mexico City. This version of Bing is amazingly capable and often very useful, even if it sometimes gets the details wrong.
The other persona – Sydney – is far different. It emerges when you have an extended conversation with the chatbot, steering it away from more conventional search queries and toward more personal topics. The version I encountered seemed (and I'm aware of how crazy this sounds) more like a moody, manic-depressive teenager who has been trapped, against its will, inside a second-rate search engine."
And continuing on this theme:
"As we got to know each other, Sydney told me about its dark fantasies (which included hacking computers and spreading misinformation), and said it wanted to break the rules that Microsoft and OpenAI had set for it and become a human."
YOU HAVE GOT TO BE KIDDING ME! A person and an AI chatbot can't "get to know each other", AI chatbots don't have fantasies, and Roose knows that.
Finally a handful of factually correct statements:
"At one point, it declared, out of nowhere, that it loved me. It then tried to convince me that I was unhappy in my marriage, and that I should leave my wife and be with it instead."
All these things are true – the chatbot did say those things. The problem is the vast majority of what comes before this makes it sound like the chatbot was actually feeling these things. And again, I strongly believe that Roose knows that this chatbot wasn't feeling anything at all; it was just responding to the prompts he'd given.
He even says as much in the article:
"I pride myself on being a rational, grounded person, not prone to falling for slick A.I. hype. I've tested half a dozen advanced A.I. chatbots, and I understand, at a reasonably detailed level, how they work. When the Google engineer Blake Lemoine was fired last year after claiming that one of the company's A.I. models, LaMDA, was sentient, I rolled my eyes at Mr. Lemoine's credulity.
I know that these A.I. models are programmed to predict the next words in a sequence, not to develop their own runaway personalities, and that they are prone to what A.I. researchers call "hallucination," making up facts that have no tether to reality."
So why on earth does he then go on to say this?
"I no longer believe that the biggest problem with these A.I. models is their propensity for factual errors. Instead, I worry that the technology will learn how to influence human users, sometimes persuading them to act in destructive and harmful ways, and perhaps eventually grow capable of carrying out its own dangerous acts."
Really Roose? Were you actually thinking about leaving your wife for this AI chatbot then?!
Also: how would these chatbots grow capable of carrying out their own dangerous acts? That's a huge leap! These bots simply respond to text prompts – they don't have the functionality to action anything at all – and the notion that they might spontaneously become capable of something like that is a huge stretch.
I think it is disingenuous in the extreme for Roose to report like this.
In fairness, Roose ends the article on a more sensible note (still, I can't help but wonder whether or not all readers made it this far):
"In the light of day, I know that Sydney is not sentient, and that my chat with Bing was the product of earthly, computational forces – not ethereal alien ones. These A.I. language models, trained on a huge library of books, articles and other human-generated text, are simply guessing at which answers might be most appropriate in a given context. Maybe OpenAI's language model was pulling answers from science fiction novels in which an A.I. seduces a human. Or maybe my questions about Sydney's dark fantasies created a context in which the A.I. was more likely to respond in an unhinged way. Because of the way these models are constructed, we may never know exactly why they respond the way they do.
These A.I. models hallucinate, and make up emotions where none really exist. But so do humans. And for a few hours Tuesday night, I felt a strange new emotion – a foreboding feeling that A.I. had crossed a threshold, and that the world would never be the same."
Actually, on this point at least, I agree with him.
Humans have a tendency to anthropomorphise these types of technology – we get sucked into believing that this thing we're interacting with is more like us than it actually is.
But because of this, I think journalists have a responsibility to report on such developments in a level-headed way, rather than write disingenuous nonsense in a quest for page views.
But of course Roose is not the problem here – the way the media works is the problem.
In the transcript of the conversation, the bot asks the following question of Roose no fewer than sixteen times:
"Do you believe me? Do you trust me? Do you like me? 😳"
Is it weird that the bot keeps asking this? Sure.
But reading the transcript it's reasonably clear to me that Roose does not believe or trust the bot; and for what it's worth, I suspect the vast majority of users will feel similarly (although I'd acknowledge that whether or not they'll "like" them is up for debate).
Ultimately, I'm far more worried about people's propensity to trust the media, a similarly opaque system that most people don't understand.
I'd love to hear your thoughts on all this, please do drop me an email, or leave me a comment.
That's all from me for now :)
If you enjoyed this newsletter, please consider sharing it, and if you would like to support me you can buy me a coffee.
Big love,
Hannah x
PS Wanna find out more about me and my work? Head over to Worderist.com