Start Up No.2311: Apple study says chatbots can’t reason, China’s quantum cracking, what Taylor Lorenz did next, and more


Is the Postcode Address File honestly worth £487m? The Royal Mail would like us to believe so. CC-licensed photo by Stuart Orford on Flickr.

You can sign up to receive each day’s Start Up post by email. You’ll need to click a confirmation link, so no spam.


There’s another post coming this week at the Social Warming Substack on Friday at 0845 UK time. Free signup.


A selection of 10 links for you. Double helping, lucky you. I’m @charlesarthur on Twitter. On Threads: charles_arthur. On Mastodon: https://newsie.social/@charlesarthur. Observations and links welcome.


Apple study reveals critical flaws in AI’s logical reasoning abilities • MacRumors

Hartley Charlton:

»

Apple’s AI research team has uncovered significant weaknesses in the reasoning abilities of large language models, according to a newly published study.

The study, published on arXiv, outlines Apple’s evaluation of a range of leading language models, including those from OpenAI, Meta, and other prominent developers, to determine how well these models could handle mathematical reasoning tasks. The findings reveal that even slight changes in the phrasing of questions can cause major discrepancies in model performance that can undermine their reliability in scenarios requiring logical consistency.

Apple draws attention to a persistent problem in language models: their reliance on pattern matching rather than genuine logical reasoning. In several tests, the researchers demonstrated that adding irrelevant information to a question—details that should not affect the mathematical outcome—can lead to vastly different answers from the models.

One example given in the paper involves a simple math problem asking how many kiwis a person collected over several days. When irrelevant details about the size of some kiwis were introduced, models such as OpenAI’s o1 and Meta’s Llama incorrectly adjusted the final total, despite the extra information having no bearing on the solution: “We found no evidence of formal reasoning in language models. Their behavior is better explained by sophisticated pattern matching—so fragile, in fact, that changing names can alter results by ~10%.”

This fragility in reasoning prompted the researchers to conclude that the models do not use real logic to solve problems but instead rely on sophisticated pattern recognition learned during training.

«

These things can’t reason! It’s worth saying again – they’re stochastic parrots. I used ChatGPT the other day because I wanted a list of weather conditions, and separately a list of types of music groups (eg quartet, band, etc). But I’d never trust or expect them to do reasoning about content. People won’t trust the media, with content created by humans who’ve worked at refining their processes for years, but they’ll trust a machine? People are strange.
unique link to this extract


Chinese scientists use quantum computers to crack military-grade encryption — quantum attack poses a “real and substantial threat” to RSA and AES • Tom’s Hardware

Mark Tyson:

»

Chinese researchers claim to have uncovered a “real and substantial threat” to the classical cryptography widely used in banking and the military sectors. According to a report published by the South China Morning Post, the researchers utilized a D-Wave quantum computer to mount the first successful quantum attack on widely used cryptographic algorithms. These algorithms, classed as substitution–permutation network (SPN) cryptographic algorithms, are at the heart of widely used standards like the Rivest-Shamir-Adleman (RSA) and Advanced Encryption Standard (AES).

The Chinese-language research paper is titled Quantum Annealing Public Key Cryptographic Attack Algorithm Based on D-Wave Advantage. The paper outlines how two technical approaches grounded in the quantum annealing algorithm can be used to challenge classical RSA cryptographic security.

The first attack route is “entirely based on D-Wave computers,” explains the paper. It coaxes the Canadian quantum computer into a cryptographic attack by presenting the combination of an optimization problem and exponential space search problem to the computer. The issues are solved using the Ising and QUBO models.

The second proposed attack incorporates classical computing-based cryptographic technology, such as the Schnorr signature algorithm and the Babai rounding technique, layered with a quantum annealing algorithm, to work “beyond the reach of traditional computing methods.”Applying the above techniques, with the help of the D-Wave quantum computer, the team led by Wang Chao of Shanghai University claim to have successfully breached the widely used SPN structure.

«

This feels important, but it’s also pretty impenetrable if you haven’t followed the quantum computing field. And I haven’t.
unique link to this extract


Taylor Lorenz’s plan to dance on legacy media’s grave • The New Yorker

Kyle Chayka:

»

In 2024, [former Verge, Business Insider, Daily Dot, The Atlantic, Washington Post, New York Times journalist Taylor] Lorenz told me, she no longer sees a reason to remain associated with the mainstream media. “I don’t need it for credibility,” she said. “I don’t need it to reach an audience. I don’t know what it does other than connote prestige for a shrinking amount of people.” She added, leaning into exactly the sort of rivalrous drama that plays well online, “Legacy media sucks, it’s crumbling, and, by the way, I’m going to dance on the grave of a lot of these places.”

In some ways, Lorenz’s decision feels belated. Around 2020, a wave of high-profile journalists left traditional outlets to take lucrative deals to launch newsletters at Substack. Lorenz told me that Substack offered her a deal back then, but she turned it down because she felt she needed the imprimatur of an institution to “get more eyes on my work” and persuade people to “take it seriously.” In the years since, the insurgent creator economy has tempted more journalists away to run upstart operations, and consumers have grown increasingly accustomed to paying piecemeal for access to individual writers.

Johnny Harris, formerly a video producer at Vox, developed his own documentary YouTube channel that now has nearly six million subscribers and covers subjects ranging from the criminal investigations of Donald Trump to the threat of China invading Taiwan. In August, the video producer Becca Farsace left the Verge to commit full time to her own YouTube channel, citing the fact that her old employers said she was not guaranteed the rights to content on her social-media channels. “It made me feel like the Verge owned me,” she said in a launch video for her channel, which hosts gadget reviews and now has more than a hundred thousand subscribers. Matthew Yglesias, who decamped to Substack from Vox (a site he co-founded) in 2020, says he has accrued roughly eight-thousand paid subscribers, and, according to Business Insider, he is “likely grossing at least $1.4 million a year.”

«

Like all these systems, a few people are absolutely raking it in, and many, many more are absolutely not.
unique link to this extract


Invisible text that AI chatbots understand and humans can’t? Yep, it’s a thing • Ars Technica

Dan Goodin:

»

What if there was a way to sneak malicious instructions into Claude, Copilot, or other top-name AI chatbots and get confidential data out of them by using characters large language models can recognize and their human users can’t? As it turns out, there was—and in some cases still is.

The invisible characters, the result of a quirk in the Unicode text encoding standard, create an ideal covert channel that can make it easier for attackers to conceal malicious payloads fed into an LLM. The hidden text can similarly obfuscate the exfiltration of passwords, financial information, or other secrets out of the same AI-powered bots. Because the hidden text can be combined with normal text, users can unwittingly paste it into prompts. The secret content can also be appended to visible text in chatbot output.

…To demonstrate the utility of “ASCII smuggling”—the term used to describe the embedding of invisible characters mirroring those contained in the American Standard Code for Information Interchange—researcher and term creator Johann Rehberger created two proof-of-concept (POC) attacks earlier this year that used the technique in hacks against Microsoft 365 Copilot. The service allows Microsoft users to use Copilot to process emails, documents, or any other content connected to their accounts. Both attacks searched a user’s inbox for sensitive secrets—in one case, sales figures and, in the other, a one-time passcode.

When found, the attacks induced Copilot to express the secrets in invisible characters and append them to a URL, along with instructions for the user to visit the link. Because the confidential information isn’t visible, the link appeared benign, so many users would see little reason not to click on it as instructed by Copilot. And with that, the invisible string of non-renderable characters covertly conveyed the secret messages inside to Rehberger’s server. Microsoft introduced mitigations for the attack several months after Rehberger privately reported it. The POCs are nonetheless enlightening.

«

Prompt injection takes many forms; I don’t think SQL injection was vulnerable to it (though control characters could count). New tech, variants of old forms of attack.
unique link to this extract


The Optimus robots at Tesla’s Cybercab event were humans in disguise • The Verge

Wes Davis:

»

Tesla made sure its Optimus robots were a big part of its extravagant, in-person Cybercab reveal last week. The robots mingled with the crowd, served drinks to and played games with guests, and danced inside a gazebo. Seemingly most surprisingly, they could even talk. But it was mostly just a show.

It’s obvious when you watch the videos from the event, of course. If Optimus really was a fully autonomous machine that could immediately react to verbal and visual cues while talking, one-on-one, to human beings in a dimly lit crowd, that would be mind-blowing.

Attendee Robert Scoble posted that he’d learned humans were “remote assisting” the robots, later clarifying that an engineer had told him the robots used AI to walk, spotted Electrek. Morgan Stanley analyst Adam Jonas wrote that the robots “relied on tele-ops (human intervention)” in a note, the outlet reports.

There are obvious tells to back those claims up, like the fact that the robots all have different voices or that their responses were immediate, with gesticulation to match.

It doesn’t feel like Tesla was going out of its way to make anyone think the Optimus machines were acting on their own. In another video that Jalopnik pointed to, an Optimus’ voice jokingly told Scoble that “it might be some” when he asked it how much it was controlled by AI.

«

It’s just so pathetic. Musk has nothing to show, but he feels he needs to show something, so he shows rubbish.
unique link to this extract


“I’m suing the council for £495m because they won’t give me back my bin bag” • Wales Online

Conor Gogarty:

»

A man has filed a court claim against Newport council in a “last resort” to get back almost half a billion pounds’ worth of Bitcoin. A mix-up saw James Howells’ hard drive dumped at a recycling centre in 2013 causing him to lose access to cryptocurrency coins which have since rocketed in value.

WalesOnline has seen a court document that says Mr Howells, 39, is suing the council for £495,314,800 in damages, which was the peak valuation of his 8,000 Bitcoins from earlier this year. But he told us this is not a reflection of “what is really going on” and the point is to “leverage” the council into agreeing to an excavation of its landfill to avoid a legal battle. Mr Howells says he has assembled a team of experts who would carry out the £10million dig at no cost to the council. He is also offering the council 10% of the coins’ value if recovered.

The case is due to be heard in December after what Mr Howells described as more than a decade of being “largely ignored” by the council. “I’m still allocating 10% of the value for the council even though they have been problematic throughout,” he said. “That would be £41m based on today’s rate but in the future it could be hundreds of millions. If they had spoken to me in 2013 this place would look like Las Vegas now. Newport would look like Dubai. That’s the kind of opportunity they’ve missed.”

The hard drive disaster unfolded after a miscommunication between the IT engineer and his then-partner. Mr Howells, who learned about Bitcoin in 2009 by spending time on IT forums, believes he was one of the very first miners of the cryptocurrency. In basic terms he created the 8,000 coins himself and they cost him nothing beyond pennies’ worth of electricity to run his laptop. He stored the private key needed to access the coins on a 2.5in hard drive which he put in a drawer at his home office.

In August 2013 he had a clearout of equipment. Looking through his drawers he came across two hard drives of the same size. One contained the Bitcoin data while the other was blank. Mistakenly he put the Bitcoin one into a black bin liner.

«

Alex Hern’s finest moment (for me) as a reporter was tracking this guy down in the first place, in November 2013, based on a few posts on Reddit. The saga grinds on, it seems.
unique link to this extract


Missing immune cells may explain why COVID-19 vaccine protection quickly wanes • Science

Jon Cohen:

»

Neither vaccinations nor immunity from infections seem to thwart SARS-CoV-2 for long. The frequency of new infections within a few months of a previous bout or a shot is one of COVID-19’s most vexing puzzles. Now, scientists have learned that a little-known type of immune cell in the bone marrow may play a major role in this failure.

The study, which appeared last month in Nature Medicine, found that people who received repeated doses of vaccine, and in some cases also became infected with SARS-CoV-2, largely failed to make special antibody-producing cells called long-lived plasma cells (LLPCs). “That’s really, really interesting,” says Mark Slifka, an immunologist at the Oregon Health & Science University who was not involved with the work. The study authors say their finding may indicate a way to make better COVID-19 vaccines: by altering how they present the spike surface protein of SARS-CoV-2 to a person’s immune cells.

Durability is an age-old bugaboo of vaccine designers. Some vaccines, particularly ones made from weakened versions of viruses, can protect people for decades, even life. Yet others lose effectiveness within months. “We really haven’t overcome this challenge,” says Akiko Iwasaki, a Yale University immunologist who is developing a nasal COVID-19 vaccine she hopes can be given often enough to get around the durability problem.

«

I wondered if this was just about mRNA vaccination; that if you had been infected with SARS-CoV-2 before a vaccination, whether you might have the LLPCs. However:

»

An earlier study of bone marrow from 20 people who had been infected with SARS-CoV-2 but never vaccinated against it also found that they were “deficient” in LLPCs specific to SARS-CoV-2 compared with those for tetanus.

«

unique link to this extract


I wish I went before Mary Shelley in this storytelling contest • McSweeney’s Internet Tendency

Mike Drucker:

»

“‘We will each write a ghost story,’ said Lord Byron; and his proposition was acceded to. There were four of us.” – Mary Shelley, in the introduction to Frankenstein.

Wow, Mary! Wow. Dr. Frankenstein and his monster. I can’t imagine anything more chilling. In fact, it’s so chilling that I think we should probably call off the rest of the storytelling contest right now. I don’t even need to take my turn.

Oh, are you sure?

Still?

Because I kind of wish I had gone first. My thing isn’t even that scary. Or about humankind. It’s just, well, did everyone else do this overnight? Because I feel like Mary Shelley may have pre-written her idea. All I’m saying is it feels pretty fleshed out already. I’m not trying to accuse anyone of anything. It’s just, I thought we were telling stories we came up with in the last twenty-four hours and not workshopping full novel ideas.

No, I didn’t dislike the story. It’s not about a ghost, so it doesn’t fit in the rules laid out by Lord Byron, but I love it! “A Modern Prometheus,” yeah, no, I get it. It’s really smart. And it makes you think about playing God and stuff, even though none of us would be able to play God that way. I know it’s a metaphor, but maybe try a more approachable idea for hubris? Just if you’re trying to pitch this later. I don’t even know that many people who have electricity, so it’s like, who’s going to get the message? Half the people who read this will just want the monster to wave his hands at fire or something.

«

This just gets better and better.
unique link to this extract


TikTok knows its app is harming kids, new internal documents show • NPR

Bobby Allyn, Sylvia Goodman and Dara Kerr:

»

For the first time, internal TikTok communications have been made public that show a company unconcerned with the harms the app poses for American teenagers. This is despite its own research validating many child safety concerns.

The confidential material was part of a more than two-year investigation into TikTok by 14 attorneys general that led to state officials suing the company on Tuesday. The lawsuit alleges that TikTok was designed with the express intention of addicting young people to the app. The states argue the multi-billion-dollar company deceived the public about the risks.

In each of the separate lawsuits state regulators filed, dozens of internal communications, documents and research data were redacted — blacked-out from public view — since authorities entered into confidentiality agreements with TikTok.

But in one of the lawsuits, filed by the Kentucky Attorney General’s Office, the redactions were faulty. This was revealed when Kentucky Public Radio copied-and-pasted excerpts of the redacted material, bringing to light some 30 pages of documents that had been kept secret.

After Kentucky Public Radio published excerpts of the redacted material, a state judge sealed the entire complaint following a request from the attorney general’s office “to ensure that any settlement documents and related information, confidential commercial and trade secret information, and other protected information was not improperly disseminated,” according to an emergency motion to seal the complaint filed on Wednesday by Kentucky officials.

«

unique link to this extract


No, the Postcode Address File should not cost £487m • Peter K Wells

Peter Wells:

»

In 2021 the UK government’s Geospatial Commission prepared a briefing paper for some discussions about address data. It has been released to the journalist James O’Malley after a freedom of information request.

The paper was prepared in response to the long-running campaign asking the government to deliver on political commitments to make the list of UK addresses – and other non-personal geospatial data – freely available. People could then use the data to improve public services or build innovative new businesses.

The paper includes the mind-boggling statement that a government project in 2016 estimated that the cost to the UK government of buying back UK address data [in effect the Postcode Address File, or PAF] from the Royal Mail would be £487m. 

Yes, you read that right. That figure was four hundred and eighty seven million pounds.

It’s a big number and – if true – one that would call into question the whole campaign.

But it doesn’t hold up to critical scrutiny and, unfortunately, the 2021 paper repeats this estimate without questioning it.

The civil service needs to be less credulous when it comes to claims over the financial value of data assets, and the UK government needs some fresh analysis.

«

The PAF generates £30m in revenue and £3m in profit annually. A £487m valuation is 162 P/E and makes the PAF worth 15% of Royal Mail’s entire business. How about the government nationalises it and promises to pay Royal Mail £3m for the next 163 years?
unique link to this extract


• Why do social networks drive us a little mad?
• Why does angry content seem to dominate what we see?
• How much of a role do algorithms play in affecting what we see and do online?
• What can we do about it?
• Did Facebook have any inkling of what was coming in Myanmar in 2016?

Read Social Warming, my latest book, and find answers – and more.


Errata, corrigenda and ai no corrida: There should be a double helping of emails. Apologies. WordPress keeps screwing around with the design of its blog pages, and that screws with my scripts which collate and schedule the posts. Fingers crossed.

1 thought on “Start Up No.2311: Apple study says chatbots can’t reason, China’s quantum cracking, what Taylor Lorenz did next, and more

  1. I want to know if anyone has gotten the movie rights yet for the saga of the Buried Bitcoin Drive. I’ll repost part of a comment I wrote when back this was posted before:

    “This whole thing has the makings of fantastic movie, where there’s an illegal expedition to find the drive, it’s found, and then people keep betraying and murdering each other over it, since it potentially contains a large fortune. With a kicker at the end, after the trail of dead bodies, that the platter is too corroded to recover any data, so in fact the drive is worthless.”

    How do we know that someone on the council doesn’t have a project digging in secret themselves? Or anyone, really? How much would it take to bribe a few people at a landfill ignore some excavation, or maybe better, to fabricate a reason for such an effort (we need to do soil studies, just sift around here and there to determine the porosity – or maybe, doing a study of the migration of waste metals into the groundwater so must sample at various depths)?

    It’s literally a fortune of buried treasure! Surely that’s got to have generated more drama than the purely bureaucratic.

Leave a reply to Seth Finkelstein Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.