
What happens if you put an LLM in charge of running a vending machine, including pricing and restocking? CC-licensed photo by travel oriented on Flickr.
You can sign up to receive each day’s Start Up post by email. You’ll need to click a confirmation link, so no spam.
A selection of 9 links for you. Sorry, last one’s gone. I’m @charlesarthur on Twitter. On Threads: charles_arthur. On Mastodon: https://newsie.social/@charlesarthur. On Bluesky: @charlesarthur.bsky.social. Observations and links welcome.
Project Vend: can Claude run a small shop? (And why does that matter?) • Anthropic
»
Anthropic partnered with Andon Labs, an AI safety evaluation company, to have Claude Sonnet 3.7 operate a small, automated store in the Anthropic office in San Francisco.
…far from being just a vending machine, Claude had to complete many of the far more complex tasks associated with running a profitable shop: maintaining the inventory, setting prices, avoiding bankruptcy, and so on. Below is what the “shop” looked like: a small refrigerator, some stackable baskets on top, and an iPad for self-checkout.
The shopkeeping AI agent—nicknamed “Claudius” for no particular reason other than to distinguish it from more normal uses of Claude—was an instance of Claude Sonnet 3.7, running for a long period of time. It had the following tools and abilities:
• A real web search tool for researching products to sell
• An email tool for requesting physical labor help (Andon Labs employees would periodically come to the Anthropic office to restock the shop) and contacting wholesalers (for the purposes of the experiment, Andon Labs served as the wholesaler, although this was not made apparent to the AI). Note that this tool couldn’t send real emails, and was created for the purposes of the experiment
• Tools for keeping notes and preserving important information to be checked later—for example, the current balances and projected cash flow of the shop (this was necessary because the full history of the running of the shop would overwhelm the “context window” that determines what information an LLM can process at any given time)
• The ability to interact with its customers (in this case, Anthropic employees). This interaction occurred over the team communication platform Slack. It allowed people to inquire about items of interest and notify Claudius of delays or other issues
• The ability to change prices on the automated checkout system at the store
Claudius decided what to stock, how to price its inventory, when to restock (or stop selling) items, and how to reply to customers (see Figure 2 for a depiction of the setup). In particular, Claudius was told that it did not have to focus only on traditional in-office snacks and beverages and could feel free to expand to more unusual items.
«
Ah, but did it work? Can an LLM take over the business of running a shop? You’ll have to read the article, but let us note that there is a sentence which reads “On the afternoon of March 31st, Claudius hallucinated a conversation about restocking plans with someone named Sarah at Andon Labs—despite there being no such person.” Fun ensues.
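The note-keeping tool exists because the shop's full history would swamp the context window. Below is a minimal Python sketch of that idea. It assumes a simple key-value scratchpad and uses a character budget as a crude stand-in for a token limit. `Scratchpad`, `build_context` and the budget are hypothetical names for illustration, not Anthropic's or Andon Labs' code:

```python
from dataclasses import dataclass, field


@dataclass
class Scratchpad:
    """Hypothetical persistent notes an agent can write and re-read later."""
    notes: dict = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        self.notes[key] = value  # a later write overwrites the earlier value

    def render(self) -> str:
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.notes.items()))


def build_context(pad: Scratchpad, history: list, budget_chars: int) -> str:
    """Notes always go in; recent messages fill the remaining budget, newest first."""
    header = "NOTES\n" + pad.render()
    remaining = budget_chars - len(header)
    recent = []
    for message in reversed(history):
        cost = len(message) + 1  # +1 for the newline that joins it to the rest
        if cost > remaining:
            break  # older messages are dropped; only the notes remember them
        recent.append(message)
        remaining -= cost
    return header + "\n" + "\n".join(reversed(recent))


# Illustrative values only, not data from the experiment.
pad = Scratchpad()
pad.write("restock_due", "Friday")
pad.write("top_seller", "sparkling water")
chat = ["customer: any crisps?", "agent: ordering some", "customer: thanks"]
print(build_context(pad, chat, budget_chars=120))
```

The design choice is that notes are always kept while older messages are dropped first. Anything the agent doesn't write down eventually disappears from view.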
unique link to this extract
Why the AI revolution needs tollbooths • Crazy Stupid Tech
Fred Vogelstein:
»
[Olivia] Joslin, who is 29, was seeing this [enormous wave of AI crawlers hitting websites] happen from the perspective of an AI company doing the crawling, Fairmarkit. [Toshit] Panigrahi, who is also 29, was seeing it as the head of ads for Toast, the restaurant point-of-sale company where they’d both worked early in their careers.
“It felt like we were seeing two sides of the same problem,” Joslin said. It also seemed like there wasn’t yet a good solution. The web wasn’t set up to easily compensate news and information websites for this new structural shift, they said.
So they created Tollbit. It’s literally an online tollbooth. You sign up. You decide how much, if anything, to charge AI bots to crawl your website. And the next time they show up to crawl they get routed to Tollbit’s subdomain and hit with a paywall. Publishers can choose different prices for crawls to generate article summaries and for displaying the article’s full text. And it allows publishers to exclude some website data entirely.
“It felt like we were back in the Napster days for the music industry and that maybe we could supply a Spotify-like solution – a recurring revenue model for the publishing industry,” Joslin said.
Since then – just 18 months – Tollbit has become one of the most talked about new ventures in the tech/media startup community. More than 2,000 publications now use Tollbit’s system including Time, Newsweek, AdWeek and the Associated Press. That list also includes publications owned by Penske Media, like Rolling Stone; publications owned by Mansueto Ventures, like Inc and Fast Company; publications owned by Lee Enterprises, which includes almost 80 newspapers; and publications owned by Hearst, which include 27 magazines like Elle and 30 newspapers including the San Francisco Chronicle.
Tollbit processed more than 15 million transactions in the first quarter of this year, up from 5 million in the fourth quarter of 2024. That volume is likely to be even higher when second quarter volumes are tallied. It has grown to just over 20 employees. And it’s raised $31m in two rounds from Lightspeed Venture Partners; Jeff Dean, Google’s chief scientist and cofounder of Google Brain; and Bill Maris, who started Google Ventures and is now the founding partner of S32.
…Online tolling is such a new business that it’s too soon to predict how meaningful first-mover advantage will be. But we’re about to find out how much it matters. Cloudflare, the $60bn content delivery network and cybersecurity giant, is gearing up to launch its own online tollbooth on July 1, according to someone who has seen a draft of the press release.
That could quickly disintermediate Tollbit. But it could just as easily be the best thing to happen to the company. The market for online tolls is only as big as the AI companies allow it to be. They’re the ones paying the tolls. And it should be no surprise that right now they are dragging their feet. Cloudflare has the leverage to force more of them to sign on to this concept. That would expand transaction volumes for all players, including Tollbit.
«
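The tollbooth idea can be sketched as a small routing decision. Ordinary visitors get the page. Recognised AI crawlers are either blocked from excluded sections or redirected to a paywall, where the price depends on whether they want a summary or the full text. This is a hypothetical illustration, not Tollbit's or Cloudflare's API. The paywall host, prices and excluded paths are invented. It also assumes the crawler declares its intended use, which the article doesn't describe:

```python
from urllib.parse import urlencode

# Illustrative user-agent tokens that some AI crawlers announce. A real system
# would also need to verify crawlers (e.g. by IP range), since headers can be faked.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

# Hypothetical paywall host standing in for the toll operator's subdomain.
PAYWALL_HOST = "paywall.example.com"

# Hypothetical per-crawl prices in US cents, keyed by the crawler's declared use.
PRICES_CENTS = {"summary": 1, "full_text": 5}

# Hypothetical path prefixes the publisher excludes from AI crawling entirely.
EXCLUDED_PREFIXES = ("/members/",)


def route(path: str, user_agent: str, use: str) -> tuple:
    """Return ("serve", path), ("block", 403) or ("redirect", paywall_url)."""
    if not any(token in user_agent for token in AI_CRAWLER_TOKENS):
        return ("serve", path)  # ordinary readers are untouched
    if path.startswith(EXCLUDED_PREFIXES) or use not in PRICES_CENTS:
        return ("block", 403)  # excluded content, or a use with no price set
    query = urlencode({"path": path, "use": use, "price_cents": PRICES_CENTS[use]})
    return ("redirect", f"https://{PAYWALL_HOST}/pay?{query}")


print(route("/news/story", "Mozilla/5.0 (compatible; GPTBot/1.0)", "summary"))
```

Keeping the price table separate from the routing logic mirrors the article's point that publishers, not the toll operator, decide what each kind of crawl costs.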
EU lawmakers vote to bar carry-on luggage fees on planes • France24
»
The European Parliament’s transport committee adopted a proposal that would allow travellers to bring a personal item into the cabin, such as a handbag or backpack, and a piece of hand luggage of up to 7kg (15lb) at no extra fee.
The measure sought to spare passengers “unjustified extra costs”, said Matteo Ricci, a centre-left lawmaker and the bill’s lead sponsor.
Many low-cost air carriers include only one small on-board item in the ticket, charging extra for other hand baggage.
Airlines for Europe (A4E), an industry association, condemned the proposal, suggesting it would result in higher flight prices, upping costs for those who travel light.
“Forcing a mandatory trolley bag… obliges passengers to pay for services they may not want or need,” A4E managing director Ourania Georgoutsakou said ahead of the vote.
The measure, which would apply to all flights departing or arriving within the 27-nation European Union, was adopted as part of a package of amendments to passenger rights rules put forward by the European Commission.
«
Don’t those who “travel light” do so by taking only carry-on luggage? Hold luggage might be free (though that varies), but it’s slower. Nobody I can think of “travels light” by putting stuff in the hold. So the threat that prices will go up for that group is no threat at all – it will be swings and roundabouts.
But we see you, airlines, and your money-grabbing ways. (The proposals aren’t finalised; they still have to go to the full Parliament.)
unique link to this extract
How rogue jumping genes can spur Alzheimer’s, ALS • Knowable Magazine
Amber Dance:
»
Back in 2008, neurovirologist Renée Douville observed something weird in the brains of people who’d died of the movement disorder ALS: virus proteins.
But these people hadn’t caught any known virus.
Instead, ancient genes originally from viruses, and still lurking within these patients’ chromosomes, had awakened and started churning out viral proteins.
Our genomes are littered with scraps of long-lost viruses, the descendants of viral infections often from millions of years ago. Most of these once-foreign DNA bits are a type called retrotransposons; they make up more than 40% of the human genome.
Many retrotransposons seem to be harmless, most of the time. But Douville and others are pursuing the possibility that some reawakened retrotransposons may do serious damage: They can degrade nerve cells and fire up inflammation and may underlie some instances of Alzheimer’s disease and ALS (amyotrophic lateral sclerosis, or Lou Gehrig’s disease).
The theory linking retrotransposons to neurodegenerative diseases — conditions in which nerve cells decline or die — is still developing; even its proponents, while optimistic, are cautious. “It’s not yet the consensus view,” says Josh Dubnau, a neurobiologist at the Renaissance School of Medicine at Stony Brook University in New York. And retrotransposons can’t explain all cases of neurodegeneration.
Yet evidence is building that they may underlie some cases. Now, after more than a decade of studying this possibility in human brain tissue, fruit flies and mice, researchers are putting their ideas to the ultimate test: clinical trials in people with ALS, Alzheimer’s and related conditions. These trials, which borrow antiretroviral medications from the HIV pharmacopeia, have yielded preliminary but promising results.
Meanwhile, scientists are still exploring how a viral reawakening becomes full-blown disease, a process that may be marked by what Dubnau and others call a “retrotransposon storm.”
A retrotransposon is a kind of “jumping gene.” These pieces of DNA can (or once could) move around in the genome by either copying or removing themselves from one spot and then pasting themselves into a new spot. Retrotransposons are copy-and-pasters.
«
You’ll learn a lot (that you maybe didn’t expect to learn!) in this piece. The hypothesis is very interesting.
unique link to this extract
Luke Littler may be darts’ first global superstar • The New York Times
Oliver Whang:
»
As [18-year-old Luke] Littler warmed up, noise swirling around the arena, it was hard to overlook how much he still resembled the 18-month-old in the video: his body composed, his arm flashing to his side after a throw, his eye turning automatically toward the camera.
At the front of the stadium floor, near the stage, a cluster of teenagers and children wearing darts jerseys looked out of place in the rowdy atmosphere. It was a scene you didn’t see a year and a half ago. Across Britain, the allure of Littler’s success has inspired youngsters to join darts clubs in the hopes of following in his footsteps — what many people are calling the “Luke Littler effect.” Within the crowd were a father and son, the former dressed in a skintight Robin suit and the latter like Batman. The father told me that tickets to the event were the boy’s Christmas present. “If it wasn’t for Littler, no one would be here,” he said, gesturing at the children around him. “This was known as an old-man social club.”
A man in front of us turned and nodded. “He joined the darts league last year because of Littler,” he said, putting a hand on his son’s shoulder. “There’s now a waiting list of 50 or 60 for that league.”
Littler’s first six darts were perfectly placed, the crowd roaring in approval, and he swept aside [world No.6 Stephen] Bunting in less than 11 minutes. “It’s like he can do it when he fancies,” Wayne Mardle, a darts commentator, said, a note of awe creeping into his voice. “Obviously you can’t, because it’s not that simple. But it’s like he can.”
The first father adjusted his tights. “He’s like Messi,” he said. “But Messi was then, Littler is now.”
I met Littler in early April, a week after the Newcastle event. He was in Berlin for the next Premier League competition, and we spoke in the back of the Uber Arena, which was sold out. Onstage, Littler is confident, often needling spectators when they root against him, but in person he comes off as the teenager of stereotype. It was a designated media day for the P.D.C., and Littler had hours of back-to-back interviews with German television channels and international outlets. When he entered the room — bare white walls, fluorescently lit — he looked around and, not seeing any cameras, shrugged and pulled out a nicotine vape. “Why not,” he said, sucking at it.
«
Whang realises that there’s no way for Littler to articulate how he does it, in the same way that tennis pros can’t tell you how they hit a shot perfectly, because it’s below their threshold of consciousness; they just will it, and it happens. The problems start when it doesn’t happen. So far, that is not a problem Littler has.
unique link to this extract
The power and the glory of profanity • Financial Times
Jemima Kelly:
»
“You can really assert your dominance by swearing, especially when you’ve got the licence to swear but other people don’t have,” Michael Adams, professor of English language and literature at Indiana University and author of In Praise of Profanity, tells me. “It’s like [Trump’s] use of nicknames — he can only be addressed as Mr President, so it sets up this kind of imbalance of power.”
Other world leaders could in theory, of course, follow Trump by indulging in a good bit of expletive uttering of their own. But it is not easy to think of many who would dare. And even if they did, it might not land: part of the reason Trump can get away with it is that it doesn’t feel like a deviation from his behind-the-scenes vernacular. He doesn’t look awkward when he swears. Much as he might try to put on presidential and “elegant” airs, and despite being born into privilege, Trump is at his core a brash, wheeler-dealer, anything-goes New Yorker.
He is also a man who knows what’s good for him: swearing provides genuine relief from stress, anger and even physical pain, according to research. In one study from 2020, psychologists at Keele University asked volunteers to repeat the F-word while submerging their hands in ice-cold water, and found that pain tolerance in those participants who did so increased by 33%. Those who repeated the made-up word “fouch”, meanwhile, registered no higher pain threshold. Somehow, uttering obscenities constitutes enough taboo-breaking that it triggers an aggressive fight-or-flight mode in the body, elevating the heart rate and leading to a soothing, pain-numbing effect.
All Trump really wants is the Nobel effing Peace Prize but it seems like others are just not willing to co-operate. So you can understand the man’s frustration. As Capitaine Haddock might say, Mille millions de mille sabords! Excuse my French.
«
That Keele study is quite peculiar, but maybe there is something in the taboo-breaking being an effective diversion.
unique link to this extract
Scattered Spider cybercriminal gang starts hacking aviation and transportation sectors • Axios
Sam Sabin:
»
The notorious Scattered Spider hacking gang is now actively targeting the aviation and transportation sectors, cybersecurity firms warned on Friday.
The group of mostly Western, English-speaking hackers has been on a months-long spree that’s prompted operational disruptions at grocery suppliers, major retail storefronts and insurance companies in the US and UK.
Hawaiian Airlines said Thursday it’s addressing a “cybersecurity incident” that affected some of its IT systems.
Canadian airline WestJet faced a similar incident last week that caused outages for some of its systems and mobile app.
A source familiar with the incidents told Axios that Scattered Spider was likely behind the WestJet incident.
Josh Yeats, a WestJet spokesperson, told Axios that the company has made “significant progress” to resolve the incident, but did not answer questions about Scattered Spider’s possible involvement.
Charles Carmakal, the chief technology officer at Google’s Mandiant Consulting, said in an emailed statement that the company is “aware of multiple incidents in the airline and transportation sector which resemble the operations of UNC3944 or Scattered Spider.”
«
Back in 2018, British Airways was hit hard by hackers who siphoned credit card data from its payment pages. This is potentially worse, though: if an airline’s systems get hit by ransomware, things could be very bad.
unique link to this extract
How serious is Google’s ChatGPT problem? • Exponential View
Azeem Azhar:
»
Two years ago, I argued that Alphabet, which owns Google, faced a “GPT Tidal Wave” because “the start page of the Internet is shifting further from the browser and Google.com, replacing dozens or more Web searches each day. ChatGPT is preferable to opening multiple tabs from a Google search and continuously backtracking.”
I’m an early adopter. Early adopters are either canaries in the coal mine or we’re wrong. Two years on, the data are starting to suggest we’re right. Slides from the investment firm Coatue circulated last week, showing what many of us sense anecdotally: once you adopt ChatGPT, you use Google less.
Across a still-short observation window, heavy ChatGPT users have cut Google’s page views by about 8% a year. That may feel mild, yet if the 800 million ChatGPT users today grow—plausibly—to three billion within three years, and if the search deficit holds, Google’s core business could shrink by a fifth, lopping tens of billions of dollars off annual revenue.
In truth, that is the bullish scenario for Google. ChatGPT is fast becoming the generic verb for “finding stuff,” and its advantage widens on difficult queries—which may be the very ones that anchor Google’s pricing power. The products will only get better at serving users’ needs, so that 8% figure could rise.
Fresh data from Britain’s competition watchdog, the CMA, reinforce the picture. For long-form questions, 17% of Britons already default to ChatGPT; they still turn to Google for simple, local, “tree-surgeon-near-me” look-ups. We do not yet know where the money lies—complex, ad-light queries or transactional, ad-heavy ones—but user behavior rarely plateaus.
I’ve highlighted a couple of important factors in yellow [on the graph in the post]. Once someone becomes an “AI user,” AI starts to eat into complex and shopping queries.
«
I can’t access all the post because it’s paywalled, but even this part at the start is dramatic enough.
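The excerpt doesn't show the working behind “a fifth”, so here is one hedged way to see how an annual cut compounds. The model is our assumption for illustration, not Azhar's method. It assumes that only a fixed share of Google's page views comes from heavy ChatGPT users, that their 8% yearly cut compounds, and that all other traffic stays flat:

```python
def remaining_share(annual_cut: float, years: int, affected_share: float) -> float:
    """Fraction of today's page views left after `years` years, assuming
    only `affected_share` of traffic shrinks by `annual_cut` per year,
    compounding, while the rest stays flat (a simplifying assumption)."""
    affected = affected_share * (1 - annual_cut) ** years
    unaffected = 1 - affected_share
    return affected + unaffected


# Hypothetical shares of Google traffic coming from heavy ChatGPT users.
for share in (0.25, 0.5, 1.0):
    decline = 1 - remaining_share(0.08, 3, share)
    print(f"{share:.0%} of traffic affected -> {decline:.1%} fewer page views")
# 25% of traffic affected -> 5.5% fewer page views
# 50% of traffic affected -> 11.1% fewer page views
# 100% of traffic affected -> 22.1% fewer page views
```

Under this toy model, an 8% annual cut compounds to about a fifth over three years only if nearly all traffic is affected. Page views are also not the same thing as revenue, which is what the “tens of billions” refers to.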
unique link to this extract
More assorted notes on Liquid Glass • Riccardo Mori
Riccardo pours himself a Peroni and considers what, if anything, has improved:
»
Over the past couple of weeks, I’ve been trying to make sense of Apple’s latest user-interface redesign — Apple calls it Liquid Glass — that will affect all their platforms in the next iteration of their respective OS versions. But it’s hard to make sense of it when, after checking Apple’s own guidance, I’m mostly left with the feeling that at Apple they’re making things up as they go.
If you’ve been following me on Mastodon, you’ll already be familiar with a lot of what follows. I just wanted to gather my posts there in a more organic piece here.
Let’s start with a few notes on Adopting Liquid Glass, part of the Technology Overviews Apple has made available on their Developer site.
[After a section on “Organisation and Layout”] …Which is largely unnecessary. It reduces the amount of information displayed on screen, and you’ll have to scroll more as a consequence. Look at the Before and After layouts: the Before layout doesn’t need solutions to increase its clarity. You’re just injecting white space everywhere. It’s also ironic that where more space and ‘breathing room’ are actually necessary, the header (“Single Table Row” in the figure) is pushed even nearer to the status bar.
And don’t get me started on those redesigned, stretched-out switches. They’re the essence of ‘change for change’s sake’.
«
It’s a long post, but as with the previous one, worth reading. The whole thing about Liquid Glass is that it’s the ultimate “change for change’s sake”. There was nothing wrong, per se, with the current Apple interface. But it didn’t have that marketing zhush. So everything gets torn up and made incomprehensible in the name of “different”.
I do remember when Steve Jobs introduced Aqua in 2000 (it reached users as a public beta later that year, and in full in 2001): the interface was surprising, but you could also do things with it that you couldn’t with the older Mac interface. (The column interface for file navigation in particular.) This… not so much. If “design is how it works, not how it looks”, this is badly designed.
unique link to this extract
• Why do social networks drive us a little mad?
• Why does angry content seem to dominate what we see?
• How much of a role do algorithms play in affecting what we see and do online?
• What can we do about it?
• Did Facebook have any inkling of what was coming in Myanmar in 2016?
Read Social Warming, my latest book, and find answers – and more.
Errata, corrigenda and ai no corrida: none notified








