> January 2025

2 posts

R1 writes an essay about `Strawberry`

I was setting up OpenRouter to benchmark a bunch of different LLMs on some specific tasks we have at Nunc.

I noticed there’s an option to chat with various models and it even comes with some demos. Naturally, I clicked on the Strawberry Test.

A new chat window. It has a text field, send button. Above there are three demo options “9.9 vs 9.11. Which one is larger?”, “Strawberry Test. How many r’s are in the word” and “Poem Riddle. Compose a 12-line poem”.

In the chat the question “How many r’s are in the word strawberry” has been sent. The model has reasoned for 58.1s and responds, mentioning there are 3 instances of the letter “r”.

It got the right answer, which is nice. But it took about a minute, despite the model generating at 244 tokens per second. And I got only 84 output tokens, which by themselves should take a third of a second, not a whole minute. So yeah, as the UI shows, the model spent quite some time “reasoning” about what to answer. Luckily for us, the reasoning traces are public.
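A quick back-of-the-envelope check of that “third of a second” figure, using only the numbers shown in the UI. The variable names are made up for this sketch, and it assumes the visible answer streams at the full quoted rate:

```python
# Numbers quoted in the post; variable names are illustrative only.
tokens_per_second = 244   # generation speed shown by the UI
output_tokens = 84        # tokens in the visible answer
reasoning_seconds = 58.1  # "reasoned for 58.1s" from the chat window

# Time the visible answer alone should need at the quoted speed.
answer_seconds = output_tokens / tokens_per_second
print(f"Visible answer: {answer_seconds:.2f} s")  # Visible answer: 0.34 s

# How many times longer the reasoning phase took than the answer itself.
ratio = reasoning_seconds / answer_seconds
print(f"Reasoning took roughly {ratio:.0f}x as long as the answer")
```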

Lo and behold, the traces of R1, 1515 tokens or 894 words about Strawberry:

Okay, let me try to figure out how many times the letter “r” appears in the word “strawberry.” First, I need to make sure I remember the correct spelling of “strawberry.” Sometimes, people might misspell it, so I’ll take a moment to think. It starts with “straw,” like the thing you drink from, and then “berry.” So combining those, it’s “strawberry”.
Now, let’s break it down letter by letter. Let me write out each letter one by one and check for “r”s.
S-T-R-A-W-B-E-R-R-Y.
Wait, let me slow down. Starting with the first letter:
S - no
T - no
R - yes, that’s the first “r”
A - no
W - no
B - no
E - no
R - second “r”
R - third “r”
Y - no
Wait, does “strawberry” have three “r”s? Let me check again. Sometimes when I spell it, I might confuse the number of letters. Let me write it out again more carefully:
S, T, R, A, W, B, E, R, R, Y.
Wait, that’s 10 letters? Let’s count the letters again. S (1), T (2), R (3), A (4), W (5), B (6), E (7), R (8), R (9), Y (10). So according to this breakdown, there’s an “R” at the 3rd position, then again at the 8th and 9th positions. That would make three “r”s. Hmm, but that doesn’t seem right. Let me check another way.
The word “strawberry” is a compound of “straw” and “berry.” Let’s check each part separately.
“Straw” is spelled S-T-R-A-W. Here, there’s one “r”.
“Berry” is B-E-R-R-Y. In “berry,” there are two “r”s.
So combining them: “straw” + “berry” = strawberry. Adding the “r” from “straw” and the two “r”s from “berry” would give a total of three “r”s.
But wait, when combining “straw” and “berry,” do any letters get dropped? Sometimes, compound words might drop a letter, but in this case, “strawberry” is spelled S-T-R-A-W-B-E-R-R-Y. So between “straw” and “berry,” the letters are all retained. So “straw” ends with W, and “berry” starts with B. Therefore, all letters are present, so the total number of “r”s is indeed three.
But just to be absolutely sure, let me spell it out again: S-T-R (1)-A-W-B-E-R (2)-R (3)-Y. Yes, that’s three “r”s. Wait, but I think a lot of people might think there are two “r”s. Maybe I’m overcounting? Let me check a dictionary or think of how it’s pronounced. Sometimes the pronunciation can be misleading. “Strawberry” is pronounced “STRaw-ber-ee,” and sometimes the “r” in “berry” is doubled. But in spelling, “berry” has two “r”s. So when combined with “straw,” it’s still two “r”s in the “berry” part and one in the “straw” part, totaling three.
But hold on, when I actually write “strawberry,” is that the correct spelling? Let me think: S-T-R-A-W-B-E-R-R-Y. So after W comes B from “berry,” then E, then double R, and Y. So yes, there are two “r”s in the “berry” part. Together with the one “r” in “straw,” that’s three “r”s in total.
Alternatively, maybe “strawberry” is spelled with only one “r” in the “berry” part? Let me visualize the word. If I write it down: strawberry – s t r a w b e r r y. Yes, that’s right, “berr” part has two “r”s. So the “berry” component is spelled B-E-R-R-Y, so two “r”s. Therefore, in “strawberry,” combining “straw” (with one R) and “berry” (with two Rs) gives three Rs total.
But wait another second. Let me check an alternative approach by counting the letters step by step:
S
T
R (1)
A
W
B
E
R (2)
R (3)
Y
So yes, three Rs. Therefore, the answer should be three. Hmm. But I remember sometimes people miss it as two. Let me confirm with a different method. Maybe typing the word and checking each letter:
S T R A W B E R R Y
Breaking it into parts:
Straw: S T R A W
Berry: B E R R Y
Yes, “straw” has one R, “berry” has two. So combined, three Rs.
Alternatively, maybe in the actual spelling, there is only one R in “berry”? Let’s think. The fruit is “berry,” which is B-E-R-R-Y. So for example, “berry” has two Rs. So if that’s correct, then “strawberry” would have three Rs.
Alternatively, maybe it’s spelled “strawbery” – no, that’s incorrect. The correct spelling is with two Rs in “berry,” hence “strawberry” with three Rs.
So the conclusion is three Rs. Wait, but let me get an external confirmation in my mind. For example, if I think of the word “strawberry” and how it’s taught in schools, the correct spelling includes two Rs in “berry,” leading to three in total. Another way to remember is that “berry” is spelled with two Rs, so when you add “straw,” you get three Rs.
Alternatively, maybe I should check how many syllables are in “strawberry.” It’s pronounced as straw-ber-ry, which might be three syllables, but that may not directly relate to the number of letters.
Alternatively, using a mnemonic: “Straw has an R, berry has two Rs, so together three Rs.”
Hmm, yes, that seems to fit. So I think the correct answer is three. But just to make absolutely sure, I’ll write the word out again: S T R A W B E R R Y. Scanning each letter:
S - T - R (1) - A - W - B - E - R (2) - R (3) - Y
Yes, three Rs. Therefore, the answer is 3.
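
For comparison, the checks the trace keeps repeating (spell it out, number the letters, split it into “straw” and “berry”) fit in a few lines of Python. This is only a sketch of the same counting steps, not a claim about how R1 works internally:

```python
word = "strawberry"
target = "r"

# Number the letters from 1, as the trace does, and keep the positions of "r".
positions = [i for i, letter in enumerate(word, start=1) if letter == target]
print(len(word))           # 10
print(positions)           # [3, 8, 9]
print(word.count(target))  # 3

# The compound-word check: no letters are dropped when joining the parts.
parts = ["straw", "berry"]
assert "".join(parts) == word
per_part = {part: part.count(target) for part in parts}
print(per_part)            # {'straw': 1, 'berry': 2}
```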

Did you read it all? It seems a bit repetitive at first glance, right? But then again, humans can kind of only count up to three or four at a glance, so this is close to our limit before we’d have to enter reasoning mode ourselves. Maybe AI will also get up to four without reasoning, who knows.

The Benchmark

In totally unrelated news, our benchmark for the business task we want to solve shows that, in terms of utility, we don’t really care too much about which model we actually use. The outliers are due to API issues, which makes the results all the more realistic, I’d say. So while AGI or whatever would sure be nice, I think most business tasks look something like this.

Benchmark showing very similar performance for around 15 different models. There are some outliers.

I'm too lazy to blog “properly”

“Shit blog”, that’s what he called it. I read about the concept on the blog of Henrik Karlsson.

a few times a week, when I felt too tired to do “real” work, I wrung the last fifteen minutes out of myself by writing something for what I’ve come to call “my shit blog.” I’ve been surprised by how useful and interesting this turned out to be.

What has delighted me about the shit blog is how abundant it has made me feel. I sit down and type as fast as I can, and the results—well, they suck, but they don’t suck that much.

Now, Henrik is a professional writer. He also has a “proper” blog and writes essays. I’m not, and with everything else going on in life, my research, our startup, it’s not really something I’m aspiring to be. But I think having a small platform to publish short-form stuff can be really useful. I hope it won’t exactly be a “shit blog” and maybe more of a “20% effort, 80% quality” blog, but we’ll see. I’m calling it a “Lazy Blog”.

Why?
Why have a blog in the first place? I like reading other people’s blogs. Also, I like sharing useful things, and I enjoy writing as a creative outlet. Maybe other people could like reading my blog then? My prediction is that this blog will be a weird combination of the greatest Python packaging tools, rants and bad prose, which sounds like it might get interesting.

What?
I will mostly post short things, hence this’ll be more of a microblog (but hosted on proper blog software). Some vague topics I have in mind:

  • Personal growth and self reflection
  • AI and society
  • Academia
  • Cool tools and workflows

Where?
I thought for a long time about where to start posting stuff. What options are there?

  • Twitter/X: Still the go-to way for many researchers, although the frequent changes in algorithm make it unpredictable whether my feed will be useful to me on any given day. It’s also the most controversial of the options on this list.
  • LinkedIn: Great because most of my couple hundred professional contacts are already there. It just doesn’t feel right to use because the posts are … so LinkedIn. It’s where you post a job change and that’s about it.
  • Bluesky: They have some interesting ideas like third-party feed algorithms (which were announced in July 2023, with no news I know of since then). I found it a lot harder to find relevant stuff than on the good days of Twitter. But it feels like a much more “real” place than Twitter, with actual humans sometimes reacting to you, which I never experienced on Twitter.
  • Mastodon / Federated: Like many others, I tried it during the Twitter/X transition peak but couldn’t find things relevant to me.
  • Threads: Oh right, there’s also Threads. Never tried it, sorry Meta!

So what’s the best option? The fact that I almost forgot to mention Threads made me realize again that there are many options, and that all of them are constantly changing in one way or another. It seems like a bad idea, then, to fully invest in any of these platforms without knowing which one will be the best for you in the long run. I see some people going to various lengths to cross-post stuff to different platforms, but where’s the single source of truth?

Independent, single source of truth

I believe it’s best to have a single source of truth for all posts: this place right here, my blog. I’ll definitely cross-post to social media, but this is the home of everything I write. It’s not going to change unless I change it. There’s no moderation that will change depending on politics, none of my posts will get removed for any reason, no federated instance can go down and take my profile with it, no enshittification. Just my blog. Welcome.