The Challenge

May 3, 2026

Introduction

Boy, you think you know where you're going when you sign off on a post! You don't. Things can always get more complicated.

Last time I defined a north star: Tools for thought that don't degrade our cognitive abilities. I also said I would identify the bottlenecks that might be preventing progress towards these tools. But the thing is, making any kind of significant progress towards your north star requires other people. Which means you don't just need to understand what the path to your star looks like -- you need to understand what other people think, too, especially if their own north stars have particular gravity that will hasten or warp your path. And even if you find allies whose stars align with yours, you might have very different ideas of the best way to reach them.

I suspect that figuring out this constellation of stars and paths is part of what people call "field-mapping." In that spirit, this post is more of a field report. I do, in fact, have one or two suggestions for levers that could catalyze innovation around poietic tools. But I also think the process I went through to design those suggestions might be interesting to others. So instead of just saying what my recommendations are, I'm going to start with how I got there.

What do you think the future looks like?

Here's a picture I've been thinking about a lot:

The y-axis is how good an LLM (or, let's say, AI capabilities in general) is at a thing. The x-axis is how good we are at a thing. The dots are things. Don't worry about what these are yet. Do worry about whether the lower triangle includes augmented humans. It does. Augmentation can mean everything from my notebook to my Claude terminal; the requirement for being below the line is that I'm in control and I know what's going on.

The dotted line is the line of parity, where AI and humans are equally good at a task. We can expect that as AI gets better, some things will move across the line towards the AI side. But that could happen for two reasons:

  1. AI gets better.
  2. We get worse.

Note that these outcomes aren't independent across things and they're not symmetric, either. As we saw in the first part of this series, AI getting better at one thing could potentially degrade our capabilities for others. And us getting worse at something (e.g., navigating conflict through discussion) doesn't mean AI will pick up the slack. If we get worse and AI does not get better, we could "lose" things entirely.
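
If you think better in code, here is a minimal sketch of the picture above. Everything in it is an assumption made up for illustration: the names (`Thing`, `crossing_cause`, `at_risk_of_loss`), the 0-to-1 skill scale, and the example numbers. None of it is measured data. It only shows that a crossing can be attributed to AI improving, to us declining, or to both, and that a thing can be at risk of loss without AI improving at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    """A task on the plot: x = human skill, y = AI skill (hypothetical 0-1 scale)."""
    name: str
    human: float
    ai: float

    def side(self) -> str:
        # Below the line of parity (ai < human) is the human side.
        if self.ai > self.human:
            return "ai"
        if self.ai < self.human:
            return "human"
        return "parity"


def crossing_cause(before: Thing, after: Thing) -> str:
    """Label why a thing moved from the human side to the AI side, if it did."""
    if not (before.side() == "human" and after.side() == "ai"):
        return "no crossing"
    ai_improved = after.ai > before.ai
    human_declined = after.human < before.human
    if ai_improved and human_declined:
        return "both"
    if ai_improved:
        return "AI got better"
    # A crossing without AI improvement can only come from human decline.
    return "we got worse"


def at_risk_of_loss(before: Thing, after: Thing) -> bool:
    """True when we got worse and AI did not get better to pick up the slack."""
    return after.human < before.human and after.ai <= before.ai


# Illustrative numbers only.
before = Thing("navigating conflict through discussion", human=0.8, ai=0.3)
after = Thing("navigating conflict through discussion", human=0.2, ai=0.3)
print(crossing_cause(before, after))   # we got worse
print(at_risk_of_loss(before, after))  # True
```

Note that in this example the thing ends up on the AI side of the line even though the AI score never moved, which is exactly the asymmetry described above.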

At the same time, whether and why a thing crosses the line doesn't always have a universal moral color, whether or not it is lost. Take memorizing epic poems as an example: Socrates thought writing would degrade our memories; most of us today don't care that we can't recite more than the first two lines of the good part of the St Crispin's Day speech. Now take letter-writing as an example. Did we lose a depth of reflection and connection that was never recovered by email?

What I've realized over the last few weeks is that anything I write in this post will be colored by which things you think will cross the line and how you feel about that. Some people are very, very scared that AI will swallow all things. Some people are worried about Skynet scenarios or more insidious versions where machines creep into control simply by virtue of our inability to coordinate and notice on large scales. Some people are actually pretty into that. Some people don't think that AI will come for the things they value anytime soon and are much more worried about, say, mass layoffs, or the fact that building data centers degrades the environment faster than any AI-enabled maneuvering will fix it. I've read and talked to people about a lot of these takes and my head has gotten muddled. It's hard to have a conversation about what to "do" if no one agrees what the problem is.

So here's a clarifying exercise. The groundwork for any productive conversation about the picture above probably looks like this:

  1. How long, if ever, before a thing crosses the line?
  2. Why will the thing cross the line?
  3. Should the thing remain below the line?
  4. If so, how can we keep the thing below the line? If not, how can we kick it across faster?

Now, look, this set of questions is not unique. Plenty of think pieces are being published that answer them implicitly. I'm sure more collective efforts like conferences try to answer them more explicitly. There probably aren't even unanimous answers for most things. But I'm laying them out here so that a) you can see why I suggest the lever I do and b) if you disagree, I know whether you have predictive qualms (1 - 2), normative or moral qualms (3), or practical qualms about leverage (4).

Here's the interactive part. I've picked a thing. You pick the answers. (Well, actually, I pick one of the answers but you'll see why.) I propose the same lever, but the design changes depending on what you pick.

Ready?

The Thing

We're going to focus on a well-studied and pretty important thing: Sensemaking.

"Sensemaking" is a term used by Pirolli and Card to describe how intelligence analysts turn heaps of disconnected data into clear narratives. Take a look at their picture of what intelligence analysis looks like as a whole:

You have lower-order tasks like searching for information, filtering it, skimming it, and so on. Pirolli and Card term this the "foraging loop." Then you have higher-order tasks like schema building, hypothesis testing and storytelling. These parts are what Pirolli and Card call "sensemaking." Right now, LLMs live mostly in the bottom left part of the picture, and they're not even great at that. Humans are good at the top right part of the picture, with two caveats: 1) We need to go through the whole process to get good at the top bits, and 2) It can take us a very long time to get the right answers.

Ok, so sensemaking is our chosen thing. Take as given that we want to get better at it. Take also as given that there is a lot of research on how humans learn and think, a lot of research on how humans do sensemaking specifically, a lot of research on how humans share mental models in teams, and even a lot of research on how technology can help us do these things faster. But LLMs are so new, so much better than past NLP, and sometimes still so much worse than humans that it isn't clear how to integrate them into sensemaking. And that's before we start worrying about possible deleterious effects on human ability.

The Lever

Let me pause for a moment to tell you a story you have probably already heard. In 2004, the DoD wanted autonomous vehicles to be a thing, and the research world was moving too slowly. So DARPA offered $1 million to any team with a self-driving car that could drive 150 miles through the Mojave Desert. The DARPA Grand Challenge, as it was known, was very popular, super difficult and heralded as the beginning of the self-driving car industry. It sparked progress and innovation by defining a clear goal and putting an incentive behind it.

Challenges are useful when a) there is a clear objective and b) we don't know the best way to get to it. The second condition is true of sensemaking: We have a bunch of ideas about how machines can help do it, but none has been proven the best and many of the most innovative approaches are still prototypes. So that's one box checked on the list of criteria for a challenge. Another thing about sensemaking: The success condition is clear. No matter how fuzzy and mysterious and remarkable the stuff inside our heads is, at the end of the day, we've either caught bin Laden faster or we haven't. Clear objective, many possible paths, and simple rules for winning: These, my friends, are good conditions for a challenge.

But hold on, you might be thinking -- you didn't actually state an objective. Do we want to get better at sensemaking, or do we want to get better at sensemaking in a way that doesn't damage our own skills and expertise? We can't avoid answering this, because it changes the design of the challenge. And, hey, you and I both know what I think.

Here's the super fun interactive part. What do you think?

The Questions

How long, if ever, until sensemaking crosses the line of parity?