There is, somewhere on your phone, a beautifully designed app you opened eleven times. The icon was tasteful. The onboarding was thoughtful. The first three days, the prompts felt right. By day five they felt slightly thin. By day eight you were writing the same three sentences. By day eleven you stopped, and the app sat on your home screen for two months before you moved it into a folder called later, which is where apps go to die.
You did not fail. Or, more precisely, the app did not fail because you failed. Most journaling apps fail people in a particular, specifiable way, and the reason is not a question of motivation or willpower. The reason is that the design pattern most journaling apps use is, on close inspection, in tension with what reflection actually requires. They are optimising for capture. Reflection is something else.
This piece is not a list of which app is best. It is a critique of the design pattern itself, and a sketch of what an app would need to do to be a thinking tool rather than a streak tracker.
The streak problem
The first place where the design pattern goes wrong is the most visible. Most journaling apps, taking their cues from Duolingo and the broader behaviour-design school, have built the reward structure around consecutive days written. The streak is the dominant on-screen number. Notifications enforce it. Friend leaderboards extend it. The product, considered as a system of incentives, asks one question of the user every day: did you write today, yes or no.
This is a coherent design choice if the goal is to maximise engagement. The psychologist Wendy Wood, whose 2019 book Good Habits, Bad Habits synthesises four decades of research on habit formation, has shown across many studies that consistency-based reward structures are unusually effective at producing repeated behaviour.1 The streak works. People keep coming back.
The problem is that the goal of journaling is not maximised behaviour. It is the production of useful self-knowledge, which is a different and more demanding outcome. And streak-based incentives produce a specific failure mode when applied to writing: they encourage the user, on bad days or busy days, to write the smallest possible thing that will preserve the streak. Tired today. Long day. Saw mom. It was fine. The streak is preserved. The reflection, as Wood's framework would predict, has been replaced with a token gesture sufficient to satisfy the reward.
Over time, the streak-preserved entries dominate the corpus. The journal that the user expected — the document of who they were across a year — turns out, on inspection, to be a list of three-word obligations to a counter. The counter is not the journal. But by the time the user notices the gap, the engagement loop has already had its run.
The prompt problem
The second failure is at the level of the prompt. Most apps, having extracted the user's commitment via the streak, then offer a daily prompt — a question or template designed to produce content. What are you grateful for today. What was your win. Rate your mood. What's one thing you want to remember.
These prompts have a problem the app's designer often does not see. They are generic by design — written to apply to any user on any day — and the cost of being generic is the cost of producing generic content. Joshua Smyth's 1998 meta-analysis of the expressive writing literature was the first quantitative synthesis of the field, and although it found a robust overall effect of expressive writing on health outcomes, it also identified specific writing content instructions as one of the moderating variables.2 The recipe matters. Subsequent boundary-condition work by Smyth and Pennebaker has continued to find that what the writer is asked to write about — and how openly the prompt allows the writer to follow their own thread — meaningfully shapes whether the writing produces benefit.
The relevant comparison is what an open prompt produces versus what a closed prompt produces. The closed prompt — what are three things you're grateful for — produces, after a few weeks, a list of three things, repeated with minor variation. The open prompt — what's actually going on — produces, after a few weeks, an irregular, uneven, occasionally insightful record that varies enormously in length and texture and, crucially, contains material the closed prompt would have actively prevented.
This is the trade-off. Closed prompts are easier on day one — they remove the blank-page friction that makes new writers freeze. Open prompts are harder on day one and substantially better by day thirty. Most journaling apps, optimising for activation rather than depth, choose the closed prompt. Most journaling apps, predictably, produce shallow content.
The privacy paradox
The third failure is the most interesting and the least often named. Most journaling apps are built on the assumption that the value of journaling is privacy — that the user wants a sealed, encrypted, no-one-else-can-read-this space, and that the entire job of the app is to provide one. This is not wrong. But it is incomplete in a specific way.
The psychologist Carl Rogers, in his 1957 paper The Necessary and Sufficient Conditions of Therapeutic Personality Change, set out six conditions he argued were necessary and sufficient for deep personal change; the three at the core of his account describe the attitude of the listener: empathy, congruence, and unconditional positive regard.3 His argument, which has been carried forward in the person-centred and humanistic traditions for nearly seventy years, was that disclosure into a vacuum does not, on its own, produce the change that disclosure into a witnessed space produces. The presence of an attentive other is, in his framework, structurally part of what makes disclosure transformative.
This is consistent with how Pennebaker himself has described the mechanism of expressive writing in later syntheses of the field. The benefits of writing, on his account, do not come from privacy alone — they come from the writing producing a coherent narrative, and narrative is, structurally, a form addressed to a listener, even when the listener is implicit or imagined.4 The page is not a void; it is a position the writer is, in some sense, speaking from and toward.
This is the privacy paradox. Encryption is not the same as witness. An app that promises absolute privacy is delivering one component of what users need — the safety to be honest — while removing another component they did not know they were depending on — the felt sense that the writing is going somewhere, addressed to something. The result, for many users, is the strange experience of writing into an app and finding the writing oddly hollow, even though the content was true. The writing was true. But there was nothing on the receiving end, and a truth no one receives lacks something the same truth, witnessed, would have.
Some apps have started to address this with AI responses. The instinct is correct. The execution, in most current implementations, is poor — the AI replies are generic, sympathetic, and slightly performative, which is closer to a customer service chatbot than to the kind of attention Rogers was describing. But the underlying design problem the AI move is responding to is real. A journal needs a witness, even an imagined one, to do what journals are supposed to do.
The summary illusion
The fourth failure, characteristic of the current generation of AI-enabled journaling apps, is the assumption that summarisation is the most useful thing AI can do with journal content. The pattern is familiar. The user writes for thirty days. The app, at the end of the month, produces a summary. You wrote about work stress on twelve days. Your mood averaged 6.2/10. Your top three themes were career, family, and sleep.
These summaries feel impressive on first encounter. The user looks at them, nods, and feels seen by the data. After three or four months, however, a quieter problem becomes visible. The summaries flatten what the writing was actually doing. They reduce a textured account of an evolving inner life to a small set of categorical buckets, and the buckets are usually the same buckets every month. The user begins to feel, accurately, that the AI has not been reading the writing so much as classifying it.
The deeper failure is not the quality of the summary. It is the choice of the operation itself. Summary is the wrong direction of attention for a reflection tool. Summary compresses — it takes a textured surface and produces a smooth one. What reflection actually needs is the opposite: expansion. The most useful AI move on a journal is not to summarise what you wrote but to draw out what you wrote around — the implicit, the named-but-undeveloped, the recurring person you have mentioned forty times without writing about. Compression makes the corpus tidy. Expansion makes it useful.
This is a design choice, not a technical one. The same models that write summaries can be prompted to do the opposite. Almost no journaling app does the opposite, because compression produces clean, share-friendly, dashboard-friendly artefacts and expansion produces messy, long, individual, threaded explorations that don't fit on a dashboard. The dashboard wins. The user, accumulating dashboards instead of insight, eventually quits.
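The difference between the two operations can be made concrete at the prompt level. The sketch below is illustrative only: the entries, the helper names, and the prompt wording are assumptions for the sake of the example, not any app's actual implementation, and a real tool would send the resulting string to whatever language model it uses. The two instructions differ only in the direction of attention they ask the model to take over the same corpus:

```python
# Illustrative sketch: the same journal corpus, prompted two ways.
# Helper names, entries, and prompt wording are hypothetical.

def build_summary_prompt(entries: list[str]) -> str:
    """Compression: reduce a month of writing to themes and averages."""
    joined = "\n---\n".join(entries)
    return (
        "Summarise the following journal entries. "
        "List the top three themes and the overall mood trend.\n\n" + joined
    )

def build_expansion_prompt(entries: list[str]) -> str:
    """Expansion: surface what the writing circles but never develops."""
    joined = "\n---\n".join(entries)
    return (
        "Read the following journal entries. Do not summarise them. "
        "Instead, name one person or topic that recurs but is never "
        "written about directly, and ask one open question that would "
        "draw it out.\n\n" + joined
    )

entries = [
    "Long day. Dinner with Sam again, didn't talk much.",
    "Work was fine. Sam called; I let it ring.",
]

# Same corpus, opposite direction of attention.
print(build_summary_prompt(entries))
print(build_expansion_prompt(entries))
```

The expansion prompt is no harder to write than the summary prompt; what it produces is simply harder to put on a dashboard.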
What an honest journaling tool would have to do
Working backward from these four failures, a journaling tool that took reflection seriously would probably have to do at least four things differently:
It would have to de-emphasise the streak and instead reward depth — not by counting words, which produces its own pathologies, but by structuring engagement around weekly review rather than daily compliance. The Sunday-evening reflection on the week is, on most measures, more useful than the daily-evening obligation to log.
It would have to use open rather than closed prompts, and tolerate the higher day-one friction that comes with this choice — knowing that the users who push through the friction will produce the kind of writing that actually rewards a long-term tool, while the users who would have churned anyway are not going to be retained by a closed prompt for very long.
It would have to provide a felt sense of being witnessed, probably through an AI conversational partner that is genuinely attentive rather than reflexively sympathetic, and probably through occasional messages back from the system that reference what the user has written before — not as a parlour trick but as the structural feature that makes the writing feel like it lands somewhere.
It would have to expand rather than compress, using whatever AI is available to surface what the user has not yet written, the patterns running under the writing, the people whose absence in the corpus is as informative as their presence — rather than producing tidy summaries of what is already on the page.
These are not radical design moves. They are unglamorous, slightly harder to onboard, and more difficult to render in an App Store screenshot. They are also what the research suggests is required for the activity to do what users came to it for.
A note on what we're building
Loopn was built around exactly this critique. There is no daily streak. The Sunday Loop is the centre of gravity, not the daily entry. The chat is unstructured — no prompts you have to fill in, just a conversation. The AI is designed to expand rather than compress, and its job is to remember the people you've mentioned and the threads you've left hanging, not to produce a tidy monthly summary. None of this means we have solved every problem this piece names. We will eventually have to think hard about how to keep the user coming back without falling into the same engagement traps the rest of the category fell into. The design philosophy is harder to maintain than to articulate.
But the starting point — that the standard journaling app is built on assumptions that work against reflection — is what shaped almost every concrete decision in the product.
If any of this resonated, you can read more about the quiet case against gratitude journaling or what your inner monologue is actually telling you.
- Wood, W. (2019). Good Habits, Bad Habits: The Science of Making Positive Changes That Stick. New York: Farrar, Straus and Giroux. For the underlying empirical work on consistency-based reinforcement, see Wood, W., & Neal, D. T. (2007). A new look at habits and the habit-goal interface. Psychological Review, 114(4), 843–863.
- Smyth, J. M. (1998). Written emotional expression: Effect sizes, outcome types, and moderating variables. Journal of Consulting and Clinical Psychology, 66(1), 174–184. For boundary-condition work on what kind of writing produces benefit, see Smyth, J. M., & Pennebaker, J. W. (2008). Exploring the boundary conditions of expressive writing: In search of the right recipe. British Journal of Health Psychology, 13(1), 1–7.
- Rogers, C. R. (1957). The necessary and sufficient conditions of therapeutic personality change. Journal of Consulting Psychology, 21(2), 95–103. The framework is developed at book length in Rogers, C. R. (1961). On Becoming a Person. Boston: Houghton Mifflin.
- Pennebaker, J. W. (2018). Expressive writing in psychological science. Perspectives on Psychological Science, 13(2), 226–229. Pennebaker's argument that the mechanism of expressive writing is closer to cognitive narrative construction than to private catharsis is developed across this paper and his earlier collected work, including Pennebaker, J. W., & Smyth, J. M. (2016). Opening Up by Writing It Down (3rd ed.). New York: Guilford Press.