How do startups get cited by LLMs like ChatGPT and Claude?

Not by polishing their own homepage, and increasingly not by seeding Reddit either. For a brand-new tool, the durable path is three moves. First, write long-tail comparison and use-case content for the exact questions buyers ask their AI, like 'X vs Y for a Next.js and Netlify project,' on intents where you genuinely fit and no competitor has written the page. Second, build product-bot fit so an agent that picks your tool can finish the task on the first try. Third, line up real customers and partners to vouch for you by name. The community layer that used to carry tool recommendations, meaning Stack Overflow, Reddit, and independent YouTube, is thinning out, so a new brand should own the content the model retrieves instead of waiting for a forum to talk about it.

Is Reddit still important for LLM citations?

It's the single most-cited domain, but that matters less than the popular chart suggests. Looking at the full citation set instead of the top-ten leaderboard, Reddit is roughly 2.5% of all citations and Wikipedia about half a percent. Roughly ninety-five percent of citations go to domains outside the top ten. Reddit also faces a narrowing pipe to the models: Anthropic has reportedly lost access to it after a scraping dispute, and Reddit's robots.txt now disallows crawlers. The right use of Reddit for a startup is to mine it for the questions developers actually ask, then answer those questions on a surface you control.

Can a new startup get into LLM pre-training data?

No. Pre-training data is effectively locked. Major LLM providers update base model weights on 12-to-18-month cycles, and the training corpus for the next cycle closes months before release. A seed-stage startup launching today cannot meaningfully influence what the next GPT or Claude model learns about its brand. The winnable game for startups is retrieval, where LLMs pull live content from the web at query time through search grounding and RAG pipelines.

What is product-bot fit and why does it matter for AI visibility?

Product-bot fit means that when an AI agent picks your tool, it can complete the task on the first try with no obstacles. Task completion is becoming a stronger signal than informational content. Coding agents like Cursor and Claude Code decide which tool to use inside the repo, which is a different funnel stage than a chatbot answering a buyer's question, and the switching cost is high once your package is in the lockfile. Build for the agent: SDK names that encode the use case, docs an agent can execute step by step, and an honest llms.txt that describes what your product does.

Do case studies help with LLM visibility for startups?

They help most where humans make the final call, which they still do for a tool a buyer has never heard of. They may also nudge how a model frames you: if the model has absorbed a repeated narrative that your category does not scale, a first-party case study with real numbers gives it something concrete to weigh against that story. The evidence on the LLM side is thin, so treat social proof as a human-trust play first and an LLM play second.

How a new dev tool gets cited by AI when the community is fading

A founder asked me last week how his three-month-old startup could show up in ChatGPT and Claude the way the incumbents do. He had a product, a website, a GitHub repo, and one quarter to make something happen. The honest answer has two halves, and the second half has changed since the last time I wrote about this.

The first half: he can't win pre-training. When a model recommends a Postgres backend with built-in auth and vector search, it learned that association from years of developer-written artifacts that existed before the training cutoff: Show HN posts, Stack Overflow answers, GitHub issues that became reference threads, hundreds of tutorials on personal blogs. A company launching today can't reverse-engineer a decade of public goodwill in six months. Pre-training updates on 12-to-18-month cycles, and the corpus for the next one closed months ago.

The second half is the part that's shifted. The standard advice for the retrieval game, meaning the live web content LLMs pull at query time, is "seed Reddit, get on dev.to, collect a few G2 reviews." I've given a version of that advice myself. It rests on a developer community that is both smaller in the citation data than the charts suggest and shrinking in real time. For a brand-new tool, betting your visibility on that community is betting on the wrong horse.

This post is the corrected playbook.

The Reddit number everyone quotes is the wrong number

You've seen the stat: Reddit shows up in 40% of LLM answers, ahead of Wikipedia, YouTube, and LinkedIn. I quoted it myself in an earlier post. It's misleading, and it's worth seeing why.

That figure comes from a pie chart of the top ten cited domains. Inside that top ten, Reddit looks enormous. But the top ten is not the whole pie. Across the full set of citations, Reddit is about 2.5% and Wikipedia about half a percent. Roughly ninety-five percent of citations go to domains outside the top ten entirely. Reddit is the single most-cited domain, which is real and worth something. A strategy built only on Reddit still ignores more than 95% of what models actually pull from.
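
The arithmetic behind that gap can be checked in a few lines. This is a sketch under one assumption: that the 40% figure is Reddit's share of the top-ten slice, as the pie-chart reading above suggests. The variable names are mine; the 40% and 2.5% figures are the ones quoted above.

```python
# Assumption: 40% is Reddit's share *within* the top-ten chart,
# and 2.5% is its share of all citations (both figures come from the text above).
reddit_share_of_top_ten = 0.40
reddit_share_of_all = 0.025

# If Reddit is 40% of the top ten and 2.5% of everything,
# the whole top ten must be 2.5% / 40% of all citations.
top_ten_share_of_all = reddit_share_of_all / reddit_share_of_top_ten
outside_top_ten = 1 - top_ten_share_of_all

print(f"Top ten as a share of all citations: {top_ten_share_of_all:.2%}")
print(f"Citations outside the top ten: {outside_top_ten:.2%}")
```

Under that assumption the whole top ten works out to a little over 6% of all citations, which lines up with the roughly 95% that sits outside it.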

Then there's the pipe between Reddit and the models, which is narrowing. Anthropic has reportedly lost access to Reddit after a scraping dispute, and Reddit's own robots.txt now tells crawlers to stay out. I wrote a few days ago about how Reddit is walling off the last honest channel. The forum is as honest as ever. What changed is access. The connection between it and the models is getting thinner, because Reddit's data-licensing revenue is a rounding error next to its ad business, and it would rather sell that data than let it get scraped for free.
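
What "tells crawlers to stay out" means mechanically can be sketched with the robots.txt parser in Python's standard library. The robots.txt text below is a made-up example, not Reddit's actual file, and the crawler names and URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, written for illustration only.
# It is NOT a copy of any real site's file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleLicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://forum.example.com/r/devtools/thread-123"
    for agent in ("ExampleLicensedBot", "SomeOtherCrawler"):
        verdict = "allowed" if is_allowed(EXAMPLE_ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

A robots.txt file is a request to crawlers, not an enforcement mechanism, so the same check is worth running against your own site to confirm you aren't accidentally asking the crawlers you want to stay away.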

The community layer that recommended tools is thinning out

Reddit is one symptom. Stack Overflow is effectively dead. Independent YouTube tutorials are losing to AI-generated content and to viewers who'd rather ask a model than sit through a fifteen-minute walkthrough. Tim Ferriss, looking at his own book sales down 46% in 2025, made the case that prescriptive nonfiction is next: anything whose value is transferring instructions from one mind to another now competes with a chatbot that does it instantly and for free.

Two kinds of source taught the models almost everything they know about dev tools: the community, meaning Stack Overflow, Reddit, blogs, and YouTube, and commercial entities, meaning vendor docs and official guides. The community wrote for an audience. Now that the audience asks the model directly instead of reading the forum, the community's reason to keep writing erodes. The vendor's reason does the opposite. When the reader is an agent deciding what to recommend, a vendor's incentive to publish only goes up.

So the community layer thins while the vendor layer grows. For a startup that never had a community presence to begin with, that isn't bad news. It means the game is moving toward content you can actually produce yourself.

Move one: write the comparison no one has written yet

Comparison content is wildly overrepresented in what LLMs cite, especially for dev tools. Ask a model what to use for a database, a vector store, or a backend platform, and the pages it leans on are overwhelmingly "X vs Y" and "best X for Z" write-ups. That's because a comparison page matches the shape of the question a buyer actually types.

The opening for a new tool is the long tail. About 60% of ChatGPT prompts run longer than ten words, against an average of roughly three and a half words for a Google search. People bring their whole situation to the model: "Postgres backend for a Flutter app with row-level security," not "Postgres backend." A model asked "best database for a Next.js and Netlify project that already uses Prisma" returns something different than it does for "best database." Most of those specific intersections have no dedicated content yet. That's your ground. Find the intents where you genuinely fit and where no competitor or neutral source has written the page, and write it.

Don't guess at the questions. Mine them. Reddit threads, support tickets, and sales-call recordings are where the real long-tail prompts live, the ones keyword tools miss because they only track short head terms. Webflow did exactly this. They scraped Reddit for the feature questions people kept asking, then answered those questions as structured FAQs on their own feature pages. Those FAQs drove 57% of their new LLM citations within a few weeks. The community surfaced the questions, and Webflow owned the answers. That's the move a startup should copy, and I wrote more about structuring that kind of content in making your content AI-friendly in 2026.
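
The mining step can be roughed out in code. What follows is a minimal sketch, not Webflow's method: it assumes you have already exported questions as plain strings, and the sample questions, the STACK_TERMS list, and the function names are all hypothetical. It keeps only questions longer than ten words and counts which stack terms show up together, as a first pass at spotting intersections worth a dedicated page.

```python
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary of stack terms; swap in the ones your buyers mention.
STACK_TERMS = ["next.js", "netlify", "prisma", "flutter", "postgres", "row-level security"]

# Made-up sample questions standing in for scraped threads, tickets, or call notes.
RAW_QUESTIONS = [
    "What is the best database for a Next.js and Netlify project that already uses Prisma?",
    "Which Postgres backend works for a Flutter app that needs row-level security out of the box?",
    "best database",
    "Can I deploy a Next.js site on Netlify with a hosted Postgres database and preview branches?",
]


def stack_terms_in(text: str) -> list[str]:
    """Return the known stack terms mentioned in text, sorted alphabetically."""
    lowered = text.lower()
    return sorted(term for term in STACK_TERMS if term in lowered)


def long_tail_intersections(questions: list[str], min_words: int = 10) -> Counter:
    """Count pairs of stack terms that co-occur in questions longer than min_words."""
    pairs: Counter = Counter()
    for question in questions:
        if len(question.split()) <= min_words:
            continue  # short head-term query; skip it
        for pair in combinations(stack_terms_in(question), 2):
            pairs[pair] += 1
    return pairs


if __name__ == "__main__":
    for pair, count in long_tail_intersections(RAW_QUESTIONS).most_common(3):
        print(f"{pair[0]} + {pair[1]}: {count} question(s)")
```

Substring matching is crude and will miss synonyms and misspellings, so treat the counts as a prompt for reading the underlying threads, not as a ranking.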

Move two: build product-bot fit

Getting cited is half the job. The other half is what happens after the agent picks you. If a developer's AI agent chooses your tool and then hits friction (a confusing SDK, docs it can't act on, a setup step that needs a human), the agent backs out and tries the next option. Task completion is becoming a stronger signal than any blog post you could write.

This is a different surface from the chatbot. When Cursor or Claude Code makes a decision inside a repo, it's conditioned on the framework, the deploy target, and the dependencies already in the lockfile, and the switching cost once your package is in there is high. Build for that agent first: SDK names that encode the use case rather than just the brand, docs written so an agent can execute them step by step, and an honest llms.txt that describes what your product does. I made the broader argument that your docs are for AI now. For a startup, the docs are also the thing that decides whether the agent finishes the job in your tool or someone else's.
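
For a sense of what an honest llms.txt might look like, here is a small sketch that renders one from structured data. It assumes the general shape of the llms.txt proposal as I understand it (a title, a one-line summary in a blockquote, then sections of annotated links). The product, URLs, and class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DocLink:
    title: str
    url: str
    note: str  # plain description of what the page covers


@dataclass
class ProductSummary:
    name: str
    one_line: str  # factual one-sentence description, no superlatives
    sections: dict[str, list[DocLink]] = field(default_factory=dict)


def render_llms_txt(product: ProductSummary) -> str:
    """Render a plain, descriptive llms.txt body from a product summary."""
    lines = [f"# {product.name}", "", f"> {product.one_line}", ""]
    for heading, links in product.sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            lines.append(f"- [{link.title}]({link.url}): {link.note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical product used only to show the output shape.
    product = ProductSummary(
        name="ExampleDB",
        one_line="Hosted Postgres with built-in auth and vector search.",
        sections={
            "Docs": [
                DocLink(
                    "Quickstart",
                    "https://example.com/docs/quickstart",
                    "Create a project and run a first query.",
                ),
            ],
        },
    )
    print(render_llms_txt(product))
```

Generating the file from the same data that drives your docs navigation keeps it descriptive by construction: there is no free-text field in which to start selling.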

Move three: get humans to vouch for you by name

The agent narrows the list. A human still makes the final call, especially on a tool they've never heard of. This is where partners, named customers, and case studies earn their keep. Someone has to be willing to take a risk on a young product, and the thing that makes that risk feel survivable is another real company's name attached to it.

Case studies may also do quiet work on the model itself. I'm less sure about this one, so take it as a hypothesis: if a model has absorbed a repeated story that your category "doesn't scale," a first-party case study with real numbers gives it a counterweight to cite. The human-trust value is the reliable part. The model-perception value is a bet. Build the case study for the human first, and treat any LLM benefit as a bonus.

What still backfires

The shortcuts are the same traps they always were. Fake Reddit accounts, paid G2 reviews, AI-generated tutorials, and keyword-stuffed llms.txt files all hurt you, because models are increasingly trained to recognize content that was engineered to manipulate them. AI-generated spam gets treated more harshly than old-fashioned search spam. And once a model ties your brand to low-trust signals, that association outlasts whatever short-term lift the trick bought.

The llms.txt file deserves its own warning, because startups keep trying to use it as a billboard. Stuff it with superlatives and you get ignored at best and flagged at worst. Write it as an honest description of your product's structure, nothing more.
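
One cheap guardrail is a lint step that flags superlatives before the file ships. The sketch below is a simple keyword check of my own, not a description of how any model evaluates llms.txt. The HYPE_WORDS list and the sample text are hypothetical.

```python
import re

# Hypothetical word list; tune it to the hype words your own copy tends to use.
HYPE_WORDS = ("best", "fastest", "leading", "revolutionary", "world-class", "ultimate", "unmatched")
HYPE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in HYPE_WORDS) + r")\b",
    re.IGNORECASE,
)


def find_hype(llms_txt: str) -> list[tuple[int, str]]:
    """Return (line_number, word) for each hype word found; lines are 1-indexed."""
    hits = []
    for number, line in enumerate(llms_txt.splitlines(), start=1):
        for match in HYPE_PATTERN.finditer(line):
            hits.append((number, match.group(1).lower()))
    return hits


if __name__ == "__main__":
    # Made-up llms.txt content with a billboard-style summary line.
    sample = "\n".join(
        [
            "# ExampleDB",
            "> The BEST and fastest database for everything.",
            "## Docs",
            "- [Quickstart](https://example.com/docs/quickstart): Create a project and run a first query.",
        ]
    )
    for line_number, word in find_hype(sample):
        print(f"line {line_number}: consider removing '{word}'")
```

A check like this catches only the obvious cases. The real test is still whether each line describes what the product does.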

The audience is the agent now

The pre-training race is over for anyone launching today, and the community race was one a new brand was never going to win. What's left is the race the incumbents haven't finished either: producing the specific, current, citable content that agents retrieve, and shipping a product an agent can actually succeed with. The most durable version of this is original research, the kind that turns your startup into a source a model cites by name instead of only a product it lists. I made that case in proprietary research is the only content moat left.

I've written before that the next era of developer marketing treats AI agents as a real audience with its own reading habits. For a startup, that's the whole opportunity. You don't have a community yet, and you don't need to fake one. Write the page the agent retrieves and ship a tool the agent can finish a real task in. Then go find the humans willing to put their name next to yours.

For more on AI-era marketing for developer tools, visit the AI Marketing Hub.