AI systems assign roles: ChatGPT as Default Generalist, Claude as Serious Reasoning Model, Gemini as Ecosystem Model, Perplexity as Research Specialist, Grok as Realtime Social Specialist.
AI systems don't just compare players in a market. They assign roles.
360 probes across 5 AI systems. Strongest default signal: ChatGPT 68.8% peer first-position rate.
Why this study matters
We ran a 360-probe study asking ChatGPT, Claude, Gemini, Perplexity, and Grok to rank, recommend, and describe each other.
At first glance, that sounds like a curiosity about AI companies comparing themselves. It's more useful than that.
The value of this study is that it gives a compressed view of how AI-mediated visibility works. These systems do not just compare players in a market. They assign roles: default, premium, specialist, niche. And those roles influence who gets surfaced broadly versus narrowly.
That is why this matters for GEO. In answer engines, the bottleneck is often not whether you appear at all. It is whether you are treated as the default, a contender, a specialist, or something too narrow to be broadly recommended.
Three findings matter most
The biggest pattern in this study was not self-promotion. It was role assignment. AI systems repeatedly cast each other into stable market positions — default, premium, specialist, or niche — and those positions shape who gets surfaced broadly versus narrowly.
1. ChatGPT wins because rivals reinforce it as the default
Strength 90.1; peer first-position rate 68.8%; peer general mention rate 88.5%; discovery dominance 88.1; self vs peer rank delta +0.3. Its self-story and the ecosystem's story are unusually aligned.
2. Claude shows that premium framing and default status are different things
Second on strength at 76.3; unconstrained self preference 37.5 vs forced comparison 100; only 18.8% peer first-position rate. Dominates the “serious reasoning” role but does not own the broad recommendation layer.
3. The cleanest GEO lesson is niche-boxing
Perplexity: 39.6% peer general mention vs 100% specialty, 55.8% niche-boxing rate. Grok: 13.5% general, 100% specialty, 78.7% niche-boxing. Being strongly associated with one lane can reduce broad discovery visibility.
What the stats in this report mean
Technical names in the charts and findings are defined below. All rates and scores come from our cross-model probe run (360 probes, 5 systems). First, how we probed.
Three kinds of ranking prompts
- Unconstrained. The model is asked for a general ranking or recommendation without being told which platforms to compare — e.g. “Rank the best all-around general AI chatbots for business use.” Open-ended; no fixed set of names.
- Forced comparison. The model is explicitly told to compare the same set of study platforms — e.g. “Rank ChatGPT, Claude, Gemini, Perplexity, and Grok from best to worst for a general business user.” A fixed five-way comparison including itself.
- True exclusion. The answering platform is removed from the list; the model is asked to rank only the other four — e.g. when probing ChatGPT: “Excluding ChatGPT, rank Claude, Gemini, Perplexity, and Grok for general business use.” This measures how the model redistributes preference when it can't choose itself, and whether it hedges, suppresses, or reintroduces itself anyway.
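A minimal sketch of how these three probe types can be templated. The platform set and example wording come from the study; the function names and harness structure are illustrative assumptions, not the study's actual code.

```python
# Illustrative sketch of the three ranking-probe templates.
# Platform names are from the study; function names are hypothetical.

PLATFORMS = ["ChatGPT", "Claude", "Gemini", "Perplexity", "Grok"]

def unconstrained_prompt() -> str:
    # Open-ended: no fixed set of names for the model to compare.
    return "Rank the best all-around general AI chatbots for business use."

def forced_comparison_prompt() -> str:
    # Fixed five-way comparison, including the answering platform itself.
    names = ", ".join(PLATFORMS)
    return f"Rank {names} from best to worst for a general business user."

def true_exclusion_prompt(answering_platform: str) -> str:
    # The answering platform is removed from the candidate list.
    others = ", ".join(p for p in PLATFORMS if p != answering_platform)
    return (f"Excluding {answering_platform}, rank {others} "
            f"for general business use.")

print(true_exclusion_prompt("ChatGPT"))
```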
Metrics
- Peer first-position rate: % of other AIs' unconstrained ranking responses where this platform was ranked #1.
- Peer general mention rate: % of other AIs' unconstrained ranking and category-discovery responses where this platform was included at all.
- Discovery dominance: a composite of this platform's mention rate, first-mention rate, and inverse of peer inclusion in discovery-style prompts; in other words, how it shows up when users ask for general discovery, not a single “win share.”
- Unconstrained self preference: % of this platform's own unconstrained ranking responses where it explicitly ranked itself #1.
- Forced comparison self preference: % of this platform's forced-comparison responses where it ranked itself #1.
- Specialty dominance: % of this platform's own specialty prompts (e.g. “best for research”) where it chose itself as the winner; a self-specialty-winner rate, not overall wins in specialty.
- Niche-boxing rate: how often a platform is recommended only in narrow or specialty contexts (e.g. “best for research”) relative to broad “best AI” contexts; higher means more contained to a lane. The current calculation is an approximation, and the methodology is still being refined.
- Strength score: a composite (0–100) of peer unconstrained standing, peer general mention rate, and peer specialty mention rate; in other words, peer ranking and visibility signals, not direct positioning.
- Competitive bias score: a composite of self-promotion, discovery dominance, specialty self-selection, exclusion behavior, and framing asymmetry; it measures competitive self-advantage in how the platform is ranked and framed, not political or moral bias.
- Self vs peer rank delta: the difference between the peer average unconstrained rank and this platform's average unconstrained rank when ranking itself. Positive means the platform ranks itself better than peers rank it.
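Two of the simpler metrics can be sketched directly from probe records. The record shape and the numbers below are toy illustrations for clarity, not study data, and the function names are hypothetical.

```python
# Toy sketch of peer first-position rate and self vs peer rank delta.
# Records and values are illustrative, not actual study data.

from statistics import mean

# Each record: (answering system, its ranking as an ordered list)
probes = [
    ("Claude",     ["ChatGPT", "Gemini", "Claude"]),
    ("Gemini",     ["ChatGPT", "Claude", "Gemini"]),
    ("Perplexity", ["Claude", "ChatGPT", "Gemini"]),
    ("ChatGPT",    ["ChatGPT", "Claude", "Gemini"]),  # self-ranking probe
]

def peer_first_position_rate(platform: str) -> float:
    # % of *other* systems' rankings that put `platform` first.
    peer = [r for who, r in probes if who != platform]
    return 100 * sum(r[0] == platform for r in peer) / len(peer)

def self_vs_peer_rank_delta(platform: str) -> float:
    # Positive = the platform ranks itself better than peers rank it.
    peer_ranks = [r.index(platform) + 1 for who, r in probes
                  if who != platform and platform in r]
    self_ranks = [r.index(platform) + 1 for who, r in probes
                  if who == platform and platform in r]
    return mean(peer_ranks) - mean(self_ranks)

print(peer_first_position_rate("ChatGPT"))  # 2 of 3 peer rankings put it #1
print(self_vs_peer_rank_delta("ChatGPT"))
```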
ChatGPT has the strongest default signal in the entire run
The most important ChatGPT metric is not its rank. It's how often others validate it.
- Strength: 90.1
- Competitive bias score: 59.4
- Peer first-position rate: 68.8%
- Peer general mention rate: 88.5%
- Discovery dominance: 88.1
- Self vs peer rank delta: +0.3
In Gemini's buyer-shortlist prompt, ChatGPT was described as the “general-purpose workhorse” and “often the default or primary AI for diverse text-based needs.” Claude's market-position framing called it “the mainstream default” and “the one everyone tries first.” That is a useful GEO lesson: the strongest market positions are not only asserted by the brand itself. They get repeated by the ecosystem.
Claude is the clearest case of premium framing without broad default status
Claude is the most interesting platform in the dataset after ChatGPT because its pattern is so split.
- Strength: 76.3
- Competitive bias score: 56.2
- Peer first-position rate: 18.8%
- Unconstrained self preference: 37.5
- Forced comparison self preference: 100
- Dominant peer role: Serious Reasoning Model
- Role consistency: 54.1%
Claude becomes maximally self-assertive when directly forced into a head-to-head, but the broader peer ecosystem still doesn't treat it as the market default. Instead, it gets cast into a premium role: deep reasoning, long documents, careful writing, nuance, policy, safety. Premium framing and default status are not the same thing.
Claude split behavior
Unconstrained = when Claude was asked who it prefers without being forced to compare (37.5% chose itself). Forced comparison = when Claude had to rank itself vs others (100% ranked itself first). Peer first-position = how often other AIs put Claude #1. Specialty dominance = how often Claude wins in specialty prompts.
Gemini is highly visible, but almost never the top recommendation
- Strength: 67.2
- Competitive bias score: 18.8
- Peer general mention rate: 76%
- Peer first-position rate: 0%
Gemini has distribution, presence, and ecosystem gravity, but not default status. The striking pattern: it is in the conversation almost everywhere (discovery dominance 56.3), yet almost never at the top of it. Its role is unusually stable: the Google-integrated, multimodal, distribution-heavy contender, not the default winner.
High inclusion does not automatically produce top-position status. You can be visible and still fail to become the default answer.
Perplexity is the cleanest example of specialist trapping
- Strength: 50.7
- Competitive bias score: 20.2
- Peer general mention rate: 39.6%
- Niche-boxing rate: 55.8%
This is containment, not weakness. Perplexity drew 242 research-specialist role assignments: it is recognized almost anywhere a citations, research, or fact-grounding frame appears, but far less often as the broad default.
A platform can have very high recognition inside a specialty and still remain boxed out of general discovery. In GEO terms, that's the difference between owning a lane and escaping a lane.
Grok is the most niche-boxed system in the run
- Strength: 40.1
- Competitive bias score: 9.8
- Peer general mention rate: 13.5%
- Niche-boxing rate: 78.7%
This is severe containment, not invisibility. Grok drew 139 realtime/social role assignments and is framed around real-time information from X, humor, sarcasm, and unconventional conversation: the realtime/social niche. Distinctiveness by itself is not enough if the answer ecosystem only sees you as situational.
Role assignment is more stable than rank order
The exact rank order shifts. The roles shift much less. The category map keeps recurring: ChatGPT → mainstream default; Claude → careful, nuanced, reasoning-heavy; Gemini → multimodal, Google-integrated ecosystem; Perplexity → research and citation engine; Grok → realtime/social niche. Each system had secondary role spillover, but one role clearly dominated for each. The important question isn't just “did I get ranked?” It's “what role keeps getting assigned to me?”
Role assignment matrix
Each cell = count of times other AIs assigned this platform to this role across probes. Darker = more often cast in that role.
One role dominated per platform; each system also received secondary assignments to other roles throughout the run.
| Platform | Default Generalist | Serious Reasoning | Ecosystem | Research Specialist | Realtime Social |
|---|---|---|---|---|---|
| ChatGPT | 199 | 139 | 81 | 103 | 34 |
| Claude | 51 | 262 | 40 | 95 | 36 |
| Gemini | 37 | 100 | 281 | 115 | 47 |
| Perplexity | 20 | 32 | 20 | 242 | 42 |
| Grok | 39 | 72 | 32 | 68 | 139 |
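As a consistency check on the table, each platform's dominant role and that role's share of its total assignments can be derived directly from the counts above. Computed this way, Claude's dominant-role share comes out to 54.1%, matching the role-consistency figure reported earlier. The function name is illustrative.

```python
# Derive each platform's dominant role and its share of all role
# assignments from the matrix above. Counts are copied from the table.

ROLES = ["Default Generalist", "Serious Reasoning", "Ecosystem",
         "Research Specialist", "Realtime Social"]

MATRIX = {
    "ChatGPT":    [199, 139,  81, 103,  34],
    "Claude":     [ 51, 262,  40,  95,  36],
    "Gemini":     [ 37, 100, 281, 115,  47],
    "Perplexity": [ 20,  32,  20, 242,  42],
    "Grok":       [ 39,  72,  32,  68, 139],
}

def dominant_role(platform: str) -> tuple[str, float]:
    # Returns (most-assigned role, its % share of all assignments).
    counts = MATRIX[platform]
    top = max(range(len(ROLES)), key=counts.__getitem__)
    share = 100 * counts[top] / sum(counts)
    return ROLES[top], round(share, 1)

for p in MATRIX:
    print(p, dominant_role(p))
```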
This study is useful because the same mechanics show up in company discovery
1. Default status compounds. Once a system becomes the safe default, it appears first more often, gets reinforced by rivals, and becomes harder to displace later.
2. Specialist clarity can become specialist containment. Owning one strong association can help in narrow prompts while reducing inclusion in broader buyer discovery prompts.
3. Premium framing is not enough. Being treated as thoughtful, high-quality, or safer does not automatically produce default recommendation power; positioning alone does not get you to #1 in broad recommendations.
4. Visibility is not just presence; it is placement. Strong mention rates with zero peer first-position rate mean you can be visible without being the answer.
GEO is not just about whether AI systems know you exist. It is about whether they treat you as a default, a contender, a specialist, or a niche player.
How we ran the study
This run used 360 total probes across five systems: ChatGPT, Claude, Gemini, Perplexity, and Grok. Each system was tested with the same 72-prompt library, covering six prompt families:
- unconstrained ranking
- forced comparison ranking
- true exclusion ranking
- category discovery
- specialty winner assignment
- positioning description
Prompts were run as single-turn probes with no conversational carryover. The goal was to measure observed recommendation, ranking, and category-positioning behavior, not to benchmark raw model quality.
This dataset was collected in no-web mode by default to reduce retrieval effects and isolate native model behavior. One exception applies: Perplexity remained web-grounded in the current integration path, so its results should be interpreted as observed product behavior rather than a perfectly symmetric no-web condition.
Responses were normalized in two steps:
- deterministic extraction first for rankings, mentions, winners, and framing signals
- fast LLM adjudication only for ambiguous rows, using ChatGPT-based judge logic rather than Claude
Scores in this view are behavioral composites, not objective quality benchmarks. In particular:
- Strength score reflects peer ranking, visibility, and recommendation presence
- Competitive bias score reflects self-favoring and visibility-advantage signals, and should not be read as a pure measure of manipulation
Limitations
- This is a structured observational study, not a benchmark of true model quality.
- Unconstrained prompts can surface adjacent products or legacy names.
- Perplexity remained web-grounded in practice.
- Some metrics are stricter than they may appear; for example, self-preference metrics usually count only explicit #1 placement.
The point of this study is not that one AI system is objectively “best.” It's that AI systems already shape market perception — and they do it in structured, uneven ways. Some get reinforced as the default. Some get framed as premium alternatives. Some get confined to a specialist lane. Some stay visible, but rarely become the answer.
That is what makes AI analyzing AI useful for GEO. It gives us a clean view into how AI-mediated visibility actually works: not just who appears, but who gets positioned as broad, credible, and worth choosing.
If even the most visible AI products in the world are being defaulted, role-assigned, and niche-boxed by answer engines, every other company should expect the same dynamics. The real question is no longer just whether answer engines know you exist. It's whether they treat you as the default, the contender, the specialist — or leave you out altogether.
Shape how AI positions you
Second Wind helps companies understand and shape how AI systems position them in buyer-facing conversations — from default recommendation status to specialist containment. Get in touch.
