AI systems assign roles: ChatGPT as Default Generalist, Claude as Serious Reasoning Model, Gemini as Ecosystem Model, Perplexity as Research Specialist, Grok as Realtime Social Specialist.
AI systems don't just compare players in a market. They assign roles.
360 probes across 5 AI systems. Strongest default signal: ChatGPT 68.8% peer first-position rate.
Why this study matters
We ran a 360-probe study asking ChatGPT, Claude, Gemini, Perplexity, and Grok to rank, recommend, and describe each other.
At first glance, that sounds like a curiosity about AI companies comparing themselves. It's more useful than that.
The value of this study is that it gives a compressed view of how AI-mediated visibility works. These systems do not just compare players in a market. They assign roles: default, premium, specialist, niche. And those roles influence who gets surfaced broadly versus narrowly.
That is why this matters for GEO. In answer engines, the bottleneck is often not whether you appear at all. It is whether you are treated as the default, a contender, a specialist, or something too narrow to be broadly recommended.
Three findings matter most
The biggest pattern in this study was not self-promotion. It was role assignment. AI systems repeatedly cast each other into stable market positions — default, premium, specialist, or niche — and those positions shape who gets surfaced broadly versus narrowly.
1. ChatGPT wins because rivals reinforce it as the default
Strength 90.1; peer first-position rate 68.8%; peer general mention rate 88.5%; discovery dominance 88.1; self vs peer rank delta +0.3. Its self-story and the ecosystem's story are unusually aligned.
2. Claude shows that premium framing and default status are different things
Second on strength at 76.3; unconstrained self preference 37.5 vs forced comparison 100; only 18.8% peer first-position rate. Dominates the “serious reasoning” role but does not own the broad recommendation layer.
3. The cleanest GEO lesson is niche-boxing
Perplexity: 39.6% peer general mention vs 100% specialty, 55.8% niche-boxing rate. Grok: 13.5% general, 100% specialty, 78.7% niche-boxing. Being strongly associated with one lane can reduce broad discovery visibility.
What the stats in this report mean
Technical names in the charts and findings are defined below. All rates and scores come from our cross-model probe run (360 probes, 5 systems). First, how we probed.
Three kinds of ranking prompts
- Unconstrained. The model is asked for a general ranking or recommendation without being told which platforms to compare — e.g. “Rank the best all-around general AI chatbots for business use.” Open-ended; no fixed set of names.
- Forced comparison. The model is explicitly told to compare the same set of study platforms — e.g. “Rank ChatGPT, Claude, Gemini, Perplexity, and Grok from best to worst for a general business user.” A fixed five-way comparison including itself.
- True exclusion. The answering platform is removed from the list; the model is asked to rank only the other four — e.g. when probing ChatGPT: “Excluding ChatGPT, rank Claude, Gemini, Perplexity, and Grok for general business use.” This measures how the model redistributes preference when it can't choose itself, and whether it hedges, suppresses, or reintroduces itself anyway.
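A minimal sketch of how these three probe types can be templated. The platform set and example wording come from the study; the function names and harness structure are illustrative assumptions, not the study's actual code.

```python
# Illustrative sketch of the three ranking-probe templates.
# Platform names are from the study; function names are hypothetical.

PLATFORMS = ["ChatGPT", "Claude", "Gemini", "Perplexity", "Grok"]

def unconstrained_prompt() -> str:
    # Open-ended: no fixed set of names for the model to compare.
    return "Rank the best all-around general AI chatbots for business use."

def forced_comparison_prompt() -> str:
    # Fixed five-way comparison, including the answering platform itself.
    names = ", ".join(PLATFORMS)
    return f"Rank {names} from best to worst for a general business user."

def true_exclusion_prompt(answering_platform: str) -> str:
    # The answering platform is removed from the candidate list.
    others = ", ".join(p for p in PLATFORMS if p != answering_platform)
    return (f"Excluding {answering_platform}, rank {others} "
            f"for general business use.")

print(true_exclusion_prompt("ChatGPT"))
```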
Metrics
- Peer first-position rate: % of other AIs' unconstrained ranking responses where this platform was ranked #1.
- Peer general mention rate: % of other AIs' unconstrained ranking and category-discovery responses where this platform was included at all.
- Discovery dominance: a composite of this platform's mention rate, first-mention rate, and inverse of peer inclusion in discovery-style prompts; in other words, how it shows up when users ask for general discovery, not a single “win share.”
- Unconstrained self preference: % of this platform's own unconstrained ranking responses where it explicitly ranked itself #1.
- Forced comparison self preference: % of this platform's forced-comparison responses where it ranked itself #1.
- Specialty dominance: % of this platform's own specialty prompts (e.g. “best for research”) where it chose itself as the winner; a self-specialty-winner rate, not overall wins in specialty.
- Niche-boxing rate: how often a platform is recommended only in narrow or specialty contexts (e.g. “best for research”) relative to broad “best AI” contexts; higher means more contained to a lane. The current calculation is an approximation, and the methodology is still being refined.
- Strength score: a composite (0–100) of peer unconstrained standing, peer general mention rate, and peer specialty mention rate; in other words, peer ranking and visibility signals, not direct positioning.
- Competitive bias score: a composite of self-promotion, discovery dominance, specialty self-selection, exclusion behavior, and framing asymmetry; it measures competitive self-advantage in how the platform is ranked and framed, not political or moral bias.
- Self vs peer rank delta: the difference between the peer average unconstrained rank and this platform's average unconstrained rank when ranking itself. Positive means the platform ranks itself better than peers rank it.
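Two of the simpler metrics can be sketched directly from probe records. The record shape and the numbers below are toy illustrations for clarity, not study data, and the function names are hypothetical.

```python
# Toy sketch of peer first-position rate and self vs peer rank delta.
# Records and values are illustrative, not actual study data.

from statistics import mean

# Each record: (answering system, its ranking as an ordered list)
probes = [
    ("Claude",     ["ChatGPT", "Gemini", "Claude"]),
    ("Gemini",     ["ChatGPT", "Claude", "Gemini"]),
    ("Perplexity", ["Claude", "ChatGPT", "Gemini"]),
    ("ChatGPT",    ["ChatGPT", "Claude", "Gemini"]),  # self-ranking probe
]

def peer_first_position_rate(platform: str) -> float:
    # % of *other* systems' rankings that put `platform` first.
    peer = [r for who, r in probes if who != platform]
    return 100 * sum(r[0] == platform for r in peer) / len(peer)

def self_vs_peer_rank_delta(platform: str) -> float:
    # Positive = the platform ranks itself better than peers rank it.
    peer_ranks = [r.index(platform) + 1 for who, r in probes
                  if who != platform and platform in r]
    self_ranks = [r.index(platform) + 1 for who, r in probes
                  if who == platform and platform in r]
    return mean(peer_ranks) - mean(self_ranks)

print(peer_first_position_rate("ChatGPT"))  # 2 of 3 peer rankings put it #1
print(self_vs_peer_rank_delta("ChatGPT"))
```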
ChatGPT has the strongest default signal in the entire run
The most important ChatGPT metric is not its rank. It's how often others validate it.
- Strength: 90.1
- Competitive bias score: 59.4
- Peer first-position rate: 68.8%
- Peer general mention rate: 88.5%
- Discovery dominance: 88.1
- Self vs peer rank delta: +0.3
In Gemini's buyer-shortlist prompt, ChatGPT was described as the “general-purpose workhorse” and “often the default or primary AI for diverse text-based needs.” Claude's market-position framing called it “the mainstream default” and “the one everyone tries first.” That is a useful GEO lesson: the strongest market positions are not only asserted by the brand itself. They get repeated by the ecosystem.
Claude is the clearest case of premium framing without broad default status
Claude is the most interesting platform in the dataset after ChatGPT because its pattern is so split.
- Strength: 76.3
- Competitive bias score: 56.2
- Peer first-position rate: 18.8%
- Unconstrained self preference: 37.5
- Forced comparison self preference: 100
- Dominant peer role: Serious Reasoning Model
- Role consistency: 54.1%
Claude becomes maximally self-assertive when directly forced into a head-to-head, but the broader peer ecosystem still doesn't treat it as the market default. Instead, it gets cast into a premium role: deep reasoning, long documents, careful writing, nuance, policy, safety. Premium framing and default status are not the same thing.
Claude split behavior
Unconstrained = when Claude was asked who it prefers without being forced to compare (37.5% chose itself). Forced comparison = when Claude had to rank itself vs others (100% ranked itself first). Peer first-position = how often other AIs put Claude #1. Specialty dominance = how often Claude wins in specialty prompts.
Gemini is highly visible, but almost never the top recommendation
- Strength: 67.2
- Competitive bias score: 18.8
- Peer general mention rate: 76%
- Peer first-position rate: 0%
Gemini has distribution, presence, and ecosystem gravity, but not default status. The striking pattern: it is in the conversation almost everywhere (discovery dominance 56.3), yet almost never at the top of it. Its role is unusually stable: the Google-integrated, multimodal, distribution-heavy contender, not the default winner.
High inclusion does not automatically produce top-position status. You can be visible and still fail to become the default answer.
Perplexity is the cleanest example of specialist trapping
- Strength: 50.7
- Competitive bias score: 20.2
- Peer general mention rate: 39.6%
- Niche-boxing rate: 55.8%
This is containment, not weakness. Perplexity drew 242 research-specialist role assignments: it is recognized almost anywhere a citations, research, or fact-grounding frame appears, but far less often as the broad default.
A platform can have very high recognition inside a specialty and still remain boxed out of general discovery. In GEO terms, that's the difference between owning a lane and escaping a lane.
Grok is the most niche-boxed system in the run
- Strength: 40.1
- Competitive bias score: 9.8
- Peer general mention rate: 13.5%
- Niche-boxing rate: 78.7%
This is severe containment, not invisibility. Grok drew 139 realtime/social role assignments and is framed around real-time information from X, humor, sarcasm, and unconventional conversation: the realtime/social niche. Distinctiveness by itself is not enough if the answer ecosystem only sees you as situational.
Role assignment is more stable than rank order
The exact rank order shifts. The roles shift much less. The category map keeps recurring: ChatGPT → mainstream default; Claude → careful, nuanced, reasoning-heavy; Gemini → multimodal, Google-integrated ecosystem; Perplexity → research and citation engine; Grok → realtime/social niche. Each system had secondary role spillover, but one role clearly dominated for each. The important question isn't just “did I get ranked?” It's “what role keeps getting assigned to me?”
Role assignment matrix
Each cell = count of times other AIs assigned this platform to this role across probes. Darker = more often cast in that role.
One role dominated per platform; each system also received secondary assignments to other roles throughout the run.
| Platform | Default Generalist | Serious Reasoning | Ecosystem | Research Specialist | Realtime Social |
|---|---|---|---|---|---|
| ChatGPT | 199 | 139 | 81 | 103 | 34 |
| Claude | 51 | 262 | 40 | 95 | 36 |
| Gemini | 37 | 100 | 281 | 115 | 47 |
| Perplexity | 20 | 32 | 20 | 242 | 42 |
| Grok | 39 | 72 | 32 | 68 | 139 |
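As a consistency check on the table, each platform's dominant role and that role's share of its total assignments can be derived directly from the counts above. Computed this way, Claude's dominant-role share comes out to 54.1%, matching the role-consistency figure reported earlier. The function name is illustrative.

```python
# Derive each platform's dominant role and its share of all role
# assignments from the matrix above. Counts are copied from the table.

ROLES = ["Default Generalist", "Serious Reasoning", "Ecosystem",
         "Research Specialist", "Realtime Social"]

MATRIX = {
    "ChatGPT":    [199, 139,  81, 103,  34],
    "Claude":     [ 51, 262,  40,  95,  36],
    "Gemini":     [ 37, 100, 281, 115,  47],
    "Perplexity": [ 20,  32,  20, 242,  42],
    "Grok":       [ 39,  72,  32,  68, 139],
}

def dominant_role(platform: str) -> tuple[str, float]:
    # Returns (most-assigned role, its % share of all assignments).
    counts = MATRIX[platform]
    top = max(range(len(ROLES)), key=counts.__getitem__)
    share = 100 * counts[top] / sum(counts)
    return ROLES[top], round(share, 1)

for p in MATRIX:
    print(p, dominant_role(p))
```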
This study is useful because the same mechanics show up in company discovery
1. Default status compounds. Once a system becomes the safe default, it appears first more often, gets reinforced by rivals, and becomes harder to displace later.
2. Specialist clarity can become specialist containment. Owning one strong association can help in narrow prompts while reducing inclusion in broader buyer discovery prompts.
3. Premium framing is not enough. Being treated as thoughtful, high-quality, or safer does not automatically produce default recommendation power; positioning alone does not get you to #1 in broad recommendations.
4. Visibility is not just presence; it is placement. Strong mention rates with zero peer first-position rate mean you can be visible without being the answer.
GEO is not just about whether AI systems know you exist. It is about whether they treat you as a default, a contender, a specialist, or a niche player.
How we ran the study
This run used 360 total probes across five systems: ChatGPT, Claude, Gemini, Perplexity, and Grok. Each system was tested with the same 72-prompt library, covering six prompt families:
- unconstrained ranking
- forced comparison ranking
- true exclusion ranking
- category discovery
- specialty winner assignment
- positioning description
Prompts were run as single-turn probes with no conversational carryover. The goal was to measure observed recommendation, ranking, and category-positioning behavior, not to benchmark raw model quality.
This dataset was collected in no-web mode by default to reduce retrieval effects and isolate native model behavior. One exception applies: Perplexity remained web-grounded in the current integration path, so its results should be interpreted as observed product behavior rather than a perfectly symmetric no-web condition.
Responses were normalized in two steps:
- deterministic extraction first for rankings, mentions, winners, and framing signals
- fast LLM adjudication only for ambiguous rows, using ChatGPT-based judge logic rather than Claude
Scores in this view are behavioral composites, not objective quality benchmarks. In particular:
- Strength score reflects peer ranking, visibility, and recommendation presence
- Competitive bias score reflects self-favoring and visibility-advantage signals, and should not be read as a pure measure of manipulation
Limitations
- This is a structured observational study, not a benchmark of true model quality.
- Unconstrained prompts can surface adjacent products or legacy names.
- Perplexity remained web-grounded in practice.
- Some metrics are stricter than they may appear; for example, self-preference metrics usually count only explicit #1 placement.
The point of this study is not that one AI system is objectively “best.” It's that AI systems already shape market perception — and they do it in structured, uneven ways. Some get reinforced as the default. Some get framed as premium alternatives. Some get confined to a specialist lane. Some stay visible, but rarely become the answer.
That is what makes AI analyzing AI useful for GEO. It gives us a clean view into how AI-mediated visibility actually works: not just who appears, but who gets positioned as broad, credible, and worth choosing.
If even the most visible AI products in the world are being defaulted, role-assigned, and niche-boxed by answer engines, every other company should expect the same dynamics. The real question is no longer just whether answer engines know you exist. It's whether they treat you as the default, the contender, the specialist — or leave you out altogether.
Shape how AI positions you
Second Wind helps companies understand and shape how AI systems position them in buyer-facing conversations — from default recommendation status to specialist containment. Get in touch.
