Who's Visiting the Web? A User Agent Distribution Analysis of 388,000 Requests

Every HTTP request carries a user agent string — a self-declared identity that tells the server what's asking. Most analyses of user agent data focus on browser market share. We went deeper: we classified 387,960 requests across 8,974 unique user agent strings into 10 categories, then broke the data down by platform, browser family, operating system, and crawler identity.

The headline finding: human browsers account for barely half of all traffic. The rest is machines.

The Big Picture: What's Actually Hitting Your Server

Category Requests Share
Browser (Desktop) 190,145 49.0%
Crawler/Bot 98,551 25.4%
AI Crawler 58,185 15.0%
Unknown 18,196 4.7%
HTTP Library 13,583 3.5%
Browser (Mobile) 8,752 2.3%
Browser (Tablet) 324 0.1%
Social Media 187 <0.1%
Email Client 46 <0.1%
Media Player 3 <0.1%

Human browser traffic — desktop, mobile, and tablet combined — accounts for 51.4% of all requests. The remaining 48.6% is bots, crawlers, libraries, and unidentifiable agents.

Traditional crawlers (Googlebot, AhrefsBot, SEO tools) make up 25.4%. AI crawlers — GPTBot, ClaudeBot, Bytespider, and their peers — add another 15.0%. Together, automated crawlers account for 40.4% of all traffic. If you're not seeing bots in your analytics, it's because your analytics tool is filtering them out, not because they aren't there — Google Analytics 4 automatically excludes known bot traffic with no option to disable the filter.
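Classification at this scale is typically a single pass of ordered regex rules over each UA string, first match wins. A minimal sketch (the category names mirror our table, but the patterns below are illustrative, not the production ruleset behind these numbers):

```python
import re

# Ordered rules: first match wins, so specific bot tokens are
# checked before the generic "Mozilla/" browser fallback.
RULES = [
    ("AI Crawler",        re.compile(r"GPTBot|ClaudeBot|Bytespider|Amazonbot|meta-externalagent|CCBot|PerplexityBot", re.I)),
    ("Crawler/Bot",       re.compile(r"Googlebot|bingbot|AhrefsBot|SemrushBot|MJ12bot|DotBot|UptimeRobot", re.I)),
    ("HTTP Library",      re.compile(r"\bcurl/|python-requests|aiohttp|Go-http-client|axios|okhttp", re.I)),
    ("Browser (Mobile)",  re.compile(r"Mobile|Android", re.I)),
    ("Browser (Desktop)", re.compile(r"Mozilla/", re.I)),
]

def classify(ua: str) -> str:
    """Return the first matching category, or "Unknown" if nothing fires."""
    for category, pattern in RULES:
        if pattern.search(ua):
            return category
    return "Unknown"
```

Note the ordering matters: most bots embed "Mozilla/" in their UA strings for compatibility, so the browser rule must come last.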

This aligns with a broader shift. Imperva's 2025 Bad Bot Report found automated traffic surpassed human activity for the first time, reaching 51% of all web traffic. Cloudflare's CEO predicts bot traffic will exceed human traffic permanently by 2027. The era of assuming "a request equals a user" is over.

Desktop Dominates — But Mobile Is Suspiciously Low

Among human browsers, the platform split is stark:

Platform Requests Share of Human Traffic
Desktop 190,145 95.4%
Mobile 8,752 4.4%
Tablet 324 0.2%

Desktop accounts for 95.4% of human browser traffic. This contrasts sharply with global web traffic norms, where mobile represents roughly 60% of all traffic. The explanation is audience composition: LLMSE is a developer-oriented tool. Its users are disproportionately on laptops and workstations, not phones.

The 4.4% mobile share isn't negligible — it represents developers checking results on the go and organic search traffic — but it confirms that technical tools attract a desktop-first audience.

The mobile browser landscape has its own structural problems. Every browser on iOS worldwide (outside the EU) must use Apple's WebKit engine — Chrome on iPhone is essentially a Safari skin. Even in the EU, where the Digital Markets Act mandated alternative engine support, zero browsers have shipped with non-WebKit engines two years after Apple provided the frameworks. Open Web Advocacy attributes this to Apple making alternative engines "financially unviable" through distribution requirements.

Browser Market Share: Chrome's Dominance

Among identified desktop browsers (by request volume):

Browser Requests Share
Chrome 135,775 71.4%
Firefox 31,319 16.5%
Safari 13,154 6.9%
CCleaner 2,313 1.2%
Avast Secure Browser 2,298 1.2%
Edge 1,953 1.0%
Apple Mail 583 0.3%
Opera 300 0.2%
Applebot 299 0.2%
HeadlessChrome 296 0.2%
Vivaldi 211 0.1%

Chrome commands 71.4% of desktop traffic — within range of global desktop averages (65-76% depending on the source), consistent with a developer audience that gravitates toward Chrome DevTools. Firefox's 16.5% is roughly triple its global ~6% desktop market share, reflecting the browser's enduring popularity among technical users and privacy-conscious developers.

Safari at 6.9% tracks with macOS usage in the developer community. Edge at just 1.0% is the sharpest deviation from global norms — Microsoft's browser holds ~12% of desktop traffic worldwide, but developers actively avoid it.

The appearance of CCleaner (1.2%) and Avast Secure Browser (1.2%) in the top ranks suggests a segment of security-conscious users running these tools alongside or instead of mainstream browsers.

The broader browser landscape is at a crossroads. 70-79% of browsers now run on Google's Chromium/Blink engine — only three browser engines remain (Blink, WebKit, Gecko), down from five in 2013. Chrome's Manifest V3 transition, which fully disabled MV2 extensions in July 2025, limits ad blockers to 30,000 static rules (most need 300,000+). Adblock Plus lost 7 million users immediately after the switch. Firefox committed to maintaining MV2 indefinitely, and Brave sidesteps the issue entirely with shields built directly into its Chromium fork. Firefox's over-representation in our data (16.5% vs 6% globally) may partly reflect developers migrating away from Chrome over these restrictions.

Privacy-focused browsers are no longer niche: Brave passed 100 million monthly active users in September 2025, with $100M+ in quarterly revenue. The EU's DMA browser choice screens boosted Brave's iOS installs by 50%. Among our audience, Brave (54 unique UAs) and Vivaldi (5 unique UAs) show meaningful adoption despite their small global share.

Mobile Browsers

Browser Requests Share
GoogleOther 3,772 43.1%
Mobile Safari 2,594 29.6%
Chrome Mobile 1,287 14.7%
Firefox Mobile 366 4.2%
Chrome Mobile iOS 178 2.0%
Chrome Mobile WebView 96 1.1%
Samsung Internet 78 0.9%
TikTokSpider 78 0.9%

GoogleOther leads mobile traffic at 43.1% — this is Google's supplementary crawler that identifies as a mobile user agent for mobile-first indexing. Excluding it, Mobile Safari (29.6%) and Chrome Mobile (14.7%) dominate actual human mobile traffic, which aligns with iOS's strong presence among developer audiences.

Operating System Distribution

OS Requests Share
Windows 147,566 38.0%
Other 123,248 31.8%
Android 62,376 16.1%
Mac OS X 28,715 7.4%
Linux 14,130 3.6%
Ubuntu 4,556 1.2%
iOS 3,089 0.8%
Fedora 1,141 0.3%
Kubuntu 987 0.3%
CentOS 808 0.2%
Debian 788 0.2%
Chrome OS 540 0.1%

Windows leads at 38.0%, but the "Other" category at 31.8% is almost entirely bots and crawlers that don't report a recognizable OS string. Android at 16.1% is inflated by Bytespider and other crawlers that masquerade as Android devices.

The Linux family deserves a closer look. Combining Linux, Ubuntu, Fedora, Kubuntu, CentOS, Debian, FreeBSD, OpenBSD, and Gentoo yields 22,422 requests (5.8%). That's notably above Linux's ~4% global desktop share, again reflecting the developer-heavy audience.

Mac OS X at 7.4% appears understated compared to its ~16% global share among desktop users. This is because the denominator includes all traffic — bots, crawlers, and libraries that inflate the total without contributing macOS user agents.

The AI Crawler Arms Race

AI crawlers generated 58,185 requests from just 15 unique user agent strings. That's an average of 3,879 requests per crawler variant — orders of magnitude more intensive than human browsing.

AI Crawler Requests Share
GPTBot (OpenAI) 23,510 40.4%
Bytespider (ByteDance) 10,886 18.7%
ClaudeBot (Anthropic) 9,211 15.8%
Amazonbot 7,562 13.0%
Meta-ExternalAgent (Meta) 6,211 10.7%
OAI-SearchBot (OpenAI) 287 0.5%
CCBot (Common Crawl) 270 0.5%
ChatGPT-User (OpenAI) 155 0.3%
PerplexityBot 89 0.2%

OpenAI is the most aggressive AI crawler operator, with GPTBot alone accounting for 40.4% of all AI crawling. Adding OAI-SearchBot and ChatGPT-User brings OpenAI's total to 41.2% of AI crawler traffic. ByteDance's Bytespider (18.7%) and Anthropic's ClaudeBot (15.8%) follow, with Amazon and Meta rounding out the top five.

Combined, the top five crawlers generate 98.6% of all AI crawler traffic. PerplexityBot, despite Perplexity's growing visibility as an AI search product, represents just 0.2%.

The Cost of AI Crawling

The numbers behind these crawlers tell a story of extraction without reciprocity. Cloudflare's "crawl-to-click gap" analysis found that for every visitor Anthropic refers back to a website, ClaudeBot has already visited 286,930 pages. Perplexity's ratio worsened 257% over six months, climbing from 54 bot visits per human referral to 195. Training now drives 80% of AI bot activity, up from 72% a year prior — these crawlers are taking content, not sending users.

The financial toll is real. One site operator reported $5,000 in bandwidth charges in a single day from AI bot traffic. Read the Docs reduced traffic by 75% and saved $1,500/month simply by blocking AI crawlers. In shared hosting environments, AI bots can consume up to 70% of dynamic resource costs, and unlike traditional crawlers, modern AI bots execute JavaScript and follow deep links — consuming far more server resources per request.

robots.txt: Voluntary Compliance at Best

Five major AI crawlers — Amazonbot, Applebot, CCBot, ClaudeBot, and GPTBot — honor robots.txt directives. Bytespider fetches the file but has been documented ignoring its contents. Yet 79% of top news sites now block AI training bots via robots.txt, making it the primary defense even though compliance is entirely voluntary.
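For the compliant crawlers, opting out is a few lines of robots.txt. A minimal example covering the bots named above (Bytespider is included for completeness, though its compliance is documented as unreliable):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /
```

This blocks training crawlers without touching Googlebot or other search indexers, which match only their own User-agent tokens.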

The EU AI Act's Code of Practice may change this calculus: signatories must respect robots.txt to qualify for the statutory "Text and Data Mining" copyright carve-out, giving the file quasi-legal force in the EU for the first time. Meanwhile, Cloudflare changed its default to block all AI crawlers for every new domain in July 2025 and introduced AI Labyrinth — a system that serves AI-generated fake content to crawlers that ignore robots.txt, wasting their resources and poisoning training data.

The Legal Frontier

Over 50 AI-related copyright lawsuits have been filed against major tech companies. A federal judge rejected OpenAI's fair use defense in the NYT v. OpenAI case in early 2025. Anthropic reached a $1.5 billion settlement — the costliest in AI copyright history. OpenAI leads with 18 publisher licensing deals, ranging from $1M to $250M+ annually, while the market shifts from bilateral agreements toward systematic marketplace infrastructure.

Traditional Crawlers: The SEO Industrial Complex

Traditional bots generated 98,551 requests — even more than AI crawlers:

Crawler Requests Share
Googlebot 45,879 46.6%
UptimeRobot 21,621 21.9%
AhrefsBot 21,063 21.4%
DotBot (Moz) 2,070 2.1%
MJ12bot (Majestic) 2,009 2.0%
SemrushBot 1,327 1.3%
DataForSeoBot 1,163 1.2%
Claude-SearchBot 675 0.7%
bingbot 640 0.6%

Googlebot at 46.6% is expected — it's the web's primary indexer. But the SEO tool ecosystem is striking: AhrefsBot (21.4%), DotBot/Moz (2.1%), MJ12bot/Majestic (2.0%), SemrushBot (1.3%), and DataForSeoBot (1.2%) collectively account for 28.0% of crawler traffic. These tools crawl the web continuously to build their backlink databases and SEO metrics.

UptimeRobot at 21.9% represents the monitoring category — services that ping URLs every few minutes to check availability.

Bingbot at just 0.6% underscores Microsoft's reduced crawling ambitions compared to Google.

The SEO tool ecosystem behind these bots is massive: the global SEO software market hit $84.9 billion in 2025, projected to reach $295 billion by 2035. Semrush alone crossed $105M in quarterly revenue in Q1 2025, while Ahrefs holds 14.8% of the SEO/SEM market. Every one of these companies operates its own crawler fleet, continuously scanning the web for backlinks, rankings, and competitive intelligence.

Crawl budget competition is intensifying. Google's crawl budget — the rate at which Googlebot can request pages without overloading a server — is a finite resource, and it matters most for sites with 10,000+ pages. As AI bots consume more server resources, some operators report that AI crawler traffic crowds out Googlebot, potentially harming their search visibility. This creates a perverse incentive: blocking AI crawlers may actually improve SEO performance.

Not all Googlebot traffic is real. DataDome detects more than one million hits per day from fake Googlebots across its customer base, and 16.3% of sites suffer from Googlebot impersonation attacks. Legitimate Googlebot can be verified via reverse DNS lookup, but most site operators never check.
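The check is a forward-confirmed reverse DNS lookup: resolve the client IP to a hostname, require a googlebot.com or google.com suffix, then resolve the hostname back and require the original IP. A minimal stdlib sketch (the resolver parameters exist only so the logic can be exercised without network access):

```python
import socket

def verify_googlebot(ip: str,
                     reverse_dns=socket.gethostbyaddr,
                     forward_dns=socket.gethostbyname) -> bool:
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    try:
        host = reverse_dns(ip)[0]          # IP -> hostname
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False                       # wrong domain: impersonator
    try:
        return forward_dns(host) == ip     # hostname -> IP must round-trip
    except OSError:
        return False
```

The forward confirmation matters: an attacker can set any reverse DNS record for an IP they control, but cannot make Google's forward DNS point back at it.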

The uptime monitoring market — represented here by UptimeRobot's 21.9% share — reached $6.2 billion in 2024, projected to grow to $15.1 billion by 2033. UptimeRobot alone serves 3.2 million users and companies.

HTTP Libraries: The Automation Layer

HTTP libraries account for 13,583 requests from 45 unique agents:

Library Requests Share
curl 11,543 85.0%
Python aiohttp 816 6.0%
Go-http-client 548 4.0%
axios 164 1.2%
Python Requests 136 1.0%
okhttp 116 0.9%

curl dominates at 85.0%, which makes sense for a service whose primary interface is an API. Developers testing endpoints from the command line generate the bulk of library traffic. Python's aiohttp and Go's standard HTTP client follow — both common in backend automation and monitoring scripts.

curl's dominance reflects the tool's extraordinary ubiquity: an estimated 50 billion curl installations exist worldwide (roughly 16.5 per household), serving 65 billion requests monthly. It's embedded in everything from smartphones to Tesla dashboards to BMW infotainment systems, used by Google, Apple, Sony, and 47 car brands. As curl creator Daniel Stenberg notes, it's "used by millions, recognized by none" — arguably the most widely deployed open-source software after the Linux kernel.

The broader trend is the API-first web. Over 50% of all Cloudflare traffic is now API-based, growing twice as fast as traditional page loads. AI-related API traffic on Postman grew 73% in the past year alone. HTTP libraries are no longer a niche traffic source — they represent how machines talk to the web.

For site operators, distinguishing legitimate library traffic from malicious automation is increasingly difficult. Traditional WAF methods — IP rate limiting, signature-based detection — are being defeated by anti-detect frameworks that forge HTTP headers and TLS fingerprints. TLS fingerprinting (JA3/JA4) offers a key signal: each HTTP library produces a distinctive TLS handshake signature, so a request claiming to be Chrome but with a Python-requests TLS fingerprint is immediately flagged.
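The flagging logic reduces to a cross-reference between the client the UA claims to be and the client the TLS handshake actually belongs to. A sketch under the assumption that a fingerprint-to-client table already exists (the hashes below are placeholders, not real JA3 values; production systems use curated fingerprint databases):

```python
# Placeholder JA3-hash -> client-family table (illustrative values only).
KNOWN_FINGERPRINTS = {
    "ja3-aaaa": "chrome",
    "ja3-bbbb": "python-requests",
}

def claimed_family(user_agent: str) -> str:
    """Crude UA-based guess at what the client says it is."""
    ua = user_agent.lower()
    if "python-requests" in ua:
        return "python-requests"
    if "chrome" in ua:
        return "chrome"
    return "unknown"

def is_suspicious(user_agent: str, ja3_hash: str) -> bool:
    """Flag requests whose TLS handshake contradicts the claimed client."""
    actual = KNOWN_FINGERPRINTS.get(ja3_hash)
    claimed = claimed_family(user_agent)
    if actual is None or claimed == "unknown":
        return False  # not enough signal to judge either way
    return actual != claimed
```

A request claiming Chrome in its UA but handshaking like python-requests gets flagged; a consistent pair passes.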

Social Media Crawlers: The Preview Tax

Social media bots — FacebookExternalHit, Twitterbot, LinkedInBot — generated just 187 requests (0.05%), but they punch above their weight in operational impact. Every time a URL is shared on a social platform, these crawlers fetch the page to extract Open Graph metadata (og:title, og:description, og:image) for link preview cards. Blocking them means your shared links show bare URLs instead of rich previews — punishing your own content distribution.
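The metadata these crawlers consume is ordinary meta markup in the page head. A minimal stdlib sketch of the extraction side, the same job FacebookExternalHit performs on every shared link (the sample HTML is illustrative):

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:*" content="..."> tags from a page head."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = dict(attrs)
        prop = d.get("property", "")
        if prop.startswith("og:") and "content" in d:
            self.og[prop] = d["content"]

parser = OpenGraphParser()
parser.feed('<head><meta property="og:title" content="Who\'s Visiting the Web?">'
            '<meta property="og:image" content="/card.png"></head>')
```

Pages missing these tags get the bare-URL preview; that, not politeness, is why almost no one blocks these bots.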

The line between "social media bot" and "AI crawler" is blurring. Meta-ExternalAgent — distinct from FacebookExternalHit — surged to 19% of AI crawler traffic in 2025, indicating Meta is crawling far beyond what link previews require, likely for AI training. In our data, we classified Meta-ExternalAgent under AI Crawlers (10.7% share), reflecting its actual behavior rather than its social media origins.

The Unknown: Web Traffic's Dark Matter

The 18,196 requests (4.7%) classified as "Unknown" come from user agents we couldn't match to any known browser, bot, or library. This 4.7% aligns with Cloudflare's finding that ~6% of web traffic carries entirely unknown user agents.

What's hiding in this category? Custom scripts with default or empty UA strings. IoT devices — 21.1 billion are connected as of 2025, most running minimal HTTP clients with non-standard user agents. Malware and botnets: the DoJ disrupted a 3-million-device IoT botnet in early 2026, comprising compromised DVRs, webcams, and routers that generated traffic with unidentifiable user agents. And increasingly, AI agents that deliberately use real Chromium browsers to blend in with human traffic — making them indistinguishable from Chrome in server logs.

The identity problem is getting worse. User agent strings were designed for a simpler era, and as agentic AI traffic grew 7,851% in 2025, the gap between verified and actual AI traffic is a growing blind spot for analytics and security teams.

What This Means

Four takeaways from this data:

1. The web is half-bot — and that's the new normal. At 48.6% non-human traffic, our data mirrors the global trend. Server capacity, rate limiting, and caching strategies must account for the bot half. Analytics that don't filter bots are inflated by roughly 2x.

2. AI crawlers are an extraction economy. At 15% of our traffic from just 15 unique user agents, AI crawlers take content at industrial scale while sending almost nothing back. The crawl-to-click gap (286,930 bot visits per human referral for Anthropic) reveals the fundamental asymmetry. With 50+ lawsuits filed, $1.5 billion in settlements, and Cloudflare blocking AI crawlers by default, the web is fighting back — but the economic model remains unresolved.

3. Browser diversity is eroding. Three rendering engines remain. Chrome's Manifest V3 restricts ad blockers while Google runs the largest ad network. Firefox's over-representation in developer audiences (16.5% vs 6% globally) suggests technical users are already voting with their feet, but the Chromium monoculture deepens for everyone else.

4. User agents are becoming unreliable. With AI agents using real Chromium browsers, IoT botnets generating unidentifiable traffic, and 5.7% of AI crawler traffic being spoofed, the user agent string — designed decades ago — is increasingly inadequate for traffic classification. The 4.7% "Unknown" category in our data is likely a floor, not a ceiling.


Data collected by LLMSE from organic traffic and API submissions. User agent parsing powered by the user-agents Python library. See our User Agent database for the full dataset.