X Didn't Open-Source the Algorithm.
They Open-Sourced the Rulebook.

A pattern read on the May 2026 release

Published May 15, 2026
The publication itself is the intervention. The May 15 release is presented as transparency. Read carefully, it is something else: a citizenship code for creators, published in a form that cannot be run, reproduced, or audited. The mechanism is named. The kinetics are missing. The pattern of what's in and what's out is too consistent to be an accident of engineering pragmatism.
META-ANALYSIS · COMMIT e414c17 · OPINION + EVIDENCE
The thesis: X published the part of the algorithm that shapes creator behavior — named actions, safety classifiers, diversity decay — and withheld every part that would let anyone audit the feed: weights, retrieval source ratios, account-tier modifiers, topic boosts, classifier prompts, the production model. Creators read it and adjust what they post. Auditors read it and learn nothing they can verify.
A spotlit rulebook on a stage; behind a velvet curtain, a locked safe in shadow.

The Evidence: What's In, What's Out

Anyone reading the repo can verify this list. We've cross-referenced every claim against the technical breakdown.

An official document on a wooden desk; the left page is lit and shows visible text, the right page is covered with heavy black redaction bars.

Published — the Citizenship Code

Everything that tells creators what to do or not do is named explicitly:

  • 22 named actions with their own score fields — a published target list. Creators learn what to optimize for.
  • The negative signals (report, not_dwelled, block_author, mute_author, not_interested), each prominently named, with its own slot in the prediction proto.
  • Author-diversity decay — the formula that limits how many posts from one author you'll see. Justifies its own existence as a fairness measure.
  • OON downweight branches — how out-of-network posts are penalized in three flavors (default, topic-aware, new-user). Tells creators to grow followers.
  • PostSafetyScreenDeluxe + SafetyPtos — the Grok-powered classifiers. Named. Their existence is announced.
  • BangerInitialScreen — Grok viral-quality screener. Tells creators there is an AI quality bar before posts even enter the feed.
  • 47 Python files of Grox classifiers — the wrapper, the plan/task/scheduler structure, the entire flow.
Net effect: a creator reading this learns exactly what behaviors are blessed and what behaviors are shamed.
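The author-diversity decay is a case in point: the repo names the mechanism but ships no constants, so only its shape can be sketched. A minimal illustration, assuming a multiplicative per-author decay with an invented rate and floor:

```python
def apply_author_diversity_decay(posts, decay=0.5, floor=0.1):
    """Downweight repeated posts from the same author in a ranked list.

    `posts` is a list of (author_id, score) pairs, highest score first.
    `decay` and `floor` are invented for illustration; the release names
    the mechanism but the real constants were not published.
    """
    seen = {}  # author_id -> posts already placed
    rescored = []
    for author, score in posts:
        n = seen.get(author, 0)
        factor = max(floor, decay ** n)  # 1.0 for the first post, then 0.5, 0.25...
        rescored.append((author, score * factor))
        seen[author] = n + 1
    return sorted(rescored, key=lambda p: p[1], reverse=True)
```

With a decay of 0.5, an author's second post keeps half its raw score, so a third post from the same author rarely outranks a fresh author. Whether X's curve looks anything like this is exactly what the missing constants would tell us.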

Withheld — Everything Discretionary

Every value an auditor would need to verify the system is missing:

  • Every weight value. The entire params.rs file is absent. Without it, none of the 22 named actions has a magnitude. The system cannot be run.
  • The real production model. What ships is a 4-layer / 128-dim toy trained on a sports-only 537K-post corpus. Not what serves your timeline.
  • Retrieval source ratios. What percentage of your candidate pool comes from Following vs In-Network vs Out-of-Network? The structure is named. The mix is not disclosed.
  • Account-tier modifiers. Does Premium get a boost? Does verified? Do Musk-tier accounts get explicit weight? Researchers have produced evidence that the answer is yes. The code does not say.
  • Topic-level boosts and suppressions. TopicOonWeightFactor exists as a knob. The actual topic lists and their values do not ship.
  • Geo and political content rules. No file in the repo addresses jurisdiction-specific visibility.
  • VFFilter rule contents. Visibility Filtering is named as a stage. Its rules — what gets shadow-suppressed and why — are opaque.
  • Grox classifier prompts and labels. The wrapper without the brain. We can see that Grok screens for "viral quality." We cannot see what Grok was told to consider viral.
  • Reporter-reputation logic. Whether a serial reporter's signal decays. Whether brigading works.
  • All xai_* Rust crates. The whole dependency tree. No Cargo.toml exists. You cannot build it.
  • Grox internals. grox.config, grox.lm, grox.prompts, grok_sampler, monitor, strato_http — all referenced, none included.
Net effect: nothing about the actual behavior of the feed can be independently verified.

Why the Pattern Can't Be Accidental

The kindest read on what's missing is engineering pragmatism: contracts with ad partners, employee privacy in tracked configs, GPU-bound model weights too large to host. All real constraints. None of them explain the clustering of the omissions.

Every omission is in the same direction

The published surface tells creators what to do. The withheld surface contains every value that would let someone check whether the system actually does what it says. If the cuts were random — some weights here, some retrieval ratios there, a few classifier prompts elsewhere — pragmatism would be a plausible story. The cuts are not random. They are categorical.

Every weight is missing. Every list is missing. Every tier modifier is missing. Every classifier prompt is missing. The retrieval mix is missing. The pattern reads like a redaction policy, not a build artifact.

The behavior-shaping parts are more prominent than they need to be

The 22 actions don't have to be named at this level of granularity to make the code work. The score fields could be opaque indices. They are not — they are favorite_score, reply_score, report_score, not_dwelled_score. The names are doing work. The names are the announcement. They tell creators what counts even when the weight is hidden.

Naming the signals is a publication choice. It teaches creators what the system measures. It does not enable anyone to verify how heavily each signal is weighted.
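The split is easy to see in miniature. A hedged sketch, not X's code: the field names below follow the published proto, while the weight table stands in for the absent params.rs, with every number invented.

```python
from dataclasses import dataclass, fields

@dataclass
class EngagementScores:
    # Field names follow the published proto; the class itself is a sketch.
    favorite_score: float = 0.0
    reply_score: float = 0.0
    report_score: float = 0.0
    not_dwelled_score: float = 0.0
    # ...the release names 22 such actions in total.

# The withheld half: per-action weights lived in the absent params.rs.
# These numbers are invented placeholders, not X's values.
HYPOTHETICAL_WEIGHTS = {
    "favorite_score": 1.0,
    "reply_score": 2.5,
    "report_score": -40.0,
    "not_dwelled_score": -1.5,
}

def final_score(s: EngagementScores) -> float:
    """Combine named signals into one rank score. Creators can read the
    names; only the weight table decides what any of them is worth."""
    return sum(HYPOTHETICAL_WEIGHTS[f.name] * getattr(s, f.name)
               for f in fields(s))
```

Publishing the dataclass without the weight table is exactly the release's shape: the names teach behavior, the magnitudes stay private.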

The README still describes the old architecture

Four months after the rewrite, the top-level README still describes the three-scorer system from January. The repo's outermost layer of documentation has not been updated. The architectural changes — RankingScorer, VMRanker, BlenderSelector, Grox — are discoverable only by reading the source. The release ships, the announcement happens, the README stays stale.

A real engineering effort to make something runnable updates its README. A messaging effort updates the announcement and lets the code drift.

Nothing in the repo can run

No Cargo.toml. No internal Rust crates. No Grox internals. The mini Phoenix model is a sports-only toy. A third party cannot stand up a comparable system from what was published. That is not a side effect — it is the entire integration surface, withheld.

If the goal were external reproducibility, the dependency tree would be the first thing in the repo. It isn't there at all.

What the Release Actually Does

Read as a transparency document, the release fails — it conceals the values that determine behavior. Read as a behavior-modification document, it succeeds beautifully. Three effects, all working as designed:

1. It trains creators

Every creator who reads "not_dwelled is now a penalty" rewrites their hook. Every creator who reads "cont_click_dwell_time" starts front-loading payoff. Every creator who reads "report is a scored action" gets quieter about spicy takes. The publication of the action list is a coordination signal for the entire creator economy. Within a week, posting patterns will shift in the direction the named signals point.

2. It manufactures legitimacy

"The only open-source social media algorithm" is a brand. The fact that the repo doesn't build, the model is a toy, and every weight is missing does not impair the brand — the brand needs only the gesture of openness, not the substance. Most coverage will report the release on its terms. Few outlets will note that nothing in it can be reproduced.

3. It defuses oversight pressure

Regulators and researchers asking "what's in the algorithm" can be pointed at GitHub. Anyone trying to actually answer the question discovers the values are missing — but that discovery happens slowly, in technical write-ups, after the press cycle has moved on. The release pays the political cost of transparency at the engineering cost of nothing verifiable.

The release modifies creator behavior. It does not enable third-party audit. Those are not the same thing. A document that does the first and not the second is doing public relations dressed as engineering.

The Honest Caveats

We can't read minds

We have the code. We don't have the meetings. The pattern of omissions is consistent with deliberate curation, and also consistent with a half-finished engineering effort and a marketing department that pushed the publish button anyway. We can't prove intent.

What we can prove is the pattern itself. The pattern is the story regardless of who designed it.

Some omissions have real reasons

Production model weights are huge and GPU-bound — releasing them is a real cost. Ad-brand-safety lists may be under contract. Employee names in config files raise privacy concerns. None of this is fake. But none of it explains why every weight, every list, every tier modifier, and every classifier prompt is missing together.

An honest release would publish what could be published and document what couldn't. This release neither publishes the values nor explains their absence.

There is still more here than competitors release

Meta, TikTok, YouTube publish nothing comparable. The xai-org repo, even in its curated form, is the most code anyone has shipped from a major recommendation system. That deserves acknowledgment.

"More than competitors" is a low bar. The question is whether this clears the bar of "open" as it is generally understood. By the standard of "can a third party verify the system behaves as described," it does not.

What Real Transparency Would Look Like

The shape of the gap is easy to describe. A release that actually enabled audit would include:

The weights, with confidence intervals or version notes

If report_score has a coefficient of -47.3 in production, that number could be published. If it varies by experiment, the range could be published. The current release names the signal and hides the magnitude — making serious analysis impossible while appearing to invite it.
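The format such a disclosure could take is not exotic. A hypothetical fragment, with every number invented (including the -47.3 used as an example above), showing what an auditor could do with published ranges:

```python
# Hypothetical published-params format; all values are invented, not X's.
PUBLISHED_PARAMS = {
    "report_score":      {"weight": -47.3, "experiment_range": (-60.0, -30.0)},
    "favorite_score":    {"weight": 1.2,   "experiment_range": (1.0, 1.5)},
    "not_dwelled_score": {"weight": -3.1,  "experiment_range": (-5.0, -1.0)},
}

def in_disclosed_range(signal: str, observed_weight: float) -> bool:
    """The audit a range enables: check an empirically estimated
    coefficient against the disclosed experiment envelope."""
    lo, hi = PUBLISHED_PARAMS[signal]["experiment_range"]
    return lo <= observed_weight <= hi
```

Even ranges, not point values, would let outside researchers falsify claims about the feed. The current release permits no such check.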

Retrieval source ratios

Of the candidates that enter ranking, what fraction came from Following, In-Network, and Out-of-Network sources? This single ratio determines how much of your feed is shaped by people you chose vs. people the algorithm chose for you. It is not in the code.
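The missing number is a one-line computation once candidate provenance exists. A sketch, assuming each candidate is tagged with its retrieval source (source labels follow this article's categories, not a published schema):

```python
from collections import Counter

def retrieval_mix(candidates):
    """Fraction of the candidate pool contributed by each retrieval source.

    `candidates` is a list of (post_id, source) pairs. This single summary
    statistic is what the release does not disclose.
    """
    counts = Counter(source for _, source in candidates)
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()}
```

X computes something equivalent internally on every request; publishing even an aggregate daily version of it would answer the chosen-vs-chosen-for-you question directly.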

Account-tier modifiers

Premium boost? Verified boost? Special handling for high-follower accounts? For employee accounts? For ownership-tier accounts? Researchers have documented the existence of differential treatment with empirical methods. The code does not acknowledge it.

Sample classifier outputs

Even without releasing prompts, X could publish hashes of representative inputs alongside the classifier's outputs — letting external researchers run their own posts through the same model and check for consistency. Currently the classifier exists only as a wrapper around a missing brain.
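The mechanics of that proposal are simple. A sketch, assuming published records pair a SHA-256 of the input text with the classifier's label, so the posts themselves need not be disclosed:

```python
import hashlib

def published_record(post_text: str, label: str) -> dict:
    """What X could publish per sample: a hash of the input plus the
    classifier's output for that input."""
    digest = hashlib.sha256(post_text.encode("utf-8")).hexdigest()
    return {"input_sha256": digest, "label": label}

def matches(post_text: str, record: dict) -> bool:
    """A researcher holding the same text can confirm it is the published
    sample, then compare the live model's output against the label."""
    return hashlib.sha256(post_text.encode("utf-8")).hexdigest() == record["input_sha256"]
```

The point of the hash is that verification needs no trust: either your post is the published sample and the labels can be compared, or it is not.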

Reporter-reputation logic and brigading defenses

Does a serial reporter's signal decay? Are coordinated report patterns detected and discounted? These are real ML problems with real published literature. X's approach is invisible. Whether the platform is brigadable is the question — and the question cannot be answered from this release.
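The literature's standard shapes are easy to state even though X's choice is invisible. One common sketch: discount each additional report from the same account, so a serial reporter's hundredth flag carries little weight. All constants here are invented:

```python
def reporter_signal(prior_reports_by_account: int, half_life: float = 10.0) -> float:
    """Weight of one report, decayed by how many the account already filed.

    Exponential decay with an invented half-life; the release says nothing
    about whether X does anything like this.
    """
    return 0.5 ** (prior_reports_by_account / half_life)

def post_report_score(reports):
    """Sum decayed weights for one post's reports: a crude brigading
    defense, since a burst from serial reporters counts far less than the
    same count from accounts that rarely report."""
    return sum(reporter_signal(n) for n in reports)
```

Whether X implements anything in this family, or treats every report equally, cannot be determined from the repo.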

An honest changelog of what was removed and why

A file titled WITHHELD.md listing every withheld category and the reason — legal, contractual, privacy, size, security — would convert the current ambiguity into accountable decisions. Right now the absences are silent.
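A hypothetical fragment of what such a file could contain; the categories come from this analysis, and the stated reasons and statuses are invented:

```markdown
# WITHHELD.md — hypothetical; no such file exists in the release

| Withheld category          | Stated reason | Revisit?     |
|----------------------------|---------------|--------------|
| params.rs weight values    | competitive   | none given   |
| Grox classifier prompts    | security      | under review |
| Ad brand-safety lists      | contractual   | n/a          |
| Production Phoenix weights | size / cost   | none given   |
```

Even a table this small changes the epistemics: each absence becomes a decision someone signed, rather than a gap readers must discover on their own.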

The Bottom Line

X did not open-source the algorithm. They open-sourced the rulebook for creators. Those are different artifacts. The first would let outsiders verify whether the feed is fair. The second tells creators how to win inside a system whose fairness cannot be checked.

The release is not nothing. The architectural picture is real and useful — we wrote two pages about it. But calling this "the algorithm" is a category error that benefits the publisher and disadvantages everyone trying to think clearly about how the platform shapes public conversation.

If you read the release and adjust what you post, the release has worked on you. The right reading is: study the rulebook to compete, and remain skeptical about whether the unpublished half of the system rewards what the published half claims to.

Read the Underlying Analysis

The two companion pages cover the architecture in detail — every claim above is sourced from them.

The Creator Playbook → The Technical Deep Dive →