e414c17) is a 187-file rewrite — 18,263 line additions, 926 deletions. The three independent scorers (weighted, author-diversity, OON) have been collapsed into a single RankingScorer, with a new optional second-stage VMRanker (gRPC). A new BlenderSelector merges scored posts with ads, prompts, and Who-To-Follow modules into the final feed. A new Python module grox/ ships Grok-powered viral, safety, and spam classifiers. Per-post action prediction expanded from 19 to 22 signals — including a new negative not_dwelled penalty.
The previous weighted_scorer.rs, author_diversity_scorer.rs, and oon_scorer.rs are deleted. A single RankingScorer now applies the weighted sum, score normalization, author diversity decay, and OON downweighting in one pass. Diversity is now computed against the full sorted batch (not just sequential pairs), and the OON factor branches into three values: topic-mode (TopicOonWeightFactor), new-user (NEW_USER_OON_WEIGHT_FACTOR), and default (OonWeightFactor).
A new VMRanker scorer (feature-flagged via EnableVMRanker) sends the entire scored candidate set to an external gRPC service (xai_vm_ranker_proto) and replaces the score with the value-model output. Includes DPP (Determinantal Point Process) parameters: VMRankerDppTheta, VMRankerDppMaxSelectedRank. Value model selected via VMRankerValueModelId. The actual value-model implementation is NOT in the published code — only the client call site is visible.
The single pipeline has been split into PhoenixCandidatePipeline (inner — produces scored posts) and ForYouCandidatePipeline (outer — assembles the final feed). The outer pipeline pulls scored posts from the inner one via ScoredPostsSource and combines them with ads, prompts, Who-To-Follow modules, and Push-To-Home posts.
BlenderSelector replaces the simple TopKScoreSelector for the For-You pipeline. It partitions candidates by item type (post / ad / WTF / prompt / push-to-home), then runs one of two ad-blending strategies (SafeGapAdsBlender or PartitionOrganicAdsBlender, selected via the AdsBlenderType param). Prompts are inserted at the front, WTF modules at WHO_TO_FOLLOW_POSITION, and push-to-home posts are pinned at position 0.
The Phoenix scorer struct (PhoenixScores) now carries 22 fields, up from 19. New: quoted_vqv_score (video quality view inside a quote), click_dwell_time (continuous duration after click), and not_dwelled_score (probability the user scrolled past). The latter is a new negative signal with a corresponding NotDwelledWeight in the params. Several previously aggregated signals (share_via_dm, share_via_copy_link, quoted_click) are now first-class scored actions in the ranking_scorer formula.
An entirely new top-level Python package grox/ ships 47 new files. Includes Grok-based classifiers for viral quality (BangerInitialScreen), comprehensive safety (PostSafetyScreenDeluxe, SafetyPtos), spam (SpamEapiLowFollowerClassifier), reply ranking (ReplyScorer), multimodal post embedding (v2 and v5 embedders), ASR for audio/video, and a plan/task/scheduler framework. Grox imports grox.config, grox.lm, grox.prompts, grok_sampler, monitor, and strato_http — none of which are in the repo. Grox cannot be run as published.
Nine new sources join the original Thunder and Phoenix sources: ads_source, cached_posts_source, phoenix_moe_source (Mixture-of-Experts retrieval), phoenix_topics_source, prompts_source, push_to_home_source, scored_posts_source, tweet_mixer_source, who_to_follow_source.
The query is now hydrated with substantially more context: blocked/muted/followed/subscribed user IDs, cached posts, followed and inferred Grok topics, followed starter packs, impressed posts, an impression bloom filter, IP, mutual-follow lists, past-request timestamps, retrieval/scoring sequences, served history, user demographics, and an inferred-gender feature (UserInferredGender).
Logging and state mutation are now broken out into discrete side effects: ads-injection logging, client-events Kafka, For-You response stats, mutual-follow stats, Phoenix experiments, Phoenix request cache, publish-seen-ids Kafka, Redis post-candidate cache, reranking Kafka, scored-stats, served-candidates Kafka, truncate-served-history, update-past-request-timestamps, update-served-history.
New filters: topic_ids_filter.rs (571 lines — major topic filtering logic), new_user_topic_ids_filter.rs, previously_seen_posts_backup_filter.rs, ancillary_vf_filter.rs, video_filter.rs. Many existing filters were also significantly modified.
The candidate hydrators include ads brand-safety (two variants), blocked_by, engagement_counts, filtered_topics, following_replied_users, has_media, language_code, mutual_follow_jaccard (Jaccard similarity over follow graphs), quote_hydrator, tweet_type_metrics.
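The Jaccard similarity named for mutual_follow_jaccard can be illustrated with a minimal sketch. Which two follow sets the hydrator actually compares is not stated here, so the viewer/author pairing below is an assumption, as is the choice to return 0.0 for two empty sets.

```python
from __future__ import annotations


def jaccard(a: set[int], b: set[int]) -> float:
    """Return |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical follow sets (user IDs) for a viewer and a post author.
viewer_follows = {1, 2, 3, 4}
author_follows = {3, 4, 5}
print(jaccard(viewer_follows, author_follows))  # 2 shared of 5 distinct -> 0.4
```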
A new phoenix/artifacts/oss-phoenix-artifacts.zip Git LFS pointer (3.1 GB) ships actual trained weights: retrieval transformer + candidate tower (~3 MB), 1M-entry hash embedding tables (~1.4 GB each for retrieval and ranker), a 537K-post sports-only retrieval corpus, and a config. This is a "mini" 4-layer / 128-dim model trained on real engagement data — explicitly NOT the production model, which is larger and trains continuously.
The For You feed is now a two-pipeline system. The inner Phoenix pipeline produces ranked posts. The outer ForYou pipeline assembles the final feed by mixing posts with ads, prompts, and other modules. Retrieval is multi-source (Thunder + Phoenix-MoE + Phoenix-Topics + Tweet Mixer + Push-To-Home + cached posts).
Eleven sources feed the candidate pool: Thunder (in-network), Phoenix (out-of-network), Phoenix-MoE (Mixture-of-Experts retrieval), Phoenix-Topics (topic-based retrieval over followed/inferred Grok topics), Tweet Mixer, Cached Posts, and Push-To-Home, plus Ads, Who-To-Follow, and Prompts, which join later in the outer pipeline. The eleventh, scored_posts_source, carries the inner pipeline's scored posts into the outer one. Sources: home-mixer/sources/.
Candidate hydrators enrich posts (engagement counts, language, media, quote ancestry, brand safety, etc.). Then the pre-scoring filter stack removes duplicates, old posts, blocked/muted, seen/served, paywalled, and posts failing core-data hydration.
The PhoenixScorer calls the Phoenix prediction service (gRPC). New-user requests can be routed to a separate cluster via PhoenixRankerNewUserInferenceClusterId when the user's action-sequence length is below PhoenixRankerNewUserHistoryThreshold. It returns 22 signals per candidate: 20 per-action probabilities plus 2 continuous dwell durations.
A single scorer (RankingScorer) applies the weighted sum, score normalization, author diversity decay, and OON downweight in one pass. It replaces the three separate scorers from the previous release.
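The diversity and OON steps of that single pass can be sketched as follows. This is an illustration under stated assumptions, not the released Rust code: the decay and OON values are made up (the real ones live in params.rs), the first appearance of an author is counted as n = 0, and the precedence of the three OON branches is inferred from the order in which they are listed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: int
    author_id: int
    in_network: bool
    weighted_score: float  # weighted sum, assumed already normalized


def effective_oon_weight(topic_mode, is_new_user, topic_w, new_user_w, default_w):
    """Pick one of the three OON factors (branch precedence is an assumption)."""
    if topic_mode:
        return topic_w
    if is_new_user:
        return new_user_w
    return default_w


def rank(candidates, decay, oon_weight):
    """Sort by weighted score, decay repeat authors, then downweight OON posts."""
    ordered = sorted(candidates, key=lambda c: c.weighted_score, reverse=True)
    appearances = defaultdict(int)  # author_id -> posts already seen in the batch
    result = []
    for c in ordered:
        n = appearances[c.author_id]  # assumption: 0 for the author's first post
        appearances[c.author_id] += 1
        score = c.weighted_score * decay ** n
        if not c.in_network:
            score *= oon_weight
        result.append((c.post_id, score))
    return result


batch = [
    Candidate(1, author_id=10, in_network=True, weighted_score=1.0),
    Candidate(2, author_id=10, in_network=True, weighted_score=0.8),
    Candidate(3, author_id=20, in_network=False, weighted_score=0.9),
]
oon = effective_oon_weight(False, False, topic_w=0.9, new_user_w=0.7, default_w=0.8)
for post_id, score in rank(batch, decay=0.5, oon_weight=oon):
    print(post_id, round(score, 3))
```

Because the author counter runs over the whole sorted batch, the second post by author 10 is decayed even though another author's post sits between the two.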
If EnableVMRanker is set, the scored candidate set is shipped via gRPC to a value-model service (VMRankerClient). The service applies a separate value-model + optional DPP-based diversity reranking and returns new scores. The proto/service implementation is NOT in the open-source release.
BlenderSelector partitions all candidates by type, runs the configured ads blender (safe_gap or partition_organic) to interleave ads, then inserts prompts (at the front) and Who-To-Follow modules (at WHO_TO_FOLLOW_POSITION), and pins any Push-To-Home post at position 0.
VFFilter (safety) and DedupConversationFilter run after selection. Then 14 side effects fire: Kafka publishes (client events, served candidates, seen IDs, reranking), Redis caching, served-history updates, response stats, Phoenix experiments logging, and timestamp/history maintenance.
Two scorers run in sequence. The first (PhoenixScorer) populates raw per-action probabilities. The second (RankingScorer) collapses them to a single score with diversity + OON adjustments. An optional third scorer (VMRanker) can replace the score with a value-model output.
PhoenixScorer calls the Phoenix prediction service (gRPC) and populates PhoenixScores on each candidate. It routes new users (action sequence shorter than PhoenixRankerNewUserHistoryThreshold) to a separate inference cluster. An egress sidecar can be enabled via UseEgressSidecar, with automatic fallback to the primary client on failure. The product surface switches between HomeTimelineRankedFollowing (in-network-only mode) and HomeTimelineRanking (full).
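The routing and fallback behaviour described here can be sketched as below. The function names and the use of plain callables as clients are hypothetical; the real scorer talks to gRPC clients that are not part of the release.

```python
def pick_cluster(action_seq_len, new_user_threshold, default_cluster, new_user_cluster):
    """Route short-history users to the new-user cluster when one is configured."""
    if new_user_cluster is not None and action_seq_len < new_user_threshold:
        return new_user_cluster
    return default_cluster


def predict_with_fallback(request, sidecar_client, primary_client, use_sidecar):
    """Try the egress sidecar first if enabled; fall back to the primary client."""
    if use_sidecar:
        try:
            return sidecar_client(request)
        except Exception:
            pass  # any sidecar failure falls through to the primary client
    return primary_client(request)


def broken_sidecar(request):
    raise ConnectionError("sidecar unavailable")


def primary(request):
    return {"served_by": "primary", "request": request}


print(pick_cluster(5, 20, "ranker-main", "ranker-new-user"))   # ranker-new-user
print(predict_with_fallback("req-1", broken_sidecar, primary, use_sidecar=True)["served_by"])  # primary
```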
RankingScorer replaces the three previous scorers. It applies all four operations (weighted sum, score normalization, author diversity decay, OON downweighting) to each candidate in one pass:
Score offset normalization (negative branch):
When combined_score ≥ 0: offset_score = combined_score + NEGATIVE_SCORES_OFFSET. Fallback: if total_sum == 0, max(combined_score, 0.0).
After weighting, each candidate's score is normalized via util::score_normalizer::normalize_score(c, raw) (implementation not in open source).
Author diversity then applies geometric decay over rank position per author:
Candidates are first sorted by weighted score descending; each author's nth appearance in that ordering gets decay^n. Finally, out-of-network candidates are downweighted:
Three branches for effective_oon_weight:
- query.topic_ids non-empty: TopicOonWeightFactor
- New user (… NewUserAgeThresholdSecs AND followed count ≥ NEW_USER_MIN_FOLLOWING): NEW_USER_OON_WEIGHT_FACTOR
- Default: OonWeightFactor
All three factors come from params.rs, which is NOT in the open-source release.
Feature-flagged via EnableVMRanker. When enabled, the scorer builds a RankRequest with all PhoenixScores plus candidate metadata (tweet_id, author_id, in_network, is_retweet, is_reply, author_followers_count, vqv_ineligible, retweeted_tweet_id, current ranking_scorer score) and ships it via gRPC to the VMRankerCluster resolved from VMRankerClusterId. The returned score replaces the ranking_scorer score; where an entry is missing, the previous score is preserved. DPP parameters are optional: if either VMRankerDppTheta > 0 or VMRankerDppMaxSelectedRank > 0, the request carries DppParams. The value model is selected via VMRankerValueModelId. The remote service implementation is NOT in the open-source release: the DPP math, value-model architecture, and reranking logic all live behind the gRPC boundary.
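The client-side behaviour of the VMRanker call (optional DPP parameters, and score replacement that keeps the old score for missing entries) can be sketched as below. The dictionaries stand in for the proto messages, whose real field names are not published; treat every name here as hypothetical.

```python
from __future__ import annotations


def build_dpp_params(theta: float, max_selected_rank: int) -> dict | None:
    """Attach DPP parameters only if at least one of them is positive."""
    if theta > 0 or max_selected_rank > 0:
        return {"theta": theta, "max_selected_rank": max_selected_rank}
    return None


def apply_vm_scores(current: dict[int, float], returned: dict[int, float]) -> dict[int, float]:
    """Replace each score with the value-model output, keeping the old score
    for any tweet the service did not return."""
    return {tweet_id: returned.get(tweet_id, score) for tweet_id, score in current.items()}


ranking_scores = {101: 0.42, 102: 0.37, 103: 0.10}
vm_response = {101: 0.55, 103: 0.20}  # tweet 102 is missing from the response
print(apply_vm_scores(ranking_scores, vm_response))  # {101: 0.55, 102: 0.37, 103: 0.2}
print(build_dpp_params(0.0, 0))    # None
print(build_dpp_params(0.3, 0))    # {'theta': 0.3, 'max_selected_rank': 0}
```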
These are the actions the ranking model predicts. Each has a probability (or continuous value) and a weight from params.rs. Three are new in this release.
| Action | Proto / Field | Type | Description |
|---|---|---|---|
| favorite | favorite_score | + | User liked the post |
| reply | reply_score | + | User replied to the post |
| retweet | retweet_score | + | User reposted |
| photo_expand | photo_expand_score | + | User expanded an image |
| click | click_score | + | User clicked into thread/media |
| profile_click | profile_click_score | + | User visited author's profile |
| vqv | vqv_score | +* | Video quality view (conditional on video_duration_ms > MinVideoDurationMs via util::candidates_util::vqv_weight) |
| share | share_score | + | User shared the post |
| share_via_dm | share_via_dm_score | + | Sent via direct message |
| share_via_copy_link | share_via_copy_link_score | + | Copied link to clipboard |
| dwell | dwell_score | + | User dwelled on post (boolean threshold) |
| quote | quote_score | + | User quoted the post |
| quoted_click | quoted_click_score | + | User clicked into a quoted post |
| quoted_vqv NEW | quoted_vqv_score | +* | Video quality view inside a quote post (conditional on duration via quoted_vqv_weight, gated by EnableQuotedVqvDurationCheck) |
| cont_dwell_time | dwell_time (f64 seconds) | + (continuous) | Continuous dwell duration. Scales linearly with watch/read time, unlike binary signals. |
| cont_click_dwell_time NEW | click_dwell_time (f64 seconds) | + (continuous) | Continuous dwell duration after a click (e.g., time in thread or external article). Brand new continuous signal in this release. |
| follow_author | follow_author_score | + | User followed the author from the post |
| not_interested | not_interested_score | − | User marked "not interested" |
| block_author | block_author_score | − | User blocked the author |
| mute_author | mute_author_score | − | User muted the author |
| report | report_score | − | User reported the post |
| not_dwelled NEW | not_dwelled_score | − | Probability user scrolled past without dwelling. Brand-new negative signal in this release. Previously, no-dwell carried no weight; it is now an explicit downvote with weight NotDwelledWeight. |
* VQV weight is conditional via util::candidates_util::vqv_weight which checks video_duration_ms against MinVideoDurationMs. quoted_vqv applies the same gate when EnableQuotedVqvDurationCheck is true.
** Two of the 22 are continuous (f64 seconds) rather than 0–1 probabilities: dwell_time and click_dwell_time. Both contribute linearly with attention duration.
*** The Phoenix demo model artifact published with this release was trained on 19 actions (per phoenix/test_recsys_model.py config); the 22-action expansion shows up in the Rust scoring code and reflects what production is using.
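How the table's signals combine into one number can be sketched as a weighted sum with a duration gate on the VQV weight. All weights below are invented for illustration, since params.rs is not published. Dropping the VQV weight to zero for ineligible videos is also an assumption; the text only says the weight is conditional.

```python
def vqv_weight(video_duration_ms, min_video_duration_ms, weight):
    """Return the VQV weight only for videos longer than the minimum duration."""
    if video_duration_ms is not None and video_duration_ms > min_video_duration_ms:
        return weight
    return 0.0  # assumption: ineligible videos contribute nothing


def combined_score(scores, weights, video_duration_ms, min_video_duration_ms):
    """Weighted sum over predicted actions; negative weights act as penalties."""
    total = 0.0
    for name, value in scores.items():
        w = weights.get(name, 0.0)
        if name == "vqv_score":
            w = vqv_weight(video_duration_ms, min_video_duration_ms, w)
        total += w * value
    return total


# Hypothetical weights and predictions for one candidate.
weights = {"favorite_score": 1.0, "vqv_score": 3.0, "not_dwelled_score": -0.5}
scores = {"favorite_score": 0.2, "vqv_score": 0.5, "not_dwelled_score": 0.6}

with_video = combined_score(scores, weights, video_duration_ms=4000, min_video_duration_ms=2000)
without_video = combined_score(scores, weights, video_duration_ms=None, min_video_duration_ms=2000)
print(round(with_video, 3), round(without_video, 3))  # 1.4 -0.1
```

In the second call the not_dwelled penalty outweighs the predicted like, so the combined score goes negative, which is the case the score offset step exists to handle.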
The new outer For-You pipeline assembles posts, ads, prompts, and Who-To-Follow modules into a single ordered feed. Source: home-mixer/selectors/blender_selector.rs, home-mixer/ads/.
The new FeedItem proto wraps five distinct item types: Post(ScoredPost), Ad(AdIndexInfo), WhoToFollow(WhoToFollowModule), Prompt(Prompt), PushToHome(PushToHomePost).
Two implementations of the AdsBlender trait: SafeGapAdsBlender (enforces minimum gap between ads) and PartitionOrganicAdsBlender (default — partitions organic and ad slots). Selection via the AdsBlenderType string parameter ("safe_gap" or anything else).
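A minimum-gap strategy of the kind SafeGapAdsBlender is described as enforcing could look like the sketch below. The parameters first_ad_slot and min_gap, and the rule for where ads land, are assumptions; the released blender may differ in every detail.

```python
def select_blender(ads_blender_type: str) -> str:
    """Mirror the described selection: "safe_gap" or, for anything else, the default."""
    return "safe_gap" if ads_blender_type == "safe_gap" else "partition_organic"


def blend_with_min_gap(posts, ads, min_gap, first_ad_slot):
    """Interleave ads so at least min_gap posts separate consecutive ads.

    Returns (blended, unplaced_ads); unplaced ads would be the dropped ones.
    """
    blended, remaining = [], list(ads)
    posts_until_ad = first_ad_slot  # posts to emit before the next ad is allowed
    for post in posts:
        if remaining and posts_until_ad == 0:
            blended.append(remaining.pop(0))
            posts_until_ad = min_gap
        blended.append(post)
        posts_until_ad = max(posts_until_ad - 1, 0)
    return blended, remaining


posts = ["p1", "p2", "p3", "p4", "p5", "p6"]
ads = ["a1", "a2", "a3"]
print(blend_with_min_gap(posts, ads, min_gap=3, first_ad_slot=1))
# (['p1', 'a1', 'p2', 'p3', 'p4', 'a2', 'p5', 'p6'], ['a3'])
```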
Two brand-safety hydrators (ads_brand_safety_hydrator, ads_brand_safety_vf_hydrator) enrich ad candidates before blending.
The selector runs in this order: (1) the ads blender produces the blended posts and ads, (2) prompts are inserted at the front (positions 0..n by index), (3) one Who-To-Follow module is inserted at WHO_TO_FOLLOW_POSITION - 1, (4) the Push-To-Home post is pinned at position 0.
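The insertion order of steps (2) to (4) can be sketched with plain list inserts. The function and argument names are hypothetical, and the sketch assumes each step operates on the list left by the previous one, so later inserts shift earlier items down.

```python
def assemble_feed(blended, prompts, wtf_module, push_to_home, who_to_follow_position):
    """Apply the described insertion order to an already ad-blended list."""
    feed = list(blended)
    # (2) Prompts go to the front, each at its own index.
    for i, prompt in enumerate(prompts):
        feed.insert(i, prompt)
    # (3) A single Who-To-Follow module at WHO_TO_FOLLOW_POSITION - 1.
    if wtf_module is not None:
        feed.insert(max(who_to_follow_position - 1, 0), wtf_module)
    # (4) The Push-To-Home post is pinned at position 0.
    if push_to_home is not None:
        feed.insert(0, push_to_home)
    return feed


feed = assemble_feed(
    blended=["p1", "p2", "p3", "a1"],
    prompts=["prompt"],
    wtf_module="wtf",
    push_to_home="pth",
    who_to_follow_position=3,
)
print(feed)  # ['pth', 'prompt', 'p1', 'wtf', 'p2', 'p3', 'a1']
```

Under this reading, the Who-To-Follow module ends up one slot lower than its nominal index whenever a Push-To-Home post is present.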
Dropped posts and ads (those not selected after blending) are emitted as non_selected placeholders in the SelectResult for downstream logging via side effects like scored_stats_side_effect and ads_injection_logging_side_effect.
A new top-level Python package grox/ introduces a content-classification subsystem. It pre-processes posts before they enter the recommendation system. Source: grox/ (47 files, 6,000+ lines).
Vision-language model classifier (VLM_PRIMARY, temperature 1e-6) that scores posts on viral quality, slop_score, has_minor_score, and produces taxonomy categories + a description. Posts that don't clear the screen may be filtered before reaching the ranker.
Source: grox/classifiers/content/banger_initial_screen.py
Two-stage safety pipeline. PostSafetyScreenDeluxe performs comprehensive screening; SafetyPtos (Policies and Terms of Service) classifies category and policy violations separately.
Source: grox/classifiers/content/post_safety_screen_deluxe.py, safety_ptos.py
Spam detection targeting low-follower accounts via the EAPI surface. Feeds into rate-limiting decisions.
Source: grox/classifiers/content/spam.py
Grok-powered reply ranking. Replies are no longer sorted purely chronologically or by engagement; Grok scores them too.
Source: grox/classifiers/content/reply_ranking.py
Two generations of multimodal (image + text) post embedders. Embeddings are published to Kafka via task_write_mm_embedding_sink and feed downstream retrieval and ranking.
Source: grox/embedder/multimodal_post_embedder_v2.py, v5.py
Automatic speech recognition for video posts. Audio is transcribed and the transcript becomes input to the embedder and summarizer for downstream classification.
Source: grox/data_loaders/asr_processor.py
A general execution framework: plans/ (initial_banger, master, post_embedding_v5, post_safety, reply_ranking, safety_ptos, spam_comment), tasks/ (24 task definitions including pub, rate_limit, ASR, banger_screen, embedding_pub, multimodal_post_embedding, post_safety, summarizer), schedules/, dispatcher.py, engine.py.
Grox imports grox.config.config, grox.lm (post, user, convo), grox.prompts.template, grok_sampler.config, grok_sampler.vision_sampler, monitor.metrics, and strato_http.queries.grok_topics — none of which are in the published repo.
The Phoenix ranking model. Source: phoenix/grok.py, phoenix/recsys_model.py, phoenix/recsys_retrieval_model.py.
Lower triangular causal mask for user + history. Candidates attend to user history and themselves, but NOT to other candidates. Each candidate's score is independent of batch composition — making it consistent and cacheable.
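The candidate-isolation mask can be illustrated with a small boolean matrix. This is a minimal sketch in plain Python rather than the JAX code in phoenix/; prefix_len stands for the combined user + history positions, and the function name is hypothetical.

```python
def build_mask(prefix_len, num_candidates):
    """mask[q][k] is True when query position q may attend to key position k."""
    total = prefix_len + num_candidates
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if q < prefix_len:
                # User/history positions: ordinary causal (lower-triangular) mask.
                mask[q][k] = k <= q
            else:
                # Candidates: see the whole prefix and themselves, nothing else.
                mask[q][k] = k < prefix_len or k == q
    return mask


for row in build_mask(prefix_len=3, num_candidates=2):
    print(" ".join("1" if allowed else "0" for allowed in row))
# 1 0 0 0 0
# 1 1 0 0 0
# 1 1 1 0 0
# 1 1 1 1 0
# 1 1 1 0 1
```

The last two rows show why scores are independent of batch composition: each candidate row has a single 1 in the candidate block, on its own diagonal entry.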
Multi-head grouped query attention with configurable num_q_heads and num_kv_heads. RMSNorm. Attention clipping via tanh at max_attn_val = 30.0. Masking value -1e30.
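The tanh clipping and the mask value can be sketched for a single row of attention logits. The form max_attn_val * tanh(x / max_attn_val) is the usual way to write this kind of soft cap and is assumed here rather than quoted from the release.

```python
import math

MAX_ATTN_VAL = 30.0
MASK_VALUE = -1e30


def clip_logit(x, max_attn_val=MAX_ATTN_VAL):
    """Soft-cap a logit into (-max_attn_val, max_attn_val) using tanh."""
    return max_attn_val * math.tanh(x / max_attn_val)


def masked_softmax(logits, allowed):
    """Clip allowed logits, overwrite masked ones with MASK_VALUE, then softmax."""
    clipped = [clip_logit(x) if ok else MASK_VALUE for x, ok in zip(logits, allowed)]
    peak = max(clipped)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in clipped]
    total = sum(exps)
    return [e / total for e in exps]


print(clip_logit(1000.0))  # saturates at 30.0
print(masked_softmax([1.0, 2.0, 3.0], [True, True, False]))  # last weight is 0.0
```

Small logits pass through almost unchanged (tanh is close to the identity near zero), while extreme values saturate instead of dominating the softmax.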
RoPE with base exponent 10,000. Encodes sequence position so the model can weight recent engagement differently from older history.
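A minimal rotary-embedding sketch with base 10,000 follows. It rotates adjacent pairs of dimensions, which is one common convention; the released model may pair dimensions differently (for example by splitting the vector in halves), so the layout is an assumption.

```python
import math


def rope(vec, position, base=10_000.0):
    """Rotate each adjacent pair (vec[i], vec[i+1]) by position * base**(-i/dim)."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = list(vec)
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.5, -0.3, 0.8]
k = [0.2, -0.4, 0.9, 0.1]
# The query/key dot product depends only on the relative offset (2 in both cases).
print(abs(dot(rope(q, 5), rope(k, 3)) - dot(rope(q, 12), rope(k, 10))) < 1e-9)  # True
```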
Hash-based embeddings: 2 hashes per user/item/author by default. Action embeddings via multi-hot to signed vector (2*action - 1) with learned projection. Product surface: categorical vocab size 16.
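The two input encodings can be sketched as below. The hash function is a stand-in (the release's actual hashing scheme is not described here), and the sketch stops at table indices because how the two hashed rows are combined is not stated. The signed action vector follows the 2*action - 1 rule directly.

```python
import hashlib


def hash_indices(entity_id: int, num_hashes: int = 2, vocab_size: int = 1_000_000):
    """Map an ID to num_hashes rows of an embedding table.

    blake2b with a per-hash seed is an illustrative choice, not the model's hash.
    """
    indices = []
    for seed in range(num_hashes):
        digest = hashlib.blake2b(f"{seed}:{entity_id}".encode(), digest_size=8).digest()
        indices.append(int.from_bytes(digest, "big") % vocab_size)
    return indices


def signed_actions(multi_hot):
    """Turn a 0/1 multi-hot action vector into -1/+1 via 2*action - 1."""
    return [2 * a - 1 for a in multi_hot]


print(hash_indices(123456789))          # two row indices in [0, 1_000_000)
print(signed_actions([1, 0, 0, 1]))     # [1, -1, -1, 1]
```

With the signed encoding, an action that did not happen contributes -1 to the learned projection instead of vanishing, so "did not like" is distinguishable from "no information".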
The model artifact zip ships a smaller-than-production model: 128-dim embeddings, 4 transformer layers, 4 attention heads, key size 32, widening factor 2, history sequence length 127, candidate sequence length 64, 1M-entry user/item/author vocab, 19 action types.
User tower: Grok transformer → average pool → L2-normalize. Candidate tower: 2-layer MLP with SiLU → L2-normalize. Dot-product similarity for top-K retrieval. EPS: 1e-12.
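The retrieval path (pool, normalize, dot product, top-K) can be sketched in plain Python. The transformer and the candidate MLP are omitted, so the inputs below are placeholder vectors, and clamping the norm at EPS is an assumed use of the constant.

```python
import math

EPS = 1e-12


def l2_normalize(v):
    """Scale a vector to unit length, guarding against a zero norm with EPS."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, EPS) for x in v]


def user_embedding(hidden_states):
    """Average-pool per-position transformer outputs, then L2-normalize."""
    dim = len(hidden_states[0])
    pooled = [sum(h[d] for h in hidden_states) / len(hidden_states) for d in range(dim)]
    return l2_normalize(pooled)


def top_k(user_vec, corpus, k):
    """Return the k post IDs whose embeddings have the highest dot product."""
    scored = [(sum(u * c for u, c in zip(user_vec, vec)), post_id)
              for post_id, vec in corpus.items()]
    scored.sort(reverse=True)
    return [post_id for _, post_id in scored[:k]]


user = user_embedding([[1.0, 0.0], [1.0, 0.0]])
corpus = {
    1: l2_normalize([1.0, 0.0]),
    2: l2_normalize([0.0, 1.0]),
    3: l2_normalize([1.0, 1.0]),
}
print(top_k(user, corpus, k=2))  # [1, 3]
```

Because both towers emit unit vectors, the dot product is cosine similarity, which is what lets a precomputed corpus be searched with a standard nearest-neighbour index.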
Before any candidates are fetched, the query is hydrated with the user's full context. The May 15 release adds 17 new hydrators producing a much richer per-request feature set. Source: home-mixer/query_hydrators/.
Filters run before and after scoring. The May 15 release adds 5 new filters and modifies most of the existing ones. Source: home-mixer/filters/.
Source: home-mixer/models/candidate.rs, candidate_features.rs, query.rs, user_features.rs.
The "incomplete" claim circulating on X is accurate. What was released is the architecture, formulas, and data flow — not a buildable or runnable system. Below is a precise audit.
- params.rs: all 20+ weight constants and feature-switch keys
- util/ module: score_normalizer, candidates_util, phoenix_request
- clients/ module: VMRanker, Gizmoduck, AdIndex, Kafka, TES, ServedHistory, WhoToFollow, Prompts, PastRequestTimestamps
- xai_* crates: xai_home_mixer_proto, xai_recsys_proto, xai_vm_ranker_proto, xai_feature_switches, xai_decider, xai_x_rpc, xai_dark_traffic, xai_stringcenter, xai_profiling, xai_urt_thrift, xai_pipeline_tracing, xai_candidate_pipeline::component_library