Item | Detail |
---|---|
Applicant / Assignee | Apple Inc. |
Inventors | Sivong Ma, Dehakur Nasikar, Chandrasekar Venkataraman, et al. |
International filing date | 16 May 2023 |
Publication date | 7 December 2023 |
IPC class | G06F 16/9535 (information retrieval; machine-learning-based ranking) |
Title | TWO-LAYER BANDIT OPTIMIZATION FOR RECOMMENDATIONS |
Abstract (core idea) | A search interface shows a search bar plus a “suggested results” rail. The suggestions are chosen according to (i) the estimated likelihood that a user will select each candidate item and (ii) historical session data capturing how often each candidate has been tapped in past suggestion rails. |
Technical essence
- Two-layer (hierarchical) bandit model (the update rule and a runnable sketch follow this list)
  - Layer 1 – Candidate-set bandit: chooses which group of items (e.g., “Top Picks”, “New & Trending”, “Because You Played…”) should fill the rail for the current session.
  - Layer 2 – Item-level bandit: ranks individual items within the chosen group, balancing exploitation (surfacing the items most likely to be tapped) and exploration (gathering feedback on uncertain ones).
- Context & feedback signals
  - Real-time context: current query prefix, locale, device type.
  - Historical feedback: per-item tap-through rate (CTR), dwell time, and “rail position vs. tap” statistics aggregated across users.
- Continuous online learning
  - After each impression, the model updates reward estimates (e.g., Beta-Bernoulli or contextual Thompson sampling).
  - The two layers learn at different cadences: fast updates for item CTRs; slower, stability-biased updates for candidate-set selection, preventing rail flicker.
- Business & policy filters
  - Age gating, geographic availability, and ad-sponsored placements are applied after bandit scoring but before the final list is rendered.
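For the Beta-Bernoulli option named above, the standard Thompson-sampling bookkeeping is one count update per impression (this is textbook material, not language quoted from the filing): each arm keeps a posterior $\mathrm{Beta}(\alpha, \beta)$ over its tap probability, and after an impression with binary reward $r \in \{0, 1\}$,

$$
\alpha \leftarrow \alpha + r, \qquad \beta \leftarrow \beta + (1 - r), \qquad \hat{\theta} \sim \mathrm{Beta}(\alpha, \beta),
$$

where the sampled $\hat{\theta}$ is the score used for ranking, so uncertain arms occasionally outrank safe ones and get explored.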
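The filing publishes no pseudocode, so the following is a minimal Python sketch of the architecture under stated assumptions: Beta-Bernoulli Thompson sampling at both layers, hypothetical rail names and item IDs, and a simple learning-rate knob standing in for the slower, stability-biased Layer-1 cadence described above.

```python
import random
from collections import defaultdict

class BetaArm:
    """Beta-Bernoulli posterior over a tap probability."""
    def __init__(self):
        self.alpha = 1.0  # 1 + observed taps (uniform prior)
        self.beta = 1.0   # 1 + observed skips

    def sample(self):
        # Thompson sampling: draw a plausible tap rate from the posterior.
        return random.betavariate(self.alpha, self.beta)

    def update(self, tapped, lr=1.0):
        # lr < 1 shrinks each update, emulating a slower learning cadence.
        self.alpha += lr * tapped
        self.beta += lr * (1 - tapped)

class TwoLayerBandit:
    def __init__(self, candidate_sets):
        # candidate_sets: {rail name: [item ids]} -- names are hypothetical.
        self.sets = candidate_sets
        self.set_arms = {name: BetaArm() for name in candidate_sets}
        self.item_arms = defaultdict(BetaArm)

    def recommend(self, k=5, allowed=lambda item: True):
        # Layer 1: pick the rail whose sampled reward is highest.
        rail = max(self.set_arms, key=lambda name: self.set_arms[name].sample())
        # Layer 2: rank items inside that rail by sampled tap probability.
        ranked = sorted(self.sets[rail],
                        key=lambda item: self.item_arms[item].sample(),
                        reverse=True)
        # Business/policy filters run after scoring, before rendering.
        return rail, [item for item in ranked if allowed(item)][:k]

    def record(self, rail, item, tapped):
        # Fast item-level update; slow, stability-biased rail update.
        self.item_arms[item].update(tapped, lr=1.0)
        self.set_arms[rail].update(tapped, lr=0.1)
```

Calling `record` after each impression shrinks uncertainty fastest at the item level while the rail choice drifts slowly, which is the anti-flicker behaviour the filing emphasises; applying `allowed` after scoring mirrors the post-bandit policy filters.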
Independent claim snapshot
- Claim 1 (method claim) – On a server system (traced in the toy request flow after this list):
  - receive a recommendation request,
  - access session context,
  - select a candidate set using a first bandit,
  - rank items inside that set with a second bandit informed by historical reward vectors,
  - return an ordered list for presentation.
- Claim 13 (storage claim) – Computer-readable code that, when executed, performs the method above.
- Claim 15 (system claim) – Hardware + non-transitory medium embodying the two-layer bandit logic.
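To make the claim steps concrete, here is a toy request flow that walks the five steps of claim 1, reusing the hypothetical `TwoLayerBandit` sketch above; the catalog contents, session fields, and age-gate rule are all invented for illustration.

```python
# Toy request flow tracing the five steps of claim 1.
catalog = {
    "Top Picks": ["app_a", "app_b", "app_c"],
    "New & Trending": ["app_d", "app_e"],
}
bandit = TwoLayerBandit(catalog)

def handle_request(session_context, k=3):
    # (1) receive a recommendation request; (2) access session context.
    def age_gate(item):  # stand-in for the post-scoring policy filters
        return session_context.get("age", 0) >= 13 or item != "app_d"
    # (3) first bandit selects the candidate set;
    # (4) second bandit ranks items inside it.
    rail, items = bandit.recommend(k=k, allowed=age_gate)
    # (5) return an ordered list for presentation.
    return {"rail": rail, "items": items}

response = handle_request({"locale": "US", "device": "iPhone", "age": 21})
if response["items"]:
    bandit.record(response["rail"], response["items"][0], tapped=1)  # observed tap
```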
Practical take-aways (ASO / product relevance)
- Why two layers? Separating “which rail?” from “which items?” yields cleaner feedback signals and scales to huge catalogs (e.g., App Store, Apple TV+, Apple Music).
- Exploration without churn: Layer-1 mitigates user confusion by keeping rail themes stable, while Layer-2 still experiments within the rail.
- ASO implication: Apps with high, consistent tap-through rates and good retention inside their thematic group will see their item-level reward rise, winning more rail slots over time.
- Design hint: Metadata that lands your app in multiple candidate sets (e.g., Games › Puzzle and Editors’ Choice) diversifies exposure across Layer-1 selections.
In one sentence:
Apple protects a two-stage, bandit-learning framework that first chooses which recommendation rail to show and then optimally orders the items inside it, updating both layers continuously from real user clicks to maximise relevance while controlling churn.