Item | Detail |
---|---|
Applicant / Assignee | Apple Inc. |
Inventors | Sivong Ma, Dehakur Nasikar, Chandrasekar Venkataraman, et al. |
International filing date | 16 May 2023 |
Publication date | 7 December 2023 |
IPC class | G06F 16/9535 (information retrieval; machine-learning-based ranking) |
Title | TWO-LAYER BANDIT OPTIMIZATION FOR RECOMMENDATIONS |
Abstract (core idea) | A search interface shows a search bar plus a “suggested results” rail. The suggestions are chosen according to (i) the estimated likelihood that a user will select each candidate item and (ii) historical session data capturing how often each candidate has been tapped in past suggestion rails. |
Technical essence
- Two-layer (hierarchical) bandit model (the update rule and a runnable sketch follow this list)
  - Layer 1 – Candidate-set bandit: chooses which group of items (e.g., “Top Picks”, “New & Trending”, “Because You Played…”) should fill the rail for the current session.
  - Layer 2 – Item-level bandit: ranks individual items within the chosen group, balancing exploitation (surfacing the items most likely to be tapped) and exploration (gathering feedback on uncertain ones).
- Context & feedback signals
  - Real-time context: current query prefix, locale, device type.
  - Historical feedback: per-item tap-through rate (CTR), dwell time, and “rail position vs. tap” statistics aggregated across users.
- Continuous online learning
  - After each impression, the model updates reward estimates (e.g., Beta-Bernoulli or contextual Thompson sampling).
  - The two layers learn at different cadences: fast updates for item CTRs; slower, stability-biased updates for candidate-set selection, preventing rail flicker.
- Business & policy filters
  - Age gating, geographic availability, and ad-sponsored placements are applied after bandit scoring but before the final list is rendered.
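For the Beta-Bernoulli option named above, the standard Thompson-sampling bookkeeping is one count update per impression (this is textbook material, not language quoted from the filing): each arm keeps a posterior $\mathrm{Beta}(\alpha, \beta)$ over its tap probability, and after an impression with binary reward $r \in \{0, 1\}$,

$$
\alpha \leftarrow \alpha + r, \qquad \beta \leftarrow \beta + (1 - r), \qquad \hat{\theta} \sim \mathrm{Beta}(\alpha, \beta),
$$

where the sampled $\hat{\theta}$ is the score used for ranking, so uncertain arms occasionally outrank safe ones and get explored.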
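The filing publishes no pseudocode, so the following is a minimal Python sketch of the architecture under stated assumptions: Beta-Bernoulli Thompson sampling at both layers, hypothetical rail names and item IDs, and a simple learning-rate knob standing in for the slower, stability-biased Layer-1 cadence described above.

```python
import random
from collections import defaultdict

class BetaArm:
    """Beta-Bernoulli posterior over a tap probability."""
    def __init__(self):
        self.alpha = 1.0  # 1 + observed taps (uniform prior)
        self.beta = 1.0   # 1 + observed skips

    def sample(self):
        # Thompson sampling: draw a plausible tap rate from the posterior.
        return random.betavariate(self.alpha, self.beta)

    def update(self, tapped, lr=1.0):
        # lr < 1 shrinks each update, emulating a slower learning cadence.
        self.alpha += lr * tapped
        self.beta += lr * (1 - tapped)

class TwoLayerBandit:
    def __init__(self, candidate_sets):
        # candidate_sets: {rail name: [item ids]} -- names are hypothetical.
        self.sets = candidate_sets
        self.set_arms = {name: BetaArm() for name in candidate_sets}
        self.item_arms = defaultdict(BetaArm)

    def recommend(self, k=5, allowed=lambda item: True):
        # Layer 1: pick the rail whose sampled reward is highest.
        rail = max(self.set_arms, key=lambda name: self.set_arms[name].sample())
        # Layer 2: rank items inside that rail by sampled tap probability.
        ranked = sorted(self.sets[rail],
                        key=lambda item: self.item_arms[item].sample(),
                        reverse=True)
        # Business/policy filters run after scoring, before rendering.
        return rail, [item for item in ranked if allowed(item)][:k]

    def record(self, rail, item, tapped):
        # Fast item-level update; slow, stability-biased rail update.
        self.item_arms[item].update(tapped, lr=1.0)
        self.set_arms[rail].update(tapped, lr=0.1)
```

Calling `record` after each impression shrinks uncertainty fastest at the item level while the rail choice drifts slowly, which is the anti-flicker behaviour the filing emphasises; applying `allowed` after scoring mirrors the post-bandit policy filters.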
Independent claim snapshot
- Claim 1 (method claim) – On a server system (traced in the toy request flow after this list):
  - receive a recommendation request,
  - access session context,
  - select a candidate set using a first bandit,
  - rank items inside that set with a second bandit informed by historical reward vectors,
  - return an ordered list for presentation.
- Claim 13 (storage claim) – Computer-readable code that, when executed, performs the method above.
- Claim 15 (system claim) – Hardware + non-transitory medium embodying the two-layer bandit logic.
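To make the claim steps concrete, here is a toy request flow that walks the five steps of claim 1, reusing the hypothetical `TwoLayerBandit` sketch above; the catalog contents, session fields, and age-gate rule are all invented for illustration.

```python
# Toy request flow tracing the five steps of claim 1.
catalog = {
    "Top Picks": ["app_a", "app_b", "app_c"],
    "New & Trending": ["app_d", "app_e"],
}
bandit = TwoLayerBandit(catalog)

def handle_request(session_context, k=3):
    # (1) receive a recommendation request; (2) access session context.
    def age_gate(item):  # stand-in for the post-scoring policy filters
        return session_context.get("age", 0) >= 13 or item != "app_d"
    # (3) first bandit selects the candidate set;
    # (4) second bandit ranks items inside it.
    rail, items = bandit.recommend(k=k, allowed=age_gate)
    # (5) return an ordered list for presentation.
    return {"rail": rail, "items": items}

response = handle_request({"locale": "US", "device": "iPhone", "age": 21})
if response["items"]:
    bandit.record(response["rail"], response["items"][0], tapped=1)  # observed tap
```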
Practical take-aways (ASO / product relevance)
- Why two layers? Separating “which rail?” from “which items?” yields cleaner feedback signals and scales to huge catalogs (e.g., App Store, Apple TV+, Apple Music).
- Exploration without churn: Layer-1 mitigates user confusion by keeping rail themes stable, while Layer-2 still experiments within the rail.
- ASO implication: Apps with high, consistent tap-through rates and good retention inside their thematic group will see their item-level reward rise, winning more rail slots over time.
- Design hint: Metadata that lands your app in multiple candidate sets (e.g., Games › Puzzle and Editors’ Choice) diversifies exposure across Layer-1 selections.
In one sentence:
Apple protects a two-stage, bandit-learning framework that first chooses which recommendation rail to show and then optimally orders the items inside it, updating both layers continuously from real user clicks to maximise relevance while controlling churn.