Abstract
In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{KT \ln(N/K)}$ for the worst-case regret, where K is the number of actions, N > K the number of experts, and T the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{KT \ln(N) / \ln(K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
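To make the size of the lower-bound improvement concrete, the short sketch below compares the two rates quoted in the abstract. It is only an illustration under stated assumptions: both bounds are evaluated as orders with all constants dropped, the helper names (`rate_new_lower`, `rate_previous_lower`, `improvement_factor`) are hypothetical, and it assumes 2 ≤ K < N so that every logarithm is positive. Because T cancels, the ratio reduces to sqrt(ln(N/K) · ln(K) / ln(N)).

```python
import math


def rate_new_lower(K: int, N: int, T: int) -> float:
    # Order of the new lower bound, sqrt(K * T * ln(N / K)); constants dropped (assumption).
    return math.sqrt(K * T * math.log(N / K))


def rate_previous_lower(K: int, N: int, T: int) -> float:
    # Order of the previous best lower bound, sqrt(K * T * ln(N) / ln(K)); constants dropped.
    return math.sqrt(K * T * math.log(N) / math.log(K))


def improvement_factor(K: int, N: int, T: int) -> float:
    # Ratio of the two rates. T cancels, leaving sqrt(ln(N/K) * ln(K) / ln(N)).
    if not (2 <= K < N):
        raise ValueError("this sketch assumes 2 <= K < N")
    return rate_new_lower(K, N, T) / rate_previous_lower(K, N, T)


if __name__ == "__main__":
    T = 10**6
    # Hypothetical (K, N) pairs chosen only to show how the ratio scales.
    for K, N in [(16, 16**2), (1024, 1024**2), (1024, 2**40)]:
        print(f"K={K}, N={N}: ratio of rates ~ {improvement_factor(K, N, T):.3f}")
```

For example, with N = K^2 the ratio simplifies to sqrt(ln(K) / 2), which grows with K. For very small K it can fall below 1, which is a reminder that these are orders of growth with constants omitted, so the comparison is meaningful only asymptotically.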
AI ACCESS FOUNDATION

JAIR is published by AI Access Foundation, a nonprofit public charity whose purpose is to facilitate the dissemination of scientific results in artificial intelligence. JAIR, established in 1993, was one of the first open-access scientific journals on the Web, and has been a leading publication venue since its inception.
PDF: https://apippress.com/wp-content/uploads/2025/07/16738_final.pdf