Improved Regret Bounds for Bandits with Expert Advice

Abstract
In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order [KT ln(N/K)]^(1/2) for the worst-case regret, where K is the number of actions, N > K the number of experts, and T the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of [KT ln(N)/ln(K)]^(1/2). For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
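
To make the comparison between the two lower bounds concrete, here is a minimal sketch that evaluates the orders quoted above for given K, N, and T. It is not taken from the paper: the function names are hypothetical, constant factors are ignored, and it assumes N > K >= 2 so that both logarithms are positive.

```python
import math


def new_lower_bound(K, N, T):
    """Order of the lower bound stated in the abstract: sqrt(K * T * ln(N/K)).

    Hypothetical helper; constant factors are ignored.
    """
    return math.sqrt(K * T * math.log(N / K))


def previous_lower_bound(K, N, T):
    """Order of the earlier lower bound: sqrt(K * T * ln(N) / ln(K)).

    Hypothetical helper; constant factors are ignored.
    """
    return math.sqrt(K * T * math.log(N) / math.log(K))


def improvement_ratio(K, N):
    """Ratio new / previous, which does not depend on T.

    Equals sqrt(ln(N/K) * ln(K) / ln(N)).
    """
    return math.sqrt(math.log(N / K) * math.log(K) / math.log(N))


if __name__ == "__main__":
    K, N, T = 16, 256, 10_000  # illustrative values only
    print(new_lower_bound(K, N, T))
    print(previous_lower_bound(K, N, T))
    print(improvement_ratio(K, N))
```

The ratio exceeds 1 exactly when ln(N/K) ln(K) > ln(N). For instance, with N = K^2 it simplifies to [ln(K)/2]^(1/2), so in that regime the gap between the two orders grows with K.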

AI ACCESS FOUNDATION

JAIR is published by AI Access Foundation, a nonprofit public charity whose purpose is to facilitate the dissemination of scientific results in artificial intelligence. JAIR, established in 1993, was one of the first open-access scientific journals on the Web, and has been a leading publication venue since its inception.

PDF https://apippress.com/wp-content/uploads/2025/07/16738_final.pdf

