<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reinforcement Learning | Shinnosuke Ono | CS Master's Student, University of Tokyo</title><link>https://shinnosukeono.github.io/tag/reinforcement-learning/</link><atom:link href="https://shinnosukeono.github.io/tag/reinforcement-learning/index.xml" rel="self" type="application/rss+xml"/><description>Reinforcement Learning</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-gb</language><copyright>© 2025 Shinnosuke Ono</copyright><lastBuildDate>Fri, 03 Apr 2026 00:00:00 +0000</lastBuildDate><image><url>https://shinnosukeono.github.io/media/icon_hu_9d9b593248f3c06d.png</url><title>Reinforcement Learning</title><link>https://shinnosukeono.github.io/tag/reinforcement-learning/</link></image><item><title>Mitigating Reward Hacking in RLHF via Advantage Sign Robustness</title><link>https://shinnosukeono.github.io/publication/ono_et_al_2026/</link><pubDate>Fri, 03 Apr 2026 00:00:00 +0000</pubDate><guid>https://shinnosukeono.github.io/publication/ono_et_al_2026/</guid><description>&lt;div class="alert alert-note">
&lt;div>
Click the &lt;em>Cite&lt;/em> button above to export this publication's metadata in &lt;em>.bib&lt;/em> format for your reference management software.
&lt;/div>
&lt;/div></description></item></channel></rss>