Shinnosuke Ono | CS Master's Student, University of Tokyo
Mitigating Reward Hacking in RLHF via Advantage Sign Robustness
We address reward hacking in RLHF by identifying flipped advantage signs as a key cause and proposing Sign-Certified Policy Optimization (SignCert-PO), a lightweight method that down-weights non-robust completions during policy optimization.
Shinnosuke Ono, Johannes Ackermann, Soichiro Nishimori, Takashi Ishida, Masashi Sugiyama
A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP
This paper presents JPharmatron, a Japanese pharmaceutical-specific language model, along with a benchmark suite consisting of three new benchmarks: YakugakuQA (Japanese national pharmacist licensing exams), NayoseQA (cross-lingual synonym and terminology normalization), and SogoCheck (consistency reasoning between paired statements).
Shinnosuke Ono, Issey Sukeda, Takuro Fujii, Kosei Buma, Shunsuke Sasaki