Shinnosuke Ono | CS Master's Student, University of Tokyo
Large Language Models
Mitigating Reward Hacking in RLHF via Advantage Sign Robustness
We address reward hacking in RLHF by identifying flipped advantage signs as a key cause and proposing Sign-Certified Policy Optimization (SignCert-PO), a lightweight method that down-weights non-robust completions during policy optimization.
Shinnosuke Ono, Johannes Ackermann, Soichiro Nishimori, Takashi Ishida, Masashi Sugiyama
A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP
This paper presents JPharmatron, a Japanese pharmaceutical-specific language model, along with a benchmark suite consisting of three new benchmarks: YakugakuQA (Japanese national pharmacist licensing exams); NayoseQA (cross-lingual synonym and terminology normalization); and SogoCheck (consistency reasoning between paired statements).
Shinnosuke Ono, Issey Sukeda, Takuro Fujii, Kosei Buma, Shunsuke Sasaki