2 Results
Poster | Tue 7:30 | Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small | Kevin Wang · Alexandre Variengien · Arthur Conmy · Buck Shlegeris · Jacob Steinhardt
Poster | Improving Out-of-distribution Generalization with Indirection Representations | Kha Pham · Hung Le · Man Ngo · Truyen Tran