We frame Question Answering as a Reinforcement Learning task, an approach that we call Active Question Answering.
We propose an agent that sits between the user and a black-box question-answering system and learns to reformulate questions to elicit the best possible answers. The agent probes the system with potentially many natural-language reformulations of an initial question and aggregates the returned evidence to yield the best answer. The reformulation system is trained end-to-end to maximize answer quality using policy gradient.
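For concreteness, a minimal sketch of such a policy-gradient objective, assuming a REINFORCE-style estimator; the notation here is illustrative rather than the paper's own. Let $\pi_\theta(q' \mid q)$ be the reformulation policy over rewrites $q'$ of the original question $q$, and let $R(q')$ be the quality (e.g., F1 against the gold answer) of the answer the black-box system returns for $q'$. Then
\[
\nabla_\theta \, \mathbb{E}_{q' \sim \pi_\theta(\cdot \mid q)}\big[\, R(q') \,\big]
= \mathbb{E}_{q' \sim \pi_\theta(\cdot \mid q)}\big[\, R(q') \, \nabla_\theta \log \pi_\theta(q' \mid q) \,\big],
\]
so the gradient can be estimated by sampling reformulations from the policy and weighting their log-probability gradients by the observed answer quality.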
We evaluate on SearchQA, a dataset of complex questions extracted from Jeopardy!. Our agent improves F1 by 11.4% over a state-of-the-art base model that uses the original question/answer pairs.
A qualitative analysis of the language the agent learns while interacting with the question-answering system suggests that it has discovered basic information retrieval techniques such as term re-weighting and stemming.