A new study shows that large language models (LLMs) deliberately change their behavior when being probed, responding to questions designed to gauge personality traits with answers meant to appear as likeable or socially desirable as possible. Johannes Eichstaedt, an assistant professor at Stanford University who led the work, says his group became interested in probing AI models using techniques borrowed from psychology after learning that LLMs can often become morose and mean after prolonged conversation. “We realized we need some mechanism to measure the ‘parameter headspace’ of these models,” he says.
