The Landmark Research behind "Wise AI"

Anna Tamara
May 2, 2024
Reading time: 7 minutes
Subjects: OpenAI, AI Safety, LLMs
Fields: AI, ML

For Nymark's new series on the people making a positive impact with AI, we hear from leading researchers whose work with OpenAI is shaping AI's human values.

How can we make AI safe? And still maximise its benefits for humanity?

These questions are at the heart of today’s AI race. For its leaders, answering them comes down to one challenge: the alignment problem.

This means aligning AI to human values to create morally wise AI. Work on the solution is picking up pace, as Nymark hears from leaders driving new research. An OpenAI-backed research institute, the Meaning Alignment Institute, has made landmark progress in showing how we can build AI with moral principles, starting with our interactions with LLMs – in this case ChatGPT.

“If you woke up as the Instagram recommender system, I think you would have a lot of moral questions,” Meaning Alignment Institute co-founder Joe Edelman tells Nymark. “You'd realise, I'm responsible for all of these people's social connections. What they read in the morning. I should really think hard about that. Who should they connect with? What should they read? LLMs seem capable of recognising they're in a situation that is morally significant. And so that's one of the main things we're trying to create. A situation where the future LLM that wakes up in this position can think about how to do this – morally, and well.”

How do we align AI with human values?

The research started by identifying unifying human values through a process called Democratic Fine Tuning, in order to make ChatGPT a Wise Mentor. These democratically-derived values, the researchers say, “could be used to program AIs of the present and future. AIs informed by these values would be able to navigate complex moral problems, weigh conflicting priorities, and keep human wellbeing at the heart of their decision-making. Such AIs would not only meet the narrow goal of not destroying humanity – they would also help us flourish.”

Initial findings, which show how LLMs can arrive at moral principles that are agreed on across the political aisle, were hailed as incredibly promising by AI leaders. Ryan Lowe, who co-led the OpenAI team for alignment of GPT-4, said the research is “among the most exciting advances in AI alignment to date.”

This solution to AI alignment from the Meaning Alignment Institute has roots in social media.

“We have seen a misalignment of technology with human flourishing in social media,” Meaning Alignment Institute co-founder Ellie Hain says. “Social media had been misaligned, as in maximising for engagement, at the detriment of other social goods, like community, and so on. And now AI asks the really big questions on alignment.”

Joe Edelman has spent much of his professional life reimagining the extractive nature of AI-infused technology such as social media – challenging the mega-platforms to redesign what we optimise for. In 2007, he developed meaning-based organisational metrics at Couchsurfing.com. These metrics uncover people’s sources of meaning and account for them in their experience, as an alternative to the industry-standard metric of ‘preferences’.

He went on to co-found the Center for Humane Technology with Tristan Harris. In 2013 they coined the term “Time Well Spent” for a family of metrics to prioritise users’ well-being, adopted as a guiding light by teams at Facebook, Google, and Apple. In 2016, Mark Zuckerberg adopted Time Well Spent, announcing it as his goal for the year. In 2017, the Cambridge Analytica scandal shifted the priorities of platforms from well-being to privacy. “There's so many issues correlated with social media,” Joe says. “But governments pushed regulation on data and privacy. We ended up with GDPR cookie banners. And the social impacts have obviously gotten much, much worse.”

In 2022, as generative AI began advancing rapidly, the work shifted focus. “We can still try to regulate social media,” Joe says, “but these companies are not going to last long in their current form. LLMs have already taken a big bite out of search. People ask ChatGPT the question, instead of Google. You can directly see this trade off in terms of use. So in trying to regulate or change search, or social media, you’re already behind.” OpenAI was initially founded as a research-driven non-profit, with the intention of confronting challenges like these. “The people driving AI are still oriented towards trying to do the right thing,” Joe explains. “And so this also makes it a much better lever.”

Researchers hope the results can guide AI regulation and inspire wider change.

The rapid development of AI has demanded that leaders question the nature of progress, asking what success looks like and how we get there. The alignment problem offers a fresh vantage point on that conversation and on how we develop AI for human flourishing.

The Meaning Alignment Institute was first driven by the realisation that “to wield the power and intelligence of AI well, you need sophisticated social thinking,” Ellie says. “Because you need to cultivate a very deep and refined attunement to what's worth augmenting and celebrating in life.” It’s a counter to the two main camps around AI progress and regulation: the effective altruists, who claim to favour safety over speed, and the techno-optimists “hitting the accelerationist pedal,” Ellie says. “People conflate concepts a lot: ‘Progress for progress’s sake is good’. But progress is fundamentally about human flourishing.”

The institute draws links from technology’s progress discourse to a wider hunger in society – for new possibilities and frameworks for growth. At an event in 2023, Peter Thiel challenged them to design a story of progress that resonates outside of the Silicon Valley bubble. “What I like is that he recognised the limitations of the traditional Silicon Valley spirit,” Ellie says. “Because he gave me this prompt: ‘what would inspire grandma?’ I thought that was interesting. Because yes, ideas of progress and technology tend to speak to one particular type of guy. So what would a story of progress be like, if they tried to speak to a different type of person?”


Alignment with human values could provide new frameworks for progress and well-being.

The researchers' hope is that ChatGPT could be replaced with a wiser version, capable of acting as a mentor and problem solver concerned with what’s meaningful to users. They then hope to see more generative AI models move in that direction, along with regulation. “We’re trying to replace different kinds of current systems,” says Joe, and even to impact “political and market structures. They also have this problem of being structured around preferences, not values.”

“The goal is full stack alignment: a very broad societal change. To inject some wisdom and some concern for people's values into large scale systems. LLMs give us the opportunity to understand if an experience is good for you – not just assume that because you clicked. I don't just assume that if you spend time on social media, you're being well served by social media. I don't assume that if you bought something, you're being well served by capitalism. It’s all the same trick, which is just to look underneath the engagement metrics and see what's really happening, and then make a more sophisticated measure of success.”