Recent advances in artificial intelligence (AI) research have also raised safety concerns. A central goal for advanced autonomous systems is therefore to ensure that they are aligned with human values. Similar to our own development approach, researchers at OpenAI recently proposed a hybrid avenue that integrates human preferences, debate, and iterated amplification into the process of teaching machines, ensuring that AI systems behave reliably with respect to their intended goals. Our team highly appreciates the progress in this area, as well as their call for social scientists in AI safety.
Despite these advances, integrating humans into AI alignment raises entirely new problems: which values, goals, and expertise do we want our AI systems to align with? Obviously, there is no simple answer. With this post, we want to create a sense of urgency around this topic, provide some food for thought, and show how we tackle the problem at vencortex for a very specific kind of prediction and judgment task.
When using human input to train AI agents, the answers humans give are not always reliable, and "reliable" in this sense does not necessarily mean "correct". Reasons include limited domain expertise or reasoning capability on the topic at hand, cognitive biases that prevent humans from providing optimal input, diversity in ethics, culture, and norms, and opportunistic behavior that incentivizes providing suboptimal input.
For instance, humans providing input to AI systems might simply not be qualified to do so. This may be easy to guard against by setting expertise criteria such as work experience or educational requirements. However, when it comes to highly complex tasks, even experts are biased. Like every human being, they fall victim to what psychologists call bounded rationality: even the best human reasoning is limited by the tractability of the decision context, the cognitive limitations of the mind, and the time available to make the decision. Moreover, highly reputed or very experienced experts frequently exhibit a bias called overconfidence: a subjective feeling of expertise that far exceeds their true accuracy when making decisions. This bias is well documented among high-profile decision-makers such as investors, political forecasters, and medical experts. Therefore, AI systems that are aligned with human experts need to identify such biases to prevent misbehavior of autonomous agents.
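Overconfidence of this kind can be made measurable. As a minimal sketch (with entirely hypothetical numbers, not data from any real expert panel), one can compare an expert's average stated confidence against their observed accuracy on past judgments; a positive gap signals overconfidence:

```python
def calibration_gap(confidences, correct):
    """Mean stated confidence minus observed accuracy.

    confidences: stated probabilities of being right, in [0, 1].
    correct: 1 if the corresponding judgment was right, else 0.
    A positive result indicates overconfidence.
    """
    assert len(confidences) == len(correct) and confidences
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)
    return mean_conf - accuracy

# Hypothetical expert: ~90% average confidence, but right only 60% of the time.
confs = [0.9, 0.95, 0.85, 0.9, 0.9]
hits = [1, 1, 0, 1, 0]
gap = calibration_gap(confs, hits)  # positive gap of 0.3: overconfident
```

A system that tracks such gaps per contributor could, for example, down-weight input from persistently overconfident experts.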
Moreover, various domains require some form of subjective input, such as music, the arts, or HR decisions. Imagine, for instance, that you are looking for new talent for your team. You receive many applications from highly talented people. Maybe five of them perfectly meet the requirements for this specific job, and it is incredibly hard to differentiate them based on their resumes. So, in the end, you hire the person who fits your personal preferences, because you either feel some kind of connection or you appreciate the way this person thinks. If you use such data to train a personalized AI that learns your hiring preferences, it might significantly help you make faster decisions. However, imagine this system discovers that you kept hiring people of the same gender or ethnicity. These are not your preferences; this happened merely by coincidence, or because one gender is unfortunately still overrepresented in certain professions. There is a high risk that your AI learns things from humans that were never intended.
Or even worse: imagine people hiring for certain positions who are racist, sexist, or discriminate against a specific minority. Or they hold a Ph.D. and opportunistically either hire people with a Ph.D. to justify their own title, or reject them because they assume some individual benefit from being the person with the highest degree in the company. Such opportunistic misbehavior occurs frequently in pursuit of personal advantage. How can we make sure that AI systems do not learn such biases from the “wrong” human decisions?
Let us give you another example. MIT is running a huge experiment on human perspectives on moral decisions made by autonomous systems. It presents a moral dilemma in which, for example, a driverless car must choose the lesser evil: killing the driver, pedestrians, or certain groups of pedestrians distinguished by gender or age. You can act as a judge and evaluate each of these decisions. Building on efforts toward human alignment of AI systems, such moral judgments might one day be used to teach machines values or norms. However, which norms, values, or sense of ethics should we align to? There is no such thing as a single set of human values and ethics. Mankind comprises diverse cultures, each with different ways of behaving, thinking, and feeling that its members have learned from previous generations. For example, while it is perfectly ethical to eat beef in the US, it is highly immoral in parts of Indian culture. Likewise, while it is a central part of Western democracy that people may vote in elections regardless of gender or ethnicity, this raises huge issues in other parts of the world. Without passing normative judgment on which human norms are desirable or undesirable, we need to be aware of these differences if we want to align AI systems with humans. Our team identified three generic fields that create such problems.
The first big issue that arises when using human input to achieve general AI alignment is the lack of diversity we see in various contexts. This includes certain ethnic groups being underrepresented in particular positions, such as corporate management, which is a huge issue in itself. However, diversity is not just about ethnicity or gender, but also about different cultures, personal backgrounds, and educational backgrounds. What this creates is a remarkable plurality of cultures, norms, and morals; there is no such thing as a single set of values. When training AI systems with human input, we have to make sure that we preserve this diversity and prevent overrepresentation of certain social groups.
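One simple countermeasure against overrepresentation, sketched below under the assumption that each contributor can be assigned to a group (the grouping itself is hypothetical and would need careful, domain-specific definition), is to weight each contributor's input inversely to the size of their group, so every group carries equal total weight in the aggregated training signal:

```python
from collections import Counter

def balance_weights(groups):
    """Give each contributor a weight inversely proportional to the size
    of their group, so no group dominates the aggregated signal.
    Weights sum to len(groups), and each group's total weight is equal.
    """
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    return [total / (n_groups * counts[g]) for g in groups]

# Hypothetical labeler pool: group "A" is three times as large as group "B".
groups = ["A", "A", "A", "B"]
weights = balance_weights(groups)
# Each "A" contributor gets 2/3, the single "B" contributor gets 2.0,
# so both groups contribute a total weight of 2.0.
```

This is only the crudest form of reweighting; in practice one would combine it with the quality and incentive signals discussed below rather than treat group size alone.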
The second major concern in AI alignment is what we call opportunistic misbehavior. There is a long history of economic research on the homo economicus, the rationally acting human who aims to maximize individual returns. Luckily, we know that reality goes beyond simple utility-maximizing opportunism. However, when training AI systems with human input in real-world business applications, we have to make sure that opportunistic behavior that may harm other people is minimized, and deploy incentive structures that punish such behavior while rewarding honest and truthful conduct.
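For prediction and judgment tasks, one well-known family of such incentive structures is proper scoring rules. As a minimal sketch (not a description of vencortex's actual mechanism), the quadratic Brier penalty makes honest reporting the best strategy: a forecaster's expected penalty is minimized by reporting their true belief, so strategically inflating one's confidence is self-defeating:

```python
def brier_penalty(reported_prob, outcome):
    """Quadratic (Brier) penalty for a probability report.
    outcome is 1 if the event occurred, else 0. Lower is better."""
    return (reported_prob - outcome) ** 2

def expected_penalty(reported, true_belief):
    """Expected penalty under the forecaster's own true belief."""
    return (true_belief * brier_penalty(reported, 1)
            + (1 - true_belief) * brier_penalty(reported, 0))

# A forecaster who truly believes the probability is 0.7 does strictly
# better reporting 0.7 than exaggerating to 0.95:
honest = expected_penalty(0.7, 0.7)     # 0.7 * 0.09 + 0.3 * 0.49 = 0.21
inflated = expected_penalty(0.95, 0.7)  # higher expected penalty
```

Paying contributors according to such a rule rewards truthful probability estimates without anyone having to know the contributor's private beliefs.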
The third issue that arises when we align AI systems with human behavior is called bias propagation. By that, we mean mistakes made by humans during the teaching process that stem from the bounded rationality of human experts. For instance, limited knowledge of a domain, limited computational capacity, or mental shortcuts might prevent a human from providing optimal input at any given point in time. When AI systems are trained on this faulty behavior (even though no harm is intended), the misleading implicit knowledge of humans is inscribed into the AI system and may therefore propagate over time. This is even worse when such an AI system is later used to train other human experts.
With this article, we emphasize the need to integrate interdisciplinary researchers into the process of designing hybrid systems that collaborate with human experts. This includes not just sociologists, but also psychologists and economists, to ensure truly human-aligned AI. In upcoming articles of this series, we will show how we tackle this challenge at vencortex. If you are interested in collaborating with us on this topic, feel free to contact us.