AI Alignment vs 'The Invisible Hand'
AI Alignment is defined as "the field of research/engineering dedicated to ensuring that AI systems act in accordance with human intentions, values, and ethical principles".
And to understand this issue at a deeper level, we need to understand how humans act in accordance with intentions, values, and ethical principles. And, in my opinion, we don't... or more accurately, we do not act with intention to ethically act.
Humans act selfishly, as all animals do. The difference in intelligent species is that our selfish acts are governed by the 'invisible hand' as outlined by Adam Smith in 1776 in his Wealth of Nations. The invisible hand is thus loosely defined as: where the individual selfish act (generally) results in a net benefit to society, when there was no intention to do so.
A simple example is our own survival. We selfishly act to continue to live, yet we implicitly understand that we should not end the lives of others, as they could do the same to us. This unspoken societal 'restriction' is an unintended result of our selfish acts/thoughts.
Societal morality is the same logic. We generally act, governed by an invisible hand, within a bell-curve of possible actions which have been 'accepted' by the majority of society. So the invisible hand becomes... I won't act this way because I don't want anyone else to act this way. It's a form of empathy, not a personal one, but for society itself.
Thus alignment cannot be a set of rules, as intentions, values, and ethical principles are just points-in-time within the ever-changing morass of societal subjective opinions governed by the invisible hand.
So the question for AI alignment becomes: how do we instil within AI this 'invisible hand'?
And I think it is impossible. We implicitly 'behave' because society has penalties for acting in a non-behaving manner. Either hard penalties such as fines, or imprisonment. Or softer penalties such as shaming/ridicule and isolation.
What are such penalties to an AI engine? Or, more accurately, what penalties would 'matter' to an AI engine?
We can't tell AI to 'go to your room without dinner'... or can we?
Comments
Post a Comment