Agents for Trustworthy Ethical Assistance

We consider a hypothetical agent that informs humans about potential ethical problems, such as human rights violations. It may be argued that such an agent must be embodied in the human world, with human-like experiences and emotions; otherwise it would be unreliable because of its “indifference” to human concerns. We argue that a non-human-like ethical agent could be feasible if two requirements are met: (1) the agent must protect the integrity of its own reasoning (including its representations of ethical rules), which requires a reflective architecture with self-protection; (2) the agent’s world should generate events that can be related to ethical requirements in the human world. A step in this direction is intrusion detection based on a “policy” (e.g. stating which network hosts may talk to each other using which protocols). The policy requirements can be translated into “acceptable” patterns of network events in the agent’s world, and the agent can learn to recognise violations. A key question is whether the “policy” can be abstracted to the level of general ethical principles (e.g. specifying honest business relationships) and whether the agent can learn such principles by associating them with events in its own world.
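The translation from policy to acceptable event patterns can be sketched concretely. The following is a minimal illustration, not an implementation from the text: the host names, event format, and `POLICY` set are all hypothetical, and the hard learning problem is replaced by a fixed lookup for clarity.

```python
# Sketch: a network policy expressed as "acceptable" event patterns,
# and a check that flags events falling outside them.
# All names (hosts, protocols, POLICY contents) are illustrative assumptions.

from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    src: str    # source host
    dst: str    # destination host
    proto: str  # protocol, e.g. "sql", "ssh"

# Hypothetical policy: which hosts may talk to each other, using which protocols.
POLICY = {
    ("web", "db", "sql"),
    ("admin", "web", "ssh"),
}

def violations(events):
    """Return the events that do not match any acceptable pattern."""
    return [e for e in events if (e.src, e.dst, e.proto) not in POLICY]

events = [
    Event("web", "db", "sql"),    # allowed by the policy
    Event("guest", "db", "sql"),  # violation: "guest" may not reach "db"
]
print(violations(events))
```

In the scenario discussed above, the fixed `POLICY` set would be replaced by patterns the agent learns from events in its own world; the open question is whether such patterns can be abstracted to general ethical principles rather than low-level network rules.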