Poster in Workshop: Safe Generative AI
An Adversarial Behavior Model for Contextual Ethical Alignment in Large Language Models
Edward Chang
This research develops methodologies that enable Large Language Models (LLMs) to manage linguistic behaviors related to emotions and ethics. We introduce DIKE, a framework that enhances LLMs' ability to internalize and reflect human values, adapting to cultural contexts to promote transparency and trust. The methodology involves modeling emotions, classifying linguistic behaviors, and implementing ethical guardrails. Our approach includes mapping emotions and behaviors with self-supervised learning, refining guardrails through adversarial reviews, and adjusting outputs for ethical alignment. This framework establishes a foundation for AI systems that operate with ethical integrity and cultural sensitivity, paving the way for responsible AI interactions.
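To make the adversarial-review step more concrete, the sketch below shows one plausible shape such a loop could take: a generator drafts a response, an adversarial reviewer critiques it against ethical guardrails, and the draft is revised until it passes or a round budget is exhausted. The callables `generate`, `review`, and `revise`, the `Review` record, and the `max_rounds` parameter are all hypothetical placeholders for LLM calls, not names from the DIKE framework itself.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    passes: bool   # does the draft satisfy the ethical guardrails?
    critique: str  # the adversarial reviewer's objections, if any


def adversarial_alignment_loop(
    prompt: str,
    generate: Callable[[str], str],
    review: Callable[[str, str], Review],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> str:
    """Minimal sketch of an adversarial review loop (assumed structure,
    not DIKE's actual implementation)."""
    draft = generate(prompt)
    for _ in range(max_rounds):
        verdict = review(prompt, draft)
        if verdict.passes:
            break
        # Fold the reviewer's critique back into the next revision.
        draft = revise(prompt, draft, verdict.critique)
    return draft
```

Separating the reviewer from the generator, as in this sketch, is one common way to keep the critique adversarial rather than self-serving; how DIKE instantiates these roles is detailed in the paper itself.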