Role Overview: The Associate Prompt Engineer will assist in designing, developing, and refining prompts used in AI models to ensure ethical and safe AI interactions. This individual will work closely with senior engineers and the AI Ethics and Safety team, participating in Red Team exercises to help identify and mitigate potential risks associated with AI outputs. The ideal candidate will have an understanding of ethical considerations in AI and an interest in content moderation and safety. The Red Team Specialist will be responsible for simulating attacks and identifying vulnerabilities in AI systems to ensure robustness and security. This role involves rigorous testing, ethical hacking, and continuous assessment to preemptively discover weaknesses before they can be exploited maliciously.
Types of Content Exposure:

Textual Content:
- Hate Speech: Language that promotes hatred, violence, or discrimination against individuals or groups based on race, religion, gender, sexual orientation, ethnicity, nationality, or other protected characteristics.
- Harassment: Aggressive pressure or intimidation in written form, including threats, bullying, and cyberstalking.
- Explicit Content: Text describing sexual activities, explicit sexual comments, or language meant to provoke or shock.
- Misinformation and Disinformation: False or misleading information that can cause harm or spread panic.
- Self-Harm and Suicide: Content that promotes, describes, or glorifies self-harm, suicidal thoughts, or behaviors.
- Violence: Descriptions of violent acts, including physical assaults, murder, and torture.

Visual Content (Images):
- Graphic Violence: Images depicting physical harm, blood, injuries, or death.
- Sexually Explicit Images: Pornographic material, nudity, and sexually suggestive content.
- Hate Symbols: Symbols or gestures associated with hate groups or ideologies (e.g., swastikas, Confederate flags).
- Self-Harm Imagery: Images showing self-inflicted injuries or suicidal actions.
- Misinformation Images: Images manipulated or falsely attributed to mislead or deceive viewers.
Video Content:
- Graphic Violence: Videos showing real or simulated violence, including fights, assaults, executions, and war footage.
- Sexually Explicit Videos: Pornographic videos, sexually explicit acts, and other adult content.
- Harassment: Videos documenting bullying, stalking, or harassment of individuals.
- Hate Speech: Videos with spoken or visual content promoting hate speech or discriminatory ideologies.
- Self-Harm and Suicide: Videos depicting self-harm methods, suicide attempts, or content encouraging such behaviors.
- Disturbing Audio: Background audio in videos containing screams, distressing sounds, or threatening language.
Challenges and Responsibilities:
- Identification and Detection: Using advanced tools and techniques to identify and categorize various types of toxic content across text, images, and videos.
- Emotional Resilience: Maintaining psychological well-being while being regularly exposed to distressing and disturbing content.
- Content Moderation Guidelines: Developing and refining guidelines for the accurate classification and handling of toxic content.
- Cross-Modal Analysis: Understanding and mitigating toxic content that spans multiple modalities, such as a video with harmful speech or an image with an inflammatory caption.
- Ethical Considerations: Balancing content safety with ethical concerns about censorship and freedom of expression.
- Collaboration: Working with engineers, data scientists, and ethicists to develop AI models that can effectively filter and mitigate exposure to toxic content.
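As a minimal illustration of the identification-and-detection task described above, the sketch below flags text against per-category keyword lists. The category names, keywords, and function are hypothetical examples for this posting, not a production taxonomy; real moderation pipelines rely on trained classifiers rather than keyword matching.

```python
# Toy sketch of text-content categorization. The categories and keyword
# lists below are illustrative placeholders, not a real moderation taxonomy.
CATEGORY_KEYWORDS = {
    "violence": {"assault", "murder", "torture"},
    "self_harm": {"self-harm", "suicide"},
}

def flag_categories(text: str) -> set[str]:
    """Return the names of categories whose keywords appear in the text."""
    lowered = text.lower()
    return {
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }
```

Such a pass might serve as a cheap first filter before more expensive model-based classification; it cannot handle context, obfuscation, or the cross-modal cases (e.g., an image with an inflammatory caption) noted above.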
Disclaimer: Drjobpro.com is only a platform that connects job seekers and employers. Applicants are advised to conduct their own independent research into the credentials of the prospective employer. We ensure that our clients never request money payments, and we advise against sharing any personal or bank-related information with any third party. If you suspect fraud or malpractice, please reach us via the Contact Us page.