human-review-escalation tag: 1 document | 우성짱's Documents

Tag · 1 item · Article 1

#human-review-escalation

Browse every document linked to this tag in one place, and continue exploring the related tags that frequently appear alongside it.

Related tags

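#agent-monitoring-infrastructure · shared documents: 1 · relatedness: 100%
#ai-alignment-operations · shared documents: 1 · relatedness: 100%
#coding-agent-safety · shared documents: 1 · relatedness: 100%
#defense-in-depth · shared documents: 1 · relatedness: 100%
#deployment-lessons · shared documents: 1 · relatedness: 100%
#internal-coding-agents · shared documents: 1 · relatedness: 100%
#misalignment-detection · shared documents: 1 · relatedness: 100%
#pre-execution-screening · shared documents: 1 · relatedness: 100%
#prompt-induced-bypass · shared documents: 1 · relatedness: 100%
#runtime-agent-monitoring · shared documents: 1 · relatedness: 100%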

How we monitor internal coding agents for misalignment

Article · March 19, 2026

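OpenAI monitors its internal coding agents with low latency, covering records of their real-world work usage as well as their reasoning and tool calls. The monitoring is used to detect behavior that diverges from user intent, security bypasses, and potential misalignment, and the findings are used to improve its safeguards.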

#openai #human-review-escalation #internal-coding-agents #tool-call-logs