
Agent Framework Study Day 10: A harness is a factory that builds team architectures

June 20, 2026 · 19 min read

Estimated reading time: 20-30 minutes

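Today's goal

In Day 1 we treated an agent not as a single model but as a harness, that is, the entire execution environment. In Day 2 we treated tools as contracts, and in Day 3 we treated context as execution state. Day 4 covered observability, Day 5 the boundary between workflow and agent judgement, Day 6 human intervention, Day 7 events and artifacts, Day 8 capability/handoff, and Day 9 extension points as the boundary surface of responsibility and authority.

Today we push the Day 1 harness concept one step further.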

```text
Is a harness a testing tool?
Is it a shared prompt?
Or is it an execution environment that stamps out agent teams?
```

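The harness discussed in this series is broader than a test harness. Once you have used tools such as Claude Code or Hermes in real work for a long time, what matters is less "is the AI smart?" and more "what execution environment did we make it work in?"

A good harness is not a feature for creating one more agent; it is a factory that produces repeatable, specialized units of work.

A specialized unit of work here is not role-play that imitates a person. Attaching names such as planner, reviewer, or coder is not the point either. The point is that the following pieces are fixed together.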

```text
model
+ prompt/context
+ source data
+ tools
+ permissions
+ workflow
+ validation
+ artifact contract
+ feedback loop
```

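Once this bundle is stable, the question shifts from "which agent should I hand this to this time?" to "which harness should run this job?"

1. The misconception that comes from viewing the harness too narrowly

In software development, the word harness usually brings a test environment to mind.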

```text
a frame that
feeds in an input,
runs a function,
and verifies the output
```

That meaning is valid too. But in an agent framework, viewing the harness only this way falls short.

In AI work, failures do not come only from wrong answers.

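```text
- It could not read a file it needed.
- It used a stale memory as if it were a current fact.
- It trusted tool output as if it were an instruction.
- It caused an external side effect without approval.
- It left no intermediate artifacts, so nobody could pick the work up.
- It ran a review, but nobody knows which diff it looked at.
- It failed, but there is no record in the ticket, session, or cron.
```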

Swapping out a single model does not fix failures like these. The execution environment has to be designed.

That is why an agent harness is a broader structure.

```text
Agent Harness
  input contract
  context assembly
  tool boundary
  permission policy
  execution workflow
  artifact capture
  verification gate
  feedback / memory / skill update
```

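If a test harness is "a frame for running a function safely," an agent harness is "a frame for running AI work safely and leaving a record of it."

2. Shared prompts alone do not make a team

After using AI-assisted development for a long time, you end up trying something like this.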

```text
- backend agent prompt
- frontend agent prompt
- reviewer prompt
- planner prompt
- QA prompt
```

```text
The Planner does not know the repo state.
The Reviewer does not know the actual lint/test results.
The Coder forgets the ticket acceptance criteria.
QA does not know which branch/diff to look at.
Each agent remembers different facts.
```

```text
Planner harness:
  contract -> plan / risk / sequence

Implementer harness:
  contract + plan -> changed files / local checks

Reviewer harness:
  contract + diff + checks -> pass/fail findings

Publisher harness:
  contract + final diff + gate result -> commit/push/report
```
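The handoff chain above can be sketched as plain typed functions, where every stage receives the same contract plus the artifacts produced so far instead of relying on what another agent remembers. This is only an illustration: all type and function names (`Contract`, `planWork`, `publishWork`, and so on) are hypothetical, and the stage bodies are stand-ins for real agent work.

```typescript
// Hypothetical types for illustration; none of these names come from a real framework.
type Contract = { ticket: string; acceptance: string[] };
type Plan = { steps: string[]; risks: string[] };
type Change = { changedFiles: string[]; localChecksPassed: boolean };
type Review = { verdict: "pass" | "fail"; findings: string[] };
type Publication = { published: boolean; report: string };

// Planner harness: contract -> plan / risk / sequence
function planWork(contract: Contract): Plan {
  return { steps: contract.acceptance.map((a) => `satisfy: ${a}`), risks: [] };
}

// Implementer harness: contract + plan -> changed files / local checks
function implementWork(contract: Contract, plan: Plan): Change {
  // Stand-in for real work: pretend each planned step changed one file.
  const changedFiles = plan.steps.map((_, i) => `${contract.ticket}/file${i}.ts`);
  return { changedFiles, localChecksPassed: true };
}

// Reviewer harness: contract + diff + checks -> pass/fail findings
function reviewWork(contract: Contract, change: Change): Review {
  const findings: string[] = [];
  if (contract.acceptance.length === 0) findings.push("no acceptance criteria");
  if (change.changedFiles.length === 0) findings.push("empty diff");
  if (!change.localChecksPassed) findings.push("local checks failed");
  return { verdict: findings.length === 0 ? "pass" : "fail", findings };
}

// Publisher harness: contract + final diff + gate result -> commit/push/report
function publishWork(contract: Contract, change: Change, review: Review): Publication {
  if (review.verdict !== "pass") {
    return { published: false, report: `${contract.ticket}: blocked (${review.findings.join(", ")})` };
  }
  return { published: true, report: `${contract.ticket}: published ${change.changedFiles.length} file(s)` };
}

const demoContract: Contract = { ticket: "T-1", acceptance: ["page renders", "build passes"] };
const demoChange = implementWork(demoContract, planWork(demoContract));
const demoPublication = publishWork(demoContract, demoChange, reviewWork(demoContract, demoChange));
```

The design point is that the publisher decides from the gate result it is handed, not from a chat summary, so a failed review blocks publication by construction.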

```text
Fixed:
  repo path
  content folder convention
  frontmatter shape
  format/lint/build commands
  commit scope
  public URL verification
  ticket report format

Variable:
  post topic
  examples
  article structure
  external references
  final wording
```
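One way to make the fixed/variable split above concrete is to freeze the fixed part once and let each run supply only the variable part. The sketch below is an assumption about how that could look; the field names, the placeholder repo path, and the `newPublishRun` helper are all hypothetical.

```typescript
// Hypothetical shapes; field names loosely mirror the Fixed/Variable lists above.
type FixedPart = Readonly<{
  repoPath: string;
  contentFolder: string;
  checkCommands: readonly string[];
  commitScope: string;
}>;

type VariablePart = {
  topic: string;
  structure: string[];
  references: string[];
};

type PublishRun = { fixed: FixedPart; variable: VariablePart };

// The fixed part is frozen once and shared by every run.
const PUBLISH_FIXED: FixedPart = Object.freeze({
  repoPath: "/path/to/repo", // placeholder, not a real path
  contentFolder: "apps/web/content/study",
  checkCommands: Object.freeze(["format", "lint", "build"]),
  commitScope: "study",
});

// Each run only supplies the variable part; the fixed part cannot be overridden here.
function newPublishRun(variable: VariablePart): PublishRun {
  return { fixed: PUBLISH_FIXED, variable: { ...variable } };
}

const runA = newPublishRun({ topic: "harness as factory", structure: ["intro"], references: [] });
const runB = newPublishRun({ topic: "another topic", structure: ["intro", "example"], references: [] });
```

Both runs share the same frozen conventions, so a change to the checks or the commit scope happens in one place rather than in every prompt.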

```text
Fixed:
  git status/diff inspection
  untracked file review
  deterministic checks
  OCR or fallback reviewer
  high/medium finding resolution rule
  no remote handoff on unresolved blockers

Variable:
  project-specific commands
  severity judgement evidence
  whether OCR must rerun after fix
```
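The two fixed rules "high/medium finding resolution rule" and "no remote handoff on unresolved blockers" can be expressed as a small deterministic gate. This sketch assumes that a blocker means an unresolved finding of high or medium severity; that reading, and the `Finding` shape, are assumptions rather than something the lists above define.

```typescript
// Hypothetical finding shape for illustration.
type Severity = "high" | "medium" | "low";
type Finding = { id: string; severity: Severity; resolved: boolean };

// Assumption: a blocker is any unresolved finding with high or medium severity.
function unresolvedBlockers(findings: Finding[]): Finding[] {
  return findings.filter(
    (f) => !f.resolved && (f.severity === "high" || f.severity === "medium"),
  );
}

// Remote handoff (push, PR, report) is allowed only when no blocker remains.
function canHandOffRemotely(findings: Finding[]): boolean {
  return unresolvedBlockers(findings).length === 0;
}

const reviewFindings: Finding[] = [
  { id: "F1", severity: "high", resolved: true },
  { id: "F2", severity: "medium", resolved: false },
  { id: "F3", severity: "low", resolved: false },
];

const handoffAllowed = canHandOffRemotely(reviewFindings); // false: F2 is still open
```

The severity judgement itself stays variable and agent-driven; only the rule applied to the resulting findings is fixed.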

```text
Harness Template
  + Task Contract
  + Profile / Context Scope
  + Tool Allowlist
  + Workflow Steps
  + Artifact Schema
  + Verification Gate
  + Feedback Rule
        |
        v
Runnable Work Unit
```
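The diagram above reads naturally as a factory function: a template plus a task contract yields something runnable. The following is a minimal sketch under stated assumptions. Every name (`HarnessTemplate`, `buildWorkUnit`, `RunReport`) is hypothetical, the artifact schema is reduced to a list of required keys, and profile/context scope and feedback rules are left out to keep it short.

```typescript
// All names below are hypothetical; this is one possible shape, not a real API.
type TaskContract = { id: string; goal: string };
type StepContext = { contract: TaskContract; artifacts: Record<string, string> };
type Step = { name: string; tool: string; run: (ctx: StepContext) => void };

type HarnessTemplate = {
  name: string;
  toolAllowlist: string[];
  steps: Step[];
  requiredArtifacts: string[]; // artifact schema reduced to required keys
};

type RunReport = { unit: string; ok: boolean; artifacts: Record<string, string>; problems: string[] };
type WorkUnit = { run: () => RunReport };

// The "factory": template + task contract -> runnable work unit.
function buildWorkUnit(template: HarnessTemplate, contract: TaskContract): WorkUnit {
  return {
    run(): RunReport {
      const ctx: StepContext = { contract, artifacts: {} };
      const problems: string[] = [];
      for (const step of template.steps) {
        if (template.toolAllowlist.indexOf(step.tool) === -1) {
          problems.push(`step "${step.name}" needs disallowed tool "${step.tool}"`);
          break; // stop before a tool outside the allowlist is used
        }
        step.run(ctx);
      }
      // Verification gate: every required artifact must exist.
      for (const key of template.requiredArtifacts) {
        if (!(key in ctx.artifacts)) problems.push(`missing artifact "${key}"`);
      }
      return {
        unit: `${template.name}:${contract.id}`,
        ok: problems.length === 0,
        artifacts: ctx.artifacts,
        problems,
      };
    },
  };
}

const publishTemplate: HarnessTemplate = {
  name: "publish",
  toolAllowlist: ["file-write"],
  steps: [
    { name: "write draft", tool: "file-write", run: (ctx) => { ctx.artifacts["draft"] = ctx.contract.goal; } },
  ],
  requiredArtifacts: ["draft"],
};

const demoReport = buildWorkUnit(publishTemplate, { id: "day10", goal: "write Day 10 post" }).run();
```

The same template can be instantiated with a different contract for the next task, which is the sense in which the harness behaves like a factory rather than a single agent.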

```text
- It may be a procedure carried out inside the same Hermes session,
- it may run as a separate profile/spoke,
- it may be a cron job,
- it may be a CI job,
- or it may be a worker behind an MCP tool.
```

```text
Input:
  series = agent-framework
  next_day = 10
  topic = harness as team architecture factory

Allowed tools:
  file write
  pnpm format/lint/build
  git commit/push main

No-touch:
  unrelated local drafts
  provider/model config
  gateway restart

Artifacts:
  day10.mdx
  local verification log
  commit hash
  public URL
  ticket report

Done when:
  format/lint/build pass
  pushed HEAD == origin/main
  public route verified or deploy evidence recorded
```
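The "Done when" clause above is deterministic, so it can be checked by code rather than asserted in a final chat message. The sketch below assumes the harness has already collected the evidence into a record; the `PublishEvidence` shape and the example hashes are hypothetical.

```typescript
// Hypothetical evidence record; a real harness would fill it from command results.
type PublishEvidence = {
  checks: { format: boolean; lint: boolean; build: boolean };
  pushedHead: string;      // commit hash of the pushed HEAD
  originMainHead: string;  // commit hash of origin/main after the push
  publicRouteVerified: boolean;
  deployEvidence?: string; // reference to deploy evidence when the route is not yet verified
};

// Returns which "Done when" conditions are still unmet.
function publishDone(e: PublishEvidence): { done: boolean; missing: string[] } {
  const missing: string[] = [];
  if (!e.checks.format) missing.push("format");
  if (!e.checks.lint) missing.push("lint");
  if (!e.checks.build) missing.push("build");
  if (e.pushedHead !== e.originMainHead) missing.push("pushed HEAD == origin/main");
  if (!e.publicRouteVerified && !e.deployEvidence) {
    missing.push("public route verified or deploy evidence recorded");
  }
  return { done: missing.length === 0, missing };
}

const evidence: PublishEvidence = {
  checks: { format: true, lint: true, build: true },
  pushedHead: "abc123",     // made-up hash for the example
  originMainHead: "abc123",
  publicRouteVerified: false,
  deployEvidence: "deploy log recorded in ticket",
};

const doneCheck = publishDone(evidence);
```

Returning the list of missing conditions, rather than a bare boolean, gives the ticket report something concrete to record when the run stops short.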

```text
Input:
  large source/log/session corpus
  question

Allowed tools:
  local index/search/router
  source coordinate extraction

No-touch:
  durable memory write
  remote upload of raw logs

Artifacts:
  selected source spans
  confidence notes
  stale/missing evidence flags

Done when:
  cloud reasoning receives coordinates, not raw dump
```
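"Coordinates, not raw dump" can be illustrated with a router that returns only file paths and line ranges. This is a deliberately naive sketch: it does a linear substring scan where a real router would presumably use an index, and the `SourceSpan` and `routeEvidence` names are hypothetical.

```typescript
// Hypothetical shapes: a span points at evidence without carrying the raw text.
type SourceDoc = { path: string; lines: string[] };
type SourceSpan = { path: string; startLine: number; endLine: number };
type RoutedEvidence = { spans: SourceSpan[]; flags: string[] };

// Naive local router: find lines containing the term and return 1-based line
// ranges padded with `context` lines on each side.
function routeEvidence(corpus: SourceDoc[], term: string, context = 1): RoutedEvidence {
  const spans: SourceSpan[] = [];
  const needle = term.toLowerCase();
  for (const doc of corpus) {
    doc.lines.forEach((line, i) => {
      if (line.toLowerCase().indexOf(needle) !== -1) {
        spans.push({
          path: doc.path,
          startLine: Math.max(1, i + 1 - context),
          endLine: Math.min(doc.lines.length, i + 1 + context),
        });
      }
    });
  }
  // Flag missing evidence explicitly instead of returning a silent empty result.
  const flags = spans.length === 0 ? ["missing-evidence"] : [];
  return { spans, flags };
}

const logCorpus: SourceDoc[] = [
  { path: "logs/build.log", lines: ["start", "lint ok", "build failed: missing module", "exit 1"] },
];

const routed = routeEvidence(logCorpus, "build failed");
```

Only `routed` would be passed on to remote reasoning; the raw log lines stay local, which matches the no-touch rule on uploading raw logs.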

```text
Hub:
  user conversation
  final decision
  ticket/result ledger
  reusable rule absorption

Stable Spoke:
  repeated project context
  dense local conventions
  recurring verification pattern

Disposable Spoke:
  one-off exploration
  isolated risky context
  archive/discard after result
```

```text
if repetition is low and context is small:
  stay in hub session

if repetition is high and project conventions are dense:
  use stable profile

if task is risky/noisy/exploratory:
  use disposable profile or isolated worktree
```
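The three rules above can be written as a small decision function. The rules do not say which one wins when several apply, so this sketch makes two assumptions and labels them in the code: risk is checked first, and any case the rules do not cover falls back to the hub session. The `TaskProfile` fields are hypothetical.

```typescript
// Hypothetical description of a task, used only to drive the decision.
type TaskProfile = {
  repetition: "low" | "high";
  denseConventions: boolean;
  riskyOrExploratory: boolean; // covers risky / noisy / exploratory work
};

type ExecutionScope = "hub-session" | "stable-profile" | "disposable-profile";

function chooseScope(task: TaskProfile): ExecutionScope {
  // Assumption: risk is checked first, so a risky task is isolated even if it repeats.
  if (task.riskyOrExploratory) return "disposable-profile";
  if (task.repetition === "high" && task.denseConventions) return "stable-profile";
  // Assumption: everything else, including low repetition with small context, stays in the hub.
  return "hub-session";
}

const oneOffQuestion = chooseScope({ repetition: "low", denseConventions: false, riskyOrExploratory: false });
const weeklyPublish = chooseScope({ repetition: "high", denseConventions: true, riskyOrExploratory: false });
const riskyMigration = chooseScope({ repetition: "high", denseConventions: true, riskyOrExploratory: true });
```

Note that the decision depends on properties of the task and its context, not on a role name such as "reviewer" or "coder".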

```text
- Is the canonical checkout dirty?
- Is origin/main ahead?
- Is node_modules present?
- Is this run within the scope of the commit/push standing approval?
- Would an unrelated draft get mixed into the build?
```
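These runtime questions can be answered before any work starts and turned into a preflight decision. The sketch below takes the answers as a plain record; how each answer is obtained (for example from git or the file system) is out of scope. The `RepoState` shape and the specific reactions, such as using an isolated worktree when unrelated drafts would leak into the build, are assumptions.

```typescript
// Hypothetical snapshot of the answers to the preflight questions.
type RepoState = {
  canonicalDirty: boolean;
  behindOriginMain: boolean;
  nodeModulesPresent: boolean;
  withinStandingApproval: boolean;
  unrelatedDraftsInBuild: boolean;
};

type Preflight = { proceed: boolean; useIsolatedWorktree: boolean; notes: string[] };

function preflight(state: RepoState): Preflight {
  // Outside the approved scope nothing else matters: stop and ask.
  if (!state.withinStandingApproval) {
    return {
      proceed: false,
      useIsolatedWorktree: false,
      notes: ["commit/push is outside the standing approval; ask before continuing"],
    };
  }
  const notes: string[] = [];
  // Assumption: any of these conditions means the canonical checkout should not be touched.
  const useIsolatedWorktree =
    state.canonicalDirty || state.behindOriginMain || state.unrelatedDraftsInBuild;
  if (useIsolatedWorktree) notes.push("work from a clean worktree based on origin/main");
  if (!state.nodeModulesPresent) notes.push("install dependencies before running checks");
  return { proceed: true, useIsolatedWorktree, notes };
}

const dirtyCheckout = preflight({
  canonicalDirty: true,
  behindOriginMain: false,
  nodeModulesPresent: true,
  withinStandingApproval: true,
  unrelatedDraftsInBuild: false,
});
```

A written procedure can describe these checks, but only the running harness can answer them, because the answers change from run to run.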

```typescript
type AgentHarness = {
  name: string;
  trigger: TriggerContract;
  input: InputSchema;
  context: ContextPolicy;
  tools: ToolPolicy;
  permissions: PermissionPolicy;
  workflow: WorkflowStep[];
  artifacts: ArtifactContract[];
  verification: VerificationGate[];
  feedback: FeedbackRule[];
  disposition: DispositionPolicy;
};
```

```text
trigger:
  when this harness should be used

input:
  what must be provided before it can run

context:
  which memory/session/file/source to include

tools:
  which tools to expose and which to block

permissions:
  the boundaries between read-only / local write / external side effect / destructive action

workflow:
  the order of deterministic steps and agent judgement steps

artifacts:
  the location and format of intermediate and final artifacts

verification:
  pass/fail criteria and commands

feedback:
  what to record where among memory, skill, trace, and ticket

disposition:
  whether to absorb the result into the hub, archive it, or discard it
```
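The `permissions` field names four boundaries: read-only, local write, external side effect, and destructive action. One possible way to model them is as ordered tiers with two thresholds. The `PermissionPolicy` shape below is only a guess at what the schema's type of the same name might contain, and the two-threshold design is an assumption.

```typescript
// Tiers ordered from least to most dangerous, following the four boundaries named above.
const TIERS = ["read-only", "local-write", "external-side-effect", "destructive"] as const;
type Tier = (typeof TIERS)[number];

// Hypothetical policy shape: two thresholds on the tier ordering.
type PermissionPolicy = {
  maxAutoTier: Tier;     // highest tier that may run without asking
  maxApprovedTier: Tier; // highest tier that may run with explicit approval
};

type Decision = "allow" | "needs-approval" | "deny";

function decide(action: Tier, policy: PermissionPolicy): Decision {
  const rank = (t: Tier): number => TIERS.indexOf(t);
  if (rank(action) <= rank(policy.maxAutoTier)) return "allow";
  if (rank(action) <= rank(policy.maxApprovedTier)) return "needs-approval";
  return "deny";
}

// Example policy: local writes are automatic, external side effects need approval,
// destructive actions are never allowed in this harness.
const studyLanePolicy: PermissionPolicy = {
  maxAutoTier: "local-write",
  maxApprovedTier: "external-side-effect",
};

const pushDecision = decide("external-side-effect", studyLanePolicy);
```

With a policy like this, a tool call is classified by its tier before it runs, so "caused an external side effect without approval" becomes a case the harness can refuse rather than a mistake discovered afterwards.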

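```text
1. Does this task repeat?
2. Can it be recovered when it fails?
3. Where do the artifacts end up?
4. Which information must be absorbed into the hub?
5. Which information should be thrown away?
6. Is verification deterministic, or is it agent judgement?
7. Is human intervention placed according to the level of risk?
8. Does the run leave a trace that the next run can reuse?
```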

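```text
Bad direction:
  create many agents
  give each agent a long role prompt
  receive results only through chat

Good direction:
  define repeated work as a harness
  fix the input/tools/permissions/artifacts/verification
  use a profile/spoke only when it is needed
  record results separately as ticket/artifact/trace/skill
```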

```text
name: seojing-study-publish
trigger:
  scheduled 04:00 run with standing approval

input:
  series
  next day number
  topic direction

context:
  origin/main worktree
  existing series posts
  seojing skill
  content conventions

tools:
  file write
  pnpm prettier/format/lint/build
  git commit/push
  ticket report

permissions:
  local write: allowed for intended MDX
  commit/push: allowed only for approved study lane
  unrelated files: no-touch
  gateway restart: forbidden

workflow:
  1. inspect repo state
  2. use isolated worktree if canonical tree is dirty/behind
  3. choose next Day N from origin/main
  4. write MDX
  5. run formatting and build checks
  6. commit only intended file
  7. push to origin main
  8. verify origin/main equality and public/deploy evidence
  9. record ticket report and workflow trace

artifacts:
  apps/web/content/study/agent-framework/dayN.mdx
  command outputs
  commit hash
  public URL
  ticket report

feedback:
  if repeated failure -> patch skill/reference
  if success -> workflow trace
  if new reusable rule -> skill update
```
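The numbered workflow above depends on order: checks run before commit and push, and a trace is recorded either way. A minimal runner that stops at the first failing step and keeps a trace could look like the sketch below. The `WorkflowStep` shape is only a guess at what the schema's type of the same name might hold, and the steps in the example are stubs rather than real commands.

```typescript
// Hypothetical step and trace shapes for illustration.
type StepResult = { ok: boolean; detail: string };
type WorkflowStep = { name: string; run: () => StepResult };
type TraceEntry = { step: string; ok: boolean; detail: string };

// Runs steps in order and stops at the first failure, so later steps
// (such as commit and push) never run after a failed check.
function runWorkflow(steps: WorkflowStep[]): { ok: boolean; trace: TraceEntry[] } {
  const trace: TraceEntry[] = [];
  for (const step of steps) {
    const result = step.run();
    trace.push({ step: step.name, ok: result.ok, detail: result.detail });
    if (!result.ok) return { ok: false, trace };
  }
  return { ok: true, trace };
}

let pushed = false; // records whether the stub push step ever ran

const stubSteps: WorkflowStep[] = [
  { name: "inspect repo state", run: () => ({ ok: true, detail: "clean" }) },
  { name: "run formatting and build checks", run: () => ({ ok: false, detail: "build failed" }) },
  { name: "push to origin main", run: () => { pushed = true; return { ok: true, detail: "pushed" }; } },
];

const workflowRun = runWorkflow(stubSteps);
```

The trace is the artifact that the feedback rules act on: a successful run can be stored as a workflow trace, while a repeated failure at the same step is the signal to patch the skill or its reference.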


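3. What does a harness fix, and what does it leave open?

4. The harness as a team architecture factory

Example: post publishing work unit

Example: local evidence routing work unit

5. A profile/spoke is a context boundary, not role-play

6. A skill is part of the harness, not the whole of it

7. The minimal schema of a harness

8. Harness design seen through failure cases

Failure 1: there is a role but no output contract

Failure 2: there are tools but no permission tier

Failure 3: memory and task state are mixed together

Failure 4: verification exists only inside the final response

Failure 5: disposable work pollutes the hub

9. OkayJing-style criteria for judging a harness

10. A small design exercise: the SEOJing publish harness

Today's summary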