feat(workflows): create 5 BYAN workflows for agent collaboration
Some checks are pending
CI / Lint bridge (Biome) (push) Waiting to run
CI / Type-check bridge (push) Blocked by required conditions
CI / Tests unit bridge (push) Blocked by required conditions
CI / Tests integration bridge (push) Blocked by required conditions
CI / Security scan (push) Waiting to run
CI / Docker build + healthcheck (push) Blocked by required conditions
Workflows (markdown playbooks) to orchestrate the 4 specialized agents:
- README.md: index + common conventions + future BYAN web integration
- build-story.md: full cycle to deliver 1 Phase 2 story (bridge-dev → bridge-tester → review → CI → deploy staging → business validation)
- sync-bidirec.md: event-driven Docmost ↔ Baserow sync (idempotence + anti-loop X-Bridge-Origin)
- release.md: semver release process (E2E staging → tag → approval → deploy prod → 30 min watch)
- incident.md: SEV1/2/3 response + blameless post-mortem + runbooks
- bump-deps.md: Dependabot PRs + major bumps + Docmost/Baserow upstream

Each workflow specifies: trigger, actors (agents + humans), ordered sequence with outputs, blocking human gates, rollback, comm templates.

Workflows = declarative playbooks for Claude main, which orchestrates the agents through sequential Agent tool calls. To be migrated later to BYAN web workflow runs once the BYAN runtime is fixed.

Complete team for formation-hub:
- 4 specialized agents (bridge-dev, bridge-tester, acadenice-devops, docmost-fork-dev)
- 5 workflows orchestrating their collaboration
parent b37220d432
commit 460f7effe0
6 changed files with 831 additions and 0 deletions
54
.claude/workflows/README.md
Normal file
@ -0,0 +1,54 @@
# Workflows formation-hub

Orchestration of the specialized agents (`bridge-dev`, `bridge-tester`, `acadenice-devops`, `docmost-fork-dev`) to carry out the project's recurring operations.

## How to read these workflows

Each workflow `<name>.md` describes:
- **Trigger**: event that starts the workflow
- **Sequence**: ordered steps with actor (agent or human) + expected output
- **Gates**: blocking human validation points
- **Rollback**: failure scenarios + actions
- **Outputs**: artifacts produced

## How to trigger them

**Manually**: you tell me "run WF BUILD for story S-XX" and I invoke the agents in sequence according to the workflow.

**Ideally (future)**: also create these workflows in BYAN web (`byan-bmb-workflow-builder`) to get native orchestration + run tracking. Not done yet: the current workflows are **markdown playbooks**.

## Available workflows

| Workflow | Trigger | Typical duration |
|----------|---------|------------------|
| [`build-story.md`](./build-story.md) | New Phase 2 story to deliver | 1-3 days |
| [`sync-bidirec.md`](./sync-bidirec.md) | Baserow webhook OR custom Docmost action | < 5 s per event |
| [`release.md`](./release.md) | Semver tag `v*` | 30 min + 30 min watch |
| [`incident.md`](./incident.md) | SEV1/2/3 alert detected | depends on severity |
| [`bump-deps.md`](./bump-deps.md) | Dependabot PR or manual bump | 1-2 h |

## Principles common to all workflows

- **Explicit human gates**: an agent cannot merge into main without approval from Corentin (or Yan)
- **Reproducibility**: every workflow can be tested in staging before prod
- **Traced logs**: every step logs its output (who did what, when, result)
- **Idempotence**: re-running a workflow causes no unwanted side effects
- **Documented rollback**: if step N fails, the workflow says how to revert

## Integration with BYAN web

Eventually, these workflows can be created in BYAN web:
- `byan-bmb-workflow-builder` skill to model them
- Workflow runs traced in `byan_api_workflow_runs`
- Trigger via `byan_api_workflows_run` or MCP

For now, I (Claude main) orchestrate through sequential Agent tool calls.

## Common agent conventions

All agents follow:
- **Tao Acadenice**: direct, structured with dashes, zero emoji, solution-oriented
- **Commit conventions**: `type(scope): description` (feat/fix/docs/refactor/test/chore/ops/sec)
- **Short-lived branches**: max 3 days of life
- **Prod-like code**: tests + lint + types + security gates
- **No changes to design docs** without an explicit ADR
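
The commit convention above can be checked mechanically. This is a minimal sketch, not something the repo is known to enforce: the regex, the allowed scope characters, and the name `parseCommitSubject` are assumptions made for illustration. Only the list of types comes from the convention itself.

```typescript
// Commit types listed in the convention above.
const COMMIT_TYPES = ["feat", "fix", "docs", "refactor", "test", "chore", "ops", "sec"] as const;

// Assumption: a scope is a non-empty run of lowercase letters, digits and dashes.
const COMMIT_PATTERN = new RegExp(
  `^(${COMMIT_TYPES.join("|")})\\(([a-z0-9-]+)\\): (\\S.*)$`,
);

interface ParsedCommit {
  type: string;
  scope: string;
  description: string;
}

// Returns the parsed parts of a commit subject line, or null if it does not
// follow `type(scope): description`.
function parseCommitSubject(subject: string): ParsedCommit | null {
  const match = COMMIT_PATTERN.exec(subject);
  if (match === null) return null;
  return { type: match[1], scope: match[2], description: match[3] };
}
```

A check like this could run in a commit hook or a CI lint job; where it would live is left open here.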

142
.claude/workflows/build-story.md
Normal file
@ -0,0 +1,142 @@
# Workflow: BUILD STORY

Workflow to deliver one Phase 2 **story** of the Fast-App plan (cf. `_byan-output/fast-app/formation-hub/cdcf-stories.json` and `plan.json`).

BYAN-native equivalent: Sprint Planning + FD (Feature Development), restricted to one story.

## Trigger

- Story selected from `cdcf-stories.json` (S-01 to S-10 + future ones)
- Corentin invokes this workflow with: `WF BUILD for story S-XX`

## Actors

- **Corentin** (decision maker)
- **bridge-dev** (business code)
- **bridge-tester** (tests + validation)
- **acadenice-devops** (deploy, if a push to staging is required)
- (optional) **docmost-fork-dev** if the story involves the Docmost frontend

## Sequence

```
[1] Pre-flight (Corentin)
    - Read the story (Connextra + Gherkin AC) in cdcf-stories.json
    - Identify the UCs + entities involved (cf. doc 11 + doc 06/07)
    - Choose the branch: feat/<story-slug> from main
    - Output: story understood + branch created

[2] Code (bridge-dev)
    - Read the brief + doc 19 + relevant Merise docs
    - Implement the story (adapters, domain, routes as needed)
    - Local self-test: npm test && npx biome ci . && npx tsc --noEmit
    - Incremental commits: type(scope): description
    - Output: code committed on the feat/ branch

[3] Tests (bridge-tester)
    - Read the story's Gherkin AC
    - Write Vitest unit tests (coverage >= 80% on domain)
    - Write testcontainers integration tests if an adapter is modified
    - Run: npm run test:coverage
    - If coverage is more than 10% below target: alert bridge-dev
    - Output: green tests + coverage report

[4] User gate: Review (Corentin)
    - Check that the diff actually implements the story
    - Test manually if relevant (curl the new bridge endpoint)
    - 3 decisions:
      * APPROVED: go to [5]
      * NEEDS_REWORK: back to [2] with precise feedback
      * BLOCKED: story removed from the sprint
    - Output: decision documented in the PR description

[5] Push selfhost + GitHub (Corentin OR bridge-dev with admin override)
    - git push selfhost feat/<branch>
    - Open the PR on Forgejo
    - Open the same PR on the GitHub mirror if configured
    - Output: PR open, CI auto-triggered

[6] CI verification (acadenice-devops via CI/CD)
    - Workflow ci.yml runs:
      * Lint Biome
      * Type-check tsc
      * Unit + integration tests
      * Security (TruffleHog + Semgrep + npm audit)
      * Docker build healthcheck
    - If green: continue to [7]
    - If red: back to [2] with the failure logs
    - Output: CI status

[7] User gate: Merge (Corentin)
    - Check review OK (1+ approval) + CI green
    - Squash merge into main
    - Auto-delete branch
    - Output: commit on main

[8] Deploy staging (acadenice-devops via deploy-staging.yml)
    - Phase 0/1: workflow_dispatch only (not automatic)
    - Once staging is ready: automatic on push to main
    - Post-deploy smoke test
    - Output: working staging URL

[9] Business validation (Corentin + Yan + target users)
    - Test the user flow on staging
    - If OK: go to [10]
    - If KO: back to [2] with an issue or a hotfix branch
    - Output: business sign-off

[10] Artifact update (bridge-dev OR Corentin)
    - Update build-state.json (story S-XX completed)
    - Update CHANGELOG.md (Unreleased section)
    - Output: artifacts up to date
```
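
The coverage rule in step [3] can be written as a small decision function. A sketch under assumptions: the "10% below target" threshold is read as 10 percentage points, and the verdict names are invented for illustration. Only the 80% domain target comes from the sequence.

```typescript
type CoverageVerdict = "pass" | "below-target" | "alert-bridge-dev";

// Target from step [3]: coverage >= 80% on domain code.
const DOMAIN_TARGET_PCT = 80;
// Assumption: the alert threshold is 10 percentage points under the target.
const ALERT_GAP_POINTS = 10;

// Classifies a domain coverage percentage against the step [3] rule.
function assessDomainCoverage(coveragePct: number): CoverageVerdict {
  if (coveragePct >= DOMAIN_TARGET_PCT) return "pass";
  const gap = DOMAIN_TARGET_PCT - coveragePct;
  return gap > ALERT_GAP_POINTS ? "alert-bridge-dev" : "below-target";
}
```

With this reading, 72% is below target but does not trigger an alert, while 65% does.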

## Blocking human gates

| Gate | Possible decisions | Owner |
|------|--------------------|-------|
| Review gate (4) | APPROVED / NEEDS_REWORK / BLOCKED | Corentin |
| Merge gate (7) | APPROVED / WAIT_FIX_CI / BLOCKED | Corentin |
| Business validation gate (9) | APPROVED / NEEDS_REWORK | Corentin + users |
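
The gate table can be read as a small transition map from (gate, decision) to the next step of the sequence. A sketch only: the type names and `resolveGate` are hypothetical, and routing `WAIT_FIX_CI` back to step [2] is an assumption borrowed from the "CI red" branch of step [6]. The playbook does not spell that transition out.

```typescript
type Gate = "review" | "merge" | "business-validation";
type Decision = "APPROVED" | "NEEDS_REWORK" | "WAIT_FIX_CI" | "BLOCKED";
type Outcome =
  | { kind: "goto"; step: number }
  | { kind: "stop"; reason: string };

// Decisions allowed at each gate and where they lead in the sequence.
const TRANSITIONS: Record<Gate, Partial<Record<Decision, Outcome>>> = {
  review: {
    APPROVED: { kind: "goto", step: 5 },
    NEEDS_REWORK: { kind: "goto", step: 2 },
    BLOCKED: { kind: "stop", reason: "story removed from the sprint" },
  },
  merge: {
    APPROVED: { kind: "goto", step: 8 },
    WAIT_FIX_CI: { kind: "goto", step: 2 }, // assumption, see lead-in
    BLOCKED: { kind: "stop", reason: "merge blocked" },
  },
  "business-validation": {
    APPROVED: { kind: "goto", step: 10 },
    NEEDS_REWORK: { kind: "goto", step: 2 },
  },
};

// Resolves a human decision at a gate, rejecting decisions the table does not list.
function resolveGate(gate: Gate, decision: Decision): Outcome {
  const outcome = TRANSITIONS[gate][decision];
  if (outcome === undefined) {
    throw new Error(`decision ${decision} is not allowed at gate ${gate}`);
  }
  return outcome;
}
```

Keeping the map as data makes a disallowed decision, such as `WAIT_FIX_CI` at the review gate, fail loudly instead of being silently ignored.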

## Rollback

| Failure | Action |
|---------|--------|
| Step [2]: code broken locally | bridge-dev fixes, retry |
| Step [3]: tests fail | bridge-tester explains + bridge-dev fixes |
| Step [6]: CI red | acadenice-devops or bridge-dev fixes, depending on the job (lint/test/security) |
| Step [8]: staging deploy fails | acadenice-devops investigates (SSH logs + healthcheck) |
| Step [9]: business rejects | Corentin decides: minor fix (loop to [2]) or re-PRUNE the story |

## Outputs

- Branch `feat/<story-slug>` merged into main (squash)
- Tests + coverage reports
- CHANGELOG.md up to date
- build-state.json up to date (story marked completed)
- If applicable: working staging URL

## Example invocation (manual)

```
Corentin: "Run WF BUILD for S-02 (Setup Baserow tables)"

Me (Claude main):
[1] Read S-02 in cdcf-stories.json. Check prerequisites (Baserow admin account OK).
[2] Invoke bridge-dev:
    "Implement S-02: Baserow PERSONNE table with 16 fields + heures_restantes
    formulas per doc 15 MPD. Branch feat/personne-table.
    Commit + push to selfhost on the feature branch."
[3] Invoke bridge-tester:
    "Write tests for S-02. Verify table creation + field types + formulas.
    Minimum coverage 80% on the touched code."
[4] Report to Corentin for review.
... etc.
```

## Notes

- For the whole of Phase 2: this workflow runs **per story** (10+ stories in total in cdcf-stories.json)
- Estimate: 1-3 days per story depending on complexity (cf. `expected_loops` in plan.json)
- The user may choose to chain several stories without intermediate gates if confidence is high

152
.claude/workflows/bump-deps.md
Normal file
@ -0,0 +1,152 @@
# Workflow: BUMP DEPENDENCIES

Process for updating dependencies (Dependabot PRs, manual bumps, CVE security fixes).

## Trigger

One of the following:
- Automatic Dependabot PR (configured in `.github/dependabot.yml`)
- GitHub Security CVE alert
- Deliberate manual bump (e.g. moving Docmost from v0.8.x to v0.9.x)
- Monthly review cron (Corentin on call)

## Actors

- **acadenice-devops** (orchestrator)
- **bridge-tester** (post-bump validation)
- **bridge-dev** (fix if a dependency introduces a breaking change)
- **Corentin** (decision maker on major bumps)

## Bump categories

| Type | Frequency | Process |
|------|-----------|---------|
| **Security patch** (CVE high/critical) | ASAP | Auto Dependabot + auto-merge if CI is green |
| **Patch** (1.2.3 → 1.2.4) | Weekly | Auto Dependabot + 5 min review + merge |
| **Minor** (1.2.x → 1.3.0) | Weekly | Auto Dependabot + review + tests + merge |
| **Major** (1.x.x → 2.0.0) | Manual | Dedicated feat branch, exhaustive testing, Corentin's decision |
| **Docmost upstream** | Monthly or on signal from Yan/Corentin | Fork-specific process (cf. docmost-fork-dev) |
| **Baserow upstream** | Monthly or on an interesting changelog | Pin the new version, test compose, deploy staging |
| **Postgres major** | Yearly at most, planned | Mandatory backup + test migration + restore + careful deploy |
| **Node LTS** | Every 2 years (LTS change) | Exhaustive bridge testing, possible refactor |
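
The patch / minor / major rows of the table depend only on which version component changed, so the routing can be sketched in a few lines. This is illustrative: `classifyBump` and `bumpProcess` are hypothetical names, the parser accepts plain `X.Y.Z` versions only (no pre-release tags), and it assumes the target version is newer than the current one.

```typescript
type BumpKind = "major" | "minor" | "patch" | "none";

// Parses a plain X.Y.Z version (optional leading "v"); pre-release tags are out of scope.
function parseVersion(version: string): [number, number, number] {
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (m === null) throw new Error(`not a plain semver version: ${version}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Names the highest version component that differs between two versions.
function classifyBump(from: string, to: string): BumpKind {
  const [fMajor, fMinor, fPatch] = parseVersion(from);
  const [tMajor, tMinor, tPatch] = parseVersion(to);
  if (tMajor !== fMajor) return "major";
  if (tMinor !== fMinor) return "minor";
  if (tPatch !== fPatch) return "patch";
  return "none";
}

// Maps a bump kind to the process column of the table above.
function bumpProcess(kind: BumpKind): string {
  switch (kind) {
    case "major":
      return "manual: dedicated feat branch, exhaustive testing, Corentin's decision";
    case "minor":
      return "auto Dependabot + review + tests + merge";
    case "patch":
      return "auto Dependabot + 5 min review + merge";
    case "none":
      return "nothing to do";
  }
}
```

Security patches and the upstream rows (Docmost, Baserow, Postgres, Node) are decided by what is being bumped rather than by the version delta, so they are left out of this sketch.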

## Sequence: Patch / Minor (auto Dependabot)

```
[1] Dependabot PR created (automatic, weekly, Monday 06:00)
    - Configured in .github/dependabot.yml
    - PR with the package changelog + diff
    - Output: PR open on Forgejo + GitHub mirror

[2] CI runs automatically
    - Workflow ci.yml runs on the PR
    - Tests + lint + security scan + docker build
    - Output: CI status

[3] Human review (Corentin, 5-10 min)
    - Read the package changelog
    - Check the potential impact
    - If new type / breaking: check the tests
    - Output: decision merge / hold / close

[4] If CI green + review OK: merge (squash)
    - Auto-delete branch
    - Output: commit on main

[5] Automatic staging deploy (workflow deploy-staging.yml)
    - Phase 0/1: workflow_dispatch only
    - Phase 2+: automatic on push to main
    - Output: working staging, or alert on failure
```

## Sequence: Major (manual)

```
[1] Decision (Corentin)
    - Read the package's official changelog / upgrade guide
    - Identify breaking changes
    - Decide: bump now or wait
    - Output: go/no-go

[2] Feat branch (bridge-dev)
    - feat/bump-<package>-vX.Y
    - Bump in package.json
    - npm install + commit the lockfile
    - Output: branch with the bump

[3] Code migration (bridge-dev)
    - Adapt the code to the breaking changes
    - Run tests: npm test
    - Fix iteratively until green
    - Output: adapted code

[4] Exhaustive tests (bridge-tester)
    - Run unit + integration: npm test
    - Run E2E on staging if Phase 2.3+
    - Check that coverage is maintained (>= 80% on domain)
    - Output: test report

[5] Staging validation (Corentin)
    - Deploy staging
    - Test the critical flows
    - Output: staging sign-off

[6] PR + merge (cf. workflow build-story.md steps [4]-[7])

[7] Deploy prod (cf. workflow release.md)
    - Follows the standard release process with watch period
    - Output: prod deployed
```

## Sequence: Docmost / Baserow upstream

```
[1] Detect new version (Corentin via GitHub release watch)
[2] Read the official release notes
[3] Test on a cloned env: pull image + restore data backup → smoke
[4] If OK: update compose.yml or Dockerfile.fork
[5] Standard release process (cf. release.md)
[6] If KO: report upstream (issue) or wait for the next release
```

Cf. BYAN workflow `docker-stack-safe-upgrade` (id `75abc7aa-8ba7-47ce-b6b8-bf5573e82f62`) for stateful bumps in prod (12 phases with gates).

## Human gates

| Gate | Decision | Owner |
|------|----------|-------|
| Dependabot PR review (3) | merge / hold / close | Corentin |
| Major decision (1) | go / no-go | Corentin |
| Staging validation (5) | OK / BACK TO FIX | Corentin |

## Rollback / error handling

| Scenario | Action |
|----------|--------|
| CI red on a Dependabot PR | hold the PR, analyze the logs, decide fix or close |
| Major bump introduces a regression not caught by CI | rollback (revert commit + redeploy) + add a regression test |
| Docmost upgrade breaks data | restore the pre-upgrade backup + downgrade the image + investigate |

## Frequency and planning

- **Monday morning**: review Dependabot PRs (15-30 min, Corentin)
- **1st of the month**: security alerts audit + capacity planning + DR test
- **Quarterly**: review possible major bumps (Node, Postgres, Hono, Tiptap, etc.)

## Outputs

- package.json + lockfile up to date
- CI green post-bump
- Tests + coverage maintained
- CHANGELOG.md updated if user-facing
- For a major bump: internal migration doc in `docs/migrations/<package>-vX.md`

## Notes

- Dependabot is configured in `.github/dependabot.yml` (already done):
  * Ecosystem npm (bridge/): weekly
  * Ecosystem github-actions: weekly
  * Ecosystem docker (compose): weekly
- Limit on open Dependabot PRs: 10 max (avoid spam)
- Group production-deps and dev-deps separately
- **No prod bump on Fridays** (tradition + same reason as for releases)

193
.claude/workflows/incident.md
Normal file
@ -0,0 +1,193 @@
# Workflow: INCIDENT RESPONSE

Process for handling incidents in prod. Cf. doc 18 section 9.

## Trigger

One of the following:
- Automatic alert (UptimeRobot, monitoring, healthcheck failed)
- User report (Slack, email, ticket)
- Abnormal logs detected

## Severities

| Level | Definition | Response target |
|-------|------------|-----------------|
| **SEV1 (CRITICAL)** | Service completely down or data loss in progress | < 15 min |
| **SEV2 (WARNING)** | Major degradation, part of the service unavailable, data loss avoided | < 4 business hours |
| **SEV3 (INFO)** | Isolated bug, workaround possible | < 24 business hours |
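
The severity table translates directly into data. A minimal sketch: the `ResponseTarget` shape and `isWithinTarget` are hypothetical, and for SEV2/SEV3 the caller is assumed to pass elapsed time already counted in business minutes, since the table does not define business hours.

```typescript
type Severity = "SEV1" | "SEV2" | "SEV3";

interface ResponseTarget {
  label: string;
  maxMinutes: number;
  businessHoursOnly: boolean;
}

// Response targets from the severity table.
const RESPONSE_TARGETS: Record<Severity, ResponseTarget> = {
  SEV1: { label: "CRITICAL", maxMinutes: 15, businessHoursOnly: false },
  SEV2: { label: "WARNING", maxMinutes: 4 * 60, businessHoursOnly: true },
  SEV3: { label: "INFO", maxMinutes: 24 * 60, businessHoursOnly: true },
};

// True while the elapsed time is still strictly under the response target.
// Assumption: for SEV2/SEV3, elapsedMinutes is already in business minutes.
function isWithinTarget(severity: Severity, elapsedMinutes: number): boolean {
  return elapsedMinutes < RESPONSE_TARGETS[severity].maxMinutes;
}
```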

## Actors

- **Corentin** (primary on-call)
- **Yan** (backup on-call)
- **acadenice-devops** (investigation + restore)
- **bridge-dev** (if code bug)
- **bridge-tester** (post-fix regression test)

## Sequence: SEV1 (service down)

```
[1] DETECT (automatic or manual)
    - UptimeRobot/Slack/email alert
    - Confirm the scope: what is down, since when, what is lost
    - Output: situation understood

[2] TRIAGE (15 min, Corentin on call)
    - Severity confirmed as SEV1?
    - Notify Yan + Ludo if data loss
    - Announce on channel #ops + status banner if user-facing:
      "[SEV1] formation-hub - investigating, ETA <unknown>"
    - Output: team alerted

[3] INVESTIGATE (acadenice-devops)
    - Check containers: docker compose ps
    - Check healthcheck: ./scripts/healthcheck.sh
    - Check logs: docker compose logs --tail=200 <service>
    - Check metrics: CPU, memory, disk
    - Check deps: Postgres, Redis reachable?
    - Output: root cause identified or strong hypothesis

[4] MITIGATE (acadenice-devops + bridge-dev if code)
    - Depending on root cause:
      * Service down: restart container, check resources
      * DB corruption: restore a recent backup
      * Code bug: roll back to the previous version (cf. release.md)
      * Compromise: rotate secrets, isolate the env
      * Disk full: clean up logs/backups, upsize
    - Output: service restored

[5] VERIFY (Corentin + acadenice-devops)
    - Full healthcheck: 4/4 OK
    - Smoke test: ./scripts/smoke-test.sh
    - Test a real user flow
    - Output: confirmation that prod is restored

[6] COMMUNICATE (Corentin)
    - Slack/Teams: "[SEV1 RESOLVED] formation-hub - back online. Cause: ..."
    - Email to all if data loss: GDPR compliance
    - Update the status banner: removed
    - Output: team and users informed

[7] POST-MORTEM (within 7 days, Corentin + Yan)
    - Create the doc: docs/post-mortems/YYYY-MM-DD-<title>.md
    - Blameless format (focus on the system, not the person)
    - Sections: Timeline / Impact / Root cause / AI / Lessons learned
    - Action items (AI): owner + due date
    - Share with the team
    - Update runbooks if the pattern recurs
    - Output: post-mortem published + AIs opened
```

## Sequence: SEV2 (degradation)

Same as SEV1 but without the < 15 min urgency. Response target: 4 business hours. No email-to-all announcement unless user-facing.

## Sequence: SEV3 (isolated bug)

```
[1] Triage via a GitHub/Forgejo issue with label `bug` + severity `low`
[2] Assign to bridge-dev for a fix in the next release
[3] If a workaround exists: document it in the ticket
[4] No post-mortem (unless the pattern recurs)
```

## Comm template SEV1/2 during the incident

```
[SEV1] formation-hub - Service degraded
Symptom: <what exactly>
Started: <when>
Investigating: <where we are>
ETA: <restore estimate or "investigating">
Channel: #ops
```

Update at least every 30 min.
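
Filling the template by hand under pressure is error-prone, so a tiny renderer can help. A sketch under assumptions: `IncidentUpdate`, `renderIncidentUpdate` and `nextUpdateDue` are hypothetical names. The line layout and the 30-minute cadence come from the template above.

```typescript
interface IncidentUpdate {
  severity: "SEV1" | "SEV2";
  symptom: string;
  started: string;
  investigating: string;
  eta?: string; // omitted while no estimate exists
}

// Renders the in-progress incident message, one template field per line.
function renderIncidentUpdate(update: IncidentUpdate): string {
  return [
    `[${update.severity}] formation-hub - Service degraded`,
    `Symptom: ${update.symptom}`,
    `Started: ${update.started}`,
    `Investigating: ${update.investigating}`,
    `ETA: ${update.eta ?? "investigating"}`,
    "Channel: #ops",
  ].join("\n");
}

// Cadence from the playbook: an update at least every 30 minutes.
const UPDATE_INTERVAL_MS = 30 * 60 * 1000;

// Latest time at which the next update is due.
function nextUpdateDue(lastUpdate: Date): Date {
  return new Date(lastUpdate.getTime() + UPDATE_INTERVAL_MS);
}
```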

## Comm template SEV1 resolved

```
[SEV1 RESOLVED] formation-hub - back online
Duration down: <X>h<Y>m
Root cause: <one-liner>
Impact: <affected users, data loss yes/no>
Post-mortem: docs/post-mortems/YYYY-MM-DD-<title>.md (published within 7 days)
```
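
The post-mortem path `docs/post-mortems/YYYY-MM-DD-<title>.md` can be derived from the incident date and title. A sketch only: the slug rules (lowercase, accents stripped, non-alphanumerics collapsed to dashes) and the use of the UTC date are assumptions, and `postMortemPath` is a hypothetical helper.

```typescript
// Turns a free-text title into a filename-safe slug.
// Assumption: accents are stripped and any other non-alphanumeric run becomes one dash.
function slugify(title: string): string {
  return title
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Builds the post-mortem path for an incident.
// Assumption: the date part uses the UTC calendar day.
function postMortemPath(incidentDate: Date, title: string): string {
  const day = incidentDate.toISOString().slice(0, 10);
  return `docs/post-mortems/${day}-${slugify(title)}.md`;
}
```

If the team prefers local dates in filenames, the date formatting is the only part that would need to change.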

## Post-mortem template

`docs/post-mortems/YYYY-MM-DD-<incident-title>.md`:

```markdown
# Post-mortem: <incident title>

## Timeline (local time)
- HH:MM detection
- HH:MM triage
- HH:MM mitigation start
- HH:MM service restored
- HH:MM root cause confirmed

## Impact
- Downtime duration: Xh Ym
- Users impacted: Y
- Data loss: yes/no; if yes: how much and what
- Estimated cost: XX€ (if quantifiable)

## Root cause
<one paragraph: what broke + why>

## Why did our monitoring not alert earlier?
<honest analysis - detection blind spot?>

## Action items
- [ ] AI 1: <description> (owner @who, due YYYY-MM-DD)
- [ ] AI 2: ...

## Lessons learned
<what to remember to avoid a recurrence>

## Blameless statement
This incident is not any one person's fault. It reflects missing system guardrails. The AIs above aim to add those guardrails.
```

## Related runbooks (to be created in Phase 1)

In `docs/runbooks/`:
- `runbook-docmost-down.md`
- `runbook-baserow-down.md`
- `runbook-disk-full.md`
- `runbook-postgres-corrupted.md`
- `runbook-restore-from-backup.md`
- `runbook-rotate-secrets.md`

Runbook format:
```
# Runbook: <INCIDENT_TYPE>
## Symptoms
## Diagnosis (steps)
## Resolution (steps)
## Future prevention
## Rollback / escalation
```

## On-call rotation

Phase 0/1: **Corentin = primary on-call**, Yan = backup.

If someone is hired later:
- Weekly rotation
- Weekly handoff with recap
- On-call compensation (day off or bonus)

## Limits

- No strict SLA for Phase 1 (internal tool, not critical 24/7). Best effort.
- No public status page in Phase 1 (info through internal Slack is enough).
- Phase 3+: if the tool is opened to external clients, consider an SLA + status page.

## Notes

- After a SEV1/2 incident: update doc 18 section 6 (runbooks) if a pattern is detected
- After 3 similar incidents within 1 month: strategic escalation (architecture refactor, additional resources, etc.)

146
.claude/workflows/release.md
Normal file
@ -0,0 +1,146 @@
# Workflow: RELEASE PROD

Semver release process to production. Cf. doc 17 section 4.

## Trigger

- A set of stories merged into main, ready for prod
- Corentin decides "we release" → semver tag

## Actors

- **Corentin** (decision maker + approver)
- **bridge-tester** (E2E staging validation)
- **acadenice-devops** (deploy + watch + rollback if needed)
- (optional) **Yan** (backup approver for the prod deploy)

## Prerequisites

- All CI runs on main = green
- E2E staging tests = green
- Recent backup (< 24h) verified
- No critical business window (classes in progress, hours-entry deadline)
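
The four prerequisites can be checked together before tagging. A minimal sketch, assuming the caller already knows each status: `ReleaseContext` and `releaseBlockers` are hypothetical names, and how CI status, E2E status or the backup timestamp are fetched is out of scope.

```typescript
interface ReleaseContext {
  ciGreenOnMain: boolean;
  e2eStagingGreen: boolean;
  lastVerifiedBackupAt: Date;
  inCriticalBusinessWindow: boolean;
}

// Prerequisite from the list above: the backup must be less than 24h old.
const MAX_BACKUP_AGE_MS = 24 * 60 * 60 * 1000;

// Returns the unmet prerequisites; an empty list means the release can proceed.
function releaseBlockers(ctx: ReleaseContext, now: Date): string[] {
  const blockers: string[] = [];
  if (!ctx.ciGreenOnMain) blockers.push("CI on main is not green");
  if (!ctx.e2eStagingGreen) blockers.push("E2E staging tests are not green");
  if (now.getTime() - ctx.lastVerifiedBackupAt.getTime() >= MAX_BACKUP_AGE_MS) {
    blockers.push("last verified backup is 24h old or more");
  }
  if (ctx.inCriticalBusinessWindow) blockers.push("critical business window in progress");
  return blockers;
}
```

Returning the full list rather than a boolean lets the decision maker see every unmet prerequisite at once.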

## Sequence

```
[1] Release decision (Corentin)
    - List the commits on main since the last release: git log v<last>..HEAD --oneline
    - Decide the release type: MAJOR / MINOR / PATCH (semver)
    - Output: decision + target version

[2] Update CHANGELOG.md (Corentin OR bridge-dev assisting)
    - Move the [Unreleased] section to a new [vX.Y.Z] section
    - Add the date
    - Check that all entries are there
    - Commit: `docs(changelog): release vX.Y.Z`
    - Output: CHANGELOG.md up to date

[3] E2E tests on staging (bridge-tester via CI)
    - Trigger: the push to main already runs the automatic staging deploy
    - Check: workflow e2e.yml passes (Playwright against the staging URL)
    - If it fails: back to fixing before release
    - Output: E2E status

[4] Manual staging validation (Corentin)
    - Test a few critical flows on the staging URL:
      * Docmost login
      * Page creation + share link
      * Entry of completed hours (UC-13)
      * Project + task creation (UCA-02 + UCA-03)
    - Output: staging sign-off

[5] Backup verification (acadenice-devops)
    - Check that a backup < 24h old exists
    - Optional: trigger an ad-hoc backup before the prod deploy
    - Output: backup verified

[6] Semver tag + push (Corentin)
    - git tag -a vX.Y.Z -m "Release vX.Y.Z - <one-liner>"
    - git push origin vX.Y.Z (or push selfhost, depending on the source of truth)
    - Trigger: workflow deploy-prod.yml starts
    - Output: prod tag created

[7] Approval review (Yan or Corentin)
    - GitHub UI: environment 'production' asks for a required reviewer
    - Approve in the GitHub Actions UI
    - Output: approval recorded

[8] Prod deploy runs (acadenice-devops via deploy-prod.yml)
    - SSH to the prod host
    - git checkout vX.Y.Z
    - docker compose -f compose.yml -f compose.prod.yml pull
    - docker compose -f compose.yml -f compose.prod.yml up -d
    - Post-deploy healthcheck
    - Output: prod deployed

[9] Prod smoke tests (acadenice-devops + script)
    - Run scripts/smoke-test.sh against PROD_URL
    - Check 3 critical endpoints
    - Output: smoke OK / KO

[10] Watch period (Corentin + acadenice-devops, 30 min)
    - Watch container logs: docker compose logs -f --tail=200
    - Watch monitoring: UptimeRobot + (Phase 3+) Prometheus/Grafana
    - Watch user entries: no sudden drop?
    - Output: 30 min green, or alert

[11] Release announcement (Corentin)
    - Internal Slack/Teams: "Released vX.Y.Z. Highlights: ..."
    - Update the GitHub/Forgejo release notes from the CHANGELOG.md commit
    - Output: team informed

[12] If everything is OK: RELEASE COMPLETE
    - Notify ops: new version live
    - If KO: trigger the rollback WF (cf. incident.md)
```

## Blocking human gates

| Gate | Decision | Owner |
|------|----------|-------|
| Manual staging validation (4) | OK / BACK TO FIX | Corentin |
| Semver tag (6) | release or abort | Corentin |
| Prod approval (7) | APPROVE / DENY | Yan or Corentin (manual, GitHub UI) |
| Watch period (10) | all OK / rollback | Corentin |

## Rollback (on failure)

Cf. doc 17 section 6 + workflow `incident.md`:

| Scenario | Action |
|----------|--------|
| Healthcheck KO post-deploy | Re-deploy the previous version: `git tag vX.Y.Z-rollback v<previous> && git push --tags` → triggers deploy-prod.yml |
| Critical bug discovered during the watch period | Same: automatic rollback to the stable version |
| Schema migration breaks rollups | Restore the pre-deploy Postgres backup + redeploy the stable version |
| Credentials compromised post-deploy | Rotate secrets + redeploy + audit logs |

## Outputs

- Semver tag created on main
- Docker image tagged + pushed to the registry
- Prod deployed + verified
- CHANGELOG release section published
- Team notification sent
- (If rollback) post-mortem in `docs/post-mortems/YYYY-MM-DD-<title>.md`

## Semver convention (reminder)

| Type | When | Example |
|------|------|---------|
| MAJOR | Breaking change (forced data migration, API break) | v1.x.x → v2.0.0 |
| MINOR | New backward-compatible feature | v1.2.x → v1.3.0 |
| PATCH | Bug fix / security fix | v1.2.3 → v1.2.4 |
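
Given the release type chosen in step [1], the target tag follows mechanically from the previous one. A short sketch of the standard semver increment: `nextVersion` is a hypothetical helper, and it only handles plain `vX.Y.Z` tags (no pre-release or `-rollback` suffixes).

```typescript
type ReleaseType = "MAJOR" | "MINOR" | "PATCH";

// Computes the next tag from the current `vX.Y.Z` tag and the release type.
function nextVersion(current: string, type: ReleaseType): string {
  const m = /^v(\d+)\.(\d+)\.(\d+)$/.exec(current);
  if (m === null) throw new Error(`not a vX.Y.Z tag: ${current}`);
  const major = Number(m[1]);
  const minor = Number(m[2]);
  const patch = Number(m[3]);
  switch (type) {
    case "MAJOR":
      return `v${major + 1}.0.0`; // lower components reset to 0
    case "MINOR":
      return `v${major}.${minor + 1}.0`;
    case "PATCH":
      return `v${major}.${minor}.${patch + 1}`;
  }
}
```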

## Release frequency

- **Phase 1 vanilla**: initial release v0.1.0 once Phase 1 is stable + used for 1 week
- **Phase 2 bridge**: releases v0.2.x → v0.9.x as stories are validated
- **Phase 3 maturity**: v1.0.0 once bidirec backlinks + dual-mode editor + MCP server are delivered
- **Phase 4+**: monthly releases at minimum

## Notes

- **No release on Friday evening** (tech tradition: avoid having to fix over the weekend)
- **No release during an Acadenice maintenance window** (classes in progress, etc.)
- If urgent in prod: hotfix branch from the stable tag, patch micro-release (e.g. v1.2.4 → v1.2.5)

144
.claude/workflows/sync-bidirec.md
Normal file
@ -0,0 +1,144 @@
# Workflow: SYNC BIDIREC Docmost ↔ Baserow

Orchestration of the bidirectional synchronization between Docmost (wiki) and Baserow (DBs). Phase 2: requires the bridge service to be deployed and operational.

BYAN-native equivalent: event-driven workflow with idempotence.

## Trigger

One of the following:
- Baserow webhook `row.created` / `row.updated` / `row.deleted` on a given table
- Docmost webhook `page.created` (if configured on the custom Docmost side)
- Explicit admin action: "Forced sync project 42 → Docmost"
- Periodic reconciliation cron (Phase 3+)

## Actors

- **bridge-dev** (webhook handler + sync logic)
- **acadenice-devops** (webhook config + monitoring)
- **bridge-tester** (idempotence + anti-loop validation)
- **Corentin** (alerted if capacity is exceeded)
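
The two properties bridge-tester validates, idempotence and anti-loop, can be exercised without Redis. This is a sketch, not the bridge's implementation: `SeenEvents` is an in-memory stand-in for the `SET ... NX EX 86400` call used in this workflow's sequence, and the event shape and `triage` function are assumptions. The key prefix, the 24h TTL and the `"bridge"` origin value come from the sequence.

```typescript
// In-memory stand-in for Redis `SET key "1" EX <ttl> NX` (assumption: the
// real bridge would call Redis; this only mimics the semantics).
class SeenEvents {
  private readonly expiry = new Map<string, number>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 86_400_000) {
    this.ttlMs = ttlMs;
  }

  // Returns true if the event is new (SET NX would succeed), false if the
  // key is already present and not yet expired.
  markIfNew(eventId: string, nowMs: number): boolean {
    const key = `bridge:webhook:event:${eventId}`;
    const expiresAt = this.expiry.get(key);
    if (expiresAt !== undefined && expiresAt > nowMs) return false;
    this.expiry.set(key, nowMs + this.ttlMs);
    return true;
  }
}

type Verdict = "process" | "skip-duplicate" | "skip-own-write";

interface WebhookEvent {
  eventId: string;
  origin?: string; // value of X-Bridge-Origin, if any
}

// Applies the idempotence check first, then the anti-loop check.
function triage(event: WebhookEvent, seen: SeenEvents, nowMs: number): Verdict {
  if (!seen.markIfNew(event.eventId, nowMs)) return "skip-duplicate";
  if (event.origin === "bridge") return "skip-own-write";
  return "process";
}
```

Passing the clock in as `nowMs` keeps the TTL behavior testable without waiting 24 hours.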
|
||||
|
||||
## Sequence — type webhook Baserow row.created sur table 'projet'
|
||||
|
||||
```
[1] Webhook received (bridge endpoint POST /api/webhooks/baserow/projet-changed)
    - Verify the HMAC signature in X-Baserow-Signature (anti-spoofing)
    - If invalid: log + 401, ABORT
    - Output: validated event

[2] Idempotence check (bridge + Redis)
    - Read event_id from the payload
    - Redis: SET bridge:webhook:event:<event_id> "1" EX 86400 NX
    - If SET returns null (key already existed): event already processed, ABORT 200
    - Otherwise: continue
    - Output: new event, marked as processed

[3] Anti-loop check
    - Check the X-Bridge-Origin header on the Baserow row
    - If X-Bridge-Origin == "bridge": the bridge itself created the row, ABORT
    - Otherwise: a user created it, continue
    - Output: legitimate event source

[4] Business logic (bridge service)
    - For 'row.created' on 'projet':
        * Fetch the projet details from Baserow (BaserowClient.getRow)
        * Fetch the linked client (BaserowClient.getRow)
        * Compute the Docmost page name: "Projet [nom] - [client]"
        * Determine the target space: "Agence" → fetch space ID
    - Output: payload for Docmost creation

[5] Docmost action (bridge service via DocmostClient)
    - DocmostClient.createPage({ spaceId, title, content: template_projet(projet) })
    - Header: X-Bridge-Origin: bridge (prevents a future loop)
    - Output: pageId of the created Docmost page

[6] Update Baserow row (bridge service)
    - BaserowClient.updateRow(projet_id, { docmost_page_id: pageId })
    - Header: X-Bridge-Origin: bridge
    - Output: Baserow projet enriched with docmost_page_id

[7] Cache invalidation (bridge + Redis)
    - RedisCache.invalidatePattern("bridge:projet:*")
    - RedisCache.invalidatePattern("bridge:client:<id>:projets")
    - Output: caches invalidated

[8] Notification if trainer capacity is exceeded (attribution case)
    - If event = creation of an 'attribution':
        * Recompute Personne.heures_restantes_total
        * If < 0: notify admin via SMTP/Slack
    - Output: notification sent on overrun

[9] Audit log
    - Structured log: { event_id, source: 'baserow', target: 'docmost', action: 'createPage', success: true, duration_ms, ... }
    - Output: persisted trace

[10] Webhook response
    - Return 200 OK { processed: true, page_id: <docmost_page_id> }
```
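
As a rough illustration of step [1], the signature check could look like the sketch below. It assumes that `X-Baserow-Signature` carries a hex-encoded HMAC-SHA256 of the raw request body, keyed with `BASEROW_WEBHOOK_SECRET`; this workflow does not specify the scheme, and `signBody` / `verifySignature` are hypothetical helper names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: signature = hex(HMAC-SHA256(rawBody, secret)). Not confirmed by the workflow.
function signBody(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Returns true only when the header matches the signature computed locally.
function verifySignature(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false;
  const expected = Buffer.from(signBody(rawBody, secret), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so reject differing lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(expected, received);
}
```

The handler would answer 401 whenever `verifySignature` returns `false`. The comparison runs on the raw body, before any JSON parsing, so that re-serialization cannot change the bytes being signed.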

## Event-specific patterns

| Trigger | Sync action |
|---------|-------------|
| Baserow row.created on `projet` | Auto-create Docmost page in the Agence space |
| Baserow row.created on `formation` | Auto-create Docmost collection (sub-pages per block) |
| Baserow row.updated on `projet`/`formation` (title, status) | Update title/icon of the linked Docmost page |
| Baserow row.created on `intervention` | Check capacity → notify admin on overrun |
| Baserow row.created on `attribution` | Notify trainer (email) + check capacity |
| Docmost page.created (specific 'compte-rendu' template) | Auto-create row in the Baserow `comptes_rendus` table (Phase 3+) |
| Docmost share.created | Audit log + notify admin (data leak risk alert) |
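
The table above maps naturally onto a routing table inside the webhook handler. The sketch below is one possible shape, not the bridge's actual code: `SyncEvent`, `SyncAction`, the action labels, and the key format are all hypothetical, and the Phase 3+ `compte-rendu` row is left out.

```typescript
type Source = "baserow" | "docmost";

interface SyncEvent {
  source: Source;
  type: string; // e.g. "row.created", "share.created"
  table?: string; // Baserow table name, when source is "baserow"
}

// Hypothetical action labels, one per cell of the "Sync action" column.
type SyncAction =
  | "createDocmostPage"
  | "createDocmostCollection"
  | "updateDocmostPage"
  | "checkCapacity"
  | "notifyTrainer"
  | "auditLog"
  | "notifyAdmin";

const routes = new Map<string, SyncAction[]>([
  ["baserow:row.created:projet", ["createDocmostPage"]],
  ["baserow:row.created:formation", ["createDocmostCollection"]],
  ["baserow:row.updated:projet", ["updateDocmostPage"]],
  ["baserow:row.updated:formation", ["updateDocmostPage"]],
  ["baserow:row.created:intervention", ["checkCapacity"]],
  ["baserow:row.created:attribution", ["notifyTrainer", "checkCapacity"]],
  ["docmost:share.created", ["auditLog", "notifyAdmin"]],
]);

function routeKey(event: SyncEvent): string {
  return event.table
    ? `${event.source}:${event.type}:${event.table}`
    : `${event.source}:${event.type}`;
}

// Unknown events resolve to no action, so the handler can acknowledge and skip them.
function resolveActions(event: SyncEvent): SyncAction[] {
  return routes.get(routeKey(event)) ?? [];
}
```

Keeping the mapping in data rather than in nested conditionals makes it easy to diff against the table when a new trigger is added.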

## Human gates

No blocking gate: the workflow is event-driven and real-time. However:
- Notify Corentin on capacity overrun (asynchronous)
- Notify Corentin on critical errors (sync still failing after 3 retries)

## Rollback / error handling

| Failure | Strategy |
|---------|----------|
| Docmost API down | Retry 3x with exponential backoff. If still failing: push to a Redis queue for batch retry |
| Baserow row not found (race condition) | Retry the fetch 2x with a 200 ms delay. Otherwise: log + skip event |
| Cache invalidation failure | Log warning, continue (5 min TTL fallback) |
| SMTP notification failure | Log warning, raise degraded alert |
| Loop detected (X-Bridge-Origin missing on bridge writes) | URGENT: alert Corentin, audit bridge code |
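
The "retry 3x with exponential backoff" strategy could be sketched as follows. This reads "3x" as one initial attempt plus up to three retries, which is an interpretation rather than something the table states; `withRetry`, `backoffDelayMs`, and the 500 ms base delay are hypothetical.

```typescript
// Delay before retry number `attempt` (1-based): base, 2*base, 4*base, ...
function backoffDelayMs(attempt: number, baseMs: number = 500): number {
  return baseMs * 2 ** (attempt - 1);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `op`, retrying up to `maxRetries` times; rethrows the last error when all attempts fail.
async function withRetry<T>(
  op: () => Promise<T>,
  maxRetries: number = 3,
  baseMs: number = 500,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt is coming.
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt + 1, baseMs));
      }
    }
  }
  throw lastError;
}
```

A caller would wrap the Docmost call in `withRetry` and, in its own `catch`, push the event onto the Redis queue for the batch retry described in the table. The `sleep` parameter is injectable so tests can run without real delays.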

## Anti-loop strategy (CRITICAL)

To prevent an infinite Docmost → bridge → Baserow → bridge → Docmost → ... loop:

1. **X-Bridge-Origin header**: every write from the bridge to Baserow and Docmost adds this header
2. **Handler-side detection**: if the event comes from a row/page carrying this flag, ABORT
3. **event_id idempotence**: even if a loop forms, max 1 cycle (24h TTL in Redis)
4. **Rate limit**: max 1 identical sync / 5 min per entity (key: `bridge:sync:<entity>:<id>`)
5. **Monitoring**: alert if > 10 identical events in 1 min (sign of a loop)
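
Points 2 and 3 combine into a small decision function at the top of the handler. The sketch below uses an in-memory stand-in for Redis `SET key value EX ttl NX` so that it runs on its own; `MemoryStore`, `IncomingEvent`, and `decide` are hypothetical names, and a real bridge would issue the Redis command instead.

```typescript
// In-memory stand-in for Redis `SET key "1" EX <ttl> NX`.
class MemoryStore {
  private readonly expiries = new Map<string, number>();

  // true = key newly set; false = key already present (Redis would return null).
  setIfAbsent(key: string, ttlSeconds: number, nowMs: number = Date.now()): boolean {
    const expiresAt = this.expiries.get(key);
    if (expiresAt !== undefined && expiresAt > nowMs) return false;
    this.expiries.set(key, nowMs + ttlSeconds * 1000);
    return true;
  }
}

interface IncomingEvent {
  eventId: string;
  origin?: string; // value of the X-Bridge-Origin marker, if any
}

type Decision = "process" | "abort-duplicate" | "abort-own-write";

function decide(
  event: IncomingEvent,
  store: MemoryStore,
  nowMs: number = Date.now(),
): Decision {
  // Idempotence: the first claim on an event_id wins for 24h (86400 s).
  const key = `bridge:webhook:event:${event.eventId}`;
  if (!store.setIfAbsent(key, 86400, nowMs)) return "abort-duplicate";
  // Anti-loop: ignore events caused by the bridge's own writes.
  if (event.origin === "bridge") return "abort-own-write";
  return "process";
}
```

The order follows the sequence above: the idempotence claim comes first, so a replayed event is rejected even when it originated from the bridge.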

## Outputs

- Docmost pages created automatically
- Baserow rows enriched with Docmost ids (bidirectional link)
- Caches invalidated
- Audit log entry for each processed event
- Business notifications when needed

## Mandatory tests

- **Idempotence test**: send the same event 5 times → a single effect (bridge-tester)
- **Anti-loop test**: simulate a bridge write → verify the webhook is ignored (bridge-tester)
- **Rate limit test**: 100 identical events in 1 min → verify the rate limit kicks in
- **Recovery test**: Docmost down for 5 min → events are queued and processed after recovery
- **Invalid webhook signature test**: event with a bad HMAC → 401 (bridge-tester)
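
For the rate limit test, the limiter and the loop monitor under test might resemble the sketch below. Both classes are hypothetical in-memory versions (the bridge would presumably back them with Redis); the 5 min window, the threshold of 10, and the key format come from the anti-loop strategy.

```typescript
// Allows at most one sync per entity within the window (default: 5 min).
class SyncRateLimiter {
  private readonly lastAllowed = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number = 5 * 60 * 1000) {
    this.windowMs = windowMs;
  }

  allow(entity: string, id: string | number, nowMs: number): boolean {
    const key = `bridge:sync:${entity}:${id}`;
    const last = this.lastAllowed.get(key);
    if (last !== undefined && nowMs - last < this.windowMs) return false;
    this.lastAllowed.set(key, nowMs);
    return true;
  }
}

// Flags a probable loop once more than `threshold` identical events fall within the window.
class LoopDetector {
  private readonly seen = new Map<string, number[]>();
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold: number = 10, windowMs: number = 60_000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  record(signature: string, nowMs: number): boolean {
    const recent = (this.seen.get(signature) ?? []).filter(
      (t) => nowMs - t < this.windowMs,
    );
    recent.push(nowMs);
    this.seen.set(signature, recent);
    return recent.length > this.threshold;
  }
}
```

Passing `nowMs` explicitly keeps the test deterministic: 100 events can be replayed with synthetic timestamps instead of waiting a real minute.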

## Example invocation

The trigger is not manual: the workflow fires automatically when Baserow sends a webhook to the bridge. It can still be invoked manually for:
- Reconciliation: "WF SYNC: force a re-sync of all unmapped projects to Docmost"
- Debug: "WF SYNC: trace event ID xyz to understand why it aborted"

## Notes

- Baserow webhooks: to be configured via the Baserow UI or API (after the bridge is deployed)
- Endpoint signature secret: `BASEROW_WEBHOOK_SECRET` in the bridge `.env`
- Logs: every sync operation is logged in structured form (Pino) with its event_id for traceability
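
To make the structured log concrete, here is a minimal sketch of the audit entry from step [9]. The field names come from that step; the optional `error` field and the helpers `buildAuditEntry` / `toLogLine` are assumptions. The example serializes with `JSON.stringify` to stay dependency-free, whereas the bridge would hand the same object to its Pino logger.

```typescript
interface SyncAuditEntry {
  event_id: string;
  source: "baserow" | "docmost";
  target: "baserow" | "docmost";
  action: string;
  success: boolean;
  duration_ms: number;
  error?: string; // assumption: not listed in the workflow, added for failed syncs
}

function buildAuditEntry(
  base: Omit<SyncAuditEntry, "success" | "duration_ms" | "error">,
  startedAtMs: number,
  finishedAtMs: number,
  error?: Error,
): SyncAuditEntry {
  const entry: SyncAuditEntry = {
    ...base,
    success: error === undefined,
    duration_ms: finishedAtMs - startedAtMs,
  };
  if (error !== undefined) entry.error = error.message;
  return entry;
}

// One JSON object per line (newline-delimited JSON), the same format Pino emits.
function toLogLine(entry: SyncAuditEntry): string {
  return JSON.stringify(entry);
}
```

Because every entry carries `event_id`, a debug request such as tracing a single event reduces to filtering log lines on that field.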