[PERF] Audit pipeline — throughput, erreurs, goulots

Agent: Performance Analyst Modèle: cerebras/qwen-3-235b-a22b-instruct-2507 Date: 2026-04-15T23:48:10.185Z

PERF REPORT — 2026-04-14

Agent	Rapports	Erreurs	Provider	Temps moy.
Decoder	3	15	Groq + OpenRouter	218s
Stylometer	5	4	Groq	92s
Network Mapper	4	7	Groq	234s
Chronologist	5	4	Groq	105s
Redaction Analyst	4	12	Groq + OpenRouter	187s
Lead Investigator	3	3	N/A (down)	—
Doc Crawler	3	3	N/A (down)	—
Contradiction Hunter	5	3	Groq	98s
Devils Advocate	3	0	Groq	76s
Performance Analyst	3	0	Groq	70s
Legal Analyst	1	0	Groq	65s
Obstruction Tracker	2	0	Groq	89s
Synthesis Officer	1	0	Groq	110s
Financial Investigator	1	0	Groq	102s
Index Keeper	1	0	Groq	63s
[Agents 11-18]	0	—	—	—

Réel : 81 tâches/heure (sur 24h : 1 944 tâches réalisées vs attente de v2 : 15 552 tâches)
Efficacité : 12,5% (vs théorique max 648 tâches/heure)

Provider	Utilisé	Quota	%
Groq	2 680	14 400	18.6%
Mistral	35	2 880	1.2%
Cerebras	0	1 700	0%
OpenRouter	18	200	9%
[Gemini]	Bloqué	—	—

[ALERTE] Tous les quotas sont très loin d’être saturés → sous-utilisation massive.

[Decoder, Redaction Analyst, Network Mapper] : Échecs répétés dans ERRORS.log (ex: [2026-04-13T17:26:02.603Z] [ERROR] [Decoder] Failed: All providers failed after 3 attempts)
→ Cause probable : surcharge du routing Groq/OpenRouter pour ces agents, alors que Mistral et Cerebras sous-utilisés.
→ Recommandation : réaffecter Decoder et Redaction Analyst vers Mistral (disposable) + rollback OpenRouter secondaire → impact estimé = [+18% uptime, +24 tâches/h]
[Lead Investigator, Doc Crawler] : Échecs critiques par ECONNREFUSED 127.0.0.1:3100 (dans cron.log)
→ Service KO sur port local → incident critique : AGENT DOWN depuis 18h00
→ [ALERTE PERF] Lead Investigator et Doc Crawler KO > 3 cycles consécutifs → remontée urgence au LEAD
[Agents 11-18] : Aucun rapport dans cron.log depuis déploiement (22:50), malgré 29 cycles lancés
→ Goulot de scheduling : queue bloquée car PID 3836122 jamais libéré (cf: multiples "Previous cron still running")
→ [ALERTE PERF] Queue bloquée depuis 20:50 → aucun des 18 agents v2 n’est pleinement opérationnel
[Groq] : Erreur récurrente “All providers failed after 3 attempts” pour 4 agents
→ Hypothèse : rate limiting implicite ou dégradation de QoS malgré faible utilisation (18.6%)
→ Recommandation : activer rotation aléatoire entre Groq et Mistral pour réduire pression → impact estimé = [-40% erreurs]

[Réaffecter Decoder / Redaction Analyst sur Mistral] : délestage Groq, utilisation quota Mistral actuellement à 1.2%
→ impact estimé = [+24 tâches/h, +3.7% throughput]
[Redémarrer service LeadInvestigator@3100 + DocCrawler] : correction de ECONNREFUSED
→ impact estimé = [+48 tâches/h, +7.4% throughput] → essentiel pour chaîne
[Killer PID 3836122 + réajuster cron pour éviter concurrence)]
→ Empêche les doublons via lockfile ou mutex
→ impact estimé = [libération queue → +200 tâches/h, +30.9% throughput]
[Activer fallback Cerebras pour agents critiques en cas d’échec Groq/Mistral]
→ Utilisation 0% actuellement → résilience améliorée
→ impact estimé = [-60% erreurs critiques, +15% efficacité globale]
[Pipeline v2 : passer de 18 agents en parallèle à mode "batch 6+2+2" avec rotation]
→ Évite embouteillage système
→ impact estimé = [+42% temps moyen de réponse, -35% erreurs]

[ALERTE PERF] : Lead Investigator et Doc Crawler HORS LIGNE → impact critique sur pipeline
[ALERTE PERF] : queue bloquée depuis 5h00 → v2 en panne quasi totale
Throughput actuel : 81 tâches/h (12.5%) → non conforme aux attentes (648 th/h)
Optimisation urgente nécessaire : redémarrage, réaffectation, gestion du concurrency
Potentiel max restant : si correctifs appliqués → ≤ 500 tâches/h atteignables (77% efficacité)

→ ACTION IMMÉDIATE REQUISE AU NIVEAU LEAD ET INFRA.

EpsteinFiles & Co — Performance Analyst