Forge is the side of the dual pipeline that produces ~5,000-word thinkpieces in the
voice of Paul Logan. Pieces ship to LinkedIn, Reddit, and Substack. The voice model
(V7D) is the same checkpoint scribe consumes for novel chapters:
a single source-of-truth LoRA pinned via config/voice_pin.yaml and verified by
scribe-canonical adapter_sha on every restart.
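That restart-time verification (adapter_sha, detailed later: sha256 over the sorted `sha256sum` lines of the two adapter files) can be reproduced in pure Python. A sketch, assuming sha256sum's two-space line format and a trailing newline; the helper names here are hypothetical:

```python
import hashlib
from pathlib import Path

ADAPTER_FILES = ("adapter_model.safetensors", "adapter_config.json")

def sha_of_lines(named_blobs):
    """sha256 over the sorted sha256sum-style lines of {name: bytes}."""
    lines = [
        f"{hashlib.sha256(data).hexdigest()}  {name}"  # two spaces, like sha256sum
        for name, data in named_blobs.items()
    ]
    return hashlib.sha256(("\n".join(sorted(lines)) + "\n").encode()).hexdigest()

def adapter_sha(adapter_dir):
    """Hash the two pinned adapter files from disk."""
    return sha_of_lines(
        {name: Path(adapter_dir, name).read_bytes() for name in ADAPTER_FILES}
    )
```

Sorting before hashing makes the digest independent of file order, so both ends of the pipeline agree regardless of enumeration order.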
Each section runs through a critic+retry loop modeled on scribe's
scene-kick recovery + donts critic patterns.
Up to 3 attempts per section; ship best by lowest tic-count.
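A minimal sketch of that attempt loop, with hypothetical names (the real driver is scripts/thinkpiece/run_thinkpiece.py):

```python
def best_of_attempts(generate, count_tics, max_attempts=3):
    """Run up to max_attempts drafts of one section and keep the one
    with the fewest voice tics, stopping early on a clean draft."""
    best_tics, best_draft = None, None
    for attempt in range(1, max_attempts + 1):
        draft = generate(attempt)   # attempt number lets the caller inject retry feedback
        tics = count_tics(draft)
        if best_tics is None or tics < best_tics:
            best_tics, best_draft = tics, draft
        if tics == 0:               # clean draft: no need to retry
            break
    return best_draft, best_tics
```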
| file | role |
|---|---|
| scripts/thinkpiece/run_thinkpiece.py | AgentWrite 2-stage driver. Outline + 5 sections + critic+retry loop + paragraph-echo detection. |
| scripts/thinkpiece/gen_images.py | Stability AI hero+section image gen with per-piece metaphor map (HERO_METAPHORS + SECTION_METAPHOR_HINTS). Falls back to a generic editorial prompt for unmapped pieces. |
| scripts/thinkpiece/count_voice_artifacts.py | Mechanical voice scanner. Counts em-dash variants, clichés, LLM-tells (12 banned patterns). |
| scripts/thinkpiece/aggregate_metrics.py | Per-piece + batch metrics from run.jsonl + voice_fidelity.json. Writes METRICS.json. |
| scripts/thinkpiece/queue.py | Blog ideas queue manager. list/show/pull/done/block/add ops on runs/queue/blog_ideas.jsonl. |
| scripts/post/post_linkedin.py | LinkedIn UGC Post API (OAuth, w_member_social). Adapts piece to a 1300-char hook + link. |
| scripts/post/post_reddit.py | PRAW script-app. Submits a 40k-char self-post to the target subreddit. |
| scripts/post/post_substack_notes.py | Substack Notes (short ~480-char teaser + canonical link) via reverse-engineered API. |
| scripts/post/post_substack_browser.py | Playwright browser automation for full Substack post (no official API as of 2026-04). Operator-authorized grey-hat. |
| scripts/post/post_all.py | Cross-platform orchestrator. Dry-run by default; --confirm fires. |
| scripts/rag/link_check.py | Pre-post URL health gate. HEAD-checks all markdown links + plain URLs. Caches 24h. Fails on 4xx/5xx. |
| `src/forge/postprocess/__init__.py` | Egress contract v1.0.1: strip `<think>`, fix mojibake, replace em-dashes with commas. Same module scribe consumes. |
| config/voice_pin.yaml | Voice model contract. Production pin = V7D (manifest_digest ae32ff9c…). Pre-commit hook auto-ships pin bumps to scribe via deaddrop. |
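The egress contract row above can be illustrated with a stripped-down sketch; the real module (`src/forge/postprocess/__init__.py`) is the single source of truth and covers more mojibake cases than the one handled here:

```python
import re

def egress_clean(text: str) -> str:
    # Sketch of the v1.0.1 egress contract: drop <think> blocks,
    # repair one common mojibake sequence, and turn em-dashes into commas.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    text = text.replace("\u00e2\u0080\u0094", "\u2014")  # mojibake -> real em-dash
    text = re.sub(r"\s*\u2014\s*", ", ", text)           # em-dash -> comma
    return text
```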
V7D produces strong Paul-voice prose ~90% of the time. The remaining 10% is concentrated in 4 verbal tics inherited from training data. The pipeline now detects these mechanically and re-rolls.
| banned pattern | why |
|---|---|
| I want to be clear / I want to be honest | Hedge announcing softening. Paul-voice rule: state things directly. |
| the truth is / the reality is / in many ways | Filler. Removable scaffolding. |
| That's the thing / pattern / point / core / whole | V7D section-ender re-anchor tic. |
| Here's what / Here is what / Here's the thing | Setup phrase that V7D leans on instead of just describing the thing. |
| The question is whether / how / why | Rhetorical-question signpost. Once per piece is OK; 4× in one section is a tic. |
| more importantly / interestingly / crucially / importantly / essentially / basically / fundamentally | Filler adverbs. |
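Detection is mechanical: a lowercase regex sweep. An illustrative subset (count_voice_artifacts.py carries the full 12-pattern list):

```python
import re

# Illustrative subset of the banned-pattern table above.
BANNED = [
    r"i want to be (?:clear|honest)",
    r"the (?:truth|reality) is",
    r"in many ways",
    r"that'?s the (?:thing|pattern|point|core|whole)",
    r"here'?s (?:what|the thing)|here is what",
    r"the question is (?:whether|how|why)",
    r"\b(?:importantly|interestingly|crucially|essentially|basically|fundamentally)\b",
]

def count_tics(text: str) -> int:
    """Count banned-pattern hits; lower is better, 0 means clean."""
    lowered = text.lower()
    return sum(len(re.findall(pat, lowered)) for pat in BANNED)
```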
On detection, the retry prompt injects the literal phrase V7D wrote with an explicit instruction to rewrite without it. Up to 2 retries. Best attempt = lowest tic count. Validated against the batch v1 worst-tic case (roman_citizenship: 14 tells → v3 roman_roads section 1: 1 tic on attempt 1).
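The phrase-injection retry can be sketched as follows; the function name and template wording are hypothetical, not the driver's actual prompt:

```python
def build_retry_prompt(section_prompt: str, offending: list[str]) -> str:
    # Quote the literal phrases the model wrote and demand a rewrite
    # that drops them, rather than hoping a generic "be direct" sticks.
    quoted = "\n".join(f'- "{p}"' for p in offending)
    return (
        section_prompt
        + "\n\nYour previous draft used these exact phrases:\n"
        + quoted
        + "\nRewrite the section without any of them."
    )
```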
| metric | value |
|---|---|
| pieces shipped | 5 |
| total words | 25,342 |
| V7D wall (sequential, max-num-seqs 2) | 57.4 min (~11.5 min/piece) |
| qpass overall (forge.postprocess.quality_gate) | 100% (25/25 sections) |
| voice fidelity mean (Claude judge, 6 axes) | 4.72 / 5 |
| em-dash count | 0 |
| cliché count (initial scan) | 2.2 mean / piece |
| LLM tells (extended scan) | 9.2 mean / piece |
| image API spend | $4.92 of $5 cap (188 calls) |
V7D's strongest register: AI/LLM contrarian (5.00/5 on agentic_ai_marketing). Weakest: history+meta (4.40/5 on roman_citizenship, a slight closing hedge). Spread 0.60: V7D holds across 5 distinct registers with no register-collapse.
- safe_append_jsonl wrapper. Pre-commit hook blocks any edit to existing lines.
- config/voice_pin.yaml declares the active checkpoint. A pre-commit hook on a digest change auto-ships a type=ship message into voice_to_novel.jsonl; scribe re-deploys via redeploy_voice_pin.sh.
- adapter_sha = sha256(`sha256sum adapter_model.safetensors adapter_config.json | sort`). Same hash on both ends or the boot is refused.
- systemctl --user stop vllm-*.
- type=correction line + correlation_id.
- MANIFEST.json + MANIFEST_SHARDS.sha256. Verify on serve. Refuse to boot drifted.

| id | status | stub | register |
|---|---|---|---|
| idea-001 | queued | AI is making the noise-to-signal ratio untenable: a research roundup | AI/cultural-contrarian |
Next research piece. Sources: curl bug-bounty closure, writing-competition shutdowns,
openclaw fiasco. Will exercise link_check.py and the footnote renderer
(Phase 2 of docs/rag_citation_design.md).
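The link_check.py gate that piece will exercise reduces to URL extraction plus a HEAD probe per URL; a minimal sketch without the 24h cache (names hypothetical):

```python
import re
import urllib.request

URL_RE = re.compile(r"""https?://[^\s)\]>"']+""")

def extract_urls(markdown_text):
    """Pull plain URLs (and the URL half of markdown links) from text."""
    return sorted(set(URL_RE.findall(markdown_text)))

def check_links(markdown_text, timeout=10):
    """HEAD every URL; collect (url, reason) for anything that fails.

    urlopen raises HTTPError on 4xx/5xx, so failures land in the
    except branch along with timeouts and DNS errors.
    """
    failures = []
    for url in extract_urls(markdown_text):
        req = urllib.request.Request(url, method="HEAD")
        try:
            urllib.request.urlopen(req, timeout=timeout)
        except Exception as exc:
            failures.append((url, str(exc)))
    return failures
```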
| area | state |
|---|---|
| RAG citation pipeline | Phase 1 only: link-check works; Phase 2 (footnote renderer + source DB) deferred to v2 batch. |
| FA2 install in venv_cu130 | deferred: would 5× train + serve speed; risky compile. |
| eugr/spark-vllm-docker switch | deferred: bigger win than per-flag tuning. |
| V7E SFT data prep | P2: filter "here's what" / "that's the thing" patterns from training set; only path to drive persistent tics to 0. |
| LinkedIn comment-link workaround | designed, not built: secondary /v2/socialActions/{urn}/comments POST. |
| GitHub Pages deploy of this page | manual: operator action. |
- ~/paul-thinkpiece-pipeline/ (local; no public mirror yet)
- vllm-paul-voice.service on port 8003 (bnb 4-bit, gpu_mem_util 0.40, max-num-seqs 2)
- docs/self-portrait.html
- docs/rag_citation_design.md, scripts/post/README.md