Posted:
1/27/2026, 1:12:33 AM
Location(s):
Leinster, Ireland ⋅ Dublin, Leinster, Ireland
Experience Level(s):
Senior
Field(s):
AI & Machine Learning
Storyful is an equal opportunity employer.
Job Description:
Reporting to: Chief Product & Technology Officer (CPTO)
Location: Dublin (Hybrid, 3 days/week in office)
Team: Product & Engineering (foundational hire for Data Science / AI function)
Mission: Build a next-generation Risk & Insights Intelligence Platform that disrupts media monitoring, social listening, and LLM monitoring—from early prototypes to commercially successful, market-leading products.
This role is for someone who can architect and build (hands-on) agentic LLM systems in production, partner deeply with Data Scientists, and obsess over evaluation, quality, and cost—while thriving in the ambiguity of zero-to-one product creation.
Why this role exists:
We’re building an AI-native platform that detects, explains, and helps teams respond to reputational and narrative risk. You’ll shape the technical direction early, including network science and explainability: agent ecosystems, information retrieval (e.g., RAG and Graph RAG), multi-document reasoning, classification, scoring, evaluation, and LLMOps—and turn them into reliable product experiences.
What you’ll do (Responsibilities)
1) Architect and ship agentic GenAI systems
Design and implement agent ecosystems (multi-agent architectures) that deliver real product outcomes (not demos).
Build specialized agents for workflows like adverse media / risk detection, entity investigation, source authenticity, classification, and summarization—and orchestrate them reliably.
Own the translation from research/prototypes into production-grade features (latency, reliability, observability, cost).
2) Build RAG + Graph RAG for multi-doc intelligence
Deliver RAG chatbots for investigation and exploration across large document sets.
Implement multi-document summarization, including Graph RAG patterns (graph extraction, linking entities/claims, narrative threads).
Implement semantic chunking / paragraph splitting, retrieval strategies, and citation/grounding patterns suitable for risk/comms teams.
Explore deep agents / deep research approaches, graph traversal strategies (network science), and agentic RAG.
3) Multi-document classification + scoring (risk-focused)
Build instruction-based and ML-assisted classification pipelines for multi-document inputs (themes, narratives, risk taxonomy). Explore generating data to fine-tune small models.
Create scoring methodologies (e.g., risk score, severity, momentum/growth, confidence, exposure) with a clear rationale and calibration approach.
Bonus: experience building “risk detection” classifiers and adverse-media-style pipelines.
4) Context engineering + automatic prompt improvement
Lead prompt engineering practices across the product: reusable prompt assets, versioning, guardrails, and domain adaptation.
Implement prompt evolution techniques (e.g., automated prompt iteration / prompt improvement loops) where it makes commercial sense.
Understand how the wording of a prompt shapes the probability distribution the LLM outputs, and manage context through graphs and information retrieval.
5) Evaluation: make quality measurable and repeatable
Build robust evaluation methodologies for prompts, RAG, summarization, and classification.
Apply multiple evaluation techniques, including:
offline metrics (precision/recall/F1 where appropriate)
retrieval metrics and ablations
LLM-as-a-judge style evaluations with rubrics, controls, and drift detection
Define quality gates that allow the team to move fast without breaking trust.
Understand the LLM as a neural network, not only as something that can be prompted and observed from the outside; for example, understand how entropy can serve as a signal for detecting hallucinations as they unfold through the layers of the model.
6) LLMOps + cost control
Implement LLMOps: experiment tracking, model/prompt versioning, dataset management, observability, and release practices.
Build monitoring for quality + safety + cost, and actively optimize infrastructure spend in cloud environments.
Deploy and maintain open-source models.
7) Lead by influence (and occasionally by direct leadership)
Bring “Senior/Lead Engineer” judgement: clean architecture, pragmatic decisions, mentoring, and unblocking teams.
Partner tightly with Product, Design, Data Science, and Engineering—while also being able to execute independently.
What success looks like (first 6–12 months)
A production-grade agentic architecture powering key workflows (investigate → summarize → classify → score → recommend action).
A measurable evaluation framework where quality improves release over release.
A Graph RAG (or equivalent) capability that materially improves multi-doc summarization accuracy and defensibility.
Clear cost/performance tradeoffs and observability that make the system operable at scale.
A team around you that’s leveled up in GenAI engineering practices.
Required experience (Must-have)
Proven background as a Senior / Lead Engineer (or equivalent staff-level scope), owning architecture and delivery.
Demonstrated experience building agentic GenAI architecture for commercially successful product features (not only internal prototypes).
Strong experience working with Data Scientists on ML algorithms, NLP, evaluation design, and productionization.
Hands-on experience in AWS and GCP (Azure is acceptable as an additional platform).
Production experience with:
RAG chatbots
multi-document summarization (ideally Graph RAG)
multi-document classification
scoring methodologies (risk scoring is a strong bonus)
Deep expertise in prompt engineering and evaluation, including both classical metrics (e.g., precision/recall) and LLM-as-a-judge approaches.
Strong LLMOps and GenAI product design experience: experimentation → deployment → monitoring → iteration.
Nice-to-have (Strong bonuses)
Experience in risk/compliance domains (e.g., adverse media, AML, entity investigation workflows).
Knowledge graphs in production (e.g., Neo4j) and graph extraction pipelines.
Experience running annotation programs / building labeled datasets for NLP tasks.
Skills & tools (examples)
We don’t require exact matches, but we do expect you to be fluent in this class of tooling and able to choose pragmatically.
GenAI frameworks & LLMs
LangChain, LlamaIndex
OpenAI / Gemini / Claude
Vector RAG + Graph RAG patterns
LLMOps / experimentation / observability
MLflow (experiments, tracking)
Langfuse (prompt & trace observability)
Data & retrieval
Neo4j (graph), ElasticSearch
Vector stores (Pinecone-style capability), embeddings, semantic chunking
Cloud / infrastructure (examples)
AWS: Lambda, SQS/SNS, Kinesis, Glue, Athena, Redshift, DynamoDB, RDS, API Gateway, CloudFront, SageMaker, Comprehend, Kendra, Lex
GCP (plus Azure exposure helpful)
Languages
Python (primary), TypeScript, Java (Ruby on Rails experience welcome)
Job Category:
Storyful - Product & Technology
Website: https://storyful.com/
Headquarter Location: Dublin, Dublin, Ireland
Employee Count: 51-100
Year Founded: 2009
IPO Status: Private
Last Funding Type: Venture - Series Unknown
Industries: News ⋅ Publishing ⋅ Social Media