AI Architect

Posted:
1/27/2026, 1:12:33 AM

Location(s):
Leinster, Ireland ⋅ Dublin, Leinster, Ireland

Experience Level(s):
Senior

Field(s):
AI & Machine Learning

Storyful is an equal opportunity employer

Job Description:

Reporting to: Chief Product & Technology Officer (CPTO)

Dublin (Hybrid — 3 days/week in office)


Team: Product & Engineering (foundational hire for Data Science / AI function)
Mission: Build a next-generation Risk & Insights Intelligence Platform that disrupts media monitoring, social listening, and LLM monitoring—from early prototypes to commercially successful, market-leading products.

This role is for someone who can architect and build (hands-on) agentic LLM systems in production, partner deeply with Data Scientists, and obsess over evaluation, quality, and cost, while thriving in the ambiguity of zero-to-one product creation.

Why this role exists:

We’re building an AI-native platform that detects, explains, and helps teams respond to reputational and narrative risk. You’ll shape the technical direction early, including network science and explainability: agent ecosystems, information retrieval (e.g., RAG and Graph RAG), multi-document reasoning, classification, scoring, evaluation, and LLMOps, and turn them into reliable product experiences.

What you’ll do (Responsibilities)

1) Architect and ship agentic GenAI systems

  • Design and implement agent ecosystems (multi-agent architectures) that deliver real product outcomes (not demos).
     

  • Build specialized agents for workflows like adverse media / risk detection, entity investigation, source authenticity, classification, and summarization, and orchestrate them reliably (see the orchestration sketch after this list).
     

  • Own the translation from research/prototypes into production-grade features (latency, reliability, observability, cost).
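
For illustration only, here is a minimal sketch of what this kind of multi-agent orchestration can look like. The agent names, the sequential pipeline, and the stubbed call_llm function are assumptions chosen for the example; they do not describe an existing Storyful implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


def call_llm(prompt: str) -> str:
    """Stand-in for a real model call (OpenAI / Gemini / Claude); hypothetical."""
    raise NotImplementedError("wire up a model provider here")


@dataclass
class Agent:
    """A single specialized agent: a name, an instruction, and an LLM call."""
    name: str
    instruction: str
    llm: Callable[[str], str] = call_llm

    def run(self, context: str) -> str:
        return self.llm(f"{self.instruction}\n\nContext:\n{context}")


@dataclass
class Pipeline:
    """Orchestrates agents sequentially, feeding each output into the next step's context."""
    agents: list[Agent] = field(default_factory=list)

    def run(self, documents: str) -> dict[str, str]:
        context, outputs = documents, {}
        for agent in self.agents:
            outputs[agent.name] = agent.run(context)
            context = f"{context}\n\n[{agent.name}]\n{outputs[agent.name]}"
        return outputs


# Hypothetical workflow: investigate -> summarize for a risk/comms audience.
pipeline = Pipeline(agents=[
    Agent("risk_detection", "Flag passages indicating reputational or narrative risk."),
    Agent("entity_investigation", "List the entities involved and how they are connected."),
    Agent("summarization", "Summarize the findings for a risk/comms audience, with citations."),
])
```

A real system would add routing, retries, tool use, and observability hooks; the sketch only shows the shape of agent-to-agent handoff.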
     

2) Build RAG + Graph RAG for multi-doc intelligence

  • Deliver RAG chatbots for investigation and exploration across large document sets.
     

  • Implement multi-document summarization, including Graph RAG patterns (graph extraction, linking entities/claims, narrative threads).
     

  • Implement semantic chunking / paragraph splitting, retrieval strategies, and citation/grounding patterns suitable for risk/comms teams.

  • Explore deep agents / deep research patterns, graph traversal strategies (network science), and agentic RAG (see the graph sketch after this list).
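
As a sketch of the Graph RAG linking-and-traversal idea referenced above, assume an upstream extraction step (typically an LLM) has already produced (entity, claim, document) triples; the names and data below are illustrative only.

```python
import networkx as nx


def build_claim_graph(triples: list[tuple[str, str, str]]) -> nx.Graph:
    """Link entities to claims and claims to their source documents in one graph."""
    g = nx.Graph()
    for entity, claim, doc_id in triples:
        g.add_node(entity, kind="entity")
        g.add_node(claim, kind="claim")
        g.add_node(doc_id, kind="document")
        g.add_edge(entity, claim)
        g.add_edge(claim, doc_id)
    return g


def narrative_thread(g: nx.Graph, entity: str, max_hops: int = 2) -> list[str]:
    """Collect claims within max_hops of an entity: raw material for a multi-doc summary."""
    nearby = nx.single_source_shortest_path_length(g, entity, cutoff=max_hops)
    return [n for n in nearby if g.nodes[n].get("kind") == "claim"]


# Illustrative triples, as an extraction step might emit them.
triples = [
    ("Acme Corp", "Acme Corp recalled its flagship product", "doc_17"),
    ("Acme Corp", "Acme Corp's CEO disputed the recall reports", "doc_42"),
]
graph = build_claim_graph(triples)
print(narrative_thread(graph, "Acme Corp"))
```

In production such a graph would typically live in a store like Neo4j; networkx is used here only to keep the sketch self-contained.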
     

3) Multi-document classification + scoring (risk-focused)

  • Build instruction-based and ML-assisted classification pipelines for multi-document inputs (themes, narratives, risk taxonomy). Explore generating data to fine-tune small models.
     

  • Create scoring methodologies (e.g., risk score, severity, momentum/growth, confidence, exposure) with a clear rationale and calibration approach (see the scoring sketch after this list).
     

  • Bonus: experience building “risk detection” classifiers and adverse-media-style pipelines.
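
To make the scoring idea concrete, here is a minimal sketch of one possible shape: blend illustrative components (severity, momentum, confidence, exposure) with explicit weights into a 0-100 risk score. The components, weights, and scaling are assumptions for the example, not a prescribed methodology; a real version would be justified and calibrated against labeled cases.

```python
from dataclasses import dataclass


@dataclass
class RiskSignals:
    severity: float    # 0..1, how damaging the narrative is if true
    momentum: float    # 0..1, normalized growth in mentions over the window
    confidence: float  # 0..1, classifier/LLM confidence in the detection
    exposure: float    # 0..1, reach of the sources carrying the narrative


# Illustrative weights only; calibration would set and validate these.
WEIGHTS = {"severity": 0.4, "momentum": 0.25, "confidence": 0.2, "exposure": 0.15}


def risk_score(s: RiskSignals) -> float:
    """Weighted blend of the components, scaled to 0-100."""
    raw = (WEIGHTS["severity"] * s.severity
           + WEIGHTS["momentum"] * s.momentum
           + WEIGHTS["confidence"] * s.confidence
           + WEIGHTS["exposure"] * s.exposure)
    return round(100 * raw, 1)


print(risk_score(RiskSignals(severity=0.8, momentum=0.6, confidence=0.9, exposure=0.4)))  # 71.0
```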
     

4) Context engineering + automatic prompt improvement

  • Lead prompt engineering practices across the product: reusable prompt assets, versioning, guardrails, and domain adaptation.
     

  • Implement prompt evolution techniques (e.g., automated prompt iteration / prompt improvement loops) where it makes commercial sense (see the sketch after this list).

  • Understand how the wording of a prompt shifts the distribution of probabilities the LLM outputs, and manage context deliberately through graphs and information retrieval.
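
A minimal sketch of an automated prompt improvement loop of the kind mentioned above: score candidate prompts against a small labeled set, keep the best, and ask a model to propose a revision. The exact-match scoring, the rewrite instruction, and the stubbed call_llm are assumptions for illustration.

```python
from typing import Callable


def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; hypothetical."""
    raise NotImplementedError


def accuracy(prompt: str, examples: list[tuple[str, str]], llm: Callable[[str], str]) -> float:
    """Fraction of labeled examples the prompt gets right (exact match, for simplicity)."""
    hits = sum(llm(prompt.format(text=text)).strip() == label for text, label in examples)
    return hits / len(examples)


def improve_prompt(seed: str, examples: list[tuple[str, str]],
                   llm: Callable[[str], str], rounds: int = 3) -> str:
    """Greedy loop: keep the best-scoring prompt, ask the model to rewrite it, re-score."""
    best, best_score = seed, accuracy(seed, examples, llm)
    for _ in range(rounds):
        candidate = llm(
            "Rewrite the following classification prompt to be clearer and more precise. "
            "Keep the {text} placeholder.\n\n" + best
        )
        score = accuracy(candidate, examples, llm)
        if score > best_score:
            best, best_score = candidate, score
    return best
```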

5) Evaluation: make quality measurable and repeatable

  • Build robust evaluation methodologies for prompts, RAG, summarization, and classification.
     

  • Apply multiple evaluation techniques, including:
     

    • offline metrics (precision/recall/F1 where appropriate)
       

    • retrieval metrics and ablations
       

    • LLM-as-a-judge style evaluations with rubrics, controls, and drift detection (see the sketch after this list)
       

  • Define quality gates that allow the team to move fast without breaking trust.

  • Understand an LLM as a neural network, not only as something that can be prompted and observed from the outside; for example, understand how entropy can serve as a signal for detecting hallucinations as they unfold through the layers of the model.
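
For the LLM-as-a-judge point above, a minimal sketch under stated assumptions: a rubric is rendered into a judge prompt, the judge model (stubbed as call_judge) is expected to return a 1-5 JSON score per criterion, and known-good control items are re-scored to catch judge drift. The rubric, function names, and tolerance are illustrative, not an in-house framework.

```python
import json
from statistics import mean
from typing import Callable

RUBRIC = {
    "faithfulness": "Every claim in the summary is supported by the cited documents.",
    "coverage": "The summary captures the main risk-relevant narratives.",
    "clarity": "The summary is readable for a risk/comms audience.",
}


def call_judge(prompt: str) -> str:
    """Stand-in for the judge model call; expected to return JSON such as {"faithfulness": 4, ...}."""
    raise NotImplementedError


def judge(summary: str, sources: str, judge_llm: Callable[[str], str] = call_judge) -> dict[str, int]:
    """Render the rubric into a judge prompt and parse the per-criterion scores."""
    criteria = "\n".join(f"- {name}: {desc} (score 1-5)" for name, desc in RUBRIC.items())
    prompt = (
        "Score the summary against the sources using this rubric. Reply with JSON only.\n"
        f"{criteria}\n\nSources:\n{sources}\n\nSummary:\n{summary}"
    )
    return json.loads(judge_llm(prompt))


def drift_check(controls: list[tuple[str, str, float]],
                judge_llm: Callable[[str], str] = call_judge,
                tolerance: float = 0.5) -> bool:
    """Re-score known-good control items; flag drift if any mean score moves beyond tolerance."""
    deltas = [abs(mean(judge(summary, sources, judge_llm).values()) - expected)
              for summary, sources, expected in controls]
    return all(d <= tolerance for d in deltas)
```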
     

6) LLMOps + cost control

  • Implement LLMOps: experiment tracking, model/prompt versioning, dataset management, observability, and release practices.
     

  • Build monitoring for quality + safety + cost, and actively optimize infrastructure spend in cloud environments (see the cost-tracking sketch after this list).

  • Deploy and maintain open-source models.
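
One illustration of the cost-control side: log token usage per request, convert it to spend, and flag when a daily budget is exceeded. The model names and per-1K-token prices below are placeholders, not quoted provider rates.

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder per-1K-token prices; real rates depend on the provider and change over time.
PRICE_PER_1K = {
    "model-small": {"in": 0.0005, "out": 0.0015},
    "model-large": {"in": 0.01, "out": 0.03},
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


class CostTracker:
    """Accumulates spend per model and flags when a daily budget is exceeded."""

    def __init__(self, daily_budget_usd: float):
        self.daily_budget_usd = daily_budget_usd
        self.spend = defaultdict(float)

    def record(self, u: Usage) -> None:
        p = PRICE_PER_1K[u.model]
        self.spend[u.model] += (u.input_tokens / 1000) * p["in"] + (u.output_tokens / 1000) * p["out"]

    @property
    def over_budget(self) -> bool:
        return sum(self.spend.values()) > self.daily_budget_usd


tracker = CostTracker(daily_budget_usd=50.0)
tracker.record(Usage("model-large", input_tokens=12_000, output_tokens=1_500))
print(round(sum(tracker.spend.values()), 4), tracker.over_budget)  # ~0.165 False
```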
     

7) Lead by influence (and occasionally by direct leadership)

  • Bring “Senior/Lead Engineer” judgement: clean architecture, pragmatic decisions, mentoring, and unblocking teams.
     

  • Partner tightly with Product, Design, Data Science, and Engineering—while also being able to execute independently.
     

What success looks like (first 6–12 months)

  • A production-grade agentic architecture powering key workflows (investigate → summarize → classify → score → recommend action).
     

  • A measurable evaluation framework where quality improves release over release.
     

  • A Graph RAG (or equivalent) capability that materially improves multi-doc summarization accuracy and defensibility.
     

  • Clear cost/performance tradeoffs and observability that make the system operable at scale.
     

  • A team around you that’s leveled up in GenAI engineering practices.

Required experience (Must-have)

  • Proven background as a Senior / Lead Engineer (or equivalent staff-level scope), owning architecture and delivery.
     

  • Demonstrated experience building agentic GenAI architecture for commercially successful product features (not only internal prototypes).
     

  • Strong experience working with Data Scientists on ML algorithms, NLP, evaluation design, and productionization.
     

  • Hands-on experience in AWS and GCP (Azure is acceptable as an additional cloud).
     

  • Production experience with:
     

    • RAG chatbots

    • multi-document summarization (ideally Graph RAG)

    • multi-document classification

    • scoring methodologies (risk scoring is a strong bonus)
       

  • Deep expertise in prompt engineering and evaluation, including both classical metrics (e.g., precision/recall) and LLM-as-a-judge approaches.
     

  • Strong LLMOps and GenAI product design experience: experimentation → deployment → monitoring → iteration.

Nice-to-have (Strong bonuses)

  • Experience in risk/compliance domains (e.g., adverse media, AML, entity investigation workflows).
     

  • Knowledge graphs in production (e.g., Neo4j) and graph extraction pipelines.
     

  • Experience running annotation programs / building labeled datasets for NLP tasks.

Skills & tools (examples)

We don’t require exact matches, but we do expect you to be fluent in this class of tooling and able to choose pragmatically.

GenAI frameworks & LLMs

  • LangChain, LlamaIndex
     

  • OpenAI / Gemini / Claude
     

  • Vector RAG + Graph RAG patterns
     

LLMOps / experimentation / observability

  • MLflow (experiments, tracking)
     

  • Langfuse (prompt & trace observability)
     

Data & retrieval

  • Neo4j (graph), ElasticSearch
     

  • Vector stores (Pinecone-style capability), embeddings, semantic chunking
     

Cloud / infrastructure (examples)

  • AWS: Lambda, SQS/SNS, Kinesis, Glue, Athena, Redshift, DynamoDB, RDS, API Gateway, CloudFront, SageMaker, Comprehend, Kendra, Lex
     

  • GCP (plus Azure exposure helpful)
     

Languages

  • Python (primary), TypeScript, Java (Ruby on Rails experience welcome)

Job Category:

Storyful - Product & Technology