Confidential Job Listing

AI QA Specialist (LLM Evaluation) / Position at a Listed Marketing Support Company

Job ID: 1500843

Applications Closed

Job Information

Job Title

AI QA Specialist (LLM Evaluation)

Position

AI QA Specialist

Recommended Age Range

20s

30s

40s

50s and over

Estimated Annual Salary

Up to 14 million yen

About the Role

●Mission
"Scientifically evaluate and guarantee the output quality of agents."

Evaluate and guarantee AI agent output quality through scientific methods. Build systems for automated evaluation, red teaming, safety verification, and regression detection. Ensure the quality of products used in production by approximately 200 companies through a "science of quality" approach.

●Role & Expectations
As an AI QA Specialist, you will lead the design, construction, and operation of the quality evaluation infrastructure for AI agents.

Own the entire process from evaluation metric selection and design to integrating automated evaluation pipelines into CI/CD
Plan and execute red teaming to detect safety risks before release
Quantitatively verify the effectiveness of quality improvements through A/B test analysis based on statistical experimental design
Feed evaluation signals back to the research and development teams, creating a compounding feedback loop for model improvement
Ensure the quality of products used in production by ~200 companies through a "science of quality" approach

●Job Description
・Evaluation Infrastructure Design & Development
　Design, build, and maintain evaluation sets (synthetic data + real logs)
　Select and design evaluation metrics (win rate, task success, factuality, harm detection)
　Build automated evaluation pipelines and integrate them into CI/CD
　Design agent harnesses (multi-turn, tool use, long-context support)

・Safety & Quality Verification
　Plan and execute red-teaming (adversarial testing)
　Build safety and policy compliance verification frameworks
　Design and run prompt/tool regression tests
　Analyze and improve issues related to hallucination, bias, and output quality

・Statistical Analysis & Reporting
　Design and analyze statistical experiments (A/B tests, significance testing)
　Create quality reports and improvement proposals
　Visualize regression detection and quality trends
　Feed evaluation signals back to research and development teams

Required Skills

●Requirements
You May Be a Good Fit If You Have

Bachelor’s degree or equivalent practical experience in Computer Science, Software Engineering, Artificial Intelligence, Machine Learning, Mathematics, Physics, or related fields
3+ years of practical experience as a software engineer or QA engineer
Knowledge of LLM / generative AI evaluation methods (prompt evaluation, quantitative output quality measurement, hallucination detection, etc.)
Foundational knowledge of statistics and experimental design
Experience building evaluation pipelines in Python
Experience integrating tests into CI/CD pipelines
Experience designing prompt / tool regression tests

Language requirement (at least one of the following):
Japanese: Fluent (able to discuss product development without friction)
English: Business level

●Preferred Qualifications
Strong Candidates May Also Have

NLP / ML evaluation benchmark design experience
Knowledge of AI safety / Responsible AI
Red teaming / penetration testing experience
Experience evaluating multi-agent workflows, tool use, and long-context scenarios
Large-scale data processing experience (Spark / BigQuery, etc.)
Ability to read, comprehend, and reproduce research papers
Technical communication ability in English

Work Location

Tokyo

Employment Type

Full-time permanent employee

Company Name

A listed AI tech company providing marketing and sales DX (digital transformation) solutions

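Company Overview

A marketing technology company that develops and provides solutions addressing a range of business challenges, such as growing revenue and improving productivity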

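Company Highlights

In the ad technology and digital marketing space, the company uses cutting-edge technology to develop its own products that help client companies maximize revenue. In particular, it focuses on its platform business, which analyzes the users who reach ad slots on web media and smartphone apps and delivers the most suitable ads through real-time auctions.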

Job Category

AI Engineer

Organization Category

Operating company

Remarks