Confidential job listing

AI QA Specialist (LLM Evaluation) / Position at a publicly listed marketing support company

Job ID: 1500843

Applications closed

Job details

Job type
AI QA Specialist (LLM Evaluation)
Position
AI QA Specialist
Recommended age range
20s
30s
40s
50s and older
Estimated annual salary
Up to 14 million yen
Job description
●Mission
"Scientifically evaluate and guarantee the output quality of agents."

Evaluate and guarantee AI agent output quality through scientific methods. Build systems for automated evaluation, red teaming, safety verification, and regression detection. Ensure the quality of products used in production by approximately 200 companies through a "science of quality" approach.

●Role & Expectations
As an AI QA Specialist, you will lead the design, construction, and operation of the quality evaluation infrastructure for AI agents.

Own the entire process from evaluation metric selection and design to integrating automated evaluation pipelines into CI/CD
Plan and execute red teaming to detect safety risks before release
Quantitatively verify the effectiveness of quality improvements through A/B test analysis based on statistical experimental design
Feed evaluation signals back to the research and development teams, creating a compounding feedback loop for model improvement
Ensure the quality of products used in production by ~200 companies through a "science of quality" approach

●Job Description
・Evaluation Infrastructure Design & Development
 Design, build, and maintain evaluation sets (synthetic data + real logs)
 Select and design evaluation metrics (win rate, task success, factuality, harm detection)
 Build automated evaluation pipelines and integrate them into CI/CD
 Design agent harnesses (multi-turn, tool use, long-context support)

・Safety & Quality Verification
 Plan and execute red-teaming (adversarial testing)
 Build safety and policy compliance verification frameworks
 Design and run prompt/tool regression tests
 Analyze and improve issues related to hallucination, bias, and output quality

・Statistical Analysis & Reporting
 Design and analyze statistical experiments (A/B tests, significance testing)
 Create quality reports and improvement proposals
 Visualize regression detection and quality trends
 Feed evaluation signals back to research and development teams
Required skills
●Requirements
You May Be a Good Fit If You Have

Bachelor’s degree or equivalent practical experience in Computer Science, Software Engineering, Artificial Intelligence, Machine Learning, Mathematics, Physics, or related fields
3+ years of practical experience as a software engineer or QA engineer
Knowledge of LLM / generative AI evaluation methods (prompt evaluation, quantitative output quality measurement, hallucination detection, etc.)
Foundational knowledge of statistics and experimental design
Experience building evaluation pipelines in Python
Experience integrating tests into CI/CD pipelines
Experience designing prompt / tool regression tests

Language requirement (at least one of the following):
Japanese: Fluent (able to discuss product development without friction)
English: Business level

●Preferred Qualifications
Strong Candidates May Also Have

NLP / ML evaluation benchmark design experience
Knowledge of AI safety / Responsible AI
Red teaming / penetration testing experience
Experience evaluating multi-agent workflows, tool use, and long-context scenarios
Large-scale data processing experience (Spark / BigQuery, etc.)
Ability to read, comprehend, and reproduce research papers
Technical communication ability in English
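Work location
Employment type
Full-time permanent employee
Company name
Publicly listed marketing support company
Company overview
A marketing technology company that develops and provides solutions to help businesses address challenges such as revenue growth and productivity improvement.
Company highlights
In the ad technology and digital marketing space, the company independently develops products that use cutting-edge technology to help client companies maximize revenue. It focuses in particular on its platform business, which analyzes the users who visit ad slots on web media and smartphone apps and delivers the most suitable ads through real-time auctions.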
Job category
Organization category
Remarks
Related keywords