Success Stories
Learn how companies around the world are pushing the boundaries of AI with LLM post-training and evaluation

Detecting hidden harm in long contexts: How Toloka built AWS Bedrock's advanced safety dataset
Jul 14, 2025

Standardizing AI safety with MLCommons
May 15, 2025

AI agents under attack: A case study on advanced agent red-teaming
Apr 28, 2025

Multi-domain, multi-language SFT dataset pushes LLM performance to the next level
Oct 22, 2024

Toloka helps ServiceNow increase evaluation throughput multiple times
Oct 11, 2024

LLM for code generation: a scalable pipeline to gather SFT data
Apr 29, 2024

Building a lead classification system for 10x client leads
Dec 13, 2023

LLM costs vs quality: How Eightify picked the right GPT model
Dec 11, 2023

Perplexity enhances LLMs with holistic quality evaluation
Dec 8, 2023