Human feedback. Real-world relevance. Responsible AI.
On June 18, 2025, Welo Data and AI Circle brought together over 70 AI professionals in Mountain View, California, to explore what it takes to build safe, scalable, and multilingual AI systems. Expert panelists from Google, Meta, Amazon, NVIDIA, Microsoft, and Welo Data led the discussion, which included Welo Data's latest research into the persistent challenges of multilingual AI translation.
From Agents to AGI
Moderated by NVIDIA's Pratik Mehta, the panel explored critical topics such as the evolving role of humans in the loop (HITL), challenges in red teaming and evaluation, and the shifting landscape of fine-tuning for foundation models.
The lineup of speakers included:
- Si Chen – VP of Strategy and Marketing, Welo Data
- Himanshu Gupta – Applied Scientist, Amazon
- Mark Hoffmann – Staff ML Engineer, Meta
- Arsalan Mosenia – Tech Lead for AI Agents, Google
- Mayur Shastri – Software Engineer, Microsoft
Key Takeaways
1. Human-in-the-Loop Is Evolving Quickly
At Meta, HITL isn't just annotation. It's a multi-layered process involving subject-matter expert validation, iterative relabeling, and quality loops across use cases like legal data and low-resource languages.
Welo Data's Si Chen emphasized that HITL is core to building AI that respects linguistic and cultural nuance, especially across 500+ languages and 200+ markets. But she also noted that the challenge lies in sourcing and training the right experts, and in making sure their knowledge is captured in ways that models can learn from.
"A lot of the focus today around how we leverage experts is: how do we find the right rubrics, the right benchmarks, the right guidelines to extract the knowledge that exists in the minds of these experts, and shift it toward consistent representation?"
– Si Chen, Welo Data
This shift toward domain-specific human expertise, not just annotation at scale, reflects the industry's recognition that model quality depends just as much on what is annotated as on who is doing the annotating.
2. Fine-Tuning Isn't Going Away
Forget the idea that fine-tuning is obsolete. Microsoft's Mayur Shastri underscored the need for domain-specific fine-tuning, especially in fields like healthcare and legal reasoning. Retrieval-augmented generation (RAG) is gaining ground, but it complements rather than replaces fine-tuning for many enterprise needs.
3. Agents Are the Next Frontier, But Not Without Risk
Google's Arsalan Mosenia highlighted the shift from agent frameworks to autonomous agents capable of completing enterprise tasks end to end. But adoption won't be easy: security, reliability, and error compounding in agentic systems remain major challenges, especially in sensitive applications where caution around AI safety is vital.
4. Synthetic Data Is Powerful, But Needs Human Oversight
Synthetic data is crucial for scaling, but the panel agreed it must be paired with human oversight to ensure high-quality outputs. Si noted that synthetic variants often lack the cultural, domain, and linguistic grounding required for safe and aligned model behavior.
5. Benchmarks Are Breaking, And That's Not a Bad Thing
Several panelists shared concerns that benchmarks are increasingly outpaced by model performance. As Mayur noted, the real challenge is evaluating models in context-specific, evolving, and high-stakes scenarios, and that's where human expertise remains essential.
Open Questions for the Industry
The audience Q&A pushed the conversation further:
- How do we preserve human judgment as AI systems grow in autonomy?
- Who gets to decide how models should respond, especially in subjective domains?
- What's the path from HITL to AI-in-the-loop?
There were no easy answers. But the consensus was clear: building real-world AI isn't just about scale. It's about nuance, responsibility, and collaboration between humans and machines.
Looking Ahead
As AI becomes more embedded in everyday products, from shopping assistants to global ad platforms, the need for human-guided, multilingual, and culturally aware systems has never been greater.
Events like AI Circle x Welo Data serve as a crucial reminder that you can't outsource alignment to your training data pipeline. You need people, including experts, linguists, and domain specialists, in the loop every step of the way.
Contact us to learn more about how Welo Data partners with leading teams to build safe and scalable AI.