The Context Problem Nobody's Fixing in Talk…

Feb 19

What testing NL-to-SQL on an enterprise-scale schema, with multi-layer validation and statistical significance, revealed about production readiness.

4 Comments

Claire Gouze

Feb 26

That's very interesting! Which tool did you use to test the agent?

Did you rebuild your own AI agent?

And how did you test that each question had the right answer?

I did a similar benchmark but also included other metrics like LLM token costs, query costs, and time to answer. Is it something you've analyzed as well?

Ramona C. Truta

Feb 19 (edited)

Very interesting experiment.

I'm curious how you accounted for hot/cold cache as you switch between contexts. Benchmarking should be done in an environment with a guaranteed cold cache and cold OS, and with a replication factor.

Jessica Talisman, MLS

Feb 19

Curious: what is your definition of context?

Manoj Shanmugasundaram

Feb 19

This post is a great summary: https://atlan.com/know/context-layer-enterprise-ai/

Context & Chaos
