Why some productized agent systems will fail and others won't. My two cents on why evals matter and what we can do about it.