Tag: ai-safety
All the articles with the tag "ai-safety".
- Stop trusting LLM benchmarks
hrbrmstr
Eight major AI benchmarks can be gamed to near-perfect scores without solving their tasks. Berkeley researchers show that the scoring harnesses were never secure, and that scores are already inflated in the wild.