AI is achieving better-than-human levels of performance for certain very specific, or closed, tasks; broader application of these successful tools in real-world environments requires progress in robustness.

The benchmark of achieving better-than-human performance has been a catalyst for public conversations about the potential of AI technologies. There are already some tasks where AI's performance against human benchmarks has been well-documented: in object detection from static images, for example, AI tools have performed at higher-than-human levels since 2017. In other areas, how AI compares to current human practices is less clear. In image analysis systems for cancer diagnosis, for example, the performance of deployed AI relative to human radiologists has been mixed.

Performance on restricted tasks continues to improve, and the range of tasks for which AI could be used continues to grow. However, these restricted tasks are often not representative of real-world challenges, and even small deviations from these constrained environments can derail performance. Adding snow to landscape images, for example, drastically decreases the performance of current image recognition systems.

Large foundation models are nonetheless opening the possibility of broader-purpose AI tools that can be trained on one task and applied to another. One approach to increasing AI performance across a range of tasks is transfer learning: taking what a system has learned on one task and applying it to another. Even for systems whose performance on narrow tasks approaches human standards, such transfer remains challenging. Creating AI that delivers high performance in dynamic environments will require progress in robustness and deployability, producing tools that are suitable for broader applications.
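The transfer-learning idea mentioned above can be illustrated with a minimal sketch. The example below is not from the source and uses entirely synthetic data: a feature extractor is "pretrained" on a data-rich source task, frozen, and then reused on a related target task that has far fewer examples, with only a small new "head" fitted on top.

```python
import numpy as np

# Minimal transfer-learning sketch (illustrative, synthetic data only):
# 1. pretrain a linear feature extractor on a source task,
# 2. freeze it,
# 3. fit a one-parameter head on a related, data-poor target task.

rng = np.random.default_rng(0)

def fit_linear(X, y, lr=0.1, steps=500):
    """Plain gradient descent for least-squares weights."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Source task: targets depend on a hidden linear feature of the inputs.
true_feature = np.array([1.0, -2.0, 0.5])
X_src = rng.normal(size=(200, 3))
y_src = X_src @ true_feature + 0.01 * rng.normal(size=200)

# "Pretraining": learn the feature weights from abundant source data.
w_feature = fit_linear(X_src, y_src)

# Target task: a different scaling of the same underlying feature,
# observed with far less data than the source task provides.
X_tgt = rng.normal(size=(10, 3))
y_tgt = 3.0 * (X_tgt @ true_feature)

# Transfer: freeze the pretrained feature, fit only the 1-D head.
z_tgt = X_tgt @ w_feature                         # frozen feature extractor
head = np.sum(z_tgt * y_tgt) / np.sum(z_tgt**2)   # closed-form 1-D fit

# Evaluate the transferred model on fresh target-task inputs.
X_test = rng.normal(size=(50, 3))
pred = head * (X_test @ w_feature)
mse = np.mean((pred - 3.0 * (X_test @ true_feature)) ** 2)
```

The ten target examples alone would be too few to learn all the weights reliably; reusing the pretrained feature reduces the target task to fitting a single parameter. The fragility noted in the text arises when the target data deviates from what the frozen features were trained on.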