Antonio Torralba

Understanding the visual knowledge of language models | MIT News

Mathew12 months ago09 mins

You’ve likely heard that a picture is worth a thousand words, but can a large language model (LLM) get the picture if it’s never seen images before? As it turns out, language models that are trained purely on text have a solid understanding of the visual world. They can write image-rendering code to generate…

Daily Search Forum Recap: October 8, 2024

Meta introduces generative AI video advertising tools

How to use Microsoft Clarity for deeper website analytics

8 ways to keep human creativity front and center

What you need to know in 2025

How to use images and videos in 2025

Antonio Torralba

Understanding the visual knowledge of language models | MIT News