What is Multi-modal AI?
AI systems that can understand and work with multiple types of input, such as text, images, audio, and video.
Why It Matters
Multi-modal AI can analyse images, listen to audio, and read text together, so it can handle tasks that combine several kinds of information rather than working from text alone.
Real-World Example
Uploading a photo of a room and asking AI to suggest furniture arrangements based on the space.
“Understanding terms like Multi-modal AI matters because it helps you have better conversations with developers and make smarter decisions about your software. You do not need to be technical. You just need to know enough to ask the right questions.”
Learn More at buildDay Melbourne
Want to understand these concepts hands-on? Join our one-day workshop and build a real web application from scratch.
Related Terms
Large Language Model (LLM)
An AI system trained on massive amounts of text that can understand and generate human language.
Computer Vision
The field of AI that enables computers to understand and interpret images and video.
Image Generation
Using AI to create new images from text descriptions or other inputs.
Agentic AI
AI systems that can autonomously plan, make decisions, and take actions to accomplish goals.
Transformer
A type of AI architecture that processes text by paying attention to relationships between all words at once, rather...
Attention Mechanism
A technique that lets AI models focus on the most relevant parts of the input when generating output.