Google expands its Gemini model family with the release of Gemini Embedding 2. This second-generation, text-only model succeeds gemini-embedding-001 and is specifically designed to address the high-dimensional …
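A minimal sketch, not taken from the article, of what a text embedding request looks like with the google-genai SDK; the model id gemini-embedding-001 is the predecessor named in the teaser, and the output_dimensionality value is an illustrative assumption for trimming high-dimensional vectors.

from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# Embed a single query and truncate the vector to an illustrative 768 dimensions
result = client.models.embed_content(
    model="gemini-embedding-001",  # predecessor model named in the teaser, not the new release
    contents="How do embedding models handle long documents?",
    config=types.EmbedContentConfig(output_dimensionality=768),
)
print(len(result.embeddings[0].values))  # 768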
Multimodal
-
Generative AI
Microsoft Releases Phi-4-Reasoning-Vision-15B: A Compact Multimodal Model for Mathematics, Science, and GUI Understanding
Microsoft has released Phi-4-Reasoning-Vision-15B, a 15-billion-parameter open-weight multimodal reasoning model designed for image and text tasks that require both perception and selective reasoning. It is a compact model designed …
-
AI News
YuanLab AI Releases Yuan 3.0 Ultra: A Flagship Multimodal MoE Foundation Model Built for Strong Intelligence and Unmatched Efficiency
How can a trillion-parameter large language model achieve state-of-the-art enterprise performance while reducing its total parameter count by 33.3% and increasing pre-training efficiency by 49%? YuanLab AI releases Yuan 3.0 …
-
AI News
Google AI introduces Natively Adaptive Interface (NAI): an agentic multimodal accessibility framework built on Gemini for adaptive UI design
Google Research is proposing a new way of creating accessible software with Natively Adaptive Interfaces (NAI), an agentic framework where a multimodal AI agent becomes the primary user interface and …
-
AI News
How to Design Complex Deep Learning Tensor Pipelines Using einops with Vision, Attention, and Multimodal Examples
section("6) pack unpack") B, Cemb = 2, 128 class_token = torch.randn(B, 1, Cemb, device=device) image_tokens = torch.randn(B, 196, Cemb, device=device) text_tokens = torch.randn(B, 32, Cemb, device=device) show_shape("class_token", class_token) show_shape("image_tokens", image_tokens) …
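For readers skimming the teaser, here is a short self-contained sketch of the einops pack/unpack step quoted above; the tutorial's show_shape helper and device variable are replaced with plain prints, and the tensor sizes simply mirror the excerpt.

import torch
from einops import pack, unpack

B, Cemb = 2, 128
class_token = torch.randn(B, 1, Cemb)     # [CLS]-style token
image_tokens = torch.randn(B, 196, Cemb)  # 14x14 image patch tokens
text_tokens = torch.randn(B, 32, Cemb)    # text tokens

# pack fuses the three sequences along the '*' axis and records each part's length
tokens, packed_shapes = pack([class_token, image_tokens, text_tokens], "b * c")
print(tokens.shape)  # torch.Size([2, 229, 128])

# unpack reverses the operation using the recorded shapes
cls_out, img_out, txt_out = unpack(tokens, packed_shapes, "b * c")
print(cls_out.shape, img_out.shape, txt_out.shape)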
-
AI News
Stanford Researchers Build SleepFM Clinical: A Multimodal Sleep Foundation AI Model for 130+ Disease Prediction
A team of Stanford Medicine researchers has introduced SleepFM Clinical, a multimodal sleep foundation model that learns from clinical polysomnography and predicts long-term disease risk from a single night of …
-
Generative AI
Meta AI Open-Sources Perception Encoder Audiovisual (PE-AV): Audiovisual Encoder Powering SAM Audio and Large-Scale Multimodal Retrieval
Meta researchers introduce Perception Encoder Audiovisual (PE-AV), a new family of encoders for joint audio and video understanding. The model learns aligned audio, video, and text representations in a single …
-
AI News
Google Introduces T5Gemma 2: Encoder-Decoder Model with Multimodal Input via SigLIP and 128K Context
Google has published T5Gemma 2, an open family of encoder-decoder transformer checkpoints that adapt Gemma 3 pre-trained weights into an encoder-decoder layout and then continue pre-training with the UL2 objective. It is released as pre-trained checkpoints only. The …
-
Generative AI
Meta AI Releases SAM Audio: A State-of-the-Art Unified Model That Uses Prompts and Multimodal Signals for Audio Separation
Meta has released SAM Audio, a prompt-driven audio separation model that targets a common editing constraint: separating a sound from a real-world mix without creating a custom model per sound …