
Amazon Surpasses OpenAI's ChatGPT with Multimodal-CoT Generative AI

Recent advancements in AI technology have allowed large language models (LLMs) to perform well in complex reasoning tasks through chain-of-thought (CoT) prompting.

Junja Choudhary Editorial Desk

February 13, 2023 • 2:24 PM


OpenAI's ChatGPT has been making waves in the AI community for the past two months, prompting discussion of its potential impact on fields such as business and education. Tech giants Google and Baidu have since joined the chatbot scene, showcasing their own generative AI technologies. Now Amazon has entered the race with a new language model that outperforms OpenAI's GPT-3.5 on the ScienceQA benchmark by 16 percentage points in accuracy and also exceeds human performance on that benchmark.

Recent advances in AI have allowed large language models (LLMs) to perform well on complex reasoning tasks through chain-of-thought (CoT) prompting, in which the model produces intermediate reasoning steps before giving an answer. However, existing CoT research has focused mostly on the language modality. The Multimodal-CoT paradigm instead elicits CoT reasoning over multiple input modalities, such as language and vision.

Most existing approaches to multimodal CoT first convert the inputs into a single modality, for example by replacing an image with a text caption, and then prompt an LLM to perform CoT. This conversion can lose information and lead to hallucinated reasoning. To overcome these limitations, Amazon researchers developed Multimodal-CoT, which incorporates visual features in a decoupled, two-stage training framework. The framework divides reasoning into two parts: generating a rationale and inferring the answer. Because vision is used in both stages, the model can produce more convincing rationales and draw more accurate conclusions.
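The two-stage split described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not Amazon's implementation: the `Example` class, the `Model` interface, the `multimodal_cot` function, the "Rationale:" text format, and the two toy models are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a model takes text plus image features and returns text.
Model = Callable[[str, Sequence[float]], str]


@dataclass
class Example:
    question: str                    # language input
    image_features: Sequence[float]  # vision input, e.g. from an image encoder


def multimodal_cot(example: Example, rationale_model: Model, answer_model: Model) -> Tuple[str, str]:
    """Run rationale generation, then answer inference, using vision in both stages."""
    # Stage 1: generate a rationale from the language and vision inputs.
    rationale = rationale_model(example.question, example.image_features)
    # Stage 2: infer the answer from the question plus rationale, again with vision.
    # Assumption: the rationale is joined to the question as plain text.
    augmented_text = f"{example.question}\nRationale: {rationale}"
    answer = answer_model(augmented_text, example.image_features)
    return rationale, answer


# Stand-in models so the sketch runs without any trained weights.
def toy_rationale_model(text: str, image: Sequence[float]) -> str:
    return f"the image has {len(image)} features"


def toy_answer_model(text: str, image: Sequence[float]) -> str:
    return "A" if "Rationale:" in text else "unknown"


example = Example(question="Which option is correct?", image_features=[0.1, 0.2, 0.3])
rationale, answer = multimodal_cot(example, toy_rationale_model, toy_answer_model)
```

In a real system the two toy functions would be replaced by trained models; the point of the sketch is only that the image features are passed to both stages rather than being converted to text first.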

The rationale generation and answer inference stages of Multimodal-CoT share the same model architecture but differ in their inputs and outputs. In the rationale generation stage, the model receives both the language and the vision inputs and produces a rationale. In the answer inference stage, that rationale is appended to the original language input, which is fed to the model together with the vision input to produce the answer. In each stage, a Transformer encoder produces the textual representation, which is combined with the visual representation and then passed to the Transformer decoder.
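The article does not say how the textual and visual representations are combined before decoding. One plausible way, shown here purely as an assumption, is to let each text token attend over the image features and then blend the two with a learned gate. The sketch below uses plain Python lists instead of a deep learning library; `attend`, `gated_fuse`, and the weight matrices `w_text` and `w_vision` are hypothetical names, and the text and image vectors are assumed to share the same dimension.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(text: Matrix, vision: Matrix) -> Matrix:
    """Single-head attention: text tokens are queries, image features are keys and values."""
    dim = len(text[0])
    scores = matmul(text, transpose(vision))  # one score per (token, image feature) pair
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, vision)            # same shape as text


def gated_fuse(text: Matrix, vision: Matrix, w_text: Matrix, w_vision: Matrix) -> Matrix:
    """Blend text states with attended vision states using a sigmoid gate."""
    attended = attend(text, vision)
    gate_logits = [
        [p + q for p, q in zip(r1, r2)]
        for r1, r2 in zip(matmul(text, w_text), matmul(attended, w_vision))
    ]
    fused = []
    for t_row, a_row, g_row in zip(text, attended, gate_logits):
        gates = [1.0 / (1.0 + math.exp(-g)) for g in g_row]
        fused.append([(1.0 - g) * t + g * a for t, a, g in zip(t_row, a_row, gates)])
    return fused  # in this sketch, this is what would be handed to the decoder


# Toy inputs: 2 text tokens and 3 image features, all of dimension 2.
text_states = [[1.0, 0.0], [0.0, 1.0]]
image_states = [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]  # zero weights make every gate exactly 0.5
fused_states = gated_fuse(text_states, image_states, zero_weights, zero_weights)
```

With zero gate weights every gate equals 0.5, so each fused value is the plain average of the text state and the attended image state. In a trained model the gate weights would be learned, letting the model decide per dimension how much visual information to mix in.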


Junja Choudhary Official | Verified Expert • 13 Apr, 2026 • Editorial Desk

Junja Choudhary serves as the Editor-in-Chief of Sangri Today. A dynamic news personality and rigorous fact-checker, he brings more than 7 years of professional experience in print and digital media. His editorial leadership is defined by a strong commitment to journalistic ethics, truth-seeking, and delivering well-researched, balanced reporting on critical issues.
