The field of Artificial Intelligence (AI) is evolving at breakneck speed, reshaping the industry. So, how has the month of April fared for the world of AI? This article breaks down the latest AI breakthroughs for April 2025. Let’s take a deeper look.
OpenAI’s Latest Models & Feature Updates
OpenAI’s New GPT-4.1 AI Models
OpenAI has rolled out not one but three new AI models. The company called this the GPT-4.1 family of AI models, with three variants:
1. GPT-4.1
2. GPT-4.1 mini
3. GPT-4.1 nano
This new trio of Large Language Models (LLMs) can only be accessed via an Application Programming Interface (API) and will not be available through ChatGPT.
The San Francisco-based company claims that these models outperform previous versions (GPT-4o and GPT-4o mini) on various metrics, faring remarkably well in following instructions and coding. But that’s just the tip of the iceberg.
OpenAI says that these API-exclusive models offer multiple upgrades over the GPT-4o models, including higher performance, a larger context window of up to one million tokens, and competitive pricing. The training data has also been refreshed, with the knowledge cutoff now extended to June 2024.
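For developers wondering what a call looks like in practice, here is a minimal sketch using OpenAI’s official Python SDK (the model identifiers follow OpenAI’s announced naming, and an OPENAI_API_KEY environment variable is assumed):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Call the flagship GPT-4.1 model; swap in "gpt-4.1-mini" or
# "gpt-4.1-nano" for the faster, cheaper variants.
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Summarize the GPT-4.1 launch in one sentence."}],
)
print(response.choices[0].message.content)
```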
OpenAI’s Mini Models
Want an AI model that can “think with images”? OpenAI’s o3 and o4-mini models deliver exactly that, representing what experts call “a step change in AI capabilities.” These two models are designed to use tools independently and can reason with images: if a user sketches a rough, poor-quality diagram, the models can comprehend and analyze it for further use.
Deemed the most intelligent and capable of the reasoning models to date in OpenAI’s “o-series,” o3 and o4-mini can:
- Integrate images directly into their reasoning process
- Search the web
- Run code
- Analyze files
- Generate images within a single task flow
Excited users can access these models on the ChatGPT Plus, Pro, and Team plans.
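As a rough illustration of the image-reasoning workflow, the sketch below sends a photo of a whiteboard diagram to o4-mini through OpenAI’s Python SDK; the image URL is a placeholder, and the message format follows OpenAI’s documented vision-input shape:

```python
from openai import OpenAI

client = OpenAI()

# Ask o4-mini to reason about a rough, hand-drawn diagram.
response = client.chat.completions.create(
    model="o4-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What system does this whiteboard sketch describe?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/whiteboard.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```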
OpenAI’s New AI Coding Agent
Hot on the heels of the o3 and o4-mini models came Codex CLI, OpenAI’s new open-source coding agent. This AI coding agent can autonomously carry out programming tasks—reading, modifying, and running code—locally in a user’s terminal.
Codex CLI is powered by the o4-mini model by default and connects OpenAI’s models to the user’s Command-Line Interface (CLI). However, users can select any other OpenAI model available through the Responses API.
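As a quick sketch of the workflow (the npm package name and flags are taken from OpenAI’s Codex CLI repository; the prompts are illustrative):

```bash
# Install the open-source agent from npm
npm install -g @openai/codex

# Run it inside a project directory; o4-mini is the default model
codex "explain this codebase to me"

# Swap in another OpenAI model exposed via the Responses API
codex --model o3 "refactor utils.py to use pathlib"
```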
xAI’s Grok Feature Updates
Grok Gains Vision, Multilingual Audio & Real-Time Search Features
In recent artificial intelligence news, xAI upgraded its chatbot’s iOS app with a new computer vision feature—Grok Vision (currently an iOS-exclusive feature)—which is on par with Gemini Live with Video and ChatGPT’s Advanced Voice with Video. Grok can now access the device’s camera and process the feed in real time. All you need to do is point your device at any object, and Grok can answer your questions about it. How cool is that?
xAI’s April achievements don’t end here. Grok now offers real-time web search and multilingual audio support for conversations in five additional languages: Hindi, Spanish, French, Turkish, and Japanese. This means that, in addition to accepting and generating multilingual text, Grok can now understand verbal prompts in these languages and respond in the same language, making interactions smoother than ever.
Both enhancements are part of the chatbot’s Voice Mode and improve the app experience on iOS and Android smartphones. iPhone users can tap into these features for free; Android users, however, must pay to access them.
Grok’s New Memory Feature
Elon Musk’s xAI is leaving nothing to chance when it comes to refining its chatbot, Grok, to be on par with its contemporaries like Google’s Gemini and ChatGPT.
In April, the company enhanced Grok with a “memory” feature that enables the bot to remember details from past conversations with users. So, the next time you ask Grok for recommendations, you will receive personalized responses if you’ve chatted with it long enough to give the bot time to “learn” your preferences.
“Memories are transparent: You can see exactly what Grok knows and choose what to forget,” reads a post from the official Grok account on X. Grok’s new memory feature is unavailable for users in the UK and the EU. For users elsewhere, this feature is available in beta on Grok.com and the Grok iOS and Android apps.
xAI’s Grok Studio
The next big thing from xAI is Grok Studio, a new tool that lets users:
- Generate code, reports, and documents
- Quickly create browser games
Grok Studio opens the content in a separate window, allowing Grok and the user to collaborate on it. When users ask Grok to generate a piece of code, it shows up in a “preview” tab that renders HTML snippets and runs C++, TypeScript, JavaScript, Python, and Bash scripts. Additionally, you can attach files like spreadsheets, slides, and documents in a few clicks with the help of its Google Drive integration. Grok Studio is free for everyone.
Anthropic’s Claude Feature Updates
Claude’s New Research Capability & Google Workspace Integration
Anthropic’s AI chatbot, Claude, could be the perfect assistant for searching and referencing your emails in Gmail, documents in Google Docs, and scheduled events in Google Calendar. All this is possible because, in the latest AI updates, Claude now integrates with Google Workspace. Anthropic also introduced a new Research capability that lets Claude run multi-step searches and return answers with citations. The Workspace integration is currently in beta for subscribers on Anthropic’s Max, Team, Enterprise, and Pro plans.
According to Anthropic, for managing multi-user accounts, there’s a little preliminary work to do: Administrators need to enable this integration on their end before users can connect their Google Workspace and Claude accounts. The goal is to give Claude more personally tailored responses without the hassle of repeatedly uploading files or crafting detailed, lengthy prompts.
Anthropic’s Voice Mode Feature for Claude
Anthropic is reportedly working on unveiling a much-anticipated voice chat feature for its in-house AI chatbot, Claude. This new capability, known as “voice mode,” is expected to roll out as soon as this month, adding a new dimension to the interaction with the AI. While competitors like OpenAI and Google have already incorporated a dedicated voice mode in their chatbots, Anthropic has so far only been quietly honing its text interface.
According to a Bloomberg report, the company plans to offer three distinct voices for users to choose from:
1. Airy
2. Buttery
3. Mellow
Buttery boasts a charming British accent that might just win over fans! However, it remains to be seen whether this voice feature will be accessible to all users or only to paid subscribers.
Google’s AI Model Updates
Google’s Veo 2 Video Generation AI Model
In an exciting leap forward, Google unveiled the Veo 2 AI model, now integrated into its Gemini chatbot. This powerful new tool is launching globally for Gemini Advanced subscribers in all the languages Gemini supports (users on the free tier will not have access to the model). With this AI model, eligible users can create stunning eight-second videos from natural-language text prompts.
Veo 2 can be selected from the model picker menu in both the web client and mobile apps of Gemini. The videos generated can be downloaded in MP4 format and are crisp at 720p resolution and come in the popular 16:9 aspect ratio, making them perfect for sharing on social media. However, there’s a limit on the number of videos you can create each month, and users will be notified as they approach the limit.
Microsoft’s Copilot Feature Updates
Memory, Podcasts & Agentic Features on Copilot
On the other side of the tech spectrum, Microsoft has made transformative updates to its AI chatbot, Copilot. The tech titan has packed this chatbot with several new features that promise to deliver a more personalized and engaging experience. While some of the new features were previously available only on the web client, users can now leverage these features from mobile devices and Windows desktop apps.
Among the most intriguing AI updates in April is Copilot’s memory capability, which allows it to remember certain details about the user and tailor its responses more effectively. Plus, Microsoft is introducing an agentic feature that enables Copilot to complete tasks directly on the web.
And for aspiring content creators, the new podcast creation feature, called “Podcasts,” will let users generate personalized audio sessions based on their interests.
Meta’s AI Model Updates
Meta’s New Multi-Modal AI Models
Meta’s Llama 4 is a fresh collection of AI models in its Llama family—designed to redefine what we can expect from AI. This latest lineup introduces three striking new models:
1. Llama 4 Scout
2. Llama 4 Maverick
3. Llama 4 Behemoth
According to Meta, these models were trained on large volumes of unlabeled text, image, and video data to impart a broad visual understanding.
Scout and Maverick are Meta’s first open models built with a Mixture of Experts (MoE) architecture, available on Llama.com and from Meta’s partners, including the AI dev platform Hugging Face, while Behemoth is still in training.
Meta is also integrating Llama 4 into its suite of applications, including WhatsApp, Messenger, and Instagram, enhancing the experience for users across 40 countries with the upgraded capabilities of Meta AI.
Noteworthy AI Updates From April
1. Adobe’s New Firefly & Redesigned Firefly Web App
Adobe is revolutionizing the creative world with the latest iteration of its Firefly family of image generation AI models, including a new model dedicated to generating stunning vectors, along with a sleek, redesigned web app that brings together an impressive array of AI models. Through the app, users will have access to Adobe’s proprietary AI models, including Firefly Image Model 4 and the Firefly Video Model, as well as third-party AI models like Gemini and ChatGPT, created by its partners Google and OpenAI, respectively.
2. Character.AI’s AvatarFX Model
In a leap towards an animated future, Character.AI, the innovative California-based AI platform, has just unveiled its first video generation model—AvatarFX, an image-to-video model capable of generating vibrant 2D and 3D animated characters. Additionally, these characters will be complete with lifelike speech, thanks to the company’s native text-to-speech (TTS) models. AvatarFX will initially be a privilege for paid subscribers.
3. Google’s Music-Generating AI Model
Google is making waves in the music world with updates to several of its first-party media-generating AI models on the Vertex AI cloud platform. Storming onto the scene is Lyria, Google’s text-to-music model, designed to provide a fresh alternative to traditional, royalty-free music libraries. With Lyria, users can compose songs in diverse styles and genres (now available in preview for select customers).
Along with Lyria, Google has introduced an intriguing voice-cloning feature powered by Chirp 3, Google’s audio understanding model, for “allow-listed” users.
4. Google’s Gemini Code Assist With “Agentic” Abilities
On the coding front, Google’s Gemini Code Assist is stepping up its game with new “agentic” capabilities. This AI coding assistant can now deploy AI agents that take multiple steps to accomplish complex programming tasks. These agents can create applications from product specifications in Google Docs, for example, or perform code transformations from one programming language to another. This powerful tool is now available in Android Studio, broadening its usability for developers.
5. Amazon’s New AI Voice Model, Nova Sonic
With Nova Sonic’s debut, Amazon has taken a bold leap into the world of AI. This generative AI model is on par with frontier voice models rolled out by giants like Google and OpenAI in terms of speed, conversational quality, and speech recognition.
Nova Sonic is capable of natively processing voices and generating speech that sounds remarkably natural. Developers can harness this cutting-edge technology through Bedrock—Amazon’s developer platform—for building enterprise AI applications with the help of a new bi-directional streaming API.
6. Amazon’s Nova Reel 1.1 AI Model
Amazon released a new AI video creation model—Nova Reel 1.1—capable of generating video content up to two minutes long from a single prompt. This model replaces the Nova Reel model released last year. While the latest video model brings quality and latency enhancements, the most notable upgrade is the total duration of video it can generate in a single run. The model is currently accessible to developers and general users through the Amazon Bedrock platform.
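For a sense of the developer workflow, here is a hedged sketch using boto3. The model identifier, input schema, and asynchronous start_async_invoke call follow Amazon’s Nova Reel documentation; the S3 bucket is a placeholder you must own, and this example requests a single short shot rather than a full two-minute run:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Nova Reel renders video asynchronously and writes the result to S3.
job = client.start_async_invoke(
    modelId="amazon.nova-reel-v1:1",
    modelInput={
        "taskType": "TEXT_VIDEO",
        "textToVideoParams": {"text": "A drone shot of waves crashing on a rocky coast"},
        "videoGenerationConfig": {"durationSeconds": 6, "fps": 24, "dimension": "1280x720"},
    },
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-video-bucket/"}},
)

# Poll get_async_invoke() with this ARN to track the job's status.
print(job["invocationArn"])
```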
7. New ChatGPT Feature
On the other side of the AI landscape, OpenAI is enhancing the ChatGPT experience with an innovative feature that makes accessing all images generated by ChatGPT much easier. The new image library consolidates all ChatGPT-generated images in one place. The update is available on the free, Plus, and Pro ChatGPT plans, on the web as well as in the ChatGPT mobile app, and is aimed at users who generate images extensively.
8. Nari Labs’ Dia
Nari Labs is a newbie to the world of AI. This two-person start-up has launched Dia, a 1.6 billion-parameter TTS model. Dia is capable of delivering naturalistic dialogues directly from text prompts; individual users can try generating speech from it on a Hugging Face Space. One of Dia’s creators claims that Dia’s performance surpasses that of competing proprietary offerings from the likes of ElevenLabs and Google’s hit NotebookLM AI podcast generation product.
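For those who prefer to run Dia locally rather than on the hosted Space, here is a minimal sketch based on the usage shown in Nari Labs’ GitHub README (the package and method names come from that repo and may change):

```python
# pip install git+https://github.com/nari-labs/dia.git
import soundfile as sf
from dia.model import Dia

# Load the 1.6B-parameter checkpoint from Hugging Face
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# [S1]/[S2] tags mark alternating speakers in the dialogue script
text = "[S1] Dia generates dialogue straight from a transcript. [S2] It even handles nonverbal cues. (laughs)"
audio = model.generate(text)

# Write the waveform to disk at 44.1 kHz, per the model card
sf.write("dialogue.wav", audio, 44100)
```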
9. Microsoft’s Hyper-Efficient AI Model
Microsoft is once again proving to be a formidable contender in the AI race with its latest introduction: BitNet b1.58 2B4T, the largest-scale 1-bit AI model (also called a “bitnet”) to date. This model is supposedly the first of its kind with a whopping 2 billion parameters, “parameters” being largely synonymous with “weights.” The bitnet is released under an MIT license and can run on CPUs, including Apple’s M2.
The model outperforms leading traditional models of a similar size, according to Microsoft’s researchers. This claim is no surprise, since the bitnet was trained on a dataset of 4 trillion tokens, which is equal to about 33 million books!
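The checkpoint is published on Hugging Face as microsoft/bitnet-b1.58-2B-4T. Below is a sketch of loading it with transformers, with the caveat (per the model card) that a transformers build with BitNet support is required, and that the real 1-bit efficiency gains come from Microsoft’s dedicated bitnet.cpp runtime rather than the stock library:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Plain generation; the weights are ternary (-1, 0, 1) under the hood
inputs = tokenizer("1-bit models matter because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```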
10. Baidu’s ERNIE X1 Turbo & 4.5 Turbo
Already a top contender in the AI race, China’s Baidu unveiled ERNIE X1 Turbo and ERNIE 4.5 Turbo. As their names denote, these models are faster and more affordable than their predecessors. While ERNIE 4.5 Turbo is designed to supercharge multimodal understanding, the reasoning-focused ERNIE X1 Turbo delivers impressive performance at a fraction of the cost (only $0.14 per 1 million input tokens and $0.55 per 1 million output tokens), roughly a quarter of DeepSeek R1’s price, according to Baidu.
11. Google Releases Cost-Efficient & Low-Latency Gemini 2.5 Flash AI Model
Joining Google’s Gemini 2.5 family is Gemini 2.5 Flash—a cost-efficient, low-latency model designed for tasks requiring real-time inference, conversations at scale, and other general-purpose workloads. Interested users and developers can soon access Gemini 2.5 Flash on both Google AI Studio and Vertex AI to build their applications and agents.
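Once it is live, calling the model from Python should look like any other Gemini request through Google’s google-genai SDK. A sketch, assuming the preview model identifier Google used at announcement:

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio

# A low-latency model aimed at real-time, high-volume workloads
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="Give me a one-line status update suitable for a chat app.",
)
print(response.text)
```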
12. OpenAI Makes Its Upgraded Image Generator Available via API
OpenAI has unveiled its cutting-edge image generation feature in ChatGPT, now accessible through its API. This advancement allows developers to seamlessly integrate stunning visual creations into their applications and services. The heart of this innovation is the “gpt-image-1” model, a natively multimodal model that can generate images across a wide array of styles. Imagine generating artwork that adheres to specific guidelines, leverages vast world knowledge, and even renders text! Developers can get creative, producing multiple images simultaneously while fine-tuning both the quality and speed of their creations.
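Here is a minimal sketch of calling the model through OpenAI’s Python SDK; note that gpt-image-1 returns base64-encoded image data rather than a URL, and the prompt and size below are illustrative:

```python
import base64
from openai import OpenAI

client = OpenAI()

# Generate one image; size and quality knobs trade fidelity for speed and cost
result = client.images.generate(
    model="gpt-image-1",
    prompt="A watercolor map of San Francisco with labeled neighborhoods",
    size="1024x1024",
)

# Decode the base64 payload and save it as a PNG
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("map.png", "wb") as f:
    f.write(image_bytes)
```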
13. xAI Releases Grok 3 in API for Developers
xAI has introduced a new API that allows developers to create applications and software using the Grok 3 AI model. xAI detailed the new API on its documentation page. Developers can access four LLMs in total through it:
- Grok 3
- Grok 3 mini (both available in beta)
- Two faster variants of the same models
It is important to note that these AI models work with text only and cannot create images.
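Because xAI’s API is OpenAI-compatible, existing SDKs work by simply pointing them at xAI’s endpoint. Here is a sketch using the OpenAI Python SDK; the exact model identifiers (which carried beta suffixes at launch) should be checked against xAI’s documentation:

```python
from openai import OpenAI

# xAI exposes an OpenAI-compatible endpoint
client = OpenAI(
    api_key="YOUR_XAI_API_KEY",
    base_url="https://api.x.ai/v1",
)

# Text-only: these models accept and produce text, not images
response = client.chat.completions.create(
    model="grok-3",  # or "grok-3-mini" and the "-fast" variants
    messages=[{"role": "user", "content": "What can the Grok 3 API do?"}],
)
print(response.choices[0].message.content)
```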
Upcoming AI Developments to Keep an Eye on
1. OpenAI is Gearing Up for an Open-Source Reasoning-Focused AI Model
In the upcoming months, we can expect OpenAI to shake things up with the release of its open-source, reasoning-focused AI model—the first of its kind from the company. It is important to recall that this would be OpenAI’s first open model release since GPT-2, whose full version was released in November 2019.
The company is busy collecting feedback from the developer community to refine its model based on their requirements.
2. Google is Expanding Gemini Live With Camera & Screen Share Features
Google is set to take real-time interaction to the next level with the rollout of Gemini Live’s camera and screen share features to Android users. However, there’s a condition: these capabilities will be available only to users with a Gemini Advanced subscription.
So, here’s what the new version of Gemini Live will be capable of:
- Gemini Live with camera can access the device’s camera feed and process the visual information in real time to have a conversation about what the user sees.
- Gemini Live with screen share enables users to let the AI chatbot access the user’s screen and offer assistance across various menus, interfaces, and apps.
3. Gemini 2.5 Pro Experimental Now Powers Agentic Deep Research Feature
Google’s agentic Deep Research tool is in for a significant upgrade. The company recently announced that the tool will now be powered by the Gemini 2.5 Pro Experimental model, with the upgrade expanding to other tools over time. The feature is currently exclusive to paid subscribers of the AI platform, while free users can still enjoy the Gemini 2.0 Flash Thinking (experimental) model. Google promises that this LLM will bring significant enhancements to the tool’s analytical reasoning skills, paving the way for smarter interactions.
4. OpenAI is Reportedly Working on a Social Media Platform
In a thrilling twist, OpenAI is reportedly building its own social media platform and gearing up to integrate AI capabilities into the app. While the specifics of the AI integration remain under wraps, OpenAI aims to shake things up in the social media space, going head-to-head with titans like X and Meta. This news comes on the heels of OpenAI’s release of the innovative GPT-4.1 family of AI models, creating anticipation for what’s to come.
5. OpenAI Rolls Out a “Lightweight” Version of ChatGPT Deep Research Tool
Users who rely on ChatGPT for in-depth web research will be delighted to use the lightweight version, which can scour the web to compile accurate and extensive research reports. OpenAI plans to make this feature available to ChatGPT Plus, Team, and Pro users. The new lightweight deep research, which will also come to free ChatGPT users, is powered by the o4-mini model. “Responses will typically be shorter while maintaining the depth and quality you’ve come to expect,” OpenAI said in a series of posts on X.
Looking Back, Moving Forward
With AI set to become the cornerstone of everything we do, staying informed about the latest developments is crucial. At Maayu.ai, we help businesses leverage these advancements for better outcomes and to stay competitive in a dynamic, AI-powered world. Explore the future of AI with Maayu.