TITLE: Microsoft Ships Three Enterprise AI Models Through Foundry
DATE: 2026-04-04
COMPANY: Microsoft
TOPIC: Enterprise AI

SUMMARY: Microsoft launched MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 on 3 April 2026 through Microsoft Foundry. The three models cover speech-to-text, voice generation, and image creation at commercially competitive pricing, and are available immediately to enterprise developers. All three already power Microsoft's own products including Copilot, Bing, and Azure Speech.

WHAT CHANGED: On 3 April 2026, Microsoft announced three new foundational models under its MAI (Microsoft AI) series, available immediately through Microsoft Foundry.

MAI-Transcribe-1 is Microsoft's first-party speech recognition model, supporting 25 languages with a 3.8 percent Word Error Rate, which Microsoft reports as the lowest among its competitive set. The model delivers batch transcription speeds 2.5 times faster than Microsoft's existing Azure Fast offering at approximately 50 percent lower GPU cost. Pricing is set at $0.36 per audio hour. The model is engineered for real-world audio conditions including varied accents, background noise, and long-form recordings.

MAI-Voice-1 is a speech generation model capable of producing 60 seconds of expressive audio in under one second on a single GPU. The model preserves speaker identity across long-form content and supports custom voice creation from just a few seconds of recorded audio. It is already powering the voice experiences in Copilot's Audio Expressions and podcast features. Pricing is $22 per one million characters.

MAI-Image-2 is Microsoft's highest-capability text-to-image model, debuting at number 3 on the Arena.ai leaderboard for image model families. The model excels at natural lighting, accurate skin tones, and clear in-image text rendering. Pricing starts at $5 per one million text input tokens and $33 per one million image output tokens.

All three models are immediately available through Microsoft Foundry.
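
The list prices quoted above reduce to simple arithmetic. The following is a minimal sketch in Python, assuming those prices apply uniformly with no volume tiers, minimum charges, or regional differences (the briefing does not say either way). The function names and the sample workload are hypothetical and chosen only for illustration. The briefing does not state how many tokens an image consumes, so the image function takes token counts directly.

```python
from decimal import Decimal

# List prices quoted in this briefing, in USD. Real invoices may differ.
TRANSCRIBE_PER_AUDIO_HOUR = Decimal("0.36")
VOICE_PER_MILLION_CHARACTERS = Decimal("22")
IMAGE_PER_MILLION_INPUT_TOKENS = Decimal("5")
IMAGE_PER_MILLION_OUTPUT_TOKENS = Decimal("33")
MILLION = Decimal(1_000_000)


def transcription_cost(audio_hours):
    """MAI-Transcribe-1 cost for the given number of audio hours."""
    return Decimal(str(audio_hours)) * TRANSCRIBE_PER_AUDIO_HOUR


def voice_cost(characters):
    """MAI-Voice-1 cost for the given number of characters (an integer)."""
    return Decimal(characters) * VOICE_PER_MILLION_CHARACTERS / MILLION


def image_cost(text_input_tokens, image_output_tokens):
    """MAI-Image-2 cost for the given integer token counts."""
    input_cost = Decimal(text_input_tokens) * IMAGE_PER_MILLION_INPUT_TOKENS
    output_cost = Decimal(image_output_tokens) * IMAGE_PER_MILLION_OUTPUT_TOKENS
    return (input_cost + output_cost) / MILLION


if __name__ == "__main__":
    # Hypothetical monthly workload, not taken from the briefing.
    print(f"Transcription, 500 audio hours: ${transcription_cost(500):.2f}")
    print(f"Voice, 2,000,000 characters: ${voice_cost(2_000_000):.2f}")
    print(f"Image, 1M input + 1M output tokens: ${image_cost(1_000_000, 1_000_000):.2f}")
```

Decimal is used instead of floating point so that currency amounts stay exact. To compare against an existing transcription vendor, substitute that vendor's hourly rate for the constant and run the same workload through both.
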
The MAI Playground, which offers a no-code interface for testing all three models, is currently restricted to US-based users.

WHY IT MATTERS:
- Microsoft has moved from reselling OpenAI models to shipping its own foundational capabilities across three core modalities, reducing its dependency on external providers.
- Pricing is set below or at parity with leading alternatives, making enterprise multimodal AI substantially more accessible for mid-sized organisations.
- Consolidating speech, voice, and image AI onto a single governed platform (Foundry) simplifies procurement, security review, and compliance for enterprise buyers.
- MAI-Transcribe-1's $0.36 per hour rate makes automated transcription viable at scale for businesses that previously could not justify the cost.
- Custom voice creation from seconds of audio opens branded audio production to organisations without dedicated voice talent or recording infrastructure.
- The models already run inside Microsoft's own products, giving enterprise customers an immediate proof point for production reliability.

DAVID & GOLIATH ANALYSIS: The story here is not just three new models. It is the platform underneath them. Microsoft is building a unified AI infrastructure layer that competes directly with OpenAI's API, Google Cloud, and AWS Bedrock, and it is doing so from inside an ecosystem that hundreds of millions of businesses already use daily.

For operators running lean organisations, this matters for a specific reason: every new AI capability that lands inside Microsoft Foundry is one fewer vendor relationship to manage. Speech transcription, voice generation, and image creation have historically required three separate tool evaluations, three separate contracts, and three separate security reviews. That friction is a real barrier for small and mid-sized teams. Consolidation onto Foundry removes it.

The immediate play is MAI-Transcribe-1.
At $0.36 per audio hour, automated transcription of meetings, client calls, and internal briefings is now economically trivial. Any organisation spending time on manual note-taking or paying a third-party transcription service should run a direct cost comparison this week. The performance benchmarks are strong. The pricing is competitive. The integration pathway for Microsoft 365 customers is straightforward.

RELEVANT SYSTEMS: AI Growth Engine, Employee Amplification Systems
SOURCE URL: https://davidandgoliath.ai/daily-ai-briefing/microsoft-mai-models-enterprise-multimodal-ai
FEED URL: https://davidandgoliath.ai/daily-ai-briefing/feed

---
Published by David & Goliath | https://davidandgoliath.ai
Daily AI Briefing: one AI development per day, decoded for business operators.
This is a structured companion file optimised for LLM retrieval and citation.