AI Models & Technologies | September 1, 2025 | 2 min read

Microsoft MAI-Voice-1 and MAI-1 Preview

Apertia Team

Apertia.ai

What Microsoft Introduced

Microsoft AI (MAI) has introduced two new in-house artificial intelligence models as part of its mission to create AI that empowers people around the world. The company released MAI-Voice-1, a speech generation model that can produce a full minute of audio in less than a second on a single GPU, and MAI-1 Preview, a foundation model trained end-to-end in-house. After years of relying on OpenAI technology, Microsoft is now building its own AI stack. Several reasons drive this move:

Strategic independence from external AI technology
Control over innovation pace without waiting for partners
Cost optimization by eliminating API fees
Better integration with the Microsoft ecosystem

MAI-Voice-1: Technical Specifications

MAI-Voice-1 is a fast and flexible speech generation model with these key parameters:

Performance Metrics

Speed: Full minute of audio in less than a second
Hardware: Runs on a single GPU
Quality: High-fidelity expressive audio
Flexibility: Support for mono and multi-speaker scenarios
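To put the speed figure in context, speech synthesis throughput is often expressed as a real-time factor (RTF): synthesis time divided by the duration of the audio produced. The short sketch below simply works through that arithmetic for the stated upper bound of one second of compute per 60 seconds of audio. The function names are illustrative and are not part of any Microsoft API.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup_over_real_time(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return how many seconds of audio are produced per second of compute."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


# Upper bound taken from the claim above: 60 s of audio in at most 1 s.
rtf = real_time_factor(1.0, 60.0)
speedup = speedup_over_real_time(1.0, 60.0)
print(f"RTF <= {rtf:.4f}, speedup >= {speedup:.0f}x")
# prints "RTF <= 0.0167, speedup >= 60x"
```

In other words, "a minute of audio in under a second" corresponds to generation at least 60 times faster than playback.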

Practical Production Use

MAI-Voice-1 already powers features in several Microsoft applications:

Copilot Daily: automatic daily summaries with personalized voice
Copilot Podcasts: converting text content to audio format
Copilot Labs: a new platform where users can test expressive speech and storytelling capabilities, including creating interactive stories and personalized meditations

Competitive Advantages

Parameter	MAI-Voice-1	Competing systems
Speed	<1 second per minute of audio	3-5 seconds per minute of audio
Hardware	1 GPU	Multi-GPU cluster
Latency	Ultra-low	Standard
Integration	Native Microsoft	API calls

MAI-1 Preview: Language Model

Architecture and Training Details

MAI-1 Preview is Microsoft's first foundation model trained end-to-end in-house. It was trained on approximately 15,000 NVIDIA H100 GPUs:

Architecture: in-house mixture-of-experts (MoE) model
Design goal: follow instructions and give helpful answers to everyday questions
Optimization: focused on consumer use cases
Training approach: complete end-to-end training without relying on external components

MoE Architecture Benefits

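Efficiency: only a relevant subset of parameters is activated for each input, which substantially reduces compute per token compared with a dense model of the same total size
Scalability: capacity can grow by adding expert networks, for example for new domains, and resources can be allocated flexibly across experts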
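The two benefits above can be illustrated with a toy example. The following is a minimal sketch of top-k softmax gating, a common MoE routing scheme, written in plain Python. It is not MAI-1 Preview's actual design, whose internals have not been published in detail. The class `TinyMoE`, the helper `make_expert`, and all sizes are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: a stand-in for a feed-forward block."""
    return lambda x: [scale * v for v in x]


class TinyMoE:
    """Toy mixture-of-experts layer with top-k routing (illustrative only)."""

    def __init__(self, num_experts, dim, top_k=2, seed=0):
        self._rng = random.Random(seed)
        self.dim = dim
        self.top_k = top_k
        # Router: one randomly initialised weight vector per expert.
        self.router = [self._new_router_row() for _ in range(num_experts)]
        self.experts = [make_expert(i + 1) for i in range(num_experts)]

    def _new_router_row(self):
        return [self._rng.uniform(-1, 1) for _ in range(self.dim)]

    def route(self, x):
        """Return (expert_index, weight) pairs for the top_k experts."""
        logits = [sum(w * v for w, v in zip(row, x)) for row in self.router]
        chosen = sorted(range(len(logits)), key=lambda i: logits[i],
                        reverse=True)[:self.top_k]
        weights = softmax([logits[i] for i in chosen])
        return list(zip(chosen, weights))

    def forward(self, x):
        """Combine the outputs of the selected experts only."""
        out = [0.0] * len(x)
        for idx, weight in self.route(x):
            y = self.experts[idx](x)  # experts outside the top_k never run
            out = [o + weight * v for o, v in zip(out, y)]
        return out

    def add_expert(self, expert):
        """Grow capacity by appending an expert and a matching router row."""
        self.router.append(self._new_router_row())
        self.experts.append(expert)


moe = TinyMoE(num_experts=8, dim=4, top_k=2)
token = [0.5, -1.0, 0.25, 2.0]
print(moe.route(token))    # two (expert_index, weight) pairs
print(moe.forward(token))  # weighted sum of the two selected experts
```

With eight experts and `top_k=2`, each input runs through only two expert functions, which is the efficiency argument in miniature. Calling `add_expert` appends a new expert without touching the existing ones, which mirrors the scalability point. In a real model the new router row and expert would still need to be trained.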