Apertia.ai
Microsoft MAI-Voice-1 and MAI-1 Preview
Artificial Intelligence | September 1, 2025 | 2 min read


Apertia Team

What Microsoft Introduced

Microsoft AI (MAI) has introduced two new in-house artificial intelligence models as part of its mission to create AI that empowers people around the world. MAI-Voice-1 is a speech generation model that can produce a full minute of audio in less than a second on a single GPU. MAI-1 Preview is a foundation model trained end-to-end in-house.

After years of relying on OpenAI technology, Microsoft is now building its own AI stack. Several reasons stand behind this move:
  • Strategic independence from external AI technology
  • Control over innovation pace without waiting for partners
  • Cost optimization by eliminating API fees
  • Better integration with the Microsoft ecosystem

MAI-Voice-1: Technical Specifications

MAI-Voice-1 is a fast and flexible speech generation model with these key parameters:

Performance Metrics

  • Speed: Full minute of audio in less than a second
  • Hardware: Runs on a single GPU
  • Quality: High-fidelity expressive audio
  • Flexibility: Support for single-speaker and multi-speaker scenarios
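
To put the speed figure in perspective, the short Python sketch below converts "a full minute of audio in less than a second" into a speed-up over real time and a rough single-GPU time budget. The 1-second compute time is the upper bound quoted above, so the results are bounds rather than measurements. The function names and the 30-minute episode are hypothetical and chosen only for illustration.

```python
def speedup_over_real_time(audio_seconds: float, compute_seconds: float) -> float:
    """Return seconds of audio produced per second of compute."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def gpu_seconds_needed(total_audio_seconds: float, speedup: float) -> float:
    """Estimate single-GPU compute time for a given amount of audio."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return total_audio_seconds / speedup


# Upper bound from the article: 60 s of audio in (at most) 1 s of compute.
speedup = speedup_over_real_time(audio_seconds=60.0, compute_seconds=1.0)
print(f"Speed-up over real time: at least {speedup:.0f}x")

# Hypothetical workload: one 30-minute podcast episode.
episode_seconds = 30 * 60
budget = gpu_seconds_needed(episode_seconds, speedup)
print(f"GPU time for a 30-minute episode: at most {budget:.0f} s")
```

Under these assumptions, a half-hour episode would need no more than about half a minute of compute on one GPU. Real throughput depends on factors the article does not specify, such as the GPU model and batch size.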

Practical Production Use

MAI-Voice-1 already powers features in several Microsoft applications:
  • Copilot Daily: automatic daily summaries with personalized voice
  • Copilot Podcasts: converting text content to audio format
  • Copilot Labs: a new platform where users can test expressive speech and storytelling capabilities, including creating interactive stories and personalized meditations

Competitive Advantages

Parameter   | MAI-Voice-1                   | Competition
Speed       | <1 second per minute of audio | 3-5 seconds per minute of audio
Hardware    | 1 GPU                         | Multi-GPU cluster
Latency     | Ultra-low                     | Standard
Integration | Native Microsoft              | API calls

MAI-1 Preview: Language Model

Architecture and Training Details

MAI-1 Preview is Microsoft's first foundation model trained end-to-end in-house, using approximately 15,000 NVIDIA H100 GPUs:
  • Architecture: in-house mixture-of-experts (MoE) model
  • Design goal: follow instructions and provide helpful answers to everyday questions
  • Optimization: focused on consumer use cases with emphasis on instruction following
  • Training approach: complete end-to-end training without relying on external components

MoE Architecture Benefits

  • Efficiency: only the relevant subset of parameters is activated for each input, which substantially reduces the compute needed per request
  • Scalability: expert networks can be added for new domains, and resources can be allocated flexibly
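
The efficiency point can be illustrated with a toy example. Microsoft has not published the routing details of MAI-1 Preview, so the sketch below shows generic top-k gating, a common MoE technique, and not the model's actual design. The number of experts, the vector size, and the helper names are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_forward(x, experts, gate_weights, k=2):
    """Run a toy MoE layer: only the k routed experts are evaluated."""
    # The gate is a linear layer that produces one score per expert.
    gate_scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    routing = top_k_route(gate_scores, k)
    output = [0.0] * len(x)
    for idx, weight in routing.items():
        expert_out = experts[idx](x)  # experts that were not chosen are never called
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, routing


# Toy setup (hypothetical): 8 experts, each a simple element-wise scaling.
random.seed(0)
NUM_EXPERTS, DIM = 8, 4
experts = [lambda x, s=s: [s * xi for xi in x] for s in range(1, NUM_EXPERTS + 1)]
gate_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

output, routing = moe_forward([0.5, -1.0, 2.0, 0.1], experts, gate_weights, k=2)
print("experts used:", sorted(routing), "of", NUM_EXPERTS)
print("fraction of expert parameters active:", len(routing) / NUM_EXPERTS)
```

With k=2 and 8 experts, only a quarter of the expert parameters take part in any single forward pass, which is where the compute savings come from. Adding a ninth expert only requires one more row in the gate weights, which loosely mirrors the scalability point.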