What Microsoft Introduced
Microsoft AI (MAI) has introduced two new in-house artificial intelligence models as part of its mission to create AI that empowers all people around the world. The company released MAI-Voice-1, a speech generation model that can produce a full minute of audio in less than a second on a single GPU, and MAI-1 Preview, a foundational model trained end-to-end.
After years of relying on OpenAI technology, Microsoft is finally building its own AI stack. There are several key reasons for this move:
- Strategic independence from external AI technology
- Control over innovation pace without waiting for partners
- Cost optimization by eliminating API fees
- Better integration with the Microsoft ecosystem
MAI-Voice-1: Technical Specifications
MAI-Voice-1 is a fast and flexible speech generation model with these key parameters:
Performance Metrics
- Speed: Full minute of audio in less than a second
- Hardware: Runs on a single GPU
- Quality: High-fidelity expressive audio
- Flexibility: Support for single-speaker and multi-speaker scenarios
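
To put the speed figure in perspective, speech synthesis throughput is often expressed as a real-time factor (RTF): synthesis time divided by the duration of the audio produced. The sketch below is only illustrative arithmetic on the numbers quoted in this article (under 1 second per 60 seconds of audio for MAI-Voice-1, and 3-5 seconds per minute for competing systems); the function name `real_time_factor` is a hypothetical helper, not part of any Microsoft API, and the results are bounds derived from those quoted figures, not measurements.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Quoted claim: a full minute of audio in less than one second.
# Using exactly 1 second gives an upper bound on the RTF.
mai_voice_rtf_bound = real_time_factor(1.0, 60.0)

# Quoted comparison figures: 3-5 seconds per minute of audio.
competitor_rtf_low = real_time_factor(3.0, 60.0)
competitor_rtf_high = real_time_factor(5.0, 60.0)

# Speedup over real time is the inverse of the RTF.
mai_voice_speedup_bound = 1.0 / mai_voice_rtf_bound

print(f"MAI-Voice-1 RTF is below {mai_voice_rtf_bound:.4f}")
print(f"Competitor RTF range: {competitor_rtf_low:.4f} to {competitor_rtf_high:.4f}")
print(f"MAI-Voice-1 is more than {mai_voice_speedup_bound:.0f}x faster than real time")
```

An RTF below 1 means audio is generated faster than it plays back, which is the condition a single GPU must meet to serve several streams at once.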
Practical Production Use
MAI-Voice-1 already powers features in several Microsoft applications:
- Copilot Daily: automatic daily summaries with personalized voice
- Copilot Podcasts: converting text content to audio format
- Copilot Labs: a new platform where users can test expressive speech and storytelling capabilities, including creating interactive stories and personalized meditations
Competitive Advantages
| Parameter | MAI-Voice-1 | Competition |
| --- | --- | --- |
| Speed | <1 second per minute of audio | 3-5 seconds per minute of audio |
| Hardware | 1 GPU | Multi-GPU cluster |
| Latency | Ultra-low | Standard |
| Integration | Native Microsoft | API calls |
MAI-1 Preview: Language Model
Architecture and Training Details
MAI-1 Preview is the first Microsoft foundational model trained end-to-end on approximately 15,000 NVIDIA H100 GPUs:
- Architecture: in-house mixture-of-experts (MoE) model
- Design: built to follow instructions and provide helpful answers to everyday questions
- Optimization: focused on consumer use cases with emphasis on instruction following
- Training approach: complete end-to-end training without relying on external components
MoE Architecture Benefits
- Efficiency: activates only the relevant subset of parameters for each input, so the compute per token is much lower than in a dense model with the same total parameter count
- Scalability: expert networks can be added for new domains, which allows flexible resource allocation
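
The efficiency point can be illustrated with a minimal sketch of top-k expert routing, a common way MoE layers select which experts run. This is an assumption for illustration only: the article does not describe how MAI-1 Preview routes inputs, and the names `route_top_k`, `moe_forward`, and `make_scaling_expert` are hypothetical. The toy experts simply scale their input; real experts are neural network layers, and the gate scores would come from a learned router.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    if not 1 <= k <= len(gate_scores):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x: Sequence[float], experts: Sequence[Expert],
                gate_scores: Sequence[float], k: int = 2) -> List[float]:
    """Run only the selected experts and mix their outputs by gate weight."""
    output = [0.0] * len(x)
    for index, weight in route_top_k(gate_scores, k):
        expert_output = experts[index](x)  # unselected experts are never called
        for j, value in enumerate(expert_output):
            output[j] += weight * value
    return output


def make_scaling_expert(factor: float) -> Expert:
    """Toy stand-in for an expert network: multiplies the input by a factor."""
    def expert(x: Sequence[float]) -> List[float]:
        return [factor * v for v in x]
    return expert


# Four toy experts; the hard-coded gate scores stand in for a learned router.
experts = [make_scaling_expert(f) for f in (1.0, 2.0, 3.0, 4.0)]
gate_scores = [0.1, 2.0, 0.3, 2.0]

routes = route_top_k(gate_scores, k=2)
output = moe_forward([1.0, -1.0], experts, gate_scores, k=2)
print("selected experts:", [index for index, _ in routes])
print("mixed output:", output)
```

With `k=2` and four experts, only half of the expert parameters are used for this input. Adding a fifth expert would extend the `experts` list and the gate scores without changing the per-input cost, which is the scalability property described above.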