Microsoft unveiled its new mixed-reality platform, Mesh, at its March 2021 Ignite event. The splashy launch didn’t go into significant technical detail, though it did show shared, cross-platform virtual and augmented experiences and a HoloLens-based avatar conferencing service. There was a lot to see but little information about how we’d build our own code or use the service.
Despite the lack of detail at Ignite, it’s quite easy to make an educated guess about Mesh’s components. We’ve been watching Microsoft unveil most of the services needed to build Mesh during the last couple of years, and Mesh brings all those elements together, wrapping them in a common set of APIs and development tools. Unlike many other augmented-reality platform vendors, Microsoft has a lot of practical experience to build on, with lessons from its first-generation HoloLens hardware, its Azure Kinect 3D cameras, and the Mixed Reality framework built into Windows 10.
Building on the HoloLens foundation
If you look at the slides from the Mesh session at Ignite, the scenarios it’s being designed for will look familiar. They’re the same set of collaborative, mixed-reality applications Microsoft has shown for several years, from remote expertise to immersive meetings, and from location-based information to collaborative design services. While they’re all familiar, they’re more relevant now because of the constraints that COVID-19 has added to the modern work environment, with remote work and social distancing.
Over the years that Microsoft has been building mixed-reality tools, it’s noted a number of key challenges for developers building their own mixed-reality applications, especially when it comes to building collaborative environments. The stumbling blocks go back to the first shared virtual-reality environments, issues that prevented services like Second Life from scaling as initially promised or that held back location-based augmented-reality applications.
First, it’s hard to deliver high-definition 3D images from most CAD file formats. Second, putting people into a 3D environment requires significant compute capability. Third, it’s hard to keep an object stable in a location over time and between devices. Finally, we need to find a way to support action synchronization across multiple devices and geographies. All these issues make delivering mixed reality at scale a massively complex distributed-computing problem.
It’s all distributed computing
Complex distributed-computing problems are one thing the big clouds such as Azure have gone a long way to solving. Building distributed data structures like the Microsoft Graph on top of services like Cosmos DB, or using actor/message transactional frameworks like Orleans, provides a proven distributed-computing framework that’s already supporting real-time events in games such as Halo.
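To see why an actor/message model suits shared sessions, here is a minimal sketch in Python. It is not Orleans (a .NET framework) and not anything Microsoft has shown for Mesh: the `SessionActor` class, its mailbox, and the `(user, position)` message format are all assumptions made for illustration. The point is only that one actor owns the session state and handles messages one at a time, so concurrent updates from many devices never race.

```python
import asyncio


class SessionActor:
    """Hypothetical actor that owns the state of one shared session."""

    def __init__(self) -> None:
        self.mailbox: asyncio.Queue = asyncio.Queue()
        self.positions: dict = {}

    async def run(self) -> None:
        # Messages are handled strictly one at a time, so no locks are needed.
        while True:
            message = await self.mailbox.get()
            if message is None:  # sentinel: stop the actor
                break
            user, position = message
            self.positions[user] = position


async def demo() -> dict:
    actor = SessionActor()
    worker = asyncio.create_task(actor.run())
    # Two devices send updates concurrently; the mailbox serializes them.
    await asyncio.gather(
        actor.mailbox.put(("alice", (0.0, 1.6, 2.0))),
        actor.mailbox.put(("bob", (1.5, 1.7, 0.0))),
    )
    await actor.mailbox.put(("alice", (0.2, 1.6, 2.1)))
    await actor.mailbox.put(None)
    await worker
    return actor.positions


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A production framework adds what this sketch leaves out, such as placing actors across servers, persisting their state, and recovering from failures.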
Another aspect of Mesh is its integration with Microsoft 365, with Azure Active Directory and OneDrive in the prototype HoloLens Mesh app. Microsoft 365’s underlying Graph is key to implementing collaborative apps inside Mesh, as it’s here that you can link users and content, as well as persist sessions across devices and experiences.
In a session at Ignite, Microsoft discussed the Mesh developer platform at a fairly high level. At its core is a platform very similar to Azure, with tools for user, session, and billing management. That’s all integrated with Microsoft’s consumer and commercial graphs: the Microsoft Graph for user-centric services and Dynamics 365’s Common Data Service (along with the Power Platform’s Dataverse) for commercial. Closely aligned are services to manage user identity, an audio and video platform, and the cloud-hosted infrastructure needed to deliver this.
Introducing Mesh Services
If that all sounds very familiar, it is. Microsoft launched a set of frontline worker tools for HoloLens, building on SharePoint, Dynamics 365, and Teams, and these are the services it would have needed to build them. This is a common pattern for Microsoft: It builds internal tools to deliver a set of applications and then makes those tools a product so you can build your own applications.
On top of the core platform sits a set of capabilities: immersive presence, spatial maps, holographic rendering, and multiuser synchronization. Immersive presence is perhaps the one really new aspect of the Mesh platform, building on the regularly demonstrated holoportation tools for HoloLens. However, instead of a detailed image, Microsoft is delivering less-detailed avatars for most applications, keeping bandwidth usage to a minimum. If you’re using HoloLens 2 or a similar device, face-tracking cameras deliver basic expression mapping, along with hand tracking for the arms. Avatars are positioned within the virtual environment so all users get to interact without collisions.
More complex, detailed user meshes can be delivered when you add tooling like the Azure Kinect sensors, although this requires additional hardware and a room that’s set up for mixed reality. This should give you the detail that the holoportation demos showed, as it allows real-time capture of moving 3D images and maps them to a basic skeletal model with tracking for key points of articulation.
Spatial maps are an extension of Azure’s existing spatial anchors, allowing you to fix a 3D object to a real-world position. However, things go further, with support for dynamic anchors that fix a model to a mesh overlay on a physical object. This should support overlays, say, on an engine that could be anywhere in a workshop. It’s not yet clear how that alignment will be delivered: whether it depends on mesh detection from 3D cameras or lidar, or whether it can be provided by using alignment marks and QR codes. Content and location data are delivered using cloud services, keeping local compute requirements to a minimum.
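The idea behind a shared anchor can be sketched in a few lines of Python. This is an illustration only, not the Azure Spatial Anchors API: the `Pose` class and all the coordinates are hypothetical, and it works in 2D to stay short. Each device finds the same physical anchor in its own local coordinate frame, and content is stored as an offset from the anchor, so every device places the hologram at the same real-world spot.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """A 2D position plus heading (radians) in some coordinate frame."""
    x: float
    y: float
    theta: float

    def compose(self, local: "Pose") -> "Pose":
        """Express `local` (given relative to this pose) in this pose's parent frame."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose(
            self.x + c * local.x - s * local.y,
            self.y + s * local.x + c * local.y,
            self.theta + local.theta,
        )

    def inverse(self) -> "Pose":
        """Return the pose that undoes this one."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose(
            -(c * self.x + s * self.y),
            -(-s * self.x + c * self.y),
            -self.theta,
        )


# Hologram offset stored once in the cloud, relative to the anchor:
# 1 m along the anchor's own x axis. (Hypothetical values.)
hologram_offset = Pose(1.0, 0.0, 0.0)

# Each device sees the same physical anchor at a different pose in its local frame.
anchor_seen_by_a = Pose(2.0, 3.0, 0.0)
anchor_seen_by_b = Pose(-1.0, 0.5, math.pi / 2)

# Local coordinates differ per device, but both describe the same physical spot.
hologram_in_a = anchor_seen_by_a.compose(hologram_offset)
hologram_in_b = anchor_seen_by_b.compose(hologram_offset)
```

Because only the anchor-relative offset is shared, a device that relocalizes or joins late needs nothing more than the anchor and the offset to put the object back in place.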
This approach fits well with Mesh’s holographic rendering. Again, this is based on an existing Azure service: Remote Rendering. Instead of requiring end-user devices to support a wide selection of rendering engines and file formats, along with the hardware to deliver 3D content, models can be delivered to Azure using standard formats before they’re rendered in Azure for delivery to devices as needed, using the appropriate number of polygons for the device and application.
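Matching polygon counts to devices can be pictured as a simple budget lookup. The Python sketch below is an assumption-laden illustration, not how Azure Remote Rendering works internally: the `ModelVariant` type, the variant names, and the per-device polygon budgets are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVariant:
    name: str
    polygons: int


def pick_variant(variants, polygon_budget):
    """Return the most detailed variant that fits the budget, else the coarsest one."""
    fitting = [v for v in variants if v.polygons <= polygon_budget]
    if fitting:
        return max(fitting, key=lambda v: v.polygons)
    return min(variants, key=lambda v: v.polygons)


# Hypothetical pre-decimated versions of one CAD model.
ENGINE = [
    ModelVariant("engine_full", 18_000_000),
    ModelVariant("engine_medium", 1_500_000),
    ModelVariant("engine_low", 100_000),
]

# Hypothetical per-device budgets, for illustration only.
DEVICE_BUDGETS = {"headset": 2_000_000, "phone": 250_000, "desktop": 50_000_000}

if __name__ == "__main__":
    for device, budget in DEVICE_BUDGETS.items():
        print(device, pick_variant(ENGINE, budget).name)
```

Falling back to the coarsest variant when nothing fits means a constrained device still gets something to display rather than an error.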
Finally, multiuser sync uses a mix of device hardware to map current body positions and facial expressions onto avatars or skeletal meshes. Each user receives the images that are relevant to their current position, again keeping bandwidth requirements to a minimum. Perhaps the most important aspect of this feature is its support for spatial audio. One of the biggest issues with the current generation of video conferencing is that sound is normalized; you can’t easily pinpoint who is talking. With spatial audio, sound is transformed according to the speaker’s position in virtual space, making it possible to locate the source.
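A rough feel for how position turns into sound comes from a simple stereo model. The sketch below is a deliberate simplification and not Mesh’s audio pipeline, which hasn’t been described: the `stereo_gains` function and its parameters are hypothetical, it works in 2D, and it uses inverse-distance attenuation with constant-power panning where real spatial audio would typically rely on head-related transfer functions.

```python
import math


def stereo_gains(listener_xy, listener_heading, source_xy, ref_distance=1.0):
    """Return (left_gain, right_gain) for a source heard by a listener.

    listener_heading is the direction the listener faces, in radians,
    measured counterclockwise from the +x axis.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)

    # Inverse-distance attenuation, clamped so nearby sources never exceed full volume.
    attenuation = ref_distance / max(distance, ref_distance)

    # Angle of the source relative to where the listener is facing
    # (positive means the source is to the listener's left).
    relative = math.atan2(dy, dx) - listener_heading
    # pan runs from -1 (fully left) to +1 (fully right).
    pan = -math.sin(relative)

    # Constant-power panning keeps perceived loudness steady as the source moves.
    angle = (pan + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return attenuation * math.cos(angle), attenuation * math.sin(angle)


if __name__ == "__main__":
    # Listener at the origin facing +y; one speaker to the right, one straight ahead.
    print(stereo_gains((0.0, 0.0), math.pi / 2, (2.0, 0.0)))
    print(stereo_gains((0.0, 0.0), math.pi / 2, (0.0, 4.0)))
```

This model can’t distinguish a voice in front from one behind, which is one reason fuller spatial audio systems go beyond simple panning.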
Putting it all together: a mixed-reality toolchain
It might be best to think of Mesh as a simplification of all the tools we’ve been using to build mixed-reality applications around Windows. Offloading much of the complexity to Azure makes a lot of sense, as it provides a hub for shared data and services. We already have many of the APIs and toolkits it uses, but they’re all delivered separately. Mesh should bring them all into a single SDK, providing a common set of controls and UI elements to give a consistent user experience.
Use of Azure’s cognitive services should improve object recognition, help with gesture and facial tracking, and produce environment maps for spatial audio. Microsoft has been demonstrating support for 3D vision with its Azure Kinect SDK, with similar sensors built into HoloLens and available to third parties.
It’s clear that much of this will be built with familiar tooling, starting in Unity and adding support for Unreal during the next year, along with further Unity support. Unity support will cover Windows (both desktop and Mixed Reality), HoloLens, and Android. Unreal will support all these and add iOS and macOS, with Unity coming to those platforms, too. Web developers will be able to take advantage of 3D frameworks like Babylon, with React Native for UI components.
Simplifying mixed-reality development is essential if there’s to be mass adoption of these technologies, from headsets to augmented-reality views on mobile devices. Mesh certainly seems as though it could be that cross-platform tool and service; it’ll be interesting to watch Microsoft deliver it during the next 12 months.