The pinnacle of flat computing technology and user experience has been reached, marking an inevitable transition toward spatial computing.
 The user experience and demands on the flat computing platform have encountered a bottleneck.
The dominant computing platform today is flat computing, exemplified by mobile phones. Nonetheless, user demand for flat computing platforms such as mobile phones and tablets has clearly entered a period of saturation.
Global smartphone shipments in 2023 are projected at 1.16 billion units, a year-on-year decline of 3.4%. Of these, Apple shipped 232 million units, a year-on-year increase of 2.5%, for a market share of 24.7%. Samsung shipped 227 million units, a decrease of 12.9%, for a market share of 16.3%. Xiaomi shipped 146 million units, a decrease of 4.7%, while OPPO and Vivo shipped 80 million and 40 million units, decreases of 24.1% and 54.9%, respectively.
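As a quick arithmetic check on the figures above, the following sketch derives each vendor's implied share from the quoted unit volumes and the 1.16 billion total. The names `SHIPMENTS`, `implied_share`, and `prior_year_units` are illustrative, and the numbers are copied from the paragraph rather than independently sourced. The implied shares for Apple and Samsung come out near 20.0% and 19.6%, which differs from the quoted 24.7% and 16.3%, so the quoted shares may rest on a different base or data source.

```python
# Shipment figures in millions of units, copied from the paragraph above.
TOTAL_2023 = 1160.0
SHIPMENTS = {
    "Apple": 232.0,
    "Samsung": 227.0,
    "Xiaomi": 146.0,
    "OPPO": 80.0,
    "Vivo": 40.0,
}


def implied_share(units, total=TOTAL_2023):
    """Return the market share (in percent) implied by a unit volume."""
    return 100.0 * units / total


def prior_year_units(units, yoy_change_pct):
    """Back out the previous year's volume from a year-on-year change in percent."""
    return units / (1.0 + yoy_change_pct / 100.0)


if __name__ == "__main__":
    for vendor, units in SHIPMENTS.items():
        print(f"{vendor}: {implied_share(units):.1f}% implied share")
```

Calling `prior_year_units(232.0, 2.5)` gives the 2022 volume that a 2.5% increase to 232 million would imply, which is a simple way to cross-check each growth rate against another source.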
 The relentless advancement of computational power will inevitably usher in more sophisticated computing platforms.
A clear trajectory in the evolution of technology is the continual growth of computational capability. Whether driven by the advent of AI or by process nodes advancing under Moore’s Law into the 4nm era, the main catalyst for consumer purchase intent remains the steady improvement of product computing power.
Whether on a personal computer or a mobile phone, today’s computing power is in a state of ‘surplus’ on flat computing platforms. While specific verticals such as artificial intelligence, AAA games, and big data mining still have pressing demands for computing power, the headroom for putting advanced computing power to use on flat platforms is already limited, especially for mass-market entertainment and leisure. This limit is particularly evident in video applications, where audio and visual elements are the primary mode of communication.
In 2023, Apple unveiled Vision Pro, rekindling our attention. Unlike conventional categorizations such as MR, AR, or VR, Apple characterizes Vision Pro as a spatial computing platform—an entity transcending the constraints of flat computing. Unlike flat computing, spatial computing obviates the need for dimensionality reduction or enhancement when processing three-dimensional object information. Human perception becomes the epicenter of spatial computing in three-dimensional space, promising an experience closer to genuine human interaction, ensuring information integrity and immersion.
 The operational logic of interactive hardware will align more closely with human perceptual habits of the physical world. Undoubtedly, the impact of interactive hardware on human cognition and operational habits cannot be disregarded.
The trajectory of human-computing platform interaction evolves iteratively from an unnatural to a natural state. In this progression, developers must invest more effort to facilitate the interaction of ordinary users with the computing platform with minimal cognitive load.
The evolution of graphics characterized the early stages of PC computing platforms. In 1983, Apple introduced the Lisa, a pioneering personal computer with a graphical user interface (GUI) and a mouse. Although relatively expensive, it set the stage for the future. However, it was the more affordable Mac, launched in 1984, with its simple graphical interface and user-friendly design, that captivated the market. Notably, Microsoft adopted a more open strategy, diverging from Jobs’ insistence on ‘software and hardware integration.’ Microsoft’s in-depth development and iteration of the Windows operating system propelled it past Apple in the PC market.
Transitioning into the era of flat computing, characterized by software and hardware integration, the graphical revolution persisted. The advent of the iPhone, eliminating the physical keyboard entirely, and the interactive logic of iPhone 4 established paradigms for mobile computing platforms. The dominance of Android and iOS in the mobile computing era can be attributed to their comprehensive development ecosystems, fostering communication channels between developers and consumers. In 2013, global smartphone shipments officially eclipsed feature phones, solidifying their dominance in the mobile phone market.
The spatial computing era stands in stark contrast, marked by easier and more direct interaction. Apple, renowned for its commitment to ‘software and hardware integration,’ distinguishes Vision Pro through interaction methods that work in three-dimensional space, including arm movements, eye positioning, and effortless hand gestures. Unlike prolonged mobile phone usage, which induces fatigue, spatial computing with eye tracking moves a step closer to the ultimate form of interaction, the brain-computer interface.
Reviewing the progression of consumer electronics in the computing platform category—using Xbox as an exemplar:
【1】 A strategic foray into the gaming market through a low-price strategy.
Microsoft’s Xbox, a late entrant to the gaming sphere, achieved global sales of 24 million units for its first-generation product, owing to exceptional console performance, networking capabilities, and a judicious low-price strategy. Although it incurred a substantial loss of $5 billion to promote the initial Xbox, Microsoft outmaneuvered established game console companies, such as Sega and NEC, forming a triumvirate with Sony and Nintendo.
The Xbox 360, launched in 2005, competed with the PS3 and Nintendo Wii, both released in 2006. The Xbox 360 and PS3 each reached worldwide sales of about 80 million units. The Nintendo Wii, with its low-price strategy and motion-sensing gameplay, disrupted the game market, attracted non-gamers, and reached 100 million units, surpassing the Xbox 360 and PS3 despite their stronger hardware. To this day, Nintendo maintains a console strategy rooted in high-quality gameplay and innovation.
【2】 The Kinect product’s Waterloo.
Judging by results, the Xbox 360 stands as a triumph, engaging in a closely matched rivalry with the PS3. Microsoft, inspired by the success of the Wii’s motion-sensing game market, introduced the Kinect peripheral, endeavoring to capitalize on the motion-sensing player market.
Kinect became the world’s “fastest-selling consumer electronics product,” with 8 million units sold within 60 days of launch and a total of 35 million units sold over its seven-year lifespan. However, Kinect’s failure can be traced to its mismatch with the core Xbox audience: it did not address the core demands of its user base.
User research at the time revealed that games adapted to Kinect merely showcased its functionality without delivering enjoyable, lasting gameplay. Despite the freedom from controllers, Kinect’s gaming experience paled in comparison to that of the Wii, which had a robust ecosystem and was cost-effective. The policy of bundling Kinect with the Xbox One prompted numerous Xbox players to switch to the PlayStation camp.
【3】 XGP as the linchpin for driving Xbox sales growth.
In the competition between the next-generation Xbox Series consoles and the PS5, Xbox sales lag behind those of Sony’s PS5. Players choose a console primarily on service, performance, and cost-effectiveness, with hardware specifications playing a secondary role.
In terms of performance, Xbox is on par with, if not superior to, the PS5, especially in frame rate. However, the initial Xbox software ecosystem was weak: “Halo Infinite,” the flagship title for the Xbox Series X|S, failed to showcase the capabilities of the next-generation console. The gap in sales therefore reflects the weight players place on service and cost-effectiveness rather than on hardware alone.
The Xbox Game Pass (XGP) subscription service, closely integrated with Windows, has enticed a substantial number of PC players to try Xbox. While Microsoft has not disclosed specific data, private surveys indicate that the Xbox Series S (XSS) accounted for approximately 74.8% of Xbox sales in early 2022, implying that only around 25% of Xboxes sold were Series X. This gap is attributed to the lower price of the XSS, its compatibility with XGP, and the resulting differentiated competition. Consequently, the Xbox Series S outsells the more powerful Xbox Series X.
Comparative Analysis between Vision Pro and Current Mainstream XR Technological Paradigm
Augmented computing potency—pinnacle of hardware efficacy
Vision Pro M2+R1 dual-chip quasi-all-in-one architecture – quasi-desktop-grade computational prowess
M2 chip: Based on the ARM architecture and built on TSMC’s 5nm process with 20 billion transistors, it was previously used in the 13-inch MacBook Pro and in the 13-inch and 15-inch MacBook Air (M2 chip models).
R1 chip: Positioned as a co-processor, it processes input from 12 cameras, 5 sensors, and 6 microphones to ensure content is presented to the user in real time. The R1 chip can transmit new images to the display within 12 milliseconds, 8 times faster than the blink of an eye, which reduces the likelihood of motion sickness. This is a crucial consideration, given that many people experience dizziness in 3D scenes.
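To put the 12-millisecond figure in context, the sketch below works through the arithmetic. The 90 Hz refresh rate is an assumption chosen only for illustration, since the paragraph does not state the display's refresh rate, and the function names are hypothetical.

```python
R1_LATENCY_MS = 12.0        # latency quoted in the paragraph above
BLINK_RATIO = 8.0           # "8 times faster than the blink of an eye"
ASSUMED_REFRESH_HZ = 90.0   # assumption for illustration, not a quoted spec


def implied_blink_ms(latency_ms=R1_LATENCY_MS, ratio=BLINK_RATIO):
    """Blink duration implied by the '8 times faster' comparison."""
    return latency_ms * ratio


def frame_time_ms(refresh_hz):
    """Time available to produce one frame at the given refresh rate."""
    return 1000.0 / refresh_hz


def frames_of_delay(latency_ms, refresh_hz):
    """Express a latency as a number of display frames."""
    return latency_ms / frame_time_ms(refresh_hz)


if __name__ == "__main__":
    print(f"Implied blink duration: {implied_blink_ms():.0f} ms")
    print(f"Frame time at {ASSUMED_REFRESH_HZ:.0f} Hz: "
          f"{frame_time_ms(ASSUMED_REFRESH_HZ):.2f} ms")
    print(f"Latency in frames: "
          f"{frames_of_delay(R1_LATENCY_MS, ASSUMED_REFRESH_HZ):.2f}")
```

Under that assumed refresh rate, 12 ms corresponds to roughly one frame of delay, which is one way to read why the figure matters for comfort.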
H2 chip: Both AirPods Pro and Vision Pro are equipped with the H2 chip, laying the foundation for spatial audio. The inclusion of ultra-low latency audio and spatial audio will further augment the immersive experience of Apple Vision Pro.
In 2024, the main competitor to Apple’s Vision Pro spatial computing platform will be Qualcomm’s Snapdragon XR2+ Gen 2, a single-chip all-in-one computing platform. The preceding XR2 Gen 2 is used in Meta Quest 3. Qualcomm Technologies states that the Snapdragon® XR2+ Gen 2 platform can be benchmarked against Vision Pro in per-eye resolution and video see-through (VST) latency. Its key specifications are a single-chip architecture supporting spatial computing at 4.3K display resolution and 90FPS, 12 or more concurrent cameras, strong on-device AI compute, and VST latency as low as 12ms. Overall, compared with the recently unveiled Snapdragon XR2 Gen 2, the GPU and CPU frequencies of the XR2+ Gen 2 are 15% and 20% higher, respectively.
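The quoted specifications can be turned into rough throughput numbers. The sketch below is illustrative only: it assumes that "4.3K" means a square 4300 x 4300 panel per eye, which the paragraph does not state, and the baseline clock passed to `uplifted_clock` is a placeholder rather than a real Snapdragon figure.

```python
# Figures quoted in the paragraph above.
FPS = 90
GPU_UPLIFT = 0.15   # GPU frequency increase over XR2 Gen 2
CPU_UPLIFT = 0.20   # CPU frequency increase over XR2 Gen 2

# Assumption: "4.3K" is read as 4300 x 4300 pixels per eye.
ASSUMED_EYE_WIDTH = 4300
ASSUMED_EYE_HEIGHT = 4300


def pixels_per_second(width, height, fps, eyes=2):
    """Pixels the display pipeline must deliver each second."""
    return width * height * fps * eyes


def uplifted_clock(base_mhz, uplift):
    """Clock frequency after applying a fractional uplift."""
    return base_mhz * (1.0 + uplift)


if __name__ == "__main__":
    rate = pixels_per_second(ASSUMED_EYE_WIDTH, ASSUMED_EYE_HEIGHT, FPS)
    print(f"Pixel rate under the stated assumption: {rate / 1e9:.2f} Gpx/s")
    # 1000 MHz is a placeholder baseline used only to show the scaling.
    print(f"GPU clock scaling: 1000 MHz -> {uplifted_clock(1000, GPU_UPLIFT):.0f} MHz")
    print(f"CPU clock scaling: 1000 MHz -> {uplifted_clock(1000, CPU_UPLIFT):.0f} MHz")
```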
Presently, even without the deployment of Apple’s most sophisticated M3 chip, the extant M2+R1 dual-chip architecture is sufficiently distinguished and aligned with Apple’s ambitious plan to spearhead the spatial computing industry by 2024.
Innovative Interaction—Precision and Stability
In terms of interaction, Apple has earnestly dedicated itself to product research and development, culminating in a seamless user experience.
High-precision tracking + controller-free operation: more user-friendly, more portable, and closer to the ultimate goal of a brain-computer interface
To circumvent fatigue, Apple has embraced a core interaction design grounded in eye movements, complemented by gestures. Vision Pro stands as a beacon of innovation, liberating user interaction from the confines of the screen to allow gestures and pinching in a spatial domain, affording users unprecedented freedom and comfort. The pinnacle of high-precision tracking is realized through state-of-the-art infrared cameras.
Users can engage with the device in their most comfortable posture, thereby actualizing Apple’s pursuit of the ultimate in “ease” of interaction.
Hand tracking method, RS+TS algorithm: Vision Pro is not a gaming console, and it currently does not support mobile gaming experiences such as “Genshin Impact” and “Honor of Kings.” This is primarily due to the absence of a physical controller and to functional constraints, including potential frame drops, even when running large-scale games as flat 4K, 60Hz graphics.
Why eschew controllers? The controller serves as a hurdle for the average consumer. In essence, Apple envisions Vision Pro as a product accessible to all. Incidentally, Vision Pro will integrate with Apple Arcade, promising novel forms of entertainment. Under the aegis of spatial computing, lightweight games blending reality with virtuality, such as “Fruit Ninja,” or interactive experiences like “Mario Kart Live” and “Paper Toss,” are ideal choices to showcase the capabilities of Vision Pro.
The eye gaze function is implemented through the 4 infrared cameras and LED illuminators within the headset. This feature can be expanded into three applications: dynamic gaze rendering technology, eye tracking and control, and iris recognition, collectively referred to as Eyesight. The implementation uses the pupil-corneal reflection method.
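The pupil-corneal reflection approach named above can be sketched generically: the vector from the corneal glint (the reflection of an infrared LED) to the pupil center is mapped to a gaze point through a calibration fit. The code below is a deliberately simplified illustration, not Apple's implementation. It assumes a per-axis linear mapping fitted by least squares, whereas practical systems typically use higher-order or model-based mappings, and all function names are hypothetical.

```python
def pupil_glint_vector(pupil, glint):
    """Vector from the corneal glint to the pupil center, in image pixels."""
    return (pupil[0] - glint[0], pupil[1] - glint[1])


def fit_axis(values, targets):
    """Least-squares fit of target = a * value + b for a single axis."""
    n = len(values)
    mean_v = sum(values) / n
    mean_t = sum(targets) / n
    var = sum((v - mean_v) ** 2 for v in values)
    cov = sum((v - mean_v) * (t - mean_t) for v, t in zip(values, targets))
    a = cov / var
    return a, mean_t - a * mean_v


def calibrate(samples):
    """Fit the mapping from samples of (pupil, glint, known_gaze_point)."""
    vectors = [pupil_glint_vector(p, g) for p, g, _ in samples]
    x_model = fit_axis([v[0] for v in vectors], [s[2][0] for s in samples])
    y_model = fit_axis([v[1] for v in vectors], [s[2][1] for s in samples])
    return x_model, y_model


def estimate_gaze(pupil, glint, model):
    """Map a new pupil/glint observation to an estimated gaze point."""
    (ax, bx), (ay, by) = model
    vx, vy = pupil_glint_vector(pupil, glint)
    return (ax * vx + bx, ay * vy + by)


if __name__ == "__main__":
    # Synthetic calibration data: the user looks at three known targets.
    samples = [
        ((-1.0, -1.0), (0.0, 0.0), (400.0, 200.0)),
        ((0.0, 0.0), (0.0, 0.0), (500.0, 300.0)),
        ((1.0, 1.0), (0.0, 0.0), (600.0, 400.0)),
    ]
    model = calibrate(samples)
    print(estimate_gaze((0.5, -0.5), (0.0, 0.0), model))
```

Using the glint as a reference point makes the vector less sensitive to small shifts of the headset relative to the eye, which is the main reason the method pairs infrared cameras with LED illuminators.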
Dynamic gaze point rendering, commonly known as foveated rendering, adjusts the resolution of different screen areas as the user’s gaze moves, saving computing power and reducing rendering costs. Iris recognition facilitates user identification and authentication, enhancing security for assets like Alipay and WeChat. Eyesight, on the other hand, enriches user “social presence” and introduces personalized customization, including mapping virtual human expressions to enhance interaction in the virtual realm.
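The idea behind gaze-dependent rendering can be illustrated with a small tile-based sketch: regions near the gaze point are rendered at full resolution, and regions farther away at reduced resolution. The thresholds (10 and 25 degrees), the scale factors, and the flat degrees-per-tile approximation are all assumptions chosen for illustration and do not describe how Vision Pro is actually tuned.

```python
import math


def resolution_scale(eccentricity_deg, fovea_deg=10.0, mid_deg=25.0):
    """Per-axis render scale for a region at a given angle from the gaze point."""
    if eccentricity_deg <= fovea_deg:
        return 1.0    # full resolution where the eye resolves fine detail
    if eccentricity_deg <= mid_deg:
        return 0.5    # half resolution in the near periphery
    return 0.25       # quarter resolution in the far periphery


def eccentricity_deg(tile_center, gaze, deg_per_tile):
    """Approximate angular distance of a tile from the gaze point."""
    dx = tile_center[0] - gaze[0]
    dy = tile_center[1] - gaze[1]
    return math.hypot(dx, dy) * deg_per_tile


def shaded_fraction(grid_w, grid_h, gaze, deg_per_tile):
    """Fraction of full-resolution pixel work left after foveation."""
    total = 0.0
    for ty in range(grid_h):
        for tx in range(grid_w):
            ecc = eccentricity_deg((tx + 0.5, ty + 0.5), gaze, deg_per_tile)
            scale = resolution_scale(ecc)
            total += scale * scale  # scale applies on both axes
    return total / (grid_w * grid_h)


if __name__ == "__main__":
    # 20 x 20 tiles spanning about 100 degrees, gaze at the center.
    fraction = shaded_fraction(20, 20, gaze=(10.0, 10.0), deg_per_tile=5.0)
    print(f"Pixel work relative to uniform rendering: {fraction:.2f}")
```

Because the scale applies to both width and height, a tile rendered at half resolution costs a quarter of the pixels, which is where most of the savings come from.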
Vision Pro Application Ecosystem—Years of Accumulation and Preparation for the Present Confrontation
In 2022, numerous domestic internet companies began exploring VR products, exemplified by ByteDance’s acquisition of PICO. However, the content ecosystem for early VR products was inadequately developed, impeding the formation of a user base and profitability. Unlike smartphones, VR lacks indispensability in daily life for consumer users.
Reviewing early VR equipment, we observe stronger sales on the enterprise (B-side) than on the consumer (C-side), driven primarily by well-defined use cases. Microsoft HoloLens 2, for instance, improves efficiency in various industries by streamlining assembly processes, reducing technician travel, and lowering construction costs. While VR finds extensive application in education, healthcare, engineering, manufacturing, and more, its impact on consumers’ daily lives is less pronounced than that of mobile phones. Consequently, during the phase of inflated expectations for VR, demand failed to materialize, leading to a rapid burst of the bubble and a trough in the VR industry.
Echoing our examination of game consoles in the preceding section, spatial computing devices, much like game consoles, prioritize the integration of computing platforms and platform services over pure hardware parameters. Apple has been strategically positioning Vision Pro for years, as is evident in its extensive publicity and sophisticated software development. The post-launch approach involves promoting PGC (Professionally Generated Content) and UGC (User-Generated Content) to address diverse application scenarios.
Basic Connectivity: Integration with other Apple devices is seamless. visionOS supports a compatibility mode that runs over one million iPhone and iPad apps from the App Store, which appear as “mini windows.” While iPadOS, iOS, and macOS native applications are fully transferable, the non-native environment may not ensure an optimal experience.
Gaming and Entertainment: Apple Arcade ranks second only to Steam and Nintendo Switch Online in weekly active players (Q4 2023), with more than double the figure for Ubisoft+.
Streaming Media: As of January 18, Apple announced Vision Pro’s compatibility with streaming applications like Apple TV+, Disney+, and HBO Max, offering an 8K 180° viewing experience with exclusive content. Moreover, over 150 3D movies will be available on Apple TV for Vision Pro users.
Development Team: Globally, eight Vision Pro developer labs are operational, spanning Shanghai, Cupertino, London, Munich, Singapore, Tokyo, New York, and Sydney. Development tools like Reality Composer, Xcode, and the visionOS SDK were introduced in June of the previous year.
ARKit Development: Commencing in 2017, Apple has consistently unveiled new versions of ARKit annually at the WWDC event. Over seven years, ARKit has spawned over 14,000 applications for Apple. It evolved from ARKit 1.0, exemplified by Pokémon GO, to ARKit 3.5 supporting LiDAR, and the latest ARKit 5, RealityKit 2, featuring face tracking, 3D scanning, and more.
3D Film and Television: Spatial video and camera functionality began with the inclusion of LiDAR in the iPhone 12 Pro and has expanded to the iPhone 15 Pro, which can serve as a spatial video recording device. This democratizes 3D film and television production, historically a domain of professional cameras, in line with the trend of bringing high technology to ordinary consumers. With high dissemination potential, 3D film and television, akin to Douyin and TikTok, transforms static media and gives users simple, accessible 3D photography.
Productivity: While specific breakthroughs in production development with Vision Pro are yet to be publicly disclosed, projections indicate a focus on 3D-related productivity facets like scanning and modeling. Vision Pro’s capability to scan users’ faces and reconstruct personas in real time lays the groundwork for breakthroughs in future productivity applications. The absence of technical impediments suggests that innovative applications within the productivity track will emerge imminently.
Vision Pro commands a premium price. When can we anticipate a mass-market MR product?
First, the current iteration of Vision Pro is not aimed at the general consumer market; it is still far from Apple’s intended final form for the product.
Presently, Vision Pro functions more as a 2B (business-to-business) product, serving as a showcase for developers to explore the platform’s capabilities. The associated costs are elevated, and consumer-grade attributes are not the primary focus. Consumer use primarily revolves around entertainment and audio-visual experiences, with deterministic application scenarios for productivity currently being limited.
1. “Volume First, Price Advocacy”
Meta’s strategy for VR hardware aligns more with internet logic than consumer hardware logic, prioritizing user growth over optimizing user experience. Meta Quest, as a head-mounted display device, is squarely targeted at leisure and entertainment, a discernible and high-demand market segment. Meta has introduced over 500 dedicated applications for Quest products, with pricing positioned reasonably. However, the challenge lies in transitioning to high-end offerings, where Meta struggles to compete with traditional consumer electronics manufacturers in terms of supply chain control.
2. “Performance First, Gradual Price Reduction”
Apple, as a consumer electronics manufacturer, positions Vision Pro as a top-tier MR product that emphasizes an unparalleled user experience. Currently, the associated costs are high, with anticipated shipments of 500,000 units this year and 1 million units next year. Apple aims to leverage the developer ecosystem to enrich Vision Pro with a diverse array of applications. This mirrors the classic approach of consumer electronics manufacturers.
Although the number of Vision Pro applications is relatively limited at present, the safety net is that iPhone iOS applications can be ported directly and used on Vision Pro.
In the future, possibly within a five-year timeframe, Apple will likely complete iterations of subsequent Vision Pro generations, achieving price reductions. Concurrently, industry competitors such as Huawei, Xiaomi, PICO, Meta, and others may propose optimizations based on Vision Pro’s solutions and potentially launch genuine consumer-grade MR products ahead of Apple.
In conclusion, Vision Pro is poised to be an industry trailblazer. However, its ability to sustain technological leadership hinges on the continued development of MR ecological applications and the exploration of evolving consumer needs. The prospect of rapid iteration and potential overtaking by other manufacturers cannot be dismissed.