You are sitting around chatting with friends, phone tossed on the table. You ask one of them: what should I do if the radiators at home won't heat up? That night, an article titled "How to bleed your radiators" appears in your Xiaohongshu feed. You are startled: could the phone be eavesdropping again? After all, you never actively searched for anything like this.
The claim that "apps monitor conversations" comes up again and again because users keep finding "coincidences" that are hard to prove. The companies involved have only issued denials, without producing evidence to rebut the claim. So if an app really did listen to us through the microphone, would that even be feasible?
Voice assistants came under suspicion first
People have indeed caught smart speakers and voice assistants red-handed. Amazon Alexa once misinterpreted commands and sent a recording of an "overheard" conversation to a contact in the user's address book, proving that microphone eavesdropping is not purely imaginary.
In 2020, a product manager wrote an article on Hacker Noon (a technology blogging community) addressing this confusion. Taking Siri as an example, he explained that Siri is indeed "listening" but "not understanding": it does not begin interpreting your commands until it is triggered by "Hey Siri".
Even just recognizing that you are calling it already costs Siri considerable effort. Audio reaching the microphone is sliced into 0.01-second frames and fed, 20 frames (0.2 s) at a time, into a deep neural network for on-device computation. The network converts the sound into a probability score; when the score reaches a threshold, the main processor is activated. Until then, the sound is handled entirely by a coprocessor.
The coprocessor can be understood as an auxiliary processor with limited functionality and low power consumption, which keeps certain "always on" features available even when the screen is off. When the "Hey Siri" feature launched, it was this coprocessor that handled the voice processing. In other words, if you have not asked Siri to listen, the sound "goes in one ear and out the other".
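The wake-word pipeline described above can be sketched roughly as follows. This is a minimal illustration, not Apple's implementation: the real detector is a deep neural network running on the always-on coprocessor, which the hypothetical `wake_word_score` function stands in for here with a toy calculation.

```python
# Minimal sketch of a wake-word gate: frame the audio, score it in
# 0.2 s windows, and wake the main processor only above a threshold.
# The scoring function is a hypothetical stand-in for the on-device DNN.

FRAME_SEC = 0.01        # each frame covers 0.01 s of audio
FRAMES_PER_WINDOW = 20  # 20 frames = 0.2 s are scored at a time
THRESHOLD = 0.85        # wake the main processor above this score

def wake_word_score(window):
    """Hypothetical stand-in for the DNN: returns a probability-like
    score that this window contains the trigger phrase."""
    return sum(window) / len(window)  # toy scoring for illustration

def process_audio(frames):
    """Slide over the audio in 0.2 s windows; everything scoring below
    the threshold goes 'in one ear and out the other'."""
    for start in range(0, len(frames) - FRAMES_PER_WINDOW + 1, FRAMES_PER_WINDOW):
        window = frames[start:start + FRAMES_PER_WINDOW]
        if wake_word_score(window) >= THRESHOLD:
            return f"main processor woken at t={start * FRAME_SEC:.1f}s"
    return "main processor never woken; audio discarded"

# Two seconds of background noise (low scores), then a trigger-like window:
silence = [0.1] * 200
trigger = [0.9] * 20
print(process_audio(silence + trigger))  # main processor woken at t=2.0s
```

The point of the gate is that rejected audio is simply dropped on the low-power side; nothing reaches the main processor, let alone the network.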
Would a third-party app behave the same way as the voice assistant, letting sound go "in one ear" and "out the other"?
"Hey Siri" can be regarded as a system-level feature of Apple's, built into the OS itself. By contrast, when a third-party app on iOS invokes system permissions, the user is far less likely to remain "unaware".
Technically possible, but not worth it
In 2019, a domestic developer team wrote an Android app to demonstrate the point. After the app obtained the "recording" permission, it locked the screen and kept listening in the background. The developer said into the microphone, "What's for dinner tonight?", and the server received the speech-to-text message uploaded by the app. This proved that, setting aside all real-world restrictions and speaking purely technically, it is feasible for an app to listen to what a user says in the background.
Why the emphasis on "purely technically"? Because completely escaping the user's attention to achieve covert listening is very difficult. In 2017, Antonio García Martínez, who had worked on Facebook's advertising products, wrote in Wired that if Facebook recorded everything heard through the microphone, it would be functionally equivalent to every user being "on a phone call" with Facebook all the time.
Someone ran an experiment: one hour of continuous recording consumed 6% of the battery. Recording alone is a relatively low-power task, so a single app's drain is not huge. But if multiple apps and multiple SDKs all did this at the same time, the phone would quickly run hot.
It is hard to imagine how much data such "real-time transmission" would generate. Martínez assumed users spend half of each day on their phones. At the one-way bitrate of a typical Internet call at the time, about 24 kbps, each person would upload roughly 130 MB of audio per day. Facebook then had about 150 million daily active users in the US alone, which works out to roughly 20 PB of audio per day. For comparison, Facebook's entire data warehouse held about 300 PB and processed about 600 TB per day. The eavesdropped audio alone would be about 33 times the company's entire daily processing load. Even a company the size of Facebook could hardly bear that weight.
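Martínez's figures are easy to verify with back-of-envelope arithmetic (decimal units, 1 MB = 10^6 bytes):

```python
# Back-of-envelope check of the eavesdropping data-volume estimate.
BITRATE_BPS = 24_000           # one-way Internet-call bitrate, 24 kbps
SECONDS_LISTENING = 12 * 3600  # half a day of phone use per user

bytes_per_user_per_day = BITRATE_BPS * SECONDS_LISTENING / 8
print(f"{bytes_per_user_per_day / 1e6:.1f} MB per user per day")  # 129.6 MB

US_DAU = 150_000_000
total_bytes_per_day = bytes_per_user_per_day * US_DAU
print(f"{total_bytes_per_day / 1e15:.1f} PB per day")             # ~19.4 PB

DAILY_PROCESSING_TB = 600     # Facebook's daily processing load at the time
ratio = total_bytes_per_day / (DAILY_PROCESSING_TB * 1e12)
print(f"{ratio:.1f}x the daily processing load")                  # ~32.4x, i.e. roughly 33x
```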
Perhaps some still have questions. Since uploading raw audio is infeasible, couldn't phone and app makers convert speech to text locally, filter out and extract the valuable information, and upload only that to a back-end server? Setting aside the fact that such constant background computation would "swallow" CPU cycles and degrade the phone's performance, the cost is also far from low.
Google sells its speech-to-text service to third parties at $0.006 per 15 seconds. Transcribing a single user around the clock would cost $12,614 a year; even transcribing just one hour a day costs $525 a year.
Come on, let's run a simple test
As an individual consumer, I ran a test to check two suspicions: whether three mainstream apps had recorded my conversations without my noticing (that is, without visibly enabling the microphone), and whether they used that audio to feed their personalized recommendation and advertising systems.
To minimize other variables, I used an iPhone 7 Plus that had been erased of all content and settings and upgraded to iOS 15 or above, and registered brand-new accounts for all three apps.
At least in this experiment, none of the apps managed to invoke the microphone without my noticing, nor did any of them "transform" my conversations or the surrounding ambient sounds into personalized recommendations.
I thought the mic was eavesdropping. Is that my problem?
Take the "heating" example from the beginning. Based on your activity on social media, you may already have been accurately labeled a "young woman" and a "Beijing drifter". The radiator post could have been pushed to 100,000 people carrying the same tags, you among them, each convinced they were being listened to.
This could be due to confirmation bias, a concept from psychology. It works like a filter in the brain: it keeps the information you already believe to be correct and unconsciously screens out everything that does not fit your needs, a form of selective attention. It is also known as the "retina effect" or the "pregnancy effect" (once you are pregnant, you suddenly notice pregnant women everywhere).
Similar examples are everywhere. You and your boyfriend go out for a walk, discussing an upcoming move and which brand of projector to buy. That night, back home scrolling your phone, you find not only a real-estate agent's listings recommended on social media, but Taobao pushing you a new projector as well. You are just about to gloat to your boyfriend, "See, I told you the phone is bugging us." Wait. Calm down for a moment, and recall: you didn't even bring your phone on the walk.
What happened, perhaps, is that your boyfriend had already been browsing online for how to pick a projector. And because the two of you are connected on these platforms, with your mutual shares and other interactions tracked, the advertising system has tagged you as sharing "common interests".
So is the advertising system really this sophisticated already? No need. Really, no need!
At least in the field of consumer commerce, the quality of user data obtained through "eavesdropping" is not necessarily high, yet the cost is extremely high. Companies have no reason to pay it.
Most mainstream apps have issued similar statements. For third-party apps, it is almost impossible either to listen without being noticed by users or to break through the system's permission controls.
In 2019, a reporter from The Paper found that after turning off programmatic ads in the Toutiao app, the number of ads a user saw stayed the same; only their relevance declined. Nandu (Southern Metropolis Daily) evaluated 50 top apps in its 2020 "Personal Information Security Annual Report" and found that six of them offered no option to turn off personalized recommendations. Two years later, all of these top apps had added the option.
Well, for now at least, the phone will not "rush to answer" before you have even asked.