While confined to a wheelchair, Hawking could still tap out text on a keyboard with one finger. Later, when even that finger could no longer move, eye-tracking and brainwave-recognition technologies were considered as ways to help him communicate, but between his illness and technical problems they never came to fruition. Instead he used infrared-detecting glasses, which picked up small muscle movements and fed them to what was then the most advanced speech-synthesis technology. To the end, Hawking never converted his brainwaves directly into language, but that technology is drawing near. Seen from another angle, it is why we keep exploring the "brain-computer interface."
Laying the groundwork: "typing" with brain waves
For brain waves to "speak," a connection must first be established between brain waves and letters, so "typing" is an unavoidable topic. At the 2018 World Robot Conference, the "Dynamic Window Steady-State Visual Evoked Potential Brain-Computer Interface System" developed by Tsinghua University gave participants a competition platform: focus on a letter on the virtual keyboard shown on a computer screen, and your brain waves are captured and the corresponding letter appears on screen.
This is a visual-evoked-potential typing system. While it runs, each target character on the virtual keyboard flickers at its own specific frequency, and no two targets share a frequency. When we gaze at a target, the visual cortex near the occipital lobe generates an EEG signal that corresponds to that flicker frequency; different stimuli evoke different responses. By collecting a person's EEG signal, then, the system can identify which target is being viewed and thereby type it. According to the project lead, the system's average accuracy reaches 91%, an input rate roughly equal to a person handwriting 28 English letters per minute, and the fastest users can type 60 characters per minute.
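The core of this scheme can be sketched in a few lines: score each letter's flicker frequency by its spectral power in the occipital EEG and pick the strongest. The sampling rate, letter-to-frequency table, and function names below are illustrative assumptions, not details of the Tsinghua system (which uses more sophisticated dynamic-window decoding).

```python
import numpy as np

FS = 250  # EEG sampling rate in Hz (assumed for this sketch)
# Illustrative letter -> flicker frequency table; each target flickers uniquely.
FLICKER = {"A": 8.0, "B": 9.0, "C": 10.0}

def detect_letter(eeg: np.ndarray, fs: float = FS) -> str:
    """Return the letter whose flicker frequency carries the most spectral power."""
    spectrum = np.abs(np.fft.rfft(eeg - eeg.mean()))
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)

    def power_at(f: float) -> float:
        return spectrum[np.argmin(np.abs(freqs - f))]

    return max(FLICKER, key=lambda letter: power_at(FLICKER[letter]))

# Simulate 2 s of occipital EEG while the user gazes at "B" (9 Hz), plus noise.
t = np.arange(0, 2.0, 1.0 / FS)
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 9.0 * t) + 0.5 * rng.standard_normal(t.size)
print(detect_letter(eeg))  # → B
```

Real SSVEP decoders typically use canonical correlation analysis over multiple electrodes rather than a single-channel FFT, but the principle is the same: the gazed-at flicker frequency dominates the occipital signal.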
In fact, the principle of this system is similar to that of Hawking's infrared glasses, which captured signals from his small muscle movements. Both exploit a strong, measurable human response to letters to achieve language output. Of course, Hawking's infrared-glasses setup was more mature and typed faster.
For the "dynamic window steady-state visual evoked potential brain-computer interface system" to reach the next stage, the problem to solve is "sensitivity": brain waves are extremely sensitive and active. On the one hand, if the user's attention is not highly focused, the system has trouble locating the letter; on the other hand, seeing a letter often triggers involuntary associations. Seeing "c," for example, may bring to mind "copy" or the word "car," which also interferes with recognition. In addition, Chinese requires more conversion steps than English, and therefore more time.
Everything in place, and the model takes shape
At present, brainwave-to-language conversion in the industry consists of four major steps: sample collection, signal conversion, the virtual channel, and output.
The first step, sample collection, is easy to understand: it records the brain activity associated with different words in order to build a database for later mapping. How complete this database is directly affects the accuracy of brainwave-to-language conversion, and it must also account for different pronunciations and intonations of the same word, so building a "universal" database is very difficult.
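A minimal sketch of such a database: each word maps to a list of EEG recordings, one per repetition, so that different pronunciations and intonations of the same word are all represented. The structure and names are illustrative, not any published system's schema.

```python
from collections import defaultdict

import numpy as np

# word -> list of EEG recordings; multiple entries per word capture
# the different pronunciations and intonations of that word.
database: dict = defaultdict(list)

def record_sample(word: str, eeg: np.ndarray) -> None:
    """Store one labeled EEG recording for a spoken word."""
    database[word].append(eeg)

# Illustrative: three repetitions of the same word are stored separately.
for _ in range(3):
    record_sample("hello", np.zeros(1000))
print(len(database["hello"]))  # → 3
```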
The second step is the continuous learning of a recurrent neural network (RNN), which converts the brain's neural signals into signals directly tied to the movements of the vocal organs, such as the lips, jaw, tongue, and larynx.
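The shape of this mapping can be sketched with a minimal Elman-style RNN: a sequence of EEG feature frames goes in, and a trajectory of articulator positions (lips, jaw, tongue, larynx) comes out, one vector per time step. This is an untrained toy with assumed dimensions, not the architecture of any real decoder.

```python
import numpy as np

class ArticulatoryRNN:
    """Minimal Elman-style RNN: EEG feature frames in, articulator positions out."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.standard_normal((n_hidden, n_in)) * 0.1
        self.W_h = rng.standard_normal((n_hidden, n_hidden)) * 0.1
        self.W_out = rng.standard_normal((n_out, n_hidden)) * 0.1

    def forward(self, frames: np.ndarray) -> np.ndarray:
        h = np.zeros(self.W_h.shape[0])
        outputs = []
        for x in frames:                    # one EEG feature frame per time step
            h = np.tanh(self.W_in @ x + self.W_h @ h)
            outputs.append(self.W_out @ h)  # e.g. lip/jaw/tongue/larynx positions
        return np.stack(outputs)

# 50 frames of 16 EEG features -> 50 frames of 4 articulator positions.
rnn = ArticulatoryRNN(n_in=16, n_hidden=32, n_out=4)
traj = rnn.forward(np.random.default_rng(1).standard_normal((50, 16)))
print(traj.shape)  # → (50, 4)
```

The recurrence is what matters here: speech articulation is a continuous movement, so each output depends on the hidden state carried over from previous frames, not on one frame in isolation.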
However, training such neural networks has long been a headache for the industry, and the details involved are very complicated. A common situation at present is that although the voice output is fast, only about half of the sentences are recognized correctly.
The third step is the virtual channel: simulating the vocal-tract movements that produce different sounds, much as Chinese distinguishes the articulation points that must be mobilized, such as bilabial and labiodental sounds. In principle, if the simulated articulation movements match the way a person normally speaks, the resulting sounds should be the same. Once this step is done, the output can follow.
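The virtual-channel idea can be illustrated with a toy synthesizer: take one articulator parameter per frame (here, a hypothetical jaw-opening value), map it to a formant-like tone frequency, and concatenate short waveform bursts. Real articulatory synthesizers model the full vocal tract; every mapping and constant below is an assumption for illustration only.

```python
import numpy as np

FS = 16000  # audio sampling rate in Hz (assumed)

def synthesize(articulator_traj: np.ndarray, frame_dur: float = 0.02) -> np.ndarray:
    """Toy articulatory synthesis: map a jaw-opening value per frame to a
    formant-like frequency, then concatenate short sine bursts."""
    n = int(FS * frame_dur)
    t = np.arange(n) / FS
    audio = []
    for jaw in articulator_traj:
        freq = 300.0 + 400.0 * jaw  # more open jaw -> higher first formant (toy rule)
        audio.append(np.sin(2 * np.pi * freq * t))
    return np.concatenate(audio)

# 10 frames of a gradually opening jaw (values in [0, 1]) -> 0.2 s of audio.
wave = synthesize(np.linspace(0.0, 1.0, 10))
print(wave.shape)  # → (3200,)
```

The point of the sketch is the division of labor: the RNN stage only has to predict articulator movements, and this stage turns those movements into sound, so identical simulated articulation yields identical output audio.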