ChatGPT: The Horror Possibilities

“AI Stefanie Sun” has gone viral: how does technology reproduce a voice?

“AI Stefanie Sun” was born

  Who is the hottest Chinese singer in 2023? Not Jay Chou, nor Stefanie Sun, but “AI Stefanie Sun”.
  Recently, on major video platforms, Stefanie Sun, jokingly dubbed an “unpopular singer”, has been covering songs of every style through AI “clones”. From pop to rock to “magic” songs, there seems to be no style of music that AI Stefanie Sun cannot handle.

  “This song is so top-notch, I can listen to it on a loop all night.” Over the past two days, “Zi Fan” (fans of Sun Yanzi, Stefanie Sun’s Chinese name) who frequent Station B (Bilibili) have often voiced this sentiment about the AI covers of various classic tracks. In particular, the cover of Jay Chou’s “Hair Like Snow” has reached 1.06 million views, and “Peninsula Iron Box” and “Love in BC” have each exceeded 600,000 views. Without ever opening her mouth, she has easily taken over half of the Chinese music scene, and even longtime fans cannot quite hear the difference. Behind Stefanie Sun’s silent “capture” of the Chinese music scene is artificial intelligence.
Technology and hard work in the music industry

  In addition to “AI Stefanie Sun”, there are also “AI Jay Chou” and “AI Wang Xinling”. Their cover songs were reportedly made by many Station B uploaders (“UP masters”) using the open source project “so-vits-svc” and then uploaded.
  With current technology, it is still difficult to fully imitate a singer’s technique and style, but the timbre can be copied almost 1:1. The core technology behind AI Stefanie Sun comes mainly from an open source project called so-vits.
  As AI singers have grown popular, tutorials such as “Teach you to create your own AI Stefanie Sun” and “Let your favorite singer sing for you” have quickly appeared, and the threshold for making such songs keeps getting lower.
  Under the AI cover video of “Rainy Day”, a Station B user commented: “From now on, I will be able to hear Huang Jiaju and Leslie Cheung sing new songs.”

  The project has now iterated to version 4.0. Compared with earlier projects such as VITS, soft-vc and VISinger2, so-vits is much simpler to use. With only a few pieces of audio from the target singer, the user can train the acoustic model they want, and the generative model can then synthesize audio in the target timbre. The model preserves pitch and intonation, and it can also sing in a different language.
  It takes 4 steps to make a song sung by an AI singer: download the one-click starter package, feed in suitable dry vocals (pure human voice without accompaniment), train the acoustic model (the longer the training time, the better the effect), and finally mix and post-process the result in audio editing software. Tutorial videos are now everywhere on the Internet, and some bloggers claim to teach the whole process of AI audio production in only 3 minutes.
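  Before the training step, it can help to check how much dry-vocal audio has actually been collected. The following is a minimal sketch using only the Python standard library, not part of so-vits-svc or any starter package. The function names and the `min_total_seconds` threshold are hypothetical, and it assumes the clips are plain PCM WAV files in one folder.

```python
import wave
from pathlib import Path


def clip_seconds(path: Path) -> float:
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def summarize_dry_vocals(folder: Path, min_total_seconds: float = 600.0) -> dict:
    """Count the WAV clips in `folder` and add up their durations.

    `min_total_seconds` is an illustrative threshold only; it is not a
    value taken from so-vits-svc or from any tutorial.
    """
    durations = [clip_seconds(p) for p in sorted(folder.glob("*.wav"))]
    total = sum(durations)
    return {
        "clips": len(durations),
        "total_seconds": total,
        "enough_audio": total >= min_total_seconds,
    }
```

  Calling `summarize_dry_vocals(Path("dry_vocals"))` on a folder of clips returns the clip count, the total duration in seconds, and whether the total reaches the chosen threshold.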

  In practice, it is not that simple. There are currently two popular open source projects, so-vits-svc and RVC. Both use a model called VITS, which was originally designed to generate speech from text. After modification, the timbre features can be used directly as input without conversion into text. This makes it possible to shift the timbre of any song as a whole, not just the lyrics.
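  The idea of shifting timbre while leaving the rest of the performance alone can be pictured with a toy data model. This is a conceptual sketch only, not the actual code or architecture of so-vits-svc or RVC: the `Frame` class, its fields, and `convert_timbre` are hypothetical names, and real systems work on learned audio features rather than text labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    content: str   # stand-in for what is sung in this short slice of audio
    f0_hz: float   # pitch (fundamental frequency) of the slice
    timbre: str    # label of the voice that colors the slice


def convert_timbre(source: List[Frame], target_timbre: str) -> List[Frame]:
    """Replace only the timbre; content and pitch are copied from the source."""
    return [Frame(f.content, f.f0_hz, target_timbre) for f in source]


original = [Frame("la", 220.0, "singer_a"), Frame("li", 246.9, "singer_a")]
converted = convert_timbre(original, "singer_b")
```

  In this toy picture, `converted` keeps the same melody and syllables as `original`, and only the `timbre` field changes.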
  AI Stefanie Sun uses these technologies to extract the timbre characteristics of Stefanie Sun’s voice and then applies them to songs by other singers. The process requires some experience with the algorithms involved, as well as a large amount of data collection and experimentation.
  Rcell, one of the creators of AI Stefanie Sun, reportedly said that he and his team ran hundreds of experiments over half a year before arriving at the current best solution. They collected four of Stefanie Sun’s albums (her self-titled album, Kepler, Backlight, and It’s Time), with a total of about 100 songs as training data. They also tried timbre conversion for other singers, such as Jay Chou, JJ Lin and Faye Wong, but the results were not as good as with Stefanie Sun.
  Previously, to commemorate the 22nd anniversary of Teresa Teng’s death, the Japanese program “Golden SMA” used holographic projection technology to “resurrect” the singer who defined a generation. Fans have also trained models of Leslie Cheung, Yao Beina and other deceased singers themselves, allowing the deceased to reappear as a form of “digital life”. Through these long-lost voices, audiences can feel a distinctly human warmth amid the hustle and bustle of the Internet.
  In March of this year, singer Chen Shanni released a new song, “Teach Me How to Be Your Lover”. After listening to it, almost all fans praised her singing as being as good as ever. But a week later, Chen Shanni published a long article stating that the song had actually been sung by an “AI model”, and that even the cover of the single was generated by AIGC. During production, she put a great deal of work into training the AI to sing, and the workload was no less than, and possibly much higher than, singing in person.
  Chen Shanni also said she hopes the song will prompt everyone who cares about artistic creation to think: if the era of AI is bound to come, what a creator should care about may not be “whether we will be replaced” but “what else can we do”.
Unavoidable copyright issues

  The copyright issues raised by AI covers mainly include: Does an AI cover infringe the singer’s rights in their own voice? Does it infringe the music copyright of other singers? Does a song covered by AI have its own copyright?
  China still has no clear legal provisions or judicial precedents on these questions, and many points remain disputed and difficult to resolve.
  Abroad, there have been cases of AI “invading” the music industry. For example, a TikTok user used AI Rihanna to cover Beyoncé’s hit single “Cuff It”, which attracted the attention of Universal Music, the copyright owner of the song, and triggered an infringement lawsuit; American rock band Nirvana sued the production team of the song “Drowned in the Sun”, claiming that the team used AI technology to imitate their style; American rapper Jay-Z sued the website VocalSynthesis, claiming that the website used his voice to read “Hamlet” and other literary works, infringing his voice copyright and portrait rights.
  With the large model as a bridge, non-programmers can create exclusive AI tools, which is undoubtedly an important step towards general artificial intelligence. However, when the application threshold of AI in music, painting and other fields is gradually lowered, corresponding copyright issues are bound to come one after another.

  Interestingly, the developer of the so-vits-svc model has deleted the repository from the source code hosting platform GitHub, saying that the main reason for the deletion is that the project is no longer maintained or updated. The developer also restated the project’s disclaimer, emphasizing that it is an open source, offline project. Its members and contributors have no control over the project and do not know for what purpose or in what way users apply it. Therefore, they bear no connection to any AI model trained with the project or to any audio synthesized with it.
  On May 9, Douyin also released a platform specification and industry initiative on AI-generated content. It states that when creators, anchors, users, merchants, advertisers and other participants in the platform ecosystem apply generative AI technology on Douyin, publishers should clearly label AI-generated content to help other users distinguish the virtual from the real, especially in scenarios that are easily confused; publishers are responsible for the consequences of AI-generated content, no matter how the content was generated; virtual avatars must be registered on the platform, and users of avatar technology must complete real-name authentication; and using generative AI technology to create and publish infringing content is prohibited, including but not limited to content that infringes portrait rights or intellectual property rights. Once such content is found, the platform will punish it strictly.
  A voice produced by AI technology may strike you as novel, but it may also confuse you. When listening to such songs, it is best not to forget the real singer, the real song, and the real music.

