
Google Releases New AI Video Generation Model W.A.L.T as Controversy Surrounds Its Gemini Model; Competition Heats Up in AI Video Space

Just a week after releasing its latest AI model, Gemini, Google has announced its newest AI research results.
On December 12, Google announced that it had partnered with Li Feifei, one of the world's top computer vision experts and known as the "Chinese AI godmother," and her student team to launch the AI video generation model W.A.L.T (short for Window Attention Latent Transformer).
Like Pika 1.0, which was developed by the daughter of the chairman of A-share-listed Xinyada, W.A.L.T is an AI video generation model.
Earlier on the evening of December 6, Google released its latest generation of multimodal AI model Gemini, and simultaneously released a demonstration video.
However, shortly after Gemini was released, it emerged that its demonstration video had been edited to deliberately embellish the model's performance, and Google found itself facing accusations of faking the demo.
Just six days later, Google has turned to AI video generation, one of the hottest areas for AI applications today, with the release of W.A.L.T.

Joining hands with the "Chinese AI godmother": Google's text-to-video AI resembles the previously popular Pika 1.0

Like Pika 1.0, W.A.L.T supports text-to-video, image-to-video, 3D video generation, and other functions.
In terms of video quality, according to the demonstration video and the accompanying paper, W.A.L.T can generate a 3-second video at 8 frames per second with a resolution of 512 x 896 from natural language prompts.

Industry insiders say W.A.L.T's results are "much better than Pika 1.0, with very good clarity and motion."
Interestingly, Guo Wenjing, the founder of Pika and daughter of Xinyada's chairman, actually has quite a few connections to Li Feifei.
Before dropping out of school to start her business, Guo Wenjing was studying for her doctorate at the Stanford University AI Laboratory (NLP & Graphics), where Li Feifei, Stanford University's first Sequoia Chair Professor, also worked.
Compared with Guo Wenjing, a rising star, Li Feifei can be regarded as a founding figure and technical leader of global computer vision. She is also a talent sought after by technology companies around the world, including Google.
According to public information, Li Feifei was born in Beijing in 1976 and grew up in Chengdu. In 1992, at the age of 16, Li Feifei went to settle in the United States with her parents and entered Princeton University three years later to study physics.
In her later career, Li Feifei gradually developed an interest in AI research and shifted her focus to computer vision, which was a very unpopular field at the time. In 2007, despite a shortage of funds, Li Feifei started her first major project, ImageNet, a dataset for teaching machines to recognize images.
At that time, AI image recognition models could only recognize four types of objects: cars, planes, leopards, and faces, because researchers had trained models on only those categories. To make an AI recognize an object, you had to manually label the target in a picture and then "feed" a large number of such pictures to the AI for training.
Li Feifei's idea was that a sufficiently large labeled dataset could, in theory, train an "omniscient" computer vision model.
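To make that workflow concrete, the minimal sketch below shows, in present-day PyTorch rather than the tooling of the ImageNet era, how a folder of manually labeled pictures is "fed" to a classifier until it learns the labeled categories; the directory name, image size, and model choice are illustrative assumptions, not details from the article.

```python
# Minimal sketch of supervised image-classification training on manually labeled data.
# "labeled_images/" is a hypothetical folder with one subfolder per category.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# ImageFolder expects images grouped into one folder per label,
# i.e. the manually marked examples described in the text.
train_set = datasets.ImageFolder("labeled_images/", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(num_classes=len(train_set.classes))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for images, labels in loader:              # each batch: pictures plus their labels
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)  # compare predictions with the labels
        loss.backward()                        # learn from the labeled examples
        optimizer.step()
```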
ImageNet was officially released in 2009 and quickly became the go-to repository for training and testing almost all vision models. Li Feifei rose to fame overnight, earning titles such as "Chinese AI Godmother." ImageNet remains one of the best-known large vision databases in the global AI industry.
Whether it is two models in one week or the collaboration with Li Feifei's team, Google is clearly pushing hard on multimodal AI model development.
AI video's "clash of the titans": what do domestic players think?
The AI video generation space has been very lively recently. Besides Pika 1.0 and W.A.L.T, many other AI video generation tools have emerged or been updated in quick succession.
For example, in early November, Runway, an American generative AI unicorn, updated its self-developed video generation model Gen-2 to improve the fidelity and consistency of its results.
In mid-November, Meta, a tech giant that started with social products, released its Emu Video model.
At the end of November, text-to-image startup Stability AI launched a video generation model called Stable Video Diffusion, which comes in two versions: SVD and SVD-XT.

On the domestic side, major technology companies such as ByteDance, Alibaba, and Baidu have already entered the market.
Among them, ByteDance launched PixelDance, a text-to-video model, on November 18, proposing a generation approach based on text guidance plus first- and last-frame image guidance, which makes the resulting videos more dynamic.
Soon after, Alibaba launched the Animate Anyone model: the user only needs to provide a static character image and some preset actions (or pose sequences), and the model can produce an animated video of that character.
According to earlier public information, similar features of Baidu's Wenxin large model are in internal testing and will soon be opened up in the form of plug-ins.
The enthusiasm of players at home and abroad shows, to some extent, that AI video generation is likely to be the next growth direction in this round of AI technology upgrades. Jim Fan, a senior research scientist at NVIDIA who previously worked at OpenAI, wrote on social media: "2022 is the year of images, 2023 is the year of sound waves, and 2024 (will be) the year of video!"
CITIC Securities Research pointed out: "With reference to the application of text-to-image in advertising, text-to-video is also expected to drive a productivity revolution, reduce production costs, build barriers to entry, and accelerate the industrialization of AIGC technology. We believe that, in terms of capability, text-to-video is expected to take the lead in short video and animation."
However, the other side of technological innovation is its impact on existing businesses.
Leo, who works for a video creation tool company in China, told City: "Earlier this year we thought AIGC was mainly for graphic creation, and that it would take another year or two to meet commercial video requirements." He added that the commercial video requirements here include things like the consistency and continuity of objects across storyboard shots.
Now it seems that video generation tools are iterating at several times the expected pace. Under the pressure of these technological advances, existing market players have had to proactively plan for automated generation features, or risk being abandoned by the times.
