
MosaicML Acquired for $1.3B, Major AI Startup Deals Kick Off
Recently, big data giant Databricks announced the acquisition of generative AI start-up MosaicML for US$1.3 billion (approximately RMB 9.4 billion). The deal, struck in Silicon Valley, is the largest acquisition announced in the generative AI field so far this year and has attracted great attention from the industry.
MosaicML was founded in San Francisco in 2021 and closed its first funding round shortly afterward, with participation from well-known venture capital firms including DCVC, Lux Capital and Future Ventures. In total, MosaicML has raised US$37 million.
At the time of its first funding round, MosaicML was valued at US$220 million. In this acquisition its valuation jumped to nearly six times that figure, which surprised the industry.
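The "nearly six times" figure can be checked with simple arithmetic. The sketch below uses only the numbers quoted in this article; the constant and function names are illustrative, and the exchange rate it prints is merely the one implied by the article's own USD and RMB figures, not an official rate.

```python
# Figures quoted in the article.
ACQUISITION_PRICE_USD = 1_300_000_000   # price Databricks agreed to pay
PRIOR_VALUATION_USD = 220_000_000       # valuation at the first funding round
ACQUISITION_PRICE_RMB = 9_400_000_000   # approximate RMB figure in the article


def valuation_multiple(new_value: float, old_value: float) -> float:
    """Return how many times larger new_value is than old_value."""
    if old_value <= 0:
        raise ValueError("old_value must be positive")
    return new_value / old_value


multiple = valuation_multiple(ACQUISITION_PRICE_USD, PRIOR_VALUATION_USD)
implied_rate = ACQUISITION_PRICE_RMB / ACQUISITION_PRICE_USD

print(f"Valuation multiple: {multiple:.1f}x")        # 5.9x
print(f"Implied RMB per USD: {implied_rate:.2f}")    # 7.23
```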
How did a generative AI start-up founded barely two years ago command such a high valuation? What are its signature strengths?
01
An AI model service company
High-quality products at low prices
According to public information, MosaicML’s product portfolio includes the open source, commercially licensed MPT Foundation series of models, along with the MosaicML inference and training services.
Its open source deep learning library, MosaicML Composer, provides 20 methods for computer vision and natural language processing, including models, datasets and benchmarks. MosaicML Explorer helps developers explore the trade-offs among time, performance and cost across different cloud services and hardware options, making it easier to evaluate implementation choices. The MosaicML AI development platform offers cost-effective model deployment and custom training while keeping data secure and leaving ownership of the model with the user.
Notably, the MPT foundation model series is a family of open source, commercially usable large language models from MosaicML that users can build their own generative AI applications on.
The series includes two models, MPT-7B and MPT-30B, with 7 billion and 30 billion parameters respectively.
MPT-7B is a ChatGPT-like open source large language model that MosaicML released on May 5 this year. It was trained on the MosaicML platform in 9.5 days with zero manual intervention, at a cost of only US$200,000. Its advantages include a license that permits commercial use, high performance, low resource consumption, training on 1T tokens of data, and code generation ability.
Well-known organizations such as AI2, Generally Intelligence, Hippocratic AI, Replit, and Scatter Labs use MPT-7B to develop a variety of generative AI products.
To date, the open source MPT-7B has been downloaded more than 3 million times. According to the acquirer, Databricks, this was one of the important reasons for acquiring MosaicML.
The other model, MPT-30B, also drew industry attention and proved popular after its launch. Its training cost is far lower than that of competing models, which is expected to bring AI models to a wider range of fields and gradually push training costs down.
MosaicML CEO and co-founder Naveen Rao said that the training cost of MPT-30B is only US$700,000, which is far lower than the tens of millions of US dollars required for similar products such as GPT-3. The model can be trained more quickly due to its low cost and small size, and is more suitable for deployment on local hardware.
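To put the two quoted training budgets side by side, the sketch below divides each cost by the model's parameter count. It uses only the figures given in this article (US$200,000 for the 7-billion-parameter MPT-7B and US$700,000 for the 30-billion-parameter MPT-30B). Cost per parameter is a rough yardstick chosen here for illustration, not a metric MosaicML reports, and it ignores differences in training data volume and hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """One training run, described by the figures quoted in the article."""
    name: str
    params_billions: float
    cost_usd: float

    def cost_per_billion_params(self) -> float:
        """Training cost in USD divided by parameter count in billions."""
        return self.cost_usd / self.params_billions


runs = [
    TrainingRun("MPT-7B", 7, 200_000),
    TrainingRun("MPT-30B", 30, 700_000),
]

for run in runs:
    per_billion = run.cost_per_billion_params()
    print(f"{run.name}: ${per_billion:,.0f} per billion parameters")
```

On these numbers MPT-30B comes out slightly cheaper per billion parameters than MPT-7B, although the comparison is only as reliable as the quoted round figures.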
MosaicML also said that it trained MPT-30B over two months. For pre-training it used a data mix: 1T tokens of pre-training data were collected from 10 different open source text corpora, the text was tokenized with the EleutherAI GPT-NeoX-20B tokenizer, and samples were drawn from each corpus according to a set mixing ratio.
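The idea of sampling from several corpora according to a mixing ratio can be sketched in a few lines. This is a minimal illustration, not MosaicML's pipeline: the corpus names and weights are hypothetical placeholders (the article does not list the real corpora or their ratios), and the function simply draws a source for each training sample with probability proportional to its weight.

```python
import random

# Hypothetical corpora and mixing weights, for illustration only.
# The real MPT-30B mix used 10 corpora whose ratios are not given here.
MIX_WEIGHTS = {"corpus_a": 0.5, "corpus_b": 0.3, "corpus_c": 0.2}


def sample_sources(weights: dict[str, float], n_draws: int, seed: int = 0) -> dict[str, int]:
    """Pick a source corpus for each of n_draws samples, proportional to weight.

    Returns a mapping from corpus name to the number of samples drawn from it.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    names = list(weights)
    probs = [weights[name] for name in names]
    draws = rng.choices(names, weights=probs, k=n_draws)
    return {name: draws.count(name) for name in names}


counts = sample_sources(MIX_WEIGHTS, n_draws=10_000)
for name, count in counts.items():
    print(f"{name}: {count / 10_000:.1%} of samples (target {MIX_WEIGHTS[name]:.0%})")
```

With enough draws, the observed share of each corpus approaches its target weight, which is how a mixing ratio shapes what the model sees during pre-training.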
Developers can download the open source MPT-30B base model from Hugging Face and fine-tune it with their own data on local hardware.
MosaicML also stated that scaling the model to 30 billion parameters is only the first step, and that it will go on to launch larger, higher-quality models at lower cost.
Another highlight is MosaicML Inference, the enterprise-oriented inference service launched this year.
“Several startups are already using MosaicML’s models and tools to build natural language front-ends and search systems,” said Naveen Rao, CEO and co-founder of MosaicML. “MosaicML allows enterprises to use the company’s model architecture to train models on their own data, and then deploy models through its inference API. If customers train a model, they can rest assured that they own all iterations of the model and that the model is theirs. The cost is 4 times lower than using an LLM from OpenAI, and image generation is 15 times cheaper than OpenAI’s DALL-E 2.”
“We want as many people as possible to understand and use this technology, and that’s our goal. It’s not exclusive. It’s not elitism,” Naveen Rao added.
02
Founded by ex-Intel executives
MosaicML started from a high baseline
That MosaicML, a start-up, has launched one hit product after another has a lot to do with its founders.
MosaicML was founded by Naveen Rao, a former AI product leader at Intel and co-founder of Nervana Systems, and Hanlin Tang, a former senior director at Intel AI Labs.
Naveen Rao graduated from Duke University in 1997, majoring in computer science, and later obtained a Ph.D. in neuroscience from Brown University. He has long been committed to the study and development of neural networks for artificial intelligence. He worked as a researcher on neuromorphic machines at Qualcomm and founded the artificial intelligence company Nervana Systems in 2014. Intel acquired the company for US$408 million in 2016.
Hanlin Tang spent his youth in Taipei. He received a bachelor’s degree in physics from Princeton University and then a doctorate in biophysics from Harvard University, where he studied recurrent neural networks in human vision. After joining Intel, he served as a senior director at Intel AI Labs, where he was responsible for algorithm engineering and deep learning research and took part in developing the MLPerf benchmarks.
Hanlin Tang has published a number of papers in top international journals and conferences, covering areas such as computational neuroscience, computer vision, natural language processing, and reinforcement learning.
Another notable team member is MosaicML chief scientist Jonathan Frankle, a postdoctoral fellow at the MIT Computer Science and Artificial Intelligence Laboratory and an affiliated faculty member at the Harvard Kempner Institute. His research focuses on the learning dynamics and training algorithms of neural networks, with the aim of making large language models (LLMs) more efficient while reducing training costs. This research direction is also MosaicML’s core competitive strength. It is fair to say that Jonathan Frankle is a key figure behind MosaicML’s RMB 9.4 billion sale price.
Naveen Rao and Hanlin Tang came to Intel through Nervana Systems, which had developed Neon, a high-performance deep learning framework, and later launched the Nervana Cloud deep learning platform and the Nervana Engine, a dedicated hardware accelerator. Intel considered these products highly valuable and acquired the company. Naveen Rao and Hanlin Tang joined Intel with it, one as head of the AI product group and the other as senior director of the AI lab.
However, in 2020, Intel announced that it would abandon the Nervana server-side AI accelerator chips it had planned and rely instead on the products of Habana, the Israeli company it acquired for US$2 billion.
After Intel decided to “abandon” Nervana, Naveen Rao and Hanlin Tang, both former core Nervana employees, left Intel together and went on to found today’s MosaicML. According to LinkedIn, Hanlin Tang is currently the CTO of MosaicML.
03
Databricks acquires MosaicML
Strong alliance?
Databricks acquired MosaicML not only for its commercial value, but also because the two companies can join forces to pursue technological breakthroughs and step up their push into large AI models.
Let’s first take a look at the acquirer. Databricks is a giant in data storage and analytics, co-founded by several creators of the Spark big data processing system at the AMP Laboratory of the University of California, Berkeley. Its customers range from small businesses to large enterprises across many industries. As of March 2023, it had more than 9,000 corporate customers worldwide, including AT&T, Shell, Burberry, Toyota, Walgreens, Adobe, Condé Nast, and Regeneron Pharmaceuticals.
In 2021, Databricks raised a US$1.6 billion Series H round led by Counterpoint Global, a subsidiary of Morgan Stanley. On April 18, 2023, Databricks was named to the “2023 Hurun Global Unicorn List” with a valuation of US$29.8 billion, ranking seventh.
Industry experts said that after the acquisition, MosaicML will become part of the Databricks Lakehouse platform, and its entire team and technology will be folded into Databricks. The combination is meant to give companies a unified platform for managing data assets and to help Databricks develop its generative AI technology, while enabling customers to use their own proprietary data to build, own and protect their own generative AI models.
Ali Ghodsi, CEO of Databricks, also said that the acquisition of MosaicML will further enhance Databricks’ data analysis platform.
The mainstream view is that Databricks acquired MosaicML to step up its push into large AI models, since MosaicML is recognized for its cutting-edge MPT large language models. MPT-7B and MPT-30B are both hit products released this year, with downloads exceeding one million.
Notably, MosaicML’s automatic optimization of model training makes training 2 to 7 times faster than standard methods, and its near-linear scaling with added resources allows multi-billion-parameter models to be trained in a few hours.
With their joint product, Databricks and MosaicML aim to reduce the cost of training and using LLMs from millions to thousands of dollars.
Databricks is evidently stepping up its push into large AI models to challenge the market position of big companies such as OpenAI, Microsoft, and Google, and to offer the industry new choices.
However, there are also opposing views. Some argue that the value proposition of integrating LLMs into Databricks is unclear, because Databricks focuses on its Lakehouse and mainly uses Spark to process data on large clusters. Others in the industry believe that Databricks is riding the current enthusiasm for large models to generate hype, that the acquisition will bring no obvious technological breakthrough, and that Databricks will abandon MosaicML sooner or later.
Whether the acquisition can deliver technological breakthroughs as well as commercial value remains to be seen.
Financial experts believe that, with the acquisition of MosaicML, the AI unicorn may be laying the groundwork for an IPO.
04
A wave of large AI model mergers and acquisitions kicks off
The birth of ChatGPT at the end of last year raised the curtain on the AI race. Half a year later, a wave of AI mergers and acquisitions has followed.
The reason is simple. After a period of unchecked growth in generative AI, large enterprises have made some progress but have also discovered gaps in their existing technology and talent. AI start-ups tend to be more specialized, with both talent and technology, but they face problems such as insufficient funding and scarce resources. A wave of AI mergers and acquisitions is therefore inevitable, and it is positive for the industry as a whole.
In addition to the acquisition of MosaicML by Databricks described in this article, in May of this year, cloud computing giant Snowflake announced the acquisition of Neeva, a generative AI search startup founded by two former Google employees. Industry experts believe that this acquisition will enable Snowflake to take cutting-edge search technology and inject it into the data cloud to meet the needs of customers, partners and developers.
Notably, members of Neeva’s leadership team were instrumental in creating products such as YouTube Monetization and Google’s Search Ads. Barring surprises, the acquisition should take search and conversational experiences in Snowflake to the next level. The purchase price was not disclosed.
On June 26, Thomson Reuters, the world’s largest provider of professional information services, announced that it had acquired AI start-up Casetext for US$650 million in cash. The company’s main business is to provide AI assistant services for legal professionals.
According to public information, Casetext has 104 employees, and its clients include more than 10,000 law firms and corporate legal departments. Its main product, CoCounsel, is a legal assistant powered by GPT-4 that launched this year. The acquisition will complement Thomson Reuters’ existing AI roadmap.
On June 29, AI start-up Inflection announced that it had raised US$1.3 billion in a round led by investors including Microsoft and Nvidia, bringing its total funding to US$1.525 billion.
Turning to China’s AI market, on June 29 Meituan announced that it had completed the acquisition of 100% of the equity in the domestic and overseas entities of Light Years Away, at a cost of RMB 2.065 billion.
Regarding this acquisition, Meituan stated in the announcement that Light Years Away is a leading AGI innovator in China, and that its current management and technical team has high-level experience in developing deep learning frameworks. Through the acquisition, the company can obtain leading AGI technology and talent and gain the opportunity to strengthen its competitiveness in the fast-growing artificial intelligence industry.
Meituan said that after the merger is complete, it will support the Light Years Away team in continuing its research and exploration in large AI models.
Coincidentally, on June 16 this year, Kunlun Wanwei announced that its holding subsidiary, Star Group, plans to issue shares to acquire the entire equity of Singularity AI.
Singularity AI is committed to the realization of general artificial intelligence. It is currently focusing on the research and development of large-scale natural language pre-training models and developer APIs. Its main products and services include general developer APIs, chatbots, and knowledge extraction.
The wave of AI mergers and acquisitions abroad sends a strong signal to the industry: AIGC development overseas is moving up a level, and disruptive innovation may appear at any time, whether in technology, business models, application scenarios or commercialization.
However, we must also recognize that China is the only country outside the US with a complete AIGC industrial chain. In this fourth industrial revolution, no one can afford to retreat.

