How the US Continues to Lead in Developing Powerful Large Language Models and Their Growing Industrial Applications

Since the beginning of the year, OpenAI’s ChatGPT has set off a global craze for large AI models. But large AI models in the United States extend far beyond ChatGPT.

Explosive growth

According to various data, although China is developing rapidly, the United States is still the country that releases the most large models in the world. By May 2023, it had released more than 100 foundation models with over 1 billion parameters.

The Economist reports that total investment in large models in the United States reached US$47.4 billion in 2022, about 3.5 times that of second-placed China (US$13.4 billion), and it continues to surge. Goldman Sachs further predicts that U.S. investment in large models will reach hundreds of billions of dollars in 2025, about half of the world’s total.
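The "about 3.5 times" comparison can be checked directly from the two figures quoted above. The short sketch below is only an illustration: the constant and function names are hypothetical, and the inputs are the article's numbers, not independently verified data.

```python
# Investment figures as quoted in the article (billions of US dollars, 2022).
# Names below are illustrative, not from any official data source.
US_INVESTMENT_BN = 47.4
CHINA_INVESTMENT_BN = 13.4


def investment_ratio(leader_bn: float, runner_up_bn: float) -> float:
    """Return how many times larger the leader's investment is."""
    return leader_bn / runner_up_bn


ratio = investment_ratio(US_INVESTMENT_BN, CHINA_INVESTMENT_BN)
print(f"US / China investment ratio: {ratio:.2f}x")
```

Rounded to one decimal place, the ratio matches the "about 3.5 times" cited in the text.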

A survey by Goldman Sachs shows that 16% of Russell 3000 companies mentioned large models in their 2023 earnings calls, and its economists estimate that large models will raise overall labor productivity by 1% within ten years and bring about 14% growth for the benchmark S&P 500 Index.

Beyond OpenAI and its ChatGPT, the representative general-purpose model companies in the United States now include Anthropic, Cohere, and Google.

Among them, Anthropic, founded in 2021 by former OpenAI executives Dario and Daniela Amodei, is currently valued at US$30 billion, making it the general-purpose large-model company second only to OpenAI (valued at approximately US$86 billion).

Anthropic employs many former core OpenAI staff who took part in developing GPT-2 and GPT-3. Its large model Claude 2 is widely considered second only to GPT-4, and some analysts even believe that Claude 2 outperforms GPT-4.

For example, Claude 2 can handle inputs of up to about 75,000 words, while ChatGPT handles about 3,000. This means Claude 2 can process and output more complex content and can be applied to more challenging tasks, such as generating long texts of several thousand words.

What makes Claude 2 even more popular is that it is open to the public for free, whereas GPT-4 requires payment.

Anthropic’s excellent founding team and strong product performance have made it highly sought after by investors. Google, Amazon, and SK Telecom (SKT), one of South Korea’s largest mobile operators, have all become investors, with Amazon alone investing up to US$4 billion.

In addition to Anthropic, another company worth noting is Cohere.

In June this year, Cohere, founded in 2019, received US$270 million in investment from NVIDIA, Oracle, Salesforce Ventures, and others, becoming a unicorn with a valuation of US$2 billion. Among foundation-model companies, its valuation is second only to those of OpenAI and Anthropic.

Cohere has also attracted much attention in the industry for its strong founding team. One of its founders, Aidan Gomez, is the youngest author of “Attention is All You Need,” the seminal paper in the field of large language models. That paper first proposed the famous Transformer architecture, which became the foundation for general-purpose large models; ChatGPT was built on this architecture.

▲ Coral, the first generative AI application launched by Cohere

Cohere’s products are similar to OpenAI’s, but the company saw a market opportunity in “data privacy” and positioned itself differently from OpenAI. It chose the business-to-business (B2B) market and has firmly followed the route of commercial large models. Its core product capabilities fall into three categories: text retrieval, text generation, and text classification. It focuses on customer needs, emphasizing security, privacy, and customized services.

Another big selling point of Cohere is that it is not tied to any single cloud platform, which helps ensure the privacy and security of data. It offers flexible options for storage and data privacy protection, allowing users to deploy locally to meet different regions’ requirements for storing customer data.

Cohere’s ability to pivot quickly and find its own differentiated positioning is inseparable from the distinctive approach to talent and entrepreneurial philosophy of Aidan and his co-founders.

Aidan once said that Cohere looks for people with different backgrounds who are deeply interested in and ambitious about AI. A candidate does not necessarily need an impressive resume from a big company, but must have great interest in and enthusiasm for their chosen field, and must not only be able to write papers but also have practical skills.

A differentiated product strategy and a distinctive team background make Cohere a breath of fresh air in the field of general-purpose large models.

Recently, Cohere released what it describes as the world’s first publicly available multilingual understanding model, which is trained on real data from native speakers and can read and understand more than 100 of the world’s most commonly used languages.

Let’s look at the giant Google.

On December 6, Google DeepMind launched the multimodal AI model Gemini, which can learn and understand across multiple modalities at once, including text, images, video, and program code.

Take customer service bots as an example. A bot built on Gemini can understand customers not only from the literal meaning of the conversation but also from the intent conveyed by their expressions and tone, and it can process audio, code, images, video, and other content.

According to test results, Gemini is the first model to surpass human experts on the Massive Multitask Language Understanding (MMLU) benchmark, and it exceeded GPT-4 on 30 of 32 AI benchmarks.

With its powerful performance, Gemini quickly broke into the mainstream and created huge buzz for Google’s parent company, Alphabet. On December 7, Alphabet’s share price rose 5.31% to close at US$136.93, bringing its total market value to US$1.72 trillion. Google plans to gradually integrate the model into its search, advertising, and other services.

But when it comes to American large models, what deserves more attention is their progress in industry applications and their future potential.

Accelerating industrial adoption

The “2023 Artificial Intelligence Index Report” released by Stanford University shows that in 2022, of the 35 large models in the United States, only 3 came from laboratories, while 32 were born in industry. This year, the trend has continued.

On March 30, 2023, while the outside world was still caught up in the excitement over general-purpose large models, Bloomberg single-handedly turned everyone’s attention to a new track: industry-specific models. That day, it announced that it had built the largest financial-domain dataset to date and used it to train BloombergGPT, a 50-billion-parameter large language model (LLM) built specifically for finance.

Billed as the world’s first large financial model, BloombergGPT drew on Bloomberg’s extensive financial data sources to build a dataset of 363 billion tokens. According to an analysis by Gaojin Think Tank, it can greatly improve the efficiency and stability of financial institutions’ work and help them reduce costs.

In terms of cost reduction, BloombergGPT can reduce the staffing needed for investment research, R&D programming, risk control, and process management. In terms of efficiency, it can automatically generate high-quality financial reports, financial analysis reports, and prospectuses based on given topics and contexts, while assisting with accounting and auditing work. It can also distill and organize financial news and financial information, freeing professionals for areas that require more human expertise.

Tianfeng Securities pointed out in a report that because BloombergGPT is trained on a more specialized corpus than ChatGPT, it will show stronger capabilities than general-purpose large models in financial scenarios, which in turn marks the beginning of the GPT revolution in the financial field.

BloombergGPT is just one typical case. At present, there are three distinct “schools” of large financial models in the United States. The first is independent, full-stack in-house development, emphasizing autonomy and control. The second is fine-tuning someone else’s base model with one’s own data and scenarios to form a large financial model suited to one’s own needs. The third is calling models from the cloud, accessing various large-model APIs on demand for private deployment. Small and medium-sized financial companies with weak technological foundations mostly use this third method.

According to relevant statistics, U.S. financial AI accounts for approximately 6.7% of overall financing in the AI field.

The medical industry is another hot spot for the application of large-scale models in the United States. Technology giants such as Google and Microsoft, medical technology companies such as Sensely and Enlitic, biomedical start-ups such as AbSci and Exscientia, and CXO (pharmaceutical outsourcing) companies such as Senius, are all involved.

New drug research and development services such as compound synthesis and target discovery, as well as hospital diagnosis and treatment services such as electronic medical records and assisted consultations, are common scenarios for the application of large medical models in the United States. Medical devices such as CT (computed tomography) and MRI (magnetic resonance imaging) scanners are also being further enhanced with the help of large models.

Among the many large medical models, Google’s Med-PaLM 2 is the focus of attention. It is the first large model to reach the level of an “expert” test-taker on the MedQA dataset of United States Medical Licensing Examination (USMLE)-style questions, with an accuracy of over 85%. It is also the first artificial intelligence system to achieve a passing score on the MedMCQA dataset, which includes questions from India’s AIIMS and NEET medical exams, scoring 72.3%.

Med-PaLM 2 is also having a transformative impact on the industry.

With Med-PaLM 2, large-scale biomedical data can be analyzed to discover disease-related genes, proteins, and metabolic pathways, identify potential targets, and help screen potentially active drug molecules, thereby narrowing the pool of candidate drugs and prioritizing compounds with higher activity for subsequent experimental verification. The time-consuming development of new drugs can therefore have a shorter cycle and lower costs.

The success of Med-PaLM 2 has also spurred Google to invest more in the field of large medical models.

For example, it cooperated with the medical software company Epic to develop a tool based on ChatGPT that can automatically send professional medical information to patients. Google’s partner, the care provider Carbon Health, also launched Carby, an AI tool based on GPT-4 that can automatically generate diagnostic records from conversations between doctors and patients, greatly improving doctors’ efficiency and the diagnostic experience. At present, Carby is used by more than 130 clinics and more than 600 medical staff. A clinic in San Francisco said that after it began using Carby, its number of patients increased by 30%.

In addition to Google, AI chip giant NVIDIA has also been active in the field of large medical models for many years.

In 2021, NVIDIA announced a strategic partnership with Schrodinger (a US medical information technology company) to improve the speed and accuracy of Schrodinger’s computing platform, enabling rapid and accurate assessment and accelerating the development of new treatments.

In September 2022, NVIDIA released BioNeMo, a framework for training and deploying large biomolecular language models at supercomputing scale, to help scientists better understand diseases and find optimal treatments. BioNeMo also provides cloud API services that support pre-trained AI models. In July this year, NVIDIA invested another US$50 million in the biotechnology company Recursion to support the development and training of foundation AI models in biology and chemistry.

Education is another important scenario for the application of large models in the United States. Core applications are concentrated at three levels: language learning, online courses, and assisted learning. A landmark case is the AI teaching assistant Khanmigo, released in April by the American online education organization Khan Academy. Built on the GPT-4 model, it offers tutoring, lesson plan generation, writing training, and programming exercises.

Khan Academy now operates the service commercially, priced at US$9/month or US$99/year. In its tutoring role, Khanmigo provides one-to-one tutoring for students: it takes the initiative to explain how to approach a problem and guides students through the reasoning until they work out the correct answer themselves. Khanmigo can also serve as a writing instructor, prompting students and suggesting different entry points for writing, debating, and similar exercises based on specific details such as character traits and story background, to unleash students’ creativity.

With powerful intent understanding and natural language communication capabilities, as well as text and image generation capabilities, Khanmigo can understand students, give them personalized learning suggestions, and significantly improve the supply of teaching materials, including courseware that is both educational and entertaining and rich extracurricular materials. This makes individually tailored education, the so-called “thousands of people, thousands of faces,” possible and is also having an important impact on the industry.

Taken together, American large models are still accelerating their integration and development with industry, and a new industrial revolution is taking place as a result.
