
Magic's Ultra-Long Context Models: Revolutionizing Software Development with 100M Token Context Windows

Company Size
Large Corporate
Region
  • America
Country
  • United States
Product
  • LTM-2-mini
  • Magic-G4
  • Magic-G5
Tech Stack
  • Custom CUDA
  • NVIDIA H100 Tensor Core GPUs
  • NVIDIA GB200 NVL72
Implementation Scale
  • Enterprise-wide Deployment
Impact Metrics
  • Digital Expertise
  • Innovation Output
Technology
  • Analytics & Modeling - Machine Learning
  • Analytics & Modeling - Predictive Analytics
Applicable Industries
  • Software
Applicable Functions
  • Product Research & Development
Services
  • Software Design & Engineering Services
  • System Integration
About the Customer
Magic is a company focused on advancing AI technology, particularly in the domain of software development. They are pioneering the use of ultra-long context models, which can handle up to 100 million tokens of context during inference. This capability allows for more effective code synthesis and reasoning, as the models can consider a vast amount of information, including code, documentation, and libraries, that are not publicly available on the internet. Magic is committed to pushing the boundaries of AI by developing models that can perform complex tasks with minimal human intervention. They are also focused on building supercomputers to support their AI models, partnering with Google Cloud to leverage NVIDIA's advanced GPU technology. With significant funding and a dedicated team, Magic aims to revolutionize the way AI models are trained and deployed, emphasizing the importance of inference-time compute as the next frontier in AI development.
The Challenge
The central challenge in the AI field has been the limited context window available during inference, which restricts models' ability to learn and reason effectively. Traditional models rely heavily on training because of these short context windows, which limits their ability to synthesize code and perform complex reasoning tasks. Current evaluation methods for long-context models, such as the Needle In A Haystack eval, have inherent flaws that allow models to perform well without truly understanding or storing large amounts of information: they often provide semantic hints that make it easier for models to retrieve information, so they do not accurately reflect real-world tasks. Additionally, the memory and computational requirements for handling ultra-long context windows are significant, posing a challenge for scaling and practical application.
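To make that flaw concrete, below is a minimal sketch of how a traditional Needle In A Haystack prompt is constructed. The filler sentence, needle wording, and question are illustrative assumptions rather than the actual benchmark text; the point is that the needle is semantically unlike everything around it, so retrieving it does not prove the model can store and index the full context.

```python
# Minimal sketch of a traditional Needle In A Haystack prompt (illustrative
# filler, needle, and question -- not the actual benchmark text).
FILLER_SENTENCE = "The city council met on Tuesday to discuss the new park budget. "
NEEDLE = "The secret passphrase for the launch is 'blue-falcon-7'. "
QUESTION = "What is the secret passphrase for the launch?"

def build_niah_prompt(num_filler: int = 10_000, needle_position: float = 0.5) -> str:
    """Insert one semantically distinctive 'needle' into repetitive filler.

    Because the needle is the only sentence about a passphrase, a model can
    retrieve it by spotting the unusual topic rather than by genuinely
    storing and indexing the whole context -- the weakness described above.
    """
    filler = [FILLER_SENTENCE] * num_filler
    insert_at = int(len(filler) * needle_position)
    filler.insert(insert_at, NEEDLE)
    return "".join(filler) + "\n\n" + QUESTION

if __name__ == "__main__":
    print(build_niah_prompt(num_filler=100)[:400], "...")
```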
The Solution
Magic has developed ultra-long context models, such as LTM-2-mini, which can handle up to 100 million tokens of context. This allows the models to perform more complex reasoning and code synthesis tasks by considering a vast amount of information during inference. To address the flaws in current evaluation methods, Magic designed HashHop, a new evaluation method that eliminates semantic hints and requires models to store and retrieve the maximum possible information content. This method prompts models with hash pairs and asks them to complete a chain of hashes, testing their ability to attend to and jump across multiple points in the context. Magic has also partnered with Google Cloud to build two supercomputers, Magic-G4 and Magic-G5, powered by NVIDIA's advanced GPUs, to support the training and deployment of their models. With significant funding and a focus on innovation, Magic is committed to advancing AI technology and setting a higher standard for AI safety and cybersecurity.
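A minimal sketch of how a HashHop-style prompt could be generated is shown below. The hash length, chain depth, number of chains, and prompt wording are illustrative assumptions, not Magic's published implementation; the key property is that the pairs are random and shuffled, so there are no semantic or positional hints to exploit.

```python
import random

HEX = "0123456789abcdef"

def random_hash(length: int = 16) -> str:
    """Return a random hex string with no semantic content."""
    return "".join(random.choices(HEX, k=length))

def build_hashhop_prompt(num_chains: int = 100, hops: int = 3):
    """Build a context of shuffled hash pairs plus one multi-hop query.

    Each chain is h0 -> h1 -> ... -> h_hops. Pairs from all chains are
    shuffled together, so completing a chain forces the model to attend
    and jump across arbitrary, unrelated positions in the context.
    """
    chains = [[random_hash() for _ in range(hops + 1)] for _ in range(num_chains)]
    pairs = [(chain[i], chain[i + 1]) for chain in chains for i in range(hops)]
    random.shuffle(pairs)  # removes any ordering or locality signal

    context = "\n".join(f"{a} = {b}" for a, b in pairs)
    query = random.choice(chains)
    prompt = f"{context}\n\nComplete the chain starting from {query[0]}:\n{query[0]} ="
    expected = " = ".join(query[1:])  # the model must resolve every link in turn
    return prompt, expected

if __name__ == "__main__":
    prompt, expected = build_hashhop_prompt(num_chains=5, hops=2)
    print(prompt)
    print("# expected completion:", expected)
```

Because every link in the chain points to an arbitrary position in the shuffled context, a model can only complete the query by actually storing and retrieving the pairs, rather than by recognizing a semantically distinctive passage.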
Operational Impact
  • Magic's ultra-long context models allow for more effective code synthesis by considering a vast amount of information during inference.
  • The HashHop evaluation method provides a more accurate measure of a model's ability to store and retrieve information without semantic hints.
  • Magic's partnership with Google Cloud enables the building of supercomputers to support the training and deployment of their AI models.
  • The development of ultra-long context models emphasizes the importance of inference-time compute as the next frontier in AI development.
  • Magic is committed to advancing AI technology and setting higher regulatory standards for AI safety and cybersecurity.
Quantitative Benefits
  • LTM-2-mini's sequence-dimension algorithm is roughly 1000x cheaper than the attention mechanism in Llama 3.1 405B for a 100M token context window.
  • Running Llama 3.1 405B with a 100M token context requires 638 H100s per user, whereas LTM requires a small fraction of a single H100's HBM per user (a back-of-envelope check of this figure follows below).
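As a sanity check on the scale of these numbers, the following sketch estimates the KV-cache memory for Llama 3.1 405B at a 100M token context, assuming its published configuration (126 layers, 8 KV heads via grouped-query attention, head dimension 128), a 2-byte fp16/bf16 cache, and 80 GB of HBM per H100. These parameters are assumptions for illustration and are not taken from the case study.

```python
# Back-of-envelope estimate of the Llama 3.1 405B KV cache at 100M tokens,
# under the stated assumptions (not figures from the case study itself).
LAYERS = 126
KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2           # fp16 / bf16
CONTEXT_TOKENS = 100_000_000  # 100M token context window
H100_HBM_BYTES = 80e9         # 80 GB of HBM per H100

# Keys and values are both cached, hence the factor of 2.
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
total_kv_bytes = kv_bytes_per_token * CONTEXT_TOKENS

print(f"KV cache per token: {kv_bytes_per_token / 1e6:.2f} MB")
print(f"KV cache for 100M tokens: {total_kv_bytes / 1e12:.1f} TB")
print(f"H100s needed just for the KV cache: {total_kv_bytes / H100_HBM_BYTES:.0f}")
# Prints roughly 0.52 MB per token, ~51.6 TB in total, and ~645 H100s --
# the same order of magnitude as the 638 H100s cited above (the exact count
# depends on precision and per-GPU overhead assumptions).
```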


