
Magic's Ultra-Long Context Models: Revolutionizing Software Development with 100M Token Context Windows

Company Size
Large Corporate
Region
  • America
Country
  • United States
Product
  • LTM-2-mini
  • Magic-G4
  • Magic-G5
Tech Stack
  • Custom CUDA
  • NVIDIA H100 Tensor Core GPUs
  • NVIDIA GB200 NVL72
Implementation Scale
  • Enterprise-wide Deployment
Impact Metrics
  • Digital Expertise
  • Innovation Output
Technology
  • Analytics & Modeling - Machine Learning
  • Analytics & Modeling - Predictive Analytics
Applicable Industries
  • Software
Applicable Functions
  • Product Research & Development
Services
  • Software Design & Engineering Services
  • System Integration
About the Customer
Magic is a company focused on advancing AI technology, particularly in the domain of software development. They are pioneering the use of ultra-long context models, which can handle up to 100 million tokens of context during inference. This capability allows for more effective code synthesis and reasoning, as the models can consider a vast amount of information, including code, documentation, and libraries, that are not publicly available on the internet. Magic is committed to pushing the boundaries of AI by developing models that can perform complex tasks with minimal human intervention. They are also focused on building supercomputers to support their AI models, partnering with Google Cloud to leverage NVIDIA's advanced GPU technology. With significant funding and a dedicated team, Magic aims to revolutionize the way AI models are trained and deployed, emphasizing the importance of inference-time compute as the next frontier in AI development.
The Challenge
The central challenge in the AI field has been the limited context window available during inference, which restricts models' ability to learn and reason effectively. Traditional models rely heavily on training because of these short context windows, which limits their ability to synthesize code and perform complex reasoning tasks. Current evaluation methods for long-context models, such as the Needle In A Haystack eval, have inherent flaws that allow models to perform well without truly understanding or storing large amounts of information: they often provide semantic hints that make it easier for models to retrieve information, so they do not accurately reflect real-world tasks. Additionally, the memory and computational requirements for handling ultra-long context windows are significant, posing a challenge for scaling and practical application.
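To make that flaw concrete, below is a minimal sketch of how a traditional Needle In A Haystack prompt is constructed. The filler sentence, needle wording, and question are illustrative assumptions rather than the actual benchmark text; the point is that the needle is semantically unlike everything around it, so retrieving it does not prove the model can store and index the full context.

```python
# Minimal sketch of a traditional Needle In A Haystack prompt (illustrative
# filler, needle, and question -- not the actual benchmark text).
FILLER_SENTENCE = "The city council met on Tuesday to discuss the new park budget. "
NEEDLE = "The secret passphrase for the launch is 'blue-falcon-7'. "
QUESTION = "What is the secret passphrase for the launch?"

def build_niah_prompt(num_filler: int = 10_000, needle_position: float = 0.5) -> str:
    """Insert one semantically distinctive 'needle' into repetitive filler.

    Because the needle is the only sentence about a passphrase, a model can
    retrieve it by spotting the unusual topic rather than by genuinely
    storing and indexing the whole context -- the weakness described above.
    """
    filler = [FILLER_SENTENCE] * num_filler
    insert_at = int(len(filler) * needle_position)
    filler.insert(insert_at, NEEDLE)
    return "".join(filler) + "\n\n" + QUESTION

if __name__ == "__main__":
    print(build_niah_prompt(num_filler=100)[:400], "...")
```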
The Solution
Magic has developed ultra-long context models, such as LTM-2-mini, which can handle up to 100 million tokens of context. This allows the models to perform more complex reasoning and code synthesis tasks by considering a vast amount of information during inference. To address the flaws in current evaluation methods, Magic designed HashHop, a new evaluation method that eliminates semantic hints and requires models to store and retrieve the maximum possible information content. This method prompts models with hash pairs and asks them to complete a chain of hashes, testing their ability to attend to and jump across multiple points in the context. Magic has also partnered with Google Cloud to build two supercomputers, Magic-G4 and Magic-G5, powered by NVIDIA's advanced GPUs, to support the training and deployment of their models. With significant funding and a focus on innovation, Magic is committed to advancing AI technology and setting a higher standard for AI safety and cybersecurity.
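A minimal sketch of how a HashHop-style prompt could be generated is shown below. The hash length, chain depth, number of chains, and prompt wording are illustrative assumptions, not Magic's published implementation; the key property is that the pairs are random and shuffled, so there are no semantic or positional hints to exploit.

```python
import random

HEX = "0123456789abcdef"

def random_hash(length: int = 16) -> str:
    """Return a random hex string with no semantic content."""
    return "".join(random.choices(HEX, k=length))

def build_hashhop_prompt(num_chains: int = 100, hops: int = 3):
    """Build a context of shuffled hash pairs plus one multi-hop query.

    Each chain is h0 -> h1 -> ... -> h_hops. Pairs from all chains are
    shuffled together, so completing a chain forces the model to attend
    and jump across arbitrary, unrelated positions in the context.
    """
    chains = [[random_hash() for _ in range(hops + 1)] for _ in range(num_chains)]
    pairs = [(chain[i], chain[i + 1]) for chain in chains for i in range(hops)]
    random.shuffle(pairs)  # removes any ordering or locality signal

    context = "\n".join(f"{a} = {b}" for a, b in pairs)
    query = random.choice(chains)
    prompt = f"{context}\n\nComplete the chain starting from {query[0]}:\n{query[0]} ="
    expected = " = ".join(query[1:])  # the model must resolve every link in turn
    return prompt, expected

if __name__ == "__main__":
    prompt, expected = build_hashhop_prompt(num_chains=5, hops=2)
    print(prompt)
    print("# expected completion:", expected)
```

Because every link in the chain points to an arbitrary position in the shuffled context, a model can only complete the query by actually storing and retrieving the pairs, rather than by recognizing a semantically distinctive passage.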
Operational Impact
  • Magic's ultra-long context models allow for more effective code synthesis by considering a vast amount of information during inference.
  • The HashHop evaluation method provides a more accurate measure of a model's ability to store and retrieve information without semantic hints.
  • Magic's partnership with Google Cloud enables the building of supercomputers to support the training and deployment of their AI models.
  • The development of ultra-long context models emphasizes the importance of inference-time compute as the next frontier in AI development.
  • Magic is committed to advancing AI technology and setting higher regulatory standards for AI safety and cybersecurity.
Quantitative Benefits
  • LTM-2-mini's sequence-dimension algorithm is roughly 1000x cheaper than the attention mechanism in Llama 3.1 405B for a 100M token context window.
  • Running Llama 3.1 405B with a 100M token context requires 638 H100s per user, whereas LTM requires a small fraction of a single H100's HBM per user (a back-of-envelope check of this figure follows below).
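As a sanity check on the scale of these numbers, the following sketch estimates the KV-cache memory for Llama 3.1 405B at a 100M token context, assuming its published configuration (126 layers, 8 KV heads via grouped-query attention, head dimension 128), a 2-byte fp16/bf16 cache, and 80 GB of HBM per H100. These parameters are assumptions for illustration and are not taken from the case study.

```python
# Back-of-envelope estimate of the Llama 3.1 405B KV cache at 100M tokens,
# under the stated assumptions (not figures from the case study itself).
LAYERS = 126
KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2           # fp16 / bf16
CONTEXT_TOKENS = 100_000_000  # 100M token context window
H100_HBM_BYTES = 80e9         # 80 GB of HBM per H100

# Keys and values are both cached, hence the factor of 2.
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
total_kv_bytes = kv_bytes_per_token * CONTEXT_TOKENS

print(f"KV cache per token: {kv_bytes_per_token / 1e6:.2f} MB")
print(f"KV cache for 100M tokens: {total_kv_bytes / 1e12:.1f} TB")
print(f"H100s needed just for the KV cache: {total_kv_bytes / H100_HBM_BYTES:.0f}")
# Prints roughly 0.52 MB per token, ~51.6 TB in total, and ~645 H100s --
# the same order of magnitude as the 638 H100s cited above (the exact count
# depends on precision and per-GPU overhead assumptions).
```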


