November 26, 2024
Iterative Combinatorial Brain Surgeon: Scalable Pruning of Large Language and Vision Models (LLVMs)
The Challenge
State-of-the-art large language and vision models (LLVMs) have seen tremendous success, but their massive scale comes at a steep computational cost. The need to balance performance and efficiency has driven growing interest in model compression techniques such as pruning, quantization, and distillation, which aim to streamline these models without sacrificing their impressive accuracy.
The Impact
With the integration of advanced methods — such as the one proposed below — and specialized hardware support for sparse models, we can significantly decrease the computational power and energy required to run AI models, all while maintaining their original performance. This can enable the deployment of smaller, more efficient models directly on devices, rather than relying on server-side processing — ultimately helping to enhance data privacy.
The Outcomes
We proposed the iterative Combinatorial Brain Surgeon (iCBS), a scalable iterative pruning algorithm that optimizes over small blocks of weights in a neural network using block coordinate descent. This blockwise approach can allow iCBS to scale to very large models, including LLVMs with billions of parameters, while achieving higher performance than existing one-shot pruning techniques.
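To make the blockwise idea more concrete, below is a minimal sketch of an iterative, block-by-block pruning loop. It is an illustration only, not the paper's implementation: the quadratic loss proxy, the tiny block size, the brute-force block solver, and the names `icbs_sketch` and `quadratic_loss_change` are simplifying assumptions made here for readability.

```python
# A minimal, illustrative sketch of blockwise iterative pruning. The quadratic
# loss proxy, block size, and brute-force block solver are assumptions made
# here for clarity; they are not taken from the paper.
from itertools import combinations

import numpy as np


def quadratic_loss_change(w, mask, H):
    """Approximate increase in loss from zeroing the pruned weights, using a
    local quadratic model 0.5 * d^T H d, where d is the weight perturbation."""
    d = -w * (~mask)
    return 0.5 * d @ H @ d


def icbs_sketch(w, H, sparsity=0.5, block_size=8, n_passes=3, seed=0):
    """Iteratively revisit small blocks of weights and re-solve, within each
    block, the combinatorial choice of which weights to keep, holding the rest
    of the mask fixed and preserving the overall sparsity level."""
    rng = np.random.default_rng(seed)
    n = w.size
    n_keep = n - int(round(n * sparsity))

    # One-shot baseline: keep the weights with the largest diagonal saliency.
    saliency = 0.5 * w**2 * np.diag(H)
    mask = np.zeros(n, dtype=bool)
    mask[np.argsort(saliency)[-n_keep:]] = True

    for _ in range(n_passes):
        order = rng.permutation(n)
        for start in range(0, n, block_size):
            block = order[start:start + block_size]
            k = int(mask[block].sum())  # block-local "keep" budget
            best_cost, best_keep = np.inf, None
            # Brute-force the small subproblem: which k weights in this block
            # to keep. (A scalable method would use a faster block solver.)
            for keep in combinations(range(len(block)), k):
                trial = mask.copy()
                trial[block] = False
                trial[block[list(keep)]] = True
                cost = quadratic_loss_change(w, trial, H)
                if cost < best_cost:
                    best_cost, best_keep = cost, keep
            mask[block] = False
            mask[block[list(best_keep)]] = True

    return w * mask, mask


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 64
    w = rng.normal(size=n)
    A = rng.normal(size=(n, n)) / np.sqrt(n)
    H = A.T @ A + np.eye(n)  # positive-definite stand-in for a loss Hessian
    pruned_w, mask = icbs_sketch(w, H, sparsity=0.5)
    print("kept fraction :", mask.mean())
    print("loss increase :", quadratic_loss_change(w, mask, H))
```

In this sketch, each pass revisits every weight once in a random block order, and each block re-spends only the keep budget it already holds, so the overall sparsity target is preserved while the mask is refined block by block.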
For further details on this project, read the full paper.