Hi Maarten, I wonder why it is not possible to prune the inactive parameters of an MoE model at run time to reduce memory requirements. Thanks!
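During inference, the router picks experts separately for every token (and in every MoE layer), so any expert may be chosen at any step. Only a few are active for a given token, but which ones changes from token to token, so all of them have to remain in memory, ready for when they are called upon.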
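To make that concrete, here is a rough sketch of top-k routing in plain Python. It is only a toy: the random scores stand in for a learned router, and the names and sizes (`NUM_EXPERTS`, `TOP_K`, `NUM_TOKENS`, `route`, `experts_used`) are made up for illustration, not taken from any particular model or library.

```python
import random

NUM_EXPERTS = 8   # hypothetical number of experts in one MoE layer
TOP_K = 2         # hypothetical number of experts activated per token
NUM_TOKENS = 64   # hypothetical sequence length


def route(logits, k):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return ranked[:k]


def experts_used(num_tokens, num_experts, k, seed=0):
    """Route each token independently and collect every expert that was selected."""
    rng = random.Random(seed)
    per_token = []
    for _ in range(num_tokens):
        # Random scores stand in for a learned router's output (assumption).
        logits = [rng.gauss(0.0, 1.0) for _ in range(num_experts)]
        per_token.append(route(logits, k))
    used = set(i for choice in per_token for i in choice)
    return per_token, used


per_token, used = experts_used(NUM_TOKENS, NUM_EXPERTS, TOP_K)
print(f"experts active per token: {TOP_K} of {NUM_EXPERTS}")
print(f"distinct experts needed across {NUM_TOKENS} tokens: {len(used)}")
```

Each token only touches `TOP_K` experts, but the set of distinct experts touched over the whole sequence is larger, and you cannot know in advance which ones the next token will need. That is why the inactive experts cannot simply be dropped from memory.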