Mixture of Experts (MoE)

A neural network architecture in which multiple specialized sub-models (experts) handle different aspects of the input, with a gating (router) network deciding which experts to activate for each input. Because only a small subset of experts runs per input, total parameter count (model capacity) can grow while inference cost stays manageable. Used in models like Mixtral and Grok.
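
To make the routing idea concrete, below is a minimal sketch of a sparse MoE layer in PyTorch. All names and sizes (the MoELayer class, the choice of 4 experts with top-2 routing, the small feed-forward experts) are illustrative assumptions, not the design of any particular model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Illustrative sparse MoE layer: a gate picks top-k experts per input."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2, hidden: int = 64):
        super().__init__()
        self.top_k = top_k
        # Gating network: produces one score per expert for each input.
        self.gate = nn.Linear(dim, num_experts)
        # Each expert is a small independent feed-forward network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim). Score experts, keep only the top-k per input.
        scores = self.gate(x)                               # (batch, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)               # renormalize over chosen experts
        out = torch.zeros_like(x)
        # Route each input only through its selected experts and mix their outputs.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = MoELayer(dim=16)
    tokens = torch.randn(8, 16)   # 8 inputs, 16-dimensional each
    print(layer(tokens).shape)    # torch.Size([8, 16])
```

Note that although all experts exist in memory, each input is processed by only `top_k` of them, which is how MoE models keep per-input compute well below what their total parameter count would suggest.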