Mixture-of-experts (MoE) architectures achieve impressive computational efficiency through expert parallelism, which relies heavily on all-to-all communication across devices. Unfortunately, such communication overhead typically constitutes a significant portion of the total runtime (over $40\%$ in large-scale training), hampering the scalability of distributed training and inference for modern MoE models. In this paper, we first define \textit{collaborative communication} to illustrate this intrinsic limitation, and then propose system- and algorithm-level innovations to reduce communication costs. Specifically, we call a pair of experts co-activated by the same token \textit{collaborated}; this comprises two cases, \textit{intra-} and \textit{inter-collaboration}, depending on whether the two experts reside on the same device. Our pilot investigations reveal that increasing the proportion of intra-collaboration can accelerate expert parallelism at scale. This motivates us to strategically \uline{\texttt{o}}ptimize \uline{\texttt{c}}ollaborative \uline{\texttt{c}}omm\uline{\texttt{u}}nication for acce\uline{\texttt{l}}era\uline{\texttt{t}}ed MoE training and inference, dubbed \textbf{\texttt{Occult}}. Our designs \uline{either} deliver exact results with reduced communication cost \uline{or} controllably minimize the cost via collaboration pruning, materialized by modified fine-tuning. Comprehensive experiments on various MoE-LLMs demonstrate that \texttt{Occult} can be faster than popular state-of-the-art inference and training frameworks (over $50\%$ speedup across multiple tasks and models), with quality comparable or superior to standard fine-tuning.
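For intuition, one illustrative way to formalize the intra-collaboration proportion is as follows (the notation here, including $\mathcal{E}(x)$, $d(\cdot)$, and $\rho_{\mathrm{intra}}$, is introduced for exposition and may differ from the precise definitions used later): for each token $x$ in a batch $\mathcal{B}$, let $\mathcal{E}(x)$ denote its set of activated experts and $d(\cdot)$ the expert-to-device placement; then
\[
\rho_{\mathrm{intra}} \;=\; \frac{\sum_{x\in\mathcal{B}}\bigl|\{\,\{e_i,e_j\}\subseteq\mathcal{E}(x) \,:\, e_i\neq e_j,\ d(e_i)=d(e_j)\,\}\bigr|}{\sum_{x\in\mathcal{B}}\binom{|\mathcal{E}(x)|}{2}},
\]
i.e., the fraction of co-activated expert pairs whose members reside on the same device. Since only inter-collaborated pairs incur cross-device all-to-all traffic, raising this fraction is what enables the acceleration described above.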