MPCFORMER: FAST, PERFORMANT AND PRIVATE TRANSFORMER INFERENCE WITH MPC
Enabling private inference is crucial for many cloud inference services that are based on Transformer models. However, existing private inference solutions for Transformers can increase the inference latency by more than 60× or significantly compromise the quality of inference results. In this paper, we design the framework MPCFORMER using secure multi-party computation (MPC) and Knowledge Distillation (KD). It can be used in tandem with many specifically designed MPC-friendly approximations and trained Transformer models. MPCFORMER significantly speeds up Transformer model inference in MPC settings while achieving similar ML performance to the input model. We evaluate MPCFORMER with various settings in MPC. On the IMDb dataset, we achieve similar performance to BERTBASE, while being 5.3× faster. On the GLUE benchmark, we achieve 97% of the performance of BERTBASE with a 2.2× speedup. We show that MPCFORMER remains effective with different trained Transformer weights such as ROBERTABASE and larger models including BERTLARGE. In particular, we achieve similar performance to BERTLARGE while being 5.93× faster on the IMDb dataset.
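To illustrate the flavor of an MPC-friendly approximation, the sketch below replaces the exponential in softmax with a shifted square. This is a minimal NumPy illustration of the general idea, not the paper's exact method: the shift constant `c` and the function `quad_softmax` are assumptions introduced here for exposition. Exponentiation is expensive to evaluate under MPC, whereas additions and multiplications (and a single division) are comparatively cheap, which is what motivates such surrogates.

```python
import numpy as np

def quad_softmax(x, c=5.0):
    """Quadratic softmax surrogate: replaces exp(x) with (x + c)**2.

    Squaring and normalization use only additions, multiplications, and
    one division, all of which are far cheaper than exp under MPC.
    The shift c keeps x + c positive for typical attention-score ranges,
    so the surrogate stays monotonic there. (Illustrative choice, not
    taken from the paper.)
    """
    q = (x + c) ** 2
    return q / q.sum(axis=-1, keepdims=True)

# Example: attention-like scores still map to a valid probability vector.
scores = np.array([1.0, 2.0, 3.0])
weights = quad_softmax(scores)
```

In the framework described above, a model using such approximations would then be trained with knowledge distillation from the original Transformer to recover the accuracy lost to the approximation.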