Efficient deep learning inference on end devices