Adapting Transformer to End-to-End Spoken Language Translation