Enhancing Transformer for End-to-end Speech-to-Text Translation