This paper proposes a new image encryption algorithm. First, time-domain and frequency-domain features of the user's voice are extracted to generate a voice key. Second, the key is iterated through a chaotic map multiple times so that the key data are mapped into the chaotic oscillation region, and the parameters of that region are then used to encrypt the user's image. Third, at decryption time, the user's latest voice data are re-extracted to generate a new voice key, which is used to decrypt the encrypted image. Decryption fails if the two extracted voices differ in the time or frequency domain. Experiments are performed on 80 groups of face images and voice data, all of which are encrypted and decrypted successfully. In addition, several security tests are carried out on the algorithm. Its key sensitivity is verified with the normalized cross-correlation parameter C_ncc. Its resistance to attack is verified by measuring the correlation between adjacent pixels, the number of pixels change rate (NPCR), and the unified average changing intensity (UACI). The key space of the proposed algorithm is greater than 2^100, which gives it good resistance to brute-force cracking.
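As an illustration of the two differential metrics named above, the following is a minimal sketch of how NPCR and UACI are commonly computed for a pair of cipher images. It uses the standard textbook definitions and is not taken from the paper's implementation; the function name `npcr_uaci`, the list-of-rows image representation, and the 8-bit default `max_level=255` are assumptions made for this example.

```python
def npcr_uaci(c1, c2, max_level=255):
    """Return (NPCR, UACI) in percent for two equal-sized cipher images.

    c1, c2: 2-D lists of integer pixel values in 0..max_level
    (hypothetical representation chosen for this sketch).
    """
    # Flatten both images so pixels can be compared position by position.
    flat1 = [p for row in c1 for p in row]
    flat2 = [p for row in c2 for p in row]
    if len(flat1) != len(flat2) or not flat1:
        raise ValueError("images must be non-empty and the same size")

    n = len(flat1)
    # NPCR: percentage of pixel positions whose values differ.
    changed = sum(1 for a, b in zip(flat1, flat2) if a != b)
    # UACI: mean absolute difference, normalized by the maximum gray level.
    intensity = sum(abs(a - b) for a, b in zip(flat1, flat2))

    npcr = 100.0 * changed / n
    uaci = 100.0 * intensity / (max_level * n)
    return npcr, uaci


if __name__ == "__main__":
    # Toy 2x2 "cipher images"; two of the four pixels differ.
    a = [[0, 255], [10, 10]]
    b = [[255, 255], [10, 20]]
    npcr, uaci = npcr_uaci(a, b)
    print(f"NPCR = {npcr:.2f}%, UACI = {uaci:.2f}%")
```

In a typical test, the two inputs are the cipher images obtained from two plain images that differ in a single pixel. For 8-bit images, values near the expected figures for two independent random images (about 99.61% for NPCR and about 33.46% for UACI) are usually taken as evidence of good resistance to differential attack.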