Unless I am blatantly wrong, we are talking about different things here.
The <SOUNDEX> algorithm, given a <character string>, encodes it as a four-character code (a letter followed by three digits),
so that words that sound alike, including many misspellings, will with a certain degree of confidence be encoded to the same <token>.
For example, the SOUNDEX code
A500 will <match> ann, anne, anna, amy, amee, ...
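To make the "same token" idea concrete, here is a minimal Python sketch of the classic American Soundex rules. The function name `soundex` is my own. I also assume one specific variant: h and w are skipped, while vowels and y separate repeated digits. Other implementations, including some database built-ins, differ in small details, so treat this as an illustration, not a reference implementation.

```python
import string

# Digit assigned to each consonant group; vowels, y, h and w get no digit.
_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"),
        ("cgjkqsxz", "2"),
        ("dt", "3"),
        ("l", "4"),
        ("mn", "5"),
        ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of `word` ("" if it has no ASCII letters)."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")    # its digit still suppresses an identical next digit
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code:
            if code != prev:             # collapse adjacent identical digits
                result += code
                if len(result) == 4:
                    break
            prev = code
        elif ch not in "hw":
            prev = ""                    # a vowel (or y) separates repeated digits
        # h and w are skipped without resetting prev
    return (result + "000")[:4]          # pad with zeros to exactly 4 characters


for name in ("ann", "anne", "anna", "amy", "amee"):
    print(name, soundex(name))
```

All five names print A500. This is the point of Soundex: it only tells you that the names probably sound alike. It does not tell you how any of them is pronounced.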
On the other hand, it seems that you are looking for
phonetic transcription (or phonetic notation), i.e. a visual representation of speech sounds, so that somebody who looks at an English word will know how to pronounce it.
See for example
http://dictionary.cambridge.org/help/phonetics.html
If this is so, no algorithm like SOUNDEX will help.
And remember that phonetic notation is not unique: there are at least a couple of conventions (IIRC).