Speech recognition is the process of converting spoken input into digital output, such as text. Speech recognition systems give computers the ability to listen to a user's speech and determine what was said.
The speech recognition process can be divided into four steps:
- Speech is converted to digital signals.
- Actual speech sounds are separated from the rest of the signal, such as silence and background noise, based on the energy of the sounds.
- The extracted sounds are grouped into ‘speech frames.’
- The speech frames are compared with words from the grammar file to determine the spoken word.
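The four steps above can be sketched in a few lines of Python. This is only a toy illustration under several assumptions, not how a real recognizer is built: a synthetic tone stands in for digitized speech, energy is measured per fixed-size frame (so extraction and framing happen together), and a hypothetical `GRAMMAR` dictionary of zero-crossing rates stands in for the grammar file. The names `SAMPLE_RATE`, `FRAME_SIZE`, `tone`, `extract_speech_frames`, and `recognize` are all made up for this example.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value)
FRAME_SIZE = 80     # 10 ms per frame at 8 kHz (assumed value)

# Hypothetical stand-in for a grammar file: each "word" is described
# by one expected zero-crossing rate (roughly 2 * frequency / SAMPLE_RATE).
GRAMMAR = {"low": 0.05, "high": 0.25}


def tone(n_samples, freq_hz, amplitude):
    """Step 1 stand-in: produce digital samples of a pure tone
    instead of reading from a microphone."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def split_frames(samples, frame_size=FRAME_SIZE):
    """Cut the signal into non-overlapping frames, dropping any partial tail."""
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size] for i in range(0, last_start + 1, frame_size)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def extract_speech_frames(samples, threshold=0.01):
    """Steps 2 and 3: keep only the frames whose energy exceeds the threshold."""
    return [f for f in split_frames(samples) if frame_energy(f) > threshold]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs where the sign changes."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def recognize(frames, grammar=GRAMMAR):
    """Step 4: return the grammar entry closest to the observed feature."""
    observed = sum(zero_crossing_rate(f) for f in frames) / len(frames)
    return min(grammar, key=lambda word: abs(grammar[word] - observed))


# Silence, then a 300 Hz "utterance", then silence again.
signal = [0.0] * 800 + tone(1600, 300, 0.5) + [0.0] * 800
frames = extract_speech_frames(signal)
word = recognize(frames)
print(len(frames), word)
```

Real systems replace the single zero-crossing feature with richer acoustic features and match the frames against statistical or neural models, but the flow of data from samples to frames to a grammar lookup is the same.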
Related Project – Automatic Subtitle Generation for Sound in Videos