Interest in speech recognition has grown steadily over the last few decades, as researchers and industry have designed and demonstrated numerous applications of this rich field of study. Automatic Speech Recognition (ASR) is a popular and challenging area of research in human-computer interaction. The main challenge of speech recognition lies in modeling the variations in uttered speech that arise from differences in geographical region, social background, age, gender, occupation, and so on. Speech processing is useful for solving many types of problems in different areas, including characterization, segmentation, clustering and classification, pattern association, and recognition. To develop an effective speech recognition system, it is important to understand each of these topics. Speech recognition research aims to build devices that can accept spoken information and act appropriately on it. Speech recognition technology has made it possible for a computer to follow human voice commands and understand human languages.
One of the most challenging aspects of research in speech recognition by computer is its interdisciplinary nature, together with the tendency of most researchers to apply a single, uniform strategy to individual problems. Numerous disciplines, such as signal processing, acoustics, pattern recognition, communication and information theory, linguistics, physiology, psychology, and computer science, have been applied to one or more speech recognition problems. A notable current trend is the successful application of deep learning approaches to speech recognition. In recent years, many researchers have been trying to develop speech command-driven systems in their native languages. Beyond English, experiments have been carried out and results reported for many other languages. This study focuses on Bangla speech processing and demonstrates experiments that use advanced methods and algorithms on well-established platforms.
This book presents a wide variety of speech processing approaches, including segmentation, feature extraction, classification, and recognition. It introduces four dynamic thresholding algorithms for segmenting continuous Bangla speech sentences into words/sub-words: a modified k-means algorithm, a fuzzy µ-means algorithm, a modified Otsu’s algorithm, and a short-time speech feature-based algorithm. In addition, a new approach named the Blocking Black Area method is introduced to identify the voiced regions of continuous speech during segmentation. The reader will also become familiar with efficient approaches to speech classification and speech feature generation. The fundamentals of neural network-based speech recognition are presented for students and researchers in academia and industry who are interested in applying neural networks and speech recognition. The emphasis is on continuous Bangla speech processing and its features. To better illustrate the proposed speech processing approaches, several experiments are carried out in MATLAB wherever appropriate.
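As a rough illustration of what dynamic thresholding means in speech segmentation, the following minimal sketch computes the short-time energy of a signal, picks a threshold with the standard Otsu criterion, and marks frames above the threshold as voiced. It is not the modified Otsu's algorithm proposed in this book. The function names, the synthetic test signal, the frame length of 100 samples, and the 64-bin histogram are all assumptions chosen for illustration, and the sketch is written in Python rather than in MATLAB.

```python
import math


def short_time_energy(samples, frame_len):
    """Energy (sum of squared samples) of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def otsu_threshold(values, bins=64):
    """Threshold that maximizes the between-class variance (standard Otsu)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # all values identical: nothing to separate
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_var, best_k = -1.0, 0
    w0, sum0 = 0, 0.0
    for k in range(bins - 1):
        w0 += hist[k]                 # number of values in the lower class
        sum0 += centers[k] * hist[k]
        w1 = total - w0               # number of values in the upper class
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_k = var_between, k
    return lo + (best_k + 1) * width  # upper edge of the best split bin


def voiced_segments(energies, threshold):
    """Runs of consecutive frames above the threshold, as (start, end) with end exclusive."""
    segments, start = [], None
    for i, e in enumerate(energies):
        if e > threshold and start is None:
            start = i
        elif e <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments


# Synthetic signal (hypothetical): near-silence and two louder bursts.
layout = [(300, 0.01), (400, 1.0), (300, 0.01), (200, 1.0), (300, 0.01)]
signal = []
for length, amplitude in layout:
    for _ in range(length):
        n = len(signal)
        signal.append(amplitude * math.sin(2 * math.pi * n / 20))

energies = short_time_energy(signal, 100)
threshold = otsu_threshold(energies)
segments = voiced_segments(energies, threshold)
print(segments)
```

Because the threshold is derived from the energy distribution of the utterance itself, it adapts to the recording level instead of relying on a fixed constant, which is the general idea behind dynamic thresholding.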
The order of presentation of the topics in this book was chosen to reflect the increasing complexity of the speech processing approaches and related material. Each chapter covers its topic in enough depth for the reader to understand it precisely. The contents of this book are organized into five chapters. Chapter 1 gives a general overview of speech recognition research and of speech and language processing, with historical background. Speech signals and features, covered in Chapters 2 and 3, are two fundamental topics in speech recognition. Chapter 4 describes various speech segmentation and classification techniques. It includes the proposed algorithms for dynamic thresholding in speech segmentation (the k-means clustering-based thresholding algorithm, the fuzzy µ-means clustering-based thresholding algorithm, Otsu’s thresholding algorithm, the Blocking Black Area method, and the short-time speech feature-based algorithm). Finally, Chapter 5 presents neural network-based speech recognition. It also describes various faster and improved back-propagation (BP) algorithms for implementing a speech recognition system, including BP with momentum, variable learning rate BP, conjugate gradient BP, resilient BP, and the Levenberg-Marquardt algorithm. Algorithms are provided to help the reader develop a thorough understanding of how neural networks are trained and applied in Bangla speech recognition.