Kamini Sabu

About

"Passionate and innovative speech research engineer
interested in building applications beneficial to society
while shaping new talent towards the same"

Kamini is Chief Engineer in the Audio AI team at Samsung Research Institute Bangalore. She completed her PhD in Speech Processing in the Department of Electrical Engineering at IIT Bombay, Mumbai, India. Her thesis work, "Automatic assessment of fluency in children's oral reading using prosody modeling", carried out with Prof. Preeti Rao, later evolved into the TARA app (article) in collaboration with the Tata Center and the LETS project.
Proficient in machine learning, she has dedicated her industry work to adapting and personalizing applications to device-specific needs and noise environments.

Her research interests lie in the following areas:

Speech Prosody and Phonetics
Signal Processing and Speech Preprocessing
Machine Learning and Deep Learning
Speech Synthesis and Foundational Models
Speech Recognition and Large Language Models
Natural Language Processing and Image Processing

She can be contacted at kaminimsabu@gmail.com

Publications

Recent Patents

K. Sabu and R. Viswanathan, "Speech-style based distinction between command and dictated text", 2025, (submitted vide) Patent Application Number:
R. K. Samal, P. K. G. Sivam, S. Viswanathan, and K. Sabu, "Wakeupless action: A method to generate and enable wakeup words detection dynamically using visual screen context", 2023, (submitted vide) Patent Application Number:

Recent Publications

R. Viswanathan, S. Shrivastava, and K. Sabu, "Using target speaker based tuning for personalized end point detection", (submitted to) Proc. ICASSP, May 2026, Barcelona, Spain.
P. Nayak, K. Sabu, and M.A. Basha Shaik, "Multi-mic echo cancellation coalesced with beamforming for real world adverse acoustic conditions", Proc. of Interspeech, Sep 2024, Kos Island, Greece. link
K. Sabu, M. Sharma, N. Tiwari, and M.A. Basha Shaik, "Regularization based incremental learning in TCNN for robust speech enhancement targeting effective human machine interaction", Proc. of International Conference on Speech and Computer, Nov 2023, Hubli-Dharwad, India. link
R. Gohil, R. Viswanathan, S. Agrawal, C.M. Vikram, M.R. Kamble, K. Sabu, M.A. Basha Shaik, and K.K.S. Rajesh, "Ensemble of incremental system enhancements for robust speaker diarization in code-switched real-life audios", Proc. of International Conference on Speech and Computer, Nov 2023, Goa, India. link
K. Praveen, B. Radhakrishnan, K. Sabu, A. Pandey, and M.A. Basha Shaik, "Language identification networks for multilingual everyday recordings", Proc. of Interspeech, Aug 2023, Dublin, Ireland. link
P. Gudepu, M.J. Koroth, K. Sabu, and M.A. Basha Shaik, "Dynamic encoder RNN for online voice activity detection in adverse noise conditions", Proc. of Interspeech, Aug 2023, Dublin, Ireland. link
P. Rao, M. Pandya, K. Sabu, K. Kumar and N. Bondale, "A Study of Lexical and Prosodic Cues to Segmentation in a Hindi-English Code-switched Discourse", Proc. of Interspeech, Sep 2018, Hyderabad, India. link
P. Rao, N. Sanghvi, H. Mixdorff and K. Sabu, "Acoustic Correlates of Focus in Marathi: Production and Perception", Journal of Phonetics, 65, Nov 2017, pp. 110-125. link
S. Barhate, S. Kshirsagar, N. Sanghvi, K. Sabu, P. Rao, and N. Bondale, "Prosodic Features of Marathi News Reading Style", Proc. of IEEE TENCON, Nov 2016, Singapore. link
K. Sabu, "Speech Conversion to Devanagari Script", International Journal on Advances in Engineering Technology and Science, 1(2), 27-28, Dec 2015. link

Publications from PhD project

Patent

P. Rao, K. Sabu, N. Nayak, and B. Shreeharsha, "System for automatic assessment of fluency in spoken language and a method thereof", 2019, Patent Number: WO2021074721A2. link

Journals

K. Sabu and P. Rao, "Predicting comprehensibility of children's oral reading: A dataset and baseline system", Computer Speech and Language, 84, pp 1-23, Mar 2024. link
K. Sabu and P. Rao, "Prosodic event detection in children's read speech", Computer Speech and Language, 68, pp 1-19, Feb 2021. link
K. Sabu and P. Rao, "Automatic assessment of children's oral reading using speech recognition and prosody modeling", CSI Transactions on ICT, S.I. Visvesvaraya, pp 1-5, Jun 2018, Springer. link

Conferences

M. Vaidya, K. Sabu, and P. Rao, "Deep learning for prominence detection in children's read speech", Proc. of ICASSP, May 2022, Singapore. link
K. Sabu and P. Rao, "Automatic prediction of confidence level from children's oral reading recordings", Proc. of Interspeech, Oct 2020, Shanghai, China. link
K. Sabu, K. Kumar and P. Rao, "Automatic detection of expressiveness in oral reading", Special session: Show And Tell, Interspeech, Sep 2018, Hyderabad, India. link
K. Sabu and P. Rao, "Detection of prominent words in oral reading by children", Proc. of Speech Prosody, Jun 2018, Poznan, Poland. link
K. Sabu, K. Kumar, and P. Rao, "Improving the noise robustness of prominence detection for Children's Oral Reading Assessment", Proc. of NCC, Feb 2018, Hyderabad, India. link
K. Sabu, P. Swarup, H. Tulsiani and P. Rao, "Automatic assessment of children's L2 Reading for Accuracy and Fluency", Proc. of SLaTE, Aug 2017, Stockholm, Sweden. link
A. Pasad, K. Sabu and P. Rao, "Voice Activity Detection for Children's Read Speech Recognition in Noisy Conditions", Proc. of NCC, Mar 2017, IIT Madras, India. link

Arxiv Publications

C. Vitthal, Shreeharsha B., K. Sabu, and P. Rao, "Predicting lexical skills from oral reading with acoustic measures", arxiv 2112.00635 [eess.AS], 2021. link
K. Sabu, M. Vaidya, and P. Rao, "CNN encoding of acoustic parameters for prominence detection", arxiv 2104.05488 [cs.CL], 2021. link
K. Sabu, S. Chaudhuri, P. Rao, and M. Patil, "An optimized signal processing pipeline for syllable detection and speech rate estimation", arxiv 2103.04346 [eess.AS], 2020 (Accepted at Proc. of NCC, Feb 2020, Kharagpur, India). link

Other Presentations

K. Sabu, "Survey on automatic recognition systems for behavioral assessment systems of children's speech with challenges thereof", BMI course project, Dec 2020. link link2
K. Sabu, "Automatic assessment of children's oral reading skills", at Doctoral Consortium organized by ISCA-SAC, Shanghai, China, 2020. link
K. Sabu, "Automatic assessment of children's oral reading for prosodic fluency", at Doctoral Consortium organized by ISCA-SAC, Hyderabad, India, 2018. link
K. Sabu, "Automatic assessment of children's oral reading for prosodic fluency", at Workshop for presentations of research work of Visvesvaraya PhD Scholars organized by Media Lab Asia, Ministry of Electronics and Information Technology, Vishakhapattanam, India, 2017.

Publications from MTech Project

K. Sabu and M. H. Nerkar, "Vanishing Point Estimation for On-Road Navigation", International Journal of Emerging Technology and Advanced Engineering, 5(4), 73-77, 2015. link
K. Sabu and M. H. Nerkar, "Colour Vision Based Drivable Road Area Estimation", International Journal of Innovative Research and Development, 4(5), 234-237, 2015. link
K. Sabu and M. H. Nerkar, "Use of texture orientation based vanishing point for road direction estimation", International Journal of Innovative Research in Computer and Communication Engineering, 3(7), 712-7216, Jul 2015. link

Children's Reading Assessment Project thesis

Objective: To develop an application for automatic assessment of children's oral reading in a second language, targeting, but not limited to, native Marathi children aged 10-14 years reading L2 English.

We collected 3000+ recordings of 900+ students from 10+ schools across various regions of Maharashtra as the students read from a set of 80 stories. We formulated the annotation and rating policy after discussion with 20+ transcribers, raters and teachers. Further discussions with raters and observations from the data were considered in designing the different system modules.

The project involves two aspects of reading evaluation, viz. lexical accuracy and prosodic fluency. The speech recognition module, used for lexical miscue prediction and WCPM (words read correctly per minute) determination, is a TDNN model trained on speech from Indian English speakers. It is adapted to children's speech using the children's read speech we collected. We also use vocal tract normalization and speech perturbation based data augmentation to improve performance.
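The WCPM measure can be illustrated with a minimal sketch. It assumes the recognizer output is available as plain text and uses Python's difflib for the word alignment, which is a simplification and not necessarily the alignment method used in the project. The function names and the sample sentence are hypothetical.

```python
from difflib import SequenceMatcher


def count_correct_words(reference, hypothesis):
    """Count reference words that align exactly with recognized words.

    Assumption: a difflib-based alignment stands in for whatever
    alignment the actual system uses.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    # Each matching block is a run of identical words in both sequences.
    return sum(block.size for block in matcher.get_matching_blocks())


def wcpm(reference, hypothesis, duration_seconds):
    """Words read correctly per minute for one recording."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    correct = count_correct_words(reference, hypothesis)
    return correct * 60.0 / duration_seconds


if __name__ == "__main__":
    story_text = "the cat sat on the mat"      # hypothetical prompt
    recognized = "the cat sit on mat"          # hypothetical ASR output
    print(wcpm(story_text, recognized, duration_seconds=30.0))
```

In this toy case one word is substituted and one is skipped, so four of the six prompt words count as correct.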

The prosodic fluency evaluation aspect refers to predicting teachers' subjective ratings of fluency and comprehensibility. We use various hand-crafted prosodic features in a random forest classifier framework for this. The project involves working closely with other project collaborators who design different modules of the overall system.
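To give a flavour of what a hand-crafted prosodic feature can look like, the sketch below computes pause and speech-rate statistics from word timings. The feature set, the 0.2 s pause threshold, and the function name are illustrative assumptions, not the features used in the project. In practice such a feature vector would be passed on to a classifier such as a random forest.

```python
def pause_features(word_times, min_pause=0.2):
    """Compute simple timing features from word-level alignments.

    word_times: list of (start_sec, end_sec) tuples in reading order.
    min_pause:  hypothetical threshold (seconds) above which an
                inter-word gap counts as a pause.
    """
    if not word_times:
        raise ValueError("word_times must not be empty")
    total = word_times[-1][1] - word_times[0][0]
    if total <= 0:
        raise ValueError("utterance duration must be positive")
    # Gap between the end of each word and the start of the next one.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(word_times, word_times[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    return {
        "speech_rate_wps": len(word_times) / total,
        "pause_count": len(pauses),
        "mean_pause_sec": sum(pauses) / len(pauses) if pauses else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical alignment: four words with one 0.5 s pause in the middle.
    times = [(0.0, 0.5), (0.5, 1.0), (1.5, 2.0), (2.0, 2.5)]
    print(pause_features(times))
```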

We achieved a 2% improvement in prosodic event detection performance over the state of the art through careful feature extraction and selection. We also achieved a further improvement of 10% in predicting the degree of prominence with the use of deep learning architectures and lexical features. Comparing the predicted prosodic events with the expected prosodic event positions gives the prosodic event miscues. We also designed prosodic features that can flag distinctive speaking styles such as sing-song, cadence, and uptalk. All of these were used along with other lexical accuracy related features to predict the high-level expert ratings for comprehensibility and confidence.
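The idea of deriving miscues by comparing predicted and expected event positions can be sketched as follows. This is a minimal illustration that assumes events are represented as word indices (for example, the index of the word after which a phrase boundary falls). The representation, the category names, and the function name are assumptions, not the project's actual definitions.

```python
def prosodic_miscues(expected_positions, predicted_positions):
    """Compare expected and predicted prosodic event positions.

    Both arguments are iterables of word indices (an assumed
    representation). Returns the positions grouped by outcome.
    """
    expected = set(expected_positions)
    predicted = set(predicted_positions)
    return {
        # Event expected by the text but not produced by the reader.
        "missed": sorted(expected - predicted),
        # Event produced by the reader where none was expected.
        "inserted": sorted(predicted - expected),
        # Event produced where it was expected.
        "matched": sorted(expected & predicted),
    }


if __name__ == "__main__":
    # Hypothetical phrase boundaries after words 3, 7 and 12.
    print(prosodic_miscues([3, 7, 12], [3, 8, 12]))
```

Counts of missed and inserted events could then serve as features alongside the lexical accuracy measures.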

We introduced automatic detection of students with poor fluency and achieved 83% accuracy. We also introduced automatic prediction of subjective ratings for a child's confidence and comprehensibility using acoustic-prosodic features. We could flag low-confidence readers with 82% accuracy.

Academics

Educational Qualifications

Examination	University	Institute	Year	CPI/%
PhD (Electronic Systems)	IIT Bombay	IIT Bombay	2022	9.12
M.E. (Digital Systems)	North Maharashtra University	Govt. College of Engg., Jalgaon	2015	68.05
B.E. (Elect. & Telecom. Engg.)	North Maharashtra University	Govt. College of Engg., Jalgaon	2013	71.07
HSC (PCM Electronics)	Maharashtra State Board	MJ College, Jalgaon	2009	84.00
SSC	Maharashtra State Board	Nasik Division	2007	86.30

Relevant Coursework

Speech: Speech Processing, Automatic Speech Recognition
Machine Learning: Machine Learning, Deep Learning, Reinforcement Learning, Keras, DeepLearning.AI
I-NCUBATE: Conducted by The Gopalakrishnan Deshpande Center for Innovation & Entrepreneurship (GDC), IIT Madras, dedicated to the consumer survey for the PhD project (2017)
IIT Madras: Received excellent performance token for the course Engineering in Speech Science: Behavioral Machine Intelligence and Applications conducted by IIT Madras under Scheme of Promotion and Academic Research Collaboration (SPARC), MHRD, Govt of India (2022)
Digital Processing: Digital Signal Processing, Adaptive Signal Processing, Biomedical Signal Processing

Awards and Achievements

Deemed AI Expert on Workera platform (2026)
Curriculum Drafting for Undergraduate Elective "Digital Audio Processing" in Elec & Telecom Engg at Govt College of Engg, Jalgaon (2025)
Committee Member of the Industry Advisory Board for the Undergraduate Electronics and Telecommunication Engineering Program at Shah and Anchor Kutchhi Engineering College, Chembur, Mumbai (2024)
Received the Star of the Sprint Award (Feb 2023), the Samsung Excellence Award (2024 Q4 and 2025 Q1), and the Annual Award 2024 at SRIB
Alumnus Member of the Board of Studies in Elec & Telecom Engg at Govt College of Engg, Jalgaon (2023-25)
Reviewer for the journals "Computer Speech and Language", "Speech Communication", "Springer Nature" and "Journal of Marketing Communications" and for the conferences "Interspeech" and "NCC".
Selected among top 20 teams for I-NCUBATE program 2019 conducted by The Gopalakrishnan Deshpande Center for Innovation & Entrepreneurship (GDC), IIT Madras and dedicated to consumer survey for the PhD project.
Selected among top 20 presenters for The Third Evaluation and Presentation Workshop of Visvesvaraya PhD Fellows 2017.
Selected for PhD under prestigious Visvesvaraya PhD scheme by the Ministry of Electronics and Information Technology, India (2016)
Meritorious Student Member of the Academic Council at Govt College of Engg, Jalgaon (2013-15)
Lecturer in Electrical Engg Dept at Govt College of Engg, Jalgaon for the subject Electronics Devices and Linear Integrated Circuits (2015)
Teaching Assistant for Basic Electronics (2014)
Achieved 99.14 percentile in GATE (EC) 2013 among 256135 candidates.
Volunteered and participated in various technical and cultural events

Skills

Programming Languages: Python, Shell, C (Java, HTML at beginner level)
Python Packages: Scikit-learn, Pandas, NumPy, Keras, Matplotlib, PyTorch, SciPy, Librosa, NLTK
Software: Kaldi, MATLAB, SciLab, Visual Studio, Eclipse, Pspice
Other Technical Tools: Praat, Audacity, LaTeX, Microsoft Office
Strengths: Positive Attitude, Quick Learner, Ability to explain in multiple ways, Truth and Honesty, Punctuality, Sincerity and Regularity, Empathy
Language Proficiency: Can understand, read, write and speak English, Hindi, Marathi, Marwadi, and can understand and read Gujarati and Kannada
Interests and Hobbies:
- Reading: visualizing novels for their visual and auditory landscapes
- Poetry: inventing rhymes
- Drawing: pencil copying of pictures
- Movies: movie and song critic