Xavier Anguera, Ph.D.

Professional
Personal

I am the co-founder and CTO of ELSA, an AI-powered application to help language students improve their English communication skills.

At ELSA I built the engineering and research teams, and I currently focus on developing our AI technology.

Prior to ELSA, I created an edtech startup called Sinkronigo that published speech-enabled ebooks for language learning.

Earlier, I was one of the founding researchers of the multimedia research group at Telefonica R&D in Barcelona, where I pursued research in speech and multimedia processing.

I hold a Ph.D. in speech processing and am the co-author of over 125 research publications. I am also a co-inventor of multiple patents and an active contributor to the open source community (if you have worked on multi-microphone speech processing, you have probably heard of the BeamformIt software).

I am an electrical engineer turned speech and multimedia researcher turned entrepreneur.

I was born in Tarragona, an ancient Roman capital on the Mediterranean coast of Spain.

I am the only child of a farming family that moved to Tarragona right before I was born and established a family business selling and repairing home appliances.

I thus grew up among TVs under test and mastered soldering at an early age.

I currently live in Lisbon, Portugal, with my wife and two kids. I enjoy good Portuguese coffee, "pastéis de nata", and how welcoming Portuguese people always are with me.

[name_initial]+[last_name] @ gmail+[dot]+com

CV

You can find a PDF version of my CV here. It contains a complete list of the Ph.D. and M.Sc. theses I co-directed, as well as more detail on my duties in each of my positions over the years.
I have been very fortunate to have worked in academia, in academic and corporate research, and in a startup environment.

You can also visit my LinkedIn profile for a summary of my professional path and to keep up with what I am up to. I do not post a lot, but I like to share updates there from time to time.

I am always eager to learn and experience new things. If you have a new project idea or need advice on one, get in touch!
my email: [name_initial]+[last_name] @ gmail+[dot]+com

Publications

Loosely grouped by topic:

Language learning

Anguera, X., Proença, J., Gulordava, K., Tarján, B., Parslow, N., Dobrovolskii, V., Valente, F. & Girard, R. (2023). "ELSA Speech Analyzer: English Communication Assessment of Spontaneous Speech", in Proc. 9th Workshop on Speech and Language Technology in Education (SLaTE) (pp. 95-96).

Proença, J., Raboshchuk, G., Costa, A., Lopez-Otero, P. & Anguera, X. (2019). "Teaching American English pronunciation using a TTS service", in Proc. 8th Workshop on Speech and Language Technology in Education (SLaTE)

Anguera, X. & Van, V. (2016). "English Language Speech Assistant", Show and Tell session. Interspeech 2016, San Francisco, CA, USA

Anguera, X. (2015). "Multimodal Read-aloud eBooks for Language Learning", Show and Tell session. Interspeech 2015, Dresden, Germany

Speech Segmentation and Clustering

Gracia, C., Anguera, X., Luque, J. & Artzi, I. (2014). "Phoneme-Lattice to Phoneme-Sequence matching algorithm based on Dynamic Programming", book chapter in Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, Volume 8854, 2014, pp. 99-108. Presented at Iberspeech 2014, Las Palmas, Spain. pdf

Gracia, C., Anguera, X. & Binefa, X. (2013). "A Riemannian stopping criterion for unsupervised phonetic segmentation", in Proc. ICMLA 2013, Florida, USA. pdf

Gracia, C., Anguera, X. & Binefa, X. (2013). "Two-level clustering towards unsupervised discovery of acoustic classes", in Proc. ICMLA 2013, Florida, USA. pdf

Audio Fingerprinting

Tsai, T.J., Friedland, G. & Anguera, X. (2015). "An Information-Theoretic Metric of Fingerprint Effectiveness", in Proc. ICASSP 2015, Brisbane, Australia.

Ondel, L., Anguera, X. & Luque, J. (2015). "MASK+: Data-driven Regions Selection for Acoustic Fingerprinting", in Proc. ICASSP 2015, Brisbane, Australia.

Anguera, X., Garzon, A. & Adamek, T. (2012). "MASK: Robust Local Features for Audio Fingerprinting", in Proc. ICME 2012, Melbourne, Australia. (Best Paper Award, ICME 2012) pdf

Dynamic Time Warping and Applications

Ferrarons, M., Anguera, X. & Luque, J. (2014). "Flexible Stand-alone Keyword Recognition Application using Dynamic Time Warping", book chapter in Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, Volume 8854, 2014, pp. 158-167. Presented at Iberspeech 2014, Las Palmas, Spain. pdf

Anguera, X., Luque, J. & Gracia, C. (2014). "Audio-to-text Alignment for speech recognition with very limited resources", in Proc. Interspeech 2014, Singapore. pdf

Gracia, C., Anguera, X., Luque, J. & Artzi, I. (2014). "Phoneme-Lattice to Phoneme-Sequence matching algorithm based on Dynamic Programming", book chapter in Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, Volume 8854, 2014, pp. 99-108. Presented at Iberspeech 2014, Las Palmas, Spain. pdf

Zero-Resource Speech Processing

Dunbar, E., Cao, X.N., Benjumea, J., Karadayi, J., Bernard, M., Besacier, L., Anguera, X. & Dupoux, E. (2017). "The zero resource speech challenge 2017", 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Szöke, I. & Anguera, X. (2016). "Zero-Cost Speech Recognition Task at Mediaeval 2016", in Proc. Mediaeval 2016 Benchmark evaluation, Amsterdam, Netherlands.

Versteegh, M., Thiolliere, R., Schatz, T., Cao, X.N., Anguera, X., Jansen, A. & Dupoux, E. (2015). "The Zero Resource Speech Challenge 2015", Interspeech 2015, Dresden, Germany

Query-by-Example Voice Search

Anguera, X., Rodriguez-Fuentes, L-J., Buzo, A., Metze, F., Szöke, I. & Peñagarikano, M. (2015). "QUESST2014: Evaluating Query-by-example Speech Search in a Zero-Resource Setting with Real-life Queries", in Proc. ICASSP 2015, Brisbane, Australia.

Szöke, I., Rodriguez-Fuentes, L-J., Buzo, A., Anguera, X., Metze, F., Proença, J., Pleva, M. & Xiong, X. (2015). "Query by Example Search on Speech at Mediaeval 2015", in Proc. Mediaeval 2015 Benchmark evaluation, Wurzen, Germany

Metze, F., Anguera, X., Barnard, E., Davel, M. & Gravier, G. (2014). "Language independent search in MediaEval's Spoken Web Search task", Computer Speech and Language (Elsevier), January 2014. pdf

Gracia, C., Anguera, X. & Binefa, X. (2014). "Combining Temporal and Spectral Information for Query-By-Example Spoken Term Detection", in Proc. EUSIPCO 2014, Lisboa, Portugal. pdf

Anguera, X., Rodriguez-Fuentes, L.-J., Szöke, I., Buzo, A., Metze, F. & Penagarikano, M. (2014). "Query-by-Example Spoken Term Detection Evaluation on Low-Resource Languages", in Proc. SLTU 2014, Saint Petersburg, Russia. pdf

Anguera, X., Rodriguez-Fuentes, L.-J., Szöke, I., Buzo, A., Metze, F. & Penagarikano, M. (2014). "Query-by-Example Spoken Term Detection on Multilingual Unconstrained Speech", in Proc. Interspeech 2014, Singapore. pdf

Anguera, X., Rodriguez-Fuentes, L-J., Szöke, I., Buzo, A. & Metze, F. (2014). "Query by Example Search on Speech at Mediaeval 2014", in Proc. Mediaeval 2014. pdf

Mantena, G. & Anguera, X. (2013). "Speed Improvements to Information Retrieval-Based Dynamic Time Warping Using Hierarchical K-means Clustering", in Proc. ICASSP 2013, Vancouver, Canada. pdf

Metze, F., Anguera, X., Barnard, E., Davel, M. & Gravier, G. (2013). "The Spoken Web Search Task at Mediaeval 2012", in Proc. ICASSP 2013, Vancouver, Canada. pdf

Anguera, X. & Ferrarons, M. (2013). "Memory Efficient Subsequence DTW for Query-by-Example Spoken Term Detection", in Proc. ICME 2013, San Jose, CA, USA. pdf

Anguera, X. (2013). "Information Retrieval-based Dynamic Time Warping", in Proc. Interspeech 2013, Lyon, France. pdf

Tejedor, J., Toledano, D.T., Anguera, X., Varona, A., Hurtado, L.F., Miguel, A. & Colás, J. (2013). "Query-by-Example Spoken Term Detection ALBAYZIN 2012 evaluation: overview, systems, results, and discussion", EURASIP Journal on Audio, Speech, and Music Processing 2013, 2013:23, September 2013. pdf

Anguera, X., Skázel, M., Vorwerk, V. & Luque, J. (2013). "The Telefonica Research Spoken Web Search System for Mediaeval 2013", in Proc. Mediaeval 2013 evaluation Workshop, Barcelona, Spain. pdf

Anguera, X., Metze, F., Buzo, A., Szöke, I. & Rodriguez-Fuentes, L.-J. (2013). "The Spoken Web Search Task", in Proc. Mediaeval 2013 evaluation Workshop, Barcelona, Spain. pdf

Metze, F., Rajput, N., Anguera, X., Davel, M., Gravier, G., van Heerden, C., Mantena, G., Muscariello, A., Prahallad, K., Szöke, I. & Tejedor, J. (2012). "The Spoken Web search task at Mediaeval 2011", in Proc. ICASSP 2012, Kyoto, Japan. pdf

Anguera, X. (2012). "Speaker Independent Discriminant Feature Extraction for Acoustic Pattern-Matching", in Proc. ICASSP 2012, Kyoto, Japan. pdf

Anguera, X. (2012). "Telefonica Research system for the Query-by-example task at Albayzin 2012", in Proc. Iberspeech 2012, Madrid, Spain. pdf

Anguera, X. (2012). "Telefonica Research System for the Spoken Web Search task at Mediaeval 2012", in Proc. Mediaeval 2012 evaluation Workshop, Pisa, Italy. pdf

Metze, F., Barnard, E., Davel, M., van Heerden, C., Anguera, X., Gravier, G. & Rajput, N. (2012). "The Spoken Web Search Task", in Proc. Mediaeval 2012 evaluation Workshop, Pisa, Italy. pdf

Anguera, X. (2012). "Telefonica System for the Spoken Web Search Task at Mediaeval 2011", MediaEval Workshop, November 2011, Pisa, Italy. pdf

Anguera, X., Macrae, R. & Oliver, N. (2010). "Partial Sequence Matching Using an Unbounded Dynamic Time Warping Algorithm", in Proc. ICASSP 2010 pdf

Sports analytics

Duxans, H., Anguera, X. & Conejero, D. (2009). "Audio-Based Soccer Game Summarization", in Proc. IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB09) pdf

Gyarmati, L. & Anguera, X. (2015). "Automatic Extraction of the Passing Strategies of Soccer Teams", in Proc. 2015 KDD Workshop on Large-Scale Sports Analytics, Sydney, Australia

Voice Biometrics

Bonastre, J.-F., Anguera, X., Bousquet, P.-M. & Matrouf, D. (2012). "Discriminant Binary Data Representation for Speaker Recognition", in Proc. ICASSP 2011, Prague, Czech Republic. pdf

Bonastre, J.-F., Anguera, X., Sierra, G.H. & Bousquet, P.-M. (2012). "Speaker modeling using local binary decisions", in Proc. Interspeech 2011. pdf

Anguera, X. & Bonastre, J.-F. (2010). "Novel binary key representation for biometric speaker recognition", in Proc. Interspeech 2010, Makuhari, Japan. pdf

Anguera, X. (2009). "MiniVectors: an Improved GMM-SVM Approach for Speaker Verification", in Proc. Interspeech 2009 pdf

Anguera, X., Obrador, P. & Oliver, N. (2009). "Multimodal video copy detection of social media", in Proc. first SIGMM Workshop on Social Media (WSM2009) at ACM MM09 pdf

Content-based Video-Copy Detection (CB-VCD)

Anguera, X. & Adamek, T. (2012). "Multimodal Video Copy Detection using local features", in IEEE COMSOC MMTC E-Letter. pdf

Anguera, X., Adamek, T., Xu, D. & Barrios, J.M. (2012). "Telefonica Research at TRECVID 2011 Content-Based Copy Detection", NIST-TRECVID workshop 2011. pdf

Barrios, J.M., Bustos, B. & Anguera, X. (2012). "Combining Features at Search Time: PRISMA at Video Copy Detection Task", NIST-TRECVID workshop 2011. pdf

Anguera, X., Barrios, J.M., Adamek, T. & Oliver, N. (2012). "Multimodal fusion for video copy detection", in Proc. ACM Multimedia 2011. pdf

Younessian, E., Anguera, X., Adamek, T., Oliver, N. & Marimon, D. (2010). "Telefonica Research at TRECVID 2010 Content-Based Copy Detection", NIST Trecvid Workshop notebook paper. pdf

Anguera, X., Obrador, P., Adamek, T., Marimon, D. & Oliver, N. (2009). "Telefonica Research Content-Based Copy Detection TRECVID Submission", NIST Trecvid 2009 Workshop notebook paper pdf

Multimedia & Mobile Computing

Macrae, R., Neumann, J., Anguera, X., Oliver, N. & Dixon, S. (2012). "Real-Time synchronization of multimedia streams in a mobile device", in Proc. ADMIRE Workshop within ICME 2011, Barcelona, Spain. pdf

Anguera, X., Pérez, N., Urruela, A. & Oliver, N. (2012). "Automatic Synchronization of Electronic and Audio Books via TTS Alignment and Silence Filtering", in Proc. Hot Topics in Multimedia within ICME 2011, Barcelona, Spain. pdf

Flamary, R., Anguera, X. & Oliver, N. (2012). "Spoken Wordcloud: clustering recurrent patterns in speech", in Proc. CBMI 2011, Madrid, Spain. pdf

Macrae, R., Anguera, X. & Oliver, N. (2010). "MuViSync: Realtime Music Video Alignment", in Proc. ICME 2010 pdf

Wang, J., Anguera, X., Chen, X. & Yang, D. (2010). "Enriching Music Mood Annotation by Semantic Association Reasoning", in Proc. AdMiRe Workshop, in ICME 2010 pdf

Anguera, X., Cherubini, M. & Oliver, N. (2010). "Unrestricted Voice Annotations and Search of Personal Photographs in a Mobile Phone", in Proc. of Spoken Query 2010 Workshop on voice search, in ICASSP 2010 pdf

Cherubini, M., Anguera, X., Oliver, N. & de Oliveira, R. (2009). "Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones", in Proc. MobileHCI, Bonn, Germany, September 2009, (best paper award nominee) pdf

Duxans, H., Conejero, D. & Anguera, X. (2009). "Audio-Based Automatic Management of Audio Commercials", in Proc. ICASSP 2009, Taipei, Taiwan. April 2009 pdf

Obrador, P., Anguera, X., de Oliveira, R. & Oliver, N. (2009). "The role of tags and image aesthetics in social image search", in Proc. first SIGMM Workshop on Social Media (WSM2009) at ACM MM09 pdf

Conejero, D. & Anguera, X. (2008). "TV advertisements detection and clustering based on acoustic information", in Proc. International Conference on Computational Intelligence for Modelling, Control and Automation - CIMCA08, Vienna, Austria, December 2008 pdf

Anguera, X. & Oliver, N. (2008). "MAMI: Multimodal Annotations on a Camera Phone", in Proc. MobileHCI, Amsterdam, September 2008 pdf

Urdapilleta, U., Conejero, D., Anguera, X., Cacenabes, D. & Caminero, F.J. (2008). "Sistema de Indexación Automática de Contenidos Multimedia", in Proc. XVIII Jornadas Telecom I+D, Bilbao, Spain pdf

Anguera, X., Oliver, N. & Cherubini, M. (2008). "Multimodal and Mobile Personal Image Retrieval: A User Study", in Proc. Workshop on Mobile Information Retrieval, MOBIR'08, Singapore pdf

Anguera, X., Xu, J. & Oliver, N. (2008). "Multimodal Photo Annotation and Retrieval on a Mobile Phone", in Proc. ACM Intl. Conference on Multimedia Information Retrieval, Vancouver, Canada. 2008 pdf

Hernando, D., Hernando, J. & Anguera, X. (2005). "PETRA: Advanced Oral Interfaces for Unified Messaging Applications", Buran magazine, IEEE Barcelona student branch. Number 22, September 2005.

Speaker Diarization - Multiple channels

Pardo, J.M., Anguera, X. & Wooters, C. (2007). "Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information", IEEE Transactions on Computers, September 2007, volume 56, number 9, pp. 1189-1224. pdf

Anguera, X., Wooters, C. & Hernando, J. (2007). "Acoustic beamforming for speaker diarization of meetings", IEEE Transactions on Audio, Speech and Language Processing, September 2007, volume 15, number 7, pp. 2011-2023. pdf

Anguera, X., Wooters, C., Pardo, J.M. & Hernando, J. (2007). "Automatic Weighting for the Combination of TDOA and Acoustic Features in Speaker Diarization for Meetings", ICASSP, Hawaii, USA, April 2007. pdf

Luque, J., Anguera, X., Temko, A., & Hernando, J. (2007). "Speaker Diarization for Conference Room: The UPC RT07s Evaluation System", RT07s Rich Transcription evaluation workshop, Washington, May 2007 pdf

Gallardo, A., Anguera, X. & Wooters, C. (2006). "Multi-Stream Speaker Diarization Systems for the Meetings Domain", Interspeech-ICSLP, Pittsburgh, Pennsylvania, USA, September 2006. pdf

Pardo, J.M., Anguera, X. & Wooters, C. (2006). "Speaker Diarization for Multiple Distant Microphone Meetings: Mixing Acoustic Features And Inter-Channel Time Differences", Interspeech-ICSLP, Pittsburgh, Pennsylvania, USA, September 2006. pdf

Pardo, J.M., Anguera, X. & Wooters, C. (2006). "Speaker Diarization for Multi-Microphone Meetings Using only Between-Channel Differences", In S. Renals and S. Bengio, editors, Machine Learning for Multimodal Interaction: Third International Workshop (MLMI 2006), Lecture Notes in Computer Science. Springer pdf

Anguera, X., Wooters, C. & Pardo, J.M. (2006). "Robust Speaker Diarization for Meetings: ICSI RT06s evaluation system", Interspeech-ICSLP, Pittsburgh, Pennsylvania, USA, September 2006. pdf

Anguera, X., Wooters, C. & Hernando, J. (2005). "Speaker Diarization for Multi-Party Meetings Using Acoustic Fusion", Automatic Speech Recognition and Understanding (ASRU). Puerto Rico, November 2005. pdf

Anguera, X., Wooters, C., Peskin, B. & Aguilo, M. (2005). "Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System", Machine Learning for Multimodal Interaction: Second International Workshop (MLMI 2005), Lecture Notes in Computer Science. Springer pdf

Speaker Diarization and clustering - Core algorithms

Patino, J., Delgado, H., Evans, N. & Anguera, X. (2016). "EURECOM submission to the Albayzin 2016 Speaker Diarization Evaluation", IberSPEECH 2016

Delgado, H., Anguera, X., Fredouille, C. & Serrano, J. (2015). "Improved Binary Key Speaker Diarization System", in Proc. EUSIPCO 2015, Nice, France

Delgado, H., Anguera, X., Fredouille, C. & Serrano, J. (2015). "Novel Clustering Selection Criterion for Fast Binary Key Speaker Diarization", in Proc. Interspeech 2015, Dresden, Germany

Delgado, H., Anguera, X., Fredouille, C. & Serrano, J. (2014). "Global Speaker Clustering towards Optimal Stopping Criterion in Binary Key Speaker Diarization", book chapter in Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, Volume 8854, 2014, pp. 59-68. Presented at Iberspeech 2014, Las Palmas, Spain. pdf

Friedland, G., Janin, A., Imseng, D., Anguera, X., Gottlieb, L., Huijbregts, M., Knox, M.T. & Vinyals, O. (2012). "The ICSI RT-09 Speaker Diarization System", Transactions on Audio, Speech and Language Processing (TASLP), special issue on New Frontiers in Rich Transcription, July 2011. pdf

Stafylakis, T., Anguera, X., Katsouros, V. & Carayannis, G. (2012). "Closed-Form Expressions vs. BIC: a Comparison for Speaker Clustering", in Proc. ICASSP 2011, Prague, Czech Republic. pdf

Anguera, X. & Bonastre, J.-F. (2012). "Fast Speaker Diarization Based on Binary Keys", in Proc. ICASSP 2011, Prague, Czech Republic. pdf

Anguera, X., Bozonnet, S., Evans, N., Fredouille, C., Friedland, G., & Vinyals, O. (2012). "Speaker Diarization: a review of recent research", Transactions on Audio, Speech and Language Processing (TASLP), special issue on New Frontiers in Rich Transcription. pdf

Bozonnet, S., Evans, N., Anguera, X., Vinyals, O., Friedland, G. & Fredouille, C. (2010). "System output combination for improved speaker diarization", in Proc. Interspeech 2010, Makuhari, Japan.

Stafylakis, T. & Anguera, X. (2010). "Improvements to the equal-parameter BIC for Speaker Diarization", in Proc. Interspeech 2010, Makuhari, Japan. pdf

Anguera, X., Shinozaki, T., Wooters, C. & Hernando, J. (2007). "Model Complexity Selection and Cross-Validation EM Training for Robust Speaker Diarization", ICASSP, Hawaii, USA, April 2007. pdf

Anguera, X., Wooters, C. & Hernando, J. (2006). "Purity Algorithms for Speaker Diarization of Meetings Data", ICASSP 2006, Toulouse, France, May 2006. pdf

Anguera, X., Wooters, C. & Hernando, J. (2006). "Automatic Cluster Complexity and Quantity Selection: Towards Robust Speaker Diarization", In S. Renals and S. Bengio, editors, Machine Learning for Multimodal Interaction: Third International Workshop (MLMI 2006), Lecture Notes in Computer Science. Springer pdf

Anguera, X., Wooters, C. & Pardo, J.M. (2006). "Robust Speaker Diarization for Meetings: ICSI RT06s Meetings Evaluation System", In S. Renals and S. Bengio, editors, Machine Learning for Multimodal Interaction: Third International Workshop (MLMI 2006), Lecture Notes in Computer Science. Springer pdf

Anguera, X., Wooters, C. & Hernando, J. (2006). "Frame Purification for Cluster Comparison in Speaker Diarization", MMUA 2006, Toulouse, France, May 2006. pdf

Anguera, X., Aguilo, M., Wooters, C., Nadeu, C. & Hernando, J. (2006). "Hybrid Speech/Non-Speech Detector Applied to Speaker Diarization of Meetings", Speaker Odyssey 2006, San Juan de Puerto Rico, USA, June 2006. pdf

Anguera, X., Wooters, C. & Hernando, J. (2006). "Friends and Enemies: A Novel Initialization for Speaker Diarization", Interspeech-ICSLP, Pittsburgh, Pennsylvania, USA, September 2006. pdf

Anguera, X. (2005). "XBIC: Real-Time Cross Probabilities Measure for Speaker Segmentation", International Computer Science Institute Technical Report TR-05-008. pdf

Wooters, C., Fung, J., Peskin, B. & Anguera, X. (2004). "Towards Robust Speaker Segmentation: The ICSI-SRI Fall 2004 Diarization System", EARS Program RT-04 Workshop, Nov. 7-10, 2004. pdf

Anguera, X. & Hernando, J. (2004). "Evolutive Speaker Segmentation using a Repository System", Interspeech-ICSLP, Korea 2004. pdf

Anguera, X., Farrús, M., Hernando, J. & Abad, A. (2004). "Segmentació de locutor per a la indexació automàtica de bases de dades multimèdia en català", II Congrés d'enginyeria en llengua catalana, Andorra 2004. pdf

Anguera, X., Hernando, J. & Anguita, J. (2004). "XBIC: Nueva Medida para Segmentación de Locutor hacia el Indexado Automático de la Señal de Voz", III Jornadas en Tecnología del Habla, Valencia, 17-10 Nov 2004. pdf

Speech Recognition

Stolcke, A., Anguera, X., Boakye, K., Çetin, O., Janin, A., Magimai-Doss, M., Wooters, C. & Zheng, J. (2007). "The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System", RT07s Rich Transcription evaluation workshop, Washington, May 2007 pdf

Janin, A., Stolcke, A., Anguera, X., Boakye, K., Cetin, O., Frankel, J. & Zheng, J. (2006). "The ICSI-SRI Spring 2006 Meeting Recognition System", Machine Learning for Multimodal Interaction: Third International Workshop (MLMI 2006), Lecture Notes in Computer Science. Springer pdf

Stolcke, A., Anguera, X., Boakye, K., Cetin, O., Grezl, F., Janin, A., Mandal, A., Peskin, B., Wooters, C. & Zheng, J. (2005). "Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System", Machine Learning for Multimodal Interaction: Second International Workshop (MLMI 2005), Lecture Notes in Computer Science. Springer pdf

Farrús, M., Anguita, J., Anguera, X., Crego, J.M., de Gispert, A., Hernando, J. & Nadeu, C. (2004). "Els sistemes de reconeixement de veu i traduccio automatica en catala: present i futur", II Congres d'enginyeria en llengua catalana, Andorra 2004. pdf

Miscellaneous

Llimona, Q., Luque, J., Anguera, X., Hidalgo, Z., Park, S. & Oliver, N. (2015). "Effect of gender and call duration on customer satisfaction in call center big data", in Proc. Interspeech 2015, Dresden, Germany.

Costa Pereira, J., Luque, J. & Anguera, X. (2014). "Sentiment Retrieval on Web Reviews Using Spontaneous Natural Speech", in Proc. ICASSP 2014, Florence, Italy. pdf

Harsha Yella, S., Anguera, X. & Luque, J. (2014). "Inferring Social Relationships in a Phone Call from a Single Party's Speech", in Proc. ICASSP 2014, Florence, Italy. pdf

Luque, J. & Anguera, X. (2014). "On the Modeling of Natural Vocal Emotion Expressions Through Binary Key", in Proc. EUSIPCO 2014, Lisboa, Portugal. pdf

Gonzalez, S. & Anguera, X. (2013). "Perceptually Inspired Features for Speaker Likability Classification", in Proc. ICASSP 2013, Vancouver, Canada. pdf

Larson, M., Said, A., Shi, Y., Cremonesi, P., Tikk, D., Karatzoglou, A., Baltrunas, L., Geurts, J., Anguera, X. & Hopfgartner, F. (2013). "Activating the Crowd: Exploiting User-Item Reciprocity for Recommendation", in Proc. CrowdRec: Crowdsourcing and Human Computation for Recommender Systems Workshop, ACM RecSys 2013. pdf

Anguera, X., Movellan, E. & Ferrarons, M. (2012). "Emotions recognition using binary fingerprints", in Proc. Iberspeech 2012, Madrid, Spain. pdf

Ph.D. Thesis

In 2006 I defended my Ph.D. thesis, titled "Robust Speaker Diarization for Meetings".
The research for the thesis was carried out between UPC Barcelona (under the supervision of Prof. Javier Hernando) and ICSI Berkeley (under the supervision of Dr. Chuck Wooters).
I arrived at ICSI just as the Speaker Diarization and Speech Recognition communities were shifting their focus from single-channel Broadcast News recordings to multi-channel meeting recordings.
My first important contribution was to propose a signal preprocessing step, applied before any speech analysis, that produces a single enhanced speech signal through a weighted combination of all available channels using an acoustic beamforming algorithm. From this work I later released the open source BeamformIt software, which is still considered a baseline in this and related areas.
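The idea of merging several microphone channels into one enhanced signal can be illustrated with a weighted delay-and-sum beamformer. The sketch below is a simplified illustration under stated assumptions, not the BeamformIt implementation: signals are plain Python lists of equal length, each channel's delay relative to the first channel is found by brute-force cross-correlation over a small lag window, and the per-channel weights are fixed and assumed to sum to one. The function names `estimate_delay` and `delay_and_sum` are hypothetical. Practical systems use more robust delay estimators (for example GCC-PHAT) and adapt the weights over time.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation
    between ref[t] and sig[t + lag], searched over [-max_lag, max_lag].
    Simplified, brute-force estimator used only for illustration."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(len(ref)):
            u = t + lag
            if 0 <= u < len(sig):  # ignore samples shifted out of range
                score += ref[t] * sig[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, weights, max_lag):
    """Align every channel to channels[0] and return their weighted sum.

    Assumes all channels have the same length and that the weights sum
    to 1 (neither assumption is checked here)."""
    ref = channels[0]
    out = [0.0] * len(ref)
    for ch, w in zip(channels, weights):
        lag = estimate_delay(ref, ch, max_lag)
        for t in range(len(ref)):
            u = t + lag
            if 0 <= u < len(ch):  # samples shifted past the edges contribute nothing
                out[t] += w * ch[u]
    return out
```

For example, a click that reaches a second microphone two samples later is realigned with the reference channel before summing: `delay_and_sum([[0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0]], [0.5, 0.5], 3)` returns `[0.0, 0.0, 1.0, 0.0, 0.0, 0.0]`, so the two copies add up coherently instead of smearing.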
In addition, I worked on many algorithmic improvements to the agglomerative speaker diarization system we used at ICSI, which made our system the top performer in the NIST Speaker Diarization campaigns of 2004 and 2005, when I led the ICSI diarization submissions.

You can browse the thesis online (see the link above; there may be some PDF-to-HTML conversion errors) or download the PDF file.