02/07/2018

NLP

NLP – Module I: Text Processing Today: Tasks, Methods, and Resources

Gustavo Henrique Paetzold

Coordinator: Sandra Aluisio

Abstract: Text processing tasks aim to explore, interpret, manipulate, or even generate segments of text. The way these tasks are approached has changed considerably in recent years, driven mainly by the scientific advances made in Computational Linguistics and Machine Learning between 2012 and 2018. These advances produced a wide range of new methods and resources, such as Convolutional and Recurrent Neural Networks and Word Vector Models. This course presents these methods and resources in a simplified form, so that participants can apply them to the text processing tasks relevant to their own projects.

Short Bio: He completed postdoctoral research in Natural Language Processing at the University of Sheffield, holds a PhD in Natural Language Processing from the University of Sheffield, and is a graduate of the Computer Science program at the Universidade Estadual do Oeste do Paraná. He is the author of dozens of academic contributions on Text Simplification, Quality Estimation, Compilers, and Robotics. He is currently an Adjunct Professor in the Software Engineering program at the Universidade Tecnológica Federal do Paraná, where he conducts research in Computational Linguistics and Machine Learning.

Personal website: https://gustavopaetzold.wordpress.com

NLP – Module II: Speech Processing: Feature Selection and Extraction for Machine Learning

Abstract: Speech processing has a long tradition of feature engineering, and even in deep architectures, well-selected and annotated features contribute to high-quality results. Recently, the demand for large annotated datasets has come to greatly exceed the available supply, in an industry with a wide range of tasks built on the most basic form of human communication. Because of the complexity of the tasks involved in these applications, semi-automatic approaches are often desirable. In this course, we will explore use cases such as speech and sentence boundary recognition, along with hands-on examples of linguistic feature extraction, such as prosodic and phonemic features, using both traditional hidden Markov models and deep convolutional neural networks. The advantages and disadvantages of these technologies will also be explored, so that researchers are better equipped to work with data of this nature in real-world applications.

Short Bio: Christopher is a PhD candidate in Computer Science at the University of São Paulo in the area of speech processing and a graduate of the Ohio State University in computational linguistics. His background includes laboratory phonetics and acoustics, intelligent computer-assisted language learning, natural language and speech processing, machine learning, and personal assistants. He is currently the overseas research center representative in artificial intelligence and the technical leader/project coordinator for the Bixby Speech Solutions team at the Samsung Research Institute (SIDI) in Campinas. Since 2007, he has held research positions at CPqD (R&D Center in Telecommunications), Vocalize Speech Solutions, and the Ohio State University. His research focuses on robust machine learning architectures for natural language and speech processing that generalize well even with little data and under adverse conditions, where good feature selection and modern deep architectures can benefit from one another.