Use this identifier to cite or link to this item:
http://repositorio.ufla.br/jspui/handle/1/58857
Title: | Desenvolvimento de modelos de linguagem para extração de aspectos em língua portuguesa |
Alternative title(s): | Development of language models for aspect extraction in Portuguese |
Authors: | Ferreira, Danton Diego; Barbosa, Bruno Henrique Groenner; Pereira, Denilson Alves; Cardoso, Paula Christina Figueira; Vitor, Giovani Bernardes |
Keywords: | Processamento de linguagem natural; Extração de aspectos; BERT; Modelos de linguagem; Natural language processing; Aspect extraction; Bidirectional Encoder Representations from Transformers; Language models |
Issue date: | 29-Jan-2024 |
Publisher: | Universidade Federal de Lavras |
Citation: | FERREIRA NETO, J. C. Desenvolvimento de modelos de linguagem para extração de aspectos em língua portuguesa. 2023. 93 p. Dissertação (Mestrado em Engenharia de Sistemas e Automação)–Universidade Federal de Lavras, Lavras, 2023. |
Abstract: | The identification and extraction of aspects are essential in text analysis for discerning opinions and emotions. However, there is a gap in applying these techniques to Portuguese. This work adapts approaches originally developed for English and evaluates language models for aspect extraction in Portuguese on two datasets: TV device reviews (TV) and literary reviews (ReLi). To achieve this goal, models based on the BERT architecture were employed, both pre-trained for general domains (BERTimbau) and for specific domains (BERTtv and BERTreli). Additionally, a double embedding technique was implemented, combining general- and specific-domain models. Large Language Models (LLMs) were also evaluated, including variants of GPT-3 via the OpenAI API and Cabrita, a variant of LLaMA trained for the Portuguese language. To reduce hardware resource demands, parameter-efficient fine-tuning techniques were applied: LoRA (Low-Rank Adaptation) for BERTimbau and QLoRA (Quantized Low-Rank Adaptation) for Cabrita. The results showed that the BERTimbau model fine-tuned with LoRA was superior on both datasets, achieving F1 scores of 0.846 on TV and 0.615 on ReLi. In contrast, the Cabrita model showed inferior performance on both datasets, with F1 scores of 0.68 on TV and 0.46 on ReLi. This study therefore offers a valuable contribution to research on aspect extraction in Portuguese, demonstrating the feasibility and effectiveness of adapting and optimizing techniques and models originally developed for other languages. |
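The LoRA technique mentioned in the abstract freezes the pre-trained weight matrix and learns only a low-rank update, which is why it cuts the hardware demand of fine-tuning models such as BERTimbau. A minimal NumPy sketch of the idea (the layer dimensions, rank, and scaling factor below are illustrative assumptions, not values taken from the dissertation):

```python
import numpy as np

# LoRA: the effective weight is W + (alpha / r) * B @ A, where W is frozen
# and only the low-rank factors A and B are trained.
d, k = 768, 768   # hypothetical layer size (not from the dissertation)
r = 8             # LoRA rank
alpha = 16        # LoRA scaling factor

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen pre-trained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable, initialized small
B = np.zeros((d, r))                     # trainable, zero-init: update starts at 0

def effective_weight(W, A, B, alpha, r):
    """Weight actually used in the forward pass under LoRA."""
    return W + (alpha / r) * (B @ A)

# With B zero-initialized, the adapted layer matches the frozen one exactly.
assert np.allclose(effective_weight(W, A, B, alpha, r), W)

# Trainable parameters drop from d*k to r*(d+k).
full, lora = d * k, r * (d + k)
print(full, lora)  # 589824 vs 12288 trainable parameters
```

For these illustrative dimensions the trainable-parameter count falls from 589,824 to 12,288 per adapted layer, which is the mechanism behind the memory savings the abstract attributes to LoRA and QLoRA.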
Description: | File withheld, at the author's request, until January 2025. |
URI: | http://repositorio.ufla.br/jspui/handle/1/58857 |
Appears in collections: | Engenharia de Sistemas e Automação (Dissertações) |
Files in this item:
There are no files associated with this item.
This item is licensed under a Creative Commons License