Contact person: Andrey Kutuzov
Keywords: Large Language Models, Machine learning, Natural Language Processing, Computational Linguistics
Research group: Integreat
Department of Informatics
During training, modern large language models learn linguistic structure and knowledge about the world as one indivisible whole. Decoupling these two components is a challenging but extremely promising research direction. Having more control over what is stored in the model weights should make it possible to optimize the model better. In particular, world knowledge might not have to be learned from scratch every time a model is trained for a particular language.
The question is whether it is possible to develop an architecture in which the neural network is limited to learning linguistic structure, while knowledge about the world is stored in and retrieved from an external knowledge graph [1, 2, 3]. Will such a model focus on learning language-specific linguistic skills without spending its parameters on time-dependent and language-agnostic factual knowledge? Will such language models also be less prone to hallucinations and bias?
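As a first orientation, the sketch below illustrates the general idea in the spirit of retrieval-augmented generation [1, 2]: factual triples live in an external store and are verbalized into the prompt at query time, so the language model itself only has to handle the linguistic side. This is a minimal toy sketch, not a proposed implementation; the triple list, the string-matching retrieval, and the prompt format are hypothetical placeholders.

```python
# Minimal sketch of decoupling world knowledge from the language model.
# The knowledge graph, retrieval heuristic, and prompt template are toy
# placeholders chosen for illustration only.

from typing import List, Tuple

# World knowledge lives outside the model as (subject, relation, object) triples.
KNOWLEDGE_GRAPH: List[Tuple[str, str, str]] = [
    ("Oslo", "capital_of", "Norway"),
    ("Norway", "official_language", "Norwegian"),
]


def retrieve_facts(query: str) -> List[str]:
    """Return verbalized triples whose subject or object is mentioned in the query."""
    facts = []
    for subj, rel, obj in KNOWLEDGE_GRAPH:
        if subj.lower() in query.lower() or obj.lower() in query.lower():
            facts.append(f"{subj} {rel.replace('_', ' ')} {obj}.")
    return facts


def build_prompt(query: str) -> str:
    """Prepend retrieved facts, leaving only the linguistic work to the model."""
    context = "\n".join(retrieve_facts(query))
    return f"Facts:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    # The resulting prompt would be passed to a (small, language-specific) language model.
    print(build_prompt("What is the capital of Norway?"))
```

In the architecture studied in this project, the retrieval step would query a real knowledge graph rather than an in-memory list, and the interaction between retrieved facts and the model could be tighter than simple prompt concatenation (see [2, 3]).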
References:
- [1] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, volume 33, pages 9459–9474.
- [2] Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., van den Driessche, G. B., Lespiau, J.-B., Damoc, B., Clark, A., et al. (2022). Improving language models by retrieving from trillions of tokens. In International Conference on Machine Learning, pages 2206–2240. PMLR.
- [3] Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., and Wu, X. (2023). Unifying large language models and knowledge graphs: A roadmap.
Mentoring and an internship will be offered by a relevant external partner.