Transcription factor classification for vertebrates and other taxonomic groups in the JASPAR database

Supervision team information: Computational Biology & Gene Regulation group, Norwegian Centre for Molecular Biosciences and Medicine (NCMBM), UiO
Supervisor: Anthony Mathelier
IBV supervisor: Pierre Chymkovitch
Co-supervisor: Roza Berhanu Lemma
Contact e-mail: anthony.mathelier@ncmbm.uio.no

Introduction

Transcription factors (TFs) play a crucial role in regulating gene expression to maintain proper development and homeostasis [1]. Although a large number of TFs have been identified, several vertebrate TFs currently lack a structural classification in the widely used TFClass database [2]. Furthermore, there has been no systematic effort to determine the structural classification of TFs from the other taxonomic groups represented in the JASPAR database. JASPAR is a widely used, regularly maintained open-access database that stores manually curated, non-redundant TF binding profiles for various species in six taxonomic groups [3]. This incomplete TF classification is a significant gap in our understanding of TF diversity and function, and it also affects downstream applications in genomics and molecular biology. This Master's project aims to address this gap by developing a systematic classification framework for TFs that are not yet represented in TFClass, thereby expanding our knowledge of both vertebrate and non-vertebrate TFs.

Aim

The aims of this project are:

  • Determine the structural classification of vertebrate TFs that are not yet classified in TFClass.
  • Generate TF structural classifications for the non-vertebrate taxa represented in JASPAR.

Methods

The proposed study will use the computational methods described in [2,4–7] together with the TFClass database. These include multiple sequence alignment (MSA), structural comparison of DNA-binding domains (DBDs), and phylogenetic analysis. These methods were previously used to build the TFClass classification [2,4,5] and provide a robust foundation for classifying vertebrate TFs based on their sequence and structural characteristics. Similarly, a recent effort assigned structural classifications to plant TFs and identified eight new classes and 37 new families corresponding to DBD structures found in plants but not in mammals [8], further highlighting the importance of determining TF structural classification in other taxonomic groups. Specifically, the Master's student will examine the criteria established for classifying eukaryotic TFs, as outlined in Wingender's early work [7], and adapt them to the other taxonomic groups for which TF binding profiles are stored in JASPAR. To achieve this, the Master's student will explore and apply state-of-the-art computational methods to build a framework for:

  • MSAs (using both sequence-based and structure-based tools),
  • incorporation of TF protein structures (experimentally solved or modeled),
  • phylogenetic analyses, and
  • possibly, structural annotation with AlphaFold and large language models (LLMs).

In addition, the student(s) will have the opportunity to apply SIMSApiper, a recent pipeline for structure-informed MSA [9].

Learning Outcomes

The results of this project will contribute to our ongoing efforts to curate and store high-quality, non-redundant TF DNA-binding profiles in the open-access JASPAR database [3]. The project will also contribute to a more comprehensive understanding of TF diversity, paving the way for future studies on TF function and evolution, as well as comparative studies of gene regulation across species.

The project will expose the student(s) to the following knowledge and skills:

  • Transcription factors and transcriptional gene regulation
  • Protein sequence and structure comparisons
  • Phylogenetics
  • Visualization of MSAs, protein structures, and phylogenetic trees
  • Protein sequence, protein structure, and TF databases
  • Programming skills in bash, R, and Python
  • Workflow management tools (e.g., Snakemake), if the student(s) is/are interested

In conclusion, this Master's project will contribute to the field by filling the gap in TF structural classification, particularly in taxa other than vertebrates.

Host Environment

The selected candidate(s) will be part of the Computational Biology and Gene Regulation group at the Norwegian Centre for Molecular Biosciences and Medicine (NCMBM), UiO, led by Prof. Anthony Mathelier. The group combines dry and wet laboratory studies to investigate gene expression regulation and its disruption in diseases such as cancer. In addition, the group contributes to the international consortium behind the JASPAR database. The principal supervisors are Prof. Anthony Mathelier and Dr. Roza Berhanu Lemma, with close day-to-day supervision provided by Roza.

References

1. He H, Yang M, Li S, Zhang G, Ding Z, Zhang L, et al. Mechanisms and biotechnological applications of transcription factors. Synth. Syst. Biotechnol. 2023;8:565–77.

2. Wingender E, Schoeps T, Haubrock M, Krull M, Dönitz J. TFClass: expanding the classification of human transcription factors to their mammalian orthologs. Nucleic Acids Res. 2018;46:D343–7.

3. Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon JA, Ferenc K, Kumar V, et al. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2024;52:D174–82.

4. Wingender E, Schoeps T, Haubrock M, Dönitz J. TFClass: a classification of human transcription factors and their rodent orthologs. Nucleic Acids Res. 2015;43:D97–102.

5. Wingender E, Schoeps T, Dönitz J. TFClass: an expandable hierarchical classification of human transcription factors. Nucleic Acids Res. 2013;41:D165–70.

6. Wingender E. Criteria for an updated classification of human transcription factor DNA-binding domains. J. Bioinform. Comput. Biol. 2013;11:1340007.

7. Wingender E. Classification of eukaryotic transcription factors. Mol. Biol. (Mosk.) 1997;31:584–600.

8. Blanc-Mathieu R, Dumas R, Turchi L, Lucas J, Parcy F. Plant-TFClass: a structural classification for plant transcription factors. Trends Plant Sci. 2024;29:40–51.

9. Crauwels C, Heidig SL, Díaz A, Vranken WF. Large-scale structure-informed multiple sequence alignment of proteins with SIMSApiper. Bioinformatics 2024;40:btae276. https://doi.org/10.1093/bioinformatics/btae276

Published 3 Sep. 2025 10:22 - Last modified 3 Sep. 2025 10:22

Scope (ECTS credits)

60