=====Meaning of gene_biotype and transcript_biotype in GTF files=====

For reference, see https://www.gencodegenes.org/pages/biotypes.html and https://asia.ensembl.org/info/genome/genebuild/biotypes.html.

Biotype: A gene or transcript classification.

IG gene: Immunoglobulin gene that undergoes somatic recombination, annotated in collaboration with IMGT (http://www.imgt.org/).

IG C gene: Constant chain immunoglobulin gene that undergoes somatic recombination before transcription.

IG D gene: Diversity chain immunoglobulin gene that undergoes somatic recombination before transcription.

IG J gene: Joining chain immunoglobulin gene that undergoes somatic recombination before transcription.

IG V gene: Variable chain immunoglobulin gene that undergoes somatic recombination before transcription.

Nonsense Mediated Decay: A transcript with a premature stop codon that is considered likely to be subjected to targeted degradation. Nonsense-mediated decay is predicted to be triggered when the in-frame termination codon lies more than 50bp upstream of the final splice junction.

Processed transcript: Gene/transcript that does not contain an open reading frame (ORF).

Long non-coding RNA (lncRNA): A non-coding gene/transcript longer than 200bp.

3' overlapping ncRNA: Transcripts where ditag and/or published experimental data strongly support the existence of long (>200bp) non-coding transcripts that overlap the 3' UTR of a protein-coding locus on the same strand.

Antisense: Transcripts that overlap the genomic span (i.e. exons or introns) of a protein-coding locus on the opposite strand.

Macro lncRNA: Unspliced lncRNAs that are several kb in size.

Non coding: Transcripts that are known from the literature not to be protein coding.

Retained intron: An alternatively spliced transcript believed to contain intronic sequence relative to other, coding, transcripts of the same gene.

Sense intronic: A long non-coding transcript located in an intron of a coding gene that does not overlap any exons.

Sense overlapping: A long non-coding transcript that contains a coding gene in its intron on the same strand.

lincRNA (long intergenic ncRNA): Transcripts from a long intergenic non-coding RNA locus, longer than 200bp. Requires lack of coding potential; may not be conserved between species.

ncRNA: A non-coding gene.

miRNA: A small RNA (~22bp) that silences the expression of target mRNAs.

miscRNA: Miscellaneous RNA. A non-coding RNA that cannot be classified.

piRNA: An RNA that interacts with piwi proteins involved in genetic silencing.

rRNA: The RNA component of a ribosome.

siRNA: A small RNA (20-25bp) that silences the expression of target mRNAs through the RNAi pathway.

snRNA: Small RNA molecules that are found in the cell nucleus and are involved in the processing of pre-messenger RNAs.

snoRNA: Small RNA molecules that are found in the nucleolus and are involved in the post-transcriptional modification of other RNAs.

tRNA: A transfer RNA, which acts as an adaptor molecule for the translation of mRNA.

vaultRNA: Short non-coding RNA genes that form part of the vault ribonucleoprotein complex.

Protein coding: Gene/transcript that contains an open reading frame (ORF).

Protein coding CDS not defined: Alternatively spliced transcript of a protein-coding gene for which a CDS cannot be defined.

Pseudogene: A gene that has homology to known protein-coding genes but contains a frameshift and/or stop codon(s) that disrupt the ORF. Thought to have arisen through duplication followed by loss of function.

IG pseudogene: Inactivated immunoglobulin gene.

Polymorphic pseudogene: Pseudogene owing to a SNP/indel, although in other individuals/haplotypes/strains the gene is translated.

Processed pseudogene: Pseudogene that lacks introns and is thought to arise from reverse transcription of mRNA followed by reinsertion of the DNA into the genome.

Transcribed pseudogene: Pseudogene where protein homology or genomic structure indicates a pseudogene, but the presence of locus-specific transcripts indicates expression. These can be classified into 'Processed', 'Unprocessed' and 'Unitary'.

Translated pseudogene: Pseudogenes with mass spec data suggesting that they are also translated. These can be classified into 'Processed' and 'Unprocessed'.

Unitary pseudogene: A species-specific unprocessed pseudogene without a parent gene, as it has an active orthologue in another species.

Unprocessed pseudogene: Pseudogene that can contain introns, since it is produced by gene duplication.

Readthrough: A readthrough transcript has exons that overlap exons from transcripts belonging to two or more different loci (in addition to the locus to which the readthrough transcript itself belongs).

Stop codon readthrough: The coding sequence contains a stop codon that is translated (as supported by experimental evidence), and termination instead occurs at a canonical stop codon further downstream. It is currently unknown which amino acid is incorporated at the translated stop codon, so it is represented by 'X' in the protein sequence.

TEC (To be Experimentally Confirmed): Regions with EST clusters that have polyA features that could indicate the presence of protein-coding genes. These require experimental validation, either by 5' RACE or RT-PCR to extend the transcripts, or by confirming expression of the putatively encoded peptide with specific antibodies.

TR gene: T cell receptor gene that undergoes somatic recombination, annotated in collaboration with IMGT (http://www.imgt.org/).

TR C gene: Constant chain T cell receptor gene that undergoes somatic recombination before transcription.

TR D gene: Diversity chain T cell receptor gene that undergoes somatic recombination before transcription.

TR J gene: Joining chain T cell receptor gene that undergoes somatic recombination before transcription.

TR V gene: Variable chain T cell receptor gene that undergoes somatic recombination before transcription.

Ensembl release 110 - July 2023 © EMBL-EBI
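In the GTF files themselves, these biotypes appear as attribute values in column 9, written as lowercase, underscore-joined identifiers (for example `protein_coding`, `lncRNA`, `processed_pseudogene`, `nonsense_mediated_decay`). Ensembl GTFs use the keys `gene_biotype` and `transcript_biotype`, while GENCODE GTFs use `gene_type` and `transcript_type`. The following is a minimal Python sketch for tallying them, assuming a tab-separated GTF whose attributes follow the `key "value";` layout; the function names `parse_attributes`, `count_biotypes` and `count_biotypes_in_file` are illustrative and not part of any library.

```python
import gzip
import re
from collections import Counter
from typing import Dict, Iterable

# Matches one `key "value"` pair in GTF column 9, e.g. gene_biotype "protein_coding"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn the GTF attribute column into a dict.

    Repeated keys (such as several `tag` entries) keep only the last value,
    which is fine for gene_biotype / transcript_biotype.
    """
    return dict(ATTR_RE.findall(field))


def count_biotypes(lines: Iterable[str],
                   feature: str = "gene",
                   key: str = "gene_biotype") -> Counter:
    """Count the values of `key` over GTF records whose column 3 equals `feature`."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and #! header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue  # malformed line or a different feature type
        attrs = parse_attributes(cols[8])
        counts[attrs.get(key, "NA")] += 1  # "NA" marks records lacking the key
    return counts


def count_biotypes_in_file(path: str,
                           feature: str = "gene",
                           key: str = "gene_biotype") -> Counter:
    """Same as count_biotypes, reading a plain or gzip-compressed GTF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return count_biotypes(fh, feature, key)
```

For transcript-level counts, call `count_biotypes_in_file(path, feature="transcript", key="transcript_biotype")`; for a GENCODE GTF, pass `key="gene_type"` or `key="transcript_type"` instead.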
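The Nonsense Mediated Decay definition rests on a simple distance rule: the in-frame stop codon must lie more than 50bp upstream of the final splice junction. Below is a minimal sketch of that rule only, not of Ensembl's annotation pipeline. It assumes exon lengths are listed in 5' to 3' transcript order and that the stop codon position is already expressed in spliced transcript coordinates (converting from genomic GTF coordinates needs strand-aware handling that is not shown). Measuring from the last base of the stop codon is also an assumption, and `predicted_nmd` and `last_junction_offset` are hypothetical helper names.

```python
from typing import Sequence


def last_junction_offset(exon_lengths: Sequence[int]) -> int:
    """Transcript offset of the final exon-exon (splice) junction.

    This equals the number of spliced bases that precede the last exon.
    """
    return sum(exon_lengths[:-1])


def predicted_nmd(exon_lengths: Sequence[int],
                  stop_codon_end: int,
                  threshold: int = 50) -> bool:
    """Apply the 50bp rule to one transcript.

    exon_lengths:   exon lengths in 5'-to-3' transcript order
    stop_codon_end: number of transcript bases up to and including the last
                    base of the in-frame stop codon (0-based exclusive end)
    threshold:      NMD is predicted when the stop codon lies MORE than this
                    many bp upstream of the final junction
    """
    if len(exon_lengths) < 2:
        return False  # a single-exon transcript has no splice junction
    distance = last_junction_offset(exon_lengths) - stop_codon_end
    return distance > threshold


# Hypothetical transcript with exons of 100, 200 and 150 bp:
# the final junction sits after base 300 of the spliced transcript.
print(predicted_nmd([100, 200, 150], 240))  # 60 bp upstream -> True
print(predicted_nmd([100, 200, 150], 260))  # 40 bp upstream -> False
```

Because the comparison is strict (`>`), a stop codon exactly 50bp upstream of the last junction is not flagged, and a stop codon inside the last exon gives a negative distance and is never flagged.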