Introduction
Long non-coding RNAs (lncRNAs) are typically defined as RNAs longer than 200 nt that lack apparent protein-coding potential. They make up the majority of genome transcripts. For decades, scientists tried to find the function of these non-protein-coding RNAs with little success, and some considered them mere ‘transcriptional noise’. In recent years, however, with the development of new technology and further research into gene regulation, more and more evidence shows that lncRNAs may play a very important part in epigenetic regulation, including DNA methylation, post-translational histone modification, and modification of DNA structure.
However, the definition of lncRNA is still controversial. In fact, early reports that up to 90% of eukaryotic genomes produce RNA did not provide much evidence that these RNAs were functional, suggesting that a large number of RNAs generated outside coding regions originate from transcriptional noise or from artifacts of sensitive detection methods. Thus, an RNA should be classified as a lncRNA only if at least one piece of functional evidence meets the “causal character” criteria. The definition of lncRNA also needs to be independent of coding potential. This is important because an RNA assumed to be non-coding may encode a polypeptide, and a messenger RNA may have a function independent of the protein it encodes. In addition, different lncRNAs generally do not share a common evolutionary origin, biological function, or molecular mechanism. Therefore, the term “lncRNA” should be used with caution, to avoid implying mechanistic, functional, or evolutionary conservation.
The number of lncRNAs with documented functions is growing rapidly. LncRNAs have been shown to control genome activity at the chromatin level across the eukaryotic kingdoms. For example, mammalian Xist RNA controls chromatin-mediated X chromosome inactivation, while the lncRNA HOTAIR recruits chromatin-modifying enzymes and mediates histone modifications at specific target loci in mammals. Recent studies report that lncRNAs can also act as modulators at the level of chromosome structure, including chromosome looping and nucleosome positioning.
Enhancer RNAs (eRNAs) are lncRNAs transcribed from known enhancer sites. eRNAs are mostly non-polyadenylated nascent RNA molecules present at low copy number in the nucleus. Studies have shown that many eRNAs play an essential role in the regulatory function of the enhancer from which they are transcribed. There are also enhancer-like RNAs, which some call activating RNAs (RNA-a), that act in an enhancer-like way, regulating nearby genes in cis. Typically, an RNA-a is a mature RNA molecule that is polyadenylated and spliced into different isoforms like an mRNA, and is reported to accumulate at its site of interaction at high copy number. Recent evidence suggests that both of these subclasses of lncRNA are involved in regulating gene expression by modulating chromosome structure.
Chromosome looping refers to changes in the higher-order structure of the chromosome. Recent studies indicate that chromosome looping is essential in many gene regulation processes. For instance, enhancer-promoter looping between promoters and distal enhancers is one important way for a distal enhancer to gain spatial proximity to a promoter and activate it. CTCF, the Mediator complex, and cohesin are reported to be key factors in establishing and maintaining chromosomal loops. Different kinds of lncRNA have been reported to interact with CTCF and Mediator, and thus to participate in the formation and stabilization of chromosome loops. Evidence also suggests that lncRNA binding can bring about structural changes between different chromosomes, creating spatial proximity among loci on different chromosomes. The mechanism of this multi-chromosome structural change, however, is still not well studied.
Nucleosomes are the fundamental units of chromatin, in which DNA is wrapped around histone octamers. Tight interaction with the histone core can strongly affect DNA accessibility. Moreover, nucleosome positioning is a critical factor in controlling gene expression, and it is determined by a combination of local DNA features and active remodeling. Recent studies suggest that lncRNAs can modulate nucleosome positioning through recruitment of ATP-dependent remodelers and through transcription-mediated nucleosome stabilization.
In this review, we introduce the history and methods of lncRNA research and discuss how lncRNAs can control genome activity by affecting chromosome structure, including DNA looping and nucleosome positioning, with examples from across the eukaryotic kingdoms.
Genetic Discovery of LncRNA
In the late 20th century and at the beginning of the 21st century, as the Human Genome Project progressed, scientists were eager to find out how many genes the human genome contains, and whether the complexity of different organisms could be explained by the sheer number of classic protein-coding genes and by splicing diversity. With the application of automated Sanger sequencing in the 1990s, scientists were able to map expressed sequence tags (ESTs), which represent fragments of genomic regions that were being actively transcribed. This led to the field of ‘transcriptome’ studies. In 1996, an intriguing new notion emerged: many ‘genes’ mapped to as yet undefined regions of the human genome. But because of the short read lengths of Sanger sequencing at that time, and the incomplete human reference genome against which ESTs were aligned, what these new ‘genes’ might encode remained elusive.
Tiling microarrays – In addition to advances in sequencing, new technologies were applied to the de novo identification of new genes and to better understand the regulation of gene expression. Tiling microarrays were one such novel technology, making it possible to survey on the scale of 20,000 genes or genomic loci. At about the same time, in 1999, the first complete human chromosome sequence, that of chromosome 22, was released. While few novel protein-coding genes were discovered, the combination of the human genome sequence and microarray technology identified widespread non-coding RNA across the human genome. At that time, scientists believed that at least half of the transcripts from the human genome were non-coding, and some believed that these might be mere transcriptional noise without any function at all.
Therefore, one result of the Human Genome Project was the discovery of many new RNA genes rather than new protein-coding genes. For example, the number of known human miRNAs increased rapidly from a few to nearly one thousand. Further advances in RNA sequencing, cDNA cloning, and microarray technology over the following decade led to consortium efforts to define all of the transcribed genes in the human genome. The conclusion was that most of the genome is transcribed. Although extensive transcription is observed throughout the genome, identifying functional RNA molecules is like finding a needle in a haystack. In fact, this notion of pervasive transcription has become increasingly controversial.
Chromatin marks – A key clue for finding RNA genes came from chromatin, the DNA-protein complex in which all eukaryotic genes reside. With whole-genome sequences available, chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) produced genome-wide maps of chromatin structure known as the “epigenome”. Massively parallel sequencing of the DNA sites occupied by histones and their modifications revealed many interesting genomic domains. One of these, the K4-K36 domain, consists of histone H3 lysine 4 trimethylation (H3K4me3) at the gene promoter followed by histone H3 lysine 36 trimethylation (H3K36me3) across genes transcribed by polymerase II. In one study, chromatin marks were mapped across the entire mouse and human genomes in several cell types, revealing approximately 5,000 K4-K36 domains that represent lncRNAs. These lncRNAs have discrete gene loci located in previously unannotated intergenic regions between protein-coding genes, and these RNAs were therefore designated large intergenic non-coding RNAs (lincRNAs). Further analysis of these loci revealed highly conserved promoter regions that recruit key transcription factors for binding and direct regulation. LincRNAs also show evolutionary sequence conservation relative to intronic or untranscribed intergenic sequences, further supporting their functionality.
RNA-seq – The emergence of deep sequencing technology led to unprecedented sequencing of cDNA and to the approach known as RNA-Seq. These methods have been combined with computational methods that reconstruct transcripts and their isoforms at single-nucleotide resolution. Such studies provide an unbiased way to identify non-coding transcripts across different cell types and tissues.
In addition to full-length reconstruction algorithms, other applications of RNA-Seq have appeared. For example, a method called “3-seq” targets the polyadenylated tail of cDNA, using more affordable short reads to quantitatively measure transcript abundance. Variants of the method can also be used to map the 3’ ends of transcripts accurately. More recently, metabolic labeling has been used to measure nascent transcripts, providing a view of polymerase transcription and pausing dynamics. These and many other emerging technologies are providing deeper insight into the dynamic transcriptome.
Recent studies have used RNA sequencing and transcript abundance to estimate the number of gene loci for different categories of large RNA. For example, one recent study combined annotations from many sources with RNA sequencing to identify about 8,000 large intergenic non-coding RNAs (lincRNAs) in the human genome. The study revealed several global properties of lncRNAs, including a tendency to be located next to regulators, expression patterns highly specific to tissue, thousands of lincRNAs orthologous between human and mouse, and hundreds of lincRNAs located in gene deserts associated with genetic traits. Increases in sequencing depth and read length have enabled some of the first steps in characterizing lncRNAs on a global scale.
By combining the techniques described above, it is now possible to determine the loci of all transcriptional units (K4-K36 chromatin domains) as well as a precise map of the primary structure of their RNA products (RNA-Seq). These layers of information are synergistic: chromatin modifications identify stably transcribed gene loci, while RNA sequencing allows detection of even single low-abundance transcripts that could otherwise be argued to be transcriptional noise. The chromatin information additionally indicates the promoter region (H3K4me3) and the transcriptional unit (H3K36me3) of a given locus, thereby mapping the 5’ and 3’ ends of the RNA transcript. By adding further layers of information (e.g., conservation, potential coding capacity, and anatomical properties), progress is being made toward identifying lncRNA gene families. LncRNAs have been further defined according to the anatomical properties of their gene loci. For example, lncRNAs that are antisense to overlapping protein-coding genes are known as antisense lncRNAs, lncRNAs encoded within the introns of protein-coding genes are known as intronic lncRNAs, lncRNAs that overlap protein-coding genes are known as overlapping transcripts, and lincRNAs are those whose genes lie entirely in the intergenic space between protein-coding loci. It is nevertheless likely that many of these lncRNAs will share similar functions and mechanisms.
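The anatomical classification described above can be illustrated with a minimal Python sketch. It is not taken from any of the studies discussed: the Gene record, the 0-based half-open coordinates, and the order in which the categories are tested (intronic first, then antisense, then overlapping) are all assumptions made for illustration, and real annotation pipelines apply more detailed rules.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"
    introns: Tuple[Tuple[int, int], ...] = ()


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def classify_lncrna(lnc: Gene, coding_genes: List[Gene]) -> str:
    """Assign a lncRNA locus to one anatomical class.

    Assumed precedence: intronic > antisense > overlapping > intergenic.
    """
    for gene in coding_genes:
        if gene.chrom != lnc.chrom:
            continue
        if not overlaps(lnc.start, lnc.end, gene.start, gene.end):
            continue
        # Entirely contained in one intron of the protein-coding gene.
        if any(s <= lnc.start and lnc.end <= e for s, e in gene.introns):
            return "intronic"
        # Overlaps the gene on the opposite strand.
        if gene.strand != lnc.strand:
            return "antisense"
        # Overlaps the gene on the same strand.
        return "overlapping"
    # No protein-coding gene overlaps the locus.
    return "intergenic"
```

For example, with a single protein-coding gene on the plus strand of a hypothetical chromosome spanning 1000 to 5000 and one intron at 2000 to 3000, a minus-strand locus at 4500 to 5500 would be labeled antisense, and a locus at 6000 to 7000 would be labeled intergenic (a lincRNA).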
Genetic Characterization of LncRNA
Excluding the potential of protein coding
The basic criterion for defining a lncRNA is whether or not it can be translated into protein, but determining this can be very difficult. Many studies evaluate lncRNA coding potential by translating each lncRNA in all three reading frames and performing homology searches (e.g., BLASTX) against large databases of protein families and domains (e.g., Swiss-Prot and Pfam). These computational analyses are good predictors of protein-coding capacity, but may miss evolutionarily novel protein sequences or very small open reading frames (<50 amino acids). To address the former problem, codon substitution frequency (CSF) analysis has been used to determine whether amino acid codons are preferentially conserved through evolution, which would indicate preserved protein-coding potential. CSF has been used in several studies as an additional layer of information for determining coding potential. However, even when these two methods are combined, a small open reading frame hidden in one of these long transcripts can still be missed. Experimental methods such as ribosome profiling, which identifies ribosomes bound to and scanning a putative RNA, provide further insight into which RNAs may encode small peptides. In addition, the method identifies the regions occupied by ribosomes, thereby narrowing down the potentially translated regions, which can then serve as more precise input for computational predictors such as CSF and BLASTX. Although some lncRNAs may encode small peptides, we note that this does not preclude a dual nature in which a lncRNA acts through both its RNA and its protein product. This has been demonstrated for many mRNAs containing regulatory non-coding RNA elements (p53, SgrS, Oskar, VegT, etc.).
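The first computational step described above, translating a transcript in three reading frames and looking for open reading frames, can be sketched in a few lines of Python. This is a simplified illustration only: it uses the standard genetic code, scans the three forward frames, requires an ATG start and an in-frame stop, and borrows the 50-amino-acid figure mentioned above as a cutoff. The function names and the cutoff rule are assumptions for illustration; the sketch does not perform the homology search (BLASTX) or CSF analysis.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int) -> str:
    """Translate one forward reading frame (0, 1 or 2); '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons with ambiguous bases are translated as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(seq: str) -> str:
    """Longest peptide starting at M and ending at a stop, over three frames."""
    best = ""
    for frame in range(3):
        protein = translate(seq, frame)
        # Drop the last segment: it is not terminated by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best


def lacks_long_orf(seq: str, min_aa: int = 50) -> bool:
    """Assumed rule of thumb: no complete ORF of at least min_aa residues."""
    return len(longest_orf(seq)) < min_aa
```

A transcript that passes this filter may still contain a small ORF, which is why the evolutionary and experimental evidence discussed above remains necessary.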
Inferring lncRNA function through co-expression: Guilt-by-Association
With thousands of lncRNA loci mapped, the next challenge is to determine what these lncRNAs do. A natural first step is to use lncRNA expression patterns to identify the specific cell types or biological processes associated with each candidate lncRNA. Some of the first expression studies identified lncRNAs that were highly expressed in certain brain regions. In situ hybridization studies further confirmed that these lncRNAs showed exquisite expression patterns in specific sub-structures of the mouse brain. A similar study by the same group found many lncRNAs closely associated with pluripotency transcription factors, suggesting that many lncRNAs may play a role in the transcriptional networks of stem cell pluripotency.
More recently, computational methods known as “Guilt-by-Association” have allowed a global view of lncRNAs and protein-coding genes that are tightly co-expressed and thus may be co-regulated. This approach uses gene expression analysis to identify protein-coding genes and pathways that are significantly associated with a given lncRNA. Hypotheses about the function and potential regulators of the candidate lncRNA are then generated from the known functions of the co-expressed protein-coding genes. In addition, this analysis reveals “families” of lncRNAs, based on the pathways with which they are and are not associated. The method has predicted diverse roles for lncRNAs, from stem cell pluripotency to cancer. For example, many of the lncRNAs closely associated with p53 are induced in a p53-dependent manner, far more than would be expected by chance. The promoters of these lncRNAs are also enriched for p53 binding motifs. Moreover, one of these p53-associated lncRNAs, known as lincRNA-p21, was found to be directly regulated by p53 and then to form a lncRNA-RNP with a nuclear factor that acts as a global transcriptional repressor in the p53 response. Similarly, several lncRNAs predicted to be associated with adipogenesis and pluripotency have turned out, in most cases, to be required to maintain these cell states.
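The core of a guilt-by-association analysis is a co-expression measure between one lncRNA and each protein-coding gene across a set of samples. The Python sketch below illustrates that idea with a plain Pearson correlation and a fixed cutoff. The function names, the cutoff of 0.9, and the use of Pearson correlation are assumptions for illustration; published analyses typically add pathway enrichment and significance testing on top of this step.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpressed_genes(
    lnc_profile: Sequence[float],
    coding_profiles: Dict[str, Sequence[float]],
    min_r: float = 0.9,
) -> List[Tuple[str, float]]:
    """Protein-coding genes whose correlation with the lncRNA is >= min_r,
    sorted from the strongest correlation to the weakest."""
    hits = [
        (gene, pearson(lnc_profile, profile))
        for gene, profile in coding_profiles.items()
    ]
    return sorted(
        [(gene, r) for gene, r in hits if r >= min_r],
        key=lambda item: -item[1],
    )
```

The known functions of the genes returned by coexpressed_genes would then be used to propose a function for the lncRNA, which still has to be tested experimentally.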
Other expression-based analyses have revealed additional functional roles of lncRNAs. For example, one recent study profiled lncRNAs across more than 130 breast cancers, covering different tumor grades and accompanied by clinical information. The study identified many lncRNAs that were specifically up-regulated or down-regulated in particular tumor subtypes. For example, the lncRNA HOTAIR, encoded in the HOXC cluster, was found to be a strong predictor of breast cancer metastasis. In fact, enforced expression of HOTAIR is sufficient to promote breast cancer metastasis. A broader expression survey of lncRNAs in promoter regions overlapping protein-coding genes identified many lncRNAs associated with cell cycle regulation. This led to the functional characterization of a lncRNA called PANDA, which plays a key role in inhibiting p53-mediated apoptosis. The “Guilt-by-Association” approach is applicable to any biological system. For example, a telomere-encoded lncRNA family in P. falciparum was identified through its stage-specific co-expression with PfSIP2, an important transcription factor for virulence.
These and related studies have begun to define specific roles for lncRNAs in global transcriptional regulation. Hypothesis-driven experiments can then home in on individual lncRNAs in the pathways identified. However, the full range of lncRNA-mediated transcriptional regulation and function is far from understood. To learn more about the global regulatory roles of lncRNAs, comprehensive gain-of-function or loss-of-function experiments are needed.
High-throughput loss of function by RNA interference
A recent study performed loss-of-function experiments on most (237) of the long intergenic non-coding RNAs (lincRNAs) expressed in mouse embryonic stem cells (ESCs). The authors showed that knocking down lincRNAs has a significant effect on gene expression patterns, comparable to that of knocking down known ESC regulators. Interestingly, this global perturbation screen found that lincRNAs mainly affect gene expression in trans. Perhaps more importantly, dozens of lincRNAs were found to be functionally required for maintaining the pluripotent state. Further study of the ESC molecular circuitry suggested that lincRNA genes are regulated by key transcription factors, and that lincRNA transcripts physically bind multiple chromatin regulators to influence shared gene expression programs. This study provided a first glimpse of the global functional properties of lincRNAs and highlighted their key role in the circuitry controlling the ESC state.
LncRNA and intra-chromosomal looping
Enhancer RNA (eRNA) Driven Enhancer-Promoter Interaction
Enhancers are well-known DNA regulatory elements that play a central role in gene regulation. Previous studies indicated that enhancer function involves recruiting transcription factors to promote the opening of repressed chromatin and to facilitate assembly of the transcriptional machinery on target genes. Enhancer RNAs (eRNAs) are a class of lncRNA transcribed from enhancer sequences, typically non-polyadenylated nascent RNAs with very low copy number. It is still unclear whether eRNAs are merely by-products of enhancer activity or have functions of their own in gene regulation. Recent studies show that eRNAs may play an important role in the DNA looping that underlies enhancer-promoter interaction.
The human DHRS4 gene, located on chromosome 14q11.2, encodes an NADPH-dependent enzyme (NRDR). Two homologous downstream genes, DHRS4L1 and DHRS4L2, form a gene cluster with DHRS4. Scientists had previously found that the expression of DHRS4 can be regulated by a NAT (natural antisense transcript) named DHRS4-AS1. DHRS4L1 and DHRS4L2 are highly homologous in sequence to DHRS4, and both contain the same promoter sequence as DHRS4-AS1, yet there is no evidence that a similar NAT is produced from the antisense template of either gene. This suggests that an enhancer-promoter interaction is involved in the transcription of DHRS4-AS1, so that only the enhancer-interacting promoter at DHRS4 is activated, and not those at DHRS4L1 and DHRS4L2. Active enhancers are typically marked by high levels of H3K4me1, low levels of H3K4me3, and high levels of H3K27ac. Based on these histone modification features, scientists identified an enhancer located 13.8 kb downstream of the DHRS4-AS1 TSS. Using the 3C (chromosome conformation capture) technique, they then confirmed that this enhancer interacts directly with the DHRS4-AS1 promoter through chromosome looping. Further evidence shows that this enhancer functions by producing an eRNA, named AS1eRNA, which mediates chromatin looping between the enhancer and the DHRS4-AS1 promoter and thereby enhances transcription of the DHRS4-AS1 NAT.
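The histone-mark signature of an active enhancer (high H3K4me1, low H3K4me3, high H3K27ac) can be expressed as a simple filter over genomic windows. The Python sketch below is only an illustration of that rule: the window names, the use of fold-enrichment values, and the thresholds of 2.0 for "high" and 1.0 for "low" are hypothetical and are not taken from the study described above.

```python
from typing import Dict, List

# Per-window ChIP-seq signal, e.g. fold enrichment over input (assumed units).
MarkSignal = Dict[str, float]


def is_active_enhancer_like(
    signal: MarkSignal, high: float = 2.0, low: float = 1.0
) -> bool:
    """High H3K4me1 and H3K27ac together with low H3K4me3."""
    return (
        signal["H3K4me1"] >= high
        and signal["H3K27ac"] >= high
        and signal["H3K4me3"] <= low
    )


def candidate_enhancers(
    windows: Dict[str, MarkSignal], high: float = 2.0, low: float = 1.0
) -> List[str]:
    """Names of windows whose marks match the active-enhancer signature."""
    return [
        name
        for name, signal in windows.items()
        if is_active_enhancer_like(signal, high, low)
    ]
```

A window with strong H3K4me3 would be excluded as promoter-like, and a window with H3K4me1 but little H3K27ac would be excluded as not active. Candidates selected this way still need confirmation of physical contact with the promoter, for example by 3C.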
RNA Pol II and the transcriptional coactivator p300 are key elements in maintaining chromatin loops, occupying both the enhancer and the target promoter regions. In HepG2 cells, depletion of DHRS4-AS1 leads to reduced binding of RNA Pol II and p300 at both the promoter and the enhancer. This suggests that DHRS4-AS1 may function by mediating the long-range chromatin interaction, involving RNA Pol II and p300, between the AS1 enhancer and the DHRS4-AS1 promoter. In the working model proposed in this study, AS1eRNA may have binding affinity both for the Pol II and p300 that occupy the promoter region and for the Pol II and p300 that occupy the enhancer region, thus mediating the spatial proximity of the promoter and enhancer.
In another study, on estrogen receptor α (ER-α), scientists report a global increase in eRNA transcription at enhancers adjacent to E2-upregulated genes following the binding of E2 to ER-α at these enhancers. This indicates a connection between the activation of these E2-upregulated genes and the increased transcription of the adjacent enhancers. The scientists then selected ten highly upregulated transcripts for further study. Using RIP-Seq and RIP-qPCR, they found that cohesin interacts with the eRNAs. Knockdown of these eRNAs can decrease cohesin recruitment. Depletion of some of the eRNAs, or of the cohesin subunits SMC3 and RAD21, causes loss of promoter-enhancer interactions and blocks the induction of coding genes by E2. This suggests that eRNAs can participate in chromosome looping by interacting with the key factor cohesin, thereby enhancing the interaction between promoter and enhancer.
In both studies, when the enhancer is activated by a ligand or another pathway, transcription of the enhancer RNA is activated. The nascent eRNA transcript stays with the transcription complex and then interacts with other factors such as p300 or cohesin to mediate chromosome looping in cis. The precise mechanism by which eRNAs mediate looping, and whether these eRNAs interact with the key factors CTCF and Mediator, remain open questions.
RNA-a Driven Enhancer-Promoter Interaction
MYC is an oncogene located on human chromosome 8. The interaction between the MYC gene and the gene desert in the 8q24 region is tissue-specific in cancer cells, including those of human breast, prostate, and colorectal cancer. In colorectal cancer (CRC), a well-characterized loop forms between MYC and MYC-335, an enhancer 335 kb upstream of the MYC gene. The human 8q24 gene desert has been reported to express different lncRNAs in different cancer cell lines. One of these tissue-specific lncRNAs expressed in CRC is CCAT1-L (Colorectal Cancer Associated Transcript 1, long isoform), which is 5,200 nt long and transcribed 515 kb upstream of MYC (MYC-515). In the study, scientists found that MYC-515 can form loops with both MYC-335 and the MYC gene. Knockdown of CCAT1-L reduces the expression level of MYC, indicating that CCAT1-L plays an important role in MYC regulation. Knockdown of CCAT1-L also significantly reduces the interaction not only between MYC-515 and MYC-335, but also between MYC-335 and MYC. This indicates that CCAT1-L, as well as the spatial proximity of MYC-515, is essential to the interaction between the MYC-335 enhancer and the MYC promoter. In DNA FISH visualization, CCAT1-L shows strong colocalization with all three loci: MYC, MYC-335, and MYC-515. Unlike eRNAs, which are nascent and non-polyadenylated and function while still attached to the transcription complex, CCAT1-L is fully transcribed and polyadenylated, and accumulates at the loci of interaction.
Further ChIP-Seq study indicates that the looping between the MYC promoter and MYC-335 and MYC-515 is mediated by CTCF. Knockdown of CTCF disrupts the looping, dramatically reducing the interaction of the MYC promoter with MYC-335 and MYC-515. Importantly, knockdown of CTCF also decreases the transcription of CCAT1-L, indicating that there may be a positive regulatory network for MYC that includes CTCF and CCAT1-L. Using biotin-labeled RNA pull-down assays and RNA immunoprecipitation, scientists confirmed a direct interaction between CTCF and CCAT1-L. Knockdown of CCAT1-L led to a modest reduction of CTCF binding to chromatin at its occupied sites in the loop-forming regions at the MYC locus. This suggests that CCAT1-L may act to concentrate CTCF locally, or to allosterically modify CTCF binding to chromatin, in order to maintain chromatin looping in the 8q24 region surrounding the MYC locus in CRC.
As mentioned above, MYC is an oncogene in many different cancers, and the 8q24 gene desert is reported to express several different lncRNAs in different cancers. A recent study also suggests that this 8q24 region could be a cancer-specific super-enhancer that becomes enhancer-like (high H3K27ac, high H3K4me1, low H3K4me3) only in cancer cells. The genomic locus spans up to 150 kb, whereas a typical enhancer is about 1.5 kb in length. How does this super-enhancer behave in other cancers? Do the other cell-specific lncRNAs expressed in this region function similarly to CCAT1-L, by mediating chromosome looping between the MYC gene and its enhancers? These questions are worth pursuing in order to uncover the mechanism by which the 8q24 gene desert regulates MYC, and perhaps that of other super-enhancer regions across the human genome.
LncRNA and inter-chromosomal interactions
Long-range chromosome interactions occur not only within a chromosome but also among loci on different chromosomes. Recently, the lncRNA Firre (Functional intergenic repeat RNA element) has been reported to play an important part in organizing multiple chromosomes to establish a nuclear domain. Firre is a 5.8 kb lncRNA transcribed from the X chromosome that contains a 156-bp repeating sequence, which forms secondary structure. A previous study showed that Firre plays a role in adipogenesis. By performing RNA Antisense Purification (RAP), scientists observed a 5 Mb domain of Firre localization around the Firre locus. Strikingly, they also observed enrichment of Firre on chromosomes 2, 9, 15, and 17, overlapping the known protein-coding genes Slc25a12, Ypel4, Eef1a1, Atf4, and Ppp1r10. Four of these five genes were previously identified as regulators of adipogenesis. These observations suggest two possible models. In one, Firre is shuttled from its transcription site to these sites on other chromosomes. In the other, the focal localization of Firre at its own genomic locus serves as a regional organizing factor that brings the trans-interacting sites into three-dimensional proximity with the Firre locus on the X chromosome. In a further study using single-molecule RNA co-FISH, scientists found that the loci where Firre localizes are spatially proximal. Moreover, knockdown of Firre results in loss of co-localization of these genes. These findings suggest that the second model may be correct. Firre may play an essential role in either establishing or maintaining a higher-order chromosomal architecture that brings genes located on different chromosomes into spatial proximity. How this topological organization affects the regulation of the genes involved requires further research.
A study of the lncRNA CUDR (Cancer Upregulated Drug Resistant) in human liver stem cells found a similar function. CUDR is a lncRNA highly expressed in many cancer cell lines. Overexpression of CUDR in human liver stem cells leads to malignant transformation. In the study, CUDR was found to participate in promoter-enhancer looping at β-catenin by interacting with CTCF. In this case, however, CUDR and β-catenin are not on the same chromosome: CUDR is on chromosome 19, and the β-catenin gene CTNNB1 is located on chromosome 3. This means that CUDR, unlike CCAT1-L, does not accumulate at its transcription locus and mediate looping on the same chromosome in an enhancer-like way, but instead participates in looping at a distal site on another chromosome, acting as a trans element. Overexpression of CUDR results in increased interplay between CTCF and CUDR, increased interaction of CTCF with the β-catenin enhancer and promoter, increased recruitment of Pol II and p300, and finally an increase in the expression level of β-catenin. The study does not, however, clarify the relationship between CUDR and factors other than CTCF in the enhancer-promoter looping of β-catenin. Is the binding of CUDR to CTCF a global interaction that occurs with CTCF at different loci, or a locus-specific interaction that occurs only at the β-catenin locus? If it is global, does that mean that CUDR can participate in many different chromosome loops? If it is locus-specific, how is CUDR guided to the site? These questions remain unanswered.
Interestingly, CUDR functions not only as a factor in DNA looping but also by inhibiting methylation. In the same study, scientists found that CUDR can induce the expression of HULC (long non-coding RNA highly upregulated in liver cancer). As the name suggests, HULC is a lncRNA that is highly upregulated in human liver cancer cells. Overexpression of CUDR reduces methylation of the HULC promoter and thereby increases the expression level of HULC.
LncRNA and Nucleosome positioning
ATP-dependent Remodeler Recruitment
Nucleosomes present a major obstacle to the binding of sequence-specific DNA-binding factors, through the interaction of positively charged histone tails with DNA and the masking of DNA binding sites that face in towards the surface of the histone octamer. All DNA-dependent processes, including transcription, replication, repair, and recombination, are affected by the positioning of nucleosomes at regulatory sites. ATP-dependent chromatin remodeling complexes, which use ATP to slide, replace, or evict histones in nucleosomes, are key regulators of nucleosome positioning and chromatin structure. In human and other mammalian cells, the role of small non-coding RNAs in regulating and targeting ATP-dependent chromatin remodeling complexes has been well discussed, but that of lncRNAs has not. Meanwhile, recent studies show that lncRNAs are an important factor in nucleosome positioning in plants.
RNA polymerase V (Pol V) is a multi-subunit, plant-specific RNA polymerase found in the nucleus. In Arabidopsis, lncRNA produced by Pol V serves as a binding scaffold for several RNA-binding proteins, including INVOLVED IN DE NOVO 2 (IDN2). This protein was discovered in forward genetic screens and shown to be required for RNA-directed DNA methylation (RdDM). IDN2 physically interacts with SWI3B, a core subunit of the SWI/SNF complex, the best-studied ATP-dependent chromatin remodeling complex. The SWI/SNF complex regulates gene transcription as a multi-protein system that physically moves nucleosomes at gene promoters. The IDN2-SWI3B interaction guides the SWI/SNF complex to loci transcribed by Pol V, where specific nucleosomes are stabilized. In this way, lncRNA produced by Pol V in Arabidopsis is involved in active nucleosome positioning by binding IDN2 and recruiting the SWI/SNF complex to the loci of Pol V transcription.
Recruitment of the SWI/SNF complex to RdDM target sites may also involve additional lncRNA-binding proteins. Binding of IDN2 to lncRNA has been shown to require the prior presence of ARGONAUTE4 (AGO4), the main Argonaute involved in RdDM in Arabidopsis. AGO4 brings in siRNA, which can confer sequence specificity for a particular genomic region through base pairing between the siRNA and the lncRNA. Since SWI3B is recruited by IDN2, the association of SWI/SNF with RdDM targets may require AGO4 and siRNA. Another lncRNA-binding protein involved in RdDM is SUPPRESSOR OF TY INSERTION 5-LIKE (SPT5L), which binds silenced loci in parallel with AGO4. Although the function of SPT5L and its effect on the binding of IDN2 to lncRNA are still unknown, it is required for transcriptional silencing at least at a subset of RdDM targets. This suggests that SPT5L may also participate in recruiting SWI/SNF to chromatin. Similarly, the maize homolog of the RNA-dependent RNA polymerase required for siRNA production has been shown to affect nucleosome positioning at particular loci. Although there is no indication of RdDM-mediated recruitment of SWI/SNF in maize, this further suggests that additional RdDM components are involved in nucleosome positioning.
In this way, lncRNA produced by Pol V in Arabidopsis is involved in active nucleosome positioning: it binds IDN2, which recruits the SWI/SNF complex to loci transcribed by Pol V. The binding of AGO4 and SPT5L to Pol V-transcribed lncRNA may also affect recruitment of the SWI/SNF complex. The SWI/SNF complex positions nucleosomes, which affects Pol II transcription by facilitating DNA methylation and/or restricting protein access to DNA.
Similar mechanisms exist in yeast. In S. pombe, pericentromeric and other heterochromatic regions are transcribed into lncRNAs, which are bound by Seb1, a homolog of the conserved RNA-binding protein Nrd1. Seb1 recruits the SHREC complex, which contains the putative Snf2 chromatin remodeler Mit1, a protein necessary for proper nucleosome positioning. SHREC eliminates nucleosome-free regions and establishes histone H3 lysine 9 dimethylation (H3K9me2). As a result, transcription initiation sites may become inaccessible and Pol II association may be inhibited. Together, these results show that in fission yeast, lncRNA regulates nucleosome positioning by recruiting ATP-dependent chromatin remodeling factors. This mechanism resembles plant RdDM, in which chromatin remodelers are recruited by heterochromatic lncRNA. An important difference is that SHREC recruitment does not involve siRNA or Argonaute; it appears to work in parallel with RNAi. Although several lines of evidence suggest that lncRNAs control nucleosome positioning in various organisms, it remains unknown whether recruitment of chromatin remodelers is the main mechanism behind this phenomenon.
Impairing the binding of Remodeler
The previous section discussed how lncRNA recruits ATP-dependent remodelers to promoters near its own site of transcription, acting in cis. Another study reported that lncRNA can also act in trans, as an inhibitor of remodeler recruitment to promoters.
That study characterized a novel lncRNA, SChLAP1 (Second Chromosome Locus Associated with Prostate-1), which is overexpressed in a subset of prostate cancers. In vitro and in vivo experiments indicate that this lncRNA plays a critical role in cancer cell invasiveness and metastasis. When SNF5 (also known as SMARCB1), an essential subunit of SWI/SNF that facilitates binding of the complex to histone proteins, was knocked down, genes that are also regulated by SChLAP1 were affected in the opposite direction. This indicates that SChLAP1 functions antagonistically to SWI/SNF. Mechanistically, knockdown of SChLAP1 had no impact on the expression level of SNF5, demonstrating that SChLAP1 does not act by directly regulating SWI/SNF expression but in a post-transcriptional way. In RIP assays for SNF5, SChLAP1, but not other lncRNAs, co-immunoprecipitated with SNF5. ChIP-Seq of SNF5 identified 6235 genome-wide binding sites, highly enriched near gene promoters. SChLAP1 overexpression dramatically decreased SNF5 binding at these 6235 sites, whereas SChLAP1 knockdown increased binding at them. Overall, these data suggest that SChLAP1 antagonizes SWI/SNF function by disrupting the genomic binding of the complex, thus impairing its ability to regulate gene expression. Unlike the recruitment function discussed above, which operates mostly near the site of lncRNA transcription, SChLAP1 appears to function across the genome in trans: it interacts directly with the SWI/SNF complex and thereby decreases its promoter binding. Interestingly, in this study the loss of SWI/SNF binding caused by SChLAP1 overexpression resulted primarily in downregulation of genes near the binding sites, even though the SWI/SNF complex is known to regulate gene expression in either direction.
Direct Nucleosome positioning through transcription
Transcription-mediated silencing, also referred to as ‘transcriptional interference’ (TI), is defined here as a case in which the act of transcribing one gene represses, in cis, the functional transcription of another gene. DNA in the nucleus is organized into chromatin, whose structural scaffold consists of nucleosomes, each containing two copies of histones H3, H4, H2A and H2B. Nucleosomes can be densely packed, which interferes with protein-DNA interactions, or relaxed, which promotes them. The passage of RNAPII along a gene locus can directly influence nucleosome positioning. Thus, lncRNA transcription can cause TI by depositing nucleosomes in a manner unfavorable to transcription factor (TF) binding at a promoter or enhancer. An example of this mechanism is the silencing of the yeast protein-coding (pc) gene SER3 by transcription of the lncRNA SRG1. SRG1 transcription moves aside the nucleosomes that lie in its path and increases nucleosome density over the overlapping SER3 promoter, thereby blocking TF binding there. Deletion of three transcriptional elongation factors associated with nucleosome positioning, SPT16, SPT6 and SPT2, abolishes SER3 silencing without terminating SRG1 transcription. In contrast, depletion of epigenetic modifiers, including histone methyltransferases and DNA methylation factors, does not affect SER3 silencing, which means that SRG1 acts not through a methylation pathway but through nucleosome positioning. These findings indicate that the process of SRG1 transcription directly changes nucleosome density at the nearby SER3 promoter, thereby blocking TF binding and silencing SER3. Although a role for the SRG1 lncRNA molecule itself in SER3 silencing is not excluded, it has been suggested that the transcription process alone can explain the silencing.
Transcriptional interference by nucleosome repositioning may be a general mechanism in yeast, because the RNAPII elongation and chromatin organization factors responsible for SER3 silencing are also known to suppress transcription initiation from cryptic promoters within the bodies of actively transcribed genes. Since genes controlling RNAPII elongation and chromatin organization are largely conserved, lncRNAs could use similar nucleosome-repositioning silencing in mammals.
Useful Techniques
ChIP-Seq
ChIP sequencing, also known as ChIP-seq, is a method for analyzing the interaction of proteins with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. ChIP-seq is used primarily to determine how transcription factors and other chromatin-associated proteins influence phenotype. Determining how proteins interact with DNA to regulate gene expression is essential for understanding many biological processes. This epigenetic information is complementary to genotype and expression analysis. ChIP-seq is currently regarded mainly as an alternative to ChIP-chip, which requires a hybridization array. An array introduces some bias because it is limited to a fixed number of probes. Sequencing, in contrast, is thought to have less bias, although the sequencing bias of different sequencing technologies is not yet fully understood.
Specific DNA loci that interact directly with transcription factors and other proteins can be isolated by chromatin immunoprecipitation. ChIP produces a library of target DNA sites bound by the protein of interest in vivo. Massively parallel sequence analysis is then used together with whole-genome sequence databases to analyze the interaction pattern of any protein with DNA, or the pattern of any epigenetic chromatin modification. This can be applied to any protein or modification amenable to ChIP, such as transcription factors, polymerases and the transcriptional machinery, structural proteins, protein modifications and DNA modifications. As an alternative to relying on specific antibodies, other methods, such as DNase-Seq and FAIRE-Seq, have been developed to find the superset of all nucleosome-depleted or nucleosome-disrupted active regulatory regions in the genome.
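To make the mapping step concrete, the following is a minimal sketch of how binding sites might be flagged from aligned reads: read start positions are counted in fixed-size windows, and windows whose ChIP count is strongly enriched over an input control are reported. The function names, window size, pseudocount, threshold and read positions are all illustrative assumptions; real peak callers use statistical models and depth normalization rather than a plain fold-change cutoff.

```python
from collections import Counter


def window_counts(read_starts, window=200):
    """Count reads per fixed-size window, keyed by window index."""
    return Counter(pos // window for pos in read_starts)


def enriched_windows(chip_starts, input_starts, window=200,
                     min_fold=4.0, pseudocount=1.0):
    """Return (start, end, fold) for windows enriched in ChIP over input.

    Toy fold-change cutoff on raw counts; assumes both samples were
    sequenced to a similar depth (no normalization is applied).
    """
    chip = window_counts(chip_starts, window)
    ctrl = window_counts(input_starts, window)
    hits = []
    for idx in sorted(chip):
        # Counter returns 0 for windows with no input reads.
        fold = (chip[idx] + pseudocount) / (ctrl[idx] + pseudocount)
        if fold >= min_fold:
            hits.append((idx * window, (idx + 1) * window, fold))
    return hits


# Hypothetical read start positions on one chromosome.
chip_reads = [1010, 1020, 1050, 1100, 1150, 1180, 1190, 5000, 9000]
input_reads = [300, 1120, 5050, 9100]
print(enriched_windows(chip_reads, input_reads))
```

In this toy data only the window covering positions 1000 to 1200 passes the cutoff, since the other windows contain as many input reads as ChIP reads.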
RIP-Seq
RIP is an antibody-based technique for mapping RNA-protein interactions in vivo. The RNA-binding protein (RBP) of interest is immunoprecipitated together with its associated RNA, which is then detected by real-time PCR, microarray or sequencing. Interest in RNA-protein interactions is booming as the role of RNA becomes clear, not only in well-established processes such as transcription, splicing and translation, but also in newer fields such as RNA interference and gene regulation by non-coding RNA.
ChIRP
Chromatin isolation by RNA purification (ChIRP) is a novel and rapid strategy for mapping the genomic binding sites of long non-coding RNA (lncRNA) across the genome at high resolution. It is based on affinity capture of the target lncRNA:chromatin complex with tiling antisense oligonucleotides, and yields a map of genomic binding sites at a resolution of several hundred bases with high sensitivity and low background. ChIRP is applicable to many lncRNAs because the affinity probes are designed directly from the RNA sequence and require no knowledge of the structure or functional domains of the RNA. The method exploits the specificity of the tiling antisense oligonucleotides to allow enumeration of the genomic sites bound by the lncRNA.
RAP
RNA-centric biochemical purification is a general approach to studying the function and mechanism of non-coding RNA. RNA antisense purification (RAP) is a method for selectively purifying endogenous long non-coding RNA (lncRNA) complexes from cell extracts, which allows RNA-chromatin interactions to be mapped. In RAP, cells are cross-linked to fix endogenous RNA complexes in place, and these complexes are purified from the cross-linked lysate by hybridization to biotinylated antisense probes, capturing the target RNA together with its associated proteins, RNAs and genomic DNA. DNA sites that interact with the target RNA are then identified by high-throughput DNA sequencing. RAP is designed to achieve specific purification of chromatin associated with the target lncRNA, to produce a high-resolution map of the associated DNA target sites by sequencing the captured DNA, and to capture any lncRNA with minimal optimization. For high specificity, RAP uses 120-nucleotide antisense RNA probes, which form very strong hybrids with the target and permit denaturing conditions that disrupt non-specific RNA-protein interactions and non-specific hybridization to RNA or genomic DNA. For high resolution, RAP uses DNase I to digest genomic DNA into ~150 bp fragments. For robust capture, RAP uses pools of multiple probes tiled across the entire length of the target RNA, ensuring capture even in the presence of extensive protein-RNA interactions, RNA secondary structure, or partial RNA degradation.
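The idea of tiling antisense probes along a target RNA can be sketched in a few lines. The example below is only an illustration under stated assumptions: the probe length of 120 nt follows the description above, but the function names, the step between probes, the handling of the 3' end and the toy target sequence are hypothetical and do not reproduce any published probe-design pipeline.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the antisense (reverse-complement) sequence of an RNA string."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def tile_antisense_probes(target_rna, probe_len=120, step=120):
    """Tile antisense probes along the full length of the target RNA.

    probe_len=120 follows the text; step is an assumption (step < probe_len
    gives overlapping probes). The final probe is shifted back so that it
    ends at the 3' end of the target rather than being truncated.
    """
    n = len(target_rna)
    if n < probe_len:
        raise ValueError("target is shorter than one probe")
    starts = list(range(0, n - probe_len + 1, step))
    if starts[-1] != n - probe_len:
        starts.append(n - probe_len)  # cover the 3' end
    return [(s, reverse_complement(target_rna[s:s + probe_len]))
            for s in starts]


# Hypothetical 300-nt target built from a repeated motif.
target = "AUGGCUACGU" * 30
probes = tile_antisense_probes(target)
for start, seq in probes:
    print(start, len(seq), seq[:12] + "...")
```

Shifting the last probe back, rather than emitting a short one, keeps every probe at the same length so that all of them hybridize with comparable strength; this is a design choice of the sketch, not a documented requirement of RAP.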
Chromosome conformation capture
Chromosome conformation capture techniques (commonly abbreviated as 3C techniques or 3C-based methods) are a set of molecular biology methods for analyzing the spatial organization of chromatin in cells. These methods quantify the number of interactions between genomic loci that lie close together in 3-D space but may be separated by many nucleotides in the linear genome. Such interactions may result from biological functions, such as promoter-enhancer interactions, or from random polymer looping, in which undirected physical movement of chromatin causes loci to collide. The interaction frequencies can be analyzed directly, or converted into distances and used to reconstruct three-dimensional structures.
The main difference between the 3C-based methods is their scope. For example, in 3C, the interaction between two specific fragments is quantified. In contrast, Hi-C quantifies the interactions between all possible pairs of fragments.
All 3C methods begin with a similar set of steps performed on a sample of cells. First, the cell genome is cross-linked, which introduces bonds that “freeze” the interactions between genomic loci. The genome is then cut into fragments. Next, random ligation is performed, so that fragments held in close proximity by cross-links are more likely to be ligated to one another than fragments that are not.
Subsequently, the ligated fragments are quantified using one of a number of techniques.
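The cutting step can be illustrated in silico. The sketch below splits a DNA sequence at every occurrence of a restriction site; the choice of HindIII (recognition site AAGCTT, cut after the first A) is an assumption made for the example, since the text does not name an enzyme, and the function name and toy sequence are hypothetical.

```python
def digest(sequence, site="AAGCTT", cut_offset=1):
    """Split a DNA sequence at every occurrence of a restriction site.

    Defaults model HindIII (A^AGCTT): the cut falls cut_offset bases into
    the site. Only the given strand is modelled, and the sequence is
    treated as linear.
    """
    sequence = sequence.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy sequence containing two HindIII sites.
toy = "GGCCAAGCTTTTACGAAGCTTGG"
fragments = digest(toy)
print(fragments)
```

The resulting fragments are the units that the ligation step joins and that the downstream methods quantify, which is why the results of these methods are usually reported per restriction fragment or per bin of fragments.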
3C (one-vs-one)
The chromosome conformation capture (3C) experiment quantifies interactions between a single pair of genomic loci. For example, 3C can be used to test a candidate promoter-enhancer interaction. Ligated fragments are detected using PCR with known primers.
4C (one-vs-all)
Chromosome conformation capture-on-chip (4C) captures interactions between one locus and all other genomic loci. It involves a second ligation step, to create self-circularized DNA fragments, which are used to perform inverse PCR. Inverse PCR allows the known sequence to be used to amplify the unknown sequence ligated to it. In contrast to 3C and 5C, the 4C technique does not require the prior knowledge of both interacting chromosomal regions. Results obtained using 4C are highly reproducible with most of the interactions that are detected between regions proximal to one another. On a single microarray, approximately a million interactions can be analyzed.
5C (many-vs-many)
Chromosome conformation capture carbon copy (5C) detects interactions between all restriction fragments within a given region, typically no larger than a megabase. This is done by ligating universal primers to all fragments. However, 5C has relatively low coverage. The 5C technique overcomes the junctional problems at the intramolecular ligation step and is useful for constructing complex interactions of specific loci of interest. It is unsuitable for mapping complex interactions genome-wide, since that would require millions of 5C primers.
Hi-C (all-vs-all)
Hi-C uses high-throughput sequencing to find the nucleotide sequence of fragments. The original protocol used paired-end sequencing, which retrieves a short sequence from each end of each ligated fragment. As such, for a given ligated fragment, the two sequences obtained should represent two different restriction fragments that were ligated together in the random ligation step. The two sequences of each pair are aligned to the genome individually, which identifies the fragments involved in that ligation event. Hence, all possible pairwise interactions between fragments are tested.
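Once each read pair has been aligned, the pairs are commonly summarized as a contact matrix in which entry (i, j) counts the ligation events joining genomic bin i and bin j. The following is a minimal sketch of that binning step; it assumes a single chromosome, fixed-size bins instead of true restriction fragments, and hypothetical function names and read positions, and it applies none of the filtering or normalization used in real Hi-C pipelines.

```python
def contact_matrix(read_pairs, chrom_length, bin_size):
    """Build a symmetric contact-count matrix from aligned read pairs.

    read_pairs: iterable of (pos1, pos2) alignment positions on one
    chromosome (a simplification; real data span all chromosomes).
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


# Hypothetical aligned pairs on a 4 kb chromosome, binned at 1 kb.
pairs = [(100, 3500), (150, 3600), (1200, 1300), (2500, 900)]
m = contact_matrix(pairs, chrom_length=4000, bin_size=1000)
for row in m:
    print(row)
```

In this toy example the first and last bins share two contacts despite being far apart in the linear sequence, which is the kind of signal that would suggest proximity in 3-D space.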
Workflow