<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author><author><style face="normal" font="default" size="100%">Jothi, Raja</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">ChIP-Seq data analysis: identification of protein-DNA binding sites with SISSRs peak-finder</style></title><secondary-title><style face="normal" font="default" size="100%">Methods in molecular biology</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2012</style></year><pub-dates><date><style  face="normal" font="default" size="100%">JAN</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">802</style></volume><pages><style face="normal" font="default" size="100%">305-22</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">Protein-DNA interactions play key roles in determining gene-expression programs during cellular development and differentiation. Chromatin immunoprecipitation (ChIP) is the most widely used assay for probing such interactions. With recent advances in sequencing technology, ChIP-Seq, an approach that combines ChIP and next-generation parallel sequencing, is fast becoming the method of choice for mapping protein-DNA interactions on a genome-wide scale. 
Here, we briefly review the ChIP-Seq approach for mapping protein-DNA interactions and describe the use of the SISSRs peak-finder, a software tool for precise identification of protein-DNA binding sites from sequencing data generated using ChIP-Seq.</style></abstract><custom3><style face="normal" font="default" size="100%">Foreign</style></custom3><custom4><style face="normal" font="default" size="100%">1.69</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Taher, Leila</style></author><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author><author><style face="normal" font="default" size="100%">Ovcharenko, Ivan</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">CLARE: cracking the language of regulatory elements</style></title><secondary-title><style face="normal" font="default" size="100%">Bioinformatics</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2012</style></year><pub-dates><date><style  face="normal" font="default" size="100%">FEB</style></date></pub-dates></dates><number><style face="normal" font="default" size="100%">4</style></number><publisher><style face="normal" font="default" size="100%">OXFORD UNIV PRESS</style></publisher><pub-location><style face="normal" font="default" size="100%">GREAT CLARENDON ST, OXFORD OX2 6DP, ENGLAND</style></pub-location><volume><style face="normal" font="default" size="100%">28</style></volume><pages><style face="normal" font="default" size="100%">581-583</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;CLARE is a computational method designed to reveal sequence encryption of tissue-specific regulatory elements. 
Starting with a set of regulatory elements known to be active in a particular tissue/process, it learns the sequence code of the input set and builds a predictive model from features specific to those elements. The resulting model can then be applied to user-supplied genomic regions to identify novel candidate regulatory elements. CLARE's model also provides a detailed analysis of transcription factors that most likely bind to the elements, making it an invaluable tool for understanding mechanisms of tissue-specific gene regulation.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">4</style></issue><custom3><style face="normal" font="default" size="100%">Foreign</style></custom3><custom4><style face="normal" font="default" size="100%">5.323
</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">MuMoD: a Bayesian approach to detect multiple modes of protein-DNA binding from genome-wide ChIP data</style></title><secondary-title><style face="normal" font="default" size="100%">Nucleic Acids Research</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2013</style></year><pub-dates><date><style  face="normal" font="default" size="100%">JAN</style></date></pub-dates></dates><number><style face="normal" font="default" size="100%">1</style></number><publisher><style face="normal" font="default" size="100%">OXFORD UNIV PRESS</style></publisher><pub-location><style face="normal" font="default" size="100%">GREAT CLARENDON ST, OXFORD OX2 6DP, ENGLAND</style></pub-location><volume><style face="normal" font="default" size="100%">41</style></volume><pages><style face="normal" font="default" size="100%">21-32</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;High-throughput chromatin immunoprecipitation has become the method of choice for identifying genomic regions bound by a protein. Such regions are then investigated for overrepresented sequence motifs, the assumption being that they must correspond to the binding specificity of the profiled protein. However, this approach often fails: many bound regions do not contain the `expected' motif. This is because binding DNA directly at its recognition site is not the only way the protein can cause the region to immunoprecipitate. 
Its binding specificity can change through association with different co-factors, it can bind DNA indirectly, through intermediaries, or even enforce its function through long-range chromosomal interactions. Conventional motif discovery methods, though largely capable of identifying overrepresented motifs from bound regions, lack the ability to characterize such diverse modes of protein-DNA binding and binding specificities. We present a novel Bayesian method that identifies distinct protein-DNA binding mechanisms without relying on any motif database. The method successfully identifies co-factors of proteins that do not bind DNA directly, such as mediator and p300. It also predicts literature-supported enhancer-promoter interactions. Even for well-studied direct-binding proteins, this method provides compelling evidence for previously uncharacterized dependencies within positions of binding sites, long-range chromosomal interactions and dimerization.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">1</style></issue><custom3><style face="normal" font="default" size="100%">Foreign</style></custom3><custom4><style face="normal" font="default" size="100%">8.808
</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author><author><style face="normal" font="default" size="100%">Mehta, Nidhi</style></author><author><style face="normal" font="default" size="100%">Galande, Sanjeev</style></author><author><style face="normal" font="default" size="100%">Arjunwadkar, Mihir</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">One size does not fit all: on how Markov model order dictates performance of genomic sequence analyses</style></title><secondary-title><style face="normal" font="default" size="100%">Nucleic Acids Research</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2013</style></year><pub-dates><date><style  face="normal" font="default" size="100%">FEB</style></date></pub-dates></dates><number><style face="normal" font="default" size="100%">3</style></number><publisher><style face="normal" font="default" size="100%">OXFORD UNIV PRESS</style></publisher><pub-location><style face="normal" font="default" size="100%">GREAT CLARENDON ST, OXFORD OX2 6DP, ENGLAND</style></pub-location><volume><style face="normal" font="default" size="100%">41</style></volume><pages><style face="normal" font="default" size="100%">1416-1424</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;The structural simplicity and ability to capture serial correlations make Markov models a popular modeling choice in several genomic analyses, such as identification of motifs, genes and regulatory elements. A critical, yet relatively unexplored, issue is the determination of the order of the Markov model. 
Most biological applications use a predetermined order for all data sets indiscriminately. Here, we show the vast variation in the performance of such applications with the order. To identify the `optimal' order, we investigated two model selection criteria: Akaike information criterion and Bayesian information criterion (BIC). The BIC optimal order delivers the best performance for mammalian phylogeny reconstruction and motif discovery. Importantly, this order is different from orders typically used by many tools, suggesting that a simple additional step determining this order can significantly improve results. Further, we describe a novel classification approach based on BIC optimal Markov models to predict functionality of tissue-specific promoters. Our classifier discriminates between promoters active across 12 different tissues with remarkable accuracy, yielding 3 times the precision expected by chance. Application to the metagenomics problem of identifying the taxon from a short DNA fragment yields accuracies at least as high as the more complex mainstream methodologies, while retaining conceptual and computational simplicity.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">3</style></issue><custom3><style face="normal" font="default" size="100%">Foreign</style></custom3><custom4><style face="normal" font="default" size="100%">8.808
</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Multiple novel promoter-architectures revealed by decoding the hidden heterogeneity within the genome</style></title><secondary-title><style face="normal" font="default" size="100%">Nucleic Acids Research</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2014</style></year><pub-dates><date><style  face="normal" font="default" size="100%">NOV</style></date></pub-dates></dates><number><style face="normal" font="default" size="100%">20</style></number><publisher><style face="normal" font="default" size="100%">OXFORD UNIV PRESS</style></publisher><pub-location><style face="normal" font="default" size="100%">GREAT CLARENDON ST, OXFORD OX2 6DP, ENGLAND</style></pub-location><volume><style face="normal" font="default" size="100%">42</style></volume><pages><style face="normal" font="default" size="100%">12388-12403</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;An important question in biology is how different promoter-architectures contribute to the diversity in regulation of transcription initiation. A step forward has been the production of genome-wide maps of transcription start sites (TSSs) using high-throughput sequencing. However, the subsequent step of characterizing promoters and their functions is still largely done on the basis of previously established promoter-elements like the TATA-box in eukaryotes or the -10 box in bacteria. Unfortunately, a majority of promoters and their activities cannot be explained by these few elements. 
Traditional motif discovery methods that identify novel elements also fail here, because TSS neighborhoods are often highly heterogeneous, containing no overrepresented motif. We present a new, organism-independent method that explicitly models this heterogeneity while unraveling different promoter-architectures. For example, in five bacteria, we detect the presence of a pyrimidine preceding the TSS under very specific circumstances. In tuberculosis, we show for the first time that the spacing between the bacterial -10 motif and TSS is utilized by the pathogen for dynamic gene-regulation. In eukaryotes, we identify several new elements that are important for development. Identified promoter-architectures show differential patterns of evolution, chromatin structure and TSS spread, suggesting distinct regulatory functions. This work highlights the importance of characterizing heterogeneity within high-throughput genomic data rather than analyzing average patterns of nucleotide composition.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">20</style></issue><custom3><style face="normal" font="default" size="100%">Foreign</style></custom3><custom4><style face="normal" font="default" size="100%">15.67
</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Taher, Leila</style></author><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author><author><style face="normal" font="default" size="100%">Ovcharenko, Ivan</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Identification and computational analysis of gene regulatory elements</style></title><secondary-title><style face="normal" font="default" size="100%">Cold Spring Harbor Protocols</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2015</style></year><pub-dates><date><style  face="normal" font="default" size="100%">JAN</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">1</style></volume><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Over the last two decades, advances in experimental and computational technologies have greatly facilitated genomic research. Next-generation sequencing technologies have made de novo sequencing of large genomes affordable, and powerful computational approaches have enabled accurate annotations of genomic DNA sequences. Charting functional regions in genomes must account for not only the coding sequences, but also noncoding RNAs, repetitive elements, chromatin states, epigenetic modifications, and gene regulatory elements. A mix of comparative genomics, high-throughput biological experiments, and machine learning approaches has played a major role in this truly global effort. Here we describe some of these approaches and provide an account of our current understanding of the complex landscape of the human genome. 
We also present overviews of different publicly available, large-scale experimental data sets and computational tools, which we hope will prove beneficial for researchers working with large and complex genomes. © 2015 Cold Spring Harbor Laboratory Press.&lt;/p&gt;</style></abstract><custom3><style face="normal" font="default" size="100%">&lt;p&gt;Foreign&lt;/p&gt;</style></custom3><custom4><style face="normal" font="default" size="100%">0.85</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Mitra, Sneha</style></author><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">No promoter left behind (NPLB): learn de novo promoter architectures from genome-wide transcription start sites</style></title><secondary-title><style face="normal" font="default" size="100%">Bioinformatics</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2016</style></year><pub-dates><date><style  face="normal" font="default" size="100%">MAR</style></date></pub-dates></dates><number><style face="normal" font="default" size="100%">5</style></number><publisher><style face="normal" font="default" size="100%">OXFORD UNIV PRESS</style></publisher><pub-location><style face="normal" font="default" size="100%">GREAT CLARENDON ST, OXFORD OX2 6DP, ENGLAND</style></pub-location><volume><style face="normal" font="default" size="100%">32</style></volume><pages><style face="normal" font="default" size="100%">779-781</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Promoters have diverse regulatory architectures and thus activate genes differently. 
For example, some have a TATA-box, many others do not. Even the ones with it can differ in its position relative to the transcription start site (TSS). No Promoter Left Behind (NPLB) is an efficient, organism-independent method for characterizing such diverse architectures directly from experimentally identified genome-wide TSSs, without relying on known promoter elements. As a test case, we show its application in identifying novel architectures in the fly genome.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">5</style></issue><custom3><style face="normal" font="default" size="100%">&lt;p&gt;Foreign&lt;/p&gt;</style></custom3><custom4><style face="normal" font="default" size="100%">5.766</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Mitra, Sneha</style></author><author><style face="normal" font="default" size="100%">Biswas, Anushua</style></author><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Diversity in binding, regulation, and evolution revealed from high-throughput ChIP</style></title><secondary-title><style face="normal" font="default" size="100%">PLoS Computational Biology</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2018</style></year><pub-dates><date><style  face="normal" font="default" size="100%">APR</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">14</style></volume><pages><style face="normal" font="default" size="100%">Article Number: e1006090</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;Genome-wide in vivo protein-DNA 
interactions are routinely mapped using high-throughput chromatin immunoprecipitation (ChIP). ChIP-reported regions are typically investigated for enriched sequence-motifs, which are likely to model the DNA-binding specificity of the profiled protein and/or of co-occurring proteins. However, simple enrichment analyses can miss insights into the binding-activity of the protein. Note that ChIP reports regions making direct contact with the protein as well as those binding through intermediaries. For example, consider a ChIP experiment targeting protein X, which binds DNA at its cognate sites, but simultaneously interacts with four other proteins. Each of these proteins also binds to its own specific cognate sites along distant parts of the genome, a scenario consistent with the current view of transcriptional hubs and chromatin loops. Since ChIP will pull down all X-associated regions, the final reported data will be a union of five distinct sets of regions, each containing binding sites of one of the five proteins, respectively. Characterizing all five different motifs and the corresponding sets is important to interpret the ChIP experiment and ultimately, the role of X in regulation. We present DIVERSITY which attempts exactly this: it partitions the data so that each partition can be characterized with its own de novo motif. DIVERSITY uses a Bayesian approach to identify the optimal number of motifs and the associated partitions, which together explain the entire dataset. This is in contrast to standard motif finders, which report motifs individually enriched in the data, but do not necessarily explain all reported regions. We show that the different motifs and associated regions identified by DIVERSITY give insights into the various complexes that may be forming along the chromatin, something that has so far not been attempted from ChIP data. 
Webserver at nci.res.i, if; standalone (Mac OS X/Linux) from https://github.com/NarlikarLab/DIVERSITY/releases/tag/v1.0.0.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">4</style></issue><custom3><style face="normal" font="default" size="100%">&lt;p&gt;Foreign&lt;/p&gt;</style></custom3><custom4><style face="normal" font="default" size="100%">&lt;p&gt;4.542&lt;/p&gt;</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Agrawal, Ankit</style></author><author><style face="normal" font="default" size="100%">Sambare, Snehal V.</style></author><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author><author><style face="normal" font="default" size="100%">Siddharthan, Rahul</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">THiCweed: fast, sensitive detection of sequence features by clustering big datasets</style></title><secondary-title><style face="normal" font="default" size="100%">Nucleic Acids Research</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2018</style></year><pub-dates><date><style  face="normal" font="default" size="100%">MAR</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">46</style></volume><pages><style face="normal" font="default" size="100%">e29</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;We present THiCweed, a new approach to analyzing transcription factor binding data from high-throughput chromatin immunoprecipitation-sequencing (ChIP-seq) experiments. 
THiCweed clusters bound regions using a divisive hierarchical clustering approach based on sequence similarity within sliding windows, while exploring both strands. THiCweed is specially geared toward data containing mixtures of motifs, which present a challenge to traditional motif-finders. Our implementation is significantly faster than standard motif-finding programs, able to process 30 000 peaks in 1-2 h, on a single CPU core of a desktop computer. On synthetic data containing mixtures of motifs it is as accurate as or more accurate than all other tested programs. THiCweed performs best with large `window' sizes (&amp;gt;50 bp), much longer than typical binding sites (7-15 bp). On real data it successfully recovers literature motifs, but also uncovers complex sequence characteristics in flanking DNA, variant motifs and secondary motifs even when they occur in &amp;lt;5% of the input, all of which appear biologically relevant. We also find recurring sequence patterns across diverse ChIP-seq datasets, possibly related to chromatin architecture and looping. 
THiCweed thus goes beyond traditional motif finding to give new insights into genomic transcription factor-binding complexity.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">5</style></issue><custom3><style face="normal" font="default" size="100%">Foreign</style></custom3><custom4><style face="normal" font="default" size="100%">10.162</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Sreekumar, Lakshmi</style></author><author><style face="normal" font="default" size="100%">Kumari, Kiran</style></author><author><style face="normal" font="default" size="100%">Guin, Krishnendu</style></author><author><style face="normal" font="default" size="100%">Bakshi, Asif</style></author><author><style face="normal" font="default" size="100%">Varshney, Neha</style></author><author><style face="normal" font="default" size="100%">Thimmappa, Bhagya C.</style></author><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author><author><style face="normal" font="default" size="100%">Padinhateeri, Ranjith</style></author><author><style face="normal" font="default" size="100%">Siddharthan, Rahul</style></author><author><style face="normal" font="default" size="100%">Sanyal, Kaustuv</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Orc4 spatiotemporally stabilizes centromeric chromatin</style></title><secondary-title><style face="normal" font="default" size="100%">Genome Research</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2021</style></year><pub-dates><date><style  face="normal" font="default" size="100%">APR</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">31</style></volume><pages><style face="normal" font="default" 
size="100%">607-621</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;The establishment of centromeric chromatin and its propagation by the centromere-specific histone CENPA is mediated by epigenetic mechanisms in most eukaryotes. DNA replication origins, origin binding proteins, and replication timing of centromere DNA are important determinants of centromere function. The epigenetically regulated regional centromeres in the budding yeast Candida albicans have unique DNA sequences that replicate earliest in every chromosome and are clustered throughout the cell cycle. In this study, the genome-wide occupancy of the replication initiation protein Orc4 reveals its abundance at all centromeres in C. albicans. Orc4 is associated with four different DNA sequence motifs, one of which coincides with tRNA genes (tDNA) that replicate early and cluster together in space. Hi-C combined with genome-wide replication timing analyses identify that early replicating Orc4-bound regions interact with themselves stronger than with late replicating Orc4-bound regions. We simulate a polymer model of chromosomes of C. albicans and propose that the early replicating and highly enriched Orc4-bound sites preferentially localize around the clustered kinetochores. We also observe that Orc4 is constitutively localized to centromeres, and both Orc4 and the helicase Mcm2 are essential for cell viability and CENPA stability in C. albicans. Finally, we show that new molecules of CENPA are recruited to centromeres during late anaphase/telophase, which coincides with the stage at which the CENPA-specific chaperone Scm3 localizes to the kinetochore. We propose that the spatiotemporal localization of Orc4 within the nucleus, in collaboration with Mcm2 and Scm3, maintains centromeric chromatin stability and CENPA recruitment in C. 
albicans.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">4</style></issue><work-type><style face="normal" font="default" size="100%">Article</style></work-type><custom3><style face="normal" font="default" size="100%">&lt;p&gt;Foreign&lt;/p&gt;</style></custom3><custom4><style face="normal" font="default" size="100%">9.043</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Biswas, Anushua</style></author><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Resolving diverse protein-DNA footprints from exonuclease-based ChIP experiments</style></title><secondary-title><style face="normal" font="default" size="100%">Bioinformatics</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2021</style></year><pub-dates><date><style  face="normal" font="default" size="100%">JUL</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">37</style></volume><pages><style face="normal" font="default" size="100%">I367-I375</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">Motivation: High-throughput chromatin immunoprecipitation (ChIP) sequencing-based assays capture genomic regions associated with the profiled transcription factor (TF). ChIP-exo is a modified protocol, which uses lambda exonuclease to digest DNA close to the TF-DNA complex, in order to improve on the positional resolution of the TF-DNA contact. Because the digestion occurs in the 5'-3' orientation, the protocol produces directional footprints close to the complex, on both sides of the double-stranded DNA. 
Like all ChIP-based methods, ChIP-exo reports a mixture of different regions associated with the TF: those bound directly to the TF as well as via intermediaries. However, the distribution of footprints is likely to be indicative of the complex forming at the DNA. Results: We present ExoDiversity, which uses a model-based framework to learn a joint distribution over footprints and motifs, thus resolving the mixture of ChIP-exo footprints into diverse binding modes. It uses no prior motif or TF information and automatically learns the number of different modes from the data. We show its application on a wide range of TFs and organisms/cell-types. Because its goal is to explain the complete set of reported regions, it is able to identify co-factor TF motifs that appear in a small fraction of the dataset. Further, ExoDiversity discovers small nucleotide variations within and outside canonical motifs, which co-occur with variations in footprints, suggesting that the TF-DNA structural configuration at those regions is likely to be different. 
Finally, we show that detected modes have specific DNA shape features and conservation signals, giving insights into the structure and function of the putative TF-DNA complexes.</style></abstract><work-type><style face="normal" font="default" size="100%">Article; Proceedings Paper</style></work-type><custom3><style face="normal" font="default" size="100%">Foreign</style></custom3><custom4><style face="normal" font="default" size="100%">6.937</style></custom4></record><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Biswas, Anushua</style></author><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">A universal framework for detecting cis-regulatory diversity in DNA regions</style></title><secondary-title><style face="normal" font="default" size="100%">Genome Research</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2021</style></year><pub-dates><date><style  face="normal" font="default" size="100%">SEP</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">31</style></volume><pages><style face="normal" font="default" size="100%">1646-1662</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">High-throughput sequencing-based assays measure different biochemical activities pertaining to gene regulation, genomewide. These activities include transcription factor (TF)-DNA binding, enhancer activity, open chromatin, and more. A major goal is to understand underlying sequence components, or motifs, that can explain the measured activity. 
It is usually not one motif but a combination of motifs bound by cooperatively acting proteins that confers activity to such regions. Furthermore, regions can be diverse, governed by different combinations of TFs/motifs. Current approaches do not take into account this issue of combinatorial diversity. We present a new statistical framework, cisDIVERSITY, which models regions as diverse modules characterized by combinations of motifs while simultaneously learning the motifs themselves. Because cisDIVERSITY does not rely on knowledge of motifs, modules, cell type, or organism, it is general enough to be applied to regions reported by most high-throughput assays. For example, in enhancer predictions resulting from different assays (GRO-cap, STARR-seq, and those measuring chromatin structure), cisDIVERSITY discovers distinct modules and combinations of TF binding sites, some specific to the assay. From protein-DNA binding data, cisDIVERSITY identifies potential cofactors of the profiled TF, whereas from ATAC-seq data, it identifies tissue-specific regulatory modules. Finally, analysis of single-cell ATAC-seq data suggests that regions open in one cell-state encode information about future states, with certain modules staying open and others closing down in the next time point.</style></abstract><issue><style face="normal" font="default" size="100%">9</style></issue><work-type><style face="normal" font="default" size="100%">Article</style></work-type><custom3><style face="normal" font="default" size="100%">Foreign</style></custom3><custom4><style face="normal" font="default" size="100%">9.043</style></custom4></record></records></xml>