<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Agrawal, Ankit</style></author><author><style face="normal" font="default" size="100%">Sambare, Snehal V.</style></author><author><style face="normal" font="default" size="100%">Narlikar, Leelavati</style></author><author><style face="normal" font="default" size="100%">Siddharthan, Rahul</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">THiCweed: fast, sensitive detection of sequence features by clustering big datasets</style></title><secondary-title><style face="normal" font="default" size="100%">Nucleic Acids Research</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">2018</style></year><pub-dates><date><style  face="normal" font="default" size="100%">MAR</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">46</style></volume><pages><style face="normal" font="default" size="100%">e29</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;We present THiCweed, a new approach to analyzing transcription factor binding data from high-throughput chromatin immunoprecipitation-sequencing (ChIP-seq) experiments. THiCweed clusters bound regions based on sequence similarity using a divisive hierarchical clustering approach based on sequence similarity within sliding windows, while exploring both strands. THiCweed is specially geared toward data containing mixtures of motifs, which present a challenge to traditional motif-finders. Our implementation is significantly faster than standard motif-finding programs, able to process 30 000 peaks in 1-2 h, on a single CPU core of a desktop computer. 
On synthetic data containing mixtures of motifs it is as accurate as or more accurate than all other tested programs. THiCweed performs best with large 'window' sizes (&amp;gt;50 bp), much longer than typical binding sites (7-15 bp). On real data it successfully recovers literature motifs, but also uncovers complex sequence characteristics in flanking DNA, variant motifs and secondary motifs even when they occur in &amp;lt;5% of the input, all of which appear biologically relevant. We also find recurring sequence patterns across diverse ChIP-seq datasets, possibly related to chromatin architecture and looping. THiCweed thus goes beyond traditional motif finding to give new insights into genomic transcription factor-binding complexity.&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">5</style></issue><custom3><style face="normal" font="default" size="100%">Foreign</style></custom3><custom4><style face="normal" font="default" size="100%">10.162</style></custom4></record></records></xml>