[wplug] Text searching

Wed Mar 26 23:38:26 EST 2003

I've always thought about writing some Perl code that could do motif 
searches.  I know that there are some Perl modules that'll let you do 
something like a BLAST search, at http://bio.perl.org.

Probably the simplest solution for this is to just use some program to 
get rid of all the newlines, numbers and spaces, and then run queries 
over the result.  But then you lose all the nice formatting and line 
numbers.
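
For comparison, here is a rough sketch of that strip-everything approach. 
The helper name flatten_sequence and the sample data are made up for 
illustration, and it assumes the file only uses lowercase or uppercase 
a, t, g and c for bases (anything else is thrown away).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collapse a numbered, space-separated sequence
# listing into one continuous string of bases.
sub flatten_sequence {
    my ($text) = @_;
    # Lowercase the text, then drop everything that is not a, t, g or c
    # (digits, spaces, newlines, and any other letters).
    (my $flat = lc $text) =~ s/[^atgc]//g;
    return $flat;
}

# Made-up sample in the same layout as the files being discussed.
my $sample = "1       atacaatagg atacaatagg\n21      atacaatagg atacaatagg\n";
my $flat   = flatten_sequence($sample);

# Collect every 0-based offset at which the motif occurs.
my $motif = "ggatac";
my @hits;
my $pos = 0;
while (($pos = index($flat, $motif, $pos)) != -1) {
    push @hits, $pos;
    $pos++;    # step forward so overlapping hits are found too
}
print "found at offsets: @hits\n";
```

The offsets refer to positions in the flattened string, so mapping a hit 
back to a line in the original file takes extra work.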

I thought about it a bit, and wrote this:

#!/usr/bin/perl -w
# Simple base pair search program
# just set $searchstring to the string of nucleotides to search for

# The string you are looking for.
$searchstring = "ggatac";

# regular expression that means "Not 'a','t','g' or 'c', repeating"
$between = "[^atgc]*";

# basically places the expression between each character in the search
# string
$querystring = join $between,(split "",$searchstring);
#generate a regular expression out of the query string.
$query = qr/$querystring/;

# these two lines allow you to slurp the whole file into a perl variable
undef $/;
$input = <>;

# put brackets around every instance of the search string, including
# when it wraps around
$input =~ s/($query)/[$1]/g;

# print out the reformatted text.
print $input;
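
To see what the script does without needing a data file, here is a small 
self-contained sketch of the same technique run against an in-memory 
string.  The sub names motif_regex and mark_matches and the sample data 
are made up for this example, and the quotemeta call is an extra 
precaution that the script above does not have.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a regex that matches $motif even when digits, spaces or
# newlines fall between its letters (same idea as $between above).
sub motif_regex {
    my ($motif) = @_;
    my $between = "[^atgc]*";
    # quotemeta is an added safeguard in case the motif ever contains
    # regex metacharacters; plain a/t/g/c letters are unaffected.
    my $pattern = join $between, map { quotemeta($_) } split //, $motif;
    return qr/$pattern/;
}

# Hypothetical helper: return a copy of $text with every match bracketed.
sub mark_matches {
    my ($text, $motif) = @_;
    my $re = motif_regex($motif);
    (my $marked = $text) =~ s/($re)/[$1]/g;
    return $marked;
}

# Made-up sample; "ggatac" straddles both a space and a line break.
my $sample = "1       atacaatagg atacaatagg\n21      atacaatagg atacaatagg\n";
my $marked = mark_matches($sample, "ggatac");
print $marked;
```

With this sample, the second bracketed match should start at the end of 
the first line and close on the second line, with the line number caught 
inside the brackets.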

> On Mon, 2003-03-17 at 09:07, Doug Green wrote:
>> Hi all-
>>
>> I have some large text files that I need to search. [...]
>> Essentially, the format
>> looks like this (obviously, the content is different):
>>
>> 1       atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg
>> 61     atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg
>>
>> I need to be able to search within this kind of text file for a
>> string of letters that is maybe 30-40 letters long, ignoring the
>> spaces and numbers.