[wplug] Text searching
Jonathan S. Billings
billings at negate.org
Wed Mar 26 23:38:26 EST 2003
I've always thought about writing some Perl code that could do motif
searches. I know that there are some Perl modules that'll let you do
something like a BLAST search, at http://bio.perl.org.
Probably the best solution for this is to just use some program to get
rid of all the newlines, numbers and spaces, and then run queries over
it. But then, you lose all the nice formatting and line numbers.
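A minimal sketch of that strip-and-search idea, assuming the whole file
fits in memory. The sample data and the variable names ($text, $motif,
$bases, @hits) are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample in the numbered, space-separated layout.
my $text  = "1 atacaatagg atacaatagg\n21 atacaatagg atacaatagg\n";
my $motif = "taggatac";

# Drop everything that is not a base (digits, spaces, newlines).
(my $bases = $text) =~ s/[^atgc]//g;

# Collect the 0-based offset of every occurrence, overlaps included.
my @hits;
my $pos = 0;
while (($pos = index($bases, $motif, $pos)) != -1) {
    push @hits, $pos;
    $pos++;
}

print "found at offsets: @hits\n";
```

The offsets refer to the stripped sequence, not to the original file, so
mapping a hit back to its place in the formatted text takes extra work.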
I thought about it a bit, and wrote this:
#!/usr/bin/perl -w
# Simple base pair search program
# Set $searchstring to the string of nucleotides you are looking for.
$searchstring = "ggatac";
# regular expression that means "Not 'a','t','g' or 'c', repeating"
$between = "[^atgc]*";
# basically places the expression between each character in the search
# string
$querystring = join $between,(split "",$searchstring);
#generate a regular expression out of the query string.
$query = qr/$querystring/;
# these two lines allow you to slurp the whole file into a perl variable
undef $/;
$input = <>;
# put brackets around every instance of the search string, including when
# it wraps around
$input =~ s/($query)/[$1]/g;
# print out the reformatted text.
print $input;
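To see what the generated pattern looks like and how it behaves across a
line break, here is a small self-contained run of the same technique. The
two-line sample and the $sample and $marked names are illustrative only,
not taken from a real data file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $searchstring = "ggatac";
my $between      = "[^atgc]*";

# Interleave the "skip non-bases" pattern between the search characters.
my $querystring = join $between, split //, $searchstring;
print "$querystring\n";

# Made-up sample: one occurrence spans a space, another spans a
# numbered line break.
my $sample = "1 atacaatagg atacaatag\n21 gatacaatag\n";

# Bracket each match in a copy, leaving the sample itself untouched.
my $query = qr/$querystring/;
(my $marked = $sample) =~ s/($query)/[$1]/g;
print $marked;
```

Any digits or whitespace that fall inside a match end up inside the
brackets too, which is what keeps the original layout intact.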
> On Mon, 2003-03-17 at 09:07, Doug Green wrote:
>> Hi all-
>>
>> I have some large text files that I need to search. [...]
>> Essentially, the format looks like this (obviously, the content is
>> different):
>>
>> 1 atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg
>> 61 atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg
>>
>> I need to be able to search within this kind of text file for a
>> string of letters that is maybe 30-40 letters long, ignoring the
>> spaces and numbers.