[wplug] Text searching
Paul Cantalupo
lupey+ at pitt.edu
Mon Mar 17 09:50:39 EST 2003
Doug Green wrote:
>
> Hi all-
>
> I have some large text files that I need to search. They are genomic
> sequences, and consist of 4 letters in a block of 10, separated by a
> space. There are 6 such blocks on a line, and each line is numbered
> for the order of the first letter (maybe 20,000+ lines per file?).
> Essentially, the format looks like this (obviously, the content is
> different):
>
> 1 atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg
> 61 atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg
>
> I need to be able to search within this kind of text file for a string
> of letters that is maybe 30-40 letters long, ignoring the spaces and
> numbers. The whole point is that I need to locate the position of my
> search string within the original text. Is there some fancy way to
> grep the file, ignoring spaces and numbers? Or to somehow filter out
> the spaces and numbers, creating a new file (maybe some cat option
> piped into a new file??)?
>
> Any help/suggestions are greatly appreciated! Thanks!
>
> Doug
Doug,
I don't have a nifty filter command for you, but you can find the
solution to this problem somewhere in the O'Reilly book "Beginning Perl
for Bioinformatics" (http://www.oreilly.com/catalog/begperlbio/). It is
available at the Carnegie Library. The book provides an open source Perl
module, BeginPerlBioinfo.pm, at
http://examples.oreilly.com/begperlbio/BeginPerlBioinfo.pm.
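As a rough illustration of the filter Doug describes (this is not code from the book or from BeginPerlBioinfo.pm), the line numbers and spaces can be stripped in a few lines of Python. It is a minimal sketch that assumes the file holds only numbered sequence lines like the sample above, with no header; `strip_sequence`, `clean_file` and the file names are hypothetical.

```python
import re


def strip_sequence(text: str) -> str:
    """Drop line numbers and all whitespace, keeping only the sequence letters."""
    # [\d\s]+ matches runs of digits, spaces and newlines; deleting them
    # leaves one continuous string of letters.
    return re.sub(r"[\d\s]+", "", text)


def clean_file(src_path: str, dst_path: str) -> None:
    """Write the bare sequence from src_path to dst_path as a single line."""
    with open(src_path) as src:
        seq = strip_sequence(src.read())
    with open(dst_path, "w") as dst:
        dst.write(seq + "\n")


if __name__ == "__main__":
    sample = (
        "1 atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg\n"
        "61 atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg\n"
    )
    print(len(strip_sequence(sample)))  # 120 letters: 2 lines x 6 blocks x 10
    # clean_file("genome.txt", "genome_clean.txt")  # hypothetical file names
```

Because the line labels count letters from 1, a match starting at 0-based index i in the stripped string sits at position i + 1 in the original file. On most Unix systems, `tr -d ' 0-9\n' < genome.txt > genome_clean.txt` does the same filtering from the shell.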
Also, if you want more powerful Perl tools for bioinformatics, check out
www.bioperl.org.
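The search step itself, including mapping a hit back to the numbering in the original file, might look like the following Python sketch. It is not BioPerl's API or the book's code; `find_motif` is a hypothetical helper, and it assumes every full line holds 60 letters (six blocks of ten), as in Doug's sample.

```python
import re
from typing import List, Tuple

LINE_WIDTH = 60  # assumed letters per numbered line: 6 blocks of 10


def find_motif(text: str, motif: str) -> List[Tuple[int, int]]:
    """Find every occurrence of motif in a numbered sequence file.

    Returns (position, line_label) pairs: position is the 1-based coordinate
    of the motif's first letter, and line_label is the number printed at the
    start of the line containing that letter.
    """
    # Ignore line numbers, spaces and case in both the file and the query.
    seq = re.sub(r"[\d\s]+", "", text).lower()
    motif = re.sub(r"\s+", "", motif).lower()
    if not motif:
        raise ValueError("motif must contain at least one letter")

    hits = []
    index = seq.find(motif)
    while index != -1:
        position = index + 1  # 0-based string index -> 1-based coordinate
        line_label = (index // LINE_WIDTH) * LINE_WIDTH + 1
        hits.append((position, line_label))
        # Restart one letter later so overlapping matches are also reported.
        index = seq.find(motif, index + 1)
    return hits


if __name__ == "__main__":
    text = (
        "1 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaacc\n"
        "61 gggttttttt tttttttttt\n"
    )
    # The motif spans the line break: 'cc' ends line 1, 'ggg' starts line 61.
    print(find_motif(text, "ccggg"))  # [(59, 1)]
```

Stepping forward by one letter rather than by the motif length is a deliberate choice: short repeats in genomic sequence can produce overlapping matches, and each one is reported. For a 30-40 letter query against about 20,000 lines (roughly 1.2 million letters), a plain `str.find` scan should be adequate.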
Good luck,
Paul