[wplug] Text searching

Doug Green Green at np.awing.upmc.edu
Mon Mar 17 09:07:07 EST 2003


Hi all-

I have some large text files that I need to search. They are genomic
sequences built from a 4-letter alphabet, written in blocks of 10 letters
separated by spaces. There are 6 such blocks on a line, and each line starts
with the position of its first letter in the sequence (maybe 20,000+ lines
per file?). Essentially, the format looks like this (obviously, the content
is different):

1       atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg
61     atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg atacaatagg

I need to be able to search within this kind of text file for a string of
letters that is maybe 30-40 letters long, ignoring the spaces and numbers.
The whole point is that I need to locate the position of my search string
within the original text. Is there some fancy way to grep the file, ignoring
spaces and numbers? Or to somehow filter out the spaces and numbers,
creating a new file (maybe some cat option piped into a new file??)?
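
To make the idea concrete, here is a rough sketch of the "filter out the
spaces and numbers, then search" approach. It is in Python rather than
grep/cat, and it assumes the layout shown above: the first field on each line
is the position number, every line but the last holds 60 letters, and the
whole file fits in memory. The function names (load_sequence, find_all,
locate) are made up for illustration.

```python
def load_sequence(lines):
    """Join the letter blocks from all lines, dropping numbers and spaces."""
    parts = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: the first field is the position number,
        # the remaining fields are the letter blocks.
        parts.extend(fields[1:])
    return "".join(parts).lower()


def find_all(sequence, query):
    """Return the 1-based position of every match, overlapping ones included."""
    query = "".join(query.split()).lower()
    positions = []
    start = sequence.find(query)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(query, start + 1)
    return positions


def locate(position, per_line=60, per_block=10):
    """Map a 1-based sequence position back to the original layout.

    Returns (number printed at the start of the line, block on that line,
    letter within that block), all 1-based.
    """
    index = position - 1
    line_label = (index // per_line) * per_line + 1
    block = (index % per_line) // per_block + 1
    offset = index % per_block + 1
    return line_label, block, offset
```

Something like load_sequence(open("genome.txt")) (file name hypothetical)
would build the stripped sequence, find_all would then search it, and locate
would turn each hit back into a line number, block, and letter in the
original file. Because the search runs on the joined sequence, a match that
straddles a space or a line break is still found.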

Any help/suggestions are greatly appreciated! Thanks!

Doug
