[wplug] matching text between two words, regexp, perl.
Eric Cooper
ecc at cmu.edu
Wed Sep 27 13:15:24 EDT 2006
On Wed, Sep 27, 2006 at 08:34:27AM -0700, Juan Zuluaga wrote:
> Excuse my very novice question on web scraping,
> (and yes, I'm waiting for Mastering Regular Exp. to
> arrive!)
>
> What should be matching expression, to get all the
> text found between words? (it is a multiline text)
First of all, I usually use
m/pattern/is
when web-scraping -- the 's' lets '.' match newlines, so a match can
span multiple lines, and the 'i' makes it case-insensitive.
As for the pattern, I'd use something like
/WORD1\s*(.*?)\s*WORD2/
The *? makes it match the shortest sequence, rather than the longest,
which prevents you from capturing too much if there are multiple
occurrences of WORD2.
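Putting the two together, a minimal sketch might look like the
following. The sample text in $page is made up for illustration, and
WORD1 / WORD2 are placeholders for your real delimiters. I have also
added the 'g' modifier (not mentioned above) on the assumption that you
want every occurrence rather than just the first.

```perl
use strict;
use warnings;

# Hypothetical sample text; WORD1 and WORD2 stand in for the real delimiters.
my $page = "header WORD1\nfirst line\nsecond line\nWORD2 middle WORD1 again WORD2 footer";

# /s lets '.' match newlines, /i ignores case, /g returns every match.
# In list context the match returns the captured groups.
my @matches = $page =~ /WORD1\s*(.*?)\s*WORD2/isg;

# Print each captured chunk in brackets so its boundaries are visible.
print "[$_]\n" for @matches;
```

If the delimiters contain characters that are special in a regexp
(such as '.' or '?'), wrapping them in \Q...\E makes them match
literally.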
--
Eric Cooper e c c @ c m u . e d u