[wplug] matching text between two words, regexp, perl.

Eric Cooper ecc at cmu.edu
Wed Sep 27 13:15:24 EDT 2006


On Wed, Sep 27, 2006 at 08:34:27AM -0700, Juan Zuluaga wrote:
> Excuse my very novice question on web scraping, 
> (and yes, I'm waiting for Mastering Regular Exp. to
> arrive!) 
> 
> What should be matching expression, to get all the
> text found between words? (it is a multiline text)

First of all, I usually use
    m/pattern/is
when web-scraping: the 's' lets '.' match newlines, so a match can span
multiple lines, and the 'i' makes it case-insensitive.
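
A quick way to see what the two flags do is to run the same pattern with
and without the 's'. This is only a sketch: the sample text and the words
"start" and "end" are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text; "start" and "end" stand in for the two words.
my $text = "START first line\nsecond line END";

# Without /s, '.' stops at the newline, so the pattern cannot reach "END".
my $without_s = ($text =~ m/start(.*)end/i) ? 1 : 0;

# With /s, '.' also matches "\n", so the match can span both lines.
# The /i in both patterns lets lowercase "start"/"end" match the uppercase text.
my $with_s = ($text =~ m/start(.*)end/is) ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
```

Note that 's' is not the same as 'm': 'm' only changes where ^ and $ can
match, and has no effect on '.'.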

As for the pattern, I'd use something like
    /WORD1\s*(.*?)\s*WORD2/
The *? makes it match the shortest sequence, rather than the longest,
which prevents you from capturing too much if there are multiple
occurrences of WORD2.
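
To illustrate the difference, here is a small sketch that runs the
non-greedy pattern next to a greedy one. The input string is invented for
the example, and WORD1 and WORD2 are placeholders for your actual words.

```perl
use strict;
use warnings;

# Hypothetical input with two occurrences of the closing word.
my $text = "WORD1 alpha\nbeta WORD2 gamma WORD2";

# Non-greedy (.*?): the capture stops at the first WORD2.
my ($shortest) = $text =~ /WORD1\s*(.*?)\s*WORD2/is;

# Greedy (.*), for comparison: the capture runs on to the last WORD2.
my ($longest) = $text =~ /WORD1\s*(.*)\s*WORD2/is;

print "shortest: [$shortest]\n";
print "longest:  [$longest]\n";
```

If the text may contain several WORD1 ... WORD2 pairs, putting the
non-greedy pattern in list context with the 'g' flag should collect each
captured piece in turn.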

-- 
Eric Cooper             e c c @ c m u . e d u

