[lug] vi wildcards becomes mod_perl/apache/asp
tkil at scrye.com
Fri May 25 17:13:05 MDT 2001
>>>>> "John" == John Starkey <jstarkey at advancecreations.com> writes:
John> I had to delete several other tags. I was just using the above
John> as an example. I'm assuming your code /? would delete the
John> closing tags also. (will check it out, thanks).
yes; /? means "zero or one occurrence of a slash". since we also
allow for zero characters after the tag name, the regex
will match <span class="foo"> and </span>.
if you are removing multiple tags, you can do them all at once (unless
they interfere in weird ways):
perl -i.bak -lpwe 's:</?(span|div|br)[^>]*>::g' file1 file2 ...
i've actually done this sort of thing to remove excess <font> tags.
you can do other transformations if you like, too; you're not limited
just to a single s/// operation. i could have done that as:
perl -i.bak -lpwe 's:</?span[^>]*>::g; s:</?div[^>]*>::g; s:</?br[^>]*>::g' \
file1 file2 ...
some wysiwyg html editors tend to leave zero-content turds around,
like <b></b> ... it's easy enough to filter those out too. more
complex is handling cases where the pair of elements, or even a single
tag, is split over two lines. i often use this style for long URLs:
the bizarre line breaking is because the presence or absence of
whitespace *is* significant in the content of some elements,
especially for purposes of underlining links or splitting lines.
whitespace within elements themselves, on the other hand, is
guaranteed to not matter.
John> Cool. Thanks. LWP was a problem. 01mailrc.txt was what it was
John> trying to get from Tokyo. Looking at it, it's a bunch of email
John> addies and aliases. I think I'll do it by hand :}
hm. if you install LWP by hand (which can be lots of fun; the
Bundle::LWP might help there, but i don't know if i've ever done that
one by hand), set your proxy, and configure CPAN to use an HTTP
mirror, you might have some luck.
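
for the CPAN side, these settings live in the CPAN.pm shell. a rough sketch, where both host names are placeholders rather than real servers (substitute your own proxy and an HTTP mirror near you):

```
$ perl -MCPAN -e shell
cpan> o conf http_proxy http://proxy.example.com:8080/
cpan> o conf urllist unshift http://mirror.example.com/CPAN/
cpan> o conf commit
```

"o conf commit" writes the changes to the CPAN configuration file so they persist between sessions.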