[lug] grep question
chip at pupman.com
Mon Jun 11 07:21:38 MDT 2007
Yes, thanks for the explanation!
On Mon, 11 Jun 2007, Jeffrey Haemer wrote:
> Okay, now I have time. Here's a little more background, in five easy
> steps.
> (1) In Unix, collation (the order of characters) and expressions built on
> collation order ("[A-Z]") originally used ASCII collating order.
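
A minimal sketch of what ASCII collating order means in practice. It forces the C locale with LC_ALL=C so the result doesn't depend on whatever locale the reader happens to have set. The word list and the variable names are made up for illustration.

```shell
# Sample words; the list itself is an illustrative assumption.
words='apple
Banana
cherry
Date'

# In the C locale, sort compares byte values, so every uppercase
# ASCII letter (0x41-0x5A) sorts before every lowercase one (0x61-0x7A).
sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)

# Likewise, [A-Z] is exactly the 26 uppercase ASCII letters.
capitalized=$(printf '%s\n' "$words" | LC_ALL=C grep '^[A-Z]')

printf 'sorted:\n%s\n' "$sorted"
printf 'capitalized:\n%s\n' "$capitalized"
```

Under other locales the same two commands may behave differently: sort often interleaves upper- and lowercase, and on some systems [A-Z] has been known to match lowercase letters as well.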
> A few things made people re-think that assumption.
> The obvious thing was character sets with more than 128 characters. Only a
> few languages can be written without funny letters. Of modern languages, I
> think the list is English, Indonesian, Hawaiian, and Swahili. If you do an
> ls(1), where should files that start with Thai characters sort, and what
> order should they come in? What should sort(1) do with a list of Danish
> first names? The Germans and Japanese finally got enough money that Unix
> vendors cared.
> Different, but in the same category, was EBCDIC. If you wanted to make a
> Unix work-alike -- say, grep(1) -- for an old, IBM mainframe, how should it
> behave? IBM had always had enough money, but finally started caring about
> Unix.
> Different, but in a different category, was the desktop market. MS-DOS had
> case-insensitive filenames, and everyone's marketing department thought that
> they could finally sell Unix to some people who'd gotten used to Windows.
> (2) To address these, POSIX invented a mechanism to specify a collating
> order that's separate from the character-set order. Used to be that if you
> wanted to sort backwards, you'd say "sort -r". Today, you can create a new
> collating sequence, install it, tell the system to use that order, and then
> call "sort" without a flag. See how much better that solution is? Me
> neither. And when's the last time anyone asked us, anyway?
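
For comparison, here is the old-fashioned route from point (2) as a small sketch. The names are sample data and the variable names are assumptions. The locale route is not shown: it would mean describing a collating sequence, compiling it with localedef(1), and selecting it through LC_COLLATE, which needs an installed locale to demonstrate.

```shell
# Sample data; purely illustrative.
names='carol
alice
bob'

# Plain sort, pinned to the C locale so the order is predictable.
forward=$(printf '%s\n' "$names" | LC_ALL=C sort)

# The traditional way to sort backwards: one flag.
backward=$(printf '%s\n' "$names" | LC_ALL=C sort -r)

printf 'forward:\n%s\n' "$forward"
printf 'backward:\n%s\n' "$backward"
```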
> (3) This mechanism was one of several innovations that came to Unix around
> the same time, all for similar reasons. For example, your keyboard has a
> dollar sign; some keyboards have pound signs or Euro symbols; some even have
> more than one. Some places, they write ten thousand as "10000", some as
> "10,000", some as "10.000", some as "1,0000". Don't you want to be able to
> tell a system how to print prices in Saudi Riyals or Kuwaiti Dinars? Yeah,
> me neither. People who make really a lot of money selling computers all do.
> (4) On systems that approximate POSIX-conformance, these behaviors are
> governed by environment variables called things like LC_MONETARY and LC_TIME
> and LC_COLLATE. There is, however, one ring that rules them all. Okay,
> two rings: LANG and LC_ALL. They differ in subtle but boring ways: LANG
> is the default for any LC_* variable you haven't set, and LC_ALL
> overrides all of them. Use LANG: it's fewer characters to type. If you
> try "echo $LANG" you'll see what rules someone has told your system you
> want.
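
The lookup order among these variables can be sketched in a few lines of shell. The function effective_locale is a hypothetical helper written for this illustration, not a standard utility. It mirrors the POSIX rule as I understand it: a non-empty LC_ALL wins, then the specific LC_* variable, then LANG. The example values are limited to C and POSIX because those two locales exist everywhere.

```shell
# effective_locale CATEGORY
# Hypothetical helper: prints the value that would govern CATEGORY
# (for example LC_COLLATE), following the LC_ALL > LC_* > LANG order.
effective_locale() {
    category=$1
    # Indirect lookup of the variable named by $category.
    eval "category_value=\${$category-}"
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' '(implementation default)'
    fi
}

# Only LANG set: LANG decides.
only_lang=$( unset LC_ALL LC_COLLATE; LANG=POSIX; effective_locale LC_COLLATE )

# LANG and LC_COLLATE set: the specific variable beats LANG.
specific=$( unset LC_ALL; LANG=POSIX; LC_COLLATE=C; effective_locale LC_COLLATE )

# LC_ALL set: it beats both.
overridden=$( LC_ALL=POSIX; LANG=C; LC_COLLATE=C; effective_locale LC_COLLATE )

printf '%s %s %s\n' "$only_lang" "$specific" "$overridden"
```

One practical consequence: setting LANG changes nothing for a category whose LC_* variable, or LC_ALL, is already set to something else.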
> (5) To provide normal, predictable, sane behavior -- or, as it's known in
> marketing circles, "traditional Unix behavior" -- say LANG=C. You can say
> other stuff that works, too, like LANG=POSIX or LANG=XOPEN or even (I'm
> pretty sure -- all of this is from memory) unset LANG.
> The first of these, LANG=C, is the fewest characters to type.
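
A short sketch of applying the cure, under the assumption that nothing else in the environment is overriding LANG. It runs in a subshell so the settings don't leak into the calling shell. The sample words and variable names are illustrative.

```shell
# Session-wide form: clear the variables that would outrank LANG,
# then export LANG=C. The subshell keeps the change contained.
count_exported=$(
    unset LC_ALL LC_COLLATE LC_CTYPE
    export LANG=C
    printf '%s\n' alpha Bravo charlie | grep -c '^[a-z]'
)

# One-off form: LC_ALL=C on a single command outranks everything else,
# so it works no matter what the rest of the environment says.
count_oneoff=$(printf '%s\n' alpha Bravo charlie | LC_ALL=C grep -c '^[a-z]')

# Both count only the two lowercase-initial words.
printf '%s %s\n' "$count_exported" "$count_oneoff"
```

The one-off form is the safer habit in scripts, since it doesn't depend on which of the LC_* variables the caller has set.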
> This help?
> On 6/11/07, karl horlen <horlenkarl at yahoo.com> wrote:
> > --- Jeffrey Haemer <jeffrey.haemer at gmail.com> wrote:
> > > export LANG=C
> > >
> > > will cure this problem.
> > >
> > > (If you want a long explanation, let me know and
> > > I'll write one tomorrow.
> > > Right now, I'm, uh, otherwise occupied.)
> > i'm not having the problem, but i'd be curious to hear
> > your explanation...
> > _______________________________________________
> > Web Page: http://lug.boulder.co.us
> > Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> > Join us on IRC: lug.boulder.co.us port=6667 channel=#colug
> Jeffrey Haemer <jeffrey.haemer at gmail.com>
> 720-837-8908 [cell]