[lug] [Slightly OT] File Management?

Matt James matuse at yahoo.com
Sun Mar 22 23:09:22 MDT 2009

     So, I've found myself doing a bit of soul searching lately.  The
question at hand...  How do I manage all these files?

Here is an example:  I have a customer that does 3D CAD work with a number
of subcontractors.  Each one of these "models," as we'll call them, has
roughly 5,000 files involved (all very small files FWIW, <500 KB).  Net
result - half a terabyte of stuff, or ~10 million files in a little over 3
million directories, just for the raw project data.  This is starting to get
out of hand!

So here is to you, fellow LUG members: what do you do with large quantities
(terabytes) of data?  More specifically, I'm looking for some methodology
or policy for how to really, truly "manage" this amount of data.  And by
manage, I mean where and when to store it - like when are the files on the
local client, and when are they on the server?  What's a reasonable
retention policy in terms of how long archival data should be stored on the
main server before it's moved off to some sort of archival system?  (And
what do you even use as an archival system these days?)  How do you, shall I
say, index all of this data?  Each of these "models" with 5,000 files is
really one big "file," and I sure don't feel like putting metadata in for
each one of the 5,000.  How do you find anything in this kind of environment?
The list goes on.....
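
To make the indexing question a bit more concrete, here is a rough Python
sketch of one possible approach: treat each model directory as a single unit
and record one summary per model instead of metadata per file.  It assumes
each model lives in its own directory directly under a single projects root,
which is purely an assumption for illustration.  The names ModelSummary,
summarize_model, and index_models are made up for this example.

```python
import os
from dataclasses import dataclass


@dataclass
class ModelSummary:
    """One index record per model directory (hypothetical structure)."""
    name: str
    file_count: int
    total_bytes: int
    newest_mtime: float  # seconds since the epoch; 0.0 if the model is empty


def summarize_model(model_dir):
    """Walk one model directory and collapse its files into one record."""
    file_count = 0
    total_bytes = 0
    newest_mtime = 0.0
    for root, _dirs, files in os.walk(model_dir):
        for fname in files:
            path = os.path.join(root, fname)
            try:
                st = os.stat(path)
            except OSError:
                # Skip files that vanish or cannot be read mid-scan.
                continue
            file_count += 1
            total_bytes += st.st_size
            newest_mtime = max(newest_mtime, st.st_mtime)
    name = os.path.basename(os.path.normpath(model_dir))
    return ModelSummary(name, file_count, total_bytes, newest_mtime)


def index_models(projects_root):
    """Assumes every immediate subdirectory of projects_root is one model."""
    summaries = []
    for entry in sorted(os.scandir(projects_root), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            summaries.append(summarize_model(entry.path))
    return summaries
```

Something like newest_mtime could then feed a retention decision (for
example, flagging models untouched for some chosen period), though what
cutoff is reasonable is exactly the open question.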

Are there any books out there that address this kind of stuff?  What about
classes at a university?  I'm not looking to earn a Masters in this area, I
just want to know a better way.

And I'm not talking about homogeneous data here either.  I'm talking client
data, e-mail data (in the 20 GB range now - and for only 3 people), faxes,
marketing material, music, video, pictures, and every other file you can

My brain hurts....  Help?


Matt James

