[lug] recovering a tar file that spans CDs

Sean Reifschneider jafo at tummy.com
Wed Jun 18 22:21:15 MDT 2008

Kenneth D Weinert wrote:
> As a separate point, anyone have a better scheme for creating backups?
> I don't mind saving to CD/DVD, but perhaps one file that spans disks
> isn't the best choice, but I'd prefer to not have to sort out ahead of
> time exactly which files will fit on each disk.

As Bear said, it's a pretty well-defined format. I'd start by looking in
the stream for something that looks like a 512-byte tar header (ustar-format
headers contain the string "ustar" at byte offset 257; check the format
specification for the specifics, or the "file" magic number database for
more information).
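A minimal Python sketch of that header search follows. It assumes the ustar layout: the "ustar" magic at byte 257 and an octal checksum at byte 148 of each 512-byte header. find_tar_headers and header_checksum_ok are hypothetical helper names. The sketch checks every occurrence of the magic, not just 512-byte-aligned ones, because a stream picked up partway through a disc may not be aligned. It then uses the header checksum to weed out false matches. Old v7-style headers carry no magic and won't be found this way.

```python
BLOCK = 512          # size of a tar header block
MAGIC_OFFSET = 257   # "ustar" magic starts at byte 257 of a header
CHKSUM_OFFSET = 148  # 8-byte octal checksum field starts at byte 148


def header_checksum_ok(header: bytes) -> bool:
    """Check a 512-byte candidate header against its stored checksum.

    The checksum is the sum of all header bytes, with the checksum field
    itself counted as eight ASCII spaces.
    """
    field = header[CHKSUM_OFFSET:CHKSUM_OFFSET + 8]
    try:
        stored = int(field.strip(b" \0"), 8)
    except ValueError:
        return False  # not an octal number, so not a real header
    computed = (sum(header[:CHKSUM_OFFSET]) + 8 * ord(" ")
                + sum(header[CHKSUM_OFFSET + 8:]))
    return stored == computed


def find_tar_headers(data: bytes):
    """Yield every offset in `data` where a checksum-valid ustar header starts."""
    pos = data.find(b"ustar", MAGIC_OFFSET)
    while pos != -1:
        start = pos - MAGIC_OFFSET
        header = data[start:start + BLOCK]
        if len(header) == BLOCK and header_checksum_ok(header):
            yield start
        pos = data.find(b"ustar", pos + 1)
```

The first offset yielded is a candidate value to feed to dd or to a tar reader.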

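If you can find the next tar header, you can then use "dd skip=XXX" to skip
up to that header and pipe the rest out to tar to extract it. Note that dd
counts skip= in blocks of bs bytes (512 by default), so if the header
doesn't start on a 512-byte boundary you'll need a smaller block size such
as bs=1.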
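In Python, you can sketch the equivalent of seeking past the damage with dd and piping the rest to tar using the standard tarfile module in streaming mode ("r|"). members_from is a hypothetical name. This sketch only lists member names rather than extracting files. It stops quietly when the data runs out, as it would at the end of one CD's worth of a spanned archive.

```python
import tarfile


def members_from(path: str, offset: int) -> list:
    """List the tar members readable from a header found at `offset`.

    Roughly what "dd skip=... | tar t" would show: reading starts at the
    given byte offset and stops without error if the archive is cut off.
    """
    names = []
    with open(path, "rb") as f:
        f.seek(offset)  # byte-exact, so no 512-byte alignment is needed
        try:
            with tarfile.open(fileobj=f, mode="r|") as tar:
                for member in tar:
                    names.append(member.name)
        except tarfile.ReadError:
            pass  # damaged or truncated data: keep what was read so far
    return names
```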

If the stream is compressed without specific support for restarting the
compression, you're probably screwed unless you can recover the bad part.
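One way to get that restart support is sketched below with Python's standard gzip module. The idea is to write the data as a series of independent gzip members. gzip -d still decompresses concatenated members as one stream, and after damage you can scan for the next member's 1f 8b 08 header bytes and resume there. write_restartable and recover_from are hypothetical names, and the chunk size is an arbitrary parameter.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID bytes plus CM=8 (deflate)


def write_restartable(data: bytes, chunk_size: int) -> bytes:
    """Compress `data` as independent gzip members of `chunk_size` input bytes.

    The concatenation is still a valid gzip stream, but each member can be
    decompressed without any of the data that precedes it.
    """
    return b"".join(gzip.compress(data[i:i + chunk_size], mtime=0)
                    for i in range(0, len(data), chunk_size))


def recover_from(damaged: bytes) -> bytes:
    """Return the data decoded from the earliest gzip member from which the
    rest of `damaged` decompresses cleanly (anything before it is lost)."""
    pos = damaged.find(GZIP_MAGIC)
    while pos != -1:
        try:
            return gzip.decompress(damaged[pos:])
        except (OSError, EOFError, zlib.error):
            # False match or damaged member: try the next candidate.
            pos = damaged.find(GZIP_MAGIC, pos + 1)
    return b""
```

Smaller chunks lose less data per damaged spot but compress somewhat worse, since each member starts with an empty dictionary.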

As for my own backups, these days I dump them to discs, one set at my
house and one set at the tummy.com hosting facility.

If you really must write tars to CDs, I wrote a tool years ago called
"pytarsplit" (available from ftp.tummy.com:/pub/tummy/pytarsplit/) that
reads a tar stream from stdin, takes a maximum size and an output file
name pattern, and splits the archive into pieces that are each smaller
than that size:

   tar cf - . | pytarsplit 5000000 /tmp/mytarfile.%05d.tar

That doesn't really allow you to compress the files, though: they won't
consistently compress to the same size, so you'd just be wasting space if
you compressed them. Also, it splits only at file boundaries, so if you
are writing three 400MB files, each will be written to a different CD
rather than having one of the files split in the middle.
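The file-boundary grouping a splitter like this has to do is sketched below in Python. It is not pytarsplit's actual code, and plan_volumes and tar_entry_size are hypothetical names. The sketch assumes plain ustar entries: one 512-byte header plus the data padded to a multiple of 512 bytes, with no long-name or pax headers. It also reserves the two zero blocks that end each archive. With a 700MB limit, three 400MB files land on three separate volumes, matching the behaviour described above.

```python
BLOCK = 512  # tar block size


def tar_entry_size(file_size: int) -> int:
    """Bytes one regular file occupies in a ustar archive: a 512-byte
    header plus the data rounded up to a whole number of blocks."""
    return BLOCK + -(-file_size // BLOCK) * BLOCK


def plan_volumes(files, limit: int) -> list:
    """Group (name, size) pairs into volumes without ever splitting a file.

    Each volume also needs room for the 2-block end-of-archive marker.
    A single file bigger than `limit` still gets a volume of its own.
    """
    volumes, current, used = [], [], 0
    for name, size in files:
        entry = tar_entry_size(size)
        if current and used + entry + 2 * BLOCK > limit:
            volumes.append(current)  # this file won't fit: start a new volume
            current, used = [], 0
        current.append(name)
        used += entry
    if current:
        volumes.append(current)
    return volumes
```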

Ideally, what you'd want is a "pytargzip" that you could put in the middle
of that pipe to compress each tar entry as it went by. But because the tar
header comes BEFORE the file data and records its size, you would need to
spool each compressed file off to disc, get the resulting compressed size,
and only then write the header and the file. But that would compress each
file independently.
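That hypothetical "pytargzip" can be sketched in Python with the standard tarfile and gzip modules. targzip is an invented name, and renaming each member with a ".gz" suffix is an assumption of this sketch. It reads a tar stream and gzips each regular file into a temporary spool file to learn its compressed size. It then writes a new header carrying that size, followed by the compressed data. Directories and other non-file entries are copied through unchanged.

```python
import gzip
import shutil
import tarfile
import tempfile


def targzip(src, dst) -> None:
    """Copy a tar stream from `src` to `dst`, gzipping each regular file on its own."""
    with tarfile.open(fileobj=src, mode="r|") as tin, \
            tarfile.open(fileobj=dst, mode="w|") as tout:
        for member in tin:
            if not member.isfile():
                tout.addfile(member)  # directories, links, etc. carry no data
                continue
            # The header (with the size) is written before the data, so the
            # compressed data has to be spooled to disk first to measure it.
            with tempfile.TemporaryFile() as spool:
                with gzip.GzipFile(fileobj=spool, mode="wb") as gz:
                    shutil.copyfileobj(tin.extractfile(member), gz)
                new = tarfile.TarInfo(member.name + ".gz")
                new.size = spool.tell()  # compressed size, now known
                new.mtime, new.mode = member.mtime, member.mode
                spool.seek(0)
                tout.addfile(new, spool)
```

A pipe would use it as targzip(sys.stdin.buffer, sys.stdout.buffer). The output is still a plain tar, so a splitter that works at file boundaries can cut it without breaking any compressed stream.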

Note that "star" has a "multivolume" option that can split archives up as
well, and it looks like it may be able to split a file in the middle while
still writing a header for later extraction. It doesn't seem to do
file-by-file compression, though.

--
Sean Reifschneider, Member of Technical Staff <jafo at tummy.com>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability