Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

header: set entry_size() to 0 for hardlinks #314

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

npajkovsky
Copy link

Fix the error: "numeric field was not a number: t 'Regard' when getting cksum for 'a text ...'. The error is caused by wrongly setting the self.next position due to accounting harlinks size.

According to man 5 tar, the size of the hard-links should be set to 0.

size Size of file, as octal number in ASCII. For regular files only,
this indicates the amount of data that follows the header. In
particular, this field was ignored by early tar implementations
when extracting hardlinks. Modern writers should always store a
zero length for hardlink entries.

But since the writer wasn't modern, the entry_size is 64700 which causes miscalculation of the self.next.

[tar-rs/src/archive.rs:372] &entry.header() = UstarHeader {
entry_size: 64700,
size: 64700,
path: "some/path",
link_name: Some(
"some/link",
),
mode: 0o640,
uid: 1058,
gid: 1061,
mtime: 1673424346,
username: Some(
"example",
),
groupname: Some(
"example",
),
device_major: Some(
9,
),
device_minor: Some(
2,
),
cksum: 24700,
cksum_valid: true,
}
[tar-rs/src/archive.rs:373] entry.header().entry_type() = Link

Closes: #313

Fix the error: "numeric field was not a number: t 'Regard' when getting cksum for
'a text ...'. The error is caused by wrongly setting the `self.next`
position due to accounting harlinks size.

According to man 5 tar, the size of the hard-links should be set to 0.

size    Size of file, as octal number in ASCII.  For regular files	only,
        this indicates the	amount of data that follows the	header.	 In
        particular, this field was	ignored	by early tar implementations
        when extracting hardlinks.	 Modern	writers	should always store a
        zero length for hardlink entries.

But since the writer wasn't *modern*, the entry_size is 64700 which
causes miscalculation of the `self.next`.

[tar-rs/src/archive.rs:372] &entry.header() = UstarHeader {
    entry_size: 64700,
    size: 64700,
    path: "some/path",
    link_name: Some(
        "some/link",
    ),
    mode: 0o640,
    uid: 1058,
    gid: 1061,
    mtime: 1673424346,
    username: Some(
        "example",
    ),
    groupname: Some(
        "example",
    ),
    device_major: Some(
        9,
    ),
    device_minor: Some(
        2,
    ),
    cksum: 24700,
    cksum_valid: true,
}
[tar-rs/src/archive.rs:373] entry.header().entry_type() = Link

Closes: alexcrichton#313

Signed-off-by: Nikola Pajkovsky <[email protected]>
@alexcrichton
Copy link
Owner

If the source of this is the man page then I would interpret that as the archive in question being an invalid archive rather than one that should be silently handled to work instead. Does the tar CLI, though, handle this case? Would you be able to find in its source code where it handles this case?

@npajkovsky
Copy link
Author

Both bsdtar and tar handles this tarball correctly.

@npajkovsky
Copy link
Author

libarchive does this

libarchive/libarchive/archive_read_support_format_tar.c

        /*
          * The following may seem odd, but: Technically, tar
          * does not store the file type for a "hard link"
          * entry, only the fact that it is a hard link.  So, I
          * leave the type zero normally.  But, pax interchange
          * format allows hard links to have data, which
          * implies that the underlying entry is a regular
          * file.
          */
         if (archive_entry_size(entry) > 0)
             archive_entry_set_filetype(entry, AE_IFREG);

         /*
          * A tricky point: Traditionally, tar readers have
          * ignored the size field when reading hardlink
          * entries, and some writers put non-zero sizes even
          * though the body is empty.  POSIX blessed this
          * convention in the 1988 standard, but broke with
          * this tradition in 2001 by permitting hardlink
          * entries to store valid bodies in pax interchange
          * format, but not in ustar format.  Since there is no
          * hard and fast way to distinguish pax interchange
          * from earlier archives (the 'x' and 'g' entries are
          * optional, after all), we need a heuristic.
          */
         if (archive_entry_size(entry) == 0) {
             /* If the size is already zero, we're done. */
         }  else if (a->archive.archive_format
             == ARCHIVE_FORMAT_TAR_PAX_INTERCHANGE) {
             /* Definitely pax extended; must obey hardlink size. */
         } else if (a->archive.archive_format == ARCHIVE_FORMAT_TAR
             || a->archive.archive_format == ARCHIVE_FORMAT_TAR_GNUTAR)
         {
             /* Old-style or GNU tar: we must ignore the size. */
             archive_entry_set_size(entry, 0);
             tar->entry_bytes_remaining = 0;
         } else if (archive_read_format_tar_bid(a, 50) > 50) {
             /*
              * We don't know if it's pax: If the bid
              * function sees a valid ustar header
              * immediately following, then let's ignore
              * the hardlink size.
              */
             archive_entry_set_size(entry, 0);
             tar->entry_bytes_remaining = 0;
         }
         /*
          * TODO: There are still two cases I'd like to handle:
          *   = a ustar non-pax archive with a hardlink entry at
          *     end-of-archive.  (Look for block of nulls following?)
          *   = a pax archive that has not seen any pax headers
          *     and has an entry which is a hardlink entry storing
          *     a body containing an uncompressed tar archive.
          * The first is worth addressing; I don't see any reliable
          * way to deal with the second possibility.
          */
         break;

@npajkovsky
Copy link
Author

@alexcrichton
Copy link
Owner

Thanks! I'm not sure if those line up though? Your second link unconditionally sets the size to zero, and the first code you copy/pasted only sets the size to zero for non-ustar archives. The PR, however, unconditionally sets it to zero. Is the copy/pasted code outdated or otherwise no longer in use?

@npajkovsky
Copy link
Author

The copy/pasted was copied directly from the libarchive git repo HEAD. I think the interesting part is

} else if (archive_read_format_tar_bid(a, 50) > 50) {
             /*
              * We don't know if it's pax: If the bid
              * function sees a valid ustar header
              * immediately following, then let's ignore
              * the hardlink size.
              */
             archive_entry_set_size(entry, 0);
             tar->entry_bytes_remaining = 0;
         }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

error: "numeric field was not a number: t Regard when getting cksum for ...."
2 participants