
Add SimpleS3Client::copy #1592

Open. Wants to merge 17 commits into master.

Conversation

nmuntyanov (Contributor):

SimpleS3Client::copy method added

The method resolves by itself which API it should use (atomic copyObject, or a copy by parts).

@nmuntyanov marked this pull request as ready for review on October 25, 2023 07:19
$error = null;
foreach ($responses as $idx => $response) {
    try {
        /** @var CopyPartResult $copyPartResult */
Member: This is totally useless; the getter already defines its return type properly.

Member: If the static analysis or the IDE does not properly detect the type of $response based on the assignments in the array, the acceptable fix would be to add @var array<int, UploadPartCopyOutput> $responses on the initialization of the empty array.
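A minimal sketch of that annotation, with the surrounding loop shape assumed from the excerpts in this thread ($byteRanges and the request details are illustrative only):

```php
/** @var array<int, UploadPartCopyOutput> $responses */
$responses = [];

foreach ($byteRanges as $partNumber => $range) {
    // Every entry keeps its proper element type, so no per-item @var is needed later.
    $responses[$partNumber] = $this->uploadPartCopy(/* ... */);
}
```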

Contributor Author: Here it is the same situation; the exception is handled in the same place.

src/Integration/Aws/SimpleS3/src/SimpleS3Client.php (outdated, resolved)
return;
}

/** @var string $uploadId */
Member: This is useless; getUploadId already defines its proper return type.

Contributor Author: Hello. How can I handle these errors
https://github.com/async-aws/aws/actions/runs/6642786472/job/18048337300?pr=1592
without doc blocks?

Member: Well, if the AWS API can indeed return string|null for the upload id, you must actually handle the null case properly instead of pretending that the type inference is wrong and forcing the type checkers to consider the variable as having the type string.
If the AWS API actually returns null, the code would still be broken at runtime, and the type override added there only prevents the type checker from detecting the broken code. It does not make it any less broken.
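A minimal sketch of handling the null case explicitly instead of overriding the type (the exception and its message are assumptions for illustration, not the PR's code):

```php
$uploadId = $response->getUploadId();

if (null === $uploadId) {
    // Fail loudly here instead of letting a null flow into the later multipart calls.
    throw new \RuntimeException('CreateMultipartUpload did not return an upload id.');
}
```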

Contributor Author: As I understand from the AWS documentation (https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMultipartUpload.html), if the action is successful, the service sends back an HTTP 200 response. The upload id will always be a string; otherwise an exception would be thrown.

The underlying client handles the case of uploadId == null and throws an exception. I don't see why this code should be broken; an exception will be thrown either way.

The solution was taken from here:
https://github.com/async-aws/aws/blob/master/src/Integration/Aws/SimpleS3/src/SimpleS3Client.php#L104

Contributor Author: @stof any updates on this?

@stof changed the title from "Issue 1589/s3 simple client" to "Add SimpleS3Client::copy" on Oct 25, 2023
@stof mentioned this pull request on Oct 25, 2023
foreach ($responses as $idx => $response) {
    try {
        $copyPartResult = $response->getCopyPartResult();
        $parts[] = new CompletedPart(['ETag' => $copyPartResult->getEtag(), 'PartNumber' => $idx]);
Member: The weird thing about the PossiblyNullReference error reported by Psalm here is that https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPartCopy.html#API_UploadPartCopy_ResponseElements documents CopyPartResult as required in the output. So it is weird that the operation was generated with a nullable type. Maybe the AWS SDK has metadata that does not match the documentation there.
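Until the upstream metadata is fixed, a runtime guard would satisfy Psalm without a @var override. A sketch only (the exception is an assumption, not the PR's code):

```php
$copyPartResult = $response->getCopyPartResult();

if (null === $copyPartResult) {
    // Should never happen per the AWS docs, but the generated type is nullable.
    throw new \RuntimeException(sprintf('UploadPartCopy for part %d returned no CopyPartResult.', $idx));
}

$parts[] = new CompletedPart(['ETag' => $copyPartResult->getEtag(), 'PartNumber' => $idx]);
```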

Member: I suggest reporting this to AWS, as it looks like a mismatch between the human-readable documentation and the machine-readable data used to generate the SDKs.

Contributor Author (Oct 26, 2023): What should I do? Add an additional check, or wait for the AWS fix? We are waiting for this functionality.

Contributor Author: @stof any updates on this?

Contributor Author:

Hi @stof,
I wanted to bring to your attention an ongoing issue regarding AWS that was posted approximately 4 months ago. It seems that progress on resolving this matter has been slower than anticipated.

Considering the delay in fixing the manifests, I'd like to propose a potential solution. Would it be feasible to remove the annotations related to this issue from the code and handle the error suppression in the Psalm baseline instead? I've come across a few similar issues in the baseline that could potentially benefit from this approach.

I believe this adjustment could provide a temporary workaround while we await a resolution from the AWS side. However, I'm open to discussing alternative solutions or any concerns you may have regarding this proposal.

@nmuntyanov nmuntyanov requested a review from stof January 19, 2024 07:50
melya (Contributor) commented on Feb 14, 2024:

Nice feature to have 👍

nmuntyanov (Contributor Author):

ping

* CacheControl?: string,
* Metadata?: array<string, string>,
* PartSize?: positive-int,
* Concurrency?: positive-int,

Member: Please use int; this syntax is not standard and therefore is not understood by all IDEs.

$sourceHead = $this->headObject(['Bucket' => $srcBucket, 'Key' => $srcKey]);
$contentLength = (int) $sourceHead->getContentLength();
$options['ContentType'] = $sourceHead->getContentType();
$concurrency = (int) ($options['Concurrency'] ?? 10);

Member: No need to cast, as it's an int by contract. Let people deal with PHP runtime errors if they misuse the method.

@@ -32,6 +32,9 @@ $resource = \fopen('/path/to/cat/image.jpg', 'r');
$s3->upload('my-image-bucket', 'photos/cat_2.jpg', $resource);
$s3->upload('my-image-bucket', 'photos/cat_2.txt', 'I like this cat');

// Copy objects between buckets
$s3->copy('source-bucket', 'source-key', 'destination-bucket', 'destination-key');

Member: The options parameters deserve some documentation, IMHO.
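For illustration, a documented call could look like this. The option names come from this PR's docblock and code excerpts; the example values and comments are assumptions:

```php
// Copy between buckets; all options are optional.
$s3->copy('source-bucket', 'source-key', 'destination-bucket', 'destination-key', [
    'ContentType' => 'image/jpeg', // override the content type detected via HeadObject
    'PartSize' => 128,             // size of each copied part, in MB
    'Concurrency' => 10,           // number of parallel UploadPartCopy requests
]);
```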

* Metadata?: array<string, string>,
* PartSize?: positive-int,
* Concurrency?: positive-int,
* mupThreshold?: positive-int,

Member: For consistency.

Suggested change:
- * mupThreshold?: positive-int,
+ * MupThreshold?: positive-int,

unset($options['PartSize']);

// If file is less than multipart upload threshold, use normal atomic copy
if ($contentLength < $mupThreshold) {

Member: Do we really need a parameter for this? Is it an issue to split the file even when it is smaller than the default 2GB?

* files smaller than 64 * 10 000 = 640GB. If you are copying larger files,
* please set PartSize to a higher number, like 128, 256 or 512. (Max 4096).
*/
$partSize = ($options['PartSize'] ?? 64) * $megabyte;

Member: If PartSize is not defined, I'd rather default to max(64 * $megabyte, 2 ** ceil(log($contentLength / 10000, 2))).
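A sketch of that suggested default, assuming $contentLength is the source size in bytes and that the PartSize option stays expressed in megabytes as in the current code:

```php
$megabyte = 1024 * 1024;

if (isset($options['PartSize'])) {
    // An explicit option wins; it is expressed in MB, as in the current code.
    $partSize = $options['PartSize'] * $megabyte;
} else {
    // Grow the default with the object so the 10,000-part S3 limit always
    // holds, while keeping the existing 64 MB floor.
    $partSize = max(64 * $megabyte, 2 ** (int) ceil(log($contentLength / 10000, 2)));
}
```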

Comment on lines +136 to +140
} catch (\Throwable $e) {
    $error = $e;

    break;
}
Member: You could move the try/catch outside the loop for more readability, removing the error variable:

try {
    foreach ($responses as $idx => $response) {
        $parts[] = new CompletedPart(['ETag' => $response->getCopyPartResult()->getEtag(), 'PartNumber' => $idx]);
    }
} catch (\Throwable $e) {
    foreach ($responses as $response) {
        try {
            $response->cancel();
        } catch (\Throwable $e) {
            continue;
        }
    }

    $this->abortMultipartUpload(AbortMultipartUploadRequest::create(['Bucket' => $destBucket, 'Key' => $destKey, 'UploadId' => $uploadId]));
}

--$parallelChunks;
}
$error = null;
foreach ($responses as $idx => $response) {

Member: This is not efficient. We configure 10 concurrent uploads, but in fact we upload 10, wait for all 10 to finish, then upload 10 more, and so on. That means:

  • we wait for the 10th upload to finish before starting the next batch;
  • if 1 upload takes a long time to finish, we don't leverage the other 9.

A better implementation would be to use a pool (or buffer): fill the pool with 10 responses, and once 1 response is over, add a new one before checking the other responses.

Better still: leverage async processing by not waiting for a response to end, but checking whether the response is over; if not, check the next one.

while not all parts uploaded
  while pool not full
    => start new upload
  while pool full
    foreach response in pool
      if response not finished
        continue
      else
        => process response
        => remove response from pool

foreach response in pool
  => wait for response to finish
  => process response
  => remove response from pool
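For illustration, a minimal PHP sketch of that pool idea. The startPartCopy() helper and the isReady() non-blocking check are hypothetical, not part of async-aws or this PR; the point is only the scheduling shape described above:

```php
$pool = [];
$parts = [];
$nextPart = 1;

while ($nextPart <= $totalParts || [] !== $pool) {
    // Fill the pool up to the concurrency limit.
    while (\count($pool) < $concurrency && $nextPart <= $totalParts) {
        $pool[$nextPart] = $this->startPartCopy($nextPart); // hypothetical helper
        ++$nextPart;
    }

    // Harvest whichever responses have finished, without blocking on a specific one.
    foreach ($pool as $idx => $response) {
        if (!$response->isReady()) { // hypothetical non-blocking check
            continue;
        }

        $parts[] = new CompletedPart(['ETag' => $response->getCopyPartResult()->getEtag(), 'PartNumber' => $idx]);
        unset($pool[$idx]);
    }

    // Avoid a hot loop while everything is still in flight.
    usleep(10000);
}
```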
