Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: additional tasks for artifact size metrics plugin #36

Merged
merged 17 commits into from
Apr 19, 2024

Conversation

0marperez
Copy link
Contributor

@0marperez 0marperez commented Apr 17, 2024

Description of changes:
Adds two new tasks to the plugin:

  • collectDelegatedArtifactSizeMetrics: Gets ASM from S3 after codebuild generates them
  • putArtifactSizeMetricsInCloudWatch: Puts ASM in cloudwatch

Some refactoring

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Comment on lines 39 to 44
publishing {
repositories {
mavenLocal()
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness: you shouldn't have to configure mavenLocal(), it's already done here I believe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I wasn't able to get local publishing working before. I must've misconfigured something but it works now

*/
internal abstract class CollectDelegatedArtifactSizeMetrics : DefaultTask() {
/**
* The file where the artifact size metrics will be stored, defaults to /build/reports/metrics/artifact-size-metrics.csv
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove leading slash in /build because it's actually a relative path. the way it's written makes me think the file will literally be written to /build/...

* This task should typically be run after codebuild gathers metrics and puts them in S3 during CI but can also be used to
* query the metrics bucket if you modify the file key filter.
*/
internal abstract class CollectDelegatedArtifactSizeMetrics : DefaultTask() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: what is delegated here? I would clarify in the KDocs or remove it from the name

Comment on lines 36 to 46
val pullRequestNumber = if (project.hasProperty("pullRequest")) {
project.property("pullRequest")
.toString()
.let { releaseProperty ->
releaseProperty.ifEmpty { // "-PpullRequest=" (no value set)
null
}
}
} else {
null
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplification: val pullRequestNumber = project.findProperty("pullRequest")?.toString()?.takeIf { it.isNotEmpty() }

same applies for releaseTag

val pluginConfig = this.project.rootProject.extensions.getByType(ArtifactSizeMetricsPluginConfig::class.java)

val relevantArtifactSizeMetricsFileKeys = artifactSizeMetricsFileKeys.filter {
it?.startsWith("[TEMP]${pluginConfig.projectRepositoryName}-$identifier-") == true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: isn't the [TEMP] prefix only used for temporary metrics for PRs? What if the users want to fetch metrics using a releaseTag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this task is mainly meant for CI. Querying the bucket would be something we might have to do from time to time and sort of a secondary use case I noticed. To do so just change the prefix filter manually and publish to maven local. The format for a release tag for example would be: "$repo-$releaseTag-release"

}
}

private fun getFiles(keys: List<String?>): List<String> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Why should a key be null at this point? We should filter out null keys earlier and have this be a known List<String>

},
) { file ->
files.add(
file.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $file is missing a body"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the exception message should probably use key instead of file, Metrics file $key ...

fun put() {
val currentTime = Instant.now()
val pluginConfig = this.project.rootProject.extensions.getByType(ArtifactSizeMetricsPluginConfig::class.java)
val release = project.property("release").toString().let {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming (consistency): releaseTag

value = artifactSize
dimensions = listOf(
Dimension {
name = "Release"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming suggestion: instead of "Release" I think "Version" is more clear


cloudWatch.putMetricData {
namespace = "Artifact Size Metrics"
metricData = listOf(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why does the same metric get included 3 times with different dimensions? Can it just be declared once with all the dimensions configured?

val currentTime = Instant.now()
val pluginConfig = this.project.rootProject.extensions.getByType(ArtifactSizeMetricsPluginConfig::class.java)
val release = project.property("release").toString().let {
check(it.isNotEmpty()) { "The release property is empty. Please specify a value." } // "-Prelease=" (no value set)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The hint about "-Prelease=" (no value set) would be even better in the error message itself.

@@ -46,6 +46,8 @@ internal abstract class AnalyzeArtifactSizeMetricsTask : DefaultTask() {
hasSignificantChangeFile.convention(project.layout.buildDirectory.file(OUTPUT_PATH + "has-significant-change.txt"))
}

private val pluginConfig = this.project.rootProject.extensions.getByType(ArtifactSizeMetricsPluginConfig::class.java)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Unnecessary this

Comment on lines 71 to 74
GetObjectRequest {
bucket = "artifact-size-metrics-aws-sdk-kotlin" // TODO: Point to artifact size metrics bucket
key = "artifact-size-metrics.csv" // TODO: Point to artifact size metrics for latest release
bucket = S3_ARTIFACT_SIZE_METRICS_BUCKET
key = "${pluginConfig.projectRepositoryName}-latest-release.csv"
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: What component will put this object? How will we track historical release metrics in addition to the latest release metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done in a GitHub workflow that I haven't PR'd.
After every release and after calculating metrics:

aws s3 cp artifact-size-metrics.csv s3://${{ secrets.ARTIFACT_METRICS_BUCKET }}/$REPOSITORY-${{ github.event.release.tag_name }}-release.csv
aws s3 cp artifact-size-metrics.csv s3://${{ secrets.ARTIFACT_METRICS_BUCKET }}/$REPOSITORY-latest-release.csv

This stores the historical metrics and creates/overrides the latest metrics file

Comment on lines 84 to 107
private fun getFiles(keys: List<String?>): List<String> {
val files = mutableListOf<String>()

runBlocking {
S3Client.fromEnvironment().use { s3 ->
keys.forEach { k ->
k?.let {
s3.getObject(
GetObjectRequest {
bucket = S3_ARTIFACT_SIZE_METRICS_BUCKET
key = k
},
) { file ->
files.add(
file.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $file is missing a body"),
)
}
}
}
}
}

return files
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: In practice, how many objects will we be fetching serially?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the SDK, which would have the most artifacts it's up to us. We can select how many artifacts to generate metrics for. If we decide we want all the services it would be 764+ (382 services * 2 because of the closures. Plus a few other non service artifacts).

The objects should be pretty light weight (a few kb from what I've seen in the test objects I've created).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if the answer is more than 3 or 4 we should be parallelizing these calls.

metrics.forEach { metric ->
val split = metric.split(",").map { it.trim() }
val artifactName = split[0]
val artifactSize = split[1].toDouble()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why toDouble()? Won't these sizes be in whole bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value is generated as a Long but the cloudwatch putMetric operation requires a metrics value to be a Double. Unless we have an absurdly large artifact I think it should be fine to convert to Double without losing precision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Maybe leave a comment on this line here about why we're going straight from String to double.

Comment on lines 42 to 53
val metrics = metricsFile
.get()
.asFile
.readLines()
.drop(1) // Ignoring header

runBlocking {
CloudWatchClient.fromEnvironment().use { cloudWatch ->
metrics.forEach { metric ->
val split = metric.split(",").map { it.trim() }
val artifactName = split[0]
val artifactSize = split[1].toDouble()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: In practice, how many metrics will we be sending to CloudWatch serially?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this comment, the maximum we will see (for now) is ~764. I think this shouldn't cause any performance issues. I should confirm this though and run some tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PutMetricData requests can contain up to 1MB of data and 1000 metrics. I think we should batch this by collecting metrics up to 1000 and then calling putMetricData. Maybe we'll always limit the number of services to a handful but we can still be more efficient in how we call CloudWatch.

Comment on lines 63 to 83
private fun getFiles(keys: List<String>): List<String> = runBlocking {
val files = mutableListOf<Deferred<String>>()

runBlocking {
S3Client.fromEnvironment().use { s3 ->
keys.forEach { k ->
k?.let {
S3Client.fromEnvironment().use { s3 ->
keys.forEach { k ->
files.add(
async {
s3.getObject(
GetObjectRequest {
bucket = S3_ARTIFACT_SIZE_METRICS_BUCKET
key = k
},
) { file ->
files.add(
file.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $file is missing a body"),
)
file.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $k is missing a body")
}
}
}
},
)
}
return@runBlocking files.awaitAll()
}

return files
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You don't need a mutable list reference to track the deferred results, you can just use map:

private fun getFiles(keys: List<String>) =
    S3Client.fromEnvironment().use { s3 ->
        keys
            .map { k ->
                async {
                    s3.getObject(
                        GetObjectRequest {
                            bucket = S3_ARTIFACT_SIZE_METRICS_BUCKET
                            key = k
                        },
                    ) { it.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $k is missing a body") }
                }
            }.awaitAll()
    }

For clarity you can even extract the getObject call to a helper function:

private suspend fun S3Client.getObjectAsText(key: String) =
    getObject(
        GetObjectRequest {
            bucket = S3_ARTIFACT_SIZE_METRICS_BUCKET
            key = k
        },
    ) { it.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $file is missing a body") }

private fun getFiles(keys: List<String>) =
    S3Client.fromEnvironment().use { s3 ->
        keys.map { k ->
            async { s3.getObjectAsText(k) }
        }.awaitAll()
    }

@@ -85,8 +85,44 @@ private fun Project.registerRootProjectArtifactSizeMetricsTask(
}

open class ArtifactSizeMetricsPluginConfig {
/**
* Changes the prefix used to get artifact size metrics in the
* "collectDelegatedArtifactSizeMetrics" task. For developer use only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd say this whole repository is "For developer use only", the comment is kind of unnecessary

var artifactPrefixes: Set<String> = emptySet()

/**
* Same as artifactPrefixes but considers the whole closure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you repeat the KDocs here instead of redirecting to artifactPrefixes

check(it.isNotEmpty()) { "The release property is empty. Please specify a value." } // "-Prelease=" (no value set)
it
val pluginConfig = project.rootProject.extensions.getByType(ArtifactSizeMetricsPluginConfig::class.java)
val releaseTag = project.property("release").toString().also {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness: .property(...) will throw an exception if the property is not set. I think you want to use findProperty and then handle the potential null / empty value.

@0marperez 0marperez merged commit fb38865 into main Apr 19, 2024
4 of 5 checks passed
@0marperez 0marperez deleted the artifact-size-metrics branch April 19, 2024 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants