-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: additional tasks for artifact size metrics plugin #36
Conversation
publishing { | ||
repositories { | ||
mavenLocal() | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correctness: you shouldn't have to configure mavenLocal()
, it's already done here I believe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I wasn't able to get local publishing working before. I must've misconfigured something but it works now
*/ | ||
internal abstract class CollectDelegatedArtifactSizeMetrics : DefaultTask() { | ||
/** | ||
* The file where the artifact size metrics will be stored, defaults to /build/reports/metrics/artifact-size-metrics.csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove leading slash in /build
because it's actually a relative path. the way it's written makes me think the file will literally be written to /build/...
* This task should typically be run after codebuild gathers metrics and puts them in S3 during CI but can also be used to | ||
* query the metrics bucket if you modify the file key filter. | ||
*/ | ||
internal abstract class CollectDelegatedArtifactSizeMetrics : DefaultTask() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: what is delegated here? I would clarify in the KDocs or remove it from the name
val pullRequestNumber = if (project.hasProperty("pullRequest")) { | ||
project.property("pullRequest") | ||
.toString() | ||
.let { releaseProperty -> | ||
releaseProperty.ifEmpty { // "-PpullRequest=" (no value set) | ||
null | ||
} | ||
} | ||
} else { | ||
null | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simplification: val pullRequestNumber = project.findProperty("pullRequest")?.toString()?.takeIf { it.isNotEmpty() }
same applies for releaseTag
val pluginConfig = this.project.rootProject.extensions.getByType(ArtifactSizeMetricsPluginConfig::class.java) | ||
|
||
val relevantArtifactSizeMetricsFileKeys = artifactSizeMetricsFileKeys.filter { | ||
it?.startsWith("[TEMP]${pluginConfig.projectRepositoryName}-$identifier-") == true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: isn't the [TEMP]
prefix only used for temporary metrics for PRs? What if the users want to fetch metrics using a releaseTag
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this task is mainly meant for CI. Querying the bucket would be something we might have to do from time to time and sort of a secondary use case I noticed. To do so just change the prefix filter manually and publish to maven local. The format for a release tag for example would be: "$repo-$releaseTag-release"
} | ||
} | ||
|
||
private fun getFiles(keys: List<String?>): List<String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Why should a key be null at this point? We should filter out null keys earlier and have this be a known List<String>
}, | ||
) { file -> | ||
files.add( | ||
file.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $file is missing a body"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the exception message should probably use key
instead of file
, Metrics file $key ...
fun put() { | ||
val currentTime = Instant.now() | ||
val pluginConfig = this.project.rootProject.extensions.getByType(ArtifactSizeMetricsPluginConfig::class.java) | ||
val release = project.property("release").toString().let { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming (consistency): releaseTag
value = artifactSize | ||
dimensions = listOf( | ||
Dimension { | ||
name = "Release" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming suggestion: instead of "Release" I think "Version" is more clear
|
||
cloudWatch.putMetricData { | ||
namespace = "Artifact Size Metrics" | ||
metricData = listOf( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why does the same metric get included 3 times with different dimensions? Can it just be declared once with all the dimensions configured?
val currentTime = Instant.now() | ||
val pluginConfig = this.project.rootProject.extensions.getByType(ArtifactSizeMetricsPluginConfig::class.java) | ||
val release = project.property("release").toString().let { | ||
check(it.isNotEmpty()) { "The release property is empty. Please specify a value." } // "-Prelease=" (no value set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: The hint about "-Prelease=" (no value set)
would be even better in the error message itself.
@@ -46,6 +46,8 @@ internal abstract class AnalyzeArtifactSizeMetricsTask : DefaultTask() { | |||
hasSignificantChangeFile.convention(project.layout.buildDirectory.file(OUTPUT_PATH + "has-significant-change.txt")) | |||
} | |||
|
|||
private val pluginConfig = this.project.rootProject.extensions.getByType(ArtifactSizeMetricsPluginConfig::class.java) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Unnecessary this
GetObjectRequest { | ||
bucket = "artifact-size-metrics-aws-sdk-kotlin" // TODO: Point to artifact size metrics bucket | ||
key = "artifact-size-metrics.csv" // TODO: Point to artifact size metrics for latest release | ||
bucket = S3_ARTIFACT_SIZE_METRICS_BUCKET | ||
key = "${pluginConfig.projectRepositoryName}-latest-release.csv" | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: What component will put this object? How will we track historical release metrics in addition to the latest release metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done in a GitHub workflow that I haven't PR'd.
After every release and after calculating metrics:
aws s3 cp artifact-size-metrics.csv s3://${{ secrets.ARTIFACT_METRICS_BUCKET }}/$REPOSITORY-${{ github.event.release.tag_name }}-release.csv
aws s3 cp artifact-size-metrics.csv s3://${{ secrets.ARTIFACT_METRICS_BUCKET }}/$REPOSITORY-latest-release.csv
This stores the historical metrics and creates/overrides the latest metrics file
private fun getFiles(keys: List<String?>): List<String> { | ||
val files = mutableListOf<String>() | ||
|
||
runBlocking { | ||
S3Client.fromEnvironment().use { s3 -> | ||
keys.forEach { k -> | ||
k?.let { | ||
s3.getObject( | ||
GetObjectRequest { | ||
bucket = S3_ARTIFACT_SIZE_METRICS_BUCKET | ||
key = k | ||
}, | ||
) { file -> | ||
files.add( | ||
file.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $file is missing a body"), | ||
) | ||
} | ||
} | ||
} | ||
} | ||
} | ||
|
||
return files | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: In practice, how many objects will we be fetching serially?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the SDK, which would have the most artifacts it's up to us. We can select how many artifacts to generate metrics for. If we decide we want all the services it would be 764+ (382 services * 2 because of the closures. Plus a few other non service artifacts).
The objects should be pretty light weight (a few kb from what I've seen in the test objects I've created).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if the answer is more than 3 or 4 we should be parallelizing these calls.
metrics.forEach { metric -> | ||
val split = metric.split(",").map { it.trim() } | ||
val artifactName = split[0] | ||
val artifactSize = split[1].toDouble() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why toDouble()
? Won't these sizes be in whole bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value is generated as a Long
but the cloudwatch putMetric
operation requires a metrics value to be a Double
. Unless we have an absurdly large artifact I think it should be fine to convert to Double
without losing precision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. Maybe leave a comment on this line here about why we're going straight from String
to double
.
val metrics = metricsFile | ||
.get() | ||
.asFile | ||
.readLines() | ||
.drop(1) // Ignoring header | ||
|
||
runBlocking { | ||
CloudWatchClient.fromEnvironment().use { cloudWatch -> | ||
metrics.forEach { metric -> | ||
val split = metric.split(",").map { it.trim() } | ||
val artifactName = split[0] | ||
val artifactSize = split[1].toDouble() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: In practice, how many metrics will we be sending to CloudWatch serially?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From this comment, the maximum we will see (for now) is ~764. I think this shouldn't cause any performance issues. I should confirm this though and run some tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PutMetricData
requests can contain up to 1MB of data and 1000 metrics. I think we should batch this by collecting metrics up to 1000 and then calling putMetricData
. Maybe we'll always limit the number of services to a handful but we can still be more efficient in how we call CloudWatch.
…of AWS operations
private fun getFiles(keys: List<String>): List<String> = runBlocking { | ||
val files = mutableListOf<Deferred<String>>() | ||
|
||
runBlocking { | ||
S3Client.fromEnvironment().use { s3 -> | ||
keys.forEach { k -> | ||
k?.let { | ||
S3Client.fromEnvironment().use { s3 -> | ||
keys.forEach { k -> | ||
files.add( | ||
async { | ||
s3.getObject( | ||
GetObjectRequest { | ||
bucket = S3_ARTIFACT_SIZE_METRICS_BUCKET | ||
key = k | ||
}, | ||
) { file -> | ||
files.add( | ||
file.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $file is missing a body"), | ||
) | ||
file.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $k is missing a body") | ||
} | ||
} | ||
} | ||
}, | ||
) | ||
} | ||
return@runBlocking files.awaitAll() | ||
} | ||
|
||
return files | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: You don't need a mutable list reference to track the deferred results, you can just use map
:
private fun getFiles(keys: List<String>) =
S3Client.fromEnvironment().use { s3 ->
keys
.map { k ->
async {
s3.getObject(
GetObjectRequest {
bucket = S3_ARTIFACT_SIZE_METRICS_BUCKET
key = k
},
) { it.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $k is missing a body") }
}
}.awaitAll()
}
For clarity you can even extract the getObject
call to a helper function:
private suspend fun S3Client.getObjectAsText(key: String) =
getObject(
GetObjectRequest {
bucket = S3_ARTIFACT_SIZE_METRICS_BUCKET
key = k
},
) { it.body?.decodeToString() ?: throw AwsSdkGradleException("Metrics file $file is missing a body") }
private fun getFiles(keys: List<String>) =
S3Client.fromEnvironment().use { s3 ->
keys.map { k ->
async { s3.getObjectAsText(k) }
}.awaitAll()
}
@@ -85,8 +85,44 @@ private fun Project.registerRootProjectArtifactSizeMetricsTask( | |||
} | |||
|
|||
open class ArtifactSizeMetricsPluginConfig { | |||
/** | |||
* Changes the prefix used to get artifact size metrics in the | |||
* "collectDelegatedArtifactSizeMetrics" task. For developer use only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd say this whole repository is "For developer use only", the comment is kind of unnecessary
var artifactPrefixes: Set<String> = emptySet() | ||
|
||
/** | ||
* Same as artifactPrefixes but considers the whole closure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can you repeat the KDocs here instead of redirecting to artifactPrefixes
check(it.isNotEmpty()) { "The release property is empty. Please specify a value." } // "-Prelease=" (no value set) | ||
it | ||
val pluginConfig = project.rootProject.extensions.getByType(ArtifactSizeMetricsPluginConfig::class.java) | ||
val releaseTag = project.property("release").toString().also { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correctness: .property(...)
will throw an exception if the property is not set. I think you want to use findProperty
and then handle the potential null / empty value.
Description of changes:
Adds two new tasks to the plugin:
collectDelegatedArtifactSizeMetrics
: Gets ASM from S3 after codebuild generates themputArtifactSizeMetricsInCloudWatch
: Puts ASM in cloudwatchSome refactoring
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.