Building a Website Screenshot API with Puppeteer and Google Cloud Functions

Here’s the source of a Google Cloud function that, using Puppeteer, takes a screenshot of a given website and store the resulting screenshot in a bucket on Google Cloud Storage:

const puppeteer = require('puppeteer');
const { Storage } = require('@google-cloud/storage');

const GOOGLE_CLOUD_PROJECT_ID = "screenshotapi";
const BUCKET_NAME = "screenshot-api-net";

exports.run = async (req, res) => {
  res.setHeader("content-type", "application/json");
  
  try {
    const buffer = await takeScreenshot(req.body);
    
    let screenshotUrl = await uploadToGoogleCloud(buffer, "screenshot.png");
    
    res.status(200).send(JSON.stringify({
      'screenshotUrl': screenshotUrl
    }));
    
  } catch(error) {
    res.status(422).send(JSON.stringify({
      error: error.message,
    }));
  }
};

async function uploadToGoogleCloud(buffer, filename) {
    const storage = new Storage({
        projectId: GOOGLE_CLOUD_PROJECT_ID,
    });

    const bucket = storage.bucket(BUCKET_NAME);

    const file = bucket.file(filename);
    await uploadBuffer(file, buffer, filename);
  
    await file.makePublic();

  	return `https://${BUCKET_NAME}.storage.googleapis.com/${filename}`;
}

async function takeScreenshot(params) {
	const browser = await puppeteer.launch({
		args: ['--no-sandbox']
	});
	const page = await browser.newPage();
	await page.goto(params.url, {waitUntil: 'networkidle2'});

	const buffer = await page.screenshot();

	await page.close();
	await browser.close();
  
  	return buffer;
}

async function uploadBuffer(file, buffer, filename) {
    return new Promise((resolve) => {
        file.save(buffer, { destination: filename }, () => {
            resolve();
        });
    })
}

Usage:

curl -X POST -d '{"url": "https://github.com"}' https://google-cloud-endpoint/my-function

Building a Website Screenshot API →

💡 If I were to run this in production I’d extend the code to first check the presence of an existing screenshot in the bucket or not, and – if the screenshot is not too old – redirect to it.

Today I Learned, the Google Cloud Platform / Terraform Edition

🌍 This week I’ve been goofing around with Google Cloud Platform and Terraform to manage it. I’ve learned quite a few things, that might be of help to others, so here goes …

There’s a difference between IAM policy for service account (google_service_account_iam) and IAM policy for projects (google_project_iam_member).

The former is used to allow other users to impersonate/control a certain Service Account, the second is used to define what a Service Account can access and do.

Should’ve RTFM‘d

~

To apply IAM Policies onto a Service Account the invoker needs the “Project IAM Admin” (roles/resourcemanager.projectIamAdmin) role. If not you’ll get back a 403:

returned error: Error applying IAM policy for project "my-project": Error setting IAM policy for project "my-project":
googleapi: Error 403: The caller does not have permission, forbidden

~

There’s a bug in Terraform which prevents terraform apply from setting IAM Roles when you have a Service Account or User with an uppercase character in their e-mail address (e.g. Firstname.Lastname@domain.tld).

Terraform will automatically convert all e-mail address to all-lowercase variants (e.g. firstname.lastname@domain.tld). Therefore GCP’s Resource Manager won’t be able to apply the policies as the Rresource does not exist. Accounts that inherit access to the project are not affected.

The error you get back is quite obscure:

returned error: Error applying IAM policy for project "my-project": Error setting IAM policy for project "my-project":
googleapi: Error 400: Request contains an invalid argument., badRequest

To see whether you have such a conflicting user, check the output of gcloud projects get-iam-policy PROJECT-ID

~

You can partially apply a Terraform State by using Resource Targets. You need this, for example, when working with secrets which you can store onto Google Cloud Platform. To work with secrets you rely on a keyring+keycode, but you need that keyring/keycode to be available before you can generate encrypted secrets from plaintext strings.

Here’s how I did it, using Resource Targets:

  1. Run Terraform to only target the google_kms_key_ring and google_kms_crypto_key resources:

    terraform apply -target=google_kms_key_ring.my_key_ring -target=google_kms_crypto_key.my_crypto_key
  2. Generate your secrets and store them in the keyring using helper scripts such as gcloud-kms-scripts which I created just for that.
  3. Use the encrypted secrets in your .tf files
  4. Run terraform apply as you’d normally do

💡 If you want to target a module, add a module. to its name, e.g. terraform apply -target=module.my_module

~

Applying IAM Policies to a newly created Service Account won’t always work, as it takes some time before the SA is available for use. There’s a workaround in which you trigger a sleep command using local-exec, yet I’m hoping that this will be solved in a future release of Terraform (perhaps with a delay Terraform Resource?).

resource "null_resource" "before" {
}

resource "null_resource" "delay" {
  provisioner "local-exec" {
    command = "sleep 10"
  }
  triggers = {
    "before" = "${null_resource.before.id}"
  }
}

resource "null_resource" "after" {
  depends_on = ["null_resource.delay"]
}

~

It’s possible to impersonate a Service Account from within your Terraform code. First you connect using your main account, and then generate a short lived token for the SA. From the Terraform docs I got this snippet:

provider "google" {
    scopes = [
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/userinfo.email",
  ]
}

data "google_service_account_access_token" "default" {
 provider = "google"
 target_service_account = "impersonated-account@projectB.iam.gserviceaccount.com"
 scopes = ["userinfo-email", "cloud-platform"]
 lifetime = "300s"
}

data "google_client_openid_userinfo" "me" { }

output "source-email" {
  value = "${data.google_client_openid_userinfo.me.email}"
}

provider "google" {
   alias  = "impersonated"
   access_token = "${data.google_service_account_access_token.default.access_token}"
}

data "google_project" "project" {
  provider = "google.impersonated"
  project_id = "target-project"
}

Due to the provider = "google.impersonated" part, the google_project will run as impersonated-account@projectB.iam.gserviceaccount.com

~

That’s it! I hope these might have been of help to you, saving you some lookup work …

Did this help you out? Like what you see?
Thank me with a coffee.

I don't do this for profit but a small one-time donation would surely put a smile on my face. Thanks!

☕️ Buy me a Coffee (€3)

To stay in the loop you can follow @bramus or follow @bramusblog on Twitter.

Secrets in Serverless

Good post on how and where to store your secrets when working in a Serverless / Cloud Environment — something I was wondering about myself a little while ago

Serverless applications and cloud functions often need to communicate with an upstream API or service. Perhaps they require a username and password to connect to a database, an API key to talk to an upstream service, or a certificate to authenticate to an API. This raises questions like: How do I manage secrets in serverless environments? How do I get credentials into my serverless lambda or cloud function? How can I use secrets AWS Lambda or Google Cloud Functions?

This post describes common patterns and approaches for managing secrets in serverless, including the benefits and drawbacks of each approach.

Secrets in Serverless →

🌍 If you’re using Terraform then the google_kms_secret datasource will come in handy.

Google Cloud Platform Terraform Recipes

At work we’ve recently started to use Terraform to manage the cloud infrastructure of our clients. During my initial research on how to set up a Cloud SQL instance using Terraform, I stumbled upon the “Terraform Google Modules” organization (unaffiliated to Google). They provide several Terraform modules in order to easily create:

  • A Kubernetes Engine Cluster
  • A CloudSQL Instance
  • A CloudSQL HA Cluster (MySQL RR)
  • Cloud Functions
  • Repository Triggers

This will surely save me and my colleagues some work 🙂

Terraform Google Modules (on Terraform Registry) →
Terraform Google Modules (on GitHub) →