Google Cloud Function for Machine Learning

In this post I’ll show you how to use a Google Cloud Function to access the machine learning API for natural language processing. Cloud Functions are one of the serverless features of the Google Cloud Platform (GCP). Keep in mind that serverless does not mean that your code runs without a machine: it still runs on some virtual machine, you just don’t have to provision and maintain it.

Google Cloud Function Basics

A Cloud Function is implemented in JavaScript and executed within a Node.js runtime environment. Cloud Functions are triggered by certain events:

  • Cloud Pub/Sub messaging
  • Cloud Storage operations
  • HTTP requests

Depending on the event source, your program has to export a JavaScript function with a given signature. For Pub/Sub and Storage events your function may look like this:

exports.entry_point = function(event, callback) {...}

The event parameter carries the data relevant to the given event. The callback parameter is a function reference that you have to call at the end of your Cloud Function to signal that it has finished.
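As a minimal sketch of this signature, the hypothetical handler below reads the bucket and object name from event.data and signals completion through callback. The name logUploadedObject and the validation logic are assumptions for illustration, not part of the use case that follows.

```javascript
// Hypothetical background function for a Storage trigger.
// For Storage events, event.data describes the affected object.
function logUploadedObject(event, callback) {
  const data = event.data || {};

  if (!data.bucket || !data.name) {
    // Passing an Error to the callback marks the execution as failed.
    callback(new Error('Event does not describe a Storage object'));
    return;
  }

  console.log(`Object changed: gs://${data.bucket}/${data.name}`);
  // Calling the callback without arguments signals success.
  callback();
}

exports.logUploadedObject = logUploadedObject;

// Local smoke test with a hand-made event object.
logUploadedObject(
  {data: {bucket: 'nlp-test-in', name: 'louvre.txt'}},
  err => console.log(err ? 'failed' : 'ok')
);
```

Because the handler is a plain function, you can call it locally with a hand-made event object, as the last statement does, before deploying anything.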

When implementing a Cloud Function for handling HTTP requests, you will use a function like this:

exports.entry_point = function(request, response) {...}

where request and response represent the HTTP request and response objects.
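A small sketch of such an HTTP function might look like the following. The handler name helloHttp, the name query parameter and the fakeResponse helper are all assumptions made up for this example. The sketch only relies on the response object offering Express-style status() and send() methods.

```javascript
// Hypothetical HTTP function; request and response follow the Express API.
function helloHttp(request, response) {
  const query = request.query || {};
  if (!query.name) {
    response.status(400).send('Missing query parameter: name');
    return;
  }
  response.status(200).send(`Hello, ${query.name}!`);
}

exports.helloHttp = helloHttp;

// Minimal stand-in for the response object, for local experiments only.
function fakeResponse() {
  return {
    statusCode: null,
    body: null,
    status(code) { this.statusCode = code; return this; },
    send(body) { this.body = body; return this; }
  };
}

const res = fakeResponse();
helloHttp({query: {name: 'GCP'}}, res);
console.log(res.statusCode, res.body);
```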

Please note that your Cloud Function should be stateless. Depending on the load, the cloud environment will spawn multiple instances of the function, so you cannot rely on global state in your JavaScript code being shared between invocations.
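To illustrate the point, here is a hedged sketch that contrasts a global counter, which is unreliable across instances, with a global cache that is rebuilt on demand and therefore harmless. All names (invocationCount, loadSettings, getSettings, handler) are hypothetical.

```javascript
// Anti-pattern: this counter only reflects the calls that happened to reach
// one particular instance and is lost whenever that instance goes away.
let invocationCount = 0;

// Acceptable: a per-instance cache that is rebuilt whenever it is empty.
let cachedSettings = null;

// Hypothetical loader; a real function might read environment variables here.
function loadSettings() {
  return {outputBucket: 'nlp-test-out'};
}

function getSettings() {
  if (cachedSettings === null) {
    cachedSettings = loadSettings();
  }
  return cachedSettings;
}

function handler(event, callback) {
  invocationCount += 1; // do not base any logic on this value
  const settings = getSettings(); // works with a warm or a cold cache
  console.log(`Would write to bucket ${settings.outputBucket}`);
  callback();
}

exports.handler = handler;
```

The function behaves the same whether an instance handles its first or its hundredth event, which is what stateless means in practice.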

A minimal Node.js project consists of a single JavaScript file, typically named index.js, and a JSON file package.json that defines metadata such as dependencies. You can edit and manage these resources with the Google Cloud dashboard or deploy them from your local file system with the gcloud CLI from the Cloud SDK. We’ll use the second option.

Machine Learning Use Case

The machine learning use case we are going to implement looks like this:

[Figure: Cloud Function ML use case]

The upload to a Storage bucket nlp-test-in will trigger our Cloud Function that calls the Natural Language API to perform a text analysis of the content of the uploaded file. The results will be written to a JSON file in the Storage bucket nlp-test-out.

First, we add the required dependencies for the Natural Language and Storage APIs to the package.json file:

{
  "name": "gcf-ml-aes",
  "version": "1.0.0",
  "dependencies": {
    "@google-cloud/language": "^1.2.0",
    "@google-cloud/storage": "^1.6.0"
  }
}

Our Cloud Function implementation in index.js may look like this:

const Storage = require('@google-cloud/storage');
const languageApi = require('@google-cloud/language');
 
const OUT_BUCKET_NAME = "nlp-test-out";
 
// Storage API
const storage = new Storage();
const outputBucket = storage.bucket(OUT_BUCKET_NAME);
 
// Language API
const client = new languageApi.LanguageServiceClient();
 
function gcsUri(bucket, file) {
  return `gs://${bucket}/${file}`;
}
 
function outputFilename(inputFilename) {
  // Only replace a trailing ".txt" extension, not the first ".txt" anywhere in the name
  return inputFilename.replace(/\.txt$/, "-results.json");
}
 
/**
 * Triggered from a message on a Cloud Storage bucket.
 *
 * @param {!Object} event The Cloud Functions event.
 * @param {!Function} callback The callback function.
 */
exports.analyse_entity_sentiment = function(event, callback) {
  const data = event.data;
  const inputFileUri = gcsUri(data.bucket, data.name);
  const outFilename = outputFilename(data.name);
 
  console.log('Processing text from: ' + inputFileUri);
  const aesRequest = {
    gcsContentUri: inputFileUri,
    type: 'PLAIN_TEXT'
  };
 
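  // Call to Language API
  client
    .analyzeEntitySentiment({document: aesRequest})
    .then(results => {
      // Wait until the results are written before signalling completion
      const outputFile = outputBucket.file(outFilename);
      return outputFile.save(JSON.stringify(results));
    })
    .then(() => {
      console.info('Text analysis results writtten to: ' + gcsUri(OUT_BUCKET_NAME,outFilename));
      callback();
    })
    .catch(err => {
      // Report API or Storage failures instead of leaving the function hanging
      console.error(err);
      callback(err);
    });
}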

There are two interesting things to observe:

  1. The client to the Natural Language API can read its input directly from a Storage bucket by setting the gcsContentUri field in the request document.
  2. The text analysis results are returned as a JSON object that we save to a file in a separate output bucket.
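The first observation can be sketched with two small helpers that build the document part of the request. The helper names documentFromGcs and documentFromText are assumptions for this example. The content field is the Natural Language API's alternative to gcsContentUri for passing the text inline. The sketch only builds plain objects and does not call the API.

```javascript
// Document that lets the Natural Language API read its text from a bucket.
function documentFromGcs(bucket, file) {
  return {
    gcsContentUri: `gs://${bucket}/${file}`,
    type: 'PLAIN_TEXT'
  };
}

// Document that carries the text inline instead of pointing to a bucket.
function documentFromText(text) {
  return {
    content: text,
    type: 'PLAIN_TEXT'
  };
}

// Either object can be used as the "document" field of the request.
const request = {document: documentFromGcs('nlp-test-in', 'louvre.txt')};
console.log(JSON.stringify(request));
```

Reading from the bucket avoids downloading the file inside the function just to send its content on to the API.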

Deployment

Let’s assume our project looks like this:

$ ls -la
-rw-r--r--    1 tobias  staff     541  6 Apr 11:27 .gcloudignore
-rw-r--r--@   1 tobias  staff      41  6 Apr 11:12 .gitignore
drwxr-xr-x    4 tobias  staff     128  6 Apr 11:14 data
-rwxr-xr-x@   1 tobias  staff     170  6 Apr 10:23 deploy.sh
-rw-r--r--@   1 tobias  staff    1414  6 Apr 11:24 index.js
drwxr-xr-x  277 tobias  staff    8864  6 Apr 11:05 node_modules
-rw-r--r--    1 tobias  staff  118819  6 Apr 11:05 package-lock.json
-rw-r--r--    1 tobias  staff     152  6 Apr 11:05 package.json

The .gcloudignore file lists the files and folders that should not be uploaded to the GCP, including the node_modules and data folders. With that in place, we can deploy our Cloud Function with

gcloud functions deploy aes-1 \
  --entry-point=analyse_entity_sentiment \
  --trigger-resource=nlp-test-in \
  --trigger-event=google.storage.object.finalize

The name of the Cloud Function will be aes-1. The exported JavaScript function analyse_entity_sentiment is called whenever an event of type google.storage.object.finalize is triggered on the bucket nlp-test-in. There are several other options, see gcloud functions deploy --help.

After successful deployment, the new Cloud Function aes-1 shows up in the GCP dashboard:

[Figure: Cloud Function dashboard]

Test

To test our Cloud Function, we upload a test file to the nlp-test-in bucket:

$ gsutil cp data/louvre.txt gs://nlp-test-in

Looking at the log shows a successful execution:

$ gcloud functions logs read --limit 4
LEVEL  NAME   EXECUTION_ID    TIME_UTC                 LOG
D      aes-1  73379230141287  2018-04-13 13:02:04.535  Function execution started
I      aes-1  73379230141287  2018-04-13 13:02:04.753  Processing text from: gs://nlp-test-in/louvre.txt
I      aes-1  73379230141287  2018-04-13 13:02:06.255  Text analysis results writtten to: gs://nlp-test-out/louvre-results.json
D      aes-1  73379230141287  2018-04-13 13:02:06.354  Function execution took 1820 ms, finished with status: 'ok'

We download the file holding the analysis results to the local file system:

$ gsutil cp gs://nlp-test-out/louvre-results.json .
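For a quick look at the downloaded file, a small Node.js script along the following lines may help. It is a sketch under assumptions: that the file holds the stringified array returned by the client library with the actual response as its first element, and that each entity carries name, type, salience and sentiment fields. The helper names extractResponse and summarizeEntities are made up for this example.

```javascript
const fs = require('fs');

// Assumption: the file contains JSON.stringify(results), where results is
// the array the client library resolved with and results[0] is the response.
function extractResponse(parsed) {
  return Array.isArray(parsed) ? parsed[0] : parsed;
}

// Returns one line per entity, most salient entity first.
function summarizeEntities(response) {
  const entities = (response && response.entities) || [];
  return entities
    .slice()
    .sort((a, b) => (b.salience || 0) - (a.salience || 0))
    .map(e => {
      const sentiment = e.sentiment || {};
      return `${e.name} (${e.type}): score=${sentiment.score}, magnitude=${sentiment.magnitude}`;
    });
}

const RESULTS_FILE = 'louvre-results.json';
if (fs.existsSync(RESULTS_FILE)) {
  const parsed = JSON.parse(fs.readFileSync(RESULTS_FILE, 'utf8'));
  summarizeEntities(extractResponse(parsed)).forEach(line => console.log(line));
} else {
  console.log(`${RESULTS_FILE} not found, nothing to summarize`);
}
```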

I wrote a separate blog post that explains the results of the entity sentiment analysis in detail.

There is also the Cloud Functions emulator that lets you deploy and run your functions on your local machine before deploying them to the GCP.

Summary

You learned how to implement and deploy a Cloud Function to the GCP. With the text analysis use case I demonstrated how easy it is to use the Natural Language API from your Node.js program.

The full source code can be found at my GitHub repo ttrelle/gcf/ml-aes.

If you are interested in AI and machine learning, have a look at our codecentric.AI channel on YouTube.

Tobias Trelle

Dipl.-Math. Tobias Trelle is a Senior IT Consultant at codecentric AG in Solingen, Germany. He has been in the IT business for nearly 20 years and is interested in software architecture and scalability. Tobias gives talks at conferences and meetups and is the author of the German book “MongoDB: Der praktische Einstieg”.
