
What I learned from my first Go program


The task

Some time ago I wanted to get a feeling for Erlang and I also needed a web crawler. So I decided to kill two birds with one stone and created fufzig. In retrospect the crawler was a good subject since it covered different topics like file and network IO, regular expressions, command line arguments, parallel execution, long-running programs, …

These days Go is a hot new language, together with Rust and Swift. So what about porting the crawler from Erlang to Go? The result of the porting can be found on GitHub. This blog article provides a summary of my experience.

Initial contact

Installing Go on a Linux computer and a Mac went without problems. The tutorial and the package documentation are also good. Getting a hello world to run is easy and fast.

Packages, imports, GOPATH and dependencies

One (or two) words about larger modularization: Go packages provide a way to bundle multiple source files. You declare which package a file belongs to with the package keyword and import other packages with the import keyword:

package groschen
 
import (
  "net/url"
  "os"
  "path"
  "strings"
)

So far so good. Where do the other packages come from? They are looked up on your local hard disk in the directory (or directories) listed in the GOPATH environment variable. Besides two other sub-directories (bin, pkg) each GOPATH element has a src subdir. Under this src subdir lies a directory tree corresponding to the package names, with the source files in the respective subdirs. This may look like:

start_dir
└── src
    └── foo
        ├── bar
        │   └── my_cool_thing.go
        └── some_other_thing.go

So if you want to import a package called “foo/bar” and have GOPATH set to “/home/user/go:/usr/local/go”, the source files are searched for in “/home/user/go/src/foo/bar” and “/usr/local/go/src/foo/bar”.

Packages from other developers can be fetched with “go get <path or url>”. This command installs the package in the first part of GOPATH, which is the reason why a two-part GOPATH is recommended: GOPATH=$HOME/go/3rd-party:$HOME/go/own

I have no experience with real dependencies yet (you use lib A, which needs lib B in version C) and cannot comment on them. But from what I read Go has no versioned dependencies. Maybe this list of package managers will be helpful.
Noteworthy: getting the newest version of all dependencies (including test dependencies) of a subtree (including sub-packages):
go get -u -t ./...

Formatting a number

As a first small task to get productive I chose formatting an int into a string with a separator character after every 3 digits (counted from the right). So format(12345) should yield “12,345”. I found out that:

  • the string type in Go is a read-only slice of bytes (usually UTF-8 encoded text but it may also be just binary data)
  • “rune” is Go’s name for a Unicode code point and is an alias for a 32-bit integer (int32)
  • to reverse a string you have to convert the byte array/slice first to an array/slice of runes and then back to the byte array/slice
  • a function is public and can be used in other packages if the name starts with an uppercase character (details in the spec)
  • testing in Go is easy
  • the tests in the standard lib often follow a pattern where they declare a list of input values and expected outcomes and then these test cases are executed in a loop. Such a pattern has an increased risk of side effects between single loop executions if there is some shared state.
// reverse returns the string with its runes in reverse order.
func reverse(s string) string {
  runes := []rune(s)
  for i, j := 0, len(runes)-1; i < len(runes)/2; i, j = i+1, j-1 {
    runes[i], runes[j] = runes[j], runes[i]
  }
  return string(runes)
}
 
// FormatIntWithThousandSeparator inserts sep after every third digit,
// counted from the right: FormatIntWithThousandSeparator(12345, ",") yields "12,345".
func FormatIntWithThousandSeparator(v int, sep string) string {
  tmp := reverse(strconv.Itoa(v))
  result := ""
  for len(tmp) > 3 {
    result += tmp[0:3] + sep
    tmp = tmp[3:]
  }
  result += tmp
  return reverse(result)
}

In retrospect this implementation can be improved by using bytes.Buffer, which is similar to Java's StringBuilder class, to avoid creating and garbage collecting so many intermediate strings.
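
As a rough sketch (not the version used in fufzig), the same function could collect the pieces in a bytes.Buffer; the helper reverse is the one defined above:

import (
  "bytes"
  "strconv"
)
 
// Same idea as above, but the reversed groups are written into a
// bytes.Buffer instead of building new intermediate strings.
func FormatIntWithThousandSeparatorBuffered(v int, sep string) string {
  tmp := reverse(strconv.Itoa(v))
  var buf bytes.Buffer
  for len(tmp) > 3 {
    buf.WriteString(tmp[0:3])
    buf.WriteString(sep)
    tmp = tmp[3:]
  }
  buf.WriteString(tmp)
  return reverse(buf.String())
}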

Collections

After this small task of formatting a number I started adding real crawler functionality. The biggest point I came across here is the topic of collections. Coming from a Java/Python background I’m used to a rich set of types, easy conversions between them and methods to work with multiple data items. The building primitives of Go are slices and maps. Slices are very much like Java's ArrayList: they have a length (number of elements in the container) and a capacity (allocated size of the container). Maps are also a first-class container, no big surprise here.
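
A minimal example (made up for this post, not crawler code) showing length, capacity and the zero-value behaviour of maps:

package main
 
import "fmt"
 
func main() {
  // A slice with length 3 and capacity 10; append reuses the
  // allocated capacity as long as there is room.
  numbers := make([]int, 3, 10)
  numbers = append(numbers, 42)
  fmt.Println(len(numbers), cap(numbers)) // prints: 4 10
 
  // A map from string to int; a missing key yields the zero value.
  counts := make(map[string]int)
  counts["foo"]++
  fmt.Println(counts["foo"], counts["bar"]) // prints: 1 0
}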

What is not so easy: creating your own data container in a type-safe way. Back in the old days we would use the preprocessor to generate code for each value type. With C++ templates and Java generics it is possible to create such type-safe containers easily and without casting. So what about writing a method which looks at a map and returns the value which occurs most often? In Java the signature would look like this:

<K, V> V getMostOccurringValue(Map<K, V> map) {
  return...;
}

In Go the implementation may look like this:

func MostOccuringValue(source map[interface{}]interface{}) interface{} {
  counting := make(map[interface{}]int)
  var mostOccuringValue interface{}
  mostOccuringCount := -1
 
  for _, value := range source {
    if _, ok := counting[value]; ok {
      counting[value]++
    } else {
      counting[value] = 1
    }
 
    if counting[value] > mostOccuringCount {
      mostOccuringCount = counting[value]
      mostOccuringValue = value
    }
  }
  return mostOccuringValue
}

However how do you call it? I failed to get the code below to compile:

func TestMostOccuringValueWithPrimitiveTypes(t *testing.T) {
  value := map[string]int{
    "foo":    12,
    "bar":    -1,
    "foobar": 12,
  }
 
  // error: cannot use value (type map[string]int) as type 
  //        map[interface {}]interface {} in argument to MostOccuringValue
  if MostOccuringValue(value) != 12 { t.Fail() }
 
  // error: cannot convert value (type map[string]int) to type 
  //        map[interface {}]interface {}
  if MostOccuringValue(map[interface{}]interface{}(value)) != 12 { t.Fail() }
 
  // error: invalid type assertion: value.(map[]) 
  //        (non-interface type map[string]int on left)
  if MostOccuringValue(value.(map[interface{}]interface{})) != 12 { t.Fail() }
}

So does using interfaces change the situation?

func TestMostOccuringValueWithInterfaces(t *testing.T) {
  var reader1 io.Reader = strings.NewReader("str1")
  var reader2 io.Reader = strings.NewReader("str2")
  var reader3 io.Reader = strings.NewReader("str3")
 
  var seeker1 io.Seeker = strings.NewReader("str4")
  var seeker2 io.Seeker = strings.NewReader("str5")
 
  value := map[io.Reader]io.Seeker{
    reader1: seeker1,
    reader2: seeker2,
    reader3: seeker1,
  }
 
  // error: cannot use value (type map[io.Reader]io.Seeker) as type 
  //        map[interface {}]interface {} in argument to MostOccuringValue
  if MostOccuringValue(value) != seeker1 { t.Fail() }
 
  // error: cannot convert value (type map[io.Reader]io.Seeker) to 
  //        type map[interface {}]interface {}
  if MostOccuringValue(map[interface{}]interface{}(value)) != seeker1 { t.Fail() }
 
  // error: invalid type assertion: value.(map[]) 
  //        (non-interface type map[io.Reader]io.Seeker on left)
  if MostOccuringValue(value.(map[interface{}]interface{})) != seeker1 { t.Fail() }
}

Nope, no change. I have no idea how to make this work, if it is possible at all. For the crawler I chose instead to specify the types explicitly.
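
For illustration, a sketch of what such an explicitly typed variant could look like (the function name is made up here; the logic is the same as above but fixed to map[string]int):

// Every further key/value type combination would need its own
// copy of this function.
func MostOccuringIntValue(source map[string]int) int {
  counting := make(map[int]int)
  mostOccuringValue := 0
  mostOccuringCount := -1
 
  for _, value := range source {
    counting[value]++
    if counting[value] > mostOccuringCount {
      mostOccuringCount = counting[value]
      mostOccuringValue = value
    }
  }
  return mostOccuringValue
}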

You can write the code for each type by hand or let the computer write it using the go generate feature of Go 1.4 (which writes a generated file) (see it in action). The first option is quite archaic and the second one feels uncomfortably like a preprocessor to me. Here the C preprocessor even had the advantage of generating the code in memory, which avoids out-of-sync versions. This is still a hot debate in the Go community and in my eyes an unsolved problem.

Error handling

Go provides two major modes to cope with errors: panic and multiple return values. Both the runtime system and custom code can cause a panic. A panic is propagated up the call stack like an exception until it is either handled with a call to recover or the program crashes with a long dump of the stack traces of all goroutines (Go's unit of lightweight parallel execution). I’m used to chaining exceptions so that the different layers can contribute more information, which eases supporting the system in production. In general it is possible to do such exception chaining with the panic feature of Go, but the convention is to report errors explicitly as error return values.
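
A minimal sketch of the panic/recover mechanism (not taken from the crawler):

package main
 
import "fmt"
 
func mayPanic() {
  panic("something went badly wrong")
}
 
func safeCall() (err error) {
  // The deferred function also runs when mayPanic panics and
  // turns the panic into an ordinary error return value.
  defer func() {
    if r := recover(); r != nil {
      err = fmt.Errorf("recovered: %v", r)
    }
  }()
  mayPanic()
  return nil
}
 
func main() {
  fmt.Println(safeCall()) // prints: recovered: something went badly wrong
}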

The second way to communicate a problem is the use of multiple return values, which is widely used in the standard library. Querying a map for a key looks like this:

if val, ok := dict["foo"]; ok { 
  //do something here
}

If you stat a file:

fileinfo, err := os.Stat(path)
if err != nil {
  // error: can not stat the file
}

The pattern is that the method returns the actual value plus an additional status/error value. The error values are of the predefined interface type error. You can provide your own error value as long as it implements the Error() string method.
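
A sketch of such a custom error value (the type name is made up for this example):

import "fmt"
 
// CrawlError is a made-up error type: any type with an
// Error() string method satisfies the predefined error interface.
type CrawlError struct {
  URL  string
  Code int
}
 
func (e *CrawlError) Error() string {
  return fmt.Sprintf("fetching %s failed with status %d", e.URL, e.Code)
}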

This dichotomy however forces you to decide on a case-by-case basis how to handle error conditions. In both cases I miss the ability to enrich the error with more information along the call chain. Say you got an error while trying to execute some SQL in a webapp. Then you want the SQL statement, some information about what the business logic tried to do and also some information about the HTTP request logged for later analysis. These pieces of information belong to different layers, and the chained exceptions from Java provide one way to collect and connect them. It is possible that Go already has some way to satisfy this demand. Let us take a look at a large Go application: Docker. Looking at the source code one sees a lot of code like the following (from job.go):

func Commit(d *daemon.Daemon, name string, c *daemon.ContainerCommitConfig) (string, error) {
  container, err := d.Get(name)
  if err != nil {
    return "", err
  }
 
  newConfig, err := BuildFromConfig(d, c.Config, c.Changes)
  if err != nil {
    return "", err
  }
 
  if err := runconfig.Merge(newConfig, container.Config); err != nil {
    return "", err
  }
 
  img, err := d.Commit(container, c.Repo, c.Tag, c.Comment, c.Author, c.Pause, newConfig)
  if err != nil {
    return "", err
  }
  return img.ID, nil
}

So the error values are not enriched with context information about the current layer but just passed unchanged to the caller. The example also shows the verboseness of the error handling: there are four calls and one return statement, but with error handling the method body has 16 non-empty lines! I think this pattern is a step backwards from exceptions, which provide very similar semantics but keep the error logic less intrusive in the source code.
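
One way I could imagine adding context is to wrap the message manually with fmt.Errorf before returning it (a sketch, not how Docker does it). Note that this keeps only the text of the original error, not the value itself, so it is at best a partial substitute for chained exceptions:

container, err := d.Get(name)
if err != nil {
  // The original error text is kept and the caller also learns
  // which lookup failed.
  return "", fmt.Errorf("commit of %s failed: %v", name, err)
}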

Other points

One has to pay attention to how enums are declared. By default the enums of Go are only a weak abstraction over integers, like the enums from C, which are just named integer constants. An example:

...
const (
  Red = iota
  Green
  Blue
)
 
func main() {
  color1 := Red
  color1 = 5
 
  color2 := 6
  color2 = Green
 
  // prints 5 and 1
  fmt.Printf("The colors are %d and %d\n", color1, color2)
}

However it is possible to define typed enums which provide a bit more safety. In the example the variable types are given explicitly and a second enum is added:

...
type Color int
const (
  Red Color = iota
  Green
  Blue
)
 
type Size int
const (
  Small Size = iota
  Medium
  Large
)
 
 
func main() {
  var color1 Color = Red
  color1 = 5 // unexpectedly ok
  color1 = Small // compile error
  color1 = int(Small) // compile error
 
  // it is ok when we ask for it with a cast:
  color1 = Color(5) // ok
  color1 = Color(Small) // ok
 
  // while the literal “5” works the following cases do not and 
  // give the expected compile error:
  color1 = int(5) // compile error
  five := 5
  color1 = five // compile error
 
  var color2 int = 6
  color2 = Green // compile error
  color2 = Small // also a compile error
}

So apart from the irregular handling of untyped numeric literals, the enums are pretty type-safe.

For resource cleanup programming languages offer different idioms: the goto and label cascade in C, RAII in C++, the with statement in Python, and Java even has two idioms, the older try/finally and the newer try-with-resources. Go adds another option: the defer keyword registers a callback which is executed when the function is left, either through a normal return or a panic. In practice this works out well. It even encourages the developer to write small functions (which is good) to reduce the time a resource is held open. For example if you want to make sure that only one execution thread is in your method you may write the following code:

// no initial value needed
var fileLock sync.Mutex
 
func WriteResponseToFile(basePath string, content []byte, theUrl string) string {
  fileLock.Lock()
  defer fileLock.Unlock()
  ...
}
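
Another typical use of defer is closing a file; a minimal sketch (function name and error handling made up for this example):

import (
  "io/ioutil"
  "os"
)
 
// The deferred Close runs on every return path, including the
// early returns on error.
func CountBytes(path string) (int, error) {
  f, err := os.Open(path)
  if err != nil {
    return 0, err
  }
  defer f.Close()
 
  data, err := ioutil.ReadAll(f)
  if err != nil {
    return 0, err
  }
  return len(data), nil
}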

Go, like Python, allows you to unpack the values of a container (a slice; a sequence in Python) to provide the arguments for a variadic function:

func f(base string, args ...string) {}
func someOther() {
  args := []string{"a", "b"}
  f("abc", args...)
}

Goroutines are lightweight parallel execution units which are multiplexed onto OS threads. For communication between them the language provides channels as first-class citizens: typed FIFO buffers with a fixed capacity. I have used goroutines and channels for the crawler and they work as advertised. An example:

 
package main
 
import "fmt"
 
func main() {
  myChan := make(chan int)
 
  // Declare a function which generates some numbers
  generator := func() {
    for i := 0; i <= 20; i += 2 {
      // Write the number into the channel. This may block.
      myChan <- i
    }
 
    // finished
    close(myChan)
  }
 
  // Let the generator run in parallel
  go generator()
 
  for {
    value,ok := <- myChan
    if !ok {
      // channel got closed
      break
    }
    fmt.Printf("Got %d\n", value)
  }
}

Summary

Judging Go depends on the starting point. If starting from C you get a language which:

  • is less verbose
  • provides an alternative approach to resource cleanup and makes error handling more explicit
  • simplifies implementation because the Garbage Collector relieves some burden from the developer
  • is safer, for example because of index checks at runtime
  • has Unicode support
  • has built-in abstractions for parallel execution

The negative side:

  • because of the Garbage Collector the execution may have extra latency (which can be mitigated by avoiding heap-allocated objects), which may rule Go out for soft or hard real-time environments

If on the other side you start from Java you get:

  • multiple return values
  • faster startup times
  • if you distribute the platform-specific compiled version of your Go program you only have to include the executable file itself, since the Go runtime and the other Go packages your program uses are part of this file; only system libraries (libc, libpthread), which are available everywhere, are required as external dependencies to run the program
  • C integration with cgo may be easier than JNI (I haven’t tested it)

but at the same time you also lose some functionality:

  • a type system with generics and type-safe enums
  • because of the missing generics and the error handling style, Go code will be more verbose
  • mature ecosystem (libraries, tools, IDE, vendor support, trainings, …)
  • proven dependency management

I’m looking forward to seeing which software niche Go will claim for itself. From my first contact I do not see Go being used for big applications, because:

  • dependency management is weak
  • the type system could be more supportive when bigger domain models are used
  • error handling produces a lot of boilerplate code

On the other side I also do not see Go for low-level infrastructure / systems programming with soft/hard response time guarantees because of the GC. So no load balancers or embedded systems in Go.

A niche for Go could be software which is parallel in nature (servers or computation) with relaxed response time requirements: a chat message server, an email server or a computation platform like Hadoop. Go also has an advantage in the area of unikernels because all dependencies are already baked into the one executable and loading extra functionality at runtime (like class loading in Java) is not supported per se.
But predicting the future of a language is hard in general: just take a look at the history of Java, which was first developed for embedded systems, was then used for applets and has now found its niche on the server side. So Go go!
