class: center, title, no-number, inverse

# DotGo 2019 Summary

---

# Generating x86 Assembly with Go

--
* You can write Assembly in Go

--
* Should you?

--
* No

???
Generally, the compiler will optimise your code far better than you can, and writing your own Assembly means you don't get any of the safety Go can usually guarantee.

--
* Standard library uses a lot of Assembly

???
The standard library uses a lot of write-once Assembly code, mainly in the crypto and runtime packages. Some functions have thousands of lines of Assembly. Critical bugs can and do show up there now and then because the code is extremely hard to review.

But let's say you need to write some part of your Go application in Assembly because you need more performance than the compiler can give you. What are your options?

---

## Solution

--
* Assembly with macros (MASM, NASM)

???
Microsoft Macro Assembler, Netwide Assembler. These don't work with Go's variant of Assembly.

--
* [PeachPy](https://github.com/Maratyszcza/PeachPy)

???
PeachPy is an efficient Assembly code generator which can output Go Assembly.

...but it's Python, and we like writing Go.

--
* [avo](https://github.com/mmcloughlin/avo)

???
Avo presents a familiar Assembly-like interface that simplifies high-performance Go Assembly development without sacrificing performance.

---
layout: false
class: no-number

.left-column-50[
```go
for r := 0; r < 80; r++ {
	Commentf("Round %d.", r)
	q := quarter[r/20]

	// Load message value.
	u := GP32()
	if r < 16 {
		MOVL(m.Offset(4*r), u)
		BSWAPL(u)
	} else {
		MOVL(W(r-3), u)
		XORL(W(r-8), u)
		XORL(W(r-14), u)
		XORL(W(r-16), u)
		ROLL(U8(1), u)
	}
	MOVL(u, W(r))

	// Compute the next state register.
	t := GP32()
	MOVL(a, t)
	ROLL(U8(5), t)
	ADDL(q.F(b, c, d), t)
	ADDL(e, t)
	ADDL(U32(q.K), t)
	ADDL(u, t)

	// Update registers.
	ROLL(Imm(30), b)
	a, b, c, d, e = t, a, b, c, d
}
```
]

.right-column-50[
```asm
ROUND1(AX, BX, CX, DX, BP, 0)
ROUND1(BP, AX, BX, CX, DX, 1)
ROUND1(DX, BP, AX, BX, CX, 2)
ROUND1(CX, DX, BP, AX, BX, 3)
ROUND1(BX, CX, DX, BP, AX, 4)
ROUND1(AX, BX, CX, DX, BP, 5)
ROUND1(BP, AX, BX, CX, DX, 6)
ROUND1(DX, BP, AX, BX, CX, 7)
ROUND1(CX, DX, BP, AX, BX, 8)
ROUND1(BX, CX, DX, BP, AX, 9)
ROUND1(AX, BX, CX, DX, BP, 10)
ROUND1(BP, AX, BX, CX, DX, 11)
ROUND1(DX, BP, AX, BX, CX, 12)
ROUND1(CX, DX, BP, AX, BX, 13)
ROUND1(BX, CX, DX, BP, AX, 14)
ROUND1(AX, BX, CX, DX, BP, 15)
ROUND1x(BP, AX, BX, CX, DX, 16)
ROUND1x(DX, BP, AX, BX, CX, 17)
ROUND1x(CX, DX, BP, AX, BX, 18)
ROUND1x(BX, CX, DX, BP, AX, 19)
ROUND2(AX, BX, CX, DX, BP, 20)
ROUND2(BP, AX, BX, CX, DX, 21)
ROUND2(DX, BP, AX, BX, CX, 22)
ROUND2(CX, DX, BP, AX, BX, 23)
ROUND2(BX, CX, DX, BP, AX, 24)
ROUND2(AX, BX, CX, DX, BP, 25)
ROUND2(BP, AX, BX, CX, DX, 26)
ROUND2(DX, BP, AX, BX, CX, 27)
ROUND2(CX, DX, BP, AX, BX, 28)
```
]

???
Now compare the main loop of the SHA1 example in avo to... the first lines of its hand-written equivalent in Go's Assembly.

avo's complete example is 47 lines long, whereas Go's Assembly equivalent is 1500 lines.

---

# Better Optimisation

--
* Close resource-intensive programs

???
Close resource-intensive programs (\*ahem\* Slack \*ahem\*). Your benchmark counts on 100% CPU availability, so give it as much as possible.

--
* Use statistics

???
Look for *statistically significant* changes. Statistics is your friend.

--
* Get multiple samples by running your benchmarks with `-count=n`.

???
This will make the benchmark run multiple consecutive times. Multiple samples = more significant information, because results fluctuate constantly.

--
* Use [benchstat](https://godoc.org/golang.org/x/perf/cmd/benchstat) to determine your p-value.

--
* Improvements need to be statistically significant.

--
* Don't go looking for p-values.

???
With a significance threshold of 0.05, natural fluctuations will give you a false positive around 5% of the time - if you only choose the numbers you like, you'll always find them eventually.

--
* Don't remember what [the other tool]() was called - slow down the execution time of your benchmark so CPU turbo doesn't mess things up.

---

# Hexagonal Architecture

--
* Ports and Adapters

--
* Ports = interfaces

--
* Adapters = concrete implementations

---

* **Domain**

--
  * Pure business logic with as few implementation details as possible.

--
  * Often following DDD.

--
  * Unit testing with maximum coverage.

--
* **Application**

--
  * Translates data between what the domain and the framework expect.

--
  * End-to-end testing.

--
* **Framework**

--
  * All the specific implementations with as few assumptions about the rest as possible - the framework treats the rest of the application as a black box.

--
  * UI testing.

--
* Dependencies only go inwards:

--
  * Framework can freely import Application and Domain, Application can freely import Domain.

--
  * The other direction goes via dependency injection (DI).

---

* Can be any shape.

???
A hexagon was chosen by the author because it's easiest to work with on a whiteboard.

--
* Stay pragmatic - if something *technically* breaks the rules but works in your use case, just do it.

--
* Make it as simple as you can get away with, but not any simpler.

---

# Using Go as a scripting language in Linux

--
* [Cloudflare blog post](https://blog.cloudflare.com/using-go-as-a-scripting-language-in-linux)

--
* [gorun](https://github.com/erning/gorun) instead of `go run` for the correct handling of the exit code.

--
* Write a line to `/proc/sys/fs/binfmt_misc/register` (`binfmt_misc` should be mounted by systemd) to register gorun as the interpreter for all `.go` files.
```sh
$ echo ':golang:E::go::/usr/local/bin/gorun:OC' | sudo tee /proc/sys/fs/binfmt_misc/register
:golang:E::go::/usr/local/bin/gorun:OC
```

---

# Lightning Talk: Docker

--
* [distroless](https://github.com/GoogleContainerTools/distroless)

--
* Only contains your application and its runtime dependencies

--
* No package managers, shells, etc.

--
* Secure, small, fast

---

# Multi Module Repositories

--
* Mainly works painlessly.

--
* Tag versions of the root module normally; prefix the version tags of nested modules with their path relative to the root (`subpackage/thirdlayer/v1.0.5`).

--
* If you migrate a project from using one module to using multiple, you can make a hacky fix by making the new submodules depend on a non-existent version of the root module, then push the existing version as the new version.

--
* This works because Go modules always choose the newest version depended on (within major version releases).

---

# Tweaking GOGC

--
* Only do this after optimising. No need to garbage collect allocations you don't make (aka. optimise out allocations).

--
* Only one parameter to adjust: the `GOGC` environment variable.

--
* Go heap usage looks like a sawtooth wave. As soon as the heap reaches double the size it was collected down to last time (100% growth, the default value of GOGC), GC starts.

--
* Example: we have 500MB of memory that stays in use. When the heap size reaches 1GB, it will collect again. We can assume it will go down to around 500MB again, so this will continue.

--
* The issue:

--
  * If we have a service with a lot of static data (like 20GB), we let the heap grow far too much before collecting, making GC slow *and* using far too much memory (the heap would have to reach 40GB, so this would crash on a machine with 32GB of RAM). In this situation we'd set GOGC to something like 50, so it would collect at 30GB.

--
  * Similarly, if we have a small service that barely has any allocations on the heap, the GC may run many times a second, using far too much CPU.
    If we're using 10MB RAM, we can easily set GOGC to 500 or even 1000.

--
* You can even set GOGC to `off`. This is useful for short-lived tasks like compilation. If you have enough RAM, you can speed up the compilation of large Go programs by using `$ GOGC=off go build ./...`.

---
class: middle, center

<img src="/img/gopher_mic_drop.png" alt="Gopher Mic Drop" height="22%">

# That's it!
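---

# Backup: GOGC Arithmetic

A minimal sketch of the sawtooth arithmetic from the GOGC slides, assuming the simplified model used there: the next collection starts once the heap has grown by GOGC percent over what survived the last one. The function name `nextGCTarget` and the sample sizes are illustrative only and are not part of the Go runtime, whose pacer takes more factors into account.

```go
package main

import "fmt"

// nextGCTarget estimates the heap size (in MB) at which the next collection
// starts, given the live heap left after the previous collection and a GOGC
// percentage. Hypothetical helper: a simplified model, not the runtime's pacer.
func nextGCTarget(liveMB, gogc uint64) uint64 {
	return liveMB + liveMB*gogc/100
}

func main() {
	// Sample values taken from the scenarios on the GOGC slides.
	cases := []struct{ liveMB, gogc uint64 }{
		{500, 100},   // default GOGC with 500MB live
		{20000, 100}, // 20GB of static data, default GOGC
		{20000, 50},  // same service with GOGC=50
		{10, 1000},   // tiny service with GOGC=1000
	}
	for _, c := range cases {
		fmt.Printf("live=%dMB GOGC=%d -> next GC at ~%dMB\n",
			c.liveMB, c.gogc, nextGCTarget(c.liveMB, c.gogc))
	}
}
```

Under this model the 500MB service collects at about 1GB, and the 20GB service collects at about 40GB by default but at about 30GB with `GOGC=50`. The same percentage can also be changed at run time with `debug.SetGCPercent` from `runtime/debug`.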