Mix Rust Code (WebAssembly) with Vue Component #optimization - analyze wasm call graph and shrink the size

in #utopian-io, 6 years ago

Repository

What Will I Learn?

  • Mixing a Vue project with Rust code
  • Analyzing WebAssembly code
  • Using another loader to reduce the .wasm file size

Requirements

Difficulty

  • Intermediate

Tutorial Contents


In my previous tutorial on mixing Vue with Rust, we were able to compile Rust code into WebAssembly with a bundle size of about 2 MB, and then reduce it to about 580 kb (Table 1) with the help of wasm-gc, which removes all unneeded exports, imports, functions, etc. However, we can shrink it further with the help of Binaryen and by enabling Link-Time Optimization (LTO).

Table 1 - size of wasm code in previous tutorial

| Options | Size | Gzipped |
|---|---|---|
| none | 1962.12 kb | 256.52 kb |
| release | 1958.73 kb | 256.00 kb |
| gc | 581.94 kb | 256.00 kb |
| gc + release | 578.85 kb | 91.54 kb |

In this tutorial, we will try to shrink the size of three wasm modules generated from three Rust crates. We will also analyze the wasm code generated at each step with the help of webassembly.studio, which can visualize the call graph of a wasm module, as shown in the WebAssembly Studio video [1].

Preparation

Before compiling our Rust code, we need to restructure the project so that each of the three Rust crates compiles into its own wasm module, which makes them easier to analyze and compare. We need three crates so that we can compare the result when an external package/crate is used. First, restructure the project as shown in Figure 1.

Figure 1 - new project structure and their corresponding file
.
├── src
│   ├── components
│   │   └── Calculator.vue
│   ├── libs
│   │   ├── algebra-matrix2x2
│   │   │   ├── Cargo.toml      ⬅️ `nalgebra` only specified here
│   │   │   └── calculator.rs
│   │   ├── arithmatic
│   │   │   ├── Cargo.toml
│   │   │   └── calculator.rs
│   │   └── empty
│   │       ├── Cargo.toml
│   │       └── calculator.rs
│   ├── App.vue
│   └── main.ts
├── Cargo.toml      ⬅️ Rust workspace config
├── package.json
└── vue.config.js

1. new project structure

[workspace]
members = [
    "src/libs/algebra-matrix2x2",
    "src/libs/arithmatic",
    "src/libs/empty"
]

[profile.release]
lto = false     # ⬅️ default is `false`

2. Content of ./Cargo.toml

[package]
name = "folder name"
version = "0.1.0"
authors = ["your name <your email>"]

[lib]
crate-type = ["cdylib"]
path = "calculator.rs"

# dependencies are only written for "algebra-matrix2x2"
[dependencies]
nalgebra = { version = "0.15", default-features = false, features = [ "alloc" ] }

3. Content of ./src/libs/**/Cargo.toml


In Figure 1.1, you may notice that we structure our Rust code as a Cargo workspace with 4 Cargo.toml files. The one in the root initializes the workspace and configures the release profile, as shown in Figure 1.2. The rest define the dependencies of the Rust code in the same folder, as shown in Figure 1.3. You can define dependencies that will be compiled into wasm code in each Cargo.toml (excluding the workspace one), but in this project we only define them for algebra-matrix2x2, which uses the crate nalgebra. As specified in the nalgebra documentation, we compile it without libstd (by specifying default-features = false), since at the time of writing libstd support for the build target wasm32-unknown-unknown was still incomplete. We also enable the feature alloc because we use types provided by nalgebra that manage heap-allocated values.

In the previous tutorial, we used wasm-gc and the default release profile for the target wasm32-unknown-unknown to reduce the bundle size. However, based on the rustwasm documentation for building Conway's Game of Life [3], we can reduce the bundle size further by:

  • enabling LTO (at the time of writing, on the nightly channel for the wasm target, the release profile already optimizes for size, aborts on panic, and disables debug info by default)
  • using wasm-gc
  • using Binaryen
  • using wasm-snip (we don't use it here since our implementations are simple ones; it also doesn't always work)

Since we have already installed wasm-gc, we only need to install Binaryen. Binaryen is a compiler and toolchain infrastructure library for WebAssembly, written in C++. Most of its tools are CLI apps, and it is also available for JavaScript as binaryen.js. Luckily, there is a webpack loader for Binaryen that we can use in this project. To install it, run

yarn add binaryen-loader --dev

This installs binaryen-loader in the devDependencies section. Since the compilation takes a lot of RAM (we compile three crates at once) and Node.js limits its heap size by default, we need to raise that limit (in my case, to 5 GB) by changing the build command in package.json.

"build": "node --max_old_space_size=5120 node_modules/@vue/cli-service/bin/vue-cli-service build",

After that, we change the implementation as shown in Code Change 1 so we can manually test the logic and make sure it runs.
Code Change 1 - preparing to load 3 different implementations
@Prop() module!: string

// reload the wasm module whenever the `module` prop changes
@Watch('module')
async changeModule(mod: string) {
  // each dynamic import() is split into its own chunk, named by webpackChunkName
  let loadWasm
  if (mod === 'algebra') {
    loadWasm = await import(/*webpackChunkName: 'calculator.algebra'*/'@/libs/algebra-matrix2x2/calculator.rs')
  } else if (mod === 'arithmatic') {
    loadWasm = await import(/*webpackChunkName: 'calculator.arithmatic'*/'@/libs/arithmatic/calculator.rs')
  } else {
    loadWasm = await import(/*webpackChunkName: 'calculator.empty'*/'@/libs/empty/calculator.rs')
  }
  this.wasm = await loadWasm.default()
}

async mounted() {
  await this.changeModule(this.module)
  await this.changeOperation(this.operation)
}

1. Some part of ./src/components/Calculator.vue

<select class="title" v-model="operation">
    <option value="arithmatic">arithmatic</option>
    <option value="algebra">algebra (matrix 2x2 diagonal operation)</option>
</select>
<input type="range" name="x" v-model.number="x" />
<Calculator class="center" :a="x" :b="y" :operation="selected" :module="operation">
    <select v-model="selected">
        <option value="add">add</option>
        <option value="substract">substract</option>
        <option value="multiply">multiply</option>
        <option v-if="operation === 'algebra'" value="dot">dot</option>
        <option v-if="operation === 'algebra'" value="tensor">tensor</option>
        <option v-if="operation === 'arithmatic'" value="divide">divide</option>
        <option v-if="operation === 'arithmatic'" value="power">power</option>
        <option v-if="operation === 'arithmatic'" value="remainder">remainder</option>
    </select>
</Calculator>

2. Part of ./src/App.vue


In Code Change 1.1, we use a webpack feature called code splitting to separate each wasm implementation into an independent JS file. We specify the output filename in the webpackChunkName property, written as an inline comment. We also use dynamic imports just to make the code easier to write (not to make it faster or more responsive). In Code Change 1.2, we add a mechanism to switch between the arithmatic and algebra implementations. The result is shown in Figure 2.

Figure 2 - how the application looks after Code Change 1

Rust Code

After preparing the new project structure, we can write the Rust code that will be compiled into wasm. We split it into three Rust files because we want to see how each optimization behaves. We also have an empty .rs file that contains no code, to make the comparison clear, as shown in Code Change 2.

Code Change 2 - write some simple Rust code
// NOTHING

1. empty/calculator.rs

#[no_mangle]
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

#[no_mangle]
pub fn substract(a: i32, b: i32) -> i32 {
    a - b
}

#[no_mangle]
pub fn multiply(a: i32, b: i32) -> i32 {
    a * b
}

#[no_mangle]
pub fn divide(a: i32, b: i32) -> i32 {
    a / b
}

#[no_mangle]
pub fn power(a: i32, b: i32) -> i32 {
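    // `^` is bitwise XOR in Rust; `wrapping_pow` raises a to the power b (wrapping on overflow)
    a.wrapping_pow(b as u32)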
}

#[no_mangle]
pub fn remainder(a: i32, b: i32) -> i32 {
    a % b
}

2. arithmatic/calculator.rs

extern crate nalgebra as na;

use na::DMatrix;

#[no_mangle]
pub fn add(a: f32, b: f32) -> f32 {
    let matrix_a = DMatrix::from_diagonal_element(2, 2, a);
    let matrix_b = DMatrix::from_diagonal_element(2, 2, b);
    (matrix_a + matrix_b).determinant()
}

#[no_mangle]
pub fn substract(a: f32, b: f32) -> f32 {
    let matrix_a = DMatrix::from_diagonal_element(2, 2, a);
    let matrix_b = DMatrix::from_diagonal_element(2, 2, b);
    (matrix_a - matrix_b).determinant()
}

#[no_mangle]
pub fn multiply(a: f32, b: f32) -> f32 {
    let matrix_a = DMatrix::from_diagonal_element(2, 2, a);
    let matrix_b = DMatrix::from_diagonal_element(2, 2, b);
    (matrix_a * matrix_b).determinant()
}

#[no_mangle]
pub fn dot(a: f32, b: f32) -> f32 {
    let matrix_a = DMatrix::from_diagonal_element(2, 2, a);
    let matrix_b = DMatrix::from_diagonal_element(2, 2, b);
    matrix_a.dot(&matrix_b)
}

#[no_mangle]
pub fn tensor(a: f32, b: f32) -> f32 {
    let matrix_a = DMatrix::from_diagonal_element(2, 2, a);
    let matrix_b = DMatrix::from_diagonal_element(2, 2, b);
    matrix_a.kronecker(&matrix_b).determinant()
}

3. algebra-matrix2x2/calculator.rs


Code Change 2.2 is the arithmatic implementation we wrote in the previous tutorial; all of its functions accept i32 arguments and return i32. In Code Change 2.3, we implement some 2x2 matrix operations such as the dot and tensor (Kronecker) products. Notice that every operation except dot returns the determinant of the resulting matrix. The reason is just to make each function return a single f32 value. The dot function does not need a determinant because the dot product of two matrices is already a scalar (a real number, represented in machine code as a floating-point value), not a matrix (or array).
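To make these return values concrete, here is a minimal, dependency-free sketch of what Code Change 2.3 computes for diagonal inputs. It does not use nalgebra: the `M2` type and the helpers `diag`, `det`, `elementwise` and `matmul` are hypothetical stand-ins, and `tensor` uses the identity det(A ⊗ B) = det(A)^2 · det(B)^2 (valid for two 2x2 matrices) instead of building the 4x4 Kronecker product.

```rust
/// Row-major 2x2 matrix: a hypothetical stand-in for nalgebra's `DMatrix`.
type M2 = [[f32; 2]; 2];

/// Same shape as `DMatrix::from_diagonal_element(2, 2, x)`.
fn diag(x: f32) -> M2 {
    [[x, 0.0], [0.0, x]]
}

/// Determinant of a 2x2 matrix.
fn det(m: &M2) -> f32 {
    m[0][0] * m[1][1] - m[0][1] * m[1][0]
}

/// Combine two matrices element by element (used for `+` and `-`).
fn elementwise(a: &M2, b: &M2, op: fn(f32, f32) -> f32) -> M2 {
    let mut out = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            out[i][j] = op(a[i][j], b[i][j]);
        }
    }
    out
}

/// Ordinary matrix product.
fn matmul(a: &M2, b: &M2) -> M2 {
    let mut out = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            out[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
        }
    }
    out
}

/// det(aI + bI) = (a + b)^2
pub fn add(a: f32, b: f32) -> f32 {
    det(&elementwise(&diag(a), &diag(b), |x, y| x + y))
}

/// det(aI - bI) = (a - b)^2
pub fn substract(a: f32, b: f32) -> f32 {
    det(&elementwise(&diag(a), &diag(b), |x, y| x - y))
}

/// det(aI * bI) = (a * b)^2
pub fn multiply(a: f32, b: f32) -> f32 {
    det(&matmul(&diag(a), &diag(b)))
}

/// Matrix dot product = sum of element-wise products, already a scalar: 2ab.
pub fn dot(a: f32, b: f32) -> f32 {
    let (ma, mb) = (diag(a), diag(b));
    let mut sum = 0.0;
    for i in 0..2 {
        for j in 0..2 {
            sum += ma[i][j] * mb[i][j];
        }
    }
    sum
}

/// det(A ⊗ B) = det(A)^2 * det(B)^2 for 2x2 A and B, i.e. (a * b)^4 here.
pub fn tensor(a: f32, b: f32) -> f32 {
    det(&diag(a)).powi(2) * det(&diag(b)).powi(2)
}
```

For example, `add(1.0, 2.0)` is det(3I) = 9 and `dot(2.0, 3.0)` is 2 · 2 · 3 = 12, which shows why dot can return its result directly while the other functions need a determinant to collapse the matrix into one f32.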

Default profile.release (LTO disabled)

In our previous tutorial, we used the default release configuration for the target wasm32-unknown-unknown, with the manifest:

[profile.release]
opt-level = 's'             # optimize for size
debug = false
rpath = false               # do not embed an rpath (runtime library search path)
lto = false                 # disable link-time optimization
debug-assertions = false
codegen-units = 16
panic = 'abort'             # abort at panic!
incremental = false         # disable incremental compilation
overflow-checks = false     # disable integer overflow checks

Compiling it with `yarn build` gives the results shown in Table 2. The WAsm Size column is the size of the file with the `.wasm` extension in `./target/wasm32-unknown-unknown/release` after compilation finishes.


Table 2 - bundle size comparison with target wasm32-unknown-unknown and default profile.release

| Implementation | JS Size | GZipped | WAsm Size |
|---|---|---|---|
| algebra | 1945.44 kb | 252.25 kb | 651 KB |
| arithmatic | 1895.43 kb | 245.86 kb | 635 KB |
| empty | 1893.56 kb | 245.58 kb | 634 KB |

As shown in Table 2, all files (especially the JS files) are bloated; even the empty Rust code takes about 1.85 MiB. If we take the empty wasm module and convert it into s-expressions (.wat), we find that it has 141,772 LoC (lines of code), as shown in Figure 3. If we use twiggy, we find 5,295 items whose shallow size is less than 0.1% each, all of which are garbage. We may also notice that 34.45% of the shallow bytes belong to the "function names" subsection, which is part of the name section. The name section attaches printable names to definitions in a module, which can be used, e.g., by a debugger or when parts of the module are rendered in text form. These sections do not contribute to or otherwise affect the WebAssembly semantics, and, like any custom section, they may be ignored by an implementation. However, they provide useful metadata that implementations can use to improve the user experience or to take compilation hints. Since the file is so big and has so many lines, we can't view its call graph in webassembly.studio 😂.

Figure 3 - empty.wasm with lto=false
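To see where the "function names" data lives, here is a minimal sketch that walks the top-level sections of a .wasm binary and reports each custom section's name (the name section is a custom section called "name"). It is not how twiggy works internally; `read_leb_u32`, `sections` and `SectionInfo` are hypothetical names, and the code only relies on the binary layout from the WebAssembly spec: the magic `\0asm`, version 1, then sections made of a one-byte id and a LEB128-encoded payload size.

```rust
/// Decode an unsigned LEB128 `u32` at `pos`; returns (value, bytes consumed).
fn read_leb_u32(bytes: &[u8], pos: usize) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    let mut shift = 0u32;
    let mut i = pos;
    loop {
        let byte = *bytes.get(i)?;
        i += 1;
        value |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, i - pos));
        }
        shift += 7;
        if shift > 28 {
            return None; // a u32 needs at most 5 LEB128 bytes
        }
    }
}

/// One entry per section: (section id, custom-section name, payload size in bytes).
type SectionInfo = (u8, Option<String>, u32);

/// Walk the top-level sections of a wasm binary without interpreting their contents.
fn sections(bytes: &[u8]) -> Option<Vec<SectionInfo>> {
    // Magic "\0asm" followed by version 1 as a little-endian u32.
    if !bytes.starts_with(b"\0asm\x01\x00\x00\x00") {
        return None;
    }
    let mut out = Vec::new();
    let mut pos = 8;
    while pos < bytes.len() {
        let id = bytes[pos];
        let (size, n) = read_leb_u32(bytes, pos + 1)?;
        let start = pos + 1 + n;
        let end = start.checked_add(size as usize)?;
        if end > bytes.len() {
            return None; // truncated module
        }
        // Custom sections (id 0) begin with a length-prefixed UTF-8 name, e.g. "name".
        let name = if id == 0 {
            let (len, m) = read_leb_u32(bytes, start)?;
            let name_end = (start + m).checked_add(len as usize)?;
            if name_end > end {
                return None;
            }
            Some(String::from_utf8(bytes[start + m..name_end].to_vec()).ok()?)
        } else {
            None
        };
        out.push((id, name, size));
        pos = end;
    }
    Some(out)
}
```

Reading a module from `./target/wasm32-unknown-unknown/release` with `std::fs::read` and passing the bytes to `sections` should make a large id-0 entry named "name" easy to spot in the lto=false build.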

Enable Link Time Optimization

Link-Time Optimization (LTO) allows the compiler to take all libraries and crates into account and optimize them as a single unit rather than individually, so things like inlining across crate boundaries become possible. This typically improves performance, but in some circumstances it drastically increases compile time [2]. Unlike WebAssembly, LLVM bitcode was designed for temporary on-disk serialization of the IR for link-time optimization, and not for stability or compressibility (although it does have some features for both of those) [6]. To enable LTO, we change the release build configuration in ./Cargo.toml:

[profile.release]
lto = true

Running `yarn build` again gives the results shown in Table 3. As before, WAsm Size is the size of the `.wasm` files in `./target/wasm32-unknown-unknown/release` after compilation finishes.


Table 3 - bundle size comparison with LTO enabled

| Implementation | JS Size | GZipped | WAsm Size |
|---|---|---|---|
| algebra | 60.13 kb | 10.05 kb | 21 KB |
| arithmatic | 4.77 kb | 1.38 kb | 1.5 KB |
| empty | 0.76 kb | 0.44 kb | 125 B |

Table 3 shows a huge reduction in size from enabling LTO. Interestingly, the empty WAsm size is not 0 bytes, though it is close. If we take a peek at empty, we see something like Figure 4.

Figure 4 - empty.wasm with lto=true

In Figure 4, we see that the compiler generates code to allocate the memory and table even though calculator.rs is empty. The confusing part is that it also contains the function rust_eh_personality. This function is used by the compiler's failure (panic) mechanisms. It is often mapped to a specific compiler's personality function (e.g. GCC's), but crates that never trigger a panic can be assured that it is never called. Its lang attribute is called eh_personality. The function rust_eh_personality also appears in arithmatic and algebra. Something interesting shows up in arithmatic, as shown in Figure 5.

Figure 5 - arithmatic.wasm with lto=true

In Figure 5, we see that divide and remainder can call panic code. This makes sense: dividing by zero is not a valid operation, so calling divide(x, 0) or remainder(x, 0) causes a runtime error. Rust also panics on i32::MIN / -1, which overflows.
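If a panic path is not wanted, one option is to handle the bad divisor explicitly. The sketch below is a hypothetical variant of Code Change 2.2: the fallback value 0 is an arbitrary choice, and `#[no_mangle]` is left out so it compiles as a plain snippet. `checked_div` and `checked_rem` return `None` for a zero divisor and for the overflowing `i32::MIN / -1`, so no panicking operation is left in the function body. Whether the panic code actually disappears from the wasm output is worth confirming in the call graph.

```rust
/// Hypothetical panic-free `divide`: returns 0 (an arbitrary fallback)
/// instead of panicking on a zero divisor or on `i32::MIN / -1`.
pub fn divide(a: i32, b: i32) -> i32 {
    a.checked_div(b).unwrap_or(0)
}

/// Hypothetical panic-free `remainder` with the same fallback.
pub fn remainder(a: i32, b: i32) -> i32 {
    a.checked_rem(b).unwrap_or(0)
}
```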

Enable wasm-gc

Although LTO removes almost all garbage code, some functions survive because they are needed at compile time: they are used by the compiler's failure mechanisms, so the code would not compile if they were removed. This is where wasm-gc comes in, removing all unneeded exports, imports, functions, etc. after compilation. It has a hardcoded blacklist of exports which, if found, are forcibly not exported from the result (and are then naturally gc'd unless they're otherwise referenced) [4]. In this section, we run wasm-gc after compiling our code with LTO enabled to reduce the bundle size further, and get the results shown in Table 4. For more info on how to install and enable wasm-gc, see my previous tutorial.

Table 4 - bundle size comparison using wasm-gc after compiling with LTO enabled

| Implementation | JS Size | GZipped | WAsm Size |
|---|---|---|---|
| algebra | 53.04 kb | 9.56 kb | 19 KB |
| arithmatic | 4.25 kb | 1.28 kb | 1.3 KB |
| empty | 0.53 kb | 0.37 kb | 55 B |

In Table 4, the WAsm size drops by between about 70 bytes and 2 KB. Not really significant, but at least it removes some functions that are never called, as shown in Figure 6.

Figure 6 - s-expression of empty code: before gc'd (left), after gc'd (right)

In Figure 6, we see that the $rust_eh_personality function is removed along with its type declaration (type $t0 (func)). The unused global variable is removed as well (Line 6). The same behavior shows up in algebra when we generate its call graph, as shown in Figure 7.

Figure 7 - callgraph of algebra code: before gc'd (left), after gc'd (right)

In Figure 7, we see that not only $rust_eh_personality is removed, but also $memcmp, $memset, and $memmove, because all of these functions are on the hardcoded blacklist. The exception is $memcpy, which is not removed because it is used by another function.
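The blacklist-plus-reachability idea can be sketched as a small graph walk. This is a toy model under stated assumptions, not wasm-gc's code: `gc` is a hypothetical function, and the call graph, export list and blacklist are hypothetical inputs chosen to mirror Figure 7 (for illustration, `memcpy` is treated as blacklisted; the real list is hardcoded inside wasm-gc).

```rust
use std::collections::{HashMap, HashSet};

/// Toy model: blacklisted exports lose their export, then only functions
/// reachable from the remaining exports survive.
fn gc<'a>(
    calls: &HashMap<&'a str, Vec<&'a str>>,
    exports: &[&'a str],
    blacklist: &[&'a str],
) -> HashSet<&'a str> {
    // Roots: exports that are not forcibly un-exported by the blacklist.
    let mut stack: Vec<&'a str> = exports
        .iter()
        .copied()
        .filter(|name| !blacklist.contains(name))
        .collect();
    let mut live = HashSet::new();
    // Depth-first walk over the call graph; anything never reached is dropped.
    while let Some(f) = stack.pop() {
        if live.insert(f) {
            if let Some(callees) = calls.get(f) {
                stack.extend(callees.iter().copied());
            }
        }
    }
    live
}
```

With a graph where `tensor` calls `memcpy`, `memcpy` survives even though it is blacklisted, while `memset` and `rust_eh_personality` are dropped, matching the behavior described for Figure 7.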

Using Binaryen

Binaryen is a compiler and toolchain infrastructure library for WebAssembly, written in C++. It goes much further than LLVM's WebAssembly backend does. Binaryen's optimizer has many passes that can improve code very significantly. One specific area of focus is WebAssembly-specific optimizations (that general-purpose compilers might not do), which you can think of as wasm minification, similar to minification for JavaScript, CSS, etc., all of which are language-specific [6]. The rustwasm book states that this can give 15-20% savings on code size and often produces runtime speedups at the same time. Luckily, there is a webpack loader for Binaryen called binaryen-loader, which uses binaryen.js under the hood. Since we installed binaryen-loader in a previous step, we now only need to enable it in vue.config.js, as shown in Code Change 3.

Code Change 3 - applying binaryen-loader
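rules: [{
  test: /\.rs$/,
  // webpack applies loaders from last to first:
  // rust-native-wasm-loader (Rust -> wasm) -> binaryen-loader (optimize) -> wasm-loader (wasm -> JS module)
  use: [{
    loader: 'wasm-loader'
  }, {
    loader: 'binaryen-loader'
  }, {
    loader: 'rust-native-wasm-loader',
    options: {
      release: process.env.NODE_ENV === 'production',  // build in release mode for production
      gc: process.env.NODE_ENV === 'production'        // run wasm-gc for production
    }
  }]
}]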

1. add (chain) binaryen-loader between wasm-loader and rust-native-wasm-loader

2. Illustration of how chaining the loaders works

{
  optimization: {
    level: 2,       // -O2
    shrinkLevel: 1  // -Os
  },
  transformation: {
    passes: [
      "duplicate-function-elimination",
      "inlining-optimizing",
      "remove-unused-module-elements",
      "memory-packing"
    ]
  },
  debug: false
}

3. Default options applied, based on the binaryen.js docs and the binaryen-loader source


In Code Change 3.1, we add (chain) binaryen-loader between wasm-loader and rust-native-wasm-loader, since binaryen-loader takes wasm code and spits out a minified version of it. As Code Change 3.2 illustrates, rust-native-wasm-loader takes the Rust source code (UTF-8 text), compiles it into binary wasm with the wasm32-unknown-unknown target, and then removes unwanted code/functions with wasm-gc. Under the hood, binaryen-loader uses binaryen.js with the default options stated in Code Change 3.3, plus the default passes defined in the Binaryen code base. A pass is a function that performs some transformation on the wasm module. In Table 5, we see that chaining binaryen-loader reduces the bundle size further.

Table 5 - bundle size comparison using Binaryen with default optimization

| Implementation | JS Size | GZipped | WAsm Size |
|---|---|---|---|
| algebra | 47.95 kb | 8.80 kb | 18 KB |
| arithmatic | 3.11 kb | 1.03 kb | 953 B |
| empty | 0.49 kb | 0.35 kb | 38 B |
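To make "pass" more concrete, here is a toy version of one of the default passes listed in Code Change 3.3, duplicate-function-elimination, over a made-up instruction set. The `Instr`, `Func`, `Pass` and `DuplicateFunctionElimination` types are hypothetical; Binaryen's real pass works on its own IR. This sketch does a single round, while merging can expose new duplicates, so a real pass may need to repeat.

```rust
use std::collections::HashMap;

/// A toy instruction set; real Binaryen IR is far richer.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Instr {
    Const(i32),
    Add,
    Call(String),
}

#[derive(Clone, Debug, PartialEq)]
struct Func {
    name: String,
    body: Vec<Instr>,
}

/// A pass takes a module (here just its functions) and returns a transformed one.
trait Pass {
    fn run(&self, funcs: Vec<Func>) -> Vec<Func>;
}

/// Keep the first function of each identical body; redirect calls to dropped copies.
struct DuplicateFunctionElimination;

impl Pass for DuplicateFunctionElimination {
    fn run(&self, funcs: Vec<Func>) -> Vec<Func> {
        let mut first_with_body: HashMap<Vec<Instr>, String> = HashMap::new();
        let mut replaced_by: HashMap<String, String> = HashMap::new();
        let mut kept = Vec::new();
        for f in funcs {
            match first_with_body.get(&f.body).cloned() {
                Some(first) => {
                    replaced_by.insert(f.name, first);
                }
                None => {
                    first_with_body.insert(f.body.clone(), f.name.clone());
                    kept.push(f);
                }
            }
        }
        // Point every call at the surviving copy.
        for f in &mut kept {
            for ins in &mut f.body {
                if let Instr::Call(target) = ins {
                    if let Some(survivor) = replaced_by.get(target.as_str()) {
                        *target = survivor.clone();
                    }
                }
            }
        }
        kept
    }
}
```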

In Table 5, we see that the JS size of arithmatic drops from 4.25 kb to 3.11 kb, a 26.82% reduction. The empty code also loses 17 bytes because (table $T0 1 1 anyfunc) (the code for dummy table initialization) is removed by Binaryen. However, algebra only shrinks by 9.6% (53.04 kb -> 47.95 kb). If we also use the passes remove-memory and post-emscripten as in Code Change 4, we get the results shown in Table 6.

Code Change 4 - enable passes `remove-memory` and `post-emscripten`
{
  loader: 'binaryen-loader',
  options: {
    transformation: {
      passes: [
        'post-emscripten',
        'remove-memory'
      ]
    }
  }
}

Table 6 - bundle size comparison using Binaryen after using passes `remove-memory` and `post-emscripten`

| Implementation | JS Size | GZipped | WAsm Size |
|---|---|---|---|
| algebra | 42.91 kb | 8.05 kb | 16 KB |
| arithmatic | 2.07 kb | 0.84 kb | 613 B |
| empty | 0.49 kb | 0.35 kb | 38 B |

In Table 6, as expected, the empty code doesn't change at all because there is nothing left to remove, while arithmatic shrinks significantly, by about 33.44% (3.11 kb -> 2.07 kb). The algebra code also shows a significant reduction: its WAsm size drops by about 2 KB. If we convert the wasm code into s-expressions, we find something interesting, as shown in Figure 8 and Figure 9.

Figure 8 - effect of running the remove-memory pass: before (left) and after (right)

In Figure 8, we can see that some strings stored in memory through the data section are removed. According to the WebAssembly spec, data sections allow a string of bytes to be written at a given offset at instantiation time, similar to the .data sections in native executable formats. Since these strings are not used by any function, removing them is safe here. Note that remove-memory does not check this itself: it simply drops the data segments, so it is only safe when nothing reads them.
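A data segment only describes bytes to copy into linear memory when the module is instantiated. The sketch below models that step; `DataSegment` and `instantiate_memory` are hypothetical names, and the only spec detail it relies on is the 64 KiB page size. Dropping a segment just means those bytes stay zero, which is harmless exactly when no function reads them.

```rust
/// Size of one WebAssembly linear-memory page (64 KiB).
const PAGE_SIZE: usize = 64 * 1024;

/// A data segment: bytes copied into linear memory at `offset` on instantiation.
struct DataSegment<'a> {
    offset: usize,
    bytes: &'a [u8],
}

/// Allocate zeroed memory, then copy each segment in.
/// Returns None if a segment does not fit (a real engine would fail to instantiate).
fn instantiate_memory(pages: usize, segments: &[DataSegment]) -> Option<Vec<u8>> {
    let mut memory = vec![0u8; pages * PAGE_SIZE];
    for seg in segments {
        let end = seg.offset.checked_add(seg.bytes.len())?;
        memory.get_mut(seg.offset..end)?.copy_from_slice(seg.bytes);
    }
    Some(memory)
}
```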

Figure 9 - effect of running the post-emscripten pass: before (left) and after (right)

In Figure 9, we see that post-emscripten spots code that can be simplified into one expression. According to the WebAssembly spec, memory is just a large array of bytes that can grow over time, and WebAssembly has instructions like i32.load and i32.store for reading from and writing to linear memory. Instead of computing the address with an i32.const and an i32.add, it is better to store directly, using the constant as the instruction's offset.
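The rewrite in Figure 9 can be sketched as a peephole over a toy expression tree. `Expr`, `Store` and `fold_offset` are hypothetical; the real pass works on Binaryen IR. One caveat: the fold is only equivalent when `base + c` does not overflow 32 bits, because `i32.add` wraps while a load/store's effective address (`addr + offset`) is computed without wrapping. The sketch only refuses negative constants and offset overflow.

```rust
/// Toy address expressions; a hypothetical model, not Binaryen's IR.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Const(i32),
    Local(u32),
    Add(Box<Expr>, Box<Expr>),
}

/// `i32.store offset=<offset> (addr) (value)`
#[derive(Clone, Debug, PartialEq)]
struct Store {
    addr: Expr,
    offset: u32,
    value: Expr,
}

/// Rewrite `store (add base (const c))` into `store offset=c base`.
fn fold_offset(store: Store) -> Store {
    if let Expr::Add(lhs, rhs) = &store.addr {
        // Accept the constant on either side of the addition.
        let parts = match (lhs.as_ref(), rhs.as_ref()) {
            (base, Expr::Const(c)) | (Expr::Const(c), base) => Some((base.clone(), *c)),
            _ => None,
        };
        if let Some((base, c)) = parts {
            // The offset immediate is unsigned, so only non-negative constants fold.
            if c >= 0 {
                if let Some(offset) = store.offset.checked_add(c as u32) {
                    return Store { addr: base, offset, value: store.value };
                }
            }
        }
    }
    store
}
```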

Conclusion

In summary, by enabling Link-Time Optimization, using wasm-gc to remove garbage functions, and using Binaryen with the right passes, we can reduce the JS bundle from about 1.9 MB to under 45 kb (under 10 kb gzipped, see Table 6), as shown in Table 7.

Table 7 - JS bundle size comparison (JS Size) on each configuration

| Implementation | lto=false | lto=true | LTO + wasm-gc | LTO + GC + Binaryen (default) | LTO + GC + Binaryen (default + post-emscripten + remove-memory) |
|---|---|---|---|---|---|
| algebra | 1945.44 kb | 60.13 kb | 53.04 kb | 47.95 kb | 42.91 kb |
| arithmatic | 1895.43 kb | 4.77 kb | 4.25 kb | 3.11 kb | 2.07 kb |
| empty | 1893.56 kb | 0.76 kb | 0.53 kb | 0.49 kb | 0.49 kb |
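The percentages quoted in this tutorial can be rechecked with one formula, reduction = (before - after) / before × 100. The sketch below applies it to the JS sizes from Table 7; `reduction_pct`, `TABLE_7` and `overall_reductions` are names made up for this example.

```rust
/// Percentage reduction from `before` to `after` (same unit, e.g. kb).
fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// JS sizes (kb) from Table 7: (implementation, lto=false, final configuration).
const TABLE_7: [(&str, f64, f64); 3] = [
    ("algebra", 1945.44, 42.91),
    ("arithmatic", 1895.43, 2.07),
    ("empty", 1893.56, 0.49),
];

/// Overall reduction for each implementation, from lto=false to the last column.
fn overall_reductions() -> Vec<(&'static str, f64)> {
    TABLE_7
        .iter()
        .map(|&(name, before, after)| (name, reduction_pct(before, after)))
        .collect()
}
```

With these numbers the overall JS size reduction is roughly 97.8% for algebra, 99.9% for arithmatic and 99.97% for empty.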


Thanks to the Rustaceans in the Rust Discord server who pointed out that it can be optimized further (and who also shared my previous tutorial on Reddit, since Reddit is blocked in my country 😂). I think I know the tricks now. If anyone has questions or suggestions, feel free to comment below or mention me in any Discord channel where you find my username (as long as it's the appropriate channel/category and I'm online 🙂).

Curriculum

References

  1. Sneak Peek at WebAssembly Studio
  2. Clap.rs - Tuning Your Weight Loss vs Performance
  3. Building Conway's Game of Life Tutorial using Rust
  4. wasm-gc issue#4
  5. wasm-intro
  6. binaryen-loader, binaryen.js, binaryen

Proof of Work Done

https://github.com/DrSensor/example-vue-component-rust/commits/master

compile results: https://webassembly.studio/?f=a53wrgwnhme

