We (the rust-analyzer team) have been aware of the slowness in Rowan for a while, but other things always took priority. Beyond allocation, Rowan is structured internally as a doubly-linked list to support mutating trees, but:
1. Mutation isn’t really worth it; the API isn’t user-friendly.
2. In most cases, it’s straight up faster to create a new parse tree and replace the existing one. Cache effects of a linked list vs. an arena!
In fairness, I don’t think we predicted just how large L1/L2 caches would get over the coming years.
dfajgljsldkjag just now
The performance gain from using a single shared vector for the nodes is pretty crazy. It just goes to show how much allocation overhead can slow things down if you are not careful.
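For anyone who hasn't seen the "single shared vector" layout before, here is a minimal sketch of the general idea. It is not the code from the article or from Rowan: `NodeId`, `Node`, `Tree`, and the node kind strings are all hypothetical names made up for illustration. Nodes live in one `Vec` and point at each other with integer indices, so building the tree costs a handful of vector growths instead of one heap allocation per node.

```rust
/// Index of a node inside the tree's backing vector (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(u32);

/// One syntax node. Links are indices into `Tree::nodes`, not pointers.
#[derive(Debug)]
struct Node {
    kind: &'static str,
    first_child: Option<NodeId>,
    last_child: Option<NodeId>,
    next_sibling: Option<NodeId>,
}

/// All nodes of one parse tree, stored contiguously.
struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    fn new() -> Self {
        Tree { nodes: Vec::new() }
    }

    /// Append a node and link it as the last child of `parent`, if any.
    fn add(&mut self, kind: &'static str, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(Node {
            kind,
            first_child: None,
            last_child: None,
            next_sibling: None,
        });
        if let Some(p) = parent {
            // Make the new node the parent's last child, remembering the old one.
            let prev_last = self.nodes[p.0 as usize].last_child.replace(id);
            match prev_last {
                Some(prev) => self.nodes[prev.0 as usize].next_sibling = Some(id),
                None => self.nodes[p.0 as usize].first_child = Some(id),
            }
        }
        id
    }

    /// Collect the kinds of the direct children of `parent`, in order.
    fn children(&self, parent: NodeId) -> Vec<&'static str> {
        let mut out = Vec::new();
        let mut cur = self.nodes[parent.0 as usize].first_child;
        while let Some(id) = cur {
            let node = &self.nodes[id.0 as usize];
            out.push(node.kind);
            cur = node.next_sibling;
        }
        out
    }
}

fn main() {
    let mut tree = Tree::new();
    let root = tree.add("SOURCE_FILE", None);
    let func = tree.add("FN", Some(root));
    tree.add("NAME", Some(func));
    tree.add("BLOCK", Some(func));
    println!("{:?}", tree.children(func));
}
```

The trade-off is the one discussed above: a tree like this is cheap to build and friendly to the cache, but editing it in place is awkward, so the usual answer is to reparse and swap in a new tree.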
vjerancrnjak 5 hours ago
It’s funny how there is continuous reinvention of parsing approaches.
Why isn’t there already some parser generator with vector instructions, PGO, and low stack usage? Just endless rewrites of recursive descent, with caching optimizations sprinkled in when needed.
zahlman 1 hour ago
Because you have to learn how to use any given parser generator, naive code is easy to write, and there are tons of applications for parsing that aren't really performance critical.
embedding-shape 5 hours ago
Hardware also changes over time. Something is initially fast, then people with new hardware try it, find it not so fast for them, and create their own "fast X". Fast forward 10 more years, and someone with newer hardware looks at that, asks "huh, why isn't it using extension Y?", and now we have three libraries all called "Fast X".
high_na_euv 5 hours ago
I'd say it's because parsing is a very specific kind of work, heavily dependent on the grammar you're dealing with.
mgaunard 5 hours ago
There are good parser generators, but potentially not as Rust libraries.
writebetterc 5 hours ago
So it went from parsing at 25 MiB/s to 115 MiB/s. I feel like 115 MiB/s is very slow for a Rust program; I wonder what it's doing that still makes it so slow. No diss to the author, good speedup, and it might be good enough for them.
mananaysiempre 4 hours ago
115 MiB/s is something like 20 to 30 cycles per byte on a laptop, 50 on a desktop. That’s definitely quite slow as far as a CPU’s capacity to ingest bytes, but unfortunately about as fast as it gets for scalar (machine) code that does meaningful work per byte. There may be another factor of 2 or 3 to be had somewhere, or there may not be. If you want to go meaningfully faster, as in at least at the speed of your disk[1], you need to stop doing work per byte and start vectorizing. For parsers, that is possible but hard.
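The cycles-per-byte figures above are easy to sanity-check. A rough sketch follows; the 3 GHz and 5 GHz clock speeds are assumptions picked purely for illustration, not numbers from the article or the comment, and the function name is made up.

```rust
/// Clock cycles available per input byte, given a core clock in Hz and a
/// parsing throughput in MiB/s.
fn cycles_per_byte(clock_hz: f64, throughput_mib_per_s: f64) -> f64 {
    let bytes_per_s = throughput_mib_per_s * 1024.0 * 1024.0;
    clock_hz / bytes_per_s
}

fn main() {
    // Assumed clock speeds, for illustration only.
    for (label, clock_hz) in [("3 GHz", 3.0e9), ("5 GHz", 5.0e9)] {
        for mib_per_s in [25.0, 115.0] {
            println!(
                "{label} at {mib_per_s} MiB/s: about {:.0} cycles per byte",
                cycles_per_byte(clock_hz, mib_per_s)
            );
        }
    }
}
```

At an assumed 3 GHz, 115 MiB/s works out to roughly 25 cycles per byte, which lands inside the laptop range quoted above; the old 25 MiB/s figure is well over 100 cycles per byte at the same clock.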
A quick rule of thumb is that one or two bytes per peak clock cycle per core or so (a bit like an old 8 bit or 16 bit machine!) is the worst case for memory bandwidth when running highly multithreaded workloads that heavily access main RAM outside cache. So there's a lot of gain to be had before memory bandwidth is truly saturated, and even then one can plausibly move to GPU-based compute and speed things up further. (Unified memory+HBM may potentially add a 2x or 3x multiplier to this basic figure, but either way it's in the ballpark.)
high_na_euv 5 hours ago
"for Rust program"?
Isn't it more about the grammar than the prog lang?
writebetterc 4 hours ago
The grammar matters also, of course. A pure Python program is going to be much slower than the equivalent Rust program, just because CPython is so slow.
I don't know if this does semantic analysis of the program as well.
shevy-java 6 hours ago
Anyone using WebAssembly yet? HTML, CSS, JavaScript - all there.
Just about nobody uses WebAssembly. It first appeared almost ten years ago. This is snail-speed evolution at best.
circuit10 just now
This is like saying "HTML, CSS and JavaScript are all widely used, but the webcam capture API is used way less, so obviously it's a failure"
In its current scope, WASM is a way to port existing code or accelerate certain computations, which only some applications need. Most websites don't need it, like how most sites don't need to use webcam capture; that doesn't mean it's not useful for those that do
anonymous908213 6 hours ago
People use wasm for things that need wasm. My use case is my cross-platform game engine, because running both natively and in the browser was a priority for me. It is a wonderful tool and it is a truly magical feeling to see my native games running in the browser. But 99% of web developers are developing ordinary websites, so they don't need it. That's not an indictment of wasm.
miki_oomiri 5 hours ago
You have the wrong understanding about wasm. It's absolutely not supposed to be replacing HTML, CSS or JS.
And yes, wasm is used widely: on the web for expensive computation (Google Earth, Figma, AutoCAD, Unity games) and server side for portability and sandboxing (Cloudflare Workers, Fastly, …)
IshKebab 3 hours ago
It is definitely meant to replace JS in some applications. It isn't quite there yet for normal web pages but it will be eventually. There are a few front-end web frameworks written in Rust that use WASM.
The whole "it's not meant to replace JS" thing was just to reduce pushback from JS devs.
miki_oomiri 1 hour ago
> The whole "it's not meant to replace JS" thing was just to reduce pushback from JS devs.
It was born at the same time as WebGL, at the time of JIT optimisation for JS engines: first as a subset of JS, then as wasm as we know it. It was originally for games and performance on the web.
At no point was there a conversation about "replacing js"; it was more like, "js can't do this stuff, let's have something else".
1. creating plugins that get executed in the browser to render files like Parquet, PSD, TIFF, SQLite, EPS, ZIP, TGZ, GIS related files and many more, where C libraries are almost always the reference implementations. There are almost a hundred supported file formats, most of which are supported through WASM
2. creating plugins that get executed in the server to generate your own endpoint or middleware while being sure you can't start exfiltrating data (which can be other people's files, and other sensitive stuff)
3. in the workflow engine to enable people to run their own sandboxed scripts without giving those a blank check to go crazy
adzm 4 hours ago
I use a wasm xxhash implementation that is 40x faster than the fastest JavaScript version I can find. Drop-in replacement. Call overhead is minimal, and could be better with stringref if that ever becomes available. Some other audio analysis code in wasm that I've been using is 400x faster than the JavaScript implementation, but admittedly I went straight to wasm rather than trying to optimize the JS in that case.
onion2k 4 hours ago
I'm writing a point and click adventure game, and for that I've built a dialogue editor that uses a local text-to-speech model to turn speech into audio that runs in WASM (or WebGPU if it's available).
From what I can tell WASM is mostly being used to run big libraries from other languages in web apps. That's not a particularly common thing to need, so it's not commonly used. That doesn't mean it's moving too slowly.
demaga 5 hours ago
I saw a few web apps that use Rust crates for physics. I guess they must be using wasm?
taminka 5 hours ago
wasm isn't meant to supersede html/css/js (unfortunately), and it's regularly used for high performance applications in the browser: web-based cad software, figma, youtube (i think they use wasm for codec fallback when support is spotty) etc
there are also games, stuff to do with video (ffmpeg built for wasm), ml applications (mlc); in fact it's currently impossible to use wasm w/o js to load the wasm binary
as a result, the web stack is a bit upside down now, w/ the seemingly "low level" and "high performance" parts sitting over the slow bits (javascript)
TkTech 3 hours ago
Sure, here's a Rust/WASM procedural skybox generator I threw together the other day; it's much, much faster at 16k renders than JavaScript. https://tkte.ch/night-sky/
flohofwoe 4 hours ago
WebAssembly is a virtual ISA, not a replacement for HTML and CSS. It was also never meant to kill Javascript (which is actually a pretty nice language if you stick to the 'good parts' via Typescript and linting), but at most as an alternative or complement to JS, and as that WASM works really well.
[1] https://www.youtube.com/watch?v=p6X8BGSrR9w
Yes, tons. Obviously not all, but large parts of these are WASM: https://itch.io/games/platform-web
Tools like Figma are only performant because of WASM.