From the Engineering Frontlines: Rewriting Nested Table

Nested Table is Facet’s main visualization. It offers many features that provide various ways to explore the hierarchical data returned by Druid (an open source, distributed analytics data store). The first version was built during Facet’s early days, back when the product was still known by its internal codename. It had several flaws, the biggest being that it relied heavily on D3 for DOM generation without offering enough abstraction between the high levels of Facet (powered by AngularJS) and the D3 bits themselves. This significantly hurt our ability to refactor the code quickly, to implement new features, or simply to keep the code up to our standards. Software that can’t evolve quickly becomes problematic, and we don’t like problematic things.

Figure: The new Nested Table

Then, one day, the product department came to us with specs for new Nested Table features (cleaner layout, better interactions, updated UI…), and those specs were awesome. Our first instinct was that it would be cleaner to rewrite Nested Table entirely than to keep adding to it, mostly because the new features were hard to squeeze into the existing implementation. Software lives and dies.

Some of the requests on that list would have been easy to implement and wouldn’t have required an entire rewrite. However, other requests, aimed at enhancing the user experience, were tightly coupled to the table’s HTML structure and were thus more problematic. Adding them without rethinking this structure would have been unnecessarily challenging for the developers and unsatisfying for users.

That’s why we chose to rewrite Nested Table entirely. In this post, I’ll discuss the challenges we met along the way: why bidirectional scrolling is complex, how hierarchical data bullies the DOM, and why it’s challenging to use AngularJS in a production environment where 100 billion events are processed every day.

Bidirectional scrolling with pinned columns and rows

Everyone has used Microsoft Excel at least once. Or Google Spreadsheets. Or Apple Numbers. Or, really, any spreadsheet software. They all offer an interesting feature: bidirectional scrolling with pinned columns and rows. This lets users navigate through data without losing the context that usually lives in the header column and row. In our case, that’s the first column and the first row.

Our product already offered bidirectional scrolling with a pinned header row, which is quite easy to implement: two containers inside a third one, with their “overflow” CSS properties set to the right values, do the job fairly well. The result is pretty slick because all the scrolling is handled by the browser. This solution has minor drawbacks, though. The header row will never have a vertical scrollbar, but the body container can (under certain circumstances), which can cause visual inconsistencies. Additionally, the header row container can get a horizontal scrollbar of its own, which needs to be hidden (can you imagine an extra scrollbar between the header row and the body?).

These drawbacks weren’t really a problem since they could easily be worked around. Our main concern was that this solution couldn’t support a pinned column and a pinned row at the same time. That wasn’t possible with pure CSS, but it was still our goal: our customers often need to see the first column of their data alongside a specific metric. I guess that’s the downside of building lightning-fast software: our users aren’t limited in how much data they can query and display!

The Rubber Band Effect

Imagine this: you’re pulling back two rolling chairs, each with its own rubber band. Then you release both rubber bands simultaneously, hoping the two chairs will somehow stop side by side at the same distance from you. Well, this is pretty much what programmatic scrolling is like, and we call it the ‘Rubber Band Effect’. It creates all sorts of visual artifacts, such as sluggishness and containers that don’t end up in sync.

Declaratively speaking, though, it’s a clean solution, thanks to AngularJS’s magic:

    <div class="pinned-column" sync-scroll-y="yScrollValue"></div>
    <div class="pinned-row" sync-scroll-x="xScrollValue"></div>
    <div class="body" sync-scroll-x="xScrollValue" sync-scroll-y="yScrollValue"></div>

Here, “xScrollValue” and “yScrollValue” are two numbers watched by the “sync-scroll-x” and “sync-scroll-y” directives. I won’t go too far down the rabbit hole describing what exactly hides behind those neat custom HTML attributes, simply because we didn’t pick this solution, and because they were quite simple, really: a pinch of “$watch” here, a touch of “element.scrollTop” or “element.scrollLeft” there, and voilà!
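For the curious, here is a minimal, framework-free sketch of what such directives boil down to. It assumes a hand-rolled watcher (the hypothetical `ScrollModel`) standing in for AngularJS’s `$watch`, and plain objects standing in for DOM elements so it can run outside a browser. It illustrates the idea; it is not the original directives.

```typescript
// Only the two scroll offsets of a container matter for this sketch,
// so a plain object can stand in for a DOM element.
interface Scrollable {
  scrollTop: number;
  scrollLeft: number;
}

type Watcher = (value: number) => void;

// Hypothetical shared model playing the role of the scope values
// "xScrollValue" and "yScrollValue" watched by the directives.
class ScrollModel {
  private x = 0;
  private y = 0;
  private readonly xWatchers: Watcher[] = [];
  private readonly yWatchers: Watcher[] = [];

  watchX(fn: Watcher): void {
    this.xWatchers.push(fn);
  }

  watchY(fn: Watcher): void {
    this.yWatchers.push(fn);
  }

  // Like $watch, watchers only fire when the value actually changes.
  setX(value: number): void {
    if (value === this.x) return;
    this.x = value;
    this.xWatchers.forEach((fn) => fn(value));
  }

  setY(value: number): void {
    if (value === this.y) return;
    this.y = value;
    this.yWatchers.forEach((fn) => fn(value));
  }
}

// Counterpart of sync-scroll-x: mirror the shared value into scrollLeft.
function syncScrollX(el: Scrollable, model: ScrollModel): void {
  model.watchX((value) => {
    el.scrollLeft = value;
  });
}

// Counterpart of sync-scroll-y: mirror the shared value into scrollTop.
function syncScrollY(el: Scrollable, model: ScrollModel): void {
  model.watchY((value) => {
    el.scrollTop = value;
  });
}
```

In a browser, the natively scrolled body’s `scroll` handler would call `model.setX(body.scrollLeft)` and `model.setY(body.scrollTop)`. The pinned containers only catch up once that event has been handled, typically after the browser has already moved the body, and that lag is where the Rubber Band Effect comes from.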

The Rubber Band Effect, however, is unacceptable from a user’s perspective. Users expect an effortless experience, and this would have made scrolling very uncomfortable.

It’s worth noting that I discussed this very subject with a friend of mine right in the middle of Nested Table’s development. He happened to be working on the exact same challenge at the time, and he solved it in a very creative way. Instead of scrolling one of the three containers natively and the other two programmatically, he added an extra layer on top of the entire table to handle all the scrolling, leaving the browser unaware of mouse events. This has the major advantage of softening the Rubber Band Effect (every container has the same delay, so it’s impossible to notice unless you’re looking for it), but it also comes with an extra maintenance cost, since every event that occurs on individual rows or cells is intercepted by the top layer. Mouse events (in fact, UIEvents in general) therefore need to be caught and forwarded to their actual targets (cells, most of the time) by computing their coordinates and inferring which cells they happened over.
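The hard part of that approach is the last step: figuring out which cell sits under a given point. Here is a minimal sketch of such a hit test, assuming a constant row height and known column widths, and ignoring the pinned row and column; `GridLayout` and `hitTest` are hypothetical names.

```typescript
// Hypothetical description of the scrollable body's geometry.
interface GridLayout {
  rowHeight: number; // px, assumed identical for every row
  columnWidths: number[]; // px, one entry per column
  rowCount: number;
}

interface CellHit {
  row: number;
  column: number;
}

// Maps a point relative to the overlay's top-left corner (for instance a mouse
// event's offsetX/offsetY on the overlay) to the cell underneath it, taking the
// current scroll offsets into account. Returns null when the point falls
// outside the grid. Pinned rows and columns are deliberately ignored here.
function hitTest(
  layout: GridLayout,
  x: number,
  y: number,
  scrollLeft: number,
  scrollTop: number,
): CellHit | null {
  // Convert viewport coordinates into content coordinates.
  const contentX = x + scrollLeft;
  const contentY = y + scrollTop;
  if (contentX < 0 || contentY < 0) return null;

  const row = Math.floor(contentY / layout.rowHeight);
  if (row >= layout.rowCount) return null;

  // Walk the column boundaries until we find the one containing contentX.
  let left = 0;
  for (let column = 0; column < layout.columnWidths.length; column++) {
    const right = left + layout.columnWidths[column];
    if (contentX < right) return { row, column };
    left = right;
  }
  return null; // past the last column
}
```

Once a cell is found, the overlay could re-dispatch the event on it (for instance with the DOM’s `dispatchEvent`), which is exactly the kind of plumbing that adds to the maintenance cost.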

We, however, chose a different approach: we kept native scrolling on the y-axis and implemented programmatic scrolling on the x-axis. This means that on the y-axis, the browser is in charge of providing a good scrolling experience, while on the x-axis, Nested Table handles the entire scrolling interaction by shifting the content of the existing cells, which mimics horizontal scrolling. We think this solution is a good trade-off between maintainability, development cost, and user experience.
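As a rough sketch of the x-axis half, assume the offset is driven by wheel deltas (`WheelEvent.deltaX`) and applied as a CSS `translateX` transform. The post doesn’t say exactly how the content is shifted, and `HorizontalScroller` is a hypothetical name.

```typescript
// Hypothetical controller for the x-axis: the browser never scrolls the cells
// horizontally; instead we keep our own offset and shift the cells' content.
class HorizontalScroller {
  private offset = 0;
  private readonly contentWidth: number; // px, full width of all scrollable columns
  private readonly viewportWidth: number; // px, visible width of the scrollable area

  constructor(contentWidth: number, viewportWidth: number) {
    this.contentWidth = contentWidth;
    this.viewportWidth = viewportWidth;
  }

  get maxOffset(): number {
    return Math.max(0, this.contentWidth - this.viewportWidth);
  }

  // Apply a horizontal delta (e.g. WheelEvent.deltaX) and clamp it so the
  // content can't be scrolled past either edge. Returns the new offset.
  scrollBy(deltaX: number): number {
    this.offset = Math.min(this.maxOffset, Math.max(0, this.offset + deltaX));
    return this.offset;
  }

  // CSS transform to apply to the shifted content so it appears
  // scrolled by the current offset.
  transform(): string {
    return `translateX(${-this.offset}px)`;
  }
}
```

Only the horizontal offset is owned by JavaScript, so vertical scrolling keeps the browser’s native feel, and the pinned first column simply never receives the transform.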

The difficulties of displaying hierarchical data

The name ‘Nested Table’ comes from the fact that it was built to display nested (hierarchical) data. But what is nested data? Here’s an example of flat data:

- United Kingdom
- France
- Germany

And here’s an example of nested data:

- United Kingdom
  - Apples
  - Pears
- France
  - Bananas
  - Apples
- Germany
  - Apples
  - Apples (but the small ones with the sweet taste towards the end)

This example could be the answer to the following questions: what are the top three countries where fruits are consumed on a daily basis? Amongst those three countries, what are the top two fruits?

Now that this central concept is defined, let’s imagine how we could display such data. Of course, the nesting depth (the number of levels) is arbitrary. Where would the fun be otherwise? One solution is to shape the UI components like the nested data itself, following a fractal pattern: a row component that renders its children with that same row component. This is called recursion.
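To make the “fractal” idea concrete, here is a small recursive renderer. It is a hypothetical sketch that renders to indented text lines instead of DOM nodes so it can run anywhere; `NestedRow` and `renderRows` are illustrative names, not Facet’s.

```typescript
// A node of hierarchical data: a label plus optional children of the same shape.
interface NestedRow {
  label: string;
  children?: NestedRow[];
}

// Recursive rendering: each row renders itself, then hands its children
// to the very same function one level deeper (the "fractal" pattern).
function renderRows(rows: NestedRow[], depth = 0): string[] {
  const lines: string[] = [];
  for (const row of rows) {
    lines.push("  ".repeat(depth) + row.label);
    lines.push(...renderRows(row.children ?? [], depth + 1));
  }
  return lines;
}

const data: NestedRow[] = [
  { label: "United Kingdom", children: [{ label: "Apples" }, { label: "Pears" }] },
  { label: "France", children: [{ label: "Bananas" }, { label: "Apples" }] },
];

console.log(renderRows(data).join("\n"));
```

Each country is printed at the left margin, followed by its fruits indented one level deeper; a third level of nesting would need no extra code.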

Recursive directives with AngularJS

A directive usually transforms a custom HTML tag (say, “<nested-table-row></nested-table-row>”) into a valid one (<div class="nested-table-row"></div>), among other things. This “rendering” happens pretty early in the application’s lifecycle and is done once per directive, regardless of how many times it appears. This leads to a problem in our case because of the way AngularJS crawls the DOM to find new directives it can “precompile” at startup. Here’s how it goes:

  1. AngularJS finds a new custom tag that matches one of its known directives
  2. It starts to precompile it and crawls the directive’s inner content
  3. Back to step 1, because the directive’s template contains the directive itself

Boom! Maximum call stack size exceeded. There are ways to circumvent this, and some of them are actually pretty elegant. But they all share a major flaw: performance. That’s why we chose a different solution.

Flattening data once and for all

This solution offers better performance, though it’s also less elegant, I admit. I personally love recursion, but it’s hard to read and understand, especially when the reader is not the writer. And when writing professional code, well, maintainability has to be the #1 priority. On a side note, I’d say that achieving something in the fewest possible lines of code isn’t elegance; it’s obfuscation.

We decided to flatten the data as soon as it came back from the server, adding a “level” property to each row so the nesting could still be rendered visually. Writing an AngularJS directive to display the resulting flat list was then pretty simple.
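The post doesn’t show the flattening code, so here is one possible sketch under stated assumptions: the server response is a tree of `NestedRow` objects (a hypothetical shape), and flattening is a depth-first, pre-order traversal that records each row’s `level`. It uses an explicit stack instead of recursion, in keeping with the “flatten once and for all” approach.

```typescript
interface NestedRow {
  label: string;
  children?: NestedRow[];
}

// Flat row consumed by a non-recursive directive: "level" records how deep
// the row was in the original tree so it can be indented when displayed.
interface FlatRow {
  label: string;
  level: number;
}

// Depth-first, pre-order traversal: a parent is always emitted right before
// its children, so the flat list reads in the same order as the nested one.
function flatten(rows: NestedRow[]): FlatRow[] {
  const result: FlatRow[] = [];
  // Push in reverse so the first row is popped (and emitted) first.
  const stack: Array<{ row: NestedRow; level: number }> = rows
    .map((row) => ({ row, level: 0 }))
    .reverse();
  while (stack.length > 0) {
    const { row, level } = stack.pop()!;
    result.push({ label: row.label, level });
    const children = row.children ?? [];
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push({ row: children[i], level: level + 1 });
    }
  }
  return result;
}

const flat = flatten([
  { label: "France", children: [{ label: "Bananas" }, { label: "Apples" }] },
]);
console.log(flat.map((r) => `${r.level}:${r.label}`).join(", "));
// → 0:France, 1:Bananas, 1:Apples
```

The directive then only has to iterate over a flat array and indent each row according to its `level`.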

Measuring the performance difference that drove our choice took some prototyping, because the choice depends heavily on several aspects of Facet: its nature (a solution built to explore high-dimensional data), the data volume it handles, and how fast it’s supposed to handle it. This solution wouldn’t necessarily apply in other cases, such as those that involve less data or a different web framework.

Performance in the browser

One thing our customers like most about our product (and I’m talking about the entire stack here) is its speed. Being able to gather insights about their data in seconds, on operations that would otherwise take minutes or hours, is invaluable. That’s why we decided to focus on speed and maintainability rather than hacking our way to the result.

The tale of the lone CPU

One day, while writing the new Nested Table, I noticed something odd: out of my laptop’s four cores, one was consistently 100% busy. If I’ve learned anything during my short career, it’s that a busy CPU on a seemingly idle computer is a bit like having to swim harder and harder just to stay close to the beach: a bad sign.

I investigated various possible causes and finally found the culprit: the little loading animation I had crafted specifically for the new Nested Table. This neat CSS-based animation kept being rendered even when it was hidden. CSS is often overlooked when analyzing a web page’s overall performance, when in fact it can heavily impact it. Some properties are cheap to animate (transform, opacity, etc.) because the browser can handle them at the compositing stage without recomputing layout; others aren’t (color, or properties that affect layout such as padding and margin, etc.). Additionally, CSS rules are applied according to a specific order of precedence (remember, CSS stands for Cascading Style Sheets), and selectors that are too vague or too broad can significantly slow down rendering in large web applications.

I ended up removing the new loader and going back to the old one, which was already doing a great job.

Rendering many elements

While there isn’t a limit, per se, on how many elements a browser can render (at least none defined by the standards), computers have limits: limits in addressable memory, in processing power, and in display size.

Nested Table mostly renders numeric values. At first sight, one might assume that each cell contains exactly one div, which couldn’t be further from the truth. Each value needs to be formatted following specific rules so that users can quickly identify significant values, regardless of, say, the number of leading zeros. Each cell therefore contains at least:

  • a sign (+ or -)
  • a prefix ($, in the vast majority of cases)
  • an insignificant part
  • a significant part
  • another insignificant part
  • some padding to respect the global length of values
  • units/suffix (K, M, %, etc.)
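To illustrate how a single value could be split into those parts, here is a hypothetical sketch. The actual formatting rules aren’t described here, so this one assumes that the integer part is zero-padded to a fixed number of digits, that leading and trailing zeros are the insignificant parts, and that padding brings every value to the same width; `splitValue` and its options are illustrative names.

```typescript
// One field per span rendered in a cell.
interface ValueParts {
  sign: string; // "+" or "-"
  prefix: string; // e.g. "$"
  leading: string; // insignificant leading zeros
  significant: string; // digits that carry the information
  trailing: string; // insignificant trailing zeros (and a dangling ".")
  padding: string; // spaces so every value in a column has the same length
  suffix: string; // unit or scale, e.g. "K", "M", "%"
}

// Hypothetical formatting options; the real rules are not specified.
interface FormatOptions {
  prefix: string;
  suffix: string;
  intDigits: number; // integer part is zero-padded to this many digits
  decimals: number; // fixed number of decimal places
  width: number; // total length every formatted value should reach
}

function splitValue(value: number, opts: FormatOptions): ValueParts {
  const sign = value < 0 ? "-" : "+";
  const parts = Math.abs(value).toFixed(opts.decimals).split(".");
  const paddedInt = parts[0].padStart(opts.intDigits, "0");
  const digits = parts.length > 1 ? `${paddedInt}.${parts[1]}` : paddedInt;

  // Everything before the first non-zero digit is insignificant.
  const leading = /^[0.]*/.exec(digits)![0];
  let significant = digits.slice(leading.length);

  // In the decimal part, trailing zeros (and a dangling ".") are insignificant.
  let trailing = "";
  if (significant.includes(".")) {
    trailing = /\.?0*$/.exec(significant)![0];
    significant = significant.slice(0, significant.length - trailing.length);
  }

  const used = sign.length + opts.prefix.length + digits.length + opts.suffix.length;
  const padding = " ".repeat(Math.max(0, opts.width - used));
  return { sign, prefix: opts.prefix, leading, significant, trailing, padding, suffix: opts.suffix };
}
```

Each of the seven fields would map to one span in the cell, so concatenating them in order yields the full formatted value.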

Figure: Significance-based value formatting

Each of those parts is rendered as its own span, which multiplies the total number of elements by seven. When comparing one period to another, we also triple the number of columns (see the screenshot below).

Figure: Colors help quickly identify trends

Let’s do the math for one hundred rows and five metrics: 100 × 7 × 5 × 3 = 10,500 elements. We quickly needed to reduce this number, or at least mitigate its consequences. We created an AngularJS directive called “virtual-scroll” with the following “interface”:

    <div class="container" virtual-scroll="rows:rowsSubset:200"></div>

This directive listens to the element it’s attached to and repopulates the rows subset whenever the container is scrolled or resized. This way, the rows subset always contains a reduced list of items (capped by the directive’s third parameter, here 200). To preserve the scrolling experience, the directive also adds two buffers to the element it’s attached to: one as the first child and the other as the last child. These two buffers are sized according to the number of rows the directive hides above and below the rendered subset. In a nutshell, we keep native scrolling (the overall height stays the same) and get better performance (fewer elements).
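The core of such a directive is a small computation: given the scroll position, which rows should be rendered, and how tall should the two buffers be? Here is a minimal sketch, assuming every row has the same height; `computeWindow` is a hypothetical helper, not the directive’s actual code.

```typescript
// Which rows to render, and how tall the two spacer ("buffer") elements must be.
interface VirtualWindow {
  start: number; // index of the first rendered row (inclusive)
  end: number; // index after the last rendered row (exclusive)
  topBuffer: number; // px height of the spacer placed before the rendered rows
  bottomBuffer: number; // px height of the spacer placed after the rendered rows
}

// Assumes every row has the same height and that maxRendered (the directive's
// third parameter, e.g. 200) is at least the number of rows that fit on screen.
function computeWindow(
  totalRows: number,
  rowHeight: number,
  scrollTop: number,
  viewportHeight: number,
  maxRendered: number,
): VirtualWindow {
  const count = Math.min(maxRendered, totalRows);
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const visibleCount = Math.ceil(viewportHeight / rowHeight);

  // Spread the rows we can afford beyond the visible ones evenly above and
  // below, so small scrolls in either direction don't reveal a bare spacer.
  const spare = Math.max(0, count - visibleCount);
  const unclamped = firstVisible - Math.floor(spare / 2);
  const start = Math.max(0, Math.min(unclamped, totalRows - count));
  const end = start + count;

  return {
    start,
    end,
    topBuffer: start * rowHeight,
    bottomBuffer: (totalRows - end) * rowHeight,
  };
}
```

The directive would then expose `rows.slice(start, end)` as `rowsSubset` and size the two buffers with `topBuffer` and `bottomBuffer`. Because the top buffer, the rendered rows, and the bottom buffer always add up to the full height of `rows`, the native scrollbar behaves as if every row were in the DOM.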

Figure: Difference between native scrolling and virtual scrolling

Conclusion

Implementing a versatile tool to visualize complex data was a technical challenge, especially because it wasn’t just about hacking our way to the result. At Metamarkets, we always pick performance and sturdiness over ease of implementation. That’s also why Nested Table is heavily tested, a topic we didn’t cover here but one that deserves a blog post of its own.

So far, our users have been delighted with the new Nested Table. That’s what UI programming is about at the end of the day: building things users trust and love to use.

If you’re interested in learning more or want to work on similar problems, drop us a line. We’d love to hear from you.
