At Bronto we rely on Apache HBase to power several core features including our real-time Segmentation and Workflow Automation products. These features rely on timely access to consistent views of data that are changing at a high velocity.
HBase offers a blend of scalability, performance, and consistency guarantees that is a good fit for solving these problems at scale. However, like many current-generation NoSQL stores, HBase is still complicated to use for both development and operations teams.
For operations teams, diagnosing performance issues requires deep knowledge of HBase's internal architecture. Developers cannot focus exclusively on application logic; they must also consider rowkey design, column family usage, and schema management tooling.
Solutions exist to address both of these concerns in isolation. Projects like Kiji and Phoenix provide tools for schema management and indexing to make application development easier. On the Ops side, Cloudera Manager and Hannibal make performance monitoring and cluster administration much easier.
Usually, this separation of operational and developmental concerns would be a good thing, but in current versions of HBase, the separation isn’t always practical. Both Ops and Dev teams have to be familiar with both HBase internals and application specific behavior to be effective.
We experienced this first hand recently in our production environment. We began observing large spikes in queues that handle inbound tracking events. Using existing tools, our operations team ultimately traced the bottleneck to several regions of a table that handles our secondary indexes.
A full explanation of secondary indexing is beyond the scope of this post, but I will provide a brief overview to help understand the problem we were having.
HBase natively supports a single index on a primary “rowkey” sorted lexicographically as an uninterpreted byte array. Lookups by dimensions other than primary keys require rolling your own indexing solution*. Common techniques include storing additional fields in the rowkey and using secondary lookup tables.
At Bronto, we use the lookup-table approach, which is simple and provides a high degree of flexibility. When you wish to index a row by additional fields, you write two rows: one under the primary rowkey and another containing the indexed data. The secondary row usually lives in a separate table, with a rowkey built from the index name, the indexed field values, and the primary key of the row being indexed.
To query by a given set of fields, you look up the index row, retrieve the primary key, and read from the main table. It is also possible to scan across an index by name or by the fields within the index.
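The pattern above can be sketched in a few lines. Plain dicts stand in for HBase tables here, and the rowkey layout (index name, then indexed value, then primary key, joined by a separator byte) is an assumption for illustration rather than our exact production format:

```python
# A minimal sketch of the lookup-table indexing pattern. Plain dicts stand in
# for HBase tables; the rowkey layout is an illustrative assumption, not the
# exact format used in production.

main_table = {
    b"contact:123": {b"d:email": b"a@example.com", b"d:status": b"active"},
    b"contact:456": {b"d:email": b"b@example.com", b"d:status": b"inactive"},
}

index_table = {}

def write_index(index_name: bytes, value: bytes, primary_key: bytes) -> None:
    """Write a secondary-index row whose key sorts by index name, then value."""
    index_table[index_name + b"\x00" + value + b"\x00" + primary_key] = primary_key

def lookup(index_name: bytes, value: bytes) -> list:
    """Scan the index by prefix, then fetch each matching row from the main table."""
    prefix = index_name + b"\x00" + value + b"\x00"
    return [main_table[pk] for key, pk in sorted(index_table.items())
            if key.startswith(prefix)]

write_index(b"by_status", b"active", b"contact:123")
write_index(b"by_status", b"inactive", b"contact:456")

rows = lookup(b"by_status", b"active")
```

Because the index rows sort by index name and value, a prefix scan over the index table finds every matching primary key without touching the main table until the final read.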
This is the classic "BigTable"-style index, which scales well and requires neither the collocated regions proposed in HBASE-5229 nor coprocessors as used by Phoenix.
Diagnosis and solution
The slowdowns we were seeing across the cluster were a case of "hotspotting," which occurs when a single region server is bombarded by write operations. Hotspotting is normally associated with monotonically increasing keys, like sequential identifiers or timestamps, but it can also occur when keys share common prefixes**.
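To see why shared prefixes cause hotspotting, consider how HBase assigns keys to regions by contiguous sorted ranges. This toy model uses four hypothetical split points (the values are illustrative only):

```python
import bisect
from collections import Counter

# HBase splits the keyspace into contiguous, sorted ranges (regions). These
# four hypothetical split points are purely illustrative.
region_starts = [b"", b"g", b"n", b"t"]

def region_for(rowkey: bytes) -> int:
    """Return the index of the region whose range contains rowkey."""
    return bisect.bisect_right(region_starts, rowkey) - 1

# One thousand writes whose keys all share the prefix "idx:" sort into the
# same range, so a single region (and region server) absorbs the entire load.
hot_keys = [b"idx:%d" % i for i in range(1000)]
load = Counter(region_for(k) for k in hot_keys)
```

Every one of those writes lands in a single region, which is exactly the pattern we observed.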
As a further complication, the rowkey was actually an encoded, but not hashed, representation of the index key.
Because HBase's lexicographic byte sort can differ from a type's natural sort order, as in the case of signed integers, we use a third-party library that encodes keys in a way that preserves their natural order. Once the index was decoded, we were able to identify the part of our application causing the problem.
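The signed-integer case is easy to demonstrate. A naive big-endian encoding puts negative numbers after positive ones in byte order, while shifting the signed range (equivalent to flipping the sign bit) restores the natural order. This is a sketch of the idea, not the specific library we use:

```python
import struct

# Naive big-endian encoding of a signed 32-bit integer. Negative numbers have
# the high bit set, so their bytes sort AFTER all non-negative numbers.
def naive(n: int) -> bytes:
    return struct.pack(">i", n)

# The standard order-preserving fix: shift the signed range [-2**31, 2**31)
# onto the unsigned range [0, 2**32), i.e. flip the sign bit, so byte order
# matches natural integer order.
def order_preserving(n: int) -> bytes:
    return struct.pack(">I", n + 2**31)

values = [5, -3, 0, -1, 2]
by_bytes = sorted(values, key=naive)               # [0, 2, 5, -3, -1] -- wrong
by_encoded = sorted(values, key=order_preserving)  # [-3, -1, 0, 2, 5] -- natural
```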
It wasn't immediately obvious which index was causing the problem, but after spelunking through our application code we found it: the index rows for every entity were written with the same literal prefix. This particular index was in a hot path, so the shared prefix concentrated a heavy write load on a few region servers.
It turns out the fix was quite simple! Instead of writing the shared prefix as the index name, we used a hash of the index name and the entity id. This still allowed us to perform lookups for a specific entity, but dispersed writes much more evenly across the cluster.
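A hedged sketch of the fix, with illustrative names and layout (not our actual key format): prefix each index row with a short hash of the index name and entity id. Different entities scatter across the keyspace, yet a lookup for a specific (index, entity) pair can recompute the exact same prefix:

```python
import hashlib

# Illustrative sketch only: salt each index rowkey with a short hash of the
# index name and entity id so writes disperse across regions.
def index_key(index_name: bytes, entity_id: bytes, indexed_value: bytes) -> bytes:
    salt = hashlib.md5(index_name + entity_id).digest()[:4]
    return salt + index_name + b"\x00" + entity_id + b"\x00" + indexed_value

# Same entity -> same 4-byte prefix, so point lookups and scans still work.
k1 = index_key(b"by_status", b"entity-42", b"active")
k2 = index_key(b"by_status", b"entity-42", b"inactive")

# A different entity hashes elsewhere, spreading the write load.
k3 = index_key(b"by_status", b"entity-43", b"active")
```

The trade-off is that you can no longer scan across all entities in one pass over the index, but for per-entity lookups the prefix is fully deterministic.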
Ultimately, the above scenario was caused by a software design decision. From the developer's point of view, however, there was no obvious indication that anything was wrong; having indexes share names is a reasonable choice. From an operational standpoint, the hotspotting was obvious, but tracing its cause required knowledge of the services writing to HBase.
I don't think we should return to the relational database model, where an administrator must approve every schema or index change. One aspect of systems like HBase that I like is that they empower (or, depending on your view, burden) developers with the ability to tailor data storage to the needs of their application.
At the same time, it would be absurd to expect developers to know every operational detail of their datastore.
In an attempt to help bridge this gap, we are currently developing a new HBase client that will allow for both general operational and application specific performance monitoring. The current design captures contextual information about the request including the row, region, and table affected, as well as timing metrics.
This contextual information is then asynchronously processed to extract metrics and report them via logs or custom dashboards. In addition to passive metric collection, circuit breaker style patterns can be implemented.
So, in the above scenario, individual regions or even index prefixes could be monitored for spikes in requests or response latency and the appropriate teams could be notified***.
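As a rough sketch of how such client-side monitoring might look, the class name, the prefix heuristic, and the threshold below are all hypothetical; the real client's design is not shown here:

```python
import time
from collections import defaultdict, deque

# Hypothetical sketch of per-request context capture with a simple
# circuit-breaker style check on request rate per key prefix.
class RequestMonitor:
    def __init__(self, limit: int, window_secs: float = 1.0):
        self.limit = limit                # max requests per prefix per window
        self.window = window_secs
        self.events = defaultdict(deque)  # key prefix -> recent timestamps

    def record(self, table: str, rowkey: bytes, duration_ms: float) -> None:
        """Capture per-request context; latency aggregation is omitted here."""
        prefix = rowkey.split(b"\x00")[0]  # e.g. the index-name portion
        now = time.monotonic()
        q = self.events[prefix]
        q.append(now)
        while q and now - q[0] > self.window:  # drop events outside the window
            q.popleft()

    def tripped(self, prefix: bytes) -> bool:
        """Circuit-breaker check: has this prefix exceeded its request budget?"""
        return len(self.events[prefix]) > self.limit

# A burst of writes against a single index prefix trips the breaker.
monitor = RequestMonitor(limit=100)
for i in range(150):
    monitor.record("index", b"by_status\x00entity-%d" % i, duration_ms=1.2)
```

In a real client the `record` hook would run asynchronously off the request path, feeding dashboards or alerting rather than blocking writes.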
Exposing row-level contextual data is particularly useful when rowkeys contain customer identifiers. HBase metrics are, at their most granular, collected per region, but a single region may contain several customers' data. Being able to isolate latency problems or monitor read/write patterns by customer would be useful for both teams as well.
Such a system has several advantages for both Ops and Dev teams. Ops teams can develop standard metrics based on well-defined interfaces to provide to developers for use within their applications. Similarly, developers can create metrics centered on their domain specific concerns without worrying about HBase.
For both teams, deploying new metrics is as simple as redeploying application code; the previous procedure required a cluster-wide restart that could take a few hours to perform.
Of course, the HBase provided metrics will still be invaluable tools, but providing a flexible interface for metric collection will allow for more versatility and customization based on operational needs. In a future post, I will explore the implementation of this new client and metric collection library.
* There are several solutions, some native to HBase like HBASE-5229, and other third-party libraries like Phoenix, that attempt to add the pieces necessary for secondary indexing.

** Having rowkeys sort together is NOT inherently a bad thing. In fact, including customer ids in rowkeys allows us to efficiently answer questions across a single customer's data. In this case, though, we were storing a single index shared by all of our customers instead of one index per customer.

*** It is worth noting that both latency and request counts are exposed in HBase's metrics, but they are limited to a single region, persist only for the life of the region/region server, and, in the case of request counts, are not exposed as a rate.