Resumes

Soo … I decided to update my resume today before I forget all of the things I did at Medallia. I really didn’t realize just how many initiatives/projects I’ve thrown energy into, whether they were driven by excited naivety, careful planning/design, or frustration at a lack of change. Some of these experiences had quite a bit of history: from a lack of foresight in the initial design, to competing priorities once we recognized the problem, to solving the problem too late. Taking each of these complex experiences and reducing them to one line feels … wrong.

I’ll give an actual example of such a project: an initial product design flaw that took ~2 years to fix and didn’t quite have a happy resolution. I called it “Combined Sentiment”. If this is IP-sensitive, I’m sure somebody will let me know. For context, our text analytics solution does text summarization via topics and sentiment. There was a particular product constraint that led to this fun issue:

  • When we display counts of topics or sentiments, we show # of unique pieces of feedback containing that topic/sentiment, NOT # of unique mentions of that particular topic/sentiment

If we counted # of unique mentions, somebody who writes a novel on how bad the bathroom was at a hotel could sway the counts significantly. We did the actual numeric comparison, and certain volatile topics were swayed heavily (some people just love to complain).
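To make the distinction concrete, here’s a rough sketch of the two ways of counting. The names (Feedback, Mention, etc.) are made up for illustration and have nothing to do with our actual data model:

```java
import java.util.List;

// Hypothetical model: one piece of feedback can mention the same topic many times.
record Mention(String topic, String sentiment) {}
record Feedback(String id, List<Mention> mentions) {}

class TopicCounts {
    // Mention-level counting: one long rant about bathrooms inflates the count.
    static long mentionCount(List<Feedback> allFeedback, String topic) {
        return allFeedback.stream()
                .flatMap(f -> f.mentions().stream())
                .filter(m -> m.topic().equals(topic))
                .count();
    }

    // Feedback-level counting (what we actually displayed): each piece of
    // feedback contributes at most 1, no matter how much it says about the topic.
    static long feedbackCount(List<Feedback> allFeedback, String topic) {
        return allFeedback.stream()
                .filter(f -> f.mentions().stream()
                        .anyMatch(m -> m.topic().equals(topic)))
                .count();
    }
}
```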

So, we ended up showing a dashboard which included counts for a topic and counts broken down by each sentiment.

Our astute reader may have realized that in a single piece of feedback, it’s possible that a particular topic is mentioned both positively and negatively. E.g. “The bathroom smelled fantastic, but the sink was far too shallow; just like this contrived complaint”. This means that if 10 pieces of feedback mention bathrooms, it’s possible that 8 are positive and 7 are negative!

This happened rarely, but the amount of overlap could seem dramatic for less popular topics. You would not believe the amount of confusion this caused. More importantly, it caused users to doubt our system because they couldn’t understand why the topic + sentiment counts added up to more than just the topic count.

To fix this, the obvious thing to do is to create a separate sentiment called mixed opinion, so that each piece of feedback lands in exactly one bucket and the counts in the earlier example become 3 positive, 5 mixed, and 2 negative, summing back to 10.
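In other words, within a single piece of feedback a topic gets rolled up to exactly one of positive/negative/mixed before anything is counted, so the sentiment breakdown sums back to the topic count. A minimal sketch of that rollup, reusing the made-up Feedback/Mention types from the earlier sketch (again, not our real implementation):

```java
import java.util.Optional;

enum Sentiment { POSITIVE, NEGATIVE, MIXED }

class SentimentRollup {
    // Roll up all mentions of a topic within ONE piece of feedback to a single
    // sentiment, so each piece of feedback lands in exactly one bucket.
    static Optional<Sentiment> rollUp(Feedback f, String topic) {
        boolean hasPositive = f.mentions().stream().anyMatch(
                m -> m.topic().equals(topic) && m.sentiment().equals("positive"));
        boolean hasNegative = f.mentions().stream().anyMatch(
                m -> m.topic().equals(topic) && m.sentiment().equals("negative"));
        if (hasPositive && hasNegative) return Optional.of(Sentiment.MIXED);
        if (hasPositive) return Optional.of(Sentiment.POSITIVE);
        if (hasNegative) return Optional.of(Sentiment.NEGATIVE);
        return Optional.empty(); // topic not mentioned in this piece of feedback
    }
}
```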

This wasn’t trivial to fix, though, because of a few complexities:

  • our topics are hierarchical, so what happens if you have topic A with child topics B and C? If a comment mentions B negatively and C positively, A would have to count as mixed
  • our topics had role-level visibility, so if you can’t see topic C, then A should really count as negative in the above scenario (see the sketch after this list)
  • a piece of feedback can have multiple comments and users can filter to only view data from a subset of comments; now, what if B matched comment1 and C matched comment2, and you query comment1 for A?
  • our in-house OLAP engine had some fun limitations at the time (which is worth a whole different blog post)
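To give a flavor of the first two complexities: a parent topic’s sentiment has to be derived from whichever descendants the viewer is actually allowed to see. A heavily simplified, hypothetical sketch, building on the types above (the real visibility rules lived elsewhere):

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical hierarchical topic node.
record TopicNode(String name, List<TopicNode> children) {

    // Gather the rolled-up sentiments of this topic and every descendant
    // that the viewing role is allowed to see.
    Set<Sentiment> visibleSentiments(Feedback f, Set<String> visibleTopics) {
        Set<Sentiment> result = EnumSet.noneOf(Sentiment.class);
        if (visibleTopics.contains(name)) {
            SentimentRollup.rollUp(f, name).ifPresent(result::add);
        }
        for (TopicNode child : children) {
            result.addAll(child.visibleSentiments(f, visibleTopics));
        }
        return result;
    }

    // A parent counts as mixed only if its *visible* subtree carries both
    // polarities; e.g. if topic C is hidden from this role, A rolls up as negative.
    Optional<Sentiment> rollUpForViewer(Feedback f, Set<String> visibleTopics) {
        Set<Sentiment> s = visibleSentiments(f, visibleTopics);
        boolean positive = s.contains(Sentiment.POSITIVE) || s.contains(Sentiment.MIXED);
        boolean negative = s.contains(Sentiment.NEGATIVE) || s.contains(Sentiment.MIXED);
        if (positive && negative) return Optional.of(Sentiment.MIXED);
        if (positive) return Optional.of(Sentiment.POSITIVE);
        if (negative) return Optional.of(Sentiment.NEGATIVE);
        return Optional.empty();
    }
}
```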

In the end, we deprioritized this because:

  • the fix for this was quite costly; perf investigations showed a 10% increase in cache sizes (~600 GB for our largest client at the time, and we were already nearing single-JVM limits)
  • the fix was super risky; it would have required a full re-write of our queries across ~3 static reports and our entire dynamic reporting framework (hard to keep all the numbers consistent across them, plus possibly degraded query performance; the latter ended up not being true, but that was not easy to verify since we didn’t have a rigorous performance testing environment … different story)
  • an easier fix was to improve product documentation and training for our clients

Finally, one of our early adopters threatened to discard our product if we didn’t “fix the bug”. I ended up spending a December re-writing our data set loading logic and query layers, and doing plenty of automated/manual/performance testing. We released it to them with a feature flag (the “combined sentiment” feature) on January 1st and they were pacified.

It was in engineering’s interest to enable this flag for all clients to simplify maintenance, debuggability, and testability, since it added an extra layer of logic to every text analytics query in the system. However, half of our clients were already used to the way the counts were shown previously by this point and did not want to switch on “combined sentiment”. This is understandable, since each client would have had to retrain thousands of users on how a key report in the system worked. The cost to us, though, was that unit, functional, and user testing for product changes that rely on counts from reporting needed to go through two rounds of testing: once with and once without “combined sentiment”.

We could of course mark this as technical debt, ensure that no new clients were given this solution, and slowly reduce the set of customers using it. In fact, this was one of the key motivations for writing debt tracking into Curiosity; but that’s a subject for another post.

Lots of lessons learned from this one! There were many things we did right here: iterating with customers as we built the product, arriving at a solution that solved 90% of the users’ needs/problems, and staying aware of our customers’ concerns. In this case, the core issue was that our product needed to be easier to understand; the difficulty was in deciding who was responsible for fixing it: engineering (rewriting the feature) or product support (explaining how it worked). We went with option 2 for as long as we could before switching to option 1. Was that the right choice? Believe me, there were a lot of other tasks competing for priority with this feature, and had things turned out differently, the “fix” could have created inconsistent results and left the clients really upset with us; or, even if it succeeded, it could have been yet another one-off project introducing technical debt to solve one client’s needs (which I abhor).

In hindsight though, I think this was one of the key tasks that ended up regaining our customers’ trust in our product. It was also a major aid in helping servicing and sales folks regain trust in engineering. Both were necessary to survive in enterprise SaaS.

Some stories really can’t be summarized in a one-liner on a resume. This is why I dislike using resumes to filter candidates; making a decision that can affect a person’s entire career without even listening to their story seems shallow? superficial? disingenuous?

