D3 Tooltips with Interrupted Transitions

Posted on 16 January 2015 by Andrei Taraschuk

At Mobile System 7 we make extensive use of the D3 JavaScript library in all of our data visualization components. Whenever a complex data structure is depicted, we use tooltips to provide additional context. In this post I show how to build a simple tooltip with fade transitions. Let's get started!

First, create a tooltip element, give it absolute positioning so it can later be placed at the mouse coordinates, and hide it by default (opacity = 0).

var tooltip = d3.select('body').append('div')
    .attr('class', 'my-tooltip-class')
    .style('position', 'absolute') // required for left/top positioning
    .style('opacity', 0);

Next, wire in mouse events to show and hide the tooltip. Note that we are using transition.delay() to prevent the tooltip from appearing immediately on mouseover.

.on('mouseover', function (d) {
    tooltip.style('opacity', 0); // reset any partial fade
    tooltip.transition().delay(300).duration(500).style('opacity', 0.9);
})

Lastly, in the 'mouseout' event we need to interrupt (remove) any existing transitions and delays in the queue, then hide the tooltip. Keep in mind that unless the pending transition is interrupted, the tooltip will still fade in after the delay, even though the mouse has already left the element.

.on('mouseout', function () {
    // interrupt() cancels the active transition; the empty transition()
    // that follows supersedes any delayed transition still in the queue
    tooltip.interrupt().transition();
    tooltip.style('opacity', 0);
});

Here's the complete code:

var tooltip = d3.select("body").append("div")
        .attr("class", "my-tooltip-class")
        .style("position", "absolute") // required for left/top positioning
        .style("opacity", 0),

    svg = d3.select("body").append("svg")
        .attr("width", 100)
        .attr("height", 100)
        .append("g").attr("transform", "translate(0,0)");

svg.append("rect")
    .attr("x", 0).attr("y", 0)
    .attr("width", 100).attr("height", 100).attr("fill", "red")
    .on("mouseout", function () {
        // interrupt the active transition and supersede any delayed one,
        // then hide the tooltip immediately
        tooltip.interrupt().transition();
        tooltip.style("opacity", 0);
    })
    .on("mouseover", function () {
        tooltip.style("opacity", 0);

        // d3.event's pageX and pageY properties position the tooltip
        // at the mouse cursor
        tooltip.html("Hello world")
            .style("left", d3.event.pageX + "px")
            .style("top", (d3.event.pageY - 25) + "px");

        tooltip.transition().delay(300).duration(500).style("opacity", 0.9);
    });

Murderous Twins and the Two Faces of Identity

Posted on 16 December 2014 by Joseph Turner

How do we prove our identity? What makes us unique, and how can we display that uniqueness to others? As animals, our physical bodies and our genetic makeup offer one answer, but it is potentially fallible: what about identical twins? If two identical twins went on a camping trip and only one returned, how could we tell which twin it was?1 And beyond identical twins, biometric identity verification would be awkward to implement uniformly at scale, and it poses privacy issues.

A second possibility is our knowledge, or more broadly, our possessions. This is what we use most often in our daily lives. We bring our Social Security card in to get a driver's license, then we use that driver's license to open a bank account. We use the credit card we are issued to make a purchase. When we call the bank, we recount our last three purchases to prove we are indeed the person named on the account. Though much more practical, this too is fallible, as evidenced by a never-ending rash of social engineering attacks and stolen bank information. To return to our evil twin, perhaps you could quiz her on some details of her supposed life. This would only smoke her out in the case where she hadn't done her research.

How do we catch this devilish Mary Kate (or was it Ashley)? The evil twin returns to "her" life, with a husband and two kids. After a couple of weeks, she returns to work. She seems different at first, but that's to be expected after such a traumatic experience. After a while, though, it starts to get weird. She quits the book club she loved so much. She can't remember her inside jokes with her husband. The songs she sings her kids at night are different songs. With every day, her real identity becomes more apparent: she isn't who she says she is.

And this brings us to our third indicator of our identities: our actions. In everything we do we display facets of our accumulated knowledge, experience, and idiosyncrasies. Individually, these activities prove nothing, but combined they define our identity in a way that is very hard to fool. Though she might hold up to our initial scrutiny with enough information, it would be basically impossible for her to act consistently like her sister all day, every day.

The two aspects of online identity

Online, without physical form, we are all identical twins. With no physical differentiation, our identities live solely in the context of our knowledge and our possessions. As a consequence, our online identities have two aspects: identity as resource and identity as actor2.

Identity as resource

By passing an identity challenge, whether it's entering your password, entering a texted authentication code, or copying the numbers from your RSA token, you are granted your identity resource. The resource allows the holder to perform the actions granted to that identity. This process is called authentication.

Historically, much effort has gone into building the fortifications surrounding the identity resource. In the recent past (and for many, still to this day), enterprises kept all resources behind the firewall, requiring physical presence (or the secret handshake of a VPN) to even get the opportunity to try to prove your identity.

The move to the cloud has made limiting access in this way much trickier. Without physically limited access, much work has gone into improving the challenge itself. Two-factor authentication and related technologies are modern-day attempts to make access to the identity resource more selective. Unfortunately, they also make it harder to make legitimate identity claims. Though it's easy to gloss over, this is a serious barrier to usability. Like the firewall that came before, it seems like a trade-off many companies are willing to make. Also like the firewall, it isn't foolproof.

Identity as actor

We need more. We need the online equivalent of the concerned husband, the suspicious coworkers, and the irate book club. We need insight into the second face of identity online, one that has been ignored too long: identity as actor.

As in real life, our actions online create and define our identities, as unique as our fingerprints. Like our evil twin, an attacker who has passed the challenge and acquired the identity resource must act consistently with their assumed identity or risk detection - assuming, that is, that the actions are being observed and the differences are being noticed.

The problem is, of course, that it is really tough to monitor our online identities! We take actions across a range of services, some of which in fact use different identity resources, often using several services at the same time. In general, these services are not controlled by the same party, and our actions on a given service are not visible to other identities or other services. As a result, it can be very difficult to get a clear picture of the actions taken by an identity online.

Ultimately, we need protection mechanisms around both faces of our online identities. We need adaptive authentication to provide safety around the identity resource without breaking workflows, introducing confusion, or crippling usability. We also need identity analytics to notice an attacker's errant behavior, and adaptive access control to respond to our suspicions as they grow stronger.

With Interlock we at Mobile System 7 believe that we are providing the first unified platform for all three of the above protections, across services, and across identities. You can be confident that no evil twin is stealing your identity, because we'll be there to notice when they don't show up at book club or forget to laugh at the inside jokes. Metaphorically speaking, of course.

Have questions or comments? Hit me up on Twitter. Interested in changing the face of identity with us? Send me your resume.


1: It turns out that identical twins have different fingerprints, but have you ever had your fingerprints taken? I haven't.

2: Why is that a consequence of our online presence? In the real world, after we have proven our identities, say by showing our driver's license, our faces become a token of our identities. If we open a bank account today, the teller will remember our face tomorrow, obviating the need for a second challenge. Online, we have a similar capability: cookies. However, as we move across devices we don't carry our cookies with us, unlike our faces. As a result, online our proof of identity is required semi-regularly before we can take action.

Testing Java Code With JRuby

Posted on 01 December 2014 by Andrew Semprebon

Here at Mobile System 7, most of our code is Ruby, but we do have a small amount of Java code in places where deploying Ruby would be inappropriate. However, just because we have to write some code in Java doesn't mean we need to use Java to test it.

There are a number of good reasons for using Ruby to test Java code:

  1. Ruby is more concise - less code is better.
  2. Ruby testing frameworks like MiniTest are more readable, describing tests in actual English ratherThanCamelcaseIdentifiers.
  3. We can use Ruby's dynamic debugging tools, like pry.
  4. We have more experience with Ruby and Ruby test frameworks than with Java/JUnit.

As an example, we will develop a Java class that summarizes counts of objects in a collection, returning a Map with the unique objects from the collection as keys and their counts as values. For example:

List authors = java.util.Arrays.asList("Mann", "Jones", "Trenberth", "Mann");
new Counter().count(authors).get("Mann"); // returns 2

This is basically equivalent to the following Ruby code:

authors = ["Mann", "Jones", "Trenberth", "Mann"]
counts = authors.inject({}) { |result, item| result.merge(item => (result[item] || 0) + 1) }
counts["Mann"]  # returns 2

Setup

At its simplest, you just need to have JRuby installed. The standard Ruby library includes the MiniTest framework, which provides several different testing styles. Since our other Ruby tests are written with RSpec, we use the spec-style MiniTest::Spec framework.

Writing the First Test

We start with the Ruby test, which looks like this:

require 'minitest/autorun'
require "java"

$CLASSPATH << "classes"
java_import Java::ComMs7::Counter

describe Counter do

  let(:counter) { Counter.new }

  describe ".count" do
    it "returns empty hash for empty collection" do
      counter.count([]).must_equal({})
    end
  end
end

The first few lines load MiniTest and JRuby's Java integration, add the directory where our Java class files are stored to the JVM classpath, and import our Counter class so we don't have to specify the package each time. One thing to note here is the way Java packages are translated into Ruby modules: the dotted package name is camel-cased to form a submodule name in JRuby's Java module, so com.ms7 becomes ComMs7.
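
As a quick illustration of the convention, using our class and a JDK class that appears again later in this post:

# The dotted, lowercase package name is camel-cased beneath JRuby's
# top-level Java module:
Java::ComMs7::Counter    # com.ms7.Counter   (our class)
Java::JavaUtil::HashMap  # java.util.HashMap (a JDK class)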

Then, we specify the class being tested ("describe Counter") as well as the method (".count"), and then a test that verifies that the method returns an empty hash if an empty array is passed in. By starting with a simple test like this, we can verify that all our setup is working correctly.

Writing the Counter Java class

Running this just results in a class not found error, so let's create the Java class with just enough implementation to get the test to pass:

package com.ms7;

import java.util.Map;
import java.util.HashMap;

public class Counter {

  public Map<Object, Integer> count(Iterable<Object> items) {
    return new HashMap<Object, Integer>();
  }
}

For now, this just returns an empty hash. When we run this, we get:

$ ruby test/counter_test.rb
Run options: --seed 9924

# Running tests:

.

Finished tests in 0.018000s, 55.5556 tests/s, 55.5556 assertions/s.

1 tests, 1 assertions, 0 failures, 0 errors, 0 skips
$

This tells us that the test passed.

Another Test

Let's add a more substantial test:

    it "returns count of 2 when given collection of 2 like items" do
      collection = ["Mann", "Mann"]
      counter.count(collection).must_equal({ "Mann" => 2 })
    end

Here, we create an array of two identical strings, and expect a Map (hash) with a count of 2. As expected, the test fails:

test_0001_returns hash with count(Java::ComMs7::Counter::count::with collection of 2 like items) [test/counter_test.rb:25]:
Expected: {"Mann"=>2}
  Actual: {}

Actually Computing the Count and Things Go Awry

Now we modify the count method to actually count things:

  public Map<Object, Integer> count(Iterable<Object> items) {
    HashMap<Object, Integer> result = new HashMap<Object, Integer>();
    for(Object item: items) {
      int count = (result.containsKey(item)) ? result.get(item) + 1 : 1;
      result.put(item, count);
    }
    return result;
  }

Looks good, right? Unfortunately, when we run the test again, it still fails:

No visible difference in the Hash#inspect output.
You should look at the implementation of #== on Hash or its members.
{"Mann"=>2}

Well, that's odd. It looks the same but isn't actually equal.

Using Pry

To figure this out, we are going to use pry, a Ruby debugging utility. First, we install it using the Ruby gems package manager:

gem install pry

And then modify our test:

require "pry"
...
    it "returns count of 2 when given collection of 2 like items" do
      collection = ["Mann", "Mann"]
      binding.pry
      counter.count(collection).must_equal({ "Mann" => 2 })
    end

The idea here is that when we get to that "binding.pry" line, we will be dropped into an interactive Ruby session where we can poke at the values of things:

From: /Users/semprebon/Library/JRuby/jruby-1.7.2/lib/ruby/gems/shared/gems/minitest-4.5.0/lib/minitest/unit.rb @ line 1318 Java::ComMs7::Counter::count::with collection of 2 like items#test_0001_returns hash with count:
...
[1] pry(#<Java::ComMs7::Counter::count>)>

OK, let's poke around:

[1] pry(#<Java::ComMs7::Counter::count>)> collection
=> ["Mann", "Mann"]
[2] pry(#<Java::ComMs7::Counter::count>)> counter.count(collection)
=> {"Mann"=>2}

So far, everything looks good. Let's try comparing the result with what we expect:

[3] pry(#<Java::ComMs7::Counter::count>)> counter.count(collection) == {"Mann"=>2}
=> false

Ah, yeah, that should be true. Maybe we are comparing different classes?

[4] pry(#<Java::ComMs7::Counter::count>)> counter.count(collection).class
=> Java::JavaUtil::HashMap

Yeah, that's the problem. The Java class is returning a Java HashMap object, but we are comparing it to a Ruby Hash object. Fortunately, we can convert the Java HashMap into a Ruby Hash with the to_hash method. Let's see if that works:

[5] pry(#<Java::ComMs7::Counter::count>)> counter.count(collection).to_hash == {"Mann"=>2}
=> true

Great, we exit out of pry and modify the test code:

    it "returns count of 2 when given collection of 2 like items" do
      collection = ["Mann", "Mann"]
      counter.count(collection).to_hash.must_equal({ "Mann" => 2 })
    end

And sure enough, the test now passes!

Test Data Generation

Finally, here is an example of how to generate test data using JRuby. Let's add a more complex test that has a random assortment of different items in the collection. We create a known quantity of different items, then shuffle them to create a random order:

it "returns correct count with mix of items" do
  collection = ([true]*97 + [false]*3).shuffle
  counter.count(collection).to_hash.must_equal({ true => 97, false => 3 })
end

As expected, this also passes.

Summary

This should give you some idea of how to test your Java code with Ruby, and of the advantages of doing so.

A Case for JRuby in the Enterprise

Posted on 18 November 2014 by Matt Dew

When we began developing Interlock at Mobile System 7, we, like all software teams, had to make difficult choices about our technology stack. As a new startup developing a product that initially had abstract goals and no firm roadmap, not only did we not know what Interlock was going to become, but we knew that we didn't know everything about the customer environments into which Interlock might be deployed.

Our technology choices had to make sense in light of so many factors of relatively equal importance (product goals, team skill set, requirements, time, money, customer infrastructure, and so on) that settling on something was almost as challenging as designing and building some of the more critical features of Interlock itself. Yet we had to choose something.

For Interlock, our primary challenge was to choose a technology stack that would help us maximize developer productivity and enable iteration over product features that were constantly evolving, while not leaving us unable to meet the understandably high technical standards and expectations of our enterprise customers.

That's where JRuby came in. For those who aren't aware, JRuby is a Java implementation of the Ruby programming language - i.e., Ruby on the JVM. JRuby development is led by a small team, currently at Red Hat, but the project has seen significant community participation, and in the last few years it has graduated from being a curiosity in the Ruby community to a well-respected, first-class Ruby implementation enjoying wide adoption and support. The fact that Rubyists are willing to be in the same room as the word Java (or JAnything) is as good a testament to JRuby's value as anything - because there was a time when the Java and Ruby communities were as philosophically opposed as neckbeards and hipsters.

Despite progress in the community, a philosophical division seems to linger with respect to when and where it's appropriate to use Ruby over Java. Ruby doesn't often come to mind when you mention enterprise software (particularly not off-the-shelf software), whereas Java is one of a small number of languages that is almost synonymous with it. Similarly, Java doesn't often come to mind when you think lean/agile startup, whereas Ruby has been credited (right or wrong) with helping many startups get features built and validated by customers quickly.

I think JRuby and other non-Java JVM languages are going to shatter those divisions, if they haven't already.

JRuby's technical details are far more interesting than I can describe in a short blog post, and people more intelligent than me have explained them better than I can, anyway. You can also find tutorials and code examples on the JRuby Wiki if you're interested in more detail about how to get started with JRuby.

I'm going to focus instead on some of the reasons we believe JRuby so perfectly meets Mobile System 7's needs, and on how we arrived at that conclusion. It ultimately boiled down to a few things:

We're developing an enterprise product

First and foremost, we knew we were building an enterprise product. Enterprises are not typically the wild west of software, and for good reason. Enterprise IT shops have thousands of users and dozens (if not hundreds) of applications to support, and their users expect critical applications and data to be available whenever and from wherever they need it. Application availability/stability directly affects workforce productivity, which in turn affects the bottom line.

Also, enterprises have generally made enormous investments in infrastructure, and they expect software products to be able to operate within that infrastructure as much as possible. Servers, operating systems, databases, user directories (AD, LDAP, etc), and countless other technologies are standardized, centralized, and managed independently of the applications that use them - ultimately so IT teams can make more efficient use of their technical and human resources.

Consequently, enterprise IT shops focus on finding efficiencies and managing risk as much as they focus on pushing the feature envelope for their users - and oftentimes the risk management side of the coin wins the toss.

No matter how aggressively a product vendor wants to help an enterprise push the envelope, that vendor's products are going to be subject to existing technical constraints - and it's in the enterprise's and the vendor's interests to make sure products operate within those constraints as much as possible.

You want to install what?

So, we didn't want to get down the long enterprise sales road only to have the Interlock technology stack be a non-starter (or even a minor concern) for IT departments. We had to avoid the "you want to install what on our servers?" question, and in my mind that left us with precious few choices: the .NET framework or the JVM (and maybe C if we had been brave, but we're not).

We could have attempted to convince our customers to let us install some other language interpreter or runtime (and whatever dependencies came with it), but that's a far more difficult sell than using something they're already comfortable with.

The JVM was the best choice because it gave us and our customers flexibility over the environments in which Interlock could be deployed. Plus, most enterprises are comfortable with Java and already have an application install base that requires it. Asking a customer to install the JRE as a dependency for your software is not usually a tough sell, because in all likelihood it's already in use.

Fortunately, JRuby applications run in the JVM!

Java may be ubiquitous in the enterprise, but it's not agile...man....

I love Java. The world is full of great Java developers, enterprises have been using it for years, and the performance and stability of the JVM just gets better and better.

Java also has a huge ecosystem. There are countless active open source projects, so chances are you can scour Github (or Sourceforge if you're nasty) and find any number of Java projects that you can use or alter free-of-charge to help get your project off the ground. The open source community continues to amaze me with the quality and quantity of the software it produces - particularly the Java community.

Java is fast. There was a time when people considered it to be a bit of a dog, but that time has definitely passed. All the big guys are using Java these days, and they're likely dealing with performance and stability requirements that dwarf those of 99% of enterprise products. If Java is good enough for them, it's probably good enough for us and our customers.

But?

When I think of using Java to build an enterprise product with an amorphous feature set, I start getting the sweats. Call it a premonition, call it a crazy vision, call it intuition, but something told me that for a product like Interlock, Java would call for more developers, time, and lines of code than we had in our budget.

I don't think that's necessarily true of all projects or even all startups, but use of Java frameworks and conventions often implies significant up-front design, a relatively well-defined domain model, and nearly constant second guessing - because the cost of change is relatively high in the Java world (I would look for a citation but we all know it's true).

In an environment where change is constant - such as during the early development of Interlock - you have to minimize those costs by leaving yourself as able to react to change as possible. While your programming language of choice only contributes in part to your ability to change, in my experience it can be a significant contributor.

Ruby, in contrast, is great for the change-heavy software project. You can argue all you want about its performance and manageability over time relative to Java, but in my opinion the language really lends itself to projects that require frequent iteration over features. You could write a super-fast widget in Java, but you should be able to write one much more quickly in Ruby. If all you know is that you need a widget and that someone is going to be refining and validating its features over time, I think it's wise to make every effort to get the widget developed and in front of that person as quickly as possible, solicit feedback, incorporate the feedback, repeat, repeat, repeat.

All things being possible regardless of the language - I would probably not choose to write a widget in Java under those circumstances, even though it may be the better option once the widget's features are more well-defined. However, if you're building an enterprise product you can't necessarily write the widget in whatever language is most familiar to you, because at some point it'll be part of the product - or you'll have to trash it and rewrite it in something else that's more product-worthy.

Fortunately with JRuby you don't have to choose between Ruby and Java - you get both! Code written in one can invoke code written in the other, and while it might feel weird to implement a Java interface in Ruby, it's totally easy...and in time you won't even care if anyone notices you're doing it.

// Poor example of a Java interface
package com.mobilesystem7;

public interface Developer
{
  public void   code();
  public String excuse();
}

...

# Poor example of a Java interface being implemented in a Ruby class
class MattDew
  include com.mobilesystem7.Developer

  def code
    Laptop.new.open
    raise com.mobilesystem7.CodingProductivityException.new(self.excuse)
  end

  def excuse
    ["Can't code, in a meeting",
     "Can't code, don't have headphones",
     "Can't code, lunch...",
     "Can't code, the build is broken"
    ].sample
  end
end

If you choose to write a widget in Ruby in order to expedite feedback and then decide later on that it needs to be rewritten in Java (in part or in total), JRuby helps make the transition easier. The code that uses the MattDew class doesn't have to care whether the class is implemented in Ruby or Java, whether the implementation changes, or whether other classes that implement com.mobilesystem7.Developer are written in Ruby, Java, or both. JRuby takes care of making sure it all works together seamlessly.
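
For example, a caller might exercise the class like this (a contrived sketch building on the example above; Laptop and CodingProductivityException are stand-ins from that example):

# Contrived usage sketch: the caller neither knows nor cares that
# MattDew is a Ruby class implementing a Java interface.
dev = MattDew.new
begin
  dev.code
rescue com.mobilesystem7.CodingProductivityException => e
  puts "No code today: #{e.message}"
end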

Seamless, gradual maturation of product features and architecture

Interlock's development has seen the gradual progression of various factors - a solidification over time. This maturation was anticipated, and we wanted to make sure our technology stack would facilitate it.

Firstly, with respect to feature set, Interlock started off as a very abstract thing. We knew we wanted to enable enterprises to deploy Identity Analytics and Adaptive Access Control capabilities against their critical services and data, but we had yet to work out the many details around how to do it and around which capabilities to focus on first.

Over time we've worked out many specifics and our product has stabilized. Change is less frequent, though still common, and Interlock's primary features have gone from being abstract to more-or-less concrete. Our ability to change, react to customer feedback, and to quickly put out stable features and seek out customer validation has been critical in the solidification of the feature set.

Secondly, with respect to language use, our codebase has seen the steady introduction of more Java code over time - particularly in areas where performance is of utmost importance. On several occasions we've developed the first pass (or several iterations) of a feature in Ruby, written unit tests, and shipped a release, only to come back later and reimplement some of the code in Java. In many cases we haven't even needed to make significant changes to the unit tests, which generally are written using Ruby testing libraries even when the code being tested is written in Java (the excellent Andrew Semprebon will be sharing how we use Ruby to test Java in a future post).

There's great comfort in knowing that when you make the effort to write good Java code that you've already validated the feature and written working tests for it, and that there's a reduced likelihood that you'll have to come back and make significant changes again. Remember, change in Java is costly (citation here).

Thirdly, with respect to architecture, Interlock has progressed from having a monolithic library of shared code to more specialized services/components that do a small number of things well. We have components written purely in Java, components written purely in Ruby, and components that were written in both, and as much as possible we want those components to share libraries - particularly since the components emerged from a single codebase and were initially sharing code. Being able to make that transition without significant disruption or product delays has been critical for us.

JRuby has been central in helping us progress on all three fronts. Writing features in Ruby has enabled quick iteration and customer validation, and JRuby has allowed us to optimize in Java and separate capabilities into components where appropriate - ultimately helping us reduce the cost of change and make steady progress without scrapping entire components or backing ourselves into a performance bottleneck.

We need to deploy amongst enterprise application servers and services

A dirty secret of the Ruby community that I don't think gets enough attention is the fact that container-managed services are nonexistent in Ruby application servers. If you compare one Ruby application server to another, all you're really comparing is their responsiveness. This is because they all have one feature, and one feature only - the ability to serve up your Ruby application.

Consequently, if you want nice things like authentication, message queueing, background jobs, easy configuration of external integration points, in-memory caching, etc, you'll likely have to bake those things into your Ruby application with a gem and/or deploy additional technologies that are external to the application and its runtime. That's OK when you're managing your own infrastructure, but when you're shipping a product to a customer it greatly increases the deployment and maintenance burden - and, thus, the likelihood they're going to push back and say, "you want to install what?"

Java application servers, in contrast, have had these capabilities baked in for years. When you're writing a Java web application, for example, you simply develop against standard APIs, enable the services behind those APIs in your application server, and then deploy your app. The application server manages the lifecycle of the application and of the services on which it depends, and coordinates their availability across whatever N servers the combined deployment requires.

In addition to providing services that applications use directly, Java application servers provide simple configuration hooks into additional enterprise integration points. Does your application need a connection pool for access to a corporate LDAP service? This would be no problem in Java land, via JNDI, but this simple, common feature is not something you'll find in a pure Ruby application server. If you need to use pure Ruby to establish an SSL connection to that LDAP server, or, worse still, authenticate to Active Directory via NTLM, welcome to hell (or at least to a few hours of facepalming).
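
To make that concrete, here's a minimal sketch of the JNDI route from JRuby, assuming the application server exposes a container-managed LDAP context under the hypothetical resource name java:comp/env/ldap/corp:

# Minimal sketch: look up a container-managed LDAP context via JNDI.
# "java:comp/env/ldap/corp" is a hypothetical name; the app server owns
# the connection details, pooling, and lifecycle.
ctx  = javax.naming.InitialContext.new
ldap = ctx.lookup("java:comp/env/ldap/corp")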

Furthermore, many enterprises have made heavy investments in Java application servers and expect products to be able to run in their application server of choice - or at least for you to make a compelling case as to why yours shouldn't have to.

Not only is it wise to be able to install within an enterprise's application server of choice, but if you build an application with the expectation that it will use container-managed services, you avoid having to concern yourself with figuring out which gems or external services to use - and, more importantly, how to manage their deployment, maintenance, and availability in your customers' environments.

Fortunately, JRuby enables you to run your Ruby application as a Java application. You can deploy within WebLogic, WebSphere, Tomcat, JBoss, Geronimo, etc., and make use of whatever container-managed services they provide. While the code can be almost 100% Ruby, to the application server it all ends up running as Java bytecode. No system dependencies are needed beyond what the customer already has in support of their existing Java applications - namely, the JRE and a Java application server.

Our ability to focus on product features instead of concerning ourselves with questions about deployment and integration has been of great benefit to our product and our customers, and without JRuby I believe it would've been a much more difficult road.

The bottom line is - we're writing a Java application primarily in Ruby, and enjoying the benefits of both.

NOTE: The TorqueBox project is something we really love, and is worth looking into...because they get it. They're making the convenience and full-featuredness of Java application servers available to Ruby applications via straightforward Ruby APIs and declarative configuration. There's more to it than that, of course, but I like to say they're doing for Ruby application deployment what Rails did for web application development - I really think it's that important.

Conclusion

Like working in an enterprise IT department, choosing languages and frameworks for an enterprise software product is an exercise in determining how to optimize productivity and minimize risk - and in understanding and accepting tradeoffs. There's rarely a right answer, and every choice has the potential to influence the speed with which your team develops; the speed with which you're able to change or respond to customer needs; the speed with which you can install, update, and support your product; and, of course, the speed with which your product runs.

While the factors that went into our choices are certainly more complicated than can be explained in a blog post (and this has already gone on too long), we were focused on enabling an easy and appropriate transition from flexibility to maturity in our product, and on doing it in a way that allowed us to operate within the technical constraints of our customers. These needs are primarily what led us to JRuby.

Two years into the development of Interlock, I'm confident that we made the right choice in the JVM and in JRuby - and with the interesting work going on in the JRuby community I'm confident we'll be able to take advantage of future benefits that we haven't yet imagined.

Your mileage will vary, of course, and I genuinely wouldn't expect otherwise. JRuby and the Java ecosystem have many great strengths, but there are certainly circumstances under which they aren't the appropriate choice. Hopefully this post has helped highlight at least one case where JRuby may be a good choice for you, and if you're interested in discussing further please reach out to me on Twitter at @matt_dew.

Identity Analytics in Interlock

Posted on 14 November 2014 by Joseph Turner

Here at Mobile System 7, we like to say that we create products that protect your data like your credit card protects your money. While that phrase evokes the right ideas - things like constant vigilance, examining multiple factors in determining risk, and generally preventing others from pretending to be you - it doesn't really offer a peek under the hood. In this post, I'm going to talk about the theoretical basis for how our core product, Interlock, protects your data.

Ultimately, our approach boils down to two related concepts, Identity Analytics and Adaptive Access Control, and their interplay. Identity Analytics takes as input a stream of data about how a given identity or group of identities interacts with the services they use, and outputs a level of risk for each of those identities. Adaptive Access Control takes that mapping and changes how the identities can interact with their services, for example by blocking a high-risk identity's access to sensitive resources.

Though both parts are integral to how our products work, in this post I'm going to talk specifically about Identity Analytics (IA) from a theoretical perspective. First, I'll break down the IA function into its constituent parts. Then, I'll discuss those parts individually and in detail. For the sake of simplicity, I'm going to assume we're protecting a single identity on a single service. After we get the basics out of the way, I'll discuss how these ideas can be enhanced when working with a population of identities across multiple services.

The basics

Conceptually, the IA function is made up of two parts.

The first part digests the activity stream and extracts a set of relevant features. For example, the features might include things like:

  • The identity is in a blacklisted location
  • The identity is coming from a suspicious IP address
  • The identity is attempting to access resources they shouldn't
  • The identity is a member of a privileged group
  • The identity used service X outside of their normal operating time

Why do we need to perform feature extraction at all? Because although the activity stream arrives as a sequence of discrete events, the real input to our system at every moment is the complete activity stream, from the beginning of time. This allows us to understand aggregate usage and behavior. Without examining the full activity history, we would be forced to evaluate risk based on each event individually. Using an example feature from earlier, what does normal operating time mean in the context of a single event? In order for Interlock to be able to use important features like this, it needs historical data as well.

By examining the full activity stream, we get a lot more information for use in our risk level calculation, but it comes at the cost of processing a huge amount of data, much of which is redundant. By performing feature extraction, we reduce the dimensionality of the data. This eliminates or aggregates redundant data while highlighting the information needed by the second part of the IA function: the classifier.
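
As a toy illustration (the event fields and feature names are invented, not Interlock's actual feature set), feature extraction might reduce a raw activity stream to a handful of values like so:

# Toy illustration: reduce a full activity stream to a small feature
# vector. Event fields and feature names are invented for the example.
BLACKLISTED_IPS = ["203.0.113.7"] # example data only

def extract_features(activities)
  {
    # aggregate features need the whole history, not a single event
    off_hours_events: activities.count { |a| a[:hour] < 6 || a[:hour] > 20 },
    suspicious_ip:    activities.any? { |a| BLACKLISTED_IPS.include?(a[:ip]) } ? 1.0 : 0.0
  }
end

extract_features([{ hour: 23, ip: "203.0.113.7" }, { hour: 9, ip: "198.51.100.2" }])
# => {:off_hours_events=>1, :suspicious_ip=>1.0}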

A little wrinkle...

Before we move on, I'd like to point out an interesting detail of the feature observations. Because they are modified when activities arrive, they technically live in the time domain; this just means that their values change over time. In fact, when a feature is observed, we model the observation as a function of time as well. In other words, if an incoming activity causes a feature to activate, the feature's value will be at a maximum at the time of that activity and may change as time moves forward. The actual way the value changes depends on the feature that has been extracted. Some are fully binary: when the feature is observed, it stays at its highest value until something mitigates it.

An example would be membership in a sensitive group. That feature is full-valued for the entire span of time the identity is associated with the group.

Other features are modeled as "pulses": when the feature is observed, the value starts at its highest and decays over time.

An example would be when a user attempts to access a resource for which they don't have permission. Though that feature is relevant to an identity's risk level today, it is much less relevant in a week, and even less so in a month. By decaying the value of the features over time, we ensure that the features contribute to risk when they are most relevant.
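
Here's a minimal sketch of these two shapes (the class names and decay rate are invented for illustration):

# Minimal sketch of the two feature shapes described above; class names
# and the decay rate are invented for illustration.
class BinaryFeature
  def initialize(observed_at)
    @observed_at  = observed_at
    @mitigated_at = nil
  end

  def mitigate(time)
    @mitigated_at = time
  end

  # Full-valued from observation until something mitigates it
  def value_at(time)
    active = time >= @observed_at && (@mitigated_at.nil? || time < @mitigated_at)
    active ? 1.0 : 0.0
  end
end

class PulseFeature
  HALF_LIFE_DAYS = 7.0 # assumed decay rate

  def initialize(observed_at)
    @observed_at = observed_at
  end

  # Maximal at the observation, decaying exponentially afterwards
  def value_at(time)
    days = (time - @observed_at) / 86_400.0
    days < 0 ? 0.0 : 0.5**(days / HALF_LIFE_DAYS)
  end
end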

The classifier

The classifier is a function from the feature vector to a risk level. Based on customer feedback, we discovered that three levels of risk are enough to accomplish the goals of the system:

  • Good - From the perspective of the IA, this identity is not interesting
  • Suspect - This identity has performed actions that are abnormal and should be monitored more closely
  • Bad - This identity is very likely up to no good (e.g., the account has been compromised, represents an insider threat, etc.)

The classifier function, then, takes as input a vector of feature values and outputs one of the discrete classes above.

As we just discussed, the features themselves are functions of time, so the classifier function also lives in the time domain. For practical reasons, though, the classifier is only invoked at certain points in time, in response to significant changes in the values of the feature vector. Whenever the classifier calculates the risk level for a given point in time, all feature functions are evaluated at that moment. These slices make up the actual feature vector consumed by the classifier.
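
Continuing the sketch from above (feature_vector_at and the feature classes are illustrative, not Interlock's API), taking a slice just means evaluating every feature function at the same instant:

# Illustrative continuation of the sketch above: evaluate every feature
# function at one instant to build the vector the classifier consumes.
def feature_vector_at(features, time)
  features.each_with_object({}) do |(name, feature), vector|
    vector[name] = feature.value_at(time)
  end
end

now = Time.now
features = {
  sensitive_group: BinaryFeature.new(now - 30 * 86_400), # active for a month
  failed_access:   PulseFeature.new(now - 7 * 86_400)    # one half-life ago
}
feature_vector_at(features, now)
# => {:sensitive_group=>1.0, :failed_access=>0.5}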

The classifier is evaluated when a feature increases in value, and also when a feature's value falls below a threshold. The values passed to the classifier correspond to the value of each feature function at the point in time when the evaluation is triggered. Of course, not every run of the classifier results in a new risk level. Practically speaking, there are many more evaluation points than these, corresponding both to changes in feature values and to changes in metadata about the identity. In general, we want to run the classifier any time there might be a change in risk level.

So what is the classifier itself? How does it translate a feature vector into one of a discrete set of classes? A trivial but still useful approach is a simple test for features: if Feature X is active, return Bad. This is an approach taken by many services today. For example, if you log into your Gmail account while overseas, you get a notification and possibly a request for two-factor authentication. This approach can fall short, though, because it makes it difficult to evaluate features that don't directly contribute to risk level.

A more robust classifier examines features not in a vacuum but in the context of the entire feature set. With this approach, several features that in isolation have no impact on risk level can combine to affect risk in a meaningful way. This is the approach we take. Though we are constantly refining our classifier, as of today it combines an expert-directed decision forest with some hand-tuned heuristics. It incorporates feedback both from the user and from changes in the population of identities to refine the decisions it makes over time. This system gives us the flexibility to adjust and modify the feature set and adapt to the requirements of different environments while remaining confident in the output of the classifier.
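
As a toy example of the difference (feature names and weights are invented; Interlock's actual classifier is the decision-forest system described above):

# Toy example only: features that are individually benign can combine
# into a meaningful risk signal. Names and weights are invented.
def classify(v)
  return :bad if v.fetch(:blacklisted_location, 0.0) > 0.9 # trivial single-feature rule

  score = 0.4 * v.fetch(:suspicious_ip, 0.0) +
          0.3 * v.fetch(:off_hours_access, 0.0) +
          0.3 * v.fetch(:privileged_group, 0.0)
  score >= 0.6 ? :suspect : :good
end

classify(suspicious_ip: 1.0)                        # => :good
classify(suspicious_ip: 1.0, off_hours_access: 1.0) # => :suspect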

Populations and services

As mentioned earlier, there are a number of practical details that have been simplified for the discussion above. First, what about populations of identities? Especially in the enterprise environment, there are aspects of the group of identities that are relevant to the risk level for a given identity. A few examples:

  • Accessing resources from more distinct devices than is normal for the organization
  • Operating outside of the group's normal operating location
  • Being in an inappropriately large number of groups

For each organization, the baselines - the normal number of devices, the organizational operating locations, the appropriate number of groups - are different. By looking at a group of identities rather than identities in isolation, we can get tons of useful population statistics against which we can compare individual identities. Of course, this comes at a cost. Instead of processing merely the entire activity stream for one identity, we now must perform feature extraction on the full activity history of the entire organization.

Moving from a single service to a group of services offers a different benefit. By examining the actions an identity takes across different services, we can extract features to build models of typical access patterns. These models can then identify anomalous behavior that may represent a threat to that identity or to the organization to which it belongs. This too comes at a cost: each additional service increases the size of the data stream significantly, and there is overhead associated with extracting and maintaining the cross-service models.

Conclusions

In this article, I've attempted to outline the theoretical underpinnings of Identity Analytics as they stand in the current version of our product, Interlock. While the basic ideas are pretty easy to explain, the practical issues with feature extraction and classification are well beyond the scope of this article. Both problems are made more challenging by the real-time or near-real-time requirements dictated by the Adaptive Access Control aspect of Interlock. In future posts, I'll cover some of these practical issues and how we're addressing them.

Looking forward, our roadmap includes aggressive improvements to both our theory and our practice. Improvements in the former will yield better, more comprehensive pictures of risk for individuals and organizations, while improvements in the latter will help us reach bigger scale: larger populations of identities, across even more services.

If you're interested in finding out what our Identity Analytics can tell you about your organization, please contact us today to arrange for a test-drive of your data in Interlock.


Copyright © 2014 Mobile System 7 - www.mobilesystem7.com