Real Intelligence for National Security, Part II

Manny May 6th, 2007

As I noted in my first posting, “It’s Time to Get Real”, Saffron Technology has been coming out of stealth mode over the last few months, and things are hopping! We hosted our first public event in DC last month, a Manifesto for “ExtremeIntelligence™”, which was very well attended, so we are planning another such event for June 6. I’ve also been busy traveling in the US and UK to spread the word. The need for a new solution is clear. As I listen to more people in national security and aligned fields, such as fraud detection and anti-money laundering, I keep hearing that customer needs are not being met by traditional statistics and rule-based models. These models remain inaccurate and expensive, particularly because they continue to generate high rates of false alarms that require costly manual filtering to find the “good stuff”.

In my last posting, “Real Intelligence for National Security”, I criticized models (in contrast to memories) as too “eager”. By definition, traditional statistics, neural networks, and rules attempt to fit data to a particular model. As such, models engender all sorts of notorious problems in knowledge engineering. My main point was that such approaches will never be fully incremental - quickly learning case-by-case without a priori assumptions and without dependency on the sequence of data arrival. In contrast, memories embody a minimum commitment principle; they do not fit data to one model but simply try to be a good memory of as much information as possible. This makes them much more flexible at query time to answer any number of questions.

In this posting, I cover a few of the questions that associative memories can answer. For each, the major differentiation is that memories carry more information than abstract models. Using rather than reducing information allows memories to be more accurate.

Who is similar?
Many commercial technologies that existed before September 11, 2001 were quickly applied to national security. While I applaud the patriotism of these efforts, there is growing dissatisfaction with many of the applied solutions. I remember an intra-agency summit on “entity disambiguation” where the session leader read a public press release from a leading vendor claiming to be the answer to national security and then asked, “So who thinks the problem has been solved and we can all go home now?” The audience responded with general laughter. This vendor’s rule-based approach had tested at only 2% accuracy when applied to very sparse foreign intelligence data.

In contrast, in this same comparative study, our associative memory-based approach demonstrated 93% accuracy, with a crème-de-la-crème result of 100% accuracy at the top of the alias and name variant list. As introduced in my last posting, a Cato Institute paper by Jonas and Harper first suggested such results but then deemed them improbable: “Perhaps, through more assiduous work by government authorities and contractors- using a great deal more data - could overcome the low precision of data mining and bring false positives from 90+ percent to the low single digits.” Associative memory-based methods have already reached these goals and should replace less accurate traditional modeling.

The magnitude of this difference shows that memory-based reasoning is a fundamentally different approach - not a VHS versus Betamax competition over small differences. The reasons for the difference are easy to understand. Memories, as compared to abstract models, can hold and use more information about each entity and its features, relationships, and behaviors. While rule-based lexical and other pattern matching methods typically use 2 to 4 principal attributes, memories know and use all the available information about attributes as well as their nonlinear correlations. If you know how to handle more information, then having more information will always be an advantage.
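To make the point about "2 to 4 principal attributes" concrete, here is a minimal sketch in Python. It is an illustration only, not Saffron's method: it uses plain Jaccard overlap (which does not capture nonlinear correlations), and the records, attribute names, and the `rank` helper are all hypothetical.

```python
def jaccard(a, b):
    """Overlap of two attribute sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical records: each entity is a set of "attribute=value" strings.
records = {
    "rec_A": {"name=j. smith", "city=leeds", "phone=555-0101",
              "employer=acme", "vehicle=blue van"},
    "rec_B": {"name=john smyth", "city=leeds", "phone=555-0101",
              "employer=acme", "vehicle=blue van"},
    "rec_C": {"name=j. smith", "city=leeds", "phone=555-0199",
              "employer=globex", "vehicle=red car"},
}

def project(attrs, keys):
    """Keep only the attributes whose name (the part before '=') is in keys."""
    return {a for a in attrs if a.split("=", 1)[0] in keys}

def rank(query_id, keys=None):
    """Rank the other records by similarity to query_id, most similar first.

    With keys=None every attribute is used; otherwise only the named ones.
    """
    def view(attrs):
        return project(attrs, keys) if keys else attrs
    query = view(records[query_id])
    scores = {rid: jaccard(query, view(attrs))
              for rid, attrs in records.items() if rid != query_id}
    return sorted(scores, key=scores.get, reverse=True)

principal = {"name", "city"}          # a rule-style match on two attributes
print(rank("rec_A", principal))       # rec_C first: same name string and city
print(rank("rec_A"))                  # rec_B first: phone, employer, vehicle agree
```

In this toy data the two-attribute match prefers the record that merely shares a name string, while the comparison over all attributes prefers the likely alias whose spelling differs but whose other details agree.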

Who is related?
I remember meeting some real-deal special operations folk. We were talking about social network analysis, and I said, “Even I am related to Osama bin Laden by six degrees of separation!” I got a bit nervous in the following seconds when these guys looked at me more intensely! But seriously, who cares? Simplistic social network analysis and methods of “guilt by association” are weak and inaccurate. Like abstract rules and statistics that lose information, social network graphs lack the richer context of each entity-entity association. Plowing through friends-of-friends-of-friends social networks will find lots of connections. But do any of them really matter to the task at hand? What is the context of the relationship that would justify guilt and targeting beyond mere association? Currently, analysts are still faced with filtering through massive lists and “balls of yarn” visualizations to find what really matters. When I interrogate an entity’s connections, I don’t want its Rolodex. As when asking a friend for references to help solve a particular question, I want only those links that are relevant to my question.

Saffron’s solution to the associative memory scaling problem allows us to remember and recall all the context of all the associations amongst all entities across all data sources. Memories represent the meaning of associations by also remembering their context: when, where and what existed, was materially exchanged, or discussed between the entities. In subsequently reasoning from memory, some relationships are relevant or informative and others are not, depending on a specific question. The richness and detail with which associative memories represent entity networks and their contextual relationships is different from the reductionistic simplicity of node-arc social network models. Instead of every entity having just a name, every entity has a complete associative memory! Each entity memory holds its connections to other entities as well as all the conditions around each connection. Memories are represented by associative matrices, one for each and every entity. Each matrix can hold millions of attributes and their coincidences, and millions of such entity matrices can run on just a few off-the-shelf computers.
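As a rough illustration of "one associative matrix per entity", the following Python sketch keeps a sparse coincidence count for each entity. This is a toy under stated assumptions, not Saffron's implementation: the `EntityMemory` class, the `remember` helper, the `entity:`/`place:`/`topic:` labels, and the sample observations are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

class EntityMemory:
    """Sparse symmetric coincidence matrix for a single entity (illustrative)."""

    def __init__(self):
        # (attr_i, attr_j) with attr_i < attr_j  ->  number of co-occurrences
        self.coincidences = Counter()

    def observe(self, attributes):
        """Count every pair of attributes seen together in one observation."""
        for pair in combinations(sorted(set(attributes)), 2):
            self.coincidences[pair] += 1

    def count(self, a, b):
        """How often a and b have coincided in this entity's experience."""
        return self.coincidences[tuple(sorted((a, b)))]

# One memory per entity, created on first use.
memories = defaultdict(EntityMemory)

def remember(entities, context):
    """Store one observation in the memory of every entity it mentions."""
    for entity in entities:
        others = [f"entity:{e}" for e in entities if e != entity]
        memories[entity].observe(others + list(context))

# Hypothetical observations: who appeared together, and in what context.
remember(["alice", "bob"], ["place:london", "topic:shipping"])
remember(["alice", "carol"], ["place:paris", "topic:finance"])
remember(["alice", "bob"], ["place:london", "topic:finance"])

# Alice's memory knows not just that she is linked to Bob, but where.
print(memories["alice"].count("entity:bob", "place:london"))    # 2
print(memories["alice"].count("entity:carol", "place:london"))  # 0
```

Unlike a node-arc graph, which would record only that alice is linked to bob and carol, each memory here keeps the context in which every link was observed, and learning is incremental: each new observation only adds counts.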

When a user asks a question, the system first returns entities, not documents. Think of search results as “entity rank” rather than “page rank”. For each relevant entity, results also include other entities and terms that have been in context with the entity and the query term. For example, if I’m interrogating the memory of Abu Hamza (a notorious cleric) for associations to foreign countries, I can also see the people he associated with in the context of each relevant country, such as the US, Yemen, or Afghanistan. Each memory also recalls the most relevant “snippets” (sentences or paragraphs) of evidence for how each entity is relevant to the query context. While typical search engines return documents relevant to the query at large, each memory returns the snippets most relevant to the context of the query and its particular entity. Such high-precision, context-dependent results are made possible by the richly detailed, non-linear memories for each and every entity.
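The shape of such an "entity rank" result can be sketched in a few lines of Python. This is only a toy, assuming snippets already tagged with the entities they mention and a naive substring match on the query term; the `entity_rank` function, the sample snippets, and the scoring by snippet count are all hypothetical, not how Saffron ranks results.

```python
from collections import defaultdict

# Hypothetical snippets, each pre-tagged with the entities it mentions.
snippets = [
    ("Alice met Bob in Yemen to discuss shipping.", {"alice", "bob"}),
    ("Alice wired funds to Carol in Yemen.", {"alice", "carol"}),
    ("Bob travelled to France.", {"bob"}),
    ("Carol opened an account in France.", {"carol"}),
]

def entity_rank(query_term):
    """Return entities, not documents, ranked by how often they appear
    in context with query_term, along with co-occurring entities and
    the snippets that serve as evidence."""
    hits = defaultdict(lambda: {"score": 0, "with": set(), "evidence": []})
    for text, entities in snippets:
        if query_term.lower() not in text.lower():
            continue  # snippet is not in the query context
        for entity in entities:
            hit = hits[entity]
            hit["score"] += 1
            hit["with"] |= entities - {entity}
            hit["evidence"].append(text)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(hits.items(), key=lambda kv: (-kv[1]["score"], kv[0]))

for entity, hit in entity_rank("Yemen"):
    print(entity, hit["score"], sorted(hit["with"]), hit["evidence"])
```

Each result row is an entity together with who it was seen with in the query context and the sentences that justify the ranking, rather than a list of whole documents.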

What might happen?
Jonas and Harper argue that data mining will not work to uncover new terror plots. However, they then suggest that no new technology is required or would even be capable! See Kim Taipale’s testimony to the U.S. Senate Committee on the Judiciary for a more complete technical and legal refutation, with which I totally agree. I also agree with Jonas and Harper that traditional data mining is inadequate. However, they use a very narrow definition and scenario of data mining to argue against any new approach to data analysis more generally. As Taipale counters much more articulately than I can, Jonas and Harper present a “straw man” argument. For example, Jonas and Harper argue that terrorist attacks do not happen with enough frequency for them to be learned and predicted. True, but Taipale clarifies that there are repeating patterns of terrorist activity in terrorism’s precursor events. Terrorists must engage in surveillance, planning, and supplying before an attack. These precursor activities are repeated; therefore, they can be learned and predicted. But not by traditional data mining, which is too “data hungry” in relying on the probabilistic Law of Large Numbers.

Memory-based prediction is more akin to nearest-neighbor reasoning than to probabilistic reasoning. Memories reason from prior experience, even if that experience is sparse. Memory-based reasoning is even capable of “one-shot learning”; how many times do you need to touch a hot stove before learning that it is not a pleasant experience? Memories reach asymptotic accuracy with only a fraction of the cases needed by other methods. For example, Electronic Learning Assistant, a product from one of our OEM partners, has been declared “The World’s Best Spam Blocker” when compared to others using Bayesian, collaborative, rule-based, and other methods when given only a few examples. Another partner, Intelligent Agent Corporation, has demonstrated expert human levels of adverse event prediction in the oil and gas industry when given only a few examples - where statistical methods fail.
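The hot-stove example can be written down as a tiny nearest-neighbor memory in Python. This is a generic sketch of case-based prediction, not the method used in the products mentioned above; the `CaseMemory` class, the Jaccard similarity measure, and the feature names are assumptions chosen for illustration.

```python
def similarity(a, b):
    """Jaccard overlap between two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

class CaseMemory:
    """Stores every case as-is and predicts from the most similar one."""

    def __init__(self):
        self.cases = []  # list of (features, outcome) pairs

    def learn(self, features, outcome):
        """Incremental learning: remembering a case is a single append."""
        self.cases.append((frozenset(features), outcome))

    def predict(self, features):
        """Return the outcome of the nearest stored case, or None if empty."""
        if not self.cases:
            return None
        features = frozenset(features)
        nearest = max(self.cases, key=lambda case: similarity(features, case[0]))
        return nearest[1]

memory = CaseMemory()
memory.learn({"stove", "hot", "touch"}, "pain")       # one experience is enough
memory.learn({"stove", "cold", "touch"}, "no pain")

# A new situation that resembles the first case most closely.
print(memory.predict({"stove", "hot", "touch", "kitchen"}))  # pain
```

No model is fitted and no minimum number of examples is required: a single remembered case already drives a prediction, which is the sense of “one-shot learning” used above.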

If this predictive reasoning by similarity to experience reminds you of the first question, “Who (or what) is similar,” it should. Analogical reasoning is a key part of predictive reasoning. We predict what will happen by similarity to specific cases within vast amounts of past experience. Memory-based reasoning captures and uses more information, in both detail and scope. It is easy to be more accurate by using more information, which other methods lose. As in my first posting, this approach is emerging as more natural and powerful when compared to the more brutish and questionable technologies of the last century, such as rule-based reasoning and statistical data mining.


One of my next postings will address personalized medicine, which is extremely relevant to this discussion. While it is difficult to speak about our work in national security, personalized medicine provides a platform to further discuss problems of reductionistic data modeling and how memory-based solutions overcome them. Consider how the statistical generalization of a drug as “safe and effective” is becoming increasingly unreliable. Such statistics do not consider the specific genetic and environmental factors of each individual. Stereotyping all members of a group as the same is both inaccurate and wrong-headed, whether predicting an individual’s terrorist threat or an individual’s drug sensitivity. Both domains represent serious life and death issues where the scale and complexity of the data demand newer, faster, and more accurate methods.