Archive for the 'national security' Category

Real Intelligence for National Security, Part II

Manny May 6th, 2007

As I noted in my first posting, “It’s Time to Get Real”, Saffron Technology has been coming out of stealth mode the last few months, and things are hopping! We hosted our first public event in DC last month, a Manifesto for “ExtremeIntelligence™”, which was very well attended, and so we are planning another such event for June 6. I’ve also been busy traveling in the US and UK to spread the word. The need for a new solution is clear. As I listen to more people in national security and aligned interests, such as fraud detection and anti-money laundering, it’s clear that customer needs are not being met by traditional statistics and rule-based models. These approaches remain inaccurate and expensive, particularly because they continue to generate high rates of false alarms that require costly manual filtering to find the “good stuff”.

In my last posting, “Real Intelligence for National Security”, I criticized models (in contrast to memories) as too “eager”. By definition, traditional statistics, neural networks, and rules attempt to fit data to a particular model. As such, models engender all sorts of notorious problems in knowledge engineering. My main point was that such approaches will never be fully incremental - quickly learning case-by-case without a priori assumptions and without dependency on the sequence of data arrival. In contrast, memories embody a minimum commitment principle; they do not fit data to one model but simply try to be a good memory of as much information as possible. This makes them much more flexible at query time to answer any number of questions.

In this posting, I cover a few of the questions that associative memories can answer. For each, the major differentiation is that memories carry more information than abstract models. Using rather than reducing information allows memories to be more accurate.

Who is similar?
Many commercial technologies that existed before September 11, 2001 were quickly applied to national security. While I applaud the patriotism of these efforts, there remains a growing dissatisfaction with many of the applied solutions. I remember an intra-agency summit on “entity disambiguation” where the session leader read a public press release from a leading vendor claiming to be the answer to national security and then asked, “So who thinks the problem has been solved and we can all go home now?” The audience responded with general laughter. This vendor’s rule-based approach had been tested as only 2% accurate when applied to very sparse foreign intelligence data.

In contrast, in this same comparative study, our associative memory-based approach demonstrated 93% accuracy, with a very large crème-de-la-crème result of 100% perfection at the top of the alias and name variant list. As introduced in my last posting, a CATO Institute paper by Jonas and Harper first suggested but then deemed such results improbable. “Perhaps, through more assiduous work by government authorities and contractors- using a great deal more data - could overcome the low precision of data mining and bring false positives from 90+ percent to the low single digits.” Associative memory-based methods have already reached these goals and should replace the low accuracy of more traditional modeling.

The magnitude of this difference shows that memory-based reasoning is a fundamentally different approach - not a VHS versus Betamax competition over small differences. The reasons for the difference are easy to understand. Memories, as compared to abstract models, can hold and use more information about each entity and its features, relationships, and behaviors. While rule-based lexical and other pattern-matching methods typically use 2 to 4 principal attributes, memories know and use all the available information about attributes as well as their nonlinear correlations. If you know how to handle more information, then having more information will always be an advantage.

Who is related?
I remember meeting some real-deal special operations folk. We were talking about social network analysis, and I said, “Even I am related to Osama bin Laden by six degrees of separation!” I got a bit nervous in the following seconds when these guys looked at me more intensely! But seriously, who cares? Simplistic social network analysis and methods of “guilt by association” are weak and inaccurate. Like abstract rules and statistics that lose information, social network graphs lack the richer context of each entity-entity association. Plying through friends-of-friends-of-friends social networks will find lots of connections. But do any of them really matter to the task at hand? What is the context of the relationship to decide appropriate guilt and targeting beyond mere association? Currently, analysts are still faced with filtering through massive lists and “balls of yarn” visualizations to find what really matters. When I interrogate an entity’s connections, I don’t want its Rolodex. As when asking a friend for references to help solve a particular question, I want only those links that are relevant to my question.

Saffron’s solution to the associative memory scaling problem allows us to remember and recall all the context of all the associations amongst all entities across all data sources. Memories represent the meaning of associations by also remembering their context: when, where and what existed, was materially exchanged, or discussed between the entities. In subsequently reasoning from memory, some relationships are relevant or informative and others are not, depending on a specific question. The richness and detail of associative memories to represent entity networks and their contextual relationships is different than the reductionistic simplicity of node-arc social network models. Instead of every entity having just a name, every entity has a complete associative memory! Each entity memory holds its connections to other entities as well as all the conditions around each connection. Memories are represented by associative matrices, one for each and every entity, each of which can hold millions of attributes and their coincidences, among millions of such entity matrices on just a few off-the-shelf computers.

When a user asks a question, the system first returns entities, not documents. Think of search results as “entity rank” rather than “page rank”. For each relevant entity, results also include other entities and terms that have been in context with the entity and the query term. For example, if I’m interrogating Abu Hamza’s memory (a notorious cleric) for associations to foreign countries, I can also see the people he associated with in the context of each relevant country, such as in the US, Yemen, or Afghanistan. Each memory also recalls the most relevant “snippets” (sentences or paragraphs) of evidence for how each entity is relevant to the query context. While typical search engines return documents relevant to the query at large, each memory returns the snippets most relevant to the context of the query and its particular entity. Such high precision, context-dependent results are made possible by the richly detailed, non-linear memories for each and every entity.
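The idea of one memory per entity, queried in context, can be illustrated with a minimal Python sketch. It is not Saffron's implementation: the class `EntityMemories`, its methods, and the placeholder entities and context terms are all hypothetical, and a plain count table stands in for an associative matrix.

```python
from collections import Counter, defaultdict

class EntityMemories:
    """One sparse count table per entity (a toy stand-in for an associative matrix)."""

    def __init__(self):
        # entity -> Counter of (other_entity, context_term) -> co-occurrence count
        self.memories = defaultdict(Counter)

    def observe(self, entities, context_terms):
        """Record that these entities appeared together alongside these context terms."""
        for entity in entities:
            for other in entities:
                if other == entity:
                    continue
                for term in context_terms:
                    self.memories[entity][(other, term)] += 1

    def related(self, entity, context_term):
        """Rank entities linked to `entity`, counting only links seen in the given context."""
        scores = Counter()
        for (other, term), count in self.memories[entity].items():
            if term == context_term:
                scores[other] += count
        return scores.most_common()

# Hypothetical observations: who appeared with whom, and in what context.
memory = EntityMemories()
memory.observe(["Person A", "Person B"], ["yemen", "meeting"])
memory.observe(["Person A", "Person B"], ["yemen", "wire transfer"])
memory.observe(["Person A", "Person C"], ["yemen"])
memory.observe(["Person A", "Person D"], ["afghanistan"])

in_yemen = memory.related("Person A", "yemen")
in_afghanistan = memory.related("Person A", "afghanistan")
```

Here `in_yemen` ranks Person B ahead of Person C and leaves out Person D, whose only link to Person A arose in a different context. The query returns the links relevant to the question rather than the whole Rolodex.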

What might happen?
Jonas and Harper argue that data mining will not work to uncover new terror plots. However, they then suggest that no new technology is required or even capable! See Kim Taipale’s testimony to the U.S. Senate Committee on the Judiciary for a more complete technical and legal refutation, with which I totally agree. I also agree with Jonas and Harper that traditional data mining is inadequate. However, they use a very narrow definition and scenario of data mining to then argue against any new approach to data analysis more generally. As Taipale counters much more articulately than I can, Jonas and Harper present a “straw man” argument. For example, Jonas and Harper argue that terrorist attacks do not happen with enough frequency to learn and predict them. True, but Taipale clarifies how there are repeating patterns of terrorist activity when looking at terrorism’s precursor events. Terrorists must engage in surveillance, planning, and supplying before an attack. These precursor activities are repeated; therefore, they can be learned and predicted. But not by traditional data mining, which is too “data hungry” in relying on the probabilistic Law of Large Numbers.

Memory-based prediction is more akin to nearest-neighbor reasoning than to probabilistic reasoning. Memories reason from prior experience, even if that experience is sparse. Memory-based reasoning is even capable of “one-shot learning”; how many times do you need to touch a hot stove before learning that it is not a pleasant experience? Memories reach asymptotic accuracy with only a fraction of the cases needed by other methods. For example, one of our OEM partner’s products, Electronic Learning Assistant, has been declared “The World’s Best Spam Blocker” when compared to others using Bayesian, collaborative, rules, and other methods when given only a few examples. Another partner, Intelligent Agent Corporation, has demonstrated expert human levels of adverse event prediction in the oil and gas industry when given only a few examples - where statistical methods fail.
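A minimal sketch of prediction from a single stored case, using the hot stove as the example. It assumes a plain nearest-neighbor lookup with Jaccard similarity over feature sets; the function names and the feature labels are hypothetical and only illustrate the general technique, not any particular product.

```python
def jaccard(a, b):
    """Similarity of two feature sets: size of the overlap divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict(cases, features):
    """Return the outcome of the stored case most similar to the new situation."""
    best_features, best_outcome = max(cases, key=lambda case: jaccard(case[0], features))
    return best_outcome

# One remembered experience per outcome is enough to reason from.
cases = [
    ({"stove", "glowing", "touched"}, "burn"),
    ({"stove", "cold", "touched"}, "no burn"),
]

# A new situation that matches neither case exactly.
prediction = predict(cases, {"stove", "glowing", "nearby"})
```

The new situation shares two of four features with the glowing-stove case and one of five with the cold-stove case, so the prediction is "burn". No distribution is estimated; the answer comes from similarity to specific remembered cases.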

If this predictive reasoning by similarity to experience reminds you of the first question, “Who (or what) is similar,” it should. Analogical reasoning is a key part of predictive reasoning. We predict what will happen by similarity to specific cases within vast amounts of past experience. Memory-based reasoning captures and uses more information, in both detail and scope. It is easy to be more accurate by using more information, which other methods lose. As in my first posting, this approach is emerging as more natural and powerful when compared to the more brutish and questionable technologies of the last century, such as rule-based reasoning and statistical data mining.


One of my next postings will address personalized medicine as extremely relevant to this discussion. While it is difficult to speak about our work in national security, personalized medicine provides a platform to further discuss problems of reductionistic data modeling and how memory-based solutions overcome them. Consider how the statistical generalization of a drug as “safe and effective” is becoming increasingly unreliable. Such statistics do not consider the specific genetic and environmental factors of each individual. Stereotyping all members of a group as the same is both inaccurate and wrong-headed, whether predicting an individual’s terrorist threat or an individual’s drug sensitivity. Both domains represent serious life and death issues where the scale and complexity of the data demand newer, faster, and more accurate methods.


Real Intelligence for National Security

Manny February 23rd, 2007

It has been an exciting and busy time since my last posting! For one, it was wonderful to visit London again. I was also in London right after the subway bombings a couple of summers ago. Then as now, I deeply admire the British stiff upper lip, no matter if facing WWII bombs from above or today’s terror within. I had been thinking of what to write next regarding national security as promised, but clearly, the topic of counter terrorism is a matter of international concern. The Brits are wonderful in showing how daily life goes on despite being on the front line of this new war. However, the world has continued to change in this new millennium, while we continue to rely on rather brutish technology.

Listen. I work for the CIA. I am not a spy. I just read books. We read everything that is published in the world, and we feed the plots, dirty tricks, [and] codes into a computer. And the computer checks against actual CIA plans and operations. I look for leaks. I look for new ideas. We read adventures and novels and journals. I…… Who would invent a job like that?

Three Days of the Condor, 1975

I love this quote! The job of an intelligence analyst was crazy even in 1975. Today, it is nearly impossible, and new technology is needed to help read an ever-growing amount of data. It is not just about reading books these days. The volume of the Web, including “deep Web” chat rooms, blogs, and more is completely overwhelming. In my last posting, I introduced “real intelligence” as memory-based reasoning. International security needs real intelligence to help read, comprehend, and remember everything so that analysts do not have to.

I recently read that Tom Fingar, the Deputy Director of National Intelligence for Analysis, declared “It isn’t intelligence until it has been processed through the brain of an analyst.” Until then, it is all just data. We are drowning in data and do not have enough analytical brains. Technology is required to help. But why do we still use the same old technology when we are fighting a new kind of war? It is a new millennium with a new kind of warfare, and yet we still use questionable 20th-century technology like rule-based systems and data mining. Remember my first posting about Tukey and his doubts about statistical inference. The joking derision, “That is so 20th Century!”, can also be said (and often is said) against databases, rule-based systems, search engines, and data mining. As a biologist and psychologist, I believe that even data mining based on “neural networks” has little if anything to do with real neurons and real intelligence.

In a recent paper from the CATO Institute, Jeff Jonas and Jim Harper highlighted that traditional data mining has failed to address the needs of the intelligence community. Statistical reasoning about population distributions and abstract generalities is irrelevant when looking for rare adverse events in a dynamic sea of normality. One of their arguments, with which I agree, is that data mining is very data “hungry” and requires enough historical examples for its notions of statistical power and significance.

I agree that the data mining methods have been marginally effective even for commercial problems, such as for predicting consumer behavior, and have largely failed when re-applied to the harder challenges of intelligence analysis. On the other hand, memory-based reasoning is a different type of data analysis, more akin to reasoning in real brains rather than reasoning by rules and statistics. The memory-based approach to data analysis does not share these problems, such as being data hungry. Real intelligence reasons by similarity to specific experiences. If I touch a glowing stove once, I get burned and learn not to touch it again. The use of more or less data is irrelevant to this kind of experience-based reasoning.

David Aha’s edition on “lazy learning” defines a class of machine learning methods within this new approach. They include memory-based, case-based, experience-based, and instance-based reasoning. This class is distinguished from “eager” learners such as traditional statistics and neural networks, which try to fit data to a particular model. This fitting of data leads to all sorts of problems, including complex parameterizations, dependency on the order of data arrival, and over-fitting the data to the model. For example, if you believe in model fitting, you must worry about over-fitting.

The many problems of fitting data to models are understood in statistical modeling, but the same is true for rule-based inferencing as just another kind of modeling. For example, one rule-based system for identity management (including alias detection) attempts to make decisions about aliases and name variants when given each new piece of identity data. Its rule-based reasoning leads to decisions about combining identities that are order dependent, and the consequences of these decisions are then impossible to untangle. These models become increasingly wrong and impossible to fix. There is an enduring hope and occasional claim that rule-based and statistical systems will become fully incremental and “sequence neutral” one day. This is a naive hope.

Suppose I gave you the mean of a distribution (of 10 numbers, let’s say, but I do not provide you with the size of the distribution). An average number is useful and can be used for decision making, such as deciding on whether something is below or above average. However, if I give you another number and ask you to update the average, you do not have sufficient information to do so. It is impossible to update even this simple statistic as new data arrives. And this is a very trivial case. The problem of incremental change becomes increasingly difficult with greater scale and complexity and will never be addressed by reductionistic statistical models, neural networks, and rules. Decision rules are too eager and confound what they know with what they have already decided to do about what they previously knew, which is wrong and even dangerous.
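The arithmetic behind this example can be made concrete with a minimal Python sketch. The function names and the numbers are illustrative assumptions. Two data sets share the same mean but differ in size, so the same new observation yields different updated means, and the old mean alone cannot tell them apart. Once the count is also remembered, the update becomes exact.

```python
from fractions import Fraction

def mean(values):
    """Return the exact arithmetic mean of a non-empty sequence."""
    return Fraction(sum(values), len(values))

def update_mean(old_mean, old_count, new_value):
    """Update a mean with one new value; possible only because the count was kept."""
    return (old_mean * old_count + new_value) / (old_count + 1)

small = [4, 6]      # mean 5, from 2 numbers
large = [5] * 10    # mean 5, from 10 numbers
new_value = 11

# The reduced summary is identical for both data sets.
same_summary = mean(small) == mean(large)

# The correct update differs, so the mean alone is not enough information.
updated_small = update_mean(mean(small), len(small), new_value)
updated_large = update_mean(mean(large), len(large), new_value)
```

With two prior numbers the new mean is 7; with ten it is 61/11, about 5.55. The reduced statistic threw away what was needed to keep learning.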

Memories, on the other hand, are perfectly incremental and order independent. They store association counts; counts are perfectly incremented with each and every observation. Any order of data arrival results in the same final count. In the case of alias detection, memories store all the co-incident information about each identity and can quickly recall similar entities when asked (using k-nearest neighbors, for example). The answers might change as new data arrives, but any decisions made about entity similarity do not confound the information about similarity itself. A memory updates its information about new data but does not include decisions about how to fit the new data according to some a priori model. Lazy learners are called “lazy” because they adhere to this principle of minimum commitment. Memories read, comprehend, and remember all the information without reductionistic fitting to a particular model. This memory of detail makes them much more universal, stable and accurate.
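Both properties, order independence and recall by similarity, can be shown in a small Python sketch. It is a toy illustration rather than Saffron's method: the names, phone numbers, and attributes are invented, the functions `build_memory`, `similarity`, and `nearest` are hypothetical, and simple count overlap is assumed as the similarity measure for the nearest-neighbor lookup.

```python
from collections import Counter
from itertools import permutations

def build_memory(observations):
    """Increment per-entity attribute counts, one observation at a time."""
    memory = {}
    for entity, attributes in observations:
        memory.setdefault(entity, Counter()).update(attributes)
    return memory

def similarity(a, b):
    """Overlap of two count tables: sum of the minimum count for each shared attribute."""
    return sum((a & b).values())

def nearest(memory, query_entity, k=1):
    """Return the k entities whose counts overlap most with the query entity."""
    query = memory[query_entity]
    scored = [(similarity(query, counts), name)
              for name, counts in memory.items() if name != query_entity]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked[:k]]

# Invented identity data arriving as separate observations.
observations = [
    ("J. Smith", ["london", "555-0100", "acme"]),
    ("John Smith", ["london", "555-0100"]),
    ("Jane Doe", ["paris", "555-0199"]),
    ("John Smith", ["acme"]),
]

# Build the memory under every possible arrival order and compare the results.
memories = [build_memory(order) for order in permutations(observations)]
order_independent = all(m == memories[0] for m in memories)

# Recall the most similar identity at query time; nothing was merged or decided earlier.
closest = nearest(memories[0], "J. Smith", k=1)
```

All 24 arrival orders produce identical counts, and the query recalls "John Smith" as the likely alias of "J. Smith". The similarity judgment is made only when asked, so later data can change the answer without any earlier merge decision to untangle.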

We need this new approach for international security because we need more brains. To again quote Tom Fingar, “We don’t have enough analytical brains to meet all of the challenges. We have to rely on technology.” So why don’t we add more intelligent brain-like technology? Brains learn and think much more quickly and fluidly than traditional technology. For example, our brains do not learn by the notoriously slow, parametric, data-hungry methods of data mining - even methods called “neural” networks. Instead, we learn by memory: comprehending new information by instant integration with what we already know. The best analytic minds do not fit new data into a single a priori model confounded with prior decisions. They read and remember and then quickly reason about new situations as the world unfolds. Memory-based representation and reasoning will help us think about the harder questions we face today. As we have all read in the news the past year or more, over-eager analytical reasoning is disastrous!

We are still clinging to technologies that have failed and will continue to fail the intelligence community. I am passionate about this belief and will have more to say in my next posting. I will include more examples of these failures and how our new approach is succeeding. As the saying goes, “The proof of the pudding is in the eating.”

I also look forward to speaking about this at Saffron’s ExtremeIntelligence Manifesto event on March 7 in Reston VA. See www.saffrontech.com/events.shtml to register. We are already collecting a nice crowd, but I hope to see even more of you. It is a time for change.