Real Intelligence for Personalized Medicine

October 3rd, 2007

Saffron enjoyed another successful symposium in July with its government customers and partners. The event was co-sponsored by ChoicePoint i2, highlighting our integration with Analyst's Notebook, the dominant desktop tool for intelligence analysis and for similar domains such as financial fraud and crime investigation. The symposium also featured one of our customers, who is using Saffron for "associative targeting" of bad guy networks.

The maturity of the entity analytics market within national security is becoming clear. The government has been a leader in adopting entity extractors. Although all this rich information about people, places, and things and how they are linked is now available, new ways are required to exploit it. I heard a wonderful phrase to describe the current problem: information overload has led to "write once, read never". Data can be easily stored but is never exploited. With entity extraction, entity overload rather than document overload now overwhelms the analyst. As argued in prior postings, associative memories provide the "real intelligence" needed to represent and support inferential arguments over massive, complex entity networks.

Personalized medicine shares many of these same problems in data and entity analytics. I will be speaking on this topic in October at the 1st Annual Total Cancer Care Summit: the Future of Personalized Medicine, sponsored by the Moffitt Cancer Center. My presentation is entitled "From Bad Guys to Bad Genes: Associative Targeting of Terrorist Networks Applies to Gene Networks for Personalized Medicine". This is another critical mission that affects us all, and new methods developed by Saffron are required to uncover the signals that are clinically important for similar life-and-death challenges.

As highlighted in a recent article in the New England Journal of Medicine entitled "Drinking from the Fire Hose - Statistical Issues in Genomewide Association Studies", the science of genomics is well developed and data are available in unprecedented volume. However, appropriate methods of data analysis are lacking:

"But as we delve further into the genome in the search for networks of interacting gene variants and interactions between these networks and environment factors, much more sophisticated methods of statistical analysis are likely to be required."

This article highlights the statistical problems with SNP data, which collect 500,000 data points per individual case. Gene expression microarrays are more than an order of magnitude smaller, measuring 10,000 to 30,000 gene expressions simultaneously, but statistical methods are still inadequate at this scale. "Association studies" (as they are called) hope to map genes to diseases and treatments, but they consider only linear associations, assuming that each gene maps simply to each disease or treatment. Even if the association studies are "genome wide", it is probably wrong to assume that a single gene will hold the key. For example, BRCA1 is a gene associated with breast cancer, and the presence of a BRCA1 mutation can help the physician select the appropriate treatment. However, not all patients with BRCA1 mutations respond uniformly when treated according to accepted guidelines. Patients with BRCA1 mutations are not all the same, and yet these simplistic linear assumptions treat them as a single, homogeneous bucket. Clearly, other significant individual differences remain.
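
The limitation of a one-gene-at-a-time, linear view can be made concrete with a small sketch. The Python below uses a hypothetical toy cohort (not real genomic data) in which disease occurs only when exactly one of two variants is present, an XOR-style interaction chosen purely for illustration. Each variant tested on its own shows no association at all; only the joint, two-gene view reveals the signal. The cohort, the interaction rule, and the function names are assumptions made for this example.

```python
from itertools import product

# Hypothetical toy cohort (illustration only, not real data): each patient is
# (g1, g2, disease) for two binary gene variants. Disease occurs only when
# exactly one variant is present (an XOR interaction). Each genotype
# combination appears 25 times, giving 100 patients.
cohort = [(g1, g2, g1 ^ g2)
          for g1, g2 in product((0, 1), repeat=2)
          for _ in range(25)]


def disease_rate(patients):
    """Fraction of the given patients who have the disease."""
    return sum(d for _, _, d in patients) / len(patients)


def marginal_rates(cohort, gene_index):
    """Single-gene view: disease rate with (1) and without (0) one variant."""
    return {v: disease_rate([p for p in cohort if p[gene_index] == v])
            for v in (0, 1)}


def joint_rates(cohort):
    """Two-gene view: disease rate for each combination of both variants."""
    return {(a, b): disease_rate([p for p in cohort if p[:2] == (a, b)])
            for a, b in product((0, 1), repeat=2)}


print(marginal_rates(cohort, 0))  # {0: 0.5, 1: 0.5}: g1 alone looks irrelevant
print(marginal_rates(cohort, 1))  # {0: 0.5, 1: 0.5}: g2 alone looks irrelevant
print(joint_rates(cohort))        # {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
```

With hundreds of thousands of SNPs, exhaustively testing every combination quickly becomes combinatorially expensive; the sketch only shows why a single-gene, linear view can miss an interaction entirely.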

Increasingly, data analytic efforts are being criticized as having little or nothing to do with the complexity of known biology. See Joseph Terwilliger's "utter refutation" for a particularly scathing review. Linear association studies or even tree-based predictive models do not capture the complete, context-dependent network structure of genomic interactions. We know the principles of biology to be complex and non-linear, and yet we still base actions on naive assumptions because we lack the mathematical and software tools to address the complexity and scale of the problem.

Since Terwilliger's 2006 paper, a report by the ENCODE consortium received a great deal of press this summer. This extensive four-year study challenges the notion of discrete genes: DNA transcription is an interwoven complex of gene networks. One report on this effort, "Change to Gene Theory Raises New Challenges for Biotech", concludes that this new understanding will undermine an entire industry built on gene patents, which claim that DNA sequences map to a "specific functional product". They do not.

Much like the challenges of national security, functional genomics presents another extreme case of large and complex networks. Like "bad guys", bad genes are responsible for rare, adverse events. Like taking out one bad guy, taking out only one bad gene might have very little lasting effect, if any effect at all. Moreover, other lessons learned from the targeting of bad guys will be relevant to bad genes. While linear associations of single genes to specific functions are giving way to an understanding of genes within networks, even the modeling of non-linear, pair-wise co-regulation of genes remains a shallow model. Accounting for gene-gene interactions will also require capturing the context dependency of network dynamics. For example, associative targeting of bad guys cannot rely on mere "guilt by association". The context of interaction becomes a critical factor in determining which associations are relevant to a problem and which are mere coincidences of other, more normal interaction. In the same way, when do environmental factors and many other gene expressions regulate the functional interactions? The underlying physics and biochemistry affect if and when entities effectively meet to produce a clinically relevant consequence.

As argued in my prior postings, associative memory representation and reasoning are emerging as separate from and more powerful than traditional statistics and neural networks, which are also commonly used in bioinformatics. Traditional methods have tried to address "the curse of dimensionality" in massive and complex bio-data by assuming data-to-model reductionism: fitting complex raw data into simpler abstract models. By definition, such reductionism loses information and accuracy. Association studies have relied on one such reductionism, assuming a linear model between gene and function. These models are easy to compute but do not reflect real-world complexities. Associative memories do.

Traditional data mining also suffers from the notorious costs and delays of knowledge engineering. Models tend to be highly parametric and non-incremental, requiring the "black arts" of model selection, feature selection, dataset training, parameter tweaking, and other management methods to ensure that model fitting does not lead to over-fitting, which would further reduce accuracy. As a result, models are slow to develop and deploy. Their slow construction and limited predictive accuracy call for new methods that can support a more scalable, rapid, and robust industry of personalized decision support. Even now that genomic and other "omic" data have become available and there is a stronger scientific understanding of genetic complexity, methods of data analysis remain a roadblock to the widespread clinical availability of personalized genomic medicine.

Clinical testing of single genes and SNPs is a step in the right direction, but these first-generation "personalized" medical systems are poor and do not represent our scientific understanding of genes or the complex signature of the individual patient with a complex disease. Along with the broader measurement of gene networks rather than single genes in the clinic, it is imperative that new data analytic methods rapidly learn and deliver accurate intelligence and decision support to the clinician. Just as associative targeting of bad guy networks supports operational commanders, associative memories can address the scale and complexity of bad gene networks to support the physician.

Come join me in the Bahamas this month for the Future of Personalized Medicine!

Real Intelligence for National Security, Part II

May 6th, 2007

As I mentioned in my first posting, “It’s Time to Get Real”, Saffron Technology has been coming out of stealth mode over the last few months, and things are hopping! We hosted our first public event in DC last month, a Manifesto for “ExtremeIntelligence™”, which was very well attended, and so we are planning another such event for June 6. I’ve also been busy traveling in the US and UK to spread the word. The need for a new solution is clear. As I listen to more people in national security and aligned interests, such as fraud detection and anti-money laundering, it’s clear that customer needs are not being met by traditional statistics and rule-based models. They remain very inaccurate and expensive, particularly because they continue to generate high rates of false alarms that require costly manual filtering to find the “good stuff”.

In my last posting, “Real Intelligence for National Security”, I criticized models (in contrast to memories) as too “eager”. By definition, traditional statistics, neural networks, and rules attempt to fit data to a particular model. As such, models engender all sorts of notorious problems in knowledge engineering. My main point was that such approaches will never be fully incremental, quickly learning case-by-case without a priori assumptions and without dependency on the sequence of data arrival. In contrast, memories embody a minimum commitment principle; they do not fit data to one model but simply try to be a good memory of as much information as possible. This makes them much more flexible at query time, able to answer any number of questions.

In this posting, I cover a few of the questions that associative memories can answer. For each, the major differentiation is that memories carry more information than abstract models. Using rather than reducing information allows memories to be more accurate.

Who is similar?
Many commercial technologies that existed before September 11, 2001 were quickly applied to national security. While I applaud the patriotism of these efforts, there remains a growing dissatisfaction with many of the applied solutions. I remember an intra-agency summit on “entity disambiguation” where the session leader read a public press release from a leading vendor claiming to be the answer to national security and then asked, “So who thinks the problem has been solved and we can all go home now?” The audience responded with general laughter. This vendor’s rule-based approach had been tested as only 2% accurate when applied to very sparse foreign intelligence data.

In contrast, in this same comparative study, our associative memory-based approach demonstrated 93% accuracy, with a large crème-de-la-crème set at the top of the alias and name variant list scoring 100%. As introduced in my last posting, a CATO Institute paper by Jonas and Harper first raised the possibility of such results but then deemed them improbable: “Perhaps, through more assiduous work by government authorities and contractors - using a great deal more data - could overcome the low precision of data mining and bring false positives from 90+ percent to the low single digits.” Associative memory-based methods have already reached these goals and should replace the low accuracy of more traditional modeling.

The magnitude of this difference demonstrates that memory-based reasoning is a fundamentally different approach, not a VHS versus Betamax competition over small differences. The reasons for the difference are easy to understand. Compared with abstract models, memories can hold and use more information about each entity and its features, relationships, and behaviors. While rule-based lexical and other pattern matching methods typically use 2 to 4 principal attributes, memories know and use all the available information about attributes as well as their nonlinear correlations. If you know how to handle more information, then having more information will always be an advantage.

Who is related?
I remember meeting some real-deal special operations folk. We were talking about social network analysis, and I said, “Even I am related to Osama bin Laden by six degrees of separation!” I got a bit nervous in the following seconds when these guys looked at me more intensely! But seriously, who cares? Simplistic social network analysis and methods of “guilt by association” are weak and inaccurate. Like abstract rules and statistics that lose information, social network graphs lack the richer context of each entity-entity association. Trawling through friends-of-friends-of-friends social networks will find lots of connections. But do any of them really matter to the task at hand? What context of the relationship would justify guilt and targeting beyond mere association? Currently, analysts are still faced with filtering through massive lists and “balls of yarn” visualizations to find what really matters. When I interrogate an entity’s connections, I don’t want its Rolodex. As when asking a friend for references to help with a particular question, I want only those links that are relevant to my question.

Saffron’s solution to the associative memory scaling problem allows us to remember and recall all the context of all the associations amongst all entities across all data sources. Memories represent the meaning of associations by also remembering their context: when, where and what existed, was materially exchanged, or discussed between the entities. In subsequently reasoning from memory, some relationships are relevant or informative and others are not, depending on the specific question. The richness and detail of associative memories in representing entity networks and their contextual relationships differs from the reductionistic simplicity of node-arc social network models. Instead of every entity having just a name, every entity has a complete associative memory! Each entity memory holds its connections to other entities as well as all the conditions around each connection. Memories are represented by associative matrices, one for each and every entity, each of which can hold millions of attributes and their coincidences, among millions of such entity matrices on just a few off-the-shelf computers.
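
As a rough illustration of “one associative matrix per entity”, here is a minimal Python sketch, assuming each observation of an entity arrives as a set of attribute strings (people, places, topics) seen together with it, for example in one sentence. It is not Saffron’s engine: the class name EntityMemory, the attribute naming scheme, and the use of plain dictionaries as sparse matrices are all hypothetical, and nothing here addresses the scaling problem described above.

```python
from collections import defaultdict
from itertools import combinations


class EntityMemory:
    """Hypothetical per-entity associative memory: a sparse, symmetric matrix
    of co-occurrence counts among the attributes observed with one entity."""

    def __init__(self, name):
        self.name = name
        # (attr_a, attr_b) with attr_a < attr_b  ->  co-occurrence count
        self.counts = defaultdict(int)

    def observe(self, attributes):
        """Increment the count for every pair of attributes seen together
        in one observation (e.g. one sentence mentioning this entity)."""
        for a, b in combinations(sorted(set(attributes)), 2):
            self.counts[(a, b)] += 1

    def association(self, a, b):
        """How often attributes a and b co-occurred in this entity's context."""
        return self.counts.get(tuple(sorted((a, b))), 0)


def build_memories(observations):
    """Route each (entity, attributes) observation to that entity's own memory."""
    memories = {}
    for entity, attributes in observations:
        if entity not in memories:
            memories[entity] = EntityMemory(entity)
        memories[entity].observe(attributes)
    return memories


observations = [
    ("Entity A", ["person:B", "place:Yemen", "topic:funds"]),
    ("Entity A", ["person:C", "place:US", "topic:travel"]),
    ("Entity A", ["person:B", "place:Yemen", "topic:travel"]),
]
memory_a = build_memories(observations)["Entity A"]
print(memory_a.association("person:B", "place:Yemen"))  # 2
print(memory_a.association("person:C", "place:Yemen"))  # 0
```

Because the link between Entity A and person:B is stored together with the places and topics around it, such a memory can later report not just that the two are connected but under which conditions.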

When a user asks a question, the system first returns entities, not documents. Think of search results as “entity rank” rather than “page rank”. For each relevant entity, results also include other entities and terms that have been in context with the entity and the query term. For example, if I’m interrogating the memory of Abu Hamza (a notorious cleric) for associations to foreign countries, I can also see the people he associated with in the context of each relevant country, such as the US, Yemen, or Afghanistan. Each memory also recalls the most relevant “snippets” (sentences or paragraphs) of evidence for how each entity is relevant to the query context. While typical search engines return documents relevant to the query at large, each memory returns the snippets most relevant to the context of the query and its particular entity. Such high-precision, context-dependent results are made possible by the richly detailed, non-linear memories for each and every entity.
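
The query pattern described here (entities first, then the people seen with each entity in the context of the query term) can be sketched in Python. This is a hypothetical illustration, not Saffron’s ranking: it assumes each entity’s evidence is a list of snippets already reduced to sets of attribute strings, and it ranks entities simply by how many of their snippets mention the query term. The names remember and entity_rank and the “country:”/“person:” prefixes are invented for the example.

```python
from collections import Counter


def remember(snippets_by_entity):
    """Build one co-occurrence matrix per entity. The diagonal entry (a, a)
    counts how many snippets mention attribute a at all."""
    memory = {}
    for entity, snippets in snippets_by_entity.items():
        matrix = Counter()
        for snippet in snippets:
            for a in snippet:
                for b in snippet:
                    matrix[(a, b)] += 1
        memory[entity] = matrix
    return memory


def entity_rank(memory, query, prefix="person:"):
    """Rank entities by how often they were seen with the query term and,
    for each, list only the related attributes (here, people) that
    co-occurred with that term: the entity's context for this query."""
    ranked = []
    for entity, matrix in memory.items():
        strength = matrix[(query, query)]  # Counter returns 0 if absent
        if strength == 0:
            continue
        related = sorted((b, n) for (a, b), n in matrix.items()
                         if a == query and b != query and b.startswith(prefix))
        ranked.append((entity, strength, related))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


snippets = {
    "Entity A": [{"country:Yemen", "person:B"},
                 {"country:Yemen", "person:C"},
                 {"country:US", "person:D"}],
    "Entity E": [{"country:Yemen", "person:B"}],
    "Entity F": [{"country:US", "person:G"}],
}
for row in entity_rank(remember(snippets), "country:Yemen"):
    print(row)
# ('Entity A', 2, [('person:B', 1), ('person:C', 1)])
# ('Entity E', 1, [('person:B', 1)])
```

Note that person:D, who appears with Entity A only in a US context, is absent from the Yemen results: the same entity returns different associates depending on the context of the question.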

What might happen?
Jonas and Harper argue that data mining will not work to uncover new terror plots. However, they go on to suggest that no new technology is needed, or even capable of the task! See Kim Taipale’s testimony to the U.S. Senate Committee on the Judiciary for a more complete technical and legal refutation, with which I totally agree. I also agree with Jonas and Harper that traditional data mining is inadequate. However, they use a very narrow definition and scenario of data mining to argue against any new approach to data analysis more generally. As Taipale counters much more articulately than I can, Jonas and Harper present a “straw man” argument. For example, Jonas and Harper argue that terrorist attacks do not happen frequently enough to learn and predict them. True, but Taipale clarifies that there are repeating patterns of terrorist activity when looking at terrorism’s precursor events. Terrorists must engage in surveillance, planning, and supplying before an attack. These precursor activities are repeated; therefore, they can be learned and predicted. But not by traditional data mining, which is too “data hungry” in relying on the probabilistic Law of Large Numbers.

Memory-based prediction is more akin to nearest-neighbor reasoning than to probabilistic reasoning. Memories reason from prior experience, even if that experience is sparse. Memory-based reasoning is even capable of “one-shot learning”; how many times do you need to touch a hot stove before learning that it is not a pleasant experience? Memories reach asymptotic accuracy with only a fraction of the cases needed by other methods. For example, one of our OEM partners’ products, Electronic Learning Assistant, has been declared “The World’s Best Spam Blocker” when compared with others using Bayesian, collaborative, rule-based, and other methods when given only a few examples. Another partner, Intelligent Agent Corporation, has demonstrated expert human levels of adverse event prediction in the oil and gas industry when given only a few examples, a setting where statistical methods fail.
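
To show what reasoning from sparse experience can look like mechanically, here is a minimal one-nearest-neighbor sketch in Python. It is a generic instance-based learner, not Saffron’s or any partner’s method. Cases are assumed to be sets of features, similarity is assumed to be Jaccard overlap, and the stove example mirrors the one in the text. “Learning” is nothing more than storing a case, so a single example per outcome is enough to make a prediction.

```python
def jaccard(a, b):
    """Overlap between two feature sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class NearestNeighborMemory:
    """Lazy, instance-based learner: training just stores each labeled case,
    and all reasoning happens at query time by similarity to stored cases."""

    def __init__(self):
        self.cases = []

    def learn(self, features, label):
        """One-shot learning: a single observation is remembered as-is."""
        self.cases.append((frozenset(features), label))

    def predict(self, features):
        """Return the label of the most similar remembered case."""
        query = frozenset(features)
        _, label = max(self.cases, key=lambda case: jaccard(query, case[0]))
        return label


memory = NearestNeighborMemory()
memory.learn({"stove", "glowing", "touch"}, "burned")  # one painful experience
memory.learn({"stove", "cold", "touch"}, "fine")       # one harmless one
print(memory.predict({"stove", "glowing", "touch", "kitchen"}))  # burned
print(memory.predict({"stove", "cold"}))                        # fine
```

With one stored case per outcome, the prediction follows whichever past experience the new situation most resembles; adding cases only adds memories to compare against, with no model to refit.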

If this predictive reasoning by similarity to experience reminds you of the first question, “Who (or what) is similar?”, it should. Analogical reasoning is a key part of predictive reasoning. We predict what will happen by similarity to specific cases within vast amounts of past experience. Memory-based reasoning captures and uses more information, in both detail and scope. It is easy to be more accurate by using more information, which other methods lose. As I argued in my first posting, this approach is emerging as more natural and powerful than the more brutish and questionable technologies of the last century, such as rule-based reasoning and statistical data mining.


One of my next postings will address personalized medicine as extremely relevant to this discussion. While it is difficult to speak about our work in national security, personalized medicine provides a platform to further discuss problems of reductionistic data modeling and how memory-based solutions overcome them. Consider how the statistical generalization of a drug as “safe and effective” is becoming increasingly unreliable. Such statistics do not consider the specific genetic and environmental factors of each individual. Stereotyping all members of a group as the same is both inaccurate and wrong-headed, whether predicting an individual’s terrorist threat or an individual’s drug sensitivity. Both domains represent serious life and death issues where the scale and complexity of the data demands newer, faster, and more accurate methods.


Real Intelligence for National Security

February 23rd, 2007

It has been an exciting and busy time since my last posting! For one, it was wonderful to visit London again. I was also in London right after the subway bombings a couple of summers ago. Then as now, I deeply admire the British stiff upper lip, whether facing WWII bombs from above or today’s terror within. I had been thinking about what to write next regarding national security, as promised, but clearly counterterrorism is a matter of international concern. The Brits are wonderful in showing how daily life goes on despite being on the front line of this new war. However, the world has continued to change in this new millennium, while we continue to rely on rather brutish technology.

Listen. I work for the CIA. I am not a spy. I just read books. We read everything that is published in the world, and we feed the plots, dirty tricks, [and] codes into a computer. And the computer checks against actual CIA plans and operations. I look for leaks. I look for new ideas. We read adventures and novels and journals. I… Who would invent a job like that?

Three Days of the Condor, 1975

I love this quote! The job of an intelligence analyst was crazy even in 1975. Today, it is nearly impossible, and new technology is needed to help read an ever-growing amount of data. It is not just about reading books these days. The volume of the Web, including “deep Web” chat rooms, blogs, and more is completely overwhelming. In my last posting, I introduced “real intelligence” as memory-based reasoning. International security needs real intelligence to help read, comprehend, and remember everything so that analysts do not have to.

I recently read that Tom Fingar, the Deputy Director of National Intelligence for Analysis, declared “It isn’t intelligence until it has been processed through the brain of an analyst.” Until then, it is all just data. We are drowning in data and do not have enough analytical brains. Technology is required to help. But why do we still use the same old technology when we are fighting a new kind of war? It is a new millennium with a new kind of warfare, and yet we still use questionable 20th century technology like rule-based systems and data mining. Recall the Tukey quote from my first posting and his doubts about statistical inference. The joking derision, “That is so 20th Century!”, can also be said (and often is said) against databases, rule-based systems, search engines, and data mining. As a biologist and psychologist, I believe that even data mining based on “neural networks” has little if anything to do with real neurons and real intelligence.

In a recent paper from the CATO Institute, Jeff Jonas and Jim Harper highlighted that traditional data mining has failed to address the needs of the intelligence community. Statistical reasoning about population distributions and abstract generalities is irrelevant when looking for rare adverse events in a dynamic sea of normality. One of their arguments, with which I agree, is that data mining is very data “hungry” and requires enough historical examples to achieve statistical power and significance.

I agree that the data mining methods have been marginally effective even for commercial problems, such as for predicting consumer behavior, and have largely failed when re-applied to the harder challenges of intelligence analysis. On the other hand, memory-based reasoning is a different type of data analysis, more akin to reasoning in real brains rather than reasoning by rules and statistics. The memory-based approach to data analysis does not share these problems, such as being data hungry. Real intelligence reasons by similarity to specific experiences. If I touch a glowing stove once, I get burned and learn not to touch it again. The use of more or less data is irrelevant to this kind of experience-based reasoning.

David Aha’s edited volume on “lazy learning” defines a class of machine learning methods within this new approach. They include memory-based, case-based, experience-based, and instance-based reasoning. This class is distinguished from “eager” learners such as traditional statistics and neural networks, which try to fit data to a particular model. This fitting of data leads to all sorts of problems, including complex parameterizations, dependency on the order of data arrival, and over-fitting the data to the model. For example, if you believe in model fitting, you must worry about over-fitting.

The many problems of fitting data to models are understood in statistical modeling, but the same is true for rule-based inferencing as just another kind of modeling. For example, one rule-based system for identity management (including alias detection) attempts to make decisions about aliases and name variants when given each new piece of identity data. Its rule-based reasoning leads to decisions about combining identities that are order dependent, and the consequences of these decisions are then impossible to untangle. These models become increasingly wrong and impossible to fix. There is an enduring hope and occasional claim that rule-based and statistical systems will become fully incremental and “sequence neutral” one day. This is a naive hope.

Suppose I gave you the mean of a set of numbers (10 numbers, let’s say, but without telling you how many there are). An average is useful for decision making, such as deciding whether something is below or above average. However, if I give you another number and ask you to update the average, you do not have sufficient information to do so. It is impossible to update even this simple statistic as new data arrives, and this is a very trivial case. The problem of incremental change becomes increasingly difficult with greater scale and complexity and will never be addressed by reductionistic statistical models, neural networks, and rules. Decision rules are too eager and confound what they know with what they have already decided to do about what they previously knew, which is wrong and even dangerous.
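
The averaging example can be checked in a few lines of Python. In this illustrative sketch, two hypothetical samples share the same mean, yet folding in the same new value must produce different answers, so the mean alone cannot be updated. Keeping the count n alongside the mean (update_mean is a name chosen for this example) restores exact incremental updating: the kind of retained count that memory-based approaches rely on.

```python
import math


def update_mean(current_mean, n, x):
    """Fold one new value x into a running mean over n values.
    This is only possible because n is stored alongside the mean."""
    return current_mean + (x - current_mean) / (n + 1), n + 1


small = [4, 6]    # hypothetical sample: mean 5.0 over 2 values
large = [5] * 10  # hypothetical sample: mean 5.0 over 10 values
assert sum(small) / len(small) == sum(large) / len(large) == 5.0

# Someone told only "the mean is 5.0" cannot tell these apart,
# yet adding the same new value 15 must give different results:
new_small, _ = update_mean(5.0, len(small), 15)  # about 8.33
new_large, _ = update_mean(5.0, len(large), 15)  # about 5.91

# With the count kept, each incremental update matches a full recomputation.
assert math.isclose(new_small, sum(small + [15]) / 3)
assert math.isclose(new_large, sum(large + [15]) / 11)
print(round(new_small, 2), round(new_large, 2))  # 8.33 5.91
```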

Memories on the other hand are perfectly incremental and order independent. They store association counts; counts are perfectly incremented with each and every observation. Any order of data arrival results in the same final count. In the case of alias detection, memories store all the co-incident information about each identity and can quickly recall similar entities when asked (using k-nearest neighbors, for example). The answers might change as new data arrives, but any decisions made about entity similarity do not confound the information about similarity itself. A memory updates its information about new data but does not include decisions about how to fit the new data according to some a priori model. Lazy learners are called “lazy” because they adhere to this principle of minimum commitment. Memories read, comprehend, and remember all the information without reductionistic fitting to a particular model. This memory of detail makes them much more universal, stable and accurate.
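
Both claims in this paragraph, that count-based memories are order independent and that similar identities can be recalled with k-nearest neighbors, can be demonstrated in a small Python sketch. The records, attribute names, and the choice of cosine similarity over co-occurrence counts are assumptions made for illustration; this is a generic sketch, not Saffron’s alias-detection method.

```python
from collections import Counter
from itertools import combinations_with_replacement, permutations


def remember(records):
    """Per identity, count how often each pair of attributes was seen together
    (the diagonal (a, a) counts single attributes). Counting is addition, so the
    final memory is the same whatever order the records arrive in."""
    memory = {}
    for identity, attrs in records:
        counts = memory.setdefault(identity, Counter())
        for pair in combinations_with_replacement(sorted(attrs), 2):
            counts[pair] += 1
    return memory


def cosine(m1, m2):
    """Cosine similarity between two identities' co-occurrence counts."""
    dot = sum(n * m2[pair] for pair, n in m1.items())
    norm = (sum(n * n for n in m1.values()) * sum(n * n for n in m2.values())) ** 0.5
    return dot / norm if norm else 0.0


def nearest_identities(memory, identity, k=1):
    """Recall the k most similar identities. Nothing is merged or decided here;
    the stored counts stay intact whatever answer is returned."""
    others = [other for other in memory if other != identity]
    return sorted(others, key=lambda o: cosine(memory[identity], memory[o]),
                  reverse=True)[:k]


records = [
    ("id1", {"name:jon smith", "dob:1970-01-02", "city:leeds"}),
    ("id2", {"name:john smith", "dob:1970-01-02", "city:leeds"}),
    ("id3", {"name:jane doe", "dob:1985-05-06", "city:york"}),
    ("id1", {"name:jon smith", "phone:555-0100"}),
]

# Every one of the 24 arrival orders yields exactly the same memory.
baseline = remember(records)
assert all(remember(order) == baseline for order in permutations(records))

print(nearest_identities(baseline, "id1"))  # ['id2']
```

If a later record changes the picture, the next query simply returns a different neighbor; because no merge was ever committed, there is nothing to untangle.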

We need this new approach for international security because we need more brains. To again quote Tom Fingar, “We don’t have enough analytical brains to meet all of the challenges. We have to rely on technology.” So why don’t we add more intelligent, brain-like technology? Brains learn and think much more quickly and fluidly than traditional technology. For example, our brains do not learn by the notoriously slow, parametric, data-hungry methods of data mining, even methods called “neural” networks. Instead, we learn by memory: comprehending new information by instant integration with what we already know. The best analytic minds do not fit new data into a single a priori model confounded with prior decisions. They read and remember and then quickly reason about new situations as the world unfolds. Memory-based representation and reasoning will help us think about the harder questions we face today. As we have all read in the news over the past year or more, over-eager analytical reasoning is disastrous!

We are still clinging to technologies that have failed and will continue to fail the intelligence community. I am passionate about this belief and will have more to say in my next posting. I will include more examples of these failures and how our new approach is succeeding. As the saying goes, “The proof of the pudding is in the eating.”

I also look forward to speaking about this at Saffron’s ExtremeIntelligence Manifesto event on March 7 in Reston, VA. See www.saffrontech.com/events.shtml to register. We are already collecting a nice crowd, but I hope to see even more of you. It is a time for change.


It’s Time to Get Real

January 17th, 2007

Imagine having an army of personal assistants (think millions) who read and observe everything you give them (think terabytes). This army notes every association amongst all the entities (people, places and things) in great detail (think millions of attributes and their co-incidences for each entity). Then best of all, it never forgets. That’s Saffron’s associative memory technology.

Welcome. As seen on our new web site and now in my first blog posting, we are coming out of stealth mode to present our company more openly as we “get real”. Saffron has solved a decade-old problem of machine intelligence, has established customers and partners with real solutions, and has the executive team in place to change entire industries with a new, disruptive approach to data analysis. Many emerging companies claim to be revolutionary with patented breakthroughs, and such claims are usually shallow, but we are beginning to speak out and prove them true in our case.

Saffron is the realization of a 20 year dream for me. I’ve always been more of a biologist-psychologist than a computer scientist. Since I was in grad school and saw the rise of “neural network” computing in the 1980s, I have believed there was a more natural, biologically inspired way for machines to learn - more the way real brains work using associative memories, which can remember and recall how everything is (potentially) related to everything else and can then reason from memory. When we say that a doctor, lawyer, teacher, executive, analyst, or engineer is wise, we mean that they are experienced and draw on this experience when making new decisions. This is the way machine intelligence should work as well.

When I was an engineer at IBM, our interest was in solving very large-scale enterprise problems. It was clear that every industry was beginning to face the problem of too much data (structured and even more unstructured) and that traditional search, AI, statistical techniques, and neural network data mining were going to fail. Being involved in intelligent agents, I was also starting to hear customers ask for agents that were adaptive and could learn. I felt that the market was getting ready for a “new, new thing”.

However, like decades of others who believed in associative memories as the right approach but then failed to commercialize them, we continued to be faced with problems of how associative memories could scale. Remembering the connections from everything to everything else just doesn’t scale well. We believed that real neurons had solved this problem, and looked for inspiration from real systems in order to engineer smarter machines. So in 1999, Jim Fleming and I founded Saffron Technology to solve the associative memory scaling problem - or go back and get “real” jobs again if we failed!

Chalk it up to naive optimism, which is a requirement to leave a great job and start a new venture in any case, but we soon learned just how hard the problem was to solve. “You don’t know how deep the pond is until you step in it”, as they say. It took us longer than I thought it would be fun to crack this nut. It also took longer than I thought for vertical markets to become increasingly dissatisfied with traditional approaches. However, the “heavy lifting” demands of national security and personalized medicine have also grown over the years since our founding, and these are the markets which we now serve. Saffron’s engine has also powered everything from “The World’s Best Spam Blocker” to adverse event predictors in oil and gas production. While these applications have been the focus of our partners, we are focused on advancing the core technology. Working quietly over the years, today we have a mature solution to the growing scale and complexity of enterprises using memory-based representation and reasoning. Saffron now stands as the real leader of this new industry.

I am often asked for reading material on associative memories. There are various books and articles on related approaches in the history of neurocomputing and in the psychology of learning and memory (if you are more technically minded), but I like to suggest On Intelligence by Jeff Hawkins as a very engaging and accessible read. True believers in associative memories have been few and far between over the last decades, and Jeff has been one of them. In On Intelligence, he makes the case that “real intelligence” is not what has been seen in Artificial Intelligence or even in “neural networks”, which in my opinion have little if anything to do with real neural networks. Instead, real intelligence is memory-based, and Jeff defines it as “memory-based prediction”. On Intelligence provides common-sense examples and very readable arguments about why this is true of your own brain, and in the last chapter, he predicts an emerging industry of memory-based systems in the next few years. We agree and present Saffron Technology as the turning point, providing mature products and solutions today. Our enterprise “memory base” along with our turnkey applications has matured over many release cycles, including partner and customer feedback to prepare us for today.

My next postings will address our current applications of real intelligence to national security and then onward to personalized medicine. For now, I wish only to welcome you and ask for your comments and dialog. Using this blog as a manifesto, I intend to be controversial. Please just fire back to help me articulate this new approach with concrete realities rather than mere claims. To start this dialog, let me leave you with some doubt about the adequacy of traditional approaches from one of the great leaders of 20th Century statistics:

“For a long time I have thought that I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt.” John Tukey, “The Future of Data Analysis”, 1962.

Saffron provides you the ability to focus on the particular. Unlike statistical assumptions and the reductionism of fitting data into abstract models that lose too much information and fail, Saffron provides a new and unique way to address the growing problems of data complexity and scale. This is exciting stuff, and I hope you will join us in both telling and further creating this story.