Helping At-Risk Veterans

Patterns and Predictions analyzes social, mobile, and linguistics data – in real time – to assess mental health risk.

Overview

Patterns and Predictions (P&P) is a predictive analytics firm whose core technology provides prediction driven by unstructured and linguistic data. This technology powers the Durkheim Project’s ‘Big Data’ analytics network for the assessment of mental health risks, veteran suicide risk in particular. The technical rubric for the project is “maximum speed at minimum cost.”

“With Cloudera Search and Impala, our ingestion of data on Hadoop is promisingly efficient in terms of lower costs, better computational throughput, and reduced complexity of IT support.”


Chris Poulin, Principal Partner, Patterns and Predictions

The Durkheim Project began in 2010 with initial funding from DARPA. In 2011, P&P began sourcing the technology and building out the integrated foundational infrastructure and predictive modeling that would support the project’s extensive data collection and analysis once it was scaled up.

Phase One of the project began with a study of three cohorts of 100 subjects each, representing “non-psychiatric”, “psychiatric”, and “suicide positive” profiles. From unstructured clinical notes, the researchers developed linguistics-driven prediction models to estimate suicide risk. As participants join, individual profiles are set up and made accessible, via a dashboard, to researchers at Geisel and to clinicians. The system assigns an overall risk score to each profile based on the collective information and on keywords that are specific to each participant.

  • Over a terabyte of data is processed every day, in real time
  • Up to 100,000 active duty military and veterans are supported

Use Case

The technical rubric for the project is “maximum speed at minimum cost”, which prompted early adoption of Cloudera Search and Cloudera Impala. “The project has a very complex workflow,” explained Poulin. “All of our machine learning is indexed, and we actually access all of the machine learning through search interfaces, which can get expensive. With Cloudera Search and Impala, our ingestion of data on Hadoop is promisingly efficient in terms of lower costs, better computational throughput, and reduced complexity of IT support.”

Cloudera’s category leadership and subject matter expertise in Hadoop and Big Data led Poulin to engage Cloudera Professional Services to co-develop Bayesian counters, a lightweight statistical model that detects risk at scale. The model is built on Apache HBase and CDH (Cloudera’s Distribution Including Apache Hadoop), the market-leading, 100% open source distribution of Hadoop and related projects. The Cloudera-based framework is a cornerstone technology of the Durkheim Project.
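The case study does not describe how the Bayesian counters work internally. As a rough illustration of the general idea of a count-based Bayesian model, the sketch below keeps per-label token counts and turns them into posterior probabilities with a naive Bayes rule and add-one smoothing. The class name, method names, labels, and tokens are hypothetical and are not taken from the Durkheim Project; the in-memory counters stand in for what could, in a deployment like the one described, be counters held in a distributed store such as HBase.

```python
import math
from collections import Counter, defaultdict


class BayesianCounter:
    """Hypothetical count-based naive Bayes scorer with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()               # documents seen per label
        self.token_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()                        # all tokens seen in training

    def update(self, label, tokens):
        """Increment the counters for one labeled document."""
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def posterior(self, tokens):
        """Return P(label | tokens) for every label seen so far."""
        if not self.doc_counts:
            return {}
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((counts[tok] + 1) / denom)
            log_scores[label] = score
        # Normalize in log space for numerical stability.
        peak = max(log_scores.values())
        weights = {k: math.exp(v - peak) for k, v in log_scores.items()}
        z = sum(weights.values())
        return {k: w / z for k, w in weights.items()}


# Toy usage with made-up labels and tokens (not real data).
model = BayesianCounter()
model.update("label_a", ["x", "y"])
model.update("label_b", ["z"])
scores = model.posterior(["x"])
```

Because training only increments counters, a model of this kind can be updated one document at a time without retraining, which is one reason count-based approaches are considered lightweight.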

The Phase One build and testing concluded in early 2013. It validated that the project’s machine learning data fabric was viable, with predictive capabilities that were 65% accurate in predicting suicide risk among a veteran control group.