Legal Search Science

Legal Search Science is an interdisciplinary field concerned with the search, review, and classification of large collections of electronic documents to find information for use as evidence in legal proceedings, for compliance to avoid litigation, or for general business intelligence. See also Computer Assisted Review. Legal Search Science as practiced today uses software with artificial intelligence features to help lawyers find electronic evidence in a systematic, repeatable, and verifiable manner. The hybrid search method of AI human-computer interaction developed in this field will inevitably have a dramatic impact on the future practice of law. Lawyers will never be replaced entirely by robots embodying AI search algorithms, but some lawyers are already using them to significantly enhance their abilities and singlehandedly do the work of dozens, if not hundreds, of lawyers.

My own experience (Ralph Losey) provides an example. I participated in a study in 2013 where I searched and reviewed over 1.6 million documents by myself, with only the assistance of one computer – one robot, so to speak – running AI-enhanced software by Kroll Ontrack. I was able to do so more accurately and faster than large teams of lawyers working without artificial intelligence software. I was even able to work faster and more accurately than all other teams of lawyers and vendors that used AI-enhanced software, but did not use the science-based search methods described here. I do not attribute my success to my own intelligence, or any special gifts or talents. I was able to succeed by applying the established scientific methods described here. They allowed me to augment my own small intelligence with that of the machine. If I have any special skills, they are in human-computer interaction and legal search intuition. They are based on my long experience in the law with evidence (over 34 years), and on my experience in the last few years using predictive coding software.

Legal Search Science as I understand it is a combination and subset of three fields of study: Information Science, the legal field of Electronic Discovery, and the engineering field concerned with the design and creation of Search Software. Its primary concern is with information retrieval and the unique problems faced by lawyers in the discovery of relevant evidence.

Most specialists in legal search science use a variety of search methods when searching large datasets. The use of multiple methods of search is referred to here as a multimodal approach. Although many search methods are used at the same time, the primary, or controlling, search method in large projects is typically what is known as supervised or semi-supervised machine learning. Semi-supervised learning is a type of artificial intelligence (AI) that uses an active learning approach. I refer to this as AI-enhanced review or AI-enhanced search. In information science it is often referred to as active machine learning, and in legal circles as Predictive Coding.

For reliable introductory information on Legal Search Science see the works of attorney, Maura Grossman, and her information scientist partner, Professor Gordon Cormack, including:

The Grossman-Cormack Glossary explains that in machine learning:

Supervised Learning Algorithms (e.g., Support Vector Machines, Logistic Regression, Nearest Neighbor, and Bayesian Classifiers) are used to infer Relevance or Non-Relevance of Documents based on the Coding of Documents in a Training Set. In Electronic Discovery generally, Unsupervised Learning Algorithms are used for Clustering, Near-Duplicate Detection, and Concept Search.

Multimodal search uses both machine learning algorithms and unsupervised learning search tools (clustering, near-duplicates and concept), as well as keyword search and even some limited use of traditional linear search. This is further explained here in the section below entitled, Hybrid Multimodal Bottom Line Driven Review. The hybrid multimodal aspects described represent the consensus view among information search scientists. The bottom line driven aspects represent my legal overlay on the search methods. All of these components together make up what I call Legal Search Science. It represents a synthesis of knowledge and search methods from science, law, and software engineering.

The key definition of the Glossary is for Technology Assisted Review, their term for AI-enhanced review.

Technology-Assisted Review (TAR): A process for Prioritizing or Coding a Collection of Documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of Documents and then extrapolates those judgments to the remaining Document Collection. Some TAR methods use Machine Learning Algorithms to distinguish Relevant from Non-Relevant Documents, based on Training Examples Coded as Relevant or Non-Relevant by the Subject Matter Experts(s), …. TAR processes generally incorporate Statistical Models and/or Sampling techniques to guide the process and to measure overall system effectiveness.

The Grossman-Cormack Glossary makes clear the importance of Subject Matter Experts (SMEs) by including their use as the document trainer into the very definition of TAR. Nevertheless, experts agree that good predictive coding software is able to tolerate some errors made in the training documents. For this reason experiments are being done on ways to minimize the central role of the SMEs, to see if lesser-qualified persons could also be used in document training, at least to some degree. See Webber & Pickens, Assessor Disagreement and Text Classifier Accuracy (SIGIR, 2013); John Tredennick, Subject Matter Experts: What Role Should They Play in TAR 2.0 Training? (2013). These experiments are of special concern to software developers and others who would like to increase the utilization of AI-enhanced software because, at the current time, very few SMEs in the law have the skills or time necessary to conduct AI-enhanced searches. This is one reason that predictive coding is still not widely used, even though it has been proven effective in multiple experiments and adopted by several courts.

For in-depth information on key experiments already performed in the field of Legal Search Science, see the TREC Legal Track reports, whose home page is maintained by a leader in the field, information scientist Doug Oard. Professor Oard is a co-founder of the TREC Legal Track. Also see the research and reports of Herb Roitblat and the Electronic Discovery Institute, and my papers on TREC (and otherwise as listed below): Analysis of the Official Report on the 2011 TREC Legal Track – Part One, Part Two and Part Three; and Secrets of Search: Parts One, Two, and Three.

For general legal background on the field of Legal Search Science see the works of the attorney co-founder of TREC Legal Track, Jason R. Baron, including:

As explained in Baron and Freeman’s Quick Peek at the Math, and my blog introduction thereto, the supervised learning algorithms behind predictive coding utilize a hyper-dimensional space. Each document in the dataset, including its metadata, represents a different dimension mapped in trans-Cartesian space, called hyper-planes. Each document is placed according to a multi-dimensional dividing line of relevant and irrelevant. The important document ranking feature of predictive coding is performed by measuring how far from the dividing line a particular document lies. Each time a training session is run the line moves and the ranking fluctuates in accordance with the new information provided. The below diagram attempts to portray this hyperplane division and document placement. The points shown in red designate irrelevant documents and the blue points relevant documents. The dividing line would run through multiple dimensions, not just the usual two of a Cartesian graph. This is depicted in this diagram by folding fields. For more read the entire Quick Peek article.
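
The ranking-by-distance idea can be sketched in a few lines of Python. This is only a minimal illustration under a common textbook assumption: each document is reduced to a vector of numeric features, and the dividing line is a linear hyperplane. It is not taken from the Quick Peek article or from any vendor's software, and the feature values, weights, and function names are hypothetical.

```python
import math

def signed_distance(weights, bias, features):
    """Signed distance from a feature vector to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * x for w, x in zip(weights, features)) + bias) / norm

def rank_documents(weights, bias, documents):
    """Return (doc_id, distance) pairs, farthest on the relevant side first."""
    scored = [(doc_id, signed_distance(weights, bias, feats))
              for doc_id, feats in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical documents, each described by three numeric features.
documents = {
    "doc_a": [3.0, 0.0, 1.0],
    "doc_b": [0.0, 2.0, 0.0],
    "doc_c": [1.0, 1.0, 1.0],
}
weights = [1.0, -1.0, 0.5]  # hypothetical weights from the latest training round
bias = -0.5

ranking = rank_documents(weights, bias, documents)
for doc_id, distance in ranking:
    side = "relevant side" if distance > 0 else "irrelevant side or on the line"
    print(f"{doc_id}: {distance:+.2f} ({side})")
```

Each new training round would change the weights and bias, which moves the dividing line and reshuffles the ranking.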

For a scientific and statistical view of Legal Search Science that is often at least somewhat intelligible to lawyers and other non-scientists, see the blog of information scientist and consultant William Webber, Evaluating e-Discovery. For writings designed for the general reader on the subject of predictive coding, see the many articles by attorney Karl Schieneman, another pioneer in the field.

AI-Enhanced Search Methods

AI-enhanced search represents an entirely new method of legal search, which requires a completely new approach to large document reviews. Below is the diagram that I created to show the new workflow that I use in a typical predictive coding project.



For a more detailed description of the eight steps see the Electronic Discovery Best Practices page on predictive coding. For another somewhat similar workflow description see the diagram below of Kroll Ontrack, the vendor whose predictive coding software I now frequently use. Their seven-step model is described at pages 3-4 of Kroll Ontrack’s white paper, Technology Assisted Review: Driving Ediscovery Efficiencies in the Era of Big Data (2013).


I have found that proper AI-enhanced review requires the highest skill levels and is, for me at least, the most challenging activity in electronic discovery law. See: Electronic Discovery Best Practices for a description of the ten types of legal services involved in e-discovery. I am convinced that predictive coding is The big new tool that we have all been waiting for. When used properly, good AI-enhanced software allows attorneys to find the information they need in vast stores of ESI, and to do so in an effective and affordable manner.

In my experience the best software and training methods use an AI-type active learning process in steps four and five of my chart above and steps 2-5 of Kroll Ontrack’s chart. My preferred active learning process in the iterative machine learning steps is threefold:

  1. The computer selects documents for review where the software classifier is uncertain of the correct classification. This helps the classifier algorithms to learn by adding diversity to the documents presented for review. This in turn helps to locate outliers of a type your initial judgmental searches in steps two and five have missed. This is machine selected sampling, and, according to a basic text in information retrieval engineering, a process is not a bona fide active learning search without this ability. Manning, Raghavan and Schütze, Introduction to Information Retrieval, (Cambridge, 2008) at pg. 309.
  2. Some reasonable percentage of the documents presented for human review in step five are selected at random. This again helps maximize recall and avoid premature focus on the relevant documents initially retrieved.
  3. Other relevant documents that a skilled reviewer can find using a variety of search techniques. This is called judgmental sampling. See Baron, Jason, Co-Editor, The Sedona Conference® Commentary on Achieving Quality in the E-Discovery Process (2009). Judgmental sampling can use a variety of search tools, including both the mentioned Supervised and Unsupervised Learning Algorithms, and is further described below.
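
The three selection sources above can be sketched as one batch-building function. This is a hypothetical illustration, not the logic of any particular review platform: it assumes the software exposes a relevance probability per unreviewed document, treats "uncertain" as closest to 0.5, and uses made-up names and numbers throughout.

```python
import random

def select_training_batch(scores, judgmental_ids, n_uncertain, n_random, seed=0):
    """Build one review batch from machine-selected, random, and judgmental picks.

    scores: dict of doc_id -> predicted probability of relevance (unreviewed docs).
    judgmental_ids: doc ids the reviewer located with other search methods.
    """
    rng = random.Random(seed)
    # 1. Machine selected: documents the classifier is least sure about.
    by_uncertainty = sorted(scores, key=lambda d: abs(scores[d] - 0.5))
    uncertain = by_uncertainty[:n_uncertain]
    # 2. Random: a simple random sample of the other unreviewed documents.
    remaining = [d for d in scores if d not in uncertain]
    randoms = rng.sample(remaining, min(n_random, len(remaining)))
    # 3. Judgmental: the reviewer's own finds, minus anything already chosen.
    chosen = set(uncertain) | set(randoms)
    judgmental = [d for d in judgmental_ids if d not in chosen]
    return {"uncertain": uncertain, "random": randoms, "judgmental": judgmental}

# Hypothetical probabilities from the last training round.
scores = {"d1": 0.95, "d2": 0.52, "d3": 0.10, "d4": 0.45, "d5": 0.80, "d6": 0.30}
batch = select_training_batch(scores, judgmental_ids=["d1", "d2"],
                              n_uncertain=2, n_random=2)
print(batch)
```

The mix between the three sources (here two, two, and whatever the reviewer found) is a project decision, not something the sketch prescribes.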

The initial seed set generation, step two in my chart, should also use some random samples, plus judgmental multimodal searches. Steps three and six in my chart always use pure random samples and rely on statistical analysis. For background on the three types of sampling see my article, Three-Cylinder Multimodal Approach To Predictive Coding.
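
As a rough illustration of the kind of statistical analysis a pure random sample supports, the sketch below estimates prevalence (the fraction of relevant documents) with a 95% interval. It assumes the simple normal-approximation formula, which is only one of several ways to compute such an interval and is not necessarily the calculation used in the workflow described here. The sample counts and collection size are invented.

```python
import math

def prevalence_interval(sample_size, relevant_in_sample, z=1.96):
    """Point estimate and normal-approximation interval for prevalence.

    z=1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    p = relevant_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical: 1,500 randomly sampled documents, 30 coded relevant.
p, low, high = prevalence_interval(1500, 30)
collection_size = 1_000_000  # hypothetical collection

print(f"Estimated prevalence: {p:.2%} (range {low:.2%} to {high:.2%})")
print(f"Projected relevant documents: {round(low * collection_size):,} "
      f"to {round(high * collection_size):,}")
```

For very low prevalence or small samples the normal approximation becomes unreliable, and an exact binomial interval would be the safer choice.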

Judgmental Sampling

After the first round of training, aka the seed set, judgmental sampling continues along with random and machine selected sampling in steps four and five. In judgmental sampling the human reviewer often selects additional documents for review and training that are based on the machine selected or random selected documents presented for review. Sometimes, however, the SME human reviewer follows a new search idea unrelated to the new documents seen. When an experienced searcher sees new documents this often leads to new search ideas.

All kinds of searches can be used for judgmental sampling, which is why I call it a multimodal search. This may include some linear review of selected custodians or selected date ranges, parametric Boolean keyword searches, similarity searches of all kinds, clustering searches, concept searches, as well as several unique predictive coding probability searches. These document probability searches are based primarily on the unique document ranking capabilities of most AI-enhanced search software. I find the ranking based searches to be extremely helpful to maximize efficiency and effectiveness.
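
To make the multimodal idea concrete, here is a small sketch that combines a Boolean keyword search with a probability (ranking) search and compares what each finds. The corpus, the scores, and the function names are all hypothetical; real review software provides these searches through its own interface.

```python
def keyword_search(corpus, required, excluded=()):
    """Boolean search: every required term present, no excluded term present."""
    hits = set()
    for doc_id, text in corpus.items():
        words = set(text.lower().split())
        if all(t in words for t in required) and not any(t in words for t in excluded):
            hits.add(doc_id)
    return hits

def probability_search(scores, low, high):
    """Documents whose predicted relevance probability lies in [low, high]."""
    return {doc_id for doc_id, p in scores.items() if low <= p <= high}

# Hypothetical documents and classifier probabilities.
corpus = {
    "d1": "Price fixing memo",
    "d2": "Holiday schedule",
    "d3": "Memo on pricing strategy",
    "d4": "price list attached",
}
scores = {"d1": 0.97, "d2": 0.03, "d3": 0.88, "d4": 0.40}

keyword_hits = keyword_search(corpus, required=["price"])
high_ranked = probability_search(scores, 0.8, 1.0)

# Highly ranked documents that the keyword search alone would have missed.
missed_by_keywords = high_ranked - keyword_hits
# Keyword hits the classifier ranks lower, worth a quick judgmental look.
keyword_only = keyword_hits - high_ranked
print(sorted(missed_by_keywords), sorted(keyword_only))
```

Comparing the result sets of different search modes is one simple way to surface documents that any single method would overlook.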

I also often use informal random sampling of returned search sets as part of the judgmental sampling review and evaluation process. This is a process where I browse through search results, both according to ranking and random views. I will also use various document sorting views to get a better understanding of the documents returned. I will also use different search methods and document views according to the type of data, the custodian, where it was originally stored, or even any sub-search-goal or issue I might be focused on at any one point in time. A good search follows systems and is repeatable, but it is also fluid. It is adaptable to new information uncovered from the documents searched, or from new information received elsewhere about the case, or from new documents added to the search in mid-course. Good search and review software is designed to allow for both flexibility and a systems approach.

All of these methods allow an experienced legal searcher to get a feel for the underlying data. This obviates the need for full linear review of the multimodal search results in the judgmental sampling process. Still, it is sometimes appropriate to read a few hundred documents in a linear fashion, for instance, all email a key witness received on a critical day.

AI featured multimodal search represents a move from traditional legal art to science. See: Seven Years of the e-Discovery Team Blog, Art to Science. But there is still room for art in the sense of the deeply engrained skills and intuition that lawyers can only gain from years of experience with legal search. Knowledgeable lawyers have unique insights into the evidence and witnesses involved in a particular case or type of dispute. This background knowledge and experience allows a skilled SME to improvise. The searcher can change directions depending on the documents found, and depending on new documents added to the dataset, or even new issues and a changed scope of relevance. Indeed, the search methods used in multimodal judgmental sampling vary considerably, both on a project by project basis, and over time in the same project, as understanding of the data, targets, and search develops. This is where years of legal search experience can be extremely valuable. All well designed predictive coding software allows for such a flexible approach to empower the attorneys conducting the search.

The CAL Variation

After study of the 2014 experiments performed by Professor Cormack and Maura Grossman, I have added a variation to the predictive coding work flow, which they call CAL, for Continuous Active Learning. Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR’14, July 6–11, 2014, at pg. 9. Also see Latest Grossman and Cormack Study Proves Folly of Using Random Search for Machine Training – Parts One, Two, Three and Four. The part that intrigued me about their study was the use of continuous machine training as part of the entire review. This is explained in detail in Part Three of my lengthy blog series on the Cormack Grossman study.

My practical takeaway from their experiments and 2014 SIGIR report is that focusing on high ranking documents is a powerful search method, whereas a random only search is pure folly. The form of CAL that they tested trained using highly probable relevant documents in all but the first training round. (In the first round, the so-called seed set, they trained using documents found by keyword search.) This experiment showed that the method of review of the documents with the highest rankings works well, and should be given significant weight in any multimodal approach, especially when the goal is to quickly find as many relevant documents as possible.

The “continuous” training aspect of the CAL approach means that you keep doing machine training throughout the review project and batch reviews accordingly. This could become a project management issue. But, if you can pull it off within proportionality and requesting party constraints, it just makes common sense to do so. You might as well get as much help from the machine as possible and keep getting its probability predictions for as long as you are still doing reviews and can make last minute batch assignments accordingly.
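
The continuous loop can be sketched as follows: start from a keyword-found seed set, then repeatedly retrain, review the highest-ranked unreviewed documents, and feed those codings straight back into training. Everything in this sketch is an assumption made for illustration: the toy word-weight "model" stands in for real predictive coding software, the stopping rule (stop after a batch with no relevant documents) is deliberately simplistic, and the corpus is invented.

```python
from collections import Counter

def train(labeled):
    """Toy model: a word gains weight in relevant docs, loses it in irrelevant ones."""
    weights = Counter()
    for text, is_relevant in labeled:
        for word in set(text.split()):
            weights[word] += 1 if is_relevant else -1
    return weights

def score(weights, text):
    """Sum the learned weights of the distinct words in a document."""
    return sum(weights[w] for w in set(text.split()))

def continuous_active_learning(corpus, seed_ids, reviewer, batch_size=2):
    """Review top-ranked unreviewed documents, retraining before every batch."""
    labels = {doc_id: reviewer(doc_id) for doc_id in seed_ids}
    while len(labels) < len(corpus):
        model = train([(corpus[d], rel) for d, rel in labels.items()])
        unreviewed = [d for d in corpus if d not in labels]
        unreviewed.sort(key=lambda d: score(model, corpus[d]), reverse=True)
        batch = unreviewed[:batch_size]
        for doc_id in batch:
            labels[doc_id] = reviewer(doc_id)  # human coding feeds the next round
        if not any(labels[d] for d in batch):
            break  # hypothetical stopping rule: a batch with nothing relevant
    return labels

# Hypothetical corpus and "true" coding that the human reviewer would supply.
corpus = {
    "d1": "energy trading contract price", "d2": "lunch menu friday",
    "d3": "trading price forecast", "d4": "contract price amendment",
    "d5": "parking garage notice", "d6": "friday lunch parking",
    "d7": "holiday party invite", "d8": "menu friday holiday",
}
truth = {"d1": True, "d2": False, "d3": True, "d4": True,
         "d5": False, "d6": False, "d7": False, "d8": False}

labels = continuous_active_learning(corpus, ["d1", "d2"], lambda d: truth[d])
found = sorted(d for d, rel in labels.items() if rel)
print(f"Reviewed {len(labels)} of {len(corpus)} documents; relevant found: {found}")
```

In a real project the stopping decision would rest on proportionality and on quality control sampling, not on a single empty batch.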

I have done several reviews in such a continuous training manner without really thinking about the fact that the machine input was continuous, including my first Enron experiment. Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. But the Cormack Grossman study on the continuous active learning approach caused me to rethink the flow chart shown above that I usually use to explain the predictive coding process. The work flows shown before do not use a CAL approach, but rather an approach the Cormack Grossman report calls a simple approach, where you review and train, but then at some point stop training and final review is done. Under the simple approach there is a distinct stop in training after step five, and the review work in step seven is based on the last rankings established in step five.

The continuous work flow is slightly more difficult to show in a diagram, and to implement, but it does make good common sense if you are in a position to pull it off. Below is the revised workflow that illustrates how the training continues throughout the review.


Machine training is still done in steps four and five, but then continues in steps four, five and seven. There are other ways it could be implemented of course, but this is the CAL approach I would use in a review project where such complex batching and continuous training otherwise makes sense. Of course, it is not necessary in any project where the review in steps four and five effectively finds all of the relevant documents required. This is what happened in my Enron experiment. Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron. There was no need to do a proportional final review, step seven, because all the relevant documents had already been reviewed as part of the machine training review in steps four and five. In the Enron experiment I skipped step seven and went right from step six to step eight, production. I have been able to do this in other projects as well.


My insistence on the use of multimodal judgmental sampling in steps two and five to locate relevant documents follows the consensus view of information scientists specializing in information retrieval, but is not followed by several prominent predictive coding vendors. They instead rely entirely on machine selected documents for training, or even worse, rely entirely on random selected documents to train the software. In my writings I call these processes the Borg approach, after the infamous villains in Star Trek, the Borg, a race of half-human robots that assimilates people into machines. (I further differentiate between three types of Borg in Three-Cylinder Multimodal Approach To Predictive Coding.) Like the Borg, these approaches unnecessarily minimize the role of individuals, the SMEs. They exclude other types of search to supplement an active learning process. I advocate the use of all types of search, not just predictive coding.

Hybrid Human Computer Information Retrieval


Further, in contradistinction to Borg approaches, where the machine controls the learning process, I advocate a hybrid approach where Man and Machine work together. In my hybrid search and review projects the expert reviewer remains in control of the process, and their expertise is leveraged for greater accuracy and speed. The human intelligence of the SME is a key part of the search process. In the scholarly literature of information science this hybrid approach is known as Human–computer information retrieval (HCIR). (My thanks to information scientist Jeremy Pickens for pointing out this literature to me.)

The classic text in the area of HCIR, which I endorse, is Information Seeking in Electronic Environments (Cambridge 1995) by Gary Marchionini, Professor and Dean of the School of Information and Library Sciences of U.N.C. at Chapel Hill. Professor Marchionini speaks of three types of expertise needed for a successful information seeker:

  1. Domain Expertise. This is equivalent to what we now call SME, subject matter expertise. It refers to a domain of knowledge. In the context of law the domain would refer to the particular type of lawsuit or legal investigation, such as antitrust, patent, ERISA, discrimination, trade-secrets, breach of contract, Qui Tam, etc. The knowledge of the SME on the particular search goal is extrapolated by the software algorithms to guide the search. If the SME also has the next described System Expertise and Information Seeking Expertise, they can run the search project themselves. That is what I like to call the Army of One approach. Otherwise, they will need a chauffeur or surrogate with such expertise, one who is capable of learning enough from the SME to recognize the relevant documents.
  2. System Expertise. This refers to expertise in the technology system used for the search. A system expert in predictive coding would have a deep and detailed knowledge of the software they are using, including the ability to customize the software and use all of its features. In computer circles a person with such skills is often called a power-user. Ideally a power-user would have expertise in several different software systems. They would also be an expert in one or more particular methods of search.
  3. Information Seeking Expertise. This is a skill that is often overlooked in legal search. It refers to general cognitive skills related to information seeking. It is based on both experience and innate talents. For instance, “capabilities such as superior memory and visual scanning abilities interact to support broader and more purposive examination of text.” Professor Marchionini goes on to say that: “One goal of human-computer interaction research is to apply computing power to amplify and augment these human abilities.” Some lawyers seem to have a gift for search, which they refine with experience, broaden with knowledge of different tools, and enhance with technologies. Others do not.

Id. at pgs. 66-69, with the quotes from pg. 69.

All three of these skills are required for an attorney to attain expertise in legal search today, which is one reason I find this new area of legal practice so challenging. It is difficult, but not impossible like this Penrose triangle.


It is not enough to be an SME, or a power-user, or have a special knack for search. You have to be able to do it all, and so does your software. However, studies have shown that of the three skill-sets, System Expertise, which in legal search primarily means mastery of the particular software used, is the least important. Id. at 67. The SMEs are more important, those who have mastered a domain of knowledge. In Professor Marchionini’s words:

Thus, experts in a domain have greater facility and experience related to information-seeking factors specific to the domain and are able to execute the subprocesses of information seeking with speed, confidence, and accuracy.

Id. That is one reason that the Grossman Cormack glossary quoted before builds in the role of SMEs as part of their base definition of technology assisted review. Glossary at pg. 21 defining TAR.

According to Marchionini, Information Seeking Expertise, much like Subject Matter Expertise, is also more important than specific software mastery. Id. This may seem counter-intuitive in the age of Google, where an illusion of simplicity is created by typing in words to find websites. But legal search of user-created data is a completely different type of search task than looking for information from popular websites. In the search for evidence in a litigation, or as part of a legal investigation, special expertise in information seeking is critical, including especially knowledge of multiple search techniques and methods. Again quoting Professor Marchionini:

Expert information seekers possess substantial knowledge related to the factors of information seeking, have developed distinct patterns of searching, and use a variety of strategies, tactics and moves.

Id. at 70.

In the field of law this kind of information seeking expertise includes the ability to understand and clarify what the information need is, in other words, to know what you are looking for, and articulate the need into specific search topics. This important step precedes the actual search, but is an integral part of the process. As one of the basic texts on information retrieval written by Gordon Cormack, et al, explains:

Before conducting a search, a user has an information need, which underlies and drives the search process. We sometimes refer to this information need as a topic …

Buttcher, Clarke & Cormack, Information Retrieval: Implementation and Evaluation of Search Engines (MIT Press, 2010) at pg. 5. The importance of pre-search refining of the information need is stressed in the first step of the above diagram of my methods, ESI Discovery Communications. It seems very basic, but is often underappreciated, or overlooked entirely, in the litigation context, where information needs are often vague and ill-defined, lost in overly long requests for production and adversarial hostility.

Hybrid Multimodal Bottom Line Driven Review

I have a long descriptive name for what Marchionini calls the variety of strategies, tactics and moves that I have developed for legal search: Hybrid Multimodal AI-Enhanced Review using a Bottom Line Driven Proportional Strategy. See, e.g., Bottom Line Driven Proportional Review (2013). I refer to it as a multimodal method because, although the predictive coding type of searches predominate (shown on the below diagram as AI-enhanced review – AI), I also use the other modes of search, including the mentioned Unsupervised Learning Algorithms (clustering and concept), keyword search, and even some traditional linear review (although usually very limited). As described, I do not rely entirely on random documents, or computer selected documents for the AI-enhanced searches, but use a three-cylinder approach that includes human judgment sampling and AI document ranking. The various types of legal search methods used in a multimodal process are shown in this search pyramid.

Multimodal Search Pyramid

Most information scientists I have spoken to agree that it makes sense to use multiple methods in legal search and not just rely on any single method, even the best AI method. UCLA Professor Marcia J. Bates first advocated for using multiple search methods back in 1989, which she called berrypicking. Bates, Marcia J. The Design of Browsing and Berrypicking Techniques for the Online Search Interface, Online Review 13 (October 1989): 407-424. As Professor Bates explained in 2011 in Quora:

An important thing we learned early on is that successful searching requires what I called “berrypicking.” … Berrypicking involves 1) searching many different places/sources, 2) using different search techniques in different places, and 3) changing your search goal as you go along and learn things along the way. This may seem fairly obvious when stated this way, but, in fact, many searchers erroneously think they will find everything they want in just one place, and second, many information systems have been designed to permit only one kind of searching, and inhibit the searcher from using the more effective berrypicking technique.

This berrypicking approach, combined with HCIR, is what I have found from practical experience works best with legal search. They are the Hybrid Multimodal aspects of my AI-Enhanced Review Bottom Line Driven Review method.

Why AI-Enhanced Search and Review Is Important

I focus on this sub-niche area of e-discovery because I am convinced that it is critical to advancement of the law in the 21st Century. The new search and review methods that I have developed from my studies and experiments in legal search science allow a skilled attorney using readily available predictive coding type software to review at remarkable rates of speed and cost. Review rates are more than 250 times faster than traditional linear review, and cost less than a tenth as much. See, e.g., Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron, and the report by the Rand Corporation, Where The Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery.

Thanks to the new software and methods, what was considered impossible, even absurd, just a few short years ago, namely one attorney accurately reviewing over a million documents by him or herself in 14 days, is attainable by many experts. I have done it. That is when I came up with the Army of One motto and realized that we were at a John Henry moment in Legal Search. Maura tells me that she once did a seven-million document review by herself. Maura and Gordon were correct to refer to TAR as a disruptive technology in the Preface to their Glossary. Technology that can empower one skilled lawyer to do the work of hundreds of unskilled attorneys is certainly a big deal, one for which we have Legal Search Science to thank.

More Information on Legal Search Science

For further information on Legal Search Science see all of the articles cited above, along with my articles listed below. Most of my articles were written for the general reader, some are highly technical but still accessible with study. All have been peer-reviewed in my blog by most of the founders of this field who are regular readers and thousands of other readers. Also see the CAR procedures described on Electronic Discovery Best Practices.

I am especially proud of the legal search experiments I have done using AI-enhanced search software provided to me by Kroll Ontrack to review the 699,082 public Enron documents and my reports on these reviews. Comparative Efficacy of Two Predictive Coding Reviews of 699,082 Enron Documents (Part Two); A Modest Contribution to the Science of Search: Report and Analysis of Inconsistent Classifications in Two Predictive Coding Reviews of 699,082 Enron Documents (Part One). I have been told by scientists in the field that my over 100 hours of search, consisting of two fifty-hour search projects using different methods, is the largest search project by a single reviewer that has ever been undertaken, not only in Legal Search, but in any kind of search. I do not expect this record will last for long, as others begin to understand the importance of Information Science in general, and Legal Search Science in particular. But for now I will enjoy both the record and lessons learned from the hard work involved. I may also attempt a third search project soon to continue to make contributions to Legal Search Science. Stay tuned. I may extend my record to 150 hours.

April 2014 Slide Presentation by Ralph Losey on Predictive Coding

Articles by Ralph Losey on Legal Search

  1. Two-Filter Document Culling – Part One and Part Two.
  2. Introducing “ei-Recall” – A New Gold Standard for Recall Calculations in Legal Search – Part One, Part Two and Part Three.
  3. In Legal Search Exact Recall Can Never Be Known.
  4. Visualizing Data in a Predictive Coding Project – Part One, Part Two and Part Three.
  5. Guest Blog: Talking Turkey by Maura Grossman and Gordon Cormack, edited and published by RCL.
  6. Latest Grossman and Cormack Study Proves Folly of Using Random Search For Machine Training – Part One, Part Two, Part Three, and Part Four.
  7. The “If-Only” Vegas Blues: Predictive Coding Rejected in Las Vegas, But Only Because It Was Chosen Too Late – Part One and Part Two.
  8. IT-Lex Discovers a Previously Unknown Predictive Coding Case: “FHFA v. JP Morgan, et al.”
  9. Beware of the TAR Pits! Part One and Part Two.
  10. PreSuit: How Corporate Counsel Could Use “Smart Data” to Predict and Prevent Litigation. Also see
  11. Predictive Coding and the Proportionality Doctrine: a Marriage Made in Big Data, 26 Regent U. Law Review 1 (2013-2014).
  12. Less Is More: When it comes to predictive coding training, the “fewer reviewers the better” – Parts One, Two, and Three.
  13. My Basic Plan for Document Reviews: The “Bottom Line Driven” Approach, PDF version suitable for print, or HTML version that combines the blogs published in four parts.
  14. Relevancy Ranking is the Key Feature of Predictive Coding Software.
  15. Why a Receiving Party Would Want to Use Predictive Coding?
  16. Vendor CEOs: Stop Being Empty Suits & Embrace the Hacker Way.
  17. Comparative Efficacy of Two Predictive Coding Reviews of 699,082 Enron Documents (Part Two).
  18. A Modest Contribution to the Science of Search: Report and Analysis of Inconsistent Classifications in Two Predictive Coding Reviews of 699,082 Enron Documents. (Part One).
  19. Introduction to Guest Blog: Quick Peek at the Math Behind the Black Box of Predictive Coding that pertains to the higher-dimensional geometry that makes predictive coding support vector machines possible.
  20. Keywords and Search Methods Should Be Disclosed, But Not Irrelevant Documents.
  21. Reinventing the Wheel: My Discovery of Scientific Support for “Hybrid Multimodal” Search.
  22. There Can Be No Justice Without Truth, And No Truth Without Search (statement of my core values as a lawyer explaining why I think predictive coding is important).
  23. Three-Cylinder Multimodal Approach To Predictive Coding.
  24. Robots From The Not-Too-Distant Future Explain How They Use Random Sampling For Artificial Intelligence Based Evidence Search. Video Animation.
  25. Borg Challenge: Report of my experimental review of 699,082 Enron documents using a semi-automated monomodal methodology (a five-part written and video series comparing two different kinds of predictive coding search methods).
  26. Predictive Coding Narrative: Searching for Relevance in the Ashes of Enron in PDF form for easy distribution and the blog introducing this 82-page narrative, with second blog regarding an update.
  27. Journey into the Borg Hive: a Predictive Coding Narrative in science fiction form.
  28. The Many Types of Legal Search Software in the CAR Market Today.
  29. Georgetown Part One: Most Advanced Students of e-Discovery Want a New CAR for Christmas.
  30. Escape From Babel: The Grossman-Cormack Glossary.
  31. NEWS FLASH: Surprise Ruling by Delaware Judge Orders Both Sides To Use Predictive Coding.
  32. Does Your CAR (“Computer Assisted Review”) Have a Full Tank of Gas?  (and you can also click here for the alternate PDF version for easy distribution).
  33. Analysis of the Official Report on the 2011 TREC Legal Track – Part One.
  34. Analysis of the Official Report on the 2011 TREC Legal Track – Part Two.
  35. Analysis of the Official Report on the 2011 TREC Legal Track – Part Three.
  36. An Elusive Dialogue on Legal Search: Part One where the Search Quadrant is Explained.
  37. An Elusive Dialogue on Legal Search: Part Two – Hunger Games and Hybrid Multimodal Quality Controls.
  38. Random Sample Calculations And My Prediction That 300,000 Lawyers Will Be Using Random Sampling By 2022.
  39. Second Ever Order Entered Approving Predictive Coding.
  40. Predictive Coding Based Legal Methods for Search and Review.
  41. New Methods for Legal Search and Review.
  42. Perspective on Legal Search and Document Review.
  43. LegalTech Interview of Dean Gonsowski on Predictive Coding and My Mission to Make Predictive Coding Software More Affordable.
  44. My Impromptu Video Interview at NY LegalTech on Predictive Coding and Some Hopeful Thoughts for the Future.
  45. The Legal Implications of What Science Says About Recall.
  46. Reply to an Information Scientist’s Critique of My “Secrets of Search” Article.
  47. Secrets of Search – Part I.
  48. Secrets of Search – Part II.
  49. Secrets of Search – Part III. (All three parts consolidated into one PDF document.)
  50. Information Scientist William Webber Posts Good Comment on the Secrets of Search Blog.
  51. Judge Peck Calls Upon Lawyers to Use Artificial Intelligence and Jason Baron Warns of a Dark Future of Information Burn-Out If We Don’t.
  52. The Information Explosion and a Great Article by Grossman and Cormack on Legal Search.

Please contact me with any private comments or questions.
