How to Fix The Productivity Problem With eDiscovery Cloud Computing
Four Ideas To Improve Reviewer Performance, Morale & Profitability
By Jordan McQuown
For about the last 10 years, there has been a lot of talk about how the cloud would transform eDiscovery. The idea is simple enough: instead of loading data into on-prem systems, everything gets loaded into the cloud, and reviewers use web browsers to complete their work. It’s a nice idea, a grand vision even. But there’s a problem. Once your data sets exceed a couple hundred gigs, reviewer productivity grinds to an excruciatingly slow pace. The more data you add, the worse it gets.
This is due to something I have termed IPE: the Inverse Performance Equation. IPE is having a major negative impact on organizations that do eDiscovery work in the cloud. Numerous businesses I know personally have opted for a hybrid approach, maintaining both on-prem and cloud-based eDiscovery environments. The financial consequences, to say nothing of the security and maintenance considerations, are significant: they are literally paying for two eDiscovery environments. If you or someone you know has struggled with these issues, I’d like to present four ideas that can really help. These ideas are cost-effective and secure, and they almost completely eradicate the IPE problem.
Organizations that earn income from doing eDiscovery work, or that are entrusted to do that work on behalf of another entity such as auditors, can benefit from these ideas. By my estimation, this could include:
- eDiscovery service providers
- Law firms
- Corporations, government agencies and other stakeholders who prefer to manage eDiscovery themselves
- Consultancies with a core competency or practice area in eDiscovery and investigations
These types of entities engage in eDiscovery work and often deal with data sets measured in the hundreds of gigabytes, if not more.
Before I present my four ideas for improving reviewer performance, I’d like to provide some greater context and insight about the IPE problem. I feel like I need to do this because it’s not very well understood right now and some organizations haven’t even encountered it yet, although I’m quite sure they will at some point. I want to say a few things for the record.
First, I do believe that cloud computing is the future of eDiscovery. I cannot say how long it will take for cloud to take over on-prem, but I have no doubt that this will happen. It’s only a question of when, not if. Cloud computing is simply too attractive and too cost-effective to be relegated to the sidelines for long.
Second, I see two major hurdles that eDiscovery functions will have to overcome to make cloud computing a reality: perceived security concerns and the IPE problem. In another thought piece, I’ll talk about how the cloud can be made as secure, or nearly as secure, as on-prem operations. But I do think that as long as some clients perceive the cloud to be less secure than on-prem, this perception will be a drag on cloud adoption. For now, I see this as a secondary problem in cloud adoption, not the primary problem.
Third, the IPE problem is THE major problem confronting eDiscovery functions. Until this problem is solved, I don’t believe that cloud will ever achieve the level of adoption that pundits have talked about. In my experience, only the most select and sensitive clients will insist that their data not be put into the cloud. This means that most organizations are open to or ambivalent about whether or not their data is reviewed in the cloud. That makes IPE the true blockade to cloud adoption in eDiscovery.
Given this significance, I think it’s wise to understand how IPE actually impacts eDiscovery functions. To help us in this area, I’d like to offer a clear definition of IPE:
Once data sets exceed a certain threshold, usually 150-200 gigs, for every gigabyte of data that gets added to a matter, reviewer speed degrades by an equal measure.
I think of this as a tipping point. Once you cross a certain threshold, reviewer speed begins to slow. As more data gets added, performance continues to degrade until it becomes nearly untenable for actually getting work done. Here are the tell-tale signs of IPE:
- Document to document navigation speed slows down dramatically. This degrades reviewer productivity because they cannot quickly jump from one document to the next.
- Coding saves and issue designations become sluggish and halting. This often happens when a hot doc becomes unresponsive. What should be a nearly instantaneous task can completely lock up the application.
- Search speeds become very inconsistent. Sometimes queries are returned quickly and other times the reviewer cannot tell if the search is actually happening because the system does not seem to be doing anything.
- Reviewers encounter the spinning wheel of death. This seems to happen at the most inopportune times and for reasons no one seems to understand.
- Reviewers groan. This is literally what you can hear when IPE rears its ugly head. Reviewers become so frustrated that they verbalize it in quite colorful language.
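One crude way to model the tipping-point behavior described above is a function that runs at full speed below the threshold and degrades in proportion to the overage past it. The exact curve is my assumption for illustration; the definition in this article doesn’t pin down a precise shape:

```python
# Crude illustrative model of IPE: full speed below the threshold, then
# roughly proportional degradation as data grows past it. The exact shape
# and the numbers here are assumptions, not measurements.

THRESHOLD_GB = 175        # midpoint of the 150-200 gig range in the definition
BASE_DOCS_PER_DAY = 1_000  # productivity target when the system is healthy

def review_speed(data_gb: float) -> float:
    """Estimated documents per reviewer per day at a given matter size."""
    if data_gb <= THRESHOLD_GB:
        return BASE_DOCS_PER_DAY
    # Past the tipping point, speed falls off in proportion to the overage.
    return BASE_DOCS_PER_DAY * THRESHOLD_GB / data_gb

print(round(review_speed(100)))  # below the threshold: full speed
print(round(review_speed(350)))  # well past the threshold: roughly half speed
```

Whatever the true curve looks like in your environment, plotting measured reviewer throughput against matter size is a quick way to find your own tipping point.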
Your organization may not yet have encountered the debilitating effects of IPE, especially if you have not deployed a cloud-based solution for eDiscovery yet. So if you are considering cloud computing for eDiscovery, these ideas can benefit you by helping you avoid IPE before it even becomes a problem. If you’ve already deployed a cloud solution for eDiscovery and have not yet encountered IPE, it’s probably only a matter of time. If you have deployed the cloud and have encountered IPE, these ideas could benefit you right away:
- Document how much IPE is impacting your reviewers and costing your organization.
- View your eDiscovery environment as a discrete set of technology tiers.
- Segment the Processing, Data and SQL tiers from the web application tier.
- Run a pilot program to ensure the performance meets your requirements.
My recommendation is that you begin by documenting how much IPE is actually impacting reviewers and, therefore, likely costing you as an organization. In my experience, there are two types of IPE impact: non-quantifiable and quantifiable. The non-quantifiable impact can often have greater negative consequences than the quantifiable impacts. Let me explain.
The non-quantifiable impacts of IPE are usually about the frustration and anxiety of your reviewers. Most reviewers are graded on productivity metrics that determine their standing within an organization. When reviewers achieve or exceed productivity goals, they usually don’t worry about job security. But when their productivity dips, their anxiety levels jump.
When reviewers cannot meet their productivity goals because the system is too slow (because of IPE), you have a significant problem. Morale goes down. Reviewers get nervous and start asking themselves if they should be looking for another job. In the age of the Big Quit, IPE could contribute to losing talent you want to retain. This is why I say the non-quantifiable impacts may be more negative than the quantifiable impacts.
You might find it acceptable to absorb degraded performance and financial outcomes as data sets grow. You probably won’t find it acceptable to lose talent because of IPE. One way to address this is to run employee satisfaction surveys that ask whether system performance enables job performance. If this metric consistently indicates trouble, it is quite likely that IPE will begin costing you employees.
Now let’s discuss how to document the quantifiable impacts of IPE. To help with this, I’d like to draw upon a mock organization I’ll call ABC Legal. To make this example simple, I’ll make the math simple too. Just to be clear, this math may or may not be similar to what your firm experiences. But the economies of scale will likely be quite similar, especially if your firm, like ABC Legal, is 100% cloud-based:
- Last year, ABC Legal handled 100 matters: 65 had fewer than 100 gigs of data, 20 had 100-200 gigs, and 15 had more than 200 gigs.
- ABC Legal has 7 reviewers and they are paid, on average, $1,000 per day per person.
- ABC Legal established a productivity goal for reviewers of 1,000 documents per day.
- On every matter with fewer than 100 gigs of data, reviewers met the 1,000-documents-per-day target. On matters with 100-200 gigs, productivity dropped to 750 documents per day. On matters with more than 200 gigs, it dropped to 500 documents per day.
- As you can see, the IPE impact is essentially a doubling of the cost-per-reviewer-per-day. ABC Legal pays $1,000 to have 1,000 documents reviewed when data sets are less than 100 gigs. But when data sets exceed 200 gigs, ABC Legal pays $1,000 to have 500 documents reviewed. This is a very common financial scenario I see in play today.
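The arithmetic above can be sketched as a quick back-of-the-envelope calculation. The figures are the hypothetical ABC Legal numbers from this example, not benchmarks from any real firm:

```python
# Back-of-the-envelope cost-per-document math for the hypothetical ABC Legal
# example. Rates and productivity figures are illustrative only.

DAILY_RATE = 1_000  # dollars per reviewer per day

# Observed documents reviewed per reviewer per day at each data-set size band
docs_per_day = {
    "under_100_gigs": 1_000,
    "100_to_200_gigs": 750,
    "over_200_gigs": 500,
}

for band, docs in docs_per_day.items():
    cost_per_doc = DAILY_RATE / docs
    print(f"{band}: ${cost_per_doc:.2f} per document")
```

Running the same calculation against your own matter data is the fastest way to put a dollar figure on IPE for your organization.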
If your organization has already adopted the cloud, I would wager that an analysis like this will probably uncover similar financial results. IPE begins to cost you in real dollars as soon as data sets exceed 150 gigs or thereabouts.
Most people think of eDiscovery as an application. While the application is important, eDiscovery is also an ecosystem of interconnected devices. Even though there is a lot of complexity to the ecosystem, I tend to think of it as four discrete tiers:
- Web server tier where application access happens and where application agents are applied to handle a variety of tasks.
- Processing tier comprised of dedicated servers to handle data processing, analytics processing and compute intensive tasks. For example, data processing and data productions are both compute intensive tasks.
- Data tier includes data storage systems and storage management tools.
- SQL tier comprised of SQL databases running on discrete servers.
I didn’t include disaster recovery (DR) and offsite backups in these tiers because most organizations handle DR as an offshoot of their data tier. In some instances, a DR site is simply a hot backup of the current data sets on storage systems in an eDiscovery ecosystem. Other times, DR is a cold copy of recent data sets that can be made operational within a defined time window, anywhere from a few hours to a few days. Underneath this ecosystem is a secure network that connects all of the devices and applications for seamless interoperability.
For at least two decades now, the assumption has been that all of these tiers need to be physically close to each other—either in the Cloud or on-premise. But I’m not convinced of that at all.
What I’m about to suggest is primarily an architectural consideration, not a technology consideration. In other words, you may not need to buy a lot of new technology to make this approach work. Here are my ideas for how to segment these tiers:
- Web server tier. This gets placed in a public Cloud so users can access the application seamlessly and securely. Public cloud environments can be made highly secure, and this tier lends itself well to the burstable, on-demand resources that cloud providers excel at.
- Processing tier. This could get placed in a public Cloud, or in a private Cloud or a secure on-premise location, depending on the application requirements. This tier is mixed-use: certain aspects of processing lend themselves well to the scalability and burstability of clouds, while analytics may quickly become cost prohibitive there.
- Data tier. This gets placed either in a private Cloud or in a secure on-premise location. The web server tier and the data tier should generally be within the same metro area to keep response latency low.
- SQL tier. This gets placed in the same location as the data tier, in most instances. This is the single greatest dependency for an all-around performant eDiscovery environment.
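One way to capture the segmentation above is as a simple placement map with a sanity check for the colocation rule. The tier names follow this article; the locations and the check itself are hypothetical illustrations, not a product feature:

```python
# Sketch: encode the tier-placement guidance from this article as a checkable
# map. Location names are made-up placeholders.

placement = {
    "web": "public_cloud",            # user-facing, burstable
    "processing": "public_cloud",     # mixed-use; could also be private/on-prem
    "data": "private_cloud_chicago",  # tightly coupled pair below
    "sql": "private_cloud_chicago",   # same location as the data tier
}

def validate(placement: dict) -> list:
    """Return a list of rule violations for a proposed tier placement."""
    problems = []
    # Rule: the SQL tier should sit with the data tier, since SQL read-write
    # performance is the single greatest dependency in the environment.
    if placement["sql"] != placement["data"]:
        problems.append("SQL tier should be colocated with the data tier")
    # The web tier is deliberately uncoupled from the back-end tiers, so it
    # carries no colocation constraint to check.
    return problems

print(validate(placement))  # an empty list means the placement passes
```

Writing the rules down this way forces the colocation decisions to be explicit before anything gets provisioned.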
The goal here is to uncouple the web tier from the processing, data and SQL tiers. Those three tiers tend to operate best when they are tightly coupled. But they don’t need to be on the same premise as the web application tier. Why should you consider this?
In our experience, after having conducted audits on hundreds of eDiscovery environments, the root causes of IPE can be addressed simply by segmenting the web tier from the processing, data and SQL tiers and then joining them with a very high-speed and secure internet connection. This solution works. It allows for exceptionally fast read-write activity to the database while also allowing users to log-in to the system from anywhere.
If you’d like to experiment with this approach at your organization, here are some suggestions for doing so. I recommend both quantitative and qualitative pilot programs.
- Design and deploy a new environment based on the tiers I’ve outlined above.
- Pick a few active matters with various data set sizes: a large workspace (200 gigs), a middle-of-the-road matter (100 gigs) and a small matter (25 gigs).
- Migrate the data to the new environment.
- Run a comparison test on the data sets in your existing and new environments. Take a sample of performance metrics from both environments for these types of activities: searches, index builds, production speeds – any of the common functions and tasks.
- Document your findings and compare the performance of the two environments.
- Select a handful of reviewers – they should have experience with the older platform. We like to select “vocal” reviewers who have not been shy about giving feedback.
- Provision licenses for them in the new environment and provide what little training is necessary.
- Ask them to run a series of tasks that they normally would run during their workday. Limit this to about 2 hours of activity.
- Interview the reviewers and get their qualitative feedback. Did they find the experience to be fast, efficient and intuitive? Did they run into any snags?
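The quantitative comparison in the steps above might be tabulated along these lines. The task names and timings are made-up placeholder pilot data, not measurements:

```python
# Sketch: compare sampled task timings (in seconds) from the existing
# environment and the new segmented environment. All numbers are
# placeholder pilot data for illustration.

existing = {"search": 18.0, "index_build": 540.0, "production": 1200.0}
segmented = {"search": 2.5, "index_build": 310.0, "production": 690.0}

def speedup(old: dict, new: dict) -> dict:
    """Per-task speedup factor: how many times faster the new environment is."""
    return {task: old[task] / new[task] for task in old}

for task, factor in speedup(existing, segmented).items():
    print(f"{task}: {factor:.1f}x faster")
```

Sampling the same tasks several times in each environment, rather than once, guards against drawing conclusions from the inconsistent search speeds that IPE itself produces.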
With these two pilot programs complete, you can roll out the new environment with a much higher degree of confidence, which has a direct impact on how confidently your business development people sell the solution. Why should you do this?
- You will have proof, not assumptions, about the performance of the new environment. This will give you and your team confidence that the new approach is superior.
- You don’t want to migrate all of your production data until you’re confident that the new environment works as you wish.
- You don’t want to train and migrate users until you’re confident.
- You don’t want to create a differentiation marketing program for your eDiscovery function until you are 100% confident that it will work the way you want it to.
- You don’t want to set expectations with internal stakeholders about improved productivity and reduced costs until you can prove to yourself that the environment delivers those benefits.
In this thought piece, I’ve put forward four ideas for how to address IPE—the number one obstacle to the adoption of eDiscovery cloud computing. My ideas are primarily about architecture, not new technology per se. By uncoupling the web tier from the processing, data and SQL tiers, your reviewers are likely to see a massive productivity boost on matters that used to cause them to groan. If you have questions about any of the ideas I’ve put forward here, please know that I’m open to the conversation.
CHIEF TECHNOLOGY OFFICER (CTO), GEORGE JON
Jordan McQuown is an authority in information technology, cyber security, electronic discovery, and digital forensics. He has written Thought Leadership articles for the American Bar Association’s Cybersecurity Handbook and Information Security Magazine, and he is a regular speaker as a subject matter expert on the eDiscovery security, application and legal conference circuits.
George Jon (GJ) is an eDiscovery infrastructure, product and process specialist, delivering performant, scalable, fault tolerant environments for users worldwide. GJ works with global corporations, leading law firms, government agencies, and independent resellers/hosting companies to quickly and strategically implement large-scale eDiscovery platforms, troubleshoot and perfect existing systems, and provide unprecedented 24/7 core services to ensure optimal performance and uptime.
George Jon’s (GJ) conclusions are informed by fifteen-plus years of conducting enterprise-class eDiscovery platform assessments, application implementations and infrastructure benchmark testing for a global client base. GJ has compiled extensive quantitative and qualitative insights from the research and implementation of these real-world environments, from single users to multinational corporations, and is a leading authority on eDiscovery infrastructure.