Merck is using AI agents to cut a drug discovery cycle by a third and ship compliant marketing materials up to 80% faster, but VP of Digital Platforms Sean Finnerty says it is working only because the company built the infrastructure first.
And the pharmaceutical manufacturer is seeing promising early results: AI is generating marketing drafts that are “99% right” when it comes to compliance, shrinking review cycles from months to days and accelerating delivery by 70% to 80%. In the company’s medical research, meanwhile, one AI-assisted discovery cycle was reduced by 33%.
Still, agentic AI only works if companies first build the underlying “plumbing” of digital platforms and services, Finnerty said at a recent AI Impact Series event.
“If we do one-offs, we’re gonna end up with thousands and thousands of things that are ultimately just gonna be debt that we’ll have to deal with later,” he said. “And that’s gonna be a drag on any further innovation.”
Starting with the plumbing
Merck’s plumbing-first strategy comes from lessons learned during the early days of cloud in the 2010s “when nobody knew what the heck was going on,” Finnerty said.
Getting the cloud right meant building from the ground up; at Merck, that infrastructure now supports 2,500 AWS accounts, numerous Microsoft Azure subscriptions, and new Google Cloud Platform (GCP) integrations.
“AI is gonna be the same exact thing,” Finnerty said. “We’re going to have thousands and thousands of agents.” The questions then pile up: How do you register them? How do you secure them? How do you ensure they’re connected to the right tools, and have access to the right data and the right context?
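As a rough illustration of the registration and access questions above, a registry could record each agent's owner, permitted tools, and data scopes, then refuse anything not explicitly granted. This is a minimal sketch, not Merck's implementation; `AgentRecord`, `AgentRegistry`, and every field, tool, and agent name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentRecord:
    """Hypothetical registry entry; all fields are illustrative assumptions."""
    agent_id: str
    owner: str
    allowed_tools: frozenset = field(default_factory=frozenset)
    data_scopes: frozenset = field(default_factory=frozenset)


class AgentRegistry:
    """In-memory registry that denies anything not explicitly granted."""

    def __init__(self):
        self._agents = {}

    def register(self, record: AgentRecord) -> None:
        # Refuse duplicate IDs so each agent has exactly one record.
        if record.agent_id in self._agents:
            raise ValueError(f"agent already registered: {record.agent_id}")
        self._agents[record.agent_id] = record

    def can_use_tool(self, agent_id: str, tool: str) -> bool:
        record = self._agents.get(agent_id)
        return record is not None and tool in record.allowed_tools

    def can_read(self, agent_id: str, scope: str) -> bool:
        record = self._agents.get(agent_id)
        return record is not None and scope in record.data_scopes


registry = AgentRegistry()
registry.register(AgentRecord(
    agent_id="label-review-agent",        # hypothetical agent name
    owner="marketing-platform-team",      # hypothetical owner
    allowed_tools=frozenset({"document_search"}),
    data_scopes=frozenset({"approved_claims"}),
))
```

Unknown agents and ungranted tools are both denied by default, which is the property a registry of thousands of agents would most plausibly need; a production system would also need authentication, auditing, and persistent storage.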
Context delivery is also critical; Merck works with three hyperscalers and has 47 edge locations and hundreds of databases. “Many, many petabytes” of structured and unstructured data are stored in Oracle databases, SQL databases, Excel spreadsheets, phone transcripts, and other repositories, Finnerty said.
His team is building scaffolding to deliver meaningful context in various situations, he explained. Data must be organized and ingested into various platforms, because “there’s no one solution to solve every single problem.” Sometimes it’s Databricks, other times it’s Amazon Redshift, “plus four other things.”
The goal is: “Let’s make that easy and frictionless for people to do, and secure it, and make sure it’s well integrated with MCP [model context protocol], and A2A [Agent2Agent], and upstream compute,” Finnerty said. “If you wanna run stuff on GCP or you wanna run stuff on AWS, we’ve got the plumbing in place so you can run your adjacent workloads wherever you want.”
How Merck is using agents
As it builds out its technical plumbing, Merck is experimenting with agents across regulated enterprise operations, scientific discovery workflows, and app modernization.
Notably, AI is accelerating drug discovery. Finnerty explained that scientists look at molecular structures and disease states to determine if a given condition is druggable. But even if a disease state is known, developing a drug to target it can take years.
Now with AI, teams are starting to see “very promising things,” such as cutting one particular research cycle down by one-third. “That’s a year off of the life of the discovery cycle,” Finnerty said. “Which means, theoretically, we can get it to a patient who needs that therapy a year faster.”
Once developed and approved, these products are regulated and marketing materials around them must be clearly and explicitly articulated. “The way you communicate that information per market, per country, per state, per region, is all very carefully governed and regulated,” Finnerty said. It’s also variable: An ad campaign for a vaccine in the state of Georgia looks much different from one launched in Canada.
Historically, humans did the due diligence to make sure the company complied with various laws. Draft materials went through rounds of review; when a mistake was discovered, the draft got “kicked back to the beginning, and it goes through it again, and then it takes another however many weeks and months,” Finnerty said.
But now, AI can do that “much, much more effectively,” and the process is increasingly evolving from human-in-the-loop to essentially “human-as-governor.” With human oversight, AI can deliver a first draft in a day or a week that is 99% there, allowing teams to ship materials up to 80% faster.
Meanwhile, when it comes to app modernization, AI can discover an application’s architecture; document its data interactions, APIs, and network paths; and run authentication and authorization checks. It can also write Terraform code for deployment and rewrite JavaScript in Python.
Where the company would have previously spent weeks and months and hundreds of thousands of dollars to update one application, Finnerty said, agents are now handling the work through prompts.
Running into “wackiness”
That’s not to say there aren’t significant challenges. Finnerty noted that his team has run into some “wackiness,” for example in automated code and scenario testing. AI has blatantly made up scenarios, whether due to incorrect context, infrastructure, “or if it was just getting creative with, ‘You should be testing these three functions that don’t even exist in the code that you’re trying to test.’”
“That surprised me a little bit because I thought we were further past some of the hallucination challenges in these later models,” he said.
To address this, his team has engineered guardrails to keep hallucinations to a minimum, essentially using AI to supervise AI and applying confidence scores. So if Claude created the first output, they’ll instruct Microsoft Copilot to assess it.
“So if you ask something once, have AI check it, then ask it a third time, the confidence increases every time, and it minimizes some of the garbage that gets created in the early runs,” Finnerty said.
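The “AI supervising AI” pattern can be sketched as follows. This is an illustration under stated assumptions, not Merck's guardrail code: the reviewer callables stand in for calls to different models, the 0.8 threshold is arbitrary, and requiring every reviewer to clear it is only one possible aggregation rule. The second helper is a deterministic check (not an AI one) for the specific failure Finnerty describes, a test plan that targets functions missing from the code under test.

```python
import ast
from typing import Callable, Dict, List, Tuple

# A reviewer takes a draft and returns a confidence score in [0.0, 1.0].
Reviewer = Callable[[str], float]


def cross_check(draft: str, reviewers: Dict[str, Reviewer],
                threshold: float = 0.8) -> Tuple[bool, Dict[str, float]]:
    """Score a draft with each reviewer; accept only if all clear the threshold."""
    scores = {name: review(draft) for name, review in reviewers.items()}
    accepted = bool(scores) and min(scores.values()) >= threshold
    return accepted, scores


def missing_functions(source: str, referenced: List[str]) -> List[str]:
    """Return referenced names that are not defined as functions in `source`."""
    defined = {
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    return [name for name in referenced if name not in defined]


# Hypothetical code under test and a generated test plan that names one
# function ("reconcile") that does not exist.
code_under_test = (
    "def parse(x):\n    return x\n\n"
    "def validate(x):\n    return bool(x)\n"
)
proposed_targets = ["parse", "validate", "reconcile"]
invented = missing_functions(code_under_test, proposed_targets)

# Stand-in reviewers with fixed scores; in practice each would call a model.
reviewers = {
    "second_model": lambda draft: 0.9,
    "third_model": lambda draft: 0.6,
}
accepted, scores = cross_check("draft test plan", reviewers)
```

Here `invented` flags the nonexistent function and `accepted` is false because one reviewer scored below the threshold. A cheap static check like `missing_functions` catches this class of hallucination without spending another model call, so it pairs naturally with model-based review.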
Use cases for agentic AI in financial services
Meanwhile, at Mastercard, Chief Data Officer Andrew Reiskind and his team are focusing agentic experimentation on highly orchestrated transaction and dispute workflows. As he noted, a chargeback or fraud dispute is not a single event.
When a consumer disputes a charge (typically online), that “kicks off an entire other process on the back-end that tends to be very labor-intensive,” Reiskind said.
Mastercard has to collect specifics about the actual dispute; then the merchant has its own investigations (Was the card reported as lost or stolen? Does the consumer dispute charges often?). Further, the network sitting in the middle has its own rules for timing and information submission.
“You have each and every one of these steps, many of which are unstructured, but there are also structured data elements to this,” Reiskind said. Whether a card was lost or stolen tends to be structured, but the consumer complaint is “unstructured data of questionable reliability.”
“So you’re sitting there with a decisioning system that has deterministic decisions, but also probabilistic decisions,” he said.
AI agents can speed up this process and potentially solve the problem, but getting there is complex: Which tasks are you handing off to agents? When are they kicking things back to human reps? How many agents are you ultimately using? What are the cost implications?
Then there are reputational questions and costs: Have you just called a consumer potentially a liar when they weren’t lying?
“It’s an exact problem where you want to, as a bank, maintain trust with your consumer,” Reiskind said. “But you also wanna make this efficient and take costs out of the system.”
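One way to picture a decisioning system that mixes deterministic and probabilistic steps is a router that applies hard rules to structured fields first, then falls back to a score for the unstructured complaint, and hands uncertain cases to a person. The sketch below is hypothetical throughout: the field names, the rule, the 0.9 threshold, and the `complaint_score` input (imagined as a model's output) are assumptions, not Mastercard's rules.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    APPROVE = "approve_dispute"
    HUMAN = "escalate_to_human"


@dataclass
class Dispute:
    card_reported_lost_or_stolen: bool  # structured field
    complaint_score: float              # hypothetical model score in [0, 1]


def route_dispute(dispute: Dispute, approve_at: float = 0.9) -> Route:
    # Deterministic step: a rule on structured data.
    if dispute.card_reported_lost_or_stolen:
        return Route.APPROVE
    # Probabilistic step: act automatically only when the score is high.
    if dispute.complaint_score >= approve_at:
        return Route.APPROVE
    # Everything else goes to a human representative.
    return Route.HUMAN


examples = [
    route_dispute(Dispute(True, 0.0)),
    route_dispute(Dispute(False, 0.95)),
    route_dispute(Dispute(False, 0.5)),
]
```

This sketch never denies a dispute automatically. That is one conservative answer to the reputational question of wrongly treating an honest customer as dishonest, and it trades some efficiency for trust.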
The PB&J versus turkey mistake: Determine what risks are acceptable
There’s always going to be risk with AI, and enterprises should assess it from the beginning of product design, Reiskind said. There’s also the question of acceptable risk.
As an example: Did you serve a customer a peanut butter and jelly sandwich instead of a turkey sandwich (a minor inconvenience)? Or did you serve gluten to someone with celiac disease?
“Is it an acceptable risk if one percent of the time it makes the mistake? If it is, let’s go to the next stage of how you’re mitigating that risk,” Reiskind said.
Leaders must perform a cost-benefit analysis, break problems down into their “constituent pieces,” and calculate the cost of each one. But these are estimates; it’s near-impossible to forecast real usage, Reiskind said. “It is not a simple process to get to the cost,” he said. “But it is doable.”
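The cost-benefit reasoning can be made concrete with a back-of-the-envelope calculation: for each constituent piece, multiply volume by error rate and by the cost of one mistake, add the running cost, and compare the total against what automation saves. This is a simplified sketch; the `Piece` class and every figure below are invented placeholders, and real usage is hard to forecast.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    name: str
    runs_per_month: int
    cost_per_run: float      # compute or API cost per run
    error_rate: float        # fraction of runs that go wrong
    cost_per_error: float    # cost of handling one mistake
    savings_per_run: float   # labor saved per run

    def monthly_net(self) -> float:
        """Savings minus running cost minus expected cost of errors."""
        running = self.runs_per_month * self.cost_per_run
        expected_errors = (
            self.runs_per_month * self.error_rate * self.cost_per_error
        )
        savings = self.runs_per_month * self.savings_per_run
        return savings - running - expected_errors


# Same 1% error rate, very different severity (all numbers are made up).
minor = Piece("wrong sandwich", 10_000, 0.25, 0.01, 20.0, 1.0)
severe = Piece("allergen served", 10_000, 0.25, 0.01, 5_000.0, 1.0)

minor_net = minor.monthly_net()
severe_net = severe.monthly_net()
```

With these placeholder numbers the low-severity piece comes out ahead and the high-severity piece is deeply negative, even though both make a mistake one percent of the time. The point is that an error rate is only acceptable or unacceptable once it is weighed against what each mistake costs.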