Culverhouse Conversations: Keeping Pace with AI

The way Alabama professor Ron Dulek frames it, artificial intelligence is comparable to the legend of John Henry. The story goes that Henry, a steel driver for the railroad in the 1870s, was pitted against a steam-powered rock drill. Henry emerged victorious against the machine but died right afterward.

The legend serves as a metaphor for humans vs. AI.

“Henry dies, but the steam engine comes back the next day,” Dulek said. “AI isn't going to die.”

An AI world is here and not going anywhere. It is woven into almost every vocation. AI is a big research subject at Culverhouse, and it’s also used as a tool for assisting in research. Recently, UA professors Jef Naidoo, Marcus Doxey, Buvaneshwaran Venugopal, and Dulek sat down with moderator Shawn Mobbs for an edition of “Culverhouse Conversations” to discuss the use of AI in research and in the classroom.

Meet the Experts

Ron Dulek

Professor of Management

Marcus Doxey

Professor of Accounting, Garner-Hitt Chair in Accounting

Jef Naidoo

Department Head of Management, Associate Professor of Management

Buvaneshwaran Venugopal

Assistant Professor of Finance

AI finding its voice

In 2018, Dulek and Naidoo conducted a study examining the effectiveness of automated text summarization, a subdomain of Natural Language Processing (NLP), which is a subfield of AI. The study employed the ROUGE-1 algorithm to evaluate the quality of AI-generated summaries by comparing them to human-written reference summaries of CEOs’ annual report letters.
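For readers unfamiliar with ROUGE-1, the idea is to count how many individual words (unigrams) an automated summary shares with a human-written reference. The sketch below is only an illustration of that idea, not the researchers' implementation. The function names, the simple tokenizer, and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def rouge_1(candidate, reference):
    """Return unigram-overlap recall, precision, and F1 between two texts."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))

    # Counter intersection keeps the minimum count of each shared word,
    # so a repeated word is only credited as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())

    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical sentences, invented for illustration only.
human_summary = "The company grew revenue this year."
ai_summary = "The company cut costs."
scores = rouge_1(ai_summary, human_summary)
print(scores)
```

Recall is the figure closest to the question "how much of the human summary's content did the machine reproduce?" A low recall means the automated summary missed most of the words the human writer chose.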

The findings suggested that the text summarization tools performed at a moderately satisfactory level. However, the results also indicated that the AI-generated summaries replicated only about 20% of the important content covered in human-written summaries, with human authors clearly outperforming the automated systems. When they revisited the study years later with modern generative AI, the results were significantly different.

Now, AI writes better than humans on most linguistic dimensions. It does a great job with clear structure, proper tone, professional language, and overall effectiveness. The only place AI still struggles is in conveying emotion. It can sound polished and professional, but it doesn’t truly capture human feeling or authenticity.

In the updated study, the researchers employed business prompts that required responses from the perspective of a CEO. Each prompt was answered by a human executive, ChatGPT, and Gemini, allowing for a direct comparison of how each “writer” approached the identical communication task.

To evaluate the effectiveness of each generated response as a form of leadership communication, the researchers applied a taxonomy derived from prior literature. Approximately 500 participants assessed the responses using the expanded 18-dimension evaluation framework. “In every category, AI surpassed humans. In every category,” Naidoo said.

Venugopal asked how the evaluations worked. “The 500 were evaluating all three at the same time? Is there overlap of evaluators for the same document?”

Naidoo said yes, “all 500 participants evaluated each document.”

Venugopal then asked about the aggregate score and the efficacy rate, which had been 20% before.

“This time it's showing us that in every case, in 100% of the cases, AI surpassed humans,” Naidoo said. “Except for emotion.”

Emotion is central to leadership communication. “Leaders that are most successful are ones that leverage empathy and emotion,” Naidoo said, pointing to ongoing research on employee support and retention.

Venugopal asked whether they had given AI a sample essay from the CEO and then asked it to write the response in a different voice, because then the emotion might be matched.

Even then, the outcome didn’t change.

What wasn’t surprising was everything else. “Forceful… decisiveness… it was winning each time,” Naidoo said. In category after category, AI didn’t just compete. It led.

AI vs. analysts

Venugopal’s research looks at how AI handles real business data and where it can go wrong for researchers. The goal was to identify and categorize the different risks companies report in documents like 10-K filings. The researchers built a detailed system (about 26 categories) to classify these risks and used AI to help summarize and analyze them.

A big issue emerged. AI summarized the text but changed it in important ways, such as removing “hedging” language like “may” or “could” that executives use to express uncertainty. These words were replaced with more definite, forceful statements that make risks sound more certain and severe than they really are.

So, when AI is used to score or measure things like “risk severity,” the results become biased.
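One simple way to spot this kind of drift is to count hedging words before and after summarization. The sketch below is a rough illustration under stated assumptions, not the method used in Venugopal's study. The word list, the function names, and the sample sentences are hypothetical.

```python
import re

# Assumed, deliberately short list of hedging words; a real study would need a fuller lexicon.
HEDGE_WORDS = {"may", "might", "could", "possibly", "potentially"}


def hedge_count(text):
    """Count how many tokens in the text are hedging words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token in HEDGE_WORDS)


def hedges_dropped(original, summary):
    """Return how many more hedging words the original has than the summary."""
    return hedge_count(original) - hedge_count(summary)


# Hypothetical risk disclosure and summary, invented for illustration only.
original = "Severe weather may disrupt operations and could reduce revenue."
summary = "Severe weather will disrupt operations and reduce revenue."
print(hedges_dropped(original, summary))
```

A positive result flags a summary that states risks more firmly than the source did, which is a cue to check the original wording before scoring severity.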

“The idea was that AI is very bad at ranking,” Venugopal said. “If you ask it to rank severity, at least in our context, every time it came up with a score between 7 to 9 for everything. That doesn't make sense.”

AI also struggles to understand nuance, context, and probability, such as low-likelihood “act of God” risks (hurricanes, floods, tornadoes, etc.).

The conversation shifts to prediction, where Venugopal tests whether AI can estimate earnings per share, which is what financial analysts do every day. AI should be good at this because it has massive amounts of data. It’s about 60% accurate when recalling known numbers but struggles when asked to predict the future. When prompted to project the next quarter, even one it already “knows” from training, performance drops.

Dulek jumps in with the simple question, “Is it better than the analysts?”

Venugopal says it’s “surprisingly horrible,” and giving AI more information makes it worse. The models start overthinking, ignore what they’ve learned, and rely too heavily on the new information.

“Their training is being overridden by the context that we give,” Venugopal said.

Naidoo connects this back to his own research, noting that AI tends to assume the worst. AI doesn’t like uncertainty, so it resolves it by making things sound definite and often negative. That same forcefulness shows up again.

Mobbs wanted to know whether giving more history, like multiple earnings calls, would help. Venugopal admits they tried variations, but limits like context size get in the way.

When they remove identifying details, like company names and products, the AI struggles badly, especially with smaller firms. But for big, well-known companies, it still performs reasonably well. Somehow, it can infer what company it’s looking at based on patterns.

The takeaway from the exchange is that AI isn’t a neutral tool that summarizes information. It reshapes it and makes language more certain, forceful, and sometimes less accurate. It can recall facts, but struggles with nuance, probability, and context, which is important in research and decision-making.

AI and disclosure bias

Doxey’s research examines how people and AI judge what information companies should disclose, and it found that AI shows a clear bias. He and Dr. Chez Sealy designed a study where both investors and auditors rated how important different company disclosures are.

Ideally, the auditors would match the ratings of the investors, but they did not. In their results, users wanted more disclosure, while auditors consistently rated disclosures as less important.

The expectation was that AI, which can take on different roles, might better match the user perspective. Instead, AI was even more extreme and rated almost everything as highly important to disclose. It showed a strong conservative bias.

“So auditors were more bullish?” Naidoo asks.

“Auditors might say something on a seven-point scale and have a two or a three to disclose. The users might have been a four or a five, and the LLM would be like a six or a seven,” said Doxey. “If they just sort of blindly take what it says, they’ll be much more conservative. But if they actually use it… it actually brings them very close to where the users are.”

Venugopal asks, “Can you nudge the LLM to move up?” by prompting it to reconsider its answers. Doxey acknowledges the idea but emphasizes, “there seems to be a conservative bias. It’s going to err on the side of disclosure.”

The conversation shifts to practice. Why aren’t auditors fully using AI? Doxey points to constraints of client confidentiality, regulatory oversight, and repeatability. “LLMs just don’t work that way. There’s still that element of we don’t know exactly what we’re going to get out of it this time,” Doxey said.

Naidoo asks, “Is it better to be more conservative from a risk perspective?” Doxey responds, “The problem with being too conservative is you are telling companies to disclose a whole bunch of information that… is actually immaterial.”

Doxey reframes accounting and auditing as solutions to human interaction problems like trust, interpretation, and decision-making. But in a world where “the company’s AI system is putting out real economic data points” and another AI consumes them, “materiality goes out the window.”

Dulek lists a key insight, saying “It doesn’t deal well with ambiguity.” Doxey agrees, translating it into accounting terms. “We would call that a conservative bias… it’s kind of losing the nuance.”

It comes back to training. Can AI be tuned to behave differently? Naidoo is optimistic but Venugopal says, “Training is not going to cover all scenarios… it’s very tough for them to move faster” in new environments.

Racing a Moving Target

The comparison to the steam engine helps explain AI, but it doesn’t capture the whole picture. The steam engine continually improves over time and becomes even more efficient. AI is the same. It is constantly changing, improving, and accelerating, all at a rapid pace.

That creates a challenge for researchers. By the time a study is designed, conducted, and published, the results may already be outdated. What feels like a clear finding today can quickly become a snapshot of the past.

AI is less like a steam engine and more like a high-speed train, which is already in motion, gaining speed, and still figuring out where it’s going. Researchers are just trying to keep up with it.

Just like in the legend of John Henry, the question isn’t whether humans can beat the machine. It’s whether they can adapt fast enough to stay relevant alongside it.