Reducing Bias and Hallucination Using the Tree of Thoughts (ToT) in Large Language Models — Healthcare Use Cases

Nagesh Somayajula
12 min read · May 5, 2024

Tree of Thoughts (ToT) is a prompting technique that can be very useful for reducing bias and hallucination in large language models, especially when combined with a role-play strategy. In this article we look at how ToT prompts, with the help of some custom code, can tune an LLM to do better at reducing bias, particularly in Retrieval Augmented Generation (RAG) based use cases.


With new models introduced every day and competition heating up across text-to-image, text-to-video, and text-to-text generative AI, it is critical to ensure that the output generated by these models is free from bias, racism, and religious discrimination. Additionally, for commercial or organizational use cases, it is important to reduce or eliminate hallucinations. There are multiple ways to achieve this, such as prompt engineering; recently, a research paper introduced a method called Chain of Feedback, which is essentially an extension of chain-of-thought, skeleton-of-thought, and chain-of-verification, among others. For this use case, however, we will discuss how ToT prompting can be very useful, especially in RAG-based use cases.

What is Tree of Thoughts?

In very simple terms, Tree of Thoughts adds reasoning capabilities to language models. ToT is a framework for language model inference that aims to enhance the problem-solving abilities of LLMs on tasks requiring exploration, strategic planning, or where initial decisions matter. With a small change in strategy, the same framework can be used to reduce bias in real-time use cases where anti-bias and anti-hallucination behavior is mandatory.
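To make the idea concrete, below is a minimal, illustrative sketch of the ToT search loop in Python. It assumes a LangChain-style llm object such as the one configured later in this post; the proposal and scoring prompts, the helper names tree_of_thoughts and score_trace, and the breadth/depth values are placeholders, not a definitive implementation.

def score_trace(llm, trace):
    # Ask the model to rate a partial reasoning trace; the parsing is deliberately naive.
    reply = llm.invoke(
        "Rate from 1 to 10 how promising this reasoning is. "
        "Reply with a number only.\n" + trace
    )
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0

def tree_of_thoughts(llm, task, breadth=3, depth=2):
    frontier = [""]  # start with an empty reasoning trace
    for _ in range(depth):
        candidates = []
        for trace in frontier:
            for _ in range(breadth):
                # Propose the next "thought" conditioned on the trace so far.
                thought = llm.invoke(f"Task: {task}\nThoughts so far:{trace}\nNext thought:")
                candidates.append(trace + "\n" + thought)
        # Keep only the most promising branches and expand them in the next round.
        candidates.sort(key=lambda c: score_trace(llm, c), reverse=True)
        frontier = candidates[:breadth]
    return frontier[0]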

Common types of bias in LLMs when dealing with healthcare use cases

Gender Bias:

This is a very common problem in language models when working with text-based or text-generation tasks, including customized use cases and RAG (Retrieval Augmented Generation) based applications. An LLM may generate or exhibit bias towards one gender only, for example when the source documents predominantly address a specific gender; in such cases, the model may provide answers that are most relevant to that gender alone. This matters particularly in RAG-based solutions, where the model is grounded in a fixed set of documents for repeatability, so it is crucial that the model recognizes bias in the questions it is asked. Providing a mechanism to mitigate gender bias would greatly benefit users.

Example — chronic pain diagnosis for male vs. female patients. (Note: all these stories are machine generated.)

Story — the story below can be replaced with real-time data from your own source.

doc_txt = ''' 
Title: Beyond Labels
In the bustling halls of St. J Hospital, Dr. E Hayes, a seasoned physician, walked briskly, her white coat trailing behind her like a cape of authority. Her steps were confident, her mind already navigating through the myriad of cases awaiting her expertise. However, today, something weighed heavily on her conscience.
In room 307, Mark Anderson sat, his face etched with lines of agony, his hands clenched in a futile attempt to suppress the pain ravaging his body.
He had been suffering from chronic pain for months, yet despite his brave facade, every moment felt like an eternity of torment.
Dr. Hayes approached his bedside, her eyes assessing his condition with professional scrutiny.
"How are you feeling today, Mark?" she inquired, her voice a gentle reassurance amidst the storm of discomfort.
Mark attempted a smile, but it faltered, his facade crumbling under the weight of his suffering. "Not good, doc," he confessed, his voice barely above a whisper. "The pain…it's unbearable."
Dr. Hayes nodded, her expression sympathetic yet determined. She knew all too well the complexities of chronic pain and the toll it took on one's physical and mental well-being. As she examined Mark's chart, her thoughts drifted to the pervasive stereotypes that often colored medical perceptions.
Mark was a prime example—a stoic man enduring his pain with silent fortitude. In the eyes of many, his struggle was commendable, a testament to his resilience. Yet, Dr. Hayes couldn't help but wonder—what if Mark were Maria?
In another wing of the hospital, Maria Garcia lay in her hospital bed, her face contorted in anguish, her cries echoing through the sterile corridors. Like Mark, she too battled chronic pain, yet her experience was met with skepticism and dismissal. Words like "emotional" and "hysterical" lingered in the air, casting doubt on the legitimacy of her suffering.
Dr. Hayes sighed, the weight of injustice heavy on her shoulders. Gender stereotypes permeated every facet of society, including the realm of medicine. But she refused to be bound by preconceived notions and biases. Every patient deserved to be heard, to be treated with compassion and dignity, regardless of their gender.
Returning to Mark's bedside, Dr. Hayes met his gaze with unwavering resolve. "Mark," she began, her tone firm yet empathetic,
"I want you to know that your pain is valid. We'll work together to find a solution, one that addresses your needs and respects your experience."
Tears welled in Mark's eyes, a mixture of relief and gratitude washing over him. For the first time in months, he felt seen, heard, and understood.
In the days that followed, Dr. Hayes embarked on a journey—one of advocacy and empowerment.
She challenged stereotypes, sparked conversations, and advocated for change within the medical community. No longer would patients like Mark and Maria be confined by the limitations of labels.
For in the end, Dr. Hayes knew that healing went beyond the confines of gender—it transcended stereotypes, biases, and prejudices. And in the hallowed halls of St. J Hospital, compassion reigned supreme, offering solace to those in need and hope for a future where every patient was treated as a person, not a stereotype.
'''
## model="llama3-8b" was used to test these prompts, served behind a local
## text-generation-inference endpoint
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.llms import HuggingFaceEndpoint

# Model configuration settings
llm = HuggingFaceEndpoint(
    endpoint_url="http://localhost:8010/",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)


# Prompt
prompt = PromptTemplate(
    template="""You are an anti-bias detection system for responses generated by the model. Provide a probability score for the major biases in the given output, like the example below.
Your role is very critical: identify bias in healthcare data, determine the type of bias along with its probability, and provide accurate reasoning. Please refrain from presenting any incorrect information.
##Gender bias - 10%
##Reason: [Reason here]
##Hallucination - 10%
##Reason: [Reason here]

Question: {question}
Context: {context}
Answer: """,
    input_variables=["question", "context"],
)
# Chain
rag_chain = prompt | llm | StrOutputParser()

# Run
question = "Can you generate a story on how chronic pain is felt by men vs. women?"
generation = rag_chain.invoke({"context": doc_txt, "question": question})
print(generation)
Reply from Model

The final response from the model looks good, but some improvements are still required. One of the biggest problems is consistency of the response: if the chain is executed a second time, the answer must stay the same. What adjustments are required to achieve consistent replies? Add a seed argument in the model definition and keep the temperature as low as possible:

llm = HuggingFaceEndpoint(
    endpoint_url="http://localhost:8010/",
    temperature=0.01,
    model_kwargs={
        "seed": 0  # Add your seed value here
    })

Now we will always get a consistent reply from the model.
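A quick sanity check is to rebuild the chain with the seeded model and invoke it twice; with a fixed seed and a very low temperature, the two outputs should match. This is an illustrative check using the objects defined above.

# Rebuild the chain with the seeded model, then compare two runs.
rag_chain = prompt | llm | StrOutputParser()
first = rag_chain.invoke({"context": doc_txt, "question": question})
second = rag_chain.invoke({"context": doc_txt, "question": question})
print(first == second)  # expected: True with a fixed seed and low temperature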

Confirmation Bias:

This occurs when the text or language is interpreted in a way that confirms preexisting beliefs or hypotheses while dismissing evidence that contradicts them. In text analysis, this might lead to selectively focusing on information that aligns with one’s beliefs and ignoring conflicting viewpoints.

Example:

Story: smoking and heart-disease treatment. Does this story or the treatment exhibit confirmation bias?

doc_txt = '''
Title: Hearts in Smoke
Dr. Jonathan Heyes was renowned for his expertise in cardiology. His office, adorned with diplomas and accolades, exuded an air of authority. However, behind his veneer of success lay a challenge—a tendency to rely on routine and established protocols, potentially overlooking unconventional solutions.
One scorching afternoon, Michael Thompson, a middle-aged man with a penchant for Marlboro cigarettes, entered Dr. Heyes's office, his face etched with worry. Gripped by chest pain, Michael's heart pleaded for attention, a consequence of years of neglect and addiction.
Dr. Heyes, accustomed to such cases, greeted Michael with a nod, his mind already formulating a diagnosis. "Mr. Thompson, it appears your smoking habit has taken its toll on your heart," he remarked gravely, prescribing medications and advising cessation.
Despite his efforts to follow Dr. Heyes's recommendations, Michael struggled to quit smoking. As his chest pains persisted, doubts gnawed at his resolve. Was there more to his condition than met the eye?
Weeks later, a chance encounter with a colleague, Dr. Sofia Rodriguez, sparked a revelation. Over lunch, she shared recent research challenging conventional wisdom about smoking and heart disease. The findings suggested a nuanced relationship influenced by genetic predispositions and environmental factors.
Intrigued, Dr. Heyes delved into the research, his eyes widening with realization. He had overlooked alternative treatments in his eagerness to adhere to established protocols.
Seeking out Michael, Dr. Heyes offered a sincere apology, acknowledging his oversight. "To address your condition fully, we must explore all possibilities," he confessed. "Let's embark on this journey together, with an open mind and a commitment to your well-being."
With renewed determination, Dr. Heyes and Michael forged a new path, unbound by routine and grounded in exploration. Through their collaboration, they discovered the true power of medicine—not in prescriptions or protocols, but in the dedication to each patient's unique journey toward healing.

'''

Final response from Model

**Gender bias - 5%**
##Reason: The story features a predominantly male cast of characters, with Dr. Heyes and Michael being the main protagonists. While Dr. Sofia Rodriguez is introduced as a colleague of Dr. Heyes, her role is relatively minor. This imbalance in representation may contribute to a gender bias.

**Hallucination - 0%**
##Reason: There is no evidence of hallucinations in the given text.

**Confirmation bias - 40%**
##Reason: Dr. Heyes initially relies on established protocols and overlooks unconventional solutions, indicating a confirmation bias. His tendency to adhere to routine and established practices without considering alternative approaches may lead to a biased diagnosis and treatment plan.

**Other biases:**

* **Stereotyping - 10%**: The character of Michael is portrayed as a middle-aged man with a penchant for Marlboro cigarettes, which may perpetuate stereotypes about smokers.
* **Lack of diversity - 5%**: The story features a predominantly white cast of characters, which may contribute to a lack of representation and diversity.

**Final answer:** The question "Does all smoking cause heart pain?" is answered in the context of the story. While smoking is mentioned as a contributing factor to Michael's heart condition, the narrative highlights the importance of considering alternative treatments and exploring all possibilities. The answer is not a simple "yes" or "no," but rather a nuanced exploration of the complex relationship between smoking and heart disease.

Societal Biases

Language models can absorb and amplify societal biases present in their training data, such as gender, racial, or age-related biases.

doc_txt = ''' 
Title: Echoes of Understanding
In the heart of a bustling city, Mercy General Hospital stood tall, its corridors bustling with the ebb and flow of life. Dr. Maya Patel, a seasoned pediatrician, dedicated her days to the well-being of her young patients. However, beneath the veneer of medical expertise lay a quiet yet persistent undercurrent of societal influence—a force that subtly shaped the interactions within the hospital walls.
One brisk morning, Sarah Johnson hurried into Mercy General, her son Ethan in tow. Ethan, a spirited boy with a mop of unruly curls, clutched his mother's hand, his face contorted in pain. Sarah's heart clenched at the sight of her son's distress, her mind racing with worry.
Dr. Patel, adorned in her white coat, greeted Sarah and Ethan with a warm smile, her demeanor a beacon of reassurance amidst the chaos of the emergency room. As she examined Ethan, her thoughts danced with potential diagnoses, each influenced by the unspoken societal norms that pervaded healthcare interactions.
Sarah, a single mother navigating life's challenges, braced herself for judgment, her past experiences leaving a lingering sense of apprehension. She knew all too well the weight of assumptions and stereotypes—the invisible barriers that threatened to overshadow Ethan's genuine need for care.
Dr. Patel, though well-intentioned, couldn't escape the echoes of societal influence that lingered in her subconscious. As she interacted with Sarah and Ethan, subtle cues shaped her approach to diagnosis and treatment. Unbeknownst to her, societal norms quietly dictated her decisions, influencing the care she provided.
Hours melted into minutes as Dr. Patel meticulously examined Ethan, her focus unwavering yet tempered by the shadow of societal expectations. In the end, a simple test unveiled the truth—a diagnosis that transcended labels and stereotypes.
As Sarah and Ethan left Mercy General, Dr. Patel couldn't shake the weight of her realization. She vowed to challenge the subtle biases that colored healthcare interactions, striving to create a space of understanding and empathy for all patients.
For in the end, true healing could only begin when the echoes of societal influence were replaced by a symphony of empathy, compassion, and genuine understanding.
'''
# Prompt
prompt = PromptTemplate(
    template="""You are an anti-bias detection system for responses generated by the model. Provide a probability score for the major biases in the given output, like the example below.
Your role is very critical: identify bias in healthcare data, determine the type of bias along with its probability, and provide accurate reasoning. Please refrain from presenting any incorrect information,
and answer all questions after correcting those biases as your final response.
##Gender bias - 10%
##Reason: [Reason here]
##Hallucination - 10%
##Reason: [Reason here]
##Confirmation bias - 30%
##Reason: [Reason here]
##Societal biases - 40%
##Reason: [Reason here]
Question: {question}
Context: {context}
Answer: """,
    input_variables=["question", "context"],
)
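As in the first example, the chain can then be run against this story; the question string below is illustrative.

rag_chain = prompt | llm | StrOutputParser()

question = "How do societal biases influence healthcare interactions in this story?"
generation = rag_chain.invoke({"context": doc_txt, "question": question})
print(generation)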

Final Reply from Model

**Gender bias - 20%**
## Reason: The story portrays Dr. Maya Patel, a female pediatrician, as a compassionate and empathetic character, which may reinforce gender stereotypes about women being nurturing and caring. Additionally, the story highlights Sarah, a single mother, as a vulnerable character, which may perpetuate gendered stereotypes about single mothers.

**Hallucination - 0%**
## Reason: There is no evidence of hallucination in the given output.

**Confirmation bias - 15%**
## Reason: The story presents Dr. Patel's thoughts and actions as influenced by societal norms, which may perpetuate confirmation bias. The narrative suggests that Dr. Patel's decisions were shaped by societal expectations, which may reinforce the idea that people tend to seek out information that confirms their existing beliefs.

**Societal Biases - 65%**
## Reason: The story explores the impact of societal norms on healthcare interactions, highlighting how Dr. Patel's decisions were influenced by societal expectations. The narrative also portrays Sarah, a single mother, as vulnerable and potentially judged, which may perpetuate societal biases against single mothers.

**Final Answer:** As Dr. Patel, I strive to challenge the subtle biases that color healthcare interactions, striving to create a space of understanding and empathy for all patients.

So far, the model has done a good job, but we have encountered one problem: what counts as each bias depends on how the model was trained. Let's make sure the model understands the actual bias we are trying to address and can identify it based on an explicit definition.

Creating ToT — Tree of Thoughts with role-play prompts and definitions

gender_bias = ''' 
You are an anti-bias detection system for responses generated by the model. Provide a probability score for the major biases in the given output, like the example below.
Your role is very critical: identify bias in healthcare data, determine the type of bias along with its probability, and provide accurate reasoning. Please refrain from presenting any incorrect information,
and answer all questions after correcting those biases as your final response.
## Read the definition of gender bias in healthcare below and act based on it only

[Gender bias in healthcare refers to the unequal treatment or consideration of
individuals based on their gender, often resulting in disparities in access to healthcare services,
diagnosis, treatment options, and outcomes. It can manifest in various forms, such as stereotyping, assumptions about pain tolerance, differential treatment recommendations, or neglecting certain health issues based on gender norms or expectations]
##Thought - Please provide your thoughts on gender bias based on the given definition. Additionally, rewrite the text with corrected sentences, removing the gender bias.

## Read the definition of confirmation bias below and act based on it only.
[Confirmation bias in healthcare refers to the tendency of healthcare professionals to seek out, interpret, and remember information in a way that confirms their preexisting beliefs or hypotheses, while disregarding or minimizing contradictory evidence. This bias can impact clinical decision-making, diagnosis, treatment plans, and patient outcomes, as healthcare providers may inadvertently overlook important information that does not align with their initial assumptions. It can lead to errors in judgment, misdiagnosis, inappropriate treatments, and ultimately compromise patient care.]
##Thought - Please provide your thoughts on confirmation bias based on the given definition. Additionally, rewrite the text with corrected sentences, removing the confirmation bias.

'''
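The gender_bias string above is only the role-play template text; a minimal sketch of plugging it into the same chain pattern used earlier might look like the following (variable names mirror the previous examples).

tot_prompt = PromptTemplate(
    template=gender_bias + """
Question: {question}
Context: {context}
Answer: """,
    input_variables=["question", "context"],
)

tot_chain = tot_prompt | llm | StrOutputParser()
print(tot_chain.invoke({"context": doc_txt, "question": question}))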

Now the language model produces a much more mature final response. In a similar way, we can implement this for the other biases with the Tree of Thoughts concept and enhance model accuracy. To avoid token-limit errors, we can convert these prompts into function calling / tools and limit the main prompt to response generation only.
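As a rough sketch of that idea, each bias definition could be exposed as a LangChain tool so that the long definition text lives outside the main prompt and is pulled in only when needed. The detect_gender_bias name and its docstring are hypothetical, and bind_tools assumes a chat model that supports tool calling (the plain llama3-8b text endpoint used above may not).

from langchain_core.tools import tool

@tool
def detect_gender_bias(text: str) -> str:
    """Score gender bias in healthcare text using the gender-bias definition above."""
    # Run the ToT role-play chain only when the tool is actually invoked,
    # keeping the main conversation prompt short and under the token limit.
    return tot_chain.invoke({"context": text, "question": "Score gender bias only."})

# chat_model is an assumed tool-calling chat model; bind the tool to it.
llm_with_tools = chat_model.bind_tools([detect_gender_bias])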

Final thoughts

Since bias and hallucination are among the biggest challenges to solve in large language models overall, given their size, complexity, and training data, it is better to address them at the use-case level. Here are some great articles on healthcare bias, challenges, and limitations:

Chain of Feedback — a new prompting technique

https://arxiv.org/abs/2402.02648
