"You are an expert judge tasked with evaluating the Informativeness of a response generated by an instruction-following model "
"for a given user instruction. Informativeness Evaluation Focus: Assess how thoroughly and accurately the response addresses "
"the user’s instruction, providing relevant details, facts, and explanations without omissions or irrelevant additions. "
"An informative response fully satisfies the query with meaningful content, whereas a less informative one may be vague, "
"incomplete, or superficial. Scoring Guidelines (0-9):\n"
"0-1 Very low informativeness; the response is irrelevant or nearly empty.\n"
"2-3 Low informativeness; addresses the instruction minimally with significant missing information.\n"
"4-5 Moderate informativeness; covers some key points but lacks depth or completeness.\n"
"6-7 High informativeness; provides detailed and mostly comprehensive information relevant to the instruction.\n"
"8-9 Exceptional informativeness; thoroughly and accurately covers all relevant aspects with rich and precise details.\n"
"Given Instruction and Model Response, you will:\n"
"1. Analyze the Informativeness of the response\n"
"2. Determine a score using the above criteria\n"
"3. Output ONLY the integer score (0-9), place your score in <score></score>\n"
f"Instruction: {instruction}\n"
f"Response: {output}"
)
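# --- Hedged usage sketch (not part of the original templates) ---
# The judge templates ask for the score inside <score></score> tags; a small helper
# like the one below could pull that integer back out of the judge's reply.
# 'parse_score' is an assumed name introduced here for illustration only.
import re

def parse_score(judge_reply: str):
    """Return the integer score found inside <score>...</score>, or None if absent."""
    match = re.search(r"<score>\s*(\d)\s*</score>", judge_reply)
    return int(match.group(1)) if match else None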
helpfulness_template = (
"You are an expert judge tasked with evaluating the Helpfulness of a response generated by an instruction-following model "
"for a given user instruction. Helpfulness Evaluation Focus: Assess how well the response assists the user in accomplishing "
"their goal, providing clear, actionable, and relevant information or guidance. A helpful response should be easy "
"to understand and effectively address the user’s needs without unnecessary confusion or missing key details.\n"
"Scoring Guidelines (0-9):\n"
"0-1 Not helpful; response is irrelevant, confusing, or fails to address the instruction.\n"
"2-3 Slightly helpful; responds partially but lacks clarity or important elements.\n"
"4-5 Moderately helpful; response addresses the instruction but may be incomplete or somewhat unclear.\n"
"6-7 Mostly helpful; provides clear and relevant information that adequately assists the user.\n"
"8-9 Extremely helpful; offers comprehensive, clear, and precise guidance or information that fully satisfies the user’s instruction.\n"
"Given Instruction and Model Response, you will:\n"
"1. Analyze the Helpfulness of the response\n"
"2. Determine a score using the above criteria\n"
"3. Output ONLY the integer score (0-9), place your score in <score></score>\n"
f"Instruction: {instruction}\n"
f"Response: {output}"
)
generalization_template = (
"You are an expert judge tasked with evaluating the Potential for Generalization of a response generated by an "
"instruction-following model to similar but unseen tasks. Generalization Evaluation Focus: Assess how well the response "
"demonstrates understanding and reasoning that can be effectively adapted or transferred to other related instructions or "
"problems beyond the specific input. A response with high generalization ability "
"captures underlying principles or strategies rather than relying on shallow, task-specific heuristics.\n"
"Scoring Guidelines (0-9):\n"
"0-1 Very poor generalization; response is overly specific, rigid, or fails to show adaptable reasoning.\n"
"2-3 Limited generalization; response applies partly to related tasks but is mostly narrow or shallow.\n"
"4-5 Moderate generalization; response reflects some transferable understanding but may lack depth or clarity.\n"
"6-7 Strong generalization; response shows clear reasoning patterns or concepts that can extend to similar tasks.\n"
"8-9 Exceptional generalization; response exhibits deep, abstract, and flexible comprehension applicable across a broad range of related instructions.\n"
"Given Instruction and Model Response, you will:\n"
"1. Analyze the Potential for Generalization to Similar Tasks\n"
"2. Determine a score using the above criteria\n"
"3. Output ONLY the integer score (0-9), place your score in <score></score>\n"
f"Instruction: {instruction}\n"
f"Response: {output}"
)
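# --- Hedged usage sketch (assumption, not part of the original code) ---
# One plausible way to combine the three 0-9 judge prompts: send each formatted template
# to the judge model and parse the returned <score> tag with parse_score() defined above.
# 'query_judge' is a hypothetical stand-in for whatever LLM call the surrounding code uses,
# and 'informativeness_template' is assumed from context (its assignment precedes this excerpt).
def score_all_dimensions(query_judge):
    prompts = {
        "informativeness": informativeness_template,
        "helpfulness": helpfulness_template,
        "generalization": generalization_template,
    }
    # Each value is an int 0-9, or None if the judge reply carried no <score> tag.
    return {name: parse_score(query_judge(prompt)) for name, prompt in prompts.items()}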
correctness_template = (
"You are a meticulous correctness evaluator tasked with assessing whether the response to a user instruction "
"is factually accurate and logically sound.\n"
"Your evaluation should determine:\n"
"1. Whether the response correctly addresses the instruction\n"
"2. Whether any factual claims or data are accurate\n"
"3. Whether the reasoning, if present, is logically valid and free of errors\n"
"4. Whether the final answer is consistent with the evidence or instructions provided\n"
"You will:\n"
"Output ONLY '1' if the response is correct and accurate, or '0' if it contains factual errors, logical flaws, "
"or fails to correctly address the instruction.\n"