Reading the judge-name columns from left to right, from **Jimena Hoffner** to **Noelia Barsel**, you’ll see that:

- The 1st-5th and 11th-15th judges belong to what we will denote as **panel 1**.
- The 6th-10th and 16th-20th judges belong to what we will denote as **panel 2**.

**Notice anything?** Dancers judged by panel 2 show up in a much larger proportion than dancers judged by panel 1. If you scroll through the **PDF** of this data table, you’ll see that this proportional difference holds throughout the competitors who scored well enough to advance to the semi-final round.

**Note:** Dancers shaded in green advanced to the semi-final round; dancers not shaded in green did not.

This raises the question: **is this proportional difference real, or is it due to random sampling**, that is, the random assignment of dancers to one panel over the other? There’s a statistical test we can use to answer this question.

## Two-Tailed Test for Equality between Two Population Proportions

We are going to use the two-tailed z-test to test if there is a significant difference between the two proportions in either direction. We are interested in whether one proportion is significantly different from the other, regardless of whether it is larger or smaller.
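For reference, the statistic behind this test can be written out as follows. The notation is introduced here for illustration and is an assumption, not the article's own: $x_i$ is the number of successes and $n_i$ the number of observations in group $i$, and $\Phi$ is the standard normal CDF. The formula is the usual pooled two-proportion z-statistic.

$$
\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}
$$

$$
z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad p\text{-value} = 2\,\bigl(1 - \Phi(|z|)\bigr)
$$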

**Statistical Test Assumptions**

- **Random Sampling**: The samples must be independently and randomly drawn from their respective populations.
- **Large Sample Size**: The sample sizes must be large enough for the sampling distribution of the difference in sample proportions to be approximately normal. This approximation comes from the **Central Limit Theorem**.
- **Expected Number of Successes and Failures**: To ensure the normal approximation holds, the number of expected successes and failures in each group should be at least 5.

Our dataset meets all of these assumptions.
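The expected-counts condition can be checked mechanically. Below is a minimal sketch, assuming the expected counts are computed from the pooled proportion under the null hypothesis; the helper name `meets_normal_approximation` and the counts passed to it are hypothetical and are not the competition data.

```python
def meets_normal_approximation(successes1, total1, successes2, total2, minimum=5):
    """Return True if every expected success/failure count is at least `minimum`."""
    # Pooled proportion under the null hypothesis of equal proportions
    p_pooled = (successes1 + successes2) / (total1 + total2)
    expected_counts = [
        total1 * p_pooled, total1 * (1 - p_pooled),  # group 1: successes, failures
        total2 * p_pooled, total2 * (1 - p_pooled),  # group 2: successes, failures
    ]
    return all(count >= minimum for count in expected_counts)


# Hypothetical counts for illustration only (not the competition data)
print(meets_normal_approximation(17, 100, 42, 100))  # large groups
print(meets_normal_approximation(1, 8, 2, 9))        # very small groups
```

With the real data, the four arguments would be the success and total counts for each panel.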

**Conduct the Test**

**1. Define our Hypotheses**

**Null Hypothesis:** The proportions from each distribution are the same.

**Alt. Hypothesis:** The proportions from each distribution are NOT the same.

**2. Pick a Statistical Significance level**

The default value for alpha is 0.05 (5%). We don’t have a reason to relax this value (e.g. to 10%) or to make it more stringent (e.g. 1%), so we’ll use the default. Alpha represents our tolerance for falsely rejecting the null hypothesis in favor of the alternative hypothesis due to random sampling (i.e. a Type I error).
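To make the role of alpha concrete, here is a small sketch of how a two-tailed significance level translates into a cutoff on the z-score. It uses Python's standard-library `statistics.NormalDist` so it runs without extra dependencies; the helper `reject_null` is a hypothetical name introduced only for this illustration.

```python
from statistics import NormalDist

alpha = 0.05  # significance level chosen above

# Two-tailed test: alpha is split evenly between the left and right tails
critical_value = NormalDist().inv_cdf(1 - alpha / 2)
print(f"Reject the null hypothesis when |z| > {critical_value:.2f}")


def reject_null(z_value, alpha=0.05):
    """Return True if z_value falls in either rejection region."""
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z_value) > cutoff
```

Comparing `abs(z_value)` to this cutoff is equivalent to comparing the two-tailed p-value to alpha.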

Next, we carry out the test using the Python code provided below.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def plot_two_tailed_test(z_value):
    """Plot the standard normal curve, the critical values, and the observed z-value."""
    # Generate a range of x values
    x = np.linspace(-4, 4, 1000)

    # Get the standard normal density for these x values
    y = stats.norm.pdf(x)

    # Create the plot
    plt.figure(figsize=(10, 6))
    plt.plot(x, y, label='Standard Normal Distribution', color='black')

    # Shade both tails beyond |z| in red; using the absolute value keeps the
    # shading correct when the z-value is negative
    z_abs = abs(z_value)
    plt.fill_between(x, y, where=(x >= z_abs), color='red', alpha=0.5, label='Right Tail Area')
    plt.fill_between(x, y, where=(x <= -z_abs), color='red', alpha=0.5, label='Left Tail Area')

    # Define critical values for alpha = 0.05
    alpha = 0.05
    critical_value = stats.norm.ppf(1 - alpha / 2)

    # Add vertical dashed blue lines for critical values
    plt.axvline(critical_value, color='blue', linestyle='dashed', linewidth=1,
                label=f'Critical Value: {critical_value:.2f}')
    plt.axvline(-critical_value, color='blue', linestyle='dashed', linewidth=1,
                label=f'Critical Value: {-critical_value:.2f}')

    # Mark the z-value
    plt.axvline(z_value, color='red', linestyle='dashed', linewidth=1,
                label=f'Z-Value: {z_value:.2f}')

    # Add labels and title
    plt.title('Two-Tailed Z-Test Visualization')
    plt.xlabel('Z-Score')
    plt.ylabel('Probability Density')
    plt.legend()
    plt.grid(True)

    # Save and show the plot
    plt.savefig('../images/p-value_location_in_z_dist_z_test_proportionality.png')
    plt.show()
```

```python
def two_proportion_z_test(successes1, total1, successes2, total2):
    """
    Perform a two-proportion z-test to check if two population proportions
    are significantly different.

    Parameters:
    - successes1: Number of successes in the first sample
    - total1: Total number of observations in the first sample
    - successes2: Number of successes in the second sample
    - total2: Total number of observations in the second sample

    Returns:
    - z_value: The z-statistic
    - p_value: The p-value of the test
    """
    # Calculate sample proportions
    p1 = successes1 / total1
    p2 = successes2 / total2

    # Combined (pooled) proportion
    p_combined = (successes1 + successes2) / (total1 + total2)

    # Standard error under the null hypothesis
    se = np.sqrt(p_combined * (1 - p_combined) * (1 / total1 + 1 / total2))

    # Z-value
    z_value = (p1 - p2) / se

    # P-value for two-tailed test
    p_value = 2 * (1 - stats.norm.cdf(np.abs(z_value)))

    return z_value, p_value
```

```python
# Assumes `df` (the scores table) and `panel_1` / `panel_2` (the lists of each
# panel's judge column names) have already been defined.
min_score_for_semi_finals = 7.040
is_semi_finalist = df.PROMEDIO >= min_score_for_semi_finals

# dropna keeps only the couples with a score from every judge on the panel

# Number of couples scored by panel 1 advancing to semi-finals
successes_1 = df.loc[is_semi_finalist, panel_1].dropna(axis=0).shape[0]

# Number of couples scored by panel 2 advancing to semi-finals
successes_2 = df.loc[is_semi_finalist, panel_2].dropna(axis=0).shape[0]

# Total number of couples that were scored by panel 1
n1 = df[panel_1].dropna(axis=0).shape[0]

# Total number of couples that were scored by panel 2
n2 = df[panel_2].dropna(axis=0).shape[0]

# Perform the test
z_value, p_value = two_proportion_z_test(successes_1, n1, successes_2, n2)

# Print the results
print(f"Z-Value: {z_value:.4f}")
print(f"P-Value: {p_value:.4f}")  # P-Value: 0.0000

# Check significance at alpha = 0.05
alpha = 0.05
if p_value < alpha:
    print("The difference between the two proportions is statistically significant.")
else:
    print("The difference between the two proportions is not statistically significant.")

# Generate the plot
plot_two_tailed_test(z_value)
```

The plot shows that the calculated z-value lies far outside the range of z-values we’d expect to see if the null hypothesis were true. The resulting p-value rounds to 0.0000, which is well below our alpha of 0.05, so we reject the null hypothesis in favor of the alternative.

This means that the difference in proportions is statistically significant and very unlikely to be due to random sampling alone.

- 17% of dance couples judged by panel 1 advanced to the semi-finals
- 42% of dance couples judged by panel 2 advanced to the semi-finals

Our first statistical test for bias has provided evidence that there is a positive bias in scores for dancers judged by panel 2: they advanced to the semi-finals at roughly 2.5x the rate of dancers judged by panel 1 (42% vs. 17%).
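The size of the gap can be expressed both as an absolute difference and as a ratio. The short sketch below uses only the two rounded percentages reported above, so the results are approximate; the variable names are introduced here for illustration.

```python
# Advancement rates as reported above (rounded to whole percentages)
rate_panel_1 = 0.17
rate_panel_2 = 0.42

# Absolute gap, in percentage points
absolute_gap = rate_panel_2 - rate_panel_1

# Relative ratio: how many times higher panel 2's advancement rate is
relative_ratio = rate_panel_2 / rate_panel_1

print(f"Absolute gap: {absolute_gap * 100:.0f} percentage points")
print(f"Relative ratio: {relative_ratio:.2f}x")
```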

Next we dive into the scoring distributions of each individual judge and see how their individual biases affect their panel’s overall bias.