Quantitative Biology: MSc Questions and Answers

Chapter 1: Descriptive statistics

Q. Find the regression equation for the following data:

x    6     2    10    4    8 

y    9     11    5    8    7

-

Data:

+---+----+

| x |  y |

+---+----+

| 6 |  9 |

| 2 | 11 |

|10 |  5 |

| 4 |  8 |

| 8 |  7 |

+---+----+


Calculations (x̄ = 30/5 = 6, ȳ = 40/5 = 8; dx = x - x̄, dy = y - ȳ):

+----+----+----+----+-------+------+
| x  | y  | dx | dy | dx*dy | dx^2 |
+----+----+----+----+-------+------+
|  6 |  9 |  0 |  1 |     0 |    0 |
|  2 | 11 | -4 |  3 |   -12 |   16 |
| 10 |  5 |  4 | -3 |   -12 |   16 |
|  4 |  8 | -2 |  0 |     0 |    4 |
|  8 |  7 |  2 | -1 |    -2 |    4 |
+----+----+----+----+-------+------+


Summation:

Σ(dx*dy) = -26

Σ(dx^2) = 40


Slope (m) and intercept (b):

m = Σ(dx*dy) / Σ(dx^2) = -26 / 40 = -0.65

b = ȳ - m * x̄ = 8 - (-0.65 * 6) = 8 + 3.9 = 11.9


Regression Equation:

y = mx + b

y = -0.65x + 11.9
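
The hand calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `fit_line` and the variable names are illustrative choices, not part of the original answer.

```python
# Least-squares line y = m*x + b, computed from deviations about the means.
xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]

def fit_line(xs, ys):
    """Return (slope, intercept) of the regression of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations and sum of squared x-deviations
    sum_dxdy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - x_bar) ** 2 for x in xs)
    m = sum_dxdy / sum_dx2
    b = y_bar - m * x_bar
    return m, b

m, b = fit_line(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}")  # y = -0.65x + 11.90
```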


Q. Ten sterile nutrient agar plates were exposed (10 min) in a fruit juice manufacturing unit, and the following numbers of colonies were obtained after incubation. Number of colonies (CFU) on each agar plate: 18, 11, 15, 20, 25, 23, 26, 13, 17. Calculate the coefficient of variation.

-

Data (only nine counts are listed in the question, so n = 9 is used):
+-----+
| CFU |
+-----+
|  18 |
|  11 |
|  15 |
|  20 |
|  25 |
|  23 |
|  26 |
|  13 |
|  17 |
+-----+

Mean:
x̄ = ΣCFU / n = 168 / 9 = 18.67

Calculations:
+-----+----------+-------------+
| CFU | CFU - x̄  | (CFU - x̄)²  |
+-----+----------+-------------+
|  18 |  -0.6667 |      0.4444 |
|  11 |  -7.6667 |     58.7778 |
|  15 |  -3.6667 |     13.4444 |
|  20 |   1.3333 |      1.7778 |
|  25 |   6.3333 |     40.1111 |
|  23 |   4.3333 |     18.7778 |
|  26 |   7.3333 |     53.7778 |
|  13 |  -5.6667 |     32.1111 |
|  17 |  -1.6667 |      2.7778 |
+-----+----------+-------------+

Summation:
Σ(CFU - x̄)² = 222
Σ(CFU - x̄)² / (n - 1) = 222 / 8 = 27.75

Standard Deviation:
s = √(Σ(CFU - x̄)² / (n - 1)) = √(27.75) = 5.268 (approx.)

Coefficient of Variation:
CV = (s / x̄) * 100 = (5.268 / 18.67) * 100 = 28.22 % (approx.)
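
As a cross-check, the coefficient of variation can be recomputed directly from the raw counts. The sketch below assumes the nine counts listed in the question and the sample (n - 1) standard deviation; the function name `coefficient_of_variation` is illustrative.

```python
import statistics

# The nine colony counts listed in the question
cfu = [18, 11, 15, 20, 25, 23, 26, 13, 17]

def coefficient_of_variation(values):
    """Sample standard deviation (n - 1 denominator) as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv = coefficient_of_variation(cfu)
print(f"mean = {statistics.mean(cfu):.2f}, s = {statistics.stdev(cfu):.3f}, CV = {cv:.2f} %")
```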


Q. Write a short note on the Poisson distribution.

-

Poisson Distribution:

The Poisson distribution is a probability distribution that is commonly used to model the number of events that occur within a fixed interval of time or space. It is often applied to situations where events occur randomly and independently at a constant average rate over time.


Key characteristics of the Poisson distribution include:


1) Average Rate: The distribution is determined by a single parameter, λ (lambda), which represents the average number of events per interval. Both the mean and the variance of the distribution are equal to λ.


2) Discrete Values: The Poisson distribution is discrete, meaning that it describes the probabilities of observing a specific number of events (0, 1, 2, 3, and so on) within the interval.


3) Independence: The events must occur independently of each other, with no influence from previous or future events.


4) Memorylessness: In the underlying Poisson process, the probability of an event occurring in a given interval does not depend on the time since the last event occurred.


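Q. Write a short note on level of significance.

- Significance Level:

The significance level, often denoted as α (alpha), is a critical component in hypothesis testing. It is the probability threshold chosen before the test: if the p-value is below α, the null hypothesis is rejected. Equivalently, α is the maximum probability of a Type I error that the researcher is willing to accept.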


Key points about the significance level include:


1) Determining Statistical Significance: It helps determine if the observed results are statistically significant or simply due to random chance.


2) Commonly Used Values: The significance level is typically set at 0.05 (5%) or 0.01 (1%).


3) Type I Error: The significance level is the probability of a Type I error, which occurs when the null hypothesis is rejected even though it is true. A lower significance level reduces the probability of committing a Type I error but, for a fixed sample size, increases the probability of a Type II error.


4) Contextual Interpretation: The significance level should be chosen carefully, considering the consequences of both Type I and Type II errors in the specific context of the study.


Q. The following data were recorded on two variables in a population. Calculate the correlation and regression coefficients and comment on them.

X  9   8   10    9  10   12  9    11   13    9 

Y 11 16  18   20 16   11  17  18   17    19

-

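X  | Y  | X - x̄ | Y - ȳ | (X - x̄)(Y - ȳ) | (X - x̄)² | (Y - ȳ)²
9  | 11 | -1    | -5.3  | 5.3            | 1        | 28.09
8  | 16 | -2    | -0.3  | 0.6            | 4        | 0.09
10 | 18 | 0     | 1.7   | 0              | 0        | 2.89
9  | 20 | -1    | 3.7   | -3.7           | 1        | 13.69
10 | 16 | 0     | -0.3  | 0              | 0        | 0.09
12 | 11 | 2     | -5.3  | -10.6          | 4        | 28.09
9  | 17 | -1    | 0.7   | -0.7           | 1        | 0.49
11 | 18 | 1     | 1.7   | 1.7            | 1        | 2.89
13 | 17 | 3     | 0.7   | 2.1            | 9        | 0.49
9  | 19 | -1    | 2.7   | -2.7           | 1        | 7.29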


Means: x̄ = 100 / 10 = 10 and ȳ = 163 / 10 = 16.3

Sum of (X - x̄)(Y - ȳ): 5.3 + 0.6 + 0 - 3.7 + 0 - 10.6 - 0.7 + 1.7 + 2.1 - 2.7 = -8.0

Sum of (X - x̄)²: 1 + 4 + 0 + 1 + 0 + 4 + 1 + 1 + 9 + 1 = 22

Sum of (Y - ȳ)²: 28.09 + 0.09 + 2.89 + 13.69 + 0.09 + 28.09 + 0.49 + 2.89 + 0.49 + 7.29 = 84.1

Now we can calculate the correlation coefficient (r) and the regression coefficient of Y on X (b):


r = Sum of (X - x̄)(Y - ȳ) / sqrt((Sum of (X - x̄)²) * (Sum of (Y - ȳ)²))

= -8.0 / sqrt(22 * 84.1)

= -8.0 / sqrt(1850.2)

≈ -8.0 / 43.01

≈ -0.1860


b = Sum of (X - x̄)(Y - ȳ) / Sum of (X - x̄)²

= -8.0 / 22

≈ -0.3636


The correlation coefficient (r) is approximately -0.1860, and the regression coefficient (b) is approximately -0.3636.


Comment: The correlation coefficient (r) indicates a weak negative linear relationship between the variables X and Y. As the value of X increases, the value of Y tends to decrease slightly: on average Y falls by about 0.36 units for each unit increase in X.
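
The sums of squares and products can be recomputed from the raw data as a check. This is a minimal sketch; the function name `correlation_and_slope` and the variable names are illustrative, and b is taken to be the regression coefficient of Y on X.

```python
import math

X = [9, 8, 10, 9, 10, 12, 9, 11, 13, 9]
Y = [11, 16, 18, 20, 16, 11, 17, 18, 17, 19]

def correlation_and_slope(xs, ys):
    """Return (r, b): Pearson correlation and regression coefficient of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # sum of products
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # sum of squares of X
    syy = sum((y - y_bar) ** 2 for y in ys)                       # sum of squares of Y
    return sxy / math.sqrt(sxx * syy), sxy / sxx

r, b_yx = correlation_and_slope(X, Y)
print(f"r = {r:.4f}, b = {b_yx:.4f}")
```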


Q. Calculate mean and mode of the following data:

Class Interval   0-5   5-10    10-15      15-20    20-25 25-30 

Frequency          2        4           8            5           4        1 

-

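To calculate the mean:

Step 1: Find the midpoint of each class interval.

Step 2: Multiply each midpoint by its class frequency.

Class Interval | Midpoint | Frequency | Midpoint x Frequency
0-5            | 2.5      | 2         | 5
5-10           | 7.5      | 4         | 30
10-15          | 12.5     | 8         | 100
15-20          | 17.5     | 5         | 87.5
20-25          | 22.5     | 4         | 90
25-30          | 27.5     | 1         | 27.5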


Step 3: Sum up the values in the "Midpoint x Frequency" column.


Sum of Midpoint x Frequency = 5 + 30 + 100 + 87.5 + 90 + 27.5 = 340


Step 4: Sum up the frequencies.


Total Frequency = 2 + 4 + 8 + 5 + 4 + 1 = 24


Step 5: Calculate the mean by dividing the sum of Midpoint x Frequency by the total frequency.


Mean = Sum of Midpoint x Frequency / Total Frequency = 340 / 24 = 14.17 


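To calculate the mode:

The mode is the value that appears most frequently in the data set. For grouped data, we first find the modal class and then estimate the mode within it.


Looking at the frequencies, the class interval 10-15 has the highest frequency of 8. Therefore, the modal class is 10-15.


Mode = L + ((f1 - f0) / (2 * f1 - f0 - f2)) * h

where L = 10 (lower limit of the modal class), f1 = 8 (frequency of the modal class), f0 = 4 (frequency of the preceding class), f2 = 5 (frequency of the following class) and h = 5 (class width).

Mode = 10 + ((8 - 4) / (16 - 4 - 5)) * 5 = 10 + (4 / 7) * 5 ≈ 12.86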
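
The grouped-data calculation can be sketched in Python as follows. The function names are illustrative, and the interpolation formula used for the mode, L + (f1 - f0) / (2*f1 - f0 - f2) * h, is the usual textbook formula for grouped data; using it here is an assumption about what the question expects.

```python
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
freqs = [2, 4, 8, 5, 4, 1]

def grouped_mean(classes, freqs):
    """Mean of grouped data using class midpoints."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    return sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)

def grouped_mode(classes, freqs):
    """Interpolated mode within the modal class (assumed textbook formula)."""
    i = freqs.index(max(freqs))                        # index of the modal class
    lo, hi = classes[i]
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                  # frequency of preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0     # frequency of following class
    return lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

mean_value = grouped_mean(classes, freqs)
mode_value = grouped_mode(classes, freqs)
print(f"mean = {mean_value:.2f}, mode = {mode_value:.2f}")
```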


Q. What is the standard error of the mean?

Standard Error of Mean:

  • Standard Error of Mean (SEM) measures the uncertainty or variability of the sample mean.
  • It tells you how much the sample mean is likely to deviate from the true population mean.
  • A smaller SEM means that the sample mean is a more precise estimate of the population mean.
  • SEM is calculated as SEM = s / √n, where s is the standard deviation of the sample and n is the sample size.

Q. What is degrees of freedom.

- Degrees of Freedom:

  • Degrees of Freedom (df) is a concept used in statistical analysis.
  • It refers to the number of values in a calculation that are free to vary.
  • In simple terms, it represents the number of observations in a sample that are independent and can provide information.
  • Degrees of Freedom are often used in hypothesis testing and estimating population parameters.
  • The df value determines the shape of the t, chi-square and F distributions, and therefore the critical values read from the distribution tables.
  • For example, if you have a sample of 10 data points and have already calculated the sample mean, you have 9 degrees of freedom, because once the mean is fixed the last data point's value is determined by the other 9.

Q. Calculate the mean of the following data.

Sr. No. Bonus      No. of Persons

1                500               1

2               600               3

3               700               5

4              800                7

5              900               6

6             1000              2

7             1100                1


-

Sr. No. | Bonus | No. of Persons | Product (Bonus * No. of Persons)

1          | 500     | 1                       | 500

2          | 600     | 3                       | 1800

3          | 700     | 5                       | 3500

4          | 800     | 7                       | 5600

5          | 900     | 6                       | 5400

6          | 1000   | 2                       | 2000

7          | 1100    | 1                       | 1100


Sum of Product = 500 + 1800 + 3500 + 5600 + 5400 + 2000 + 1100 = 19900


Total number of persons = 1 + 3 + 5 + 7 + 6 + 2 + 1 = 25


Mean = Sum of Product / Total number of persons

= 19900 / 25

= 796


Therefore, the mean of the given data is 796.

Q. Define Variance

- Variance is a statistical measure that quantifies the spread or dispersion of a set of data points around the mean. It provides information about how the individual data points deviate from the average value.


The formula for variance, assuming a sample, is as follows:


s^2 = Σ((x - x̄)^2) / (n - 1)


s^2 represents the sample variance (s is the sample standard deviation)

Σ denotes the sum of

x represents each data point

x̄ represents the sample mean

n represents the sample size

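Q. Determine the standard deviation from the following data: 10, 15, 25, 30 and 50.

Answer:

- Mean = (10 + 15 + 25 + 30 + 50) / 5 = 130 / 5 = 26

Data | d = (Data - Mean) | d^2
10   | -16               | 256
15   | -11               | 121
25   | -1                | 1
30   | 4                 | 16
50   | 24                | 576


Variance = Σd^2 / n = (256 + 121 + 1 + 16 + 576) / 5 = 970 / 5 = 194

Standard Deviation = √(Variance) = √194 ≈ 13.928

Note: This calculation divides by n, i.e. it treats the five values as the whole population. If the values are treated as a sample and the (n - 1) formula is used, Variance = 970 / 4 = 242.5 and Standard Deviation = √242.5 ≈ 15.572.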
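
The answer above divides by n, whereas the variance formula given earlier divides by (n - 1). A minimal sketch with Python's standard `statistics` module shows both versions side by side; the variable names are illustrative.

```python
import statistics

data = [10, 15, 25, 30, 50]

population_sd = statistics.pstdev(data)  # divides by n (as in the worked answer)
sample_sd = statistics.stdev(data)       # divides by n - 1 (sample formula)

print(f"population SD = {population_sd:.3f}")  # population SD = 13.928
print(f"sample SD     = {sample_sd:.3f}")
```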

Chapter 2: Inferential Statistics - I


Q. Write a note on type 1 and type 2 errors.

Type 1 and Type 2 errors are concepts used in hypothesis testing. 

Type 1 Error (False Positive):

1) A Type 1 Error occurs when we reject a null hypothesis that is actually true.

2) We conclude that there is a relationship or effect when, in reality, there is no such effect or relationship present.

3) It is also known as a false positive.

4) The probability of committing a Type 1 error is denoted by α (alpha).

5) By choosing a smaller significance level we decrease the chance of making a Type 1 error.

Type 2 Error (False Negative):

1) A Type 2 Error occurs when we fail to reject (accept) a null hypothesis that is actually false.

2) We conclude that there is no significant effect or relationship, even though in reality there is a significant effect or relationship.

3) This error is known as a false negative.

4) The probability of committing a Type 2 error is denoted by β (beta).

5) Increasing the sample size can reduce the risk of a Type 2 error.


Q. Explain one tailed and two tailed tests.

- One Tailed Test:

1) It is used to test a directional hypothesis, i.e. when the alternative hypothesis specifies the direction of the effect.

2) The critical region is only on one side of the distribution.

3) It can be an upper tailed test or a lower tailed test.

4) For an upper tailed test, the alternative hypothesis states that the parameter is greater than a certain value.

5) For a lower tailed test, the alternative hypothesis states that the parameter is smaller than a certain value.


Two Tailed Test:

1) It is used to test a non-directional hypothesis, i.e. when the alternative hypothesis allows a difference in either direction.

2) It allows the parameter to be significantly different from the hypothesized value, either greater or smaller.

3) It does not specify any direction of effect, so the critical region is split between both tails of the distribution (α/2 in each tail).




Q. Write a note on Null Hypothesis.

1) Null hypothesis (H₀) is a basic idea that says there's no difference or effect between things being studied.
2) It assumes that any differences or effects observed are due to chance or random variation.
3) The null hypothesis is compared to an alternative hypothesis (H₁), which suggests there is a specific difference or effect.
4) Hypothesis testing is done to gather evidence to support or reject the null hypothesis based on data.
5) Statistical tests are used to see how likely the data would occur if the null hypothesis were true.
6) Examples of null hypotheses could be: "There's no difference in outcomes between two treatments" or "The coin is fair and not biased."
7) If the evidence strongly suggests that the data is unlikely to happen by chance under the null hypothesis, the null hypothesis is rejected.
8) If there's not enough evidence to reject it, we fail to reject the null hypothesis (often loosely described as "accepting" it, but it doesn't mean it's proven true).
9) The null hypothesis helps scientists stay objective and avoid bias in their research.
10) It's an essential part of the scientific method and allows researchers to draw conclusions based on evidence gathered.
Q. What is the Central Limit Theorem?
- The Central Limit Theorem (CLT) is a fundamental concept in statistics that states that, under certain conditions, the distribution of the sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution.
1) Sample means tend to follow a normal distribution: When you take multiple random samples from a population and calculate the mean of each sample, the distribution of those sample means will approximate a bell-shaped, normal distribution.
2) Sample size matters: The Central Limit Theorem holds true as long as the sample size is sufficiently large. As the sample size increases, the sample means more closely approximate a normal distribution.
3) Independent and identically distributed samples: The samples should be drawn independently and have the same distribution as the population.
4) Application to real-world scenarios: The Central Limit Theorem is widely used in inferential statistics. It allows us to make assumptions and draw conclusions about population parameters (e.g., population mean) based on sample statistics (e.g., sample mean).
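
A small simulation can make the theorem concrete. The sketch below is illustrative only: the exponential population (mean 1, standard deviation 1), the sample sizes 2 and 30, the 2000 repetitions and the function name `sample_means` are all assumptions chosen for the demonstration. The sample means stay centred on the population mean while their spread shrinks roughly as 1/√n.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(sample_size, n_samples=2000):
    """Means of repeated samples drawn from a skewed (exponential) population."""
    return [
        statistics.mean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

means_small = sample_means(2)    # small samples: distribution of means is still skewed
means_large = sample_means(30)   # larger samples: means are close to normal

for label, means in (("n = 2", means_small), ("n = 30", means_large)):
    print(label,
          "mean of means:", round(statistics.mean(means), 3),
          "SD of means:", round(statistics.stdev(means), 3))
```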

Q. What is the sampling method? What is the census method? Give four advantages of the sampling method.
- Sampling Method:
1) Sampling method is a way to select a smaller group of people or items from a larger population for study.
2) It helps researchers make conclusions about the whole population based on the characteristics of the selected sample.
3) Random selection is often used to ensure fairness in choosing the sample.
4) It is cost-effective, saves time, and reduces the burden on participants.

Census Method:
1) Census method means collecting information from every single person or item in the entire population.
2) It aims to get data from everyone to have a complete picture of the population.
3) It provides highly accurate information about the population's characteristics.
4) It is useful for analyzing specific subgroups and tracking changes over time.

Advantages of Sampling Method over Census Method:
1) Cost-effective: Sampling method is cheaper because data is collected from a smaller group instead of the entire population.
2) Time-efficient: Sampling method saves time as researchers only need to collect data from a portion of the population.
3) Feasibility: Sampling is more practical for large or scattered populations, while a census may be difficult or impossible to conduct.
4) Reduced burden: Sampling reduces the burden on participants since only a fraction of the population needs to provide data, leading to higher response rates and better accuracy.

Q. A complaint was registered stating that boys in the municipal school were underfed. The average weight of boys of age 10 is 32 kg with a standard deviation of 9 kg. A sample of 25 boys was selected from the municipal school and their average weight was found to be 29.5 kg. At α = 0.05, check whether this complaint is true or not by applying the Z test.
-
Z test: We use the Z test when we want to compare a sample mean (the average of the 25 sampled boys) with a population mean (the average of all boys) and we know the standard deviation of the population (9 kg).

Step 1: Hypothesis
Null hypothesis (H₀): The average weight of boys in the municipal school is not significantly different from 32 kg.
Alternative hypothesis (H₁): The average weight of boys in the municipal school is significantly less than 32 kg.

Step 2: Significance level: α = 0.05

Step 3: Calculate the test statistic (Z-score):
Z = (x̄ - μ) / (σ / √n)
Z = (Sample Mean - Population Mean) / (Population Standard Deviation / √Sample Size)
Z = (29.5 - 32) / (9 / √25)
Z = -2.5 / (9 / 5)
Z = -2.5 / 1.8
Z ≈ -1.39

Step 4: Determine the critical value: Because H₁ is one-sided (lower tail), the critical value from the Z-table at α = 0.05 is approximately -1.645.

Step 5: Compare the test statistic with the critical value:
If Calculated Z-score < Critical Value: Reject the null hypothesis
If Calculated Z-score >= Critical Value: Fail to reject (accept) the null hypothesis
Since the calculated Z-score (-1.39) is greater than the critical value (-1.645), we fail to reject the null hypothesis.

Step 6: Interpret the results: Based on the Z test, there is not enough evidence to support the complaint that boys in the municipal school are underfed. The average weight of the sample (29.5 kg) is not significantly lower than the population mean of 32 kg.
Therefore, the conclusion is that the complaint about boys being underfed in the municipal school is not supported by the data.
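
The same test can be sketched with Python's standard library, which avoids the table lookup. The function name `one_sample_z` is illustrative; `statistics.NormalDist` (Python 3.8+) supplies the standard normal distribution, and the test is taken to be lower-tailed as in the hypotheses above.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Z statistic for a sample mean when the population SD is known."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

alpha = 0.05
z = one_sample_z(29.5, 32, 9, 25)
critical = NormalDist().inv_cdf(alpha)  # lower-tail critical value
p_value = NormalDist().cdf(z)           # lower-tail p-value

print(f"z = {z:.2f}, critical = {critical:.3f}")  # z = -1.39, critical = -1.645
print("reject H0" if z < critical else "fail to reject H0")  # fail to reject H0
```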

Q. In a mutation breeding experiment, the effect of gamma radiation on the weight of 10 seeds was determined. Mean weight in grams per plant of a bean variety is given. Analyze the data using the t-test.

Control: 2.9, 3.1, 3.5, 3.4, 3.0, 4.0, 3.7, 3.0, 4.0, 4.0.

Test: 2.7, 2.8, 3.0, 3.5, 3.7, 3.2, 3.0, 3.0, 2.9, 2.8.


- t-test: We use the t-test when we want to compare the means of two groups and determine if there is any significant difference between them.


Step 1: State the null hypothesis (H0) and alternative hypothesis (H1):


Null hypothesis (H0): There is no significant difference between the mean weights of the control and test groups.

Alternative hypothesis (H1): There is a significant difference between the mean weights of the control and test groups.


Step 2: Calculate the means of the control and test groups:

Control group mean (X1) = (2.9 + 3.1 + 3.5 + 3.4 + 3.0 + 4.0 + 3.7 + 3.0 + 4.0 + 4.0) / 10 = 3.46

Test group mean (X2) = (2.7 + 2.8 + 3.0 + 3.5 + 3.7 + 3.2 + 3.0 + 3.0 + 2.9 + 2.8) / 10 = 3.06


Step 3 : Calculate the standard deviation (s) for each group:


S = √[(Σ(X - X̄)^2) / (n - 1)]


Where:


Σ represents the sum of

X represents each individual value in the data set

X̄ represents the mean of the data set

n represents the number of data points in the data set


For the control group (X̄1 = 3.46):

Deviation from mean (X1 - X̄1):

(2.9 - 3.46), (3.1 - 3.46), (3.5 - 3.46), (3.4 - 3.46), (3.0 - 3.46), (4.0 - 3.46), (3.7 - 3.46), (3.0 - 3.46), (4.0 - 3.46), (4.0 - 3.46)

= -0.56, -0.36, 0.04, -0.06, -0.46, 0.54, 0.24, -0.46, 0.54, 0.54


s1 = √[(Σ(X1 - X̄1)^2) / (n1 - 1)]

Substituting the values:

s1 = √[(0.3136 + 0.1296 + 0.0016 + 0.0036 + 0.2116 + 0.2916 + 0.0576 + 0.2116 + 0.2916 + 0.2916) / (10 - 1)]

= √(1.804 / 9)

= √0.2004

= 0.4477


For the test group (X̄2 = 3.06):

Deviation from mean (X2 - X̄2):

(2.7 - 3.06), (2.8 - 3.06), (3.0 - 3.06), (3.5 - 3.06), (3.7 - 3.06), (3.2 - 3.06), (3.0 - 3.06), (3.0 - 3.06), (2.9 - 3.06), (2.8 - 3.06)

= -0.36, -0.26, -0.06, 0.44, 0.64, 0.14, -0.06, -0.06, -0.16, -0.26


s2 = √[(Σ(X2 - X̄2)^2) / (n2 - 1)]

Substituting the values:

s2 = √[(0.1296 + 0.0676 + 0.0036 + 0.1936 + 0.4096 + 0.0196 + 0.0036 + 0.0036 + 0.0256 + 0.0676) / (10 - 1)]

= √(0.924 / 9)

= √0.1027

= 0.3204


Step 4: Calculate the pooled standard deviation (sp):

sp = √[((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)]


Substituting the values:

sp = √[((10 - 1) * 0.4477^2 + (10 - 1) * 0.3204^2) / (10 + 10 - 2)]

= √[(9 * 0.2004 + 9 * 0.1027) / 18]

= √[(1.804 + 0.924) / 18]

= √0.1516

= 0.3893


Step 5: Calculate the t-value:

t = (X̄1 - X̄2) / (sp * √[(1/n1) + (1/n2)])

Substituting the values:

t = (3.46 - 3.06) / (0.3893 * √[(1/10) + (1/10)])

= 0.40 / (0.3893 * √(0.1 + 0.1))

= 0.40 / (0.3893 * √0.2)

= 0.40 / (0.3893 * 0.4472)

= 0.40 / 0.1741

= 2.298


Step 6: Determine the degrees of freedom (df):

df = n1 + n2 - 2

= 10 + 10 - 2

= 18


Step 7: Determine the critical t-value

At α = 0.05 (two-tailed) with df = 18, the critical t-value is approximately 2.101.


Step 8: Make a decision

If calculated t-value > critical value, we reject the null hypothesis

If calculated t-value <= critical value, we fail to reject (accept) the null hypothesis


Since the calculated t-value (2.298) is greater than the critical t-value (2.101), we reject the null hypothesis. This means that there is a significant difference between the mean weights of the control and test groups at the 0.05 level: the mean weight of the irradiated (test) group, 3.06 g, is significantly lower than that of the control group, 3.46 g.
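
Because this calculation has many intermediate steps, it is worth recomputing the statistic directly from the raw data. The sketch below uses only the standard library; the function name `pooled_t` and the list name `treated` are illustrative, and the critical value is the two-tailed table value quoted in the text rather than something computed here.

```python
import math
import statistics

control = [2.9, 3.1, 3.5, 3.4, 3.0, 4.0, 3.7, 3.0, 4.0, 4.0]
treated = [2.7, 2.8, 3.0, 3.5, 3.7, 3.2, 3.0, 3.0, 2.9, 2.8]

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    var1 = statistics.variance(a)  # sample variance, n - 1 denominator
    var2 = statistics.variance(b)
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                    # SE of the difference
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

T_CRITICAL = 2.101  # table value for alpha = 0.05 (two-tailed), df = 18

t_value, df = pooled_t(control, treated)
print(f"t = {t_value:.3f}, df = {df}")
print("reject H0" if abs(t_value) > T_CRITICAL else "fail to reject H0")
```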

Chapter 3: Inferential Statistics - II

Q. From the data given below find out whether the means of three samples differ significantly or not by ANOVA.

Sample One | Sample Two | Sample Three
20         | 19         | 13
10         | 13         | 12
17         | 17         | 10
17         | 12         | 15
16         | 09         | 05


- ANOVA (Analysis of Variance) is used to find whether there are significant differences between the means of three or more groups.

Here we find if there is a significant difference between the means of the three samples.


Step 1: Hypothesis

Null hypothesis (H0): The means of the three samples are equal.

Alternative hypothesis (Ha): The means of the three samples are not equal (at least one pair differs significantly).


Step 2: Calculate the group means and the overall mean:

Let's calculate the means for each sample and the overall mean.


Sample One: Mean = (20 + 10 + 17 + 17 + 16) / 5 = 16

Sample Two: Mean = (19 + 13 + 17 + 12 + 9) / 5 = 14

Sample Three: Mean = (13 + 12 + 10 + 15 + 5) / 5 = 11

Overall Mean = (16 + 14 + 11) / 3 = 13.67


Step 3: Calculate the sum of squares between groups (SSB):

SSB measures the variation between the group means.


SSB = Σ(ni * (x̄i - x̄)^2)


Where:


ni represents the number of observations in the i-th group.

x̄i represents the mean of the i-th group.

x̄ represents the overall mean (mean of all groups combined).


SSB = (5 * (16 - 13.67)^2) + (5 * (14 - 13.67)^2) + (5 * (11 - 13.67)^2)

= 27.222 + 0.556 + 35.556 (using the unrounded overall mean 205 / 15)

≈ 63.33


Step 4: Calculate the sum of squares within groups (SSW):

SSW measures the variation within each group.


SSW = Σ((xi - x̄i)^2)


Where:


xi represents an individual observation in the i-th group.

x̄i represents the mean of the i-th group.



SSW = (20 - 16)^2 + (10 - 16)^2 + (17 - 16)^2 + (17 - 16)^2 + (16 - 16)^2

+ (19 - 14)^2 + (13 - 14)^2 + (17 - 14)^2 + (12 - 14)^2 + (9 - 14)^2

+ (13 - 11)^2 + (12 - 11)^2 + (10 - 11)^2 + (15 - 11)^2 + (5 - 11)^2

= 16 + 36 + 1 + 1 + 0 + 25 + 1 + 9 + 4 + 25 + 4 + 1 + 1 + 16 + 36

= 176


Step 5: Calculate the degrees of freedom:

Degrees of freedom (df) are calculated as follows:


df_between = number of groups - 1 = 3 - 1 = 2

df_within = total number of observations - number of groups = 15 - 3 = 12


Step 6: Calculate the mean squares:

MSB = SSB / df_between

= 63.33 / 2

= 31.67


MSW = SSW / df_within

= 176 / 12

= 14.67


Step 7: Calculate the F-statistic:

The F-statistic is the ratio of the mean squares.


F = MSB / MSW = 31.67 / 14.67 = 2.159


Step 8: Determine the critical value and compare it with the calculated F-statistic:


Let's assume a significance level of 0.05. Looking up the critical value for df_between = 2 and df_within = 12 in the F-distribution table, we find a critical value of 3.885.


Step 9: Make a decision:


If calculated F-statistic > critical value, we reject the null hypothesis

If calculated F-statistic <= critical value, we fail to reject (accept) the null hypothesis


Here F-statistic = 2.159 < 3.885

We fail to reject the null hypothesis.

We conclude that there is insufficient evidence to suggest significant differences between the means of the three samples.
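
The sums of squares are easy to get wrong by hand, so a short recomputation from the raw data is a useful check. This is a minimal sketch of a one-way ANOVA; the function name `one_way_anova` is illustrative, and the critical value is the table value quoted in the text.

```python
samples = [
    [20, 10, 17, 17, 16],  # Sample One
    [19, 13, 17, 12, 9],   # Sample Two
    [13, 12, 10, 15, 5],   # Sample Three
]

def one_way_anova(groups):
    """Return (SSB, SSW, F) for a one-way analysis of variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ssb / df_between) / (ssw / df_within)
    return ssb, ssw, f_stat

F_CRITICAL = 3.885  # table value for alpha = 0.05, df = (2, 12)

ssb, ssw, f_stat = one_way_anova(samples)
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {f_stat:.3f}")
print("reject H0" if f_stat > F_CRITICAL else "fail to reject H0")
```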

Q. Nephropathy was observed in 100 patients of four classes of diabetes as per severity of the disease.

Class                    I      II     III     IV

Number of patients       8      15     14      7

Is this difference due to chance? Test by chi square test.

-


Step 1: State the null hypothesis (H0) and alternative hypothesis (H1):

H0: Nephropathy is equally frequent in the four classes of diabetes; the observed differences are due to chance.

H1: Nephropathy is not equally frequent in the four classes of diabetes; the observed differences are not due to chance.


Step 2: Set up the observed and expected frequencies table

The total number of nephropathy cases is 8 + 15 + 14 + 7 = 44. Assuming the four classes contain equal numbers of patients, under H0 each class is expected to have 44 / 4 = 11 cases.

Class | Observed Frequency | Expected Frequency
I     | 8                  | 11
II    | 15                 | 11
III   | 14                 | 11
IV    | 7                  | 11


Step 3: Calculate the chi-square statistic (χ^2):

Using the formula:

χ^2 = Σ((Observed frequency - Expected frequency)^2 / Expected frequency)

where Σ represents the sum of,

and the calculation is performed for each class.


Class | Obs Freq | Exp Freq | (O - E)^2 / E
I     | 8        | 11       | (8 - 11)^2 / 11 = 0.8182
II    | 15       | 11       | (15 - 11)^2 / 11 = 1.4545
III   | 14       | 11       | (14 - 11)^2 / 11 = 0.8182
IV    | 7        | 11       | (7 - 11)^2 / 11 = 1.4545

Chi square value

Sum of (O - E)^2 / E = 0.8182 + 1.4545 + 0.8182 + 1.4545 = 4.5455


Step 4: Determine the degrees of freedom (df):

df = (Number of categories - 1)

In this case, df = 4 - 1 = 3.


Step 5: Find critical value

With a significance level (α) of 0.05 and 3 degrees of freedom (df), the critical chi-square value from the chi-square distribution table is approximately 7.815.


Step 6: Result

Here, Calculated value < Critical Chi Square Value, so we fail to reject (accept) the null hypothesis.


Comparing the calculated chi-square value (4.5455) with the critical chi-square value (7.815), we can conclude that the observed difference in nephropathy among the four classes of diabetes is not statistically significant at the 0.05 significance level and can be attributed to chance.


Q. In a pea breeding experiment, the following frequencies of seeds were obtained in the F2 generation. With the help of the chi square test, determine whether the obtained ratio matches Mendel's dihybrid ratio. [7]

Round yellow : 315

Wrinkled yellow : 101

Round green : 108

Wrinkled green : 32


Answer: 

The chi-square test is used to determine whether the observed frequencies in a categorical data set differ significantly from the expected frequencies.


Step 1: State the null hypothesis (H0) and alternative hypothesis (H1):

H0: The observed frequencies match the expected frequencies based on Mendel's dihybrid ratio.

H1: The observed frequencies do not match the expected frequencies based on Mendel's dihybrid ratio.


Step 2: Calculate the expected frequencies:

Based on Mendel's dihybrid ratio, the expected frequencies can be calculated. The expected ratio of the four categories is 9:3:3:1, and the total number of seeds is 315 + 101 + 108 + 32 = 556.


Category        | Observed Frequency | Expected Frequency
Round yellow    | 315                | (9/16) * 556 = 312.75
Wrinkled yellow | 101                | (3/16) * 556 = 104.25
Round green     | 108                | (3/16) * 556 = 104.25
Wrinkled green  | 32                 | (1/16) * 556 = 34.75


Step 3: Calculate the chi-square statistic.


Category        | Obs Freq | Exp Freq | (O - E)^2 / E
Round yellow    | 315      | 312.75   | (315 - 312.75)^2 / 312.75 = 0.0162
Wrinkled yellow | 101      | 104.25   | (101 - 104.25)^2 / 104.25 = 0.1013
Round green     | 108      | 104.25   | (108 - 104.25)^2 / 104.25 = 0.1349
Wrinkled green  | 32       | 34.75    | (32 - 34.75)^2 / 34.75 = 0.2176


Sum of (O - E)^2 / E = 0.0162 + 0.1013 + 0.1349 + 0.2176 = 0.4700


The calculated chi-square statistic (χ^2) is 0.4700.


Step 4: Find the degrees of freedom

Degrees of freedom: df = 4 - 1 = 3


Step 5: Compare the calculated chi-square value with the critical chi-square value


The critical chi-square value for a significance level of 0.05 and 3 degrees of freedom is approximately 7.815.


Since the calculated chi-square value (0.4700) is smaller than the critical chi-square value (7.815), we fail to reject (accept) the null hypothesis.


The obtained ratio of the pea breeding experiment does not significantly differ from Mendel's dihybrid ratio, based on the chi-square test at a significance level of 0.05.
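
The goodness-of-fit calculation follows the same pattern for any expected ratio, so it can be sketched as one small helper. The function name `chi_square_gof` is illustrative, and the critical value is the table value quoted in the text; the example call uses the pea data and the 9:3:3:1 ratio from this question.

```python
def chi_square_gof(observed, ratio):
    """Chi-square goodness-of-fit statistic against an expected ratio.

    Returns (chi2, expected), where expected splits the observed total
    according to the given ratio.
    """
    total = sum(observed)
    expected = [total * part / sum(ratio) for part in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected

CHI2_CRITICAL = 7.815  # table value for alpha = 0.05, df = 3

chi2, expected = chi_square_gof([315, 101, 108, 32], [9, 3, 3, 1])
print("expected:", expected)
print(f"chi-square = {chi2:.4f}")
print("reject H0" if chi2 > CHI2_CRITICAL else "fail to reject H0")
```

The same helper applies to the other ratio questions in these notes by changing the observed counts and the ratio, with df equal to the number of categories minus one.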


Q. In the F2 generation Mendel obtained 621 tall and 187 dwarf plants. Suggest, by applying the chi square test, whether this ratio is in accordance with Mendel's monohybrid ratio or deviates from it.


-


Step 1: State the null hypothesis (H0) and alternative hypothesis (H1):

H0: The observed frequencies match the expected frequencies based on Mendel's monohybrid ratio.

H1: The observed frequencies do not match the expected frequencies based on Mendel's monohybrid ratio.


Step 2: Set up the observed and expected frequency tables:

Mendel's monohybrid ratio is 3:1 (tall : dwarf) and the total number of plants is 621 + 187 = 808.


Category | Observed Frequency | Expected Frequency
Tall     | 621                | (3/4) * 808 = 606
Dwarf    | 187                | (1/4) * 808 = 202


Now, we can proceed to calculate the chi-square statistic.


Category | Obs Freq | Exp Freq | (O - E)^2 / E
Tall     | 621      | 606      | (621 - 606)^2 / 606 = 0.3713
Dwarf    | 187      | 202      | (187 - 202)^2 / 202 = 1.1139


Sum of (O - E)^2 / E = 0.3713 + 1.1139 = 1.4852


Step 3: Degrees of Freedom


Since there are 2 categories (Tall and Dwarf), the degrees of freedom (df) is 2 - 1 = 1.


Step 4: Find critical value


With a significance level (α) of 0.05 and 1 degree of freedom (df), the critical chi-square value from the chi-square distribution table is approximately 3.841.


Step 5: Result


Comparing the calculated chi-square value (1.4852) with the critical chi-square value (3.841), the calculated value is smaller, so we fail to reject (accept) the null hypothesis: the observed ratio of tall to dwarf plants does not deviate significantly from Mendel's monohybrid ratio at the 0.05 significance level.


Based on the chi-square test, we can suggest that the observed ratio of tall to dwarf plants in the F2 generation is in accordance with Mendel's monohybrid ratio of 3:1.


Chapter 4: Probability and Probability Distribution

Q. On average, five cars arrive at a toll booth every minute. Assuming a Poisson distribution, what is the probability that exactly 0, 1, 2, 3 and 4 cars arrive in one minute?

- Poisson Distribution: We use the Poisson distribution to find the probability of a given number of events occurring in a fixed interval of time or space when the average rate of occurrence is known.

1. Find the average rate of occurrence: λ = 5 cars/min

2. Use the Poisson formula:

P(x; λ) = (e^(-λ) * λ^x) / x!

P(x; λ) represents the probability of x events occurring given the average rate of occurrence λ.
e = base of the natural logarithm (approximately 2.71828).
λ = average rate of occurrence.
x = number of events for which the probability is calculated.

3. Calculate the probabilities: Use the Poisson formula for each number of cars (0, 1, 2, 3, 4), with e^(-5) ≈ 0.006738:

P(0; 5) = (e^(-5) * 5^0) / 0! = (0.006738 * 1) / 1 ≈ 0.0067
P(1; 5) = (e^(-5) * 5^1) / 1! = (0.006738 * 5) / 1 ≈ 0.0337
P(2; 5) = (e^(-5) * 5^2) / 2! = (0.006738 * 25) / 2 ≈ 0.0842
P(3; 5) = (e^(-5) * 5^3) / 3! = (0.006738 * 125) / 6 ≈ 0.1404
P(4; 5) = (e^(-5) * 5^4) / 4! = (0.006738 * 625) / 24 ≈ 0.1755

4. Interpret the results: The calculated values are the probabilities of observing exactly 0, 1, 2, 3 and 4 cars arriving at the toll booth in one minute.
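
The Poisson formula translates directly into code, which avoids rounding e^(-5) by hand. This is a minimal sketch; the function name `poisson_pmf` is illustrative.

```python
import math

def poisson_pmf(x, lam):
    """P(x; lam) = e^(-lam) * lam^x / x! for a Poisson distribution."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Probabilities of exactly 0..4 cars in one minute when lam = 5 cars/min
probs = {x: poisson_pmf(x, 5) for x in range(5)}
for x, p in probs.items():
    print(f"P({x}; 5) = {p:.4f}")
```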


Q. Random testing of ABO blood group in the offspring of only AB couples in a European population obtained the following distribution of blood groups.

A-312, AB-575 & B-313

Test whether the data is consistent with the normal segregation of alleles in the population (i.e. 1:2:1 ratio)

-

-let's set up the null hypothesis (H0) and alternative hypothesis (Ha):

Step 1: 

H0: The observed frequencies are consistent with the expected frequencies under normal segregation of alleles.

Ha: The observed frequencies are not consistent with the expected frequencies under normal segregation of alleles.


Step 2: Expected Frequencies

Total number of offspring: N = 312 + 575 + 313 = 1200

Under a 1:2:1 ratio, the expected frequencies are:

E(A) = 1200 * 1/4 = 300

E(AB) = 1200 * 2/4 = 600

E(B) = 1200 * 1/4 = 300


Step 3: Chi-square statistic

We can calculate the chi-square statistic using the following formula:


χ^2 = Σ [(O(i) - E(i))^2 / E(i)]


Applying this formula to our data:


χ^2 = [(312 - 300)^2 / 300] + [(575 - 600)^2 / 600] + [(313 - 300)^2 / 300]


Calculating this:


χ^2 = (144 / 300) + (625 / 600) + (169 / 300)


χ^2 = 0.4800 + 1.0417 + 0.5633


χ^2 ≈ 2.085


Step 4: Degrees of freedom and critical value

There are 3 categories (A, AB and B), so the degrees of freedom (df) is (3 - 1) = 2.


Assuming a significance level of 0.05, the critical chi-square value with 2 degrees of freedom is approximately 5.991.


Step 5: Result

Since 2.085 < 5.991, we fail to reject the null hypothesis (H0). This means that the observed distribution of blood groups is consistent with the expected distribution under normal segregation of alleles (1:2:1 ratio).
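The goodness-of-fit calculation for a 1:2:1 ratio can also be sketched in Python. The dictionary names `observed` and `ratio` are illustrative, and the critical value 5.991 is the table value quoted in the answer (α = 0.05, df = 2), not something the code derives.

```python
# Chi-square goodness-of-fit test against a 1:2:1 ratio (sketch).
observed = {"A": 312, "AB": 575, "B": 313}
ratio = {"A": 1, "AB": 2, "B": 1}

total = sum(observed.values())   # total number of offspring
ratio_sum = sum(ratio.values())  # 1 + 2 + 1 = 4

# Expected count for each group is its share of the total.
expected = {g: total * ratio[g] / ratio_sum for g in observed}

# Sum of (O - E)^2 / E over all groups.
chi_square = sum(
    (observed[g] - expected[g]) ** 2 / expected[g] for g in observed
)

df = len(observed) - 1
critical_value = 5.991  # chi-square table, alpha = 0.05, df = 2

reject_h0 = chi_square > critical_value

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.3f}, df = {df}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

With the observed counts 312, 575 and 313, the expected counts are 300, 600 and 300, and the statistic is about 2.085, well below the critical value.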


Q. A traffic police officer records an average of three road accidents per week. The number of accidents is distributed according to a Poisson distribution. Calculate the probability of exactly two accidents in any week.


- The Poisson distribution is often used to model the number of events occurring in a fixed interval of time or space, given the average rate of occurrence.


In this case, we know that the average number of accidents per week is three. Let's denote this average as λ (lambda).


The formula for the Poisson distribution is:


P(x; λ) = (e^(-λ) * λ^x) / x!


Where:


P(x; λ) is the probability of x events occurring given the average rate λ.

e is the mathematical constant approximately equal to 2.71828.

x is the actual number of events (in this case, two).

λ is the average rate of events (in this case, three).

x! denotes the factorial of x.


Now let's calculate the probability of exactly two accidents in any week using the Poisson distribution formula:


P(2; 3) = (e^(-3) * 3^2) / 2!


Calculating this:


P(2; 3) = (2.71828^(-3) * 3^2) / 2!


P(2; 3) = (0.049787 * 9) / 2


P(2; 3) = 0.448083 / 2 ≈ 0.22404


So, the probability of exactly two accidents in any given week is approximately 0.22404, or 22.40%.
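As a numerical check of the formula with λ = 3 and x = 2, here is a small Python sketch. The helper names `poisson_pmf` and `poisson_at_least` are hypothetical; the second helper is an extra convenience for "at least k" questions and uses the complement rule, 1 minus the sum of the probabilities below k.

```python
import math


def poisson_pmf(x, lam):
    """Probability of exactly x events when the average rate is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_at_least(k, lam):
    """Probability of k or more events: 1 - [P(0) + ... + P(k - 1)]."""
    return 1.0 - sum(poisson_pmf(x, lam) for x in range(k))


# Exactly two accidents in a week with an average of three per week.
p_two = poisson_pmf(2, 3)
print(f"P(2; 3) = {p_two:.5f}")

# Illustration of the tail helper: at least one accident in a week.
p_at_least_one = poisson_at_least(1, 3)
print(f"P(at least 1; 3) = {p_at_least_one:.5f}")
```

The first value evaluates to about 0.22404, since e^(-3) * 9 / 2 ≈ 0.049787 * 4.5.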

Q. In a town, 10 accidents take place in 50 days. Assuming a Poisson distribution, find the probability of at least 3 accidents in a day.

- Poisson distribution: We use the Poisson distribution to model the number of events occurring in a fixed interval of time or space when the average rate of occurrence is known.

1. Find the average rate of occurrence: 10 accidents in 50 days, therefore λ = 10 / 50 = 0.2 accidents per day.

2. Use the Poisson formula:

P(x; λ) = (e^(-λ) * λ^x) / x!

P(x; λ) is the probability of x events occurring given the average rate of occurrence λ.

e is the mathematical constant (approximately 2.71828).

λ is the average rate of occurrence.

x is the number of events whose probability we want to calculate.

3. Set up the probability: "At least 3 accidents" means 3 or more. A Poisson count has no upper limit, so instead of adding P(3; 0.2) + P(4; 0.2) + P(5; 0.2) + ... it is easier to use the complement:

P(at least 3 accidents) = 1 - [P(0; 0.2) + P(1; 0.2) + P(2; 0.2)]

4. Perform the calculations, using e^(-0.2) ≈ 0.818731:

P(0; 0.2) = (0.818731 * 1) / 1 ≈ 0.818731

P(1; 0.2) = (0.818731 * 0.2) / 1 ≈ 0.163746

P(2; 0.2) = (0.818731 * 0.04) / 2 ≈ 0.016375

P(at least 3 accidents) ≈ 1 - (0.818731 + 0.163746 + 0.016375) = 1 - 0.998852 = 0.001148

As a check, the first terms of the direct sum give the same value: P(3; 0.2) ≈ 0.0010916, P(4; 0.2) ≈ 0.0000546 and P(5; 0.2) ≈ 0.0000022, which add up to about 0.001148.

5. Interpret the result: With an average of 0.2 accidents per day, the probability of at least 3 accidents in a day is approximately 0.001148, or about 0.11%.

Q. If a chairman is to be selected from five persons with their profiles as follows:

+--------+-----+
| Sex    | Age |
+--------+-----+
| Male   |  40 |
| Male   |  43 |
| Female |  38 |
| Female |  27 |
| Male   |  65 |
+--------+-----+

What is the probability that it would be a female or a person over 30 years?

- 1. Condition: The chairman should either be a female or a person over 30 years old.

2. Count the candidates: There are five candidates in total.

3. Calculate the probability of selecting a female candidate: There are two female candidates, so the probability of selecting a female candidate is 2 out of 5 or 2/5, which is 0.4.

4. Calculate the probability of selecting a person over 30 years: There are four candidates over 30 years old (three males aged 40, 43 and 65, and one female aged 38). So, the probability of selecting a person over 30 years is 4 out of 5 or 4/5, which is 0.8.

5. Adjust for double counting: One candidate (the female aged 38) satisfies both conditions, so she has been counted twice. The probability of selecting a candidate who is both female and over 30 years old is 1 out of 5 or 1/5, which is 0.2.

6. Add the probabilities of selecting a female candidate and selecting a person over 30 years, and then subtract the probability of both conditions holding together.

P(female or over 30 years) = P(female) + P(over 30 years) - P(female and over 30 years) = 0.4 + 0.8 - 0.2 = 1.0

7. Interpret the result: The probability that the selected chairman would be either female or a person over 30 years is 1, or 100%. Every one of the five candidates is female, over 30, or both, so the event is certain.
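The addition rule used for the chairman question can be verified by listing the five candidates and counting directly. This is a sketch only: the list `candidates` copies the profile table from the question, and the helper `probability` is a hypothetical name.

```python
from fractions import Fraction

# (sex, age) for each of the five candidates in the question.
candidates = [
    ("Male", 40),
    ("Male", 43),
    ("Female", 38),
    ("Female", 27),
    ("Male", 65),
]


def probability(event):
    """Fraction of candidates for which event(candidate) is true."""
    favourable = sum(1 for c in candidates if event(c))
    return Fraction(favourable, len(candidates))


def is_female(candidate):
    return candidate[0] == "Female"


def is_over_30(candidate):
    return candidate[1] > 30


p_female = probability(is_female)
p_over_30 = probability(is_over_30)
p_both = probability(lambda c: is_female(c) and is_over_30(c))

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_either = p_female + p_over_30 - p_both

# Direct count of candidates satisfying at least one condition.
p_direct = probability(lambda c: is_female(c) or is_over_30(c))

print(p_female, p_over_30, p_both, p_either, p_direct)
```

Both routes give the same answer, which is a useful check that the overlap has been subtracted exactly once.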

Q. From a pack of 52 cards, one card is drawn at random. What is the probability that it is a king or queen of hearts?

- There are 52 cards, so 52 possible outcomes.

- There are 2 favourable outcomes, 1 king of hearts and 1 queen of hearts.

- Calculate probability

Probability= Number of favourable outcomes/ Number of Possible Outcomes

Probability= 2/52

Probability= 1/26

Therefore, the probability of drawing a king or queen of hearts from a deck of 52 cards is 1/26.


Q. What is the probability of getting either ace or spade from a pack of 52 cards?

- There are 52 cards, so 52 possible outcomes.

- There are 4 aces in one deck (spades, hearts, diamonds, clubs). There are 13 spades in one deck (ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, queen, king).

- Do not double count the ace of spades: it is both an ace and a spade.

- Total favourable outcomes = 4 aces + 12 remaining spades = 16

- Calculate probability.

Probability= Number of favourable outcomes/ Number of Possible Outcomes

Probability= 16/52

Probability= 4/13


Therefore, the probability of drawing either an ace or a spade from a deck of 52 cards is 4/13.
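The counting argument above can be checked by building the 52-card deck and counting favourable cards. This is a sketch assuming a standard deck of 13 ranks in 4 suits; the names `RANKS`, `SUITS` and `probability` are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]

# Every (rank, suit) pair: 13 * 4 = 52 cards.
deck = list(product(RANKS, SUITS))


def probability(event):
    """Fraction of cards in the deck for which event(card) is true."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))


# Ace or spade: the ace of spades is counted only once.
p_ace_or_spade = probability(
    lambda card: card[0] == "ace" or card[1] == "spades"
)

# King of hearts or queen of hearts, as in the earlier question.
p_king_or_queen_of_hearts = probability(
    lambda card: card[1] == "hearts" and card[0] in ("king", "queen")
)

print(p_ace_or_spade)             # 4/13
print(p_king_or_queen_of_hearts)  # 1/26
```

Counting over the whole deck avoids double counting automatically, because each card is tested once.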

Q. The probability that evening college students will graduate is 0.6. Find the probability that out of 5 students i) None graduate ii) One graduate iii) At least one graduate

- Since there is a fixed number of trials, each with 2 possible outcomes (success or failure), we will use the binomial distribution formula.

- It gives the probability of a specific number of successes (k) out of a given number of trials (n) with a known probability of success (p).

- The formula is P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)

1) None Graduate

- -  n= 5, k= 0, p=0.6 

C(n, k) = n! / (k! * (n - k)!)

P(X = 0) = C(5, 0) * 0.6^0 * (1 - 0.6)^(5 - 0) = 1 * 1 * 0.4^5 = 0.4^5 = 0.01024

2) One Graduate

- n = 5, k = 1, p = 0.6

P(X = 1) = C(5, 1) * 0.6^1 * (1 - 0.6)^(5 - 1) = 5 * 0.6 * 0.4^4 = 5 * 0.6 * 0.0256 = 0.0768

3) At least one graduate

- n = 5

To find the probability of at least one graduate, use the complement of "none graduate":

P(at least one graduate) = 1 - P(none graduate)

=1-0.01024

=0.98976
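The three binomial results can be reproduced with a short Python sketch. It uses `math.comb` from the standard library for C(n, k); the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n = 5    # number of students
p = 0.6  # probability that a student graduates

p_none = binomial_pmf(0, n, p)
p_one = binomial_pmf(1, n, p)
p_at_least_one = 1 - p_none  # complement of "none graduate"

print(f"P(none)         = {p_none:.5f}")
print(f"P(exactly one)  = {p_one:.5f}")
print(f"P(at least one) = {p_at_least_one:.5f}")
```

The values come out as about 0.01024, 0.07680 and 0.98976 respectively.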


Q. What are parametric and non-parametric tests?

-Parametric tests:


-Assume that data follows a specific pattern, like a bell-shaped curve.

-They make assumptions about how the data is spread out and located.

-Examples include tests like t-tests, ANOVA, and Pearson correlation.

-They work better when data follows these assumptions.

-If the assumptions are not met, results may be wrong.


Non-parametric tests:


-Don't make assumptions about how the data is shaped or spread out.

-They are more flexible and can be used with different types of data.

-Examples include tests like the Mann-Whitney U test, the Kruskal-Wallis test, and rank-based correlation such as Spearman's rank correlation.

-They work well even when data doesn't follow a specific pattern.

-They are useful when dealing with small sample sizes or data that doesn't fit the assumptions of parametric tests.

-In simple terms, parametric tests assume specific patterns in the data and work well when those patterns are met. Non-parametric tests don't assume any specific pattern and can be used with different types of data or when the assumptions of parametric tests aren't met.

Q. When 10 coins are tossed, find the probability of exactly six heads.

- Since there is a fixed number of trials, each with 2 possible outcomes (head or tail), we will use the binomial distribution formula.

- It has specific number of successes(k) , out of given number of trials (n) with a known probability of success (p).

- The formula is P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)

- n = 10, k = 6, p = 0.5 (probability of getting a head on a single toss)

- P(X = 6) = C(10, 6) * (0.5)^6 * (1 - 0.5)^(10 - 6)

P(X = 6) = 210 * (0.5)^6 * (0.5)^4 = 210 * 0.015625 * 0.0625 = 0.205078

Therefore, the probability of exactly six heads when 10 coins are tossed is approximately 0.2051 or 20.51%.
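Because 10 fair coins have only 2^10 = 1024 equally likely outcomes, the binomial result can be cross-checked by listing every outcome and counting those with six heads. This is a brute-force sketch for verification, not the method used in the answer; the variable names are illustrative.

```python
from itertools import product
from math import comb

n_coins = 10
n_heads = 6

# All sequences of heads (H) and tails (T) for 10 tosses.
outcomes = list(product("HT", repeat=n_coins))

# Count the sequences that contain exactly six heads.
favourable = sum(1 for outcome in outcomes if outcome.count("H") == n_heads)

p_enumerated = favourable / len(outcomes)

# Binomial formula with p = 0.5: C(10, 6) * 0.5^10.
p_formula = comb(n_coins, n_heads) * 0.5 ** n_coins

print(favourable, len(outcomes))  # 210 1024
print(p_enumerated, p_formula)
```

The count of favourable sequences equals C(10, 6) = 210, so both routes give 210/1024.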

Q. What is the probability that a queen, king and Joker are drawn in the same order from a pack of 52 cards without replacement?

- The cards are drawn one after another without replacement, so the pack has one card fewer after each draw, and the probabilities of the three draws are multiplied.

- This answer assumes the pack of 52 cards contains 4 queens, 4 kings and 1 joker. (A standard 52-card pack contains no joker; in that case the probability is 0.)

For the first card drawn, there are 4 queens available among 52 cards: 4/52

For the second card drawn (after the queen), there are 4 kings available among the remaining 51 cards: 4/51

For the third card drawn (after the queen and king), there is 1 joker available among the remaining 50 cards: 1/50


- Calculate probability

Probability = (4/52) * (4/51) * (1/50)

Probability = 16/132600

Probability = 2/16575 ≈ 0.00012

Therefore, the probability of drawing a queen, king, and joker in the same order from a pack of 52 cards without replacement is 2/16575, or about 0.012%.
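For ordered draws without replacement, the probability is the product of one fraction per draw, with the pack shrinking by one card each time. The sketch below applies that rule with exact fractions. The function name `ordered_draw_probability` is hypothetical, and the counts 4, 4 and 1 rest on the assumption that the 52-card pack holds 4 queens, 4 kings and 1 joker.

```python
from fractions import Fraction


def ordered_draw_probability(counts, deck_size):
    """Probability of drawing the listed card types in this exact order.

    counts[i] is how many cards of the i-th required type are in the pack.
    Assumes the required types are all different, so drawing one type
    does not reduce the count of another.
    """
    probability = Fraction(1)
    remaining = deck_size
    for count in counts:
        probability *= Fraction(count, remaining)
        remaining -= 1  # one card fewer after each draw
    return probability


# Queen, then king, then joker (ASSUMED: 4 queens, 4 kings, 1 joker in 52 cards).
p_queen_king_joker = ordered_draw_probability([4, 4, 1], 52)

print(p_queen_king_joker)         # 2/16575
print(float(p_queen_king_joker))
```

Changing the last count to 0 models a standard pack with no joker and gives a probability of 0.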
