Binary Classification and Decision Boundary Analysis using Logistic Regression
In the dengue classification experiment, the model achieved a training accuracy of 0.9732 and a testing accuracy of 0.9740. What does this indicate about the model?
The confusion matrix for the testing data shows some false negatives (dengue-positive patients classified as negative). In a medical diagnosis context, why is minimising false negatives critical?
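As a minimal illustration (the labels below are made up, not the experiment's data), scikit-learn's `confusion_matrix` exposes the four counts directly; the false-negative cell is the one a medical screen most needs to keep small:

```python
from sklearn.metrics import confusion_matrix

# Toy labels (1 = dengue-positive); illustrative only, not the experiment's data.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # one false negative, one false positive

# ravel() flattens the 2x2 matrix into (TN, FP, FN, TP) for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # → TN=3 FP=1 FN=1 TP=5
```

Here the single FN is a dengue-positive patient sent home untreated, which is why that cell dominates the evaluation in a diagnostic setting.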
The F1-Score is defined as F1 = 2TP / (2TP + FP + FN). When is the F1-Score a more appropriate evaluation metric than accuracy?
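A quick sketch of the formula with hypothetical confusion-matrix counts (the numbers are illustrative, not the experiment's actual values):

```python
# Hypothetical counts for a dengue test set; illustrative only.
TP, FP, FN = 90, 3, 2

precision = TP / (TP + FP)          # fraction of positive calls that are correct
recall = TP / (TP + FN)             # fraction of true positives that are found
f1 = 2 * TP / (2 * TP + FP + FN)    # harmonic mean of precision and recall

# The count form and the precision/recall form of F1 agree:
assert abs(f1 - 2 * precision * recall / (precision + recall)) < 1e-12
print(round(precision, 4), round(recall, 4), round(f1, 4))  # → 0.9677 0.9783 0.973
```

Because F1 ignores true negatives, it stays informative on imbalanced data where accuracy can look high simply by predicting the majority class.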
In the simulation, the ROC curve was plotted with AUC = 0.998. What does an AUC value close to 1.0 signify?
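A small sketch with illustrative scores of what AUC summarises: it equals the probability that a randomly chosen positive is ranked above a randomly chosen negative, so a value near 1.0 means near-perfect ranking.

```python
from sklearn.metrics import roc_auc_score

# Illustrative predicted probabilities; one negative (0.6) outranks one positive (0.4).
y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.1, 0.2, 0.6, 0.4, 0.8, 0.9]

# 8 of the 9 positive/negative pairs are ranked correctly → AUC = 8/9
auc = roc_auc_score(y_true, y_score)
print(round(auc, 3))  # → 0.889
```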
During the simulation, individual sigmoid curves were plotted for features like Platelets and Hematocrit. If the sigmoid curve for a feature shows a steep transition from 0 to 1, what can be inferred?
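The steepness intuition can be checked in a few lines of plain Python: the larger the coefficient magnitude on a feature, the sharper the sigmoid's 0-to-1 transition around the decision midpoint (the weights here are illustrative, not fitted values):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Evaluate p(y=1) at a few standardised feature values for a small and a large weight.
curves = {}
for w in (0.5, 5.0):
    curves[w] = [round(sigmoid(w * x), 3) for x in (-2, -1, 0, 1, 2)]
    print(w, curves[w])
```

With w = 0.5 the probabilities drift gently through 0.5; with w = 5.0 they snap from near 0 to near 1, i.e. the feature sharply separates the classes.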
In the experiment, precision on the test set was 0.9657 and recall was 0.9783. If the application requires minimising false positives (e.g., avoiding unnecessary treatments), which metric should be prioritised?
In the simulation, the dataset was split into 80% training and 20% testing with stratified sampling. What is the purpose of stratified splitting?
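A minimal sketch on a synthetic 90/10 imbalanced label vector, assuming scikit-learn's `train_test_split`: passing `stratify=y` preserves the positive rate in both splits, which an unstratified split does not guarantee.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# 100 illustrative samples with a 10% positive rate (imbalance like a disease dataset).
y = [1] * 10 + [0] * 90
X = [[i] for i in range(100)]

# stratify=y keeps 10% positives in both the 80% train and 20% test partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(Counter(y_tr), Counter(y_te))  # 8/72 positives in train, 2/18 in test
```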
The logistic regression model in the experiment used L2 regularisation with C = 1.0. If C is decreased to 0.01, what is the expected effect?
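A hedged sketch on synthetic data (`make_classification` is only a stand-in for the dengue features): in scikit-learn, C is the inverse regularisation strength, so decreasing C to 0.01 applies a much stronger L2 penalty and shrinks the coefficients toward zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in dataset; illustrative only.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Fit the same model at the experiment's C and at the smaller C.
norms = {}
for C in (1.0, 0.01):
    clf = LogisticRegression(C=C, penalty="l2", max_iter=1000).fit(X, y)
    norms[C] = np.linalg.norm(clf.coef_)  # overall coefficient magnitude

print(norms)  # the C=0.01 norm is the smaller of the two
```

The shrunken weights mean a simpler decision boundary: less risk of overfitting, but potential underfitting if the penalty is too strong.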
If the ROC curve is plotted for the dengue classification model and the curve passes through the point (FPR = 0.05, TPR = 0.95), what does this specific operating point represent?
In the experiment, the Platelets feature was dropped before training. If a highly correlated but less clinically relevant feature is kept instead, what issue might arise in the model coefficients?
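One way to see the issue, on synthetic data where a second column is a near-duplicate of the first: with both correlated columns present, the L2-penalised model splits the weight between them, so each coefficient understates the feature's standalone effect and becomes hard to interpret clinically.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # near-duplicate of x1 (corr ≈ 1)
y = (x1 + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Model with both correlated columns vs. model with x1 alone.
both = LogisticRegression(max_iter=1000).fit(np.column_stack([x1, x2]), y)
alone = LogisticRegression(max_iter=1000).fit(x1.reshape(-1, 1), y)
print(both.coef_, alone.coef_)  # the signal's weight is split across x1 and x2
```

Each of the two correlated coefficients ends up roughly half the size of the standalone one, even though the predictive signal is unchanged.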