Binary Classification and Decision Boundary Analysis using Logistic Regression
In the dengue classification experiment, the model achieved a training accuracy of 0.9732 and a testing accuracy of 0.9740. What does this indicate about the model?
The confusion matrix for the testing data shows some false negatives (dengue-positive patients classified as negative). In a medical diagnosis context, why is minimising false negatives critical?
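As a minimal illustration (the labels below are made up, not the experiment's data), scikit-learn's `confusion_matrix` exposes the four counts directly; the false-negative cell is the one a medical screen most needs to keep small:

```python
from sklearn.metrics import confusion_matrix

# Toy labels (1 = dengue-positive); illustrative only, not the experiment's data.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # one false negative, one false positive

# ravel() flattens the 2x2 matrix into (TN, FP, FN, TP) for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # → TN=3 FP=1 FN=1 TP=5
```

Here the single FN is a dengue-positive patient sent home untreated, which is why that cell dominates the evaluation in a diagnostic setting.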
The F1-Score is defined as F1 = 2TP / (2TP + FP + FN). When is the F1-Score a more appropriate evaluation metric than accuracy?
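A quick sketch of the formula with hypothetical confusion-matrix counts (the numbers are illustrative, not the experiment's actual values):

```python
# Hypothetical counts for a dengue test set; illustrative only.
TP, FP, FN = 90, 3, 2

precision = TP / (TP + FP)          # fraction of positive calls that are correct
recall = TP / (TP + FN)             # fraction of true positives that are found
f1 = 2 * TP / (2 * TP + FP + FN)    # harmonic mean of precision and recall

# The count form and the precision/recall form of F1 agree:
assert abs(f1 - 2 * precision * recall / (precision + recall)) < 1e-12
print(round(precision, 4), round(recall, 4), round(f1, 4))  # → 0.9677 0.9783 0.973
```

Because F1 ignores true negatives, it stays informative on imbalanced data where accuracy can look high simply by predicting the majority class.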
In the simulation, the ROC curve was plotted with AUC = 0.998. What does an AUC value close to 1.0 signify?
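A small sketch with illustrative scores of what AUC summarises: it equals the probability that a randomly chosen positive is ranked above a randomly chosen negative, so a value near 1.0 means near-perfect ranking.

```python
from sklearn.metrics import roc_auc_score

# Illustrative predicted probabilities; one negative (0.6) outranks one positive (0.4).
y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.1, 0.2, 0.6, 0.4, 0.8, 0.9]

# 8 of the 9 positive/negative pairs are ranked correctly → AUC = 8/9
auc = roc_auc_score(y_true, y_score)
print(round(auc, 3))  # → 0.889
```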
During the simulation, individual sigmoid curves were plotted for features like Platelets and Hematocrit. If the sigmoid curve for a feature shows a steep transition from 0 to 1, what can be inferred?
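The steepness intuition can be checked in a few lines of plain Python: the larger the coefficient magnitude on a feature, the sharper the sigmoid's 0-to-1 transition around the decision midpoint (the weights here are illustrative, not fitted values):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Evaluate p(y=1) at a few standardised feature values for a small and a large weight.
curves = {}
for w in (0.5, 5.0):
    curves[w] = [round(sigmoid(w * x), 3) for x in (-2, -1, 0, 1, 2)]
    print(w, curves[w])
```

With w = 0.5 the probabilities drift gently through 0.5; with w = 5.0 they snap from near 0 to near 1, i.e. the feature sharply separates the classes.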
In the experiment, precision on the test set was 0.9657 and recall was 0.9783. If the application requires minimising false positives (e.g., avoiding unnecessary treatments), which metric should be prioritised?
In the simulation, the dataset was split into 80% training and 20% testing with stratified sampling. What is the purpose of stratified splitting?
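A minimal sketch on a synthetic 90/10 imbalanced label vector, assuming scikit-learn's `train_test_split`: passing `stratify=y` preserves the positive rate in both splits, which an unstratified split does not guarantee.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# 100 illustrative samples with a 10% positive rate (imbalance like a disease dataset).
y = [1] * 10 + [0] * 90
X = [[i] for i in range(100)]

# stratify=y keeps 10% positives in both the 80% train and 20% test partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(Counter(y_tr), Counter(y_te))  # 8/72 positives in train, 2/18 in test
```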
The logistic regression model in the experiment used L2 regularisation with C = 1.0. If C is decreased to 0.01, what is the expected effect?
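A hedged sketch on synthetic data (`make_classification` is only a stand-in for the dengue features): in scikit-learn, C is the inverse regularisation strength, so decreasing C to 0.01 applies a much stronger L2 penalty and shrinks the coefficients toward zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in dataset; illustrative only.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Fit the same model at the experiment's C and at the smaller C.
norms = {}
for C in (1.0, 0.01):
    clf = LogisticRegression(C=C, penalty="l2", max_iter=1000).fit(X, y)
    norms[C] = np.linalg.norm(clf.coef_)  # overall coefficient magnitude

print(norms)  # the C=0.01 norm is the smaller of the two
```

The shrunken weights mean a simpler decision boundary: less risk of overfitting, but potential underfitting if the penalty is too strong.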
If the ROC curve is plotted for the dengue classification model and the curve passes through the point (FPR = 0.05, TPR = 0.95), what does this specific operating point represent?
In the experiment, the Platelets feature was dropped before training. If a highly correlated but less clinically relevant feature is kept instead, what issue might arise in the model coefficients?
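One way to see the issue, on synthetic data where a second column is a near-duplicate of the first: with both correlated columns present, the L2-penalised model splits the weight between them, so each coefficient understates the feature's standalone effect and becomes hard to interpret clinically.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # near-duplicate of x1 (corr ≈ 1)
y = (x1 + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Model with both correlated columns vs. model with x1 alone.
both = LogisticRegression(max_iter=1000).fit(np.column_stack([x1, x2]), y)
alone = LogisticRegression(max_iter=1000).fit(x1.reshape(-1, 1), y)
print(both.coef_, alone.coef_)  # the signal's weight is split across x1 and x2
```

Each of the two correlated coefficients ends up roughly half the size of the standalone one, even though the predictive signal is unchanged.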