Naïve Bayes Classification for Text and Categorical Data using Prior and Posterior Probabilities
What output does the classifier in the experiment predict?
Why do we perform a train-test split?
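A minimal sketch of the idea behind this question, assuming scikit-learn's `train_test_split` (the messages and labels below are illustrative, not the experiment's dataset): holding out part of the data lets us evaluate the model on messages it never saw during training.

```python
from sklearn.model_selection import train_test_split

messages = ["win cash now", "meeting at noon", "free prize", "lunch tomorrow"]
labels = ["spam", "ham", "spam", "ham"]

# Hold out 25% of the data for evaluation; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, random_state=42
)
print(len(X_train), len(X_test))  # 3 training messages, 1 held-out test message
```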
How does class prior probability influence the final prediction in the Naive Bayes classifier used in the experiment?
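A toy illustration (with made-up numbers, not the experiment's probabilities) of the effect this question asks about: Naive Bayes scores each class by prior × likelihood, so a larger class prior can flip the prediction even when the likelihood favors the other class.

```python
# Illustrative values only: P(class) and P(message | class).
prior = {"spam": 0.2, "ham": 0.8}
likelihood = {"spam": 0.05, "ham": 0.02}

# Posterior score is proportional to prior * likelihood.
scores = {c: prior[c] * likelihood[c] for c in prior}
prediction = max(scores, key=scores.get)
print(prediction)  # "ham": its score (0.016) beats spam's (0.01) despite the lower likelihood
```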
Why does Multinomial Naive Bayes use logarithmic probabilities during classification?
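A small numeric demonstration of the reason behind this question: multiplying many small word probabilities underflows to 0.0 in floating point, while summing their logarithms stays representable. The probabilities below are illustrative.

```python
import math

word_probs = [1e-4] * 100  # 100 words, each with likelihood 1e-4

# Direct product: 1e-4 ** 100 = 1e-400, below the smallest positive float.
product = 1.0
for p in word_probs:
    product *= p
print(product)  # 0.0 -- numerical underflow

# Log-space sum: 100 * log(1e-4), comfortably representable.
log_sum = sum(math.log(p) for p in word_probs)
print(log_sum)
```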
During preprocessing, text is converted to lowercase, punctuation is removed, and extra spaces are eliminated. How do these steps improve the performance of the Naive Bayes classifier?
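The three preprocessing steps named in this question can be sketched as follows (the `preprocess` function name is illustrative; the experiment's actual code may differ):

```python
import re
import string

def preprocess(text: str) -> str:
    text = text.lower()                                                # lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))   # remove punctuation
    text = re.sub(r"\s+", " ", text).strip()                           # collapse extra spaces
    return text

print(preprocess("  WIN!!!   a FREE   prize,   now. "))  # "win a free prize now"
```

Normalizing text this way means "FREE", "free," and "free" all map to the same token, so their counts are pooled into a single feature.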
The experiment uses Multinomial Naive Bayes rather than Gaussian Naive Bayes. Which explanation best justifies this choice for text classification?
In the context of the confusion matrix generated in the experiment, what does a false positive indicate?
In the spam detection experiment, which algorithm is used to classify messages as spam or ham?
Which technique is used in the experiment to convert text messages into feature vectors?
Based on the overall experiment, why is Naive Bayes considered a strong baseline model for text classification research and applications?