Data Preprocessing and Feature Engineering
How is the Cabin attribute handled during preprocessing in this experiment?
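The sheet does not state the experiment's choice, but because Cabin in the Titanic data is mostly missing, a common handling is to keep only a binary "cabin recorded" flag and drop the raw column. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Tiny stand-in for the Titanic data (hypothetical values).
df = pd.DataFrame({"Cabin": ["C85", None, None, "E46", None]})

# Cabin is sparse, so keep a presence flag and drop the raw column.
df["HasCabin"] = df["Cabin"].notna().astype(int)
df = df.drop(columns=["Cabin"])
```

The flag preserves the (often predictive) fact that a cabin was recorded at all, without inventing cabin values.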
Suppose a numerical attribute contains several extreme outliers. Which imputation method would generally be more appropriate than mean imputation?
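Median imputation is the usual answer here, because the median is robust to extreme values while the mean is not. A sketch with hypothetical fares containing one outlier and one missing entry:

```python
from statistics import mean, median

# Hypothetical fare column with one extreme outlier and a gap.
fares = [7.25, None, 8.05, 7.90, 512.33, 8.46]
observed = [f for f in fares if f is not None]

# The outlier inflates the mean far above typical fares,
# so the median is the safer fill value.
mean_fill = mean(observed)      # ~108.8, not representative
median_fill = median(observed)  # 8.05, close to typical fares

imputed = [median_fill if f is None else f for f in fares]
```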
Why was the mode (most frequent value) used to fill missing values in the Embarked attribute?
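Because Embarked is categorical, a mean or median is undefined; the mode is the natural fill. A minimal sketch with hypothetical port codes:

```python
from collections import Counter

# Hypothetical Embarked values with missing entries.
embarked = ["S", "C", None, "S", "Q", "S", None, "C"]
observed = [e for e in embarked if e is not None]

# Mode imputation: fill gaps with the most frequent observed category.
mode_value = Counter(observed).most_common(1)[0][0]
filled = [mode_value if e is None else e for e in embarked]
```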
Why must categorical variables such as Sex be encoded before applying machine learning algorithms?
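Most learning algorithms operate on numeric matrices, so string categories must first be mapped to numbers. A minimal sketch (the specific 0/1 mapping is an illustrative assumption):

```python
# Hypothetical Sex column; algorithms need numeric input,
# so map each category to an integer (or one-hot encode it).
sex = ["male", "female", "female", "male"]
mapping = {"male": 0, "female": 1}
sex_encoded = [mapping[s] for s in sex]
```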
Which potential issue may arise when using One-Hot Encoding for categorical variables with many unique categories?
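The usual issue is dimensionality explosion: one new column per unique category. A sketch on a hypothetical high-cardinality column (ticket-like codes):

```python
import pandas as pd

# Hypothetical high-cardinality column: almost every code is unique.
df = pd.DataFrame({"Ticket": ["A1", "B2", "C3", "D4", "A1", "E5"]})

# One-hot encoding creates one column per unique category:
# 5 columns for just 6 rows here, so a column with thousands of
# categories would blow up the feature space into a sparse matrix.
dummies = pd.get_dummies(df["Ticket"], prefix="Ticket")
```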
Why is normalization applied to numerical features during preprocessing?
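Normalization puts features on a comparable scale so that no feature dominates purely because of its units. A min-max sketch on hypothetical ages:

```python
# Min-max normalization rescales values to [0, 1].
ages = [22.0, 38.0, 26.0, 35.0, 80.0]
lo, hi = min(ages), max(ages)
ages_scaled = [(a - lo) / (hi - lo) for a in ages]
```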
Which type of machine learning algorithm is most sensitive to feature scaling?
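Distance-based algorithms such as k-nearest neighbors (and margin-based ones such as SVMs) are the classic answer. A sketch showing why, with two hypothetical passengers described by (age, fare):

```python
from math import dist

# Fare spans a far wider range than age.
a = (22.0, 7.25)
b = (24.0, 512.33)

# Without scaling, the Euclidean distance is dominated entirely by
# the fare difference; the age difference contributes almost nothing,
# which is why distance-based methods like k-NN need scaled features.
d = dist(a, b)
```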
Which visualization technique is most suitable for identifying outliers in numerical data?
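The box plot is the standard answer; its whiskers are the 1.5×IQR fences, and points beyond them are drawn as outliers. The same rule can be computed directly, as a sketch on hypothetical fares:

```python
from statistics import quantiles

# The IQR fences below are exactly what a box plot's whiskers show;
# values beyond them appear as outlier dots.
fares = [7.25, 7.90, 8.05, 8.46, 9.35, 10.50, 512.33]
q1, _, q3 = quantiles(fares, n=4)
iqr = q3 - q1
outliers = [f for f in fares if f < q1 - 1.5 * iqr or f > q3 + 1.5 * iqr]
```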
Why is data visualization performed after preprocessing in this experiment?
After completing preprocessing and feature engineering, what is the next logical step in a machine learning workflow?
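The typical next step is to split the data into training and test sets, then train and evaluate a model on them. A dependency-free sketch of the hold-out split on a hypothetical preprocessed dataset:

```python
import random

# Hypothetical preprocessed dataset: (feature_vector, label) pairs.
data = [([i / 100.0, i % 2], i % 2) for i in range(100)]

# Hold out 20% as a test set before any model training.
random.seed(42)
random.shuffle(data)
cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]
```

In practice this split (and the subsequent model fitting) is usually done with a library helper such as scikit-learn's `train_test_split`.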