Evaluating GPT-OSS-20B Model for Hate Speech Detection: Advances in Parameter-Efficient Adaptation
Abstract
Hate speech detection continues to pose methodological challenges due to annotation ambiguity, class imbalance, and the fine-grained distinction between offensive and hateful expressions. This work examines a parameter-efficient adaptation of a 20-billion-parameter large language model for three-class hate speech classification. The approach consolidates annotator decisions into a single label per instance, applies balanced sampling to reduce minority-class sparsity, and incorporates instruction templates with agreement-based metadata to stabilise predictions in borderline cases. The adapted model is evaluated against transformer encoder baselines and prompted large language model (LLM) configurations. The results show that the proposed system attains a macro F1-score of 80.66% and an accuracy of 83.37%, outperforming all comparative baselines, with particularly strong gains in the Hate Speech category. An additional analysis of computational usage indicates that the adaptation procedure operates within moderate resource constraints. These findings indicate that lightweight parameter-efficient adaptation offers a viable solution for fine-grained hate speech classification when full fine-tuning of LLMs is impractical.
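
The three data-preparation steps named in the abstract (label consolidation, balanced sampling, and agreement-aware instruction templates) can be illustrated with a minimal sketch. This is not the authors' implementation: majority voting, random oversampling, the label names, the tie-breaking rule, and the prompt wording are all assumptions made for illustration, and every function name here is hypothetical.

```python
import random
from collections import Counter

# Assumed label names for the three classes; the source does not give identifiers.
LABELS = ("hate_speech", "offensive", "neither")


def consolidate(votes):
    """Collapse annotator votes into one label plus an agreement ratio.

    Assumption: majority vote, with ties broken by the order of LABELS.
    """
    counts = Counter(votes)
    top = max(counts.values())
    label = next(name for name in LABELS if counts.get(name, 0) == top)
    return label, top / len(votes)


def balance(rows, seed=0):
    """Randomly oversample every class up to the size of the largest class.

    Assumption: oversampling with replacement; the source only says
    "balanced sampling".
    """
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row["label"], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


def build_prompt(text, agreement):
    """Hypothetical instruction template carrying agreement metadata."""
    return (
        "Classify the text as hate_speech, offensive, or neither.\n"
        f"Annotator agreement: {agreement:.2f}\n"
        f"Text: {text}\n"
        "Label:"
    )
```

In this sketch, a low agreement ratio in the prompt marks an instance as borderline, which is one plausible way such metadata could inform the model about uncertain cases.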
