A multinomial logistic regression model for text in Albanian language

Authors

  • Denisa Salillari, Polytechnic University of Tirana, Sheshi “Nene Tereza”, nr.
  • Luela Prifti, Polytechnic University of Tirana, Sheshi “Nene Tereza”,

DOI:

https://doi.org/10.24297/jam.v12i7.5486

Keywords:

Multinomial logistic regression, classification.

Abstract

In this paper we present a multinomial logistic regression model for authorship identification in Albanian-language texts. In the fitted model, the dependent variable is categorical and takes a value from 1 to 10, one for each author. The independent variables are the number of words, letters, vowels, consonants, punctuation marks, and sentences in each text. The model was applied successfully to a set of ten authors, each represented by one hundred texts they authored. The first, second, and third authors had the highest percentages of correctly predicted texts, and the highest overall proportion of correct predictions obtained was 0.738. We conclude that adding the number of consonants, the number of punctuation marks, and the number of sentences to the model as independent variables increases the overall percentage of correct predictions.
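
As an illustration of the setup described in the abstract, the following minimal Python sketch computes the six count features for one text and evaluates multinomial logistic class probabilities from them. It is not the authors' implementation. The function names, the punctuation set, the sentence-splitting rule, and the treatment of Albanian digraphs (such as "sh") as two separate letters are all assumptions made here for illustration, and any coefficients passed to `class_probabilities` are hypothetical rather than fitted values from the paper.

```python
import math
import re
import unicodedata

# Assumption: the Albanian vowels are a, e, ë, i, o, u, y (\u00eb is ë).
VOWELS = set("aeiouy\u00eb")
# Assumption: a small illustrative set of punctuation marks.
PUNCTUATION = set(".,;:!?\"'()-")


def extract_features(text):
    """Return the six count features named in the abstract for one text."""
    # Normalize so that ë is a single code point, then lowercase.
    lowered = unicodedata.normalize("NFC", text).lower()
    letters = [ch for ch in lowered if ch.isalpha()]
    vowels = sum(1 for ch in letters if ch in VOWELS)
    # A word is a run of alphabetic characters (no digits or underscores).
    words = re.findall(r"[^\W\d_]+", lowered)
    # Assumption: each run of '.', '!' or '?' closes one sentence.
    sentences = re.findall(r"[.!?]+", lowered)
    return {
        "words": len(words),
        "letters": len(letters),
        "vowels": vowels,
        "consonants": len(letters) - vowels,
        "punctuation": sum(1 for ch in lowered if ch in PUNCTUATION),
        "sentences": len(sentences),
    }


def class_probabilities(features, coefficients, intercepts):
    """Multinomial logistic (softmax) probabilities, one per author.

    `coefficients` holds one row of weights per author and `intercepts`
    one constant per author; both are hypothetical inputs here.
    """
    scores = [
        b0 + sum(b * x for b, x in zip(row, features))
        for row, b0 in zip(coefficients, intercepts)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `extract_features("Un\u00eb jam k\u00ebtu. Po ti?")` yields a dictionary of counts whose values, taken in a fixed order, can be passed as the `features` list to `class_probabilities` together with a ten-row coefficient matrix, one row per author. The predicted author is then the index with the largest probability.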


Author Biographies

Denisa Salillari, Polytechnic University of Tirana, Sheshi “Nene Tereza”, nr.

Department of Mathematical Engineering

Luela Prifti, Polytechnic University of Tirana, Sheshi “Nene Tereza”,

Department of Mathematical Engineering

Published

2016-07-18

How to Cite

Salillari, D., & Prifti, L. (2016). A multinomial logistic regression model for text in Albanian language. JOURNAL OF ADVANCES IN MATHEMATICS, 12(7), 6407–6411. https://doi.org/10.24297/jam.v12i7.5486
