Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning

Naufal, Tsaqif; Mahendra, Rahmad; Wicaksono, Alfan Farizki

doi:10.1186/s13326-025-00329-2

Journal of Biomedical Semantics

Table 5 List of handcrafted features used for each task

Feature	Explanation	SR	MER	KE
token	current token	\(\checkmark\)	\(\checkmark\)	\(\checkmark\)
token_before	previous token	\(\checkmark\)	\(\checkmark\)	\(\times\)
token_after	next token	\(\checkmark\)	\(\checkmark\)	\(\times\)
is_digit	whether current token is digit	\(\checkmark\)	\(\times\)	\(\times\)
is_begin	whether current token is the beginning of input text	\(\checkmark\)	\(\times\)	\(\times\)
is_end	whether current token is the end of input text	\(\checkmark\)	\(\times\)	\(\times\)
token_before_is_closure	whether previous token is one of the following characters: . (period), ! (exclamation mark), ? (question mark), or , (comma)	\(\checkmark\)	\(\times\)	\(\times\)
token_length	length of current token	\(\times\)	\(\times\)	\(\checkmark\)
absolute_position	index of current token, starting from 0	\(\times\)	\(\times\)	\(\checkmark\)
relative_position	index of current token divided by total number of tokens in the input	\(\checkmark\)	\(\times\)	\(\times\)
site_code	code of the website from which the input text was obtained	\(\checkmark\)	\(\times\)	\(\times\)
pos_tag	part-of-speech tag for current token	\(\checkmark\)	\(\times\)	\(\checkmark\)
is_np	whether current token is part of a noun phrase	\(\times\)	\(\checkmark\)	\(\times\)
is_vp	whether current token is part of a verb phrase	\(\times\)	\(\checkmark\)	\(\times\)
is_stopword	whether current token is a stopword	\(\times\)	\(\checkmark\)	\(\times\)
is_abbreviation	whether current token is an abbreviation	\(\times\)	\(\times\)	\(\checkmark\)
abbreviation_inverse	full form of current token if it is an abbreviation	\(\checkmark\)	\(\times\)	\(\times\)
is_disease	whether current token is in disease dictionary	\(\times\)	\(\checkmark\)	\(\times\)
is_symptom	whether current token is in symptom dictionary	\(\times\)	\(\checkmark\)	\(\times\)
is_treatment	whether current token is in treatment dictionary	\(\times\)	\(\checkmark\)	\(\times\)
in_medical_dict	whether current token is in one of the disease, drug, symptom, or treatment dictionaries	\(\times\)	\(\times\)	\(\checkmark\)
is_medical_entity	whether current token is part of a medical entity	\(\times\)	\(\times\)	\(\checkmark\)
max_lcs_disease	maximum ratio of longest common substring between current token and entries in disease dictionary	\(\checkmark\)	\(\times\)	\(\times\)
max_lcs_symptom	maximum ratio of longest common substring between current token and entries in symptom dictionary	\(\checkmark\)	\(\times\)	\(\times\)
max_lcs_treatment	maximum ratio of longest common substring between current token and entries in treatment dictionary	\(\checkmark\)	\(\times\)	\(\times\)
stickiness	stickiness value between current token and its preceding and succeeding tokens, computed using pointwise mutual information (PMI)	\(\times\)	\(\times\)	\(\checkmark\)
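
Several of the features listed above are simple enough to sketch in code. The following Python sketch is illustrative only and is not the authors' implementation. It assumes that the longest-common-substring ratio is normalised by the length of the longer string, that PMI is estimated from raw unigram and bigram counts with unseen pairs scored as 0, and that stickiness averages the PMI with the previous and next tokens. The table does not specify any of these choices. All function names and the example tokens are hypothetical.

```python
import math
from collections import Counter

CLOSURE_CHARS = {".", "!", "?", ","}  # characters listed for token_before_is_closure


def positional_features(tokens):
    """Position-based features from the table, one dict per token."""
    n = len(tokens)
    feats = []
    for i, tok in enumerate(tokens):
        feats.append({
            "token": tok,
            "token_length": len(tok),
            "is_digit": tok.isdigit(),
            "is_begin": i == 0,
            "is_end": i == n - 1,
            "token_before_is_closure": i > 0 and tokens[i - 1] in CLOSURE_CHARS,
            "absolute_position": i,       # index starting from 0
            "relative_position": i / n,   # index divided by number of tokens
        })
    return feats


def lcs_length(a, b):
    """Length of the longest common (contiguous) substring of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def max_lcs_ratio(token, dictionary):
    """Maximum LCS ratio between token and any dictionary entry.

    Assumption: ratio = LCS length / length of the longer string.
    """
    if not token:
        return 0.0
    ratios = (lcs_length(token, e) / max(len(token), len(e))
              for e in dictionary if e)
    return max(ratios, default=0.0)


def pmi(x, y, unigrams, bigrams):
    """PMI of the adjacent pair (x, y) from count tables.

    Assumption: unseen pairs get a score of 0.0 instead of -infinity.
    """
    joint = bigrams[(x, y)]
    if joint == 0:
        return 0.0
    p_xy = joint / sum(bigrams.values())
    p_x = unigrams[x] / sum(unigrams.values())
    p_y = unigrams[y] / sum(unigrams.values())
    return math.log2(p_xy / (p_x * p_y))


def stickiness(tokens, i, unigrams, bigrams):
    """Assumed combination: mean PMI with the previous and next tokens."""
    scores = []
    if i > 0:
        scores.append(pmi(tokens[i - 1], tokens[i], unigrams, bigrams))
    if i < len(tokens) - 1:
        scores.append(pmi(tokens[i], tokens[i + 1], unigrams, bigrams))
    return sum(scores) / len(scores) if scores else 0.0


# Toy usage with made-up tokens; real counts would come from a large corpus.
corpus = ["sakit", "kepala", "sakit", "kepala", "dan", "demam"]
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
print(positional_features(["saya", "pusing", ".", "kenapa", "?"])[3])
print(max_lcs_ratio("demam", ["demam berdarah", "batuk"]))
print(stickiness(corpus, 1, unigrams, bigrams))
```

In practice the dictionary features (is_disease, is_symptom, is_treatment) would reduce to set-membership checks against the corresponding word lists, while the max_lcs_* features give a softer, partial-match signal for tokens that are misspelled or inflected.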

ISSN: 2041-1480
