
Table 5 List of handcrafted features used for each task

From: Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning

| Feature | Explanation | SR | MER | KE |
|---|---|---|---|---|
| token | the current token | ✓ | ✓ | ✓ |
| token_before | the previous token | ✓ | ✓ | ✗ |
| token_after | the next token | ✓ | ✓ | ✗ |
| is_digit | whether the current token is a digit | ✓ | ✗ | ✗ |
| is_begin | whether the current token is at the beginning of the input text | ✓ | ✗ | ✗ |
| is_end | whether the current token is at the end of the input text | ✓ | ✗ | ✗ |
| token_before_is_closure | whether the previous token is one of the following characters: . (period), ! (exclamation mark), ? (question mark), or , (comma) | ✓ | ✗ | ✗ |
| token_length | length of the current token | ✗ | ✗ | ✓ |
| absolute_position | index of the current token, starting from 0 | ✗ | ✗ | ✓ |
| relative_position | index of the current token divided by the total number of tokens in the input | ✓ | ✗ | ✗ |
| site_code | code of the website from which the input text was obtained | ✓ | ✗ | ✗ |
| pos_tag | part-of-speech tag of the current token | ✓ | ✗ | ✓ |
| is_np | whether the current token is part of a noun phrase | ✗ | ✓ | ✗ |
| is_vp | whether the current token is part of a verb phrase | ✗ | ✓ | ✗ |
| is_stopword | whether the current token is a stopword | ✗ | ✓ | ✗ |
| is_abbreviation | whether the current token is an abbreviation | ✗ | ✗ | ✓ |
| abbreviation_inverse | full form of the current token if it is an abbreviation | ✓ | ✗ | ✗ |
| is_disease | whether the current token is in the disease dictionary | ✗ | ✓ | ✗ |
| is_symptom | whether the current token is in the symptom dictionary | ✗ | ✓ | ✗ |
| is_treatment | whether the current token is in the treatment dictionary | ✗ | ✓ | ✗ |
| in_medical_dict | whether the current token is in one of the disease, drug, symptom, or treatment dictionaries | ✗ | ✗ | ✓ |
| is_medical_entity | whether the current token is part of a medical entity | ✗ | ✗ | ✓ |
| max_lcs_disease | maximum longest-common-substring ratio between the current token and the entries in the disease dictionary | ✓ | ✗ | ✗ |
| max_lcs_symptom | maximum longest-common-substring ratio between the current token and the entries in the symptom dictionary | ✓ | ✗ | ✗ |
| max_lcs_treatment | maximum longest-common-substring ratio between the current token and the entries in the treatment dictionary | ✓ | ✗ | ✗ |
| stickiness | stickiness of the current token with its preceding and succeeding tokens, computed using pointwise mutual information (PMI) | ✗ | ✗ | ✓ |
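The surface and position features in Table 5 are simple functions of the token sequence. The Python sketch below computes a subset of them for one input. It assumes the input is already tokenized into a list of strings. The `<BOS>`/`<EOS>` padding values and the function names are hypothetical and are not taken from the paper. Features that need external resources (POS tags, phrase chunks, dictionaries, abbreviation lists) are omitted.

```python
from typing import Dict, List, Union

Feature = Union[str, int, float, bool]

# Characters treated as "closure" punctuation by token_before_is_closure.
CLOSURE_CHARS = {".", "!", "?", ","}


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Surface and position features for tokens[i] (a subset of Table 5)."""
    n = len(tokens)
    token = tokens[i]
    # Hypothetical padding values for neighbours outside the input.
    before = tokens[i - 1] if i > 0 else "<BOS>"
    after = tokens[i + 1] if i < n - 1 else "<EOS>"
    return {
        "token": token,
        "token_before": before,
        "token_after": after,
        "is_digit": token.isdigit(),
        "is_begin": i == 0,
        "is_end": i == n - 1,
        "token_before_is_closure": before in CLOSURE_CHARS,
        "token_length": len(token),
        "absolute_position": i,        # 0-based index
        "relative_position": i / n,    # index divided by number of tokens
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dictionary per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

For example, in `sentence_features(["I", "have", "2", "fevers", ".", "Help"])` the token "Help" gets `token_before_is_closure=True` because it follows a period, which is the kind of cue a sentence-boundary tagger can exploit.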
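Table 5 does not state how the longest-common-substring ratio behind max_lcs_disease, max_lcs_symptom, and max_lcs_treatment is normalized. The sketch below assumes the ratio is the LCS length divided by the length of the longer string, with case-insensitive matching; `lcs_ratio` and `max_lcs` are hypothetical names. Each of the three features would call `max_lcs` with a different dictionary.

```python
from typing import Iterable


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common substring ending at a[i-2] and b[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def lcs_ratio(token: str, entry: str) -> float:
    """Assumed normalization: LCS length divided by the longer string's length."""
    if not token or not entry:
        return 0.0
    return longest_common_substring(token, entry) / max(len(token), len(entry))


def max_lcs(token: str, dictionary: Iterable[str]) -> float:
    """Highest case-insensitive LCS ratio between token and any dictionary entry."""
    token = token.lower()
    return max((lcs_ratio(token, entry.lower()) for entry in dictionary), default=0.0)
```

Under this normalization, a misspelling such as "diabetic" against the entry "diabetes" shares "diabet" (6 of 8 characters) and scores 0.75, so the feature rewards near-matches that an exact dictionary lookup like is_disease would miss.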
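The stickiness feature relies on pointwise mutual information, PMI(x, y) = log(p(x, y) / (p(x) p(y))). The table does not say how the PMI with the preceding token and the PMI with the succeeding token are combined, nor which corpus supplies the counts. The sketch below assumes base-2 logarithms, probabilities estimated from unigram and adjacent-bigram counts over a tokenized reference corpus, and a stickiness value equal to the average of the two neighbour PMIs. `PMIStickiness` is a hypothetical class, not the paper's implementation.

```python
import math
from collections import Counter
from typing import List, Optional


class PMIStickiness:
    """Hypothetical stickiness scorer based on bigram PMI (an assumption, not the paper's code)."""

    def __init__(self, corpus: List[List[str]]) -> None:
        # Count unigrams and adjacent-token bigrams over a tokenized corpus.
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in corpus:
            self.unigrams.update(sentence)
            self.bigrams.update(zip(sentence, sentence[1:]))
        self.n_unigrams = sum(self.unigrams.values())
        self.n_bigrams = sum(self.bigrams.values())

    def pmi(self, x: str, y: str) -> Optional[float]:
        """PMI(x, y) = log2(p(x, y) / (p(x) * p(y))); None if the bigram was never seen."""
        joint = self.bigrams[(x, y)]
        if joint == 0:
            return None
        p_xy = joint / self.n_bigrams
        p_x = self.unigrams[x] / self.n_unigrams
        p_y = self.unigrams[y] / self.n_unigrams
        return math.log2(p_xy / (p_x * p_y))

    def stickiness(self, tokens: List[str], i: int) -> float:
        """Average PMI of tokens[i] with its neighbours; 0.0 if no neighbour bigram was seen."""
        scores = []
        if i > 0:
            left = self.pmi(tokens[i - 1], tokens[i])
            if left is not None:
                scores.append(left)
        if i < len(tokens) - 1:
            right = self.pmi(tokens[i], tokens[i + 1])
            if right is not None:
                scores.append(right)
        return sum(scores) / len(scores) if scores else 0.0
```

A token that habitually co-occurs with its neighours, such as "pain" in "chest pain", receives a high positive score, which suggests it sits inside a multi-word keyphrase rather than at a phrase boundary.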