Entity Based Query Processing For Retrieval And Summarization In Biomedical Domain
Abstract
The exponential growth of biomedical literature poses many challenges for search. Addressing users' complex information needs requires rigorous semantic processing of biomedical text, and biomedical information access has emerged as a distinct discipline for this reason. Traditional information access methods such as matching, ranking, entity processing, and entity-entity relationship processing are challenged in this domain. These are the major building blocks used to frame queries that represent complex information needs in biomedical and clinical information access. This thesis applies different IR and bioNLP techniques to query processing and studies their effects on retrieval and summarization. Various biomedical query reformulation techniques are carried out and compared for biomedical document retrieval. Query expansion, one such reformulation technique, is carried out using relevance feedback and pseudo relevance feedback. The relevance feedback approach uses information about documents that are actually relevant to the query, whereas the pseudo relevance feedback approach has no such information and instead uses the top retrieved documents, which are assumed to be relevant to the query. A combined approach of relevance feedback and pseudo relevance feedback is proposed; it is based on feedback document discovery and uses various classification and clustering techniques on biomedical documents to identify good documents for feedback. This approach uses relevance feedback for a number of documents and tries to learn the relevance of other documents for feedback. The feedback document discovery based query expansion approach shows improvement over the relevance feedback based query expansion technique for biomedical document retrieval.
An improved version of this feedback document discovery based query expansion approach is also proposed, in which the features of entities are weighted based on the type of the entities and the query; it improves the document retrieval system over the previous version without feature weighting. Automatic query expansion techniques based on feedback rely on two feedback sources: feedback document selection and feedback term selection. In the biomedical domain, medical entities are more meaningful than surface words, so entity-based processing is necessary for any application in this domain. This thesis also includes a survey of advances in biomedical entity identification, covering the identification process, the challenges identified by the community, the available resources, the approaches to biomedical entity identification, and a comparison of the techniques proposed in the literature. UMLS is a biomedical resource that brings together many health and biomedical vocabularies and standards. It contains categorized biomedical entities and their relations, with semantic information. A novel query expansion technique that uses knowledge from UMLS for feedback term selection is proposed, in which queries are expanded using biomedical entities. The proposed method takes the UMLS entities in a query together with their related entities identified by UMLS and constructs a query-specific graph of biomedical entities for term selection. This query reformulation approach shows improvement over pseudo relevance feedback and state-of-the-art UMLS based query reformulation approaches. The amount of information available to clinicians and clinical researchers is growing exponentially. The documents are long, and the number of documents on a topic is large. To synthesize them, text summarization attempts to reduce the text so that users can quickly understand the relevant source information.
In the biomedical domain, various summarization techniques have been developed in recent years. Text summarization may help medical practitioners with their information and knowledge management tasks. This work focuses on query-focused biomedical text summarization, where the summary should be related to the query. Entity-based processing is incorporated into the summarization process along with word-embedding based similarity. The aim is to use query reformulation in summarization and to examine how it affects the summaries, in particular whether expanded queries lead to better summaries.
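As an illustration only, the pseudo relevance feedback idea described in the abstract (treat the top retrieved documents as relevant and draw expansion terms from them) can be sketched in a few lines of Python. This is not the method evaluated in the thesis: the function names, the TF-IDF style scoring, and the parameters `top_k` and `n_terms` are assumptions made for this sketch, and no stop-word removal or entity processing is performed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters (simplistic, assumed tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def pseudo_relevance_feedback(query, docs, top_k=2, n_terms=3):
    """Expand `query` with terms from the top_k retrieved documents.

    Hypothetical helper: scoring is a plain TF-IDF sum, chosen for brevity.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency and a smoothed inverse document frequency.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(1 + n_docs / df[t]) for t in df}

    q_terms = tokenize(query)

    def score(doc_tokens):
        tf = Counter(doc_tokens)
        return sum(tf[t] * idf.get(t, 0.0) for t in q_terms)

    # Initial retrieval: rank all documents against the original query.
    ranked = sorted(range(n_docs), key=lambda i: score(tokenized[i]), reverse=True)

    # Assume the top_k documents are relevant (the "pseudo" part).
    feedback_docs = ranked[:top_k]

    # Weight candidate expansion terms found in the feedback documents.
    weights = Counter()
    for i in feedback_docs:
        for term, count in Counter(tokenized[i]).items():
            if term not in q_terms:
                weights[term] += count * idf[term]

    # Keep the n_terms heaviest candidates; ties are broken alphabetically.
    best = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]
    return q_terms + [term for term, _ in best]
```

Replacing the assumed-relevant `feedback_docs` with documents a user has actually judged relevant would turn the same sketch into plain relevance feedback, which is the contrast the abstract draws between the two approaches.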