Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/4925
Title: Intelligent data extraction system using regular expressions
Authors: Hashesh, Ala'
Alkhamra, Othman
Salameh, Ahmad
Sayyad, Abdel Salam
Keywords: Genetic programming (Computer science)
Text processing (Computer science)
Programming languages (Electronic computers)
Electronic data processing
Text editors (Computer programs)
Issue Date: Apr-2017
Abstract: Data is everywhere, but extracting specific information from large volumes of data can be an exhausting process. Several concepts from computer science, such as regular expressions, can make this problem simpler. However, writing a regular expression capable of extracting a predefined string from a text is not a trivial task. In this research, regular expressions are generated using Genetic Programming. The validity and correctness of a regular expression are decided by making it extract a set of positive examples and ignore another set of negative examples. We validate this method with three datasets related to IPv4 address extraction, article title extraction, and HTML header extraction. The resulting regular expressions achieved very good extraction accuracy on the given tasks.
URI: http://hdl.handle.net/20.500.11889/4925
Appears in Collections: Fulltext Publications
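
The abstract describes judging a candidate regular expression by whether it extracts a set of positive examples and ignores a set of negative ones. The sketch below illustrates that idea only; it is not the authors' fitness function. The name score_regex, the use of full-string matching, the equal weighting of positives and negatives, and the sample IPv4 strings are all assumptions made for illustration.

```python
import re


def score_regex(pattern, positives, negatives):
    """Return the fraction of examples a candidate regex handles correctly.

    Hypothetical scoring rule: a positive example counts when the pattern
    matches the whole string, a negative example counts when it does not.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0  # an invalid candidate gets the worst score

    accepted = sum(1 for s in positives if compiled.fullmatch(s))
    rejected = sum(1 for s in negatives if not compiled.fullmatch(s))
    total = len(positives) + len(negatives)
    return (accepted + rejected) / total if total else 0.0


# Illustrative examples (not taken from the paper's datasets).
positives = ["192.168.0.1", "10.0.0.255"]
negatives = ["hello", "1.2.3"]
candidate = r"\d{1,3}(\.\d{1,3}){3}"

print(score_regex(candidate, positives, negatives))
```

In a genetic programming loop, a score of this kind could serve as the fitness value used to rank candidate expressions, with invalid candidates ranked last.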

Files in This Item:
File: revised_regex_paper.pdf
Size: 853.76 kB
Format: Adobe PDF