Please use this identifier to cite or link to this item:
http://hdl.handle.net/20.500.11889/4925
Title: Intelligent data extraction system using regular expressions

Authors: Hashesh, Ala'; Alkhamra, Othman; Salameh, Ahmad; Sayyad, Abdel Salam

Keywords: Genetic programming (Computer science); Text processing (Computer science); Programming languages (Electronic computers); Electronic data processing; Text editors (Computer programs)

Issue Date: Apr-2017

Abstract: Data is everywhere, but extracting specific information from large volumes of data can be an exhausting process. Many concepts introduced in computer science, such as regular expressions, can be used to make this problem simpler. However, generating a regular expression capable of extracting a predefined string from a text is not an everyday task. In this research, regular expressions are generated using Genetic Programming. The validity and correctness of a regular expression are decided by making it extract a set of positive examples and ignore another set of negative examples. We validate this method with three datasets related to IPv4 address extraction, article title extraction, and HTML header extraction. The resulting regular expressions achieved very good accuracy of extraction for the given tasks.

URI: http://hdl.handle.net/20.500.11889/4925
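The abstract says a candidate regular expression is judged by whether it extracts the positive examples and ignores the negative ones. The sketch below illustrates that idea only. The function name `fitness`, the whole-string matching rule, the equal weighting of positives and negatives, and the sample strings are all assumptions made for illustration, not the scoring used in the paper.

```python
import re


def fitness(pattern, positives, negatives):
    """Return the fraction of examples a candidate regex handles correctly.

    Hypothetical scoring: the paper's actual fitness function is not
    given in the abstract.
    """
    try:
        regex = re.compile(pattern)
    except re.error:
        # A syntactically invalid candidate gets the worst score.
        return 0.0
    # A positive example counts when the regex matches the whole string.
    hits = sum(1 for s in positives if regex.fullmatch(s))
    # A negative example counts when the regex does not match it.
    rejects = sum(1 for s in negatives if not regex.fullmatch(s))
    total = len(positives) + len(negatives)
    return (hits + rejects) / total if total else 0.0


# Illustrative IPv4-style examples (made up for this sketch).
positives = ["192.168.0.1", "10.0.0.255"]
negatives = ["192.168.0", "abc.def.ghi.jkl", "1234.1.1.1"]

# In genetic programming these candidates would come from the evolving
# population; here they are written by hand.
candidates = [r"\d+", r"[\d.]+", r"\d{1,3}(\.\d{1,3}){3}"]
best = max(candidates, key=lambda p: fitness(p, positives, negatives))
print(best, fitness(best, positives, negatives))
```

In a genetic programming loop, a score of this kind would rank the candidates in each generation so that higher-scoring expressions are more likely to be selected for crossover and mutation.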
Appears in Collections: | Fulltext Publications |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
revised_regex_paper.pdf | | 853.76 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.