Please use this identifier to cite or link to this item: http://hdl.handle.net/10316/82847
DC Field | Value | Language
dc.contributor.advisor | Ribeiro, Bernardete Martins | -
dc.contributor.advisor | Teixeira, César Alexandre Domingues | -
dc.contributor.author | Silva, José Miguel Parreira e | -
dc.date.accessioned | 2018-12-22T18:38:04Z | -
dc.date.available | 2018-12-22T18:38:04Z | -
dc.date.issued | 2017-07-14 | -
dc.date.submitted | 2019-01-22 | -
dc.identifier.uri | http://hdl.handle.net/10316/82847 | -
dc.description | Master's dissertation in Informatics Engineering presented to the Faculdade de Ciências e Tecnologia | -
dc.description.abstract | Big Data, allied with the Internet of Things, nowadays provides a powerful resource that various organizations are increasingly exploiting for applications ranging from decision support and predictive and prescriptive analytics to knowledge and intelligence discovery. In analytics and data mining processes, it is usually desirable to have as much data as possible, though it is often more important that the data be of high quality, which raises two of the most important problems when handling large datasets: sample selection and feature selection. This work addresses the sampling problem and presents a heuristic method to find the “critical sampling” of big datasets. The critical sampling size of a dataset is defined as the minimum number of examples required for a given data analytic task to achieve satisfactory performance. The problem is very important in data mining, since the size of a dataset directly relates to the cost of executing the data mining task. Since determining the optimal solution to the Critical Sampling Size problem is intractable, this dissertation tests a heuristic method in order to assess its ability to find practical solutions. Results have shown an apparent Critical Sampling Size for all the tested datasets, which is rather smaller than their original sizes. Further, the proposed heuristic method shows promising utility, providing a practical way to find a useful critical sample for data mining tasks. | por
dc.description.abstract | Big Data, allied with the Internet of Things, nowadays provides a powerful resource that various organizations are increasingly exploiting for applications ranging from decision support and predictive and prescriptive analytics to knowledge and intelligence discovery. In analytics and data mining processes, it is usually desirable to have as much data as possible, though it is often more important that the data be of high quality, which raises two of the most important problems when handling large datasets: sample selection and feature selection. This work addresses the sampling problem and presents a heuristic method to find the “critical sampling” of big datasets. The critical sampling size of a dataset is defined as the minimum number of examples required for a given data analytic task to achieve satisfactory performance. The problem is very important in data mining, since the size of a dataset directly relates to the cost of executing the data mining task. Since determining the optimal solution to the Critical Sampling Size problem is intractable, this dissertation tests a heuristic method in order to assess its ability to find practical solutions. Results have shown an apparent Critical Sampling Size for all the tested datasets, which is rather smaller than their original sizes. Further, the proposed heuristic method shows promising utility, providing a practical way to find a useful critical sample for data mining tasks. | eng
dc.language.iso | eng | -
dc.rights | openAccess | -
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | -
dc.subject | Big Data | por
dc.subject | Critical Sample | por
dc.subject | Data Mining | por
dc.subject | Big Data | eng
dc.subject | Critical Sample | eng
dc.subject | Data Mining | eng
dc.title | Finding the Critical Feature Dimension of Big Datasets | eng
dc.title.alternative | Procura do Tamanho Crítico de Amostragem de Grandes Conjuntos de Dados | por
dc.type | masterThesis | -
degois.publication.location | DEI-FCTUC | -
degois.publication.title | Finding the Critical Feature Dimension of Big Datasets | eng
dc.peerreviewed | yes | -
dc.identifier.tid | 202124010 | -
thesis.degree.discipline | Informática | -
thesis.degree.grantor | Universidade de Coimbra | -
thesis.degree.level | 1 | -
thesis.degree.name | Mestrado em Engenharia Informática | -
uc.degree.grantorUnit | Faculdade de Ciências e Tecnologia - Departamento de Engenharia Informática | -
uc.degree.grantorID | 0500 | -
uc.contributor.author | Silva, José Miguel Parreira e::0000-0003-0284-4429 | -
uc.degree.classification | 18 | -
uc.degree.presidentejuri | Correia, António Dourado Pereira | -
uc.degree.elementojuri | Ribeiro, Bernardete Martins | -
uc.degree.elementojuri | Fonseca, Carlos Manuel Mira da | -
uc.contributor.advisor | Ribeiro, Bernardete Martins::0000-0002-9770-7672 | -
uc.contributor.advisor | Teixeira, César Alexandre Domingues::0000-0001-9396-1211 | -
uc.controloAutoridade | Yes | -
item.fulltext | With full text | -
item.languageiso639-1 | en | -
item.grantfulltext | open | -
crisitem.advisor.dept | Faculdade de Ciências e Tecnologia, Universidade de Coimbra | -
crisitem.advisor.dept | Faculdade de Ciências e Tecnologia, Universidade de Coimbra | -
crisitem.advisor.parentdept | Universidade de Coimbra | -
crisitem.advisor.parentdept | Universidade de Coimbra | -
crisitem.advisor.researchunit | CENTRE FOR INFORMATICS AND SYSTEMS OF THE UNIVERSITY OF COIMBRA | -
crisitem.advisor.researchunit | CENTRE FOR INFORMATICS AND SYSTEMS OF THE UNIVERSITY OF COIMBRA | -
crisitem.advisor.orcid | 0000-0002-9770-7672 | -
crisitem.advisor.orcid | 0000-0001-9396-1211 | -
Appears in Collections: UC - Dissertações de Mestrado
Files in This Item:
File | Description | Size | Format
dissertation.pdf | | 2.09 MB | Adobe PDF
This item is licensed under a Creative Commons License.