Towards Dataology and Data Science
The First International Workshop on Dataology and Data Science (IWDS 2010) was held at Fudan University, Shanghai, China, on June 22-23, 2010, hosted by the Research Center for Dataology and Data Science, School of Computer Science, Fudan University. More than 30 well-known scholars from universities and academic institutes in the United States, Canada, Japan, Australia and China participated in the workshop. Key issues in Dataology and Data Science were discussed, including definitions and concepts, fundamental theories, innovative methods and research topics.
The workshop recognized that data is playing an increasingly important role in human society in the big data era, and that, as challenges and opportunities multiply, it is necessary to understand data from new angles. It was also agreed that proposing a new science, Dataology and Data Science, is both significant and timely. The participants suggested that its theories and architecture should be built up and that IWDS should continue in the future.
1. Challenges Posed by Data
IWDS 2010 discussed several challenges of the big data era, which are listed below.
1.1 Truth in Data
When we hold data in hand, how do we know whether it is telling the truth or giving false information? How do we deal with a dataset that contains some false data? If false information is mixed with true information, how do we measure the confidence level of the dataset? For example, if some product reviews are written by users who have not used the products, or even by competitors, those reviews may not be credible. An analysis result (e.g., a credit rating) based on a dataset that includes such data would therefore not be credible either.
These are critical challenges in data research and will be the first step of research in Dataology and Data Science. As social media such as Facebook and blogs expand, the challenges are becoming more severe than ever.
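As a purely illustrative sketch of the confidence-level question above, suppose each review carried a flag saying whether the reviewer is known to have used the product. The `Review` type, its `verified` field, and the two functions below are hypothetical names introduced here, and the verified-fraction measure is only one simple assumption; the workshop did not propose a specific measure.

```python
from dataclasses import dataclass


@dataclass
class Review:
    rating: int      # e.g. 1 to 5 stars
    verified: bool   # assumed flag: reviewer is known to have used the product


def dataset_confidence(reviews):
    """Fraction of reviews that are verified (0.0 for an empty dataset)."""
    if not reviews:
        return 0.0
    return sum(1 for r in reviews if r.verified) / len(reviews)


def verified_mean_rating(reviews):
    """Mean rating over verified reviews only, or None if there are none."""
    trusted = [r.rating for r in reviews if r.verified]
    if not trusted:
        return None
    return sum(trusted) / len(trusted)


reviews = [
    Review(rating=5, verified=True),
    Review(rating=1, verified=False),  # possibly written by a competitor
    Review(rating=4, verified=True),
    Review(rating=1, verified=False),
]
confidence = dataset_confidence(reviews)
trusted_mean = verified_mean_rating(reviews)
```

In practice such a flag is rarely available, which is exactly why identifying truth in data is listed as an open challenge; the sketch only shows how a downstream result changes once unreliable records are set aside.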
1.2 Survival Problems in Cyberspace
Cyberspace will become part of the living space of human beings, who will then live in both physical space and cyberspace. How do we survive in cyberspace? Consider one of the most basic survival problems: how can we communicate with each other in cyberspace? This may become one of the most difficult problems in future data-related research, because of the problem of communication context. In fact, languages for online communication already exist among teenagers, and they can be regarded as a communication mode of cyberspace. Such a language is very difficult for others to understand because it adopts words from various languages (such as English, Chinese and Japanese) and mixes them together in one sentence.
1.3 Scientific Research with Data
Since data is the representation, or mapping, of real nature in cyberspace, the former can be used to discover the rules of the latter. The discovery and exploration of phenomena and rules in data nature can support the discovery of phenomena and rules in real nature. Developing methods in data nature to explore the rules of real nature is therefore a potential research field that would benefit scientific research.
1.4 Knowledge Acquisition from Data
In the early stages of computer science, we focused on improving the performance and capabilities of computing. Nowadays, however, the more important problem is not that computers are insufficiently powerful but how to acquire valuable knowledge from massive and growing data, since a huge amount of data (e.g., data from both natural science and social science) has been and is still being accumulated. The questions include: How can we find useful data in cyberspace? How can we get knowledge from data? These require us to understand and process data from new angles.
2. What is Dataology and Data Science?
The participants in IWDS 2010 discussed several aspects of Dataology and Data Science, including its definitions and scope and its boundary with other areas.
2.1 Definitions of Dataology and Data Science
Dataology and Data Science is an umbrella of theories, methods and technologies for studying data nature. It is a new science whose research object is data in cyberspace. Dataology and Data Science engages in identifying data types, data status, data properties, patterns of data transformation and mechanisms of data evolution, providing support for discovering the laws of real nature and of human behavior. However, because it is a new science, we should define Dataology and Data Science more clearly and precisely, clarify the boundary between it and other related areas, and address fundamental issues.
2.2 Differences from Other Technologies
Dataology and Data Science already includes some methods and techniques, such as data acquisition, data storage and management, data safety, data analysis and data visualization. However, it is different from any of these traditional methods. It overlaps with many areas, such as data mining, information retrieval, data integration and artificial intelligence, but it remains distinct from them.
Dataology and Data Science requires fundamental theory and new techniques, for instance: existence of data, measurement of data, time of data, data algebra, data similarity and cluster theory, data classification and data cyclopedia, data camouflage and data perception, data experiment, data awareness, and so on.
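To make one item in this list concrete, data similarity can be illustrated with the well-known Jaccard index, which compares two collections by the share of elements they have in common. This is only one standard measure chosen here for illustration; the workshop did not single out any particular similarity measure, and the function name `jaccard_similarity` is an assumption of this sketch.

```python
def jaccard_similarity(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention assumed here: two empty sets are considered identical.
        return 1.0
    return len(a & b) / len(a | b)


# Two small records sharing two of four distinct values.
similarity = jaccard_similarity({1, 2, 3}, {2, 3, 4})
```

A measure of this kind is also the usual starting point for clustering, since objects are grouped by how similar they are to one another.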
Dataology and Data Science will also improve current research methods to form new scientific research methods, and will develop specific theories, technologies and methods in various fields to form domain dataology, including behavior dataology, biological dataology, brain dataology, meteorological dataology, financial dataology and geographical dataology.
2.3 Differences from Other Sciences
Data is the formal representation of real nature in computer systems; information is the phenomena of nature, society and thinking activities; and knowledge is experience gained from practice. Data can be regarded as symbols and representations of information and knowledge, but it is not equivalent to them. The research object, goals and methods of Dataology and Data Science are essentially different from those of Computer Science, Information Science and Knowledge Science.
On the one hand, Dataology and Data Science supports natural science and social science. On the other hand, as Dataology and Data Science develops, more and more scientific research will directly target data instead of real nature, which will in turn help people understand data and explore nature and human behavior.
3. What are the main research issues?
IWDS 2010 also discussed the main research issues of Dataology and Data Science.
Observation and logical reasoning are the basis of scientific research. In Dataology and Data Science, we should focus on observation methods in data nature and on data reasoning, as well as on the fundamental theories and technologies, including existence of data, measurement of data, time of data, data algebra, data similarity and cluster theory, and data classification and data cyclopedia. In addition, we should emphasize how to identify truth in data, how to support other scientific research, and how to acquire valuable knowledge from data.
The scholars are all willing to participate in and actively promote Dataology and Data Science. They noted that the monograph Dataology and Data Science, written by Yangyong Zhu and Yun Xiong, is a good start. The participants also strongly agreed that all of us should spend more time and effort exploring the fundamental theories and innovative technologies of Dataology and Data Science, and should build broader communication and cooperation among various disciplines and backgrounds, because many problems remain to be solved and more may arise as the work proceeds. This is not a short-term plan but a task that will last half a century or even longer. It was agreed that we should:
Engage in developing Dataology and Data Science as a new science and let it show its potential;
Clarify and improve the definition (including context and boundary) of Dataology and Data Science;
Explore the differences and relationships between Dataology and Data Science and other related areas;
Build up the theories of Dataology and Data Science;
Define and illustrate the research topics, themes, directions and key issues of Dataology and Data Science;
Explore the methodology of Dataology and Data Science;
Develop Dataology and Data Science combined with domain knowledge (e.g., Bioinformatics, Social Networks);
Construct more research institutes and centers for Dataology and Data Science;
Hold a workshop once per year and organize related international conferences on Dataology and Data Science;
Incorporate people from other related backgrounds (e.g., mathematics, statistics, physical sciences, neuroscience, systems theory);
Train graduate students and provide student exchange opportunities;
Seek cooperation between colleges and enterprises and apply for funding jointly;
Establish an open international research platform;
Publish proceedings for the workshops and an international refereed journal on Dataology and Data Science.
The scholars spoke highly of IWDS 2010 and affirmed Dataology and Data Science. There was unanimous agreement that it is a meaningful and promising research direction and will become a new science in the future. IWDS 2010 provided an excellent platform for researchers to continue a deep and wide discussion and to promote the development of Dataology and Data Science. Following the successful workshop, the Second International Workshop on Dataology and Data Science will be held in Beijing on May 29-30, 2011, hosted by the Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, and the Research Center for Dataology and Data Science, School of Computer Science, Fudan University.