Using O*NET to explore and understand data science tasks
Much has been said and written about data science roles and the growth of the data science group of occupations [1]. The work of data science starts with the task of handling data in some way [2], so one place to start is to look in detail at the range of data tasks that exist at work. The O*NET occupational database [3] lists 19,695 work tasks, of which 788 (4% of all tasks) involve handling data in some way. It is this group of 788 tasks that is used here.
These 788 tasks are undertaken across 343 different occupations (O*NET SOC Codes), of which 87 (25%) undertake three or more of the tasks. The occupations undertaking the greatest number of data-related tasks fall into three main categories: health, geospatial, and general (two database roles plus statistician).
Table 1: Occupations undertaking the greatest number of data-related tasks [4]
Number of Tasks | Occupation
15 | Clinical Data Managers; Bioinformatics Technicians
14 | Database Administrators; Geographic Information Systems Technicians; Data Warehousing Specialists
13 | Database Architects
12 | Remote Sensing Scientists and Technologists; Remote Sensing Technicians
11 | None
10 | Statisticians; Biostatisticians; Geophysical Data Technicians
9 | Geospatial Information Scientists and Technologists
Source: O*NET v24
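As an illustration of how per-occupation counts like these could be produced, here is a minimal Python sketch. It assumes the data tasks have already been identified and are held as (occupation, task) pairs. The records and the `summarise` helper are hypothetical placeholders rather than O*NET data or the method used for this article.

```python
from collections import Counter

# Hypothetical (occupation, task) records: placeholders, not real O*NET rows.
data_task_records = [
    ("Occupation A", "Design database structures"),
    ("Occupation A", "Validate incoming data"),
    ("Occupation A", "Document data standards"),
    ("Occupation B", "Enter survey data"),
    ("Occupation C", "Clean geospatial data"),
    ("Occupation C", "Merge data sets"),
    ("Occupation C", "Archive project data"),
    ("Occupation C", "Analyse sensor data"),
    ("Occupation D", "Compile data reports"),
    ("Occupation D", "Check data quality"),
]


def summarise(records, threshold=3):
    """Return (occupations, occupations at or above threshold, their share)."""
    tasks_per_occupation = Counter(occupation for occupation, _ in records)
    n_occupations = len(tasks_per_occupation)
    at_or_above = sum(1 for n in tasks_per_occupation.values() if n >= threshold)
    return n_occupations, at_or_above, at_or_above / n_occupations


n_occupations, at_or_above, share = summarise(data_task_records)
print(f"{at_or_above} of {n_occupations} occupations ({share:.0%}) have 3+ data tasks")

# The same ratio for the figures reported in the text: 87 of 343 occupations.
print(f"Reported share: {87 / 343:.0%}")
```

A counter keyed on occupation keeps the grouping step explicit, and the threshold of three tasks can be changed to explore other cut-offs.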
The 87 occupations undertaking three or more data-related tasks account for 450 data tasks between them, i.e. 57% of the data tasks in the O*NET dataset. In addition, most of the data tasks are core to the role (83% or 655 tasks), while 17% (133 tasks) are supplemental.
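These totals can be cross-checked against Table 1 and the distribution given in its footnote. The short Python sketch below does so, using occupation counts transcribed by hand from those two sources. The variable names are illustrative only.

```python
# Number of occupations at each data-task count, transcribed from Table 1
# and from the footnote that continues it (no occupation has 11 tasks).
occupations_by_task_count = {
    15: 2, 14: 3, 13: 1, 12: 2, 11: 0, 10: 3, 9: 1,  # Table 1
    8: 2, 7: 4, 6: 2, 5: 13, 4: 19, 3: 35,           # footnote to Table 1
}

ALL_DATA_TASKS = 788      # data tasks in the O*NET dataset
CORE_TASKS = 655          # tasks reported as core
SUPPLEMENTAL_TASKS = 133  # tasks reported as supplemental

occupations = sum(occupations_by_task_count.values())
tasks = sum(count * n for count, n in occupations_by_task_count.items())

print(f"Occupations with 3+ data tasks: {occupations}")
print(f"Data tasks they undertake: {tasks} ({tasks / ALL_DATA_TASKS:.0%})")
print(f"Core: {CORE_TASKS / ALL_DATA_TASKS:.0%}, "
      f"supplemental: {SUPPLEMENTAL_TASKS / ALL_DATA_TASKS:.0%}")
```

Summing the counts reproduces the 87 occupations and 450 tasks quoted above, which also confirms that the two tables together cover every occupation with three or more data tasks.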
Looking at the actual tasks, 107 different action verbs are used to describe them, and these can be analysed using Bloom’s Taxonomy [5] to show the levels of the tasks. Although the level of direct matches is relatively low (34%), there is a reasonably equal distribution across all six levels of the taxonomy (Level 6: 10 matches; Level 5: 6; Level 4: 5; Level 3: 7; Level 2: 6; and Level 1: 2), which suggests that data tasks can be undertaken at multiple levels of complexity and capability.
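The match rate and the spread across levels follow directly from the counts above, as the Python sketch below shows. The level names are an assumption: they follow the usual ordering of the revised taxonomy from lowest to highest, whereas the text gives level numbers only.

```python
# Direct verb matches per taxonomy level, as reported in the text.
direct_matches = {6: 10, 5: 6, 4: 5, 3: 7, 2: 6, 1: 2}
ACTION_VERBS = 107  # distinct action verbs used in the 788 data tasks

# Assumed level names (revised taxonomy); not stated in the article itself.
level_names = {
    1: "Remember", 2: "Understand", 3: "Apply",
    4: "Analyse", 5: "Evaluate", 6: "Create",
}

matched = sum(direct_matches.values())
match_rate = matched / ACTION_VERBS
print(f"Directly matched verbs: {matched} of {ACTION_VERBS} ({match_rate:.0%})")

# Share of the matched verbs that falls at each level, highest level first.
for level in sorted(direct_matches, reverse=True):
    share = direct_matches[level] / matched
    print(f"Level {level} ({level_names[level]}): "
          f"{direct_matches[level]} matches, {share:.0%} of matched verbs")
```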
Looking at the equivalent level of detail around data science roles in the UK, the Institute for Apprenticeships and Technical Education is seeking to offer standards across four different levels, as detailed below.
Occupation Standard Title | Level | Status
Data Scientist (integrated degree) | 6 | Approved
Data Analyst | 4 | Approved
Data Technician | 3 | In development
Artificial Intelligence Data Specialist | 7 | In development
Geospatial Mapping and Science Specialist (degree) | 6 | Approved
Geospatial Survey Technician | 3 | Approved
Bioinformation Scientist | 7 | Approved
Intelligence Analyst | 4 | Approved
What emerges from these various listings of occupations is a series of groupings of key data roles. The number of groups ranges between four [7] and eight [8] (six were identified as well [10]). These roles possess the types of skills found in the changing world of research information scientists and librarians (data gatherers, custodians and providers in the pre-digital age) [9]. Both the groupings and the skills are set out below.
Groups of Data Science Roles
O’Reilly Strata (2013) [7]: Data business people; Data creatives; Data developers; Data researchers
The Royal Society (2019) [10]: Data Scientist and Advanced Analysts; Data Analysts; Data Systems Developers; Analytics Managers; Functional Analysts; Data-Driven Decision Makers
Tech Partnership (2014) [8]: Big data developer; Big data architect; Big data analyst; Big data administrator; Big data consultant; Big data Project Manager; Big data designer; Big data scientist
Information Scientist and Librarians
Skills | Essential Now | Essential in 2-5 years
Ability to advise on preserving research output | 10% | 49%
Knowledge to advise on data management and curation, including ingest, discovery, access, dissemination, preservation, and portability | 16% | 48%
Knowledge to support researchers in complying with the various mandates of funders, including open access requirements | 16% | 40%
Knowledge to advise on potential data manipulation tools used in the discipline/subject | 7% | 34%
Knowledge to advise on data mining | 3% | 33%
Knowledge to advocate, and advise on, the use of metadata | 10% | 29%
Ability to advise on the preservation of project records | 3% | 24%
Knowledge of sources of research funding to assist researchers to identify potential funders | 8% | 21%
Skills to develop metadata schema, and advise on discipline/subject standards and practices, for individual research projects | 2% | 16%
One conclusion to draw is that the handling, use and management of data is becoming increasingly widespread across occupations, and the standards (definitions) being developed for specific data-dense occupations are, at the element level, useful for many others. While the emphasis in data science is very much driven by digital developments, there are still a significant number of data-driven roles from the pre-digital era.
1. The Royal Society (2019) Dynamics of Data Science: how can all sectors benefit from data science talent? The Royal Society, London. 104 pages; Ismail, N. A. and Abidin, W. Z. (2016) “Data scientist skills”, IOSR Journal of Mobile Computing and Application, 3 (4), 52-61.
2. Edison (2016) Edison Data Science Framework: Part 1. Data Science Competence Framework Release 1. Initial output from the project, Education for Data Intensive Science to Open New Science frontiers. Grant Agreement Number: 675419.
3. O*NET, see: https://www.onetonline.org
4. Continuing down the table by the number of data tasks undertaken by an occupation holder, the following distribution is found:
Number of Data Tasks | Number of Occupations
8 | 2
7 | 4
6 | 2
5 | 13
4 | 19
3 | 35
5. Anderson, L.W. and Krathwohl, D.R. (2001) A taxonomy for learning, teaching, and assessing. Abridged Edition. Allyn and Bacon, Boston.
7. Four are identified in the 2013 O’Reilly Strata Survey: Harris, H.H., Murphy, S.P. and Vaisman, M. (2013) Analyzing the Analyzers: an introspective survey of data scientists and their work. O’Reilly Strata. See: https://cdn.oreillystatic.com/oreilly/radarreport/0636920029014/Analyzing_the_Analyzers.pdf The four roles are: data business people, data creatives, data developers, and data researchers.
8. Eight are identified in the 2014 study, Big Data Analytics: Assessment of demand for labour and skills 2013-2020. Tech Partnership Publications. See: https://www.e-skills.com/Documents/Research/General/BigData_report_Nov14.pdf The eight roles are: big data developer, big data architect, big data analyst, big data administrator, big data consultant, big data project manager, big data designer, and data scientist.
9. Auckland, M. (2012) Re-skilling for research. London: RLUK. See: https://www.rluk.ac.uk/files/RLUK%20Re-skilling.pdf
10. See: The Royal Society (2019) op. cit.