Changing Paradigms of Technical Skills for Data Engineers
This paper investigates the changing paradigms of the technical skills that Data Engineers need in 2018.
A decade ago, Data Engineers needed technical skills for Relational Database Management Systems (RDBMS), such as Oracle and Microsoft SQL Server. With the advent of Hadoop and NoSQL databases in recent years, Data Engineers now require new skills to support the large distributed datastores (Big Data) that exist today. Over the last five years, demand for both Data Scientists and Data Engineers has increased.
The research methodology used Apache Pig, whose Pig Latin scripts run as MapReduce jobs, on the Amazon Web Services (AWS) Cloud. Data were collected from 100 Indeed.com job advertisements in July 2017 and then uploaded to the AWS Cloud. Using MapReduce, phrases and words were counted and then sorted. The sorted phrase and word counts were then used to compile a list of the top 20 skills needed by a Data Engineer, based on the job advertisements. This list was compared with the top 20 Data Engineer skills published by Stitch, which surveyed 6,500 Data Engineers in 2016.
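The count, sort and rank step can be illustrated with a small in-memory Python sketch of the same map, shuffle and reduce pattern. It is not the Pig script used in this study. The tokenizer, the two-word phrase limit (`max_n`), the `STOPWORDS` set, and all function names are hypothetical choices made for illustration; the paper does not state how phrases were defined or how common words were filtered.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical stop-word list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "with", "in", "on", "to", "for", "or", "experience"}


def map_phase(ad_text, max_n=2):
    """Map: emit (phrase, 1) for every word and every phrase of up to max_n words.

    Any phrase containing a stop word is skipped, so removed words never
    glue their neighbours together into a spurious phrase.
    """
    words = re.findall(r"[a-z0-9+#]+", ad_text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if not any(w in STOPWORDS for w in gram):
                yield " ".join(gram), 1


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by phrase (the MapReduce key)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts for each phrase."""
    return {key: sum(values) for key, values in groups.items()}


def top_skills(ads, n=20, max_n=2):
    """Count phrases across all ads and return the n most frequent."""
    pairs = chain.from_iterable(map_phase(ad, max_n) for ad in ads)
    counts = reduce_phase(shuffle_phase(pairs))
    # Sort by descending count, breaking ties alphabetically for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


def compare_lists(ours, theirs):
    """Return (skills in both lists, only in ours, only in theirs), each sorted."""
    ours_set, theirs_set = set(ours), set(theirs)
    return (sorted(ours_set & theirs_set),
            sorted(ours_set - theirs_set),
            sorted(theirs_set - ours_set))


if __name__ == "__main__":
    # Made-up ad snippets, for illustration only.
    ads = [
        "Data Engineer: Python, SQL and Spark experience.",
        "Experience with Hadoop, Spark and SQL.",
        "Build ETL pipelines in Python and SQL on AWS.",
    ]
    ranked = top_skills(ads, n=3)
    print(ranked)  # [('sql', 3), ('python', 2), ('spark', 2)]
    # A made-up comparison list; this is NOT the Stitch survey result.
    print(compare_lists([p for p, _ in ranked], ["sql", "spark", "scala"]))
```

On a real cluster, the map and reduce functions would run in parallel across many machines, with the framework performing the shuffle; the sketch runs them sequentially only to make the data flow visible. Swapping `STOPWORDS` for a curated skill vocabulary would be a natural alternative when the goal is to rank skills rather than arbitrary phrases.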
This paper presents a list of the top 20 technical skills required of a Data Engineer.