Description:
We are seeking a talented Senior Big Data Engineer to work on Qualys' next-generation big data challenge. Working with a team of engineers and architects, you will be responsible for creating Data Lakes by extracting and transforming data from various data sources. As a Senior Engineer, you will be responsible for the design, development, implementation, and support of Qualys' first Data Lake and Data Pipeline framework. This is a great opportunity to build Qualys' next-generation data technology platform by leveraging open source technologies, and to work on challenging, business-impacting projects.

Responsibilities:
Design, develop, and deploy full-scale Big Data and NoSQL solutions that include data acquisition, storage, transformation, security, data management, and data analysis using Docker, Kubernetes, AWS Elastic Services, Alluxio, Apache Kafka, Ignite, Spark, Presto, Drill, Hive, Cassandra, Postgres, Greenplum, and other related technologies.
Process and store graph data, and create visual representations of the underlying data.
Ensure the quality of master data in key systems, and develop and document processes with other functional data owners to support ongoing maintenance and data integrity.
Build large-scale data architectures using Kafka/Spark/Flink/Cassandra in a hybrid environment.
Perform data profiling and data analysis using emerging data technologies.
Deploy data loaders to ingest data into Big Data Lakes.
Troubleshoot production issues with Hadoop/Spark/Flink/NoSQL.

Qualifications:
BS/MS or equivalent with 8+ years of working experience in data management, including 5+ years in the Big Data ecosystem.
Must have experience architecting data at scale for the Hadoop/Spark/NoSQL ecosystem of data stores to support different business consumption patterns off a centralized data platform.
Hands-on experience in Spark/MapReduce/ETL data processing, including Java, Python, Scala, and SQL, for data analysis of production Big Data applications.
Experience in populating and leveraging OLAP tools and SQL-on-Hadoop platforms (Presto, Hive, Greenplum, SparkSQL).
Hands-on experience with one or more of the following: GraphQL, Gremlin, TinkerPop, Neo4j, JanusGraph, GraphX.
Experience integrating streaming architectures into Big Data implementations using message queues such as Kafka, RabbitMQ, and/or related cloud offerings.
Data modeling for data-intensive applications, from core star schema and dimensional modeling to contemporary NoSQL and denormalized data lake architectures.
Experience designing and implementing relational data models and working with RDBMS, with an understanding of the challenges in these environments.