Description
We are looking to expand our Product & Engineering team with a data engineer who has experience working with big data tools and infrastructure. At EDITED, data is our most valuable asset: we currently hold information on over 150 million products, including daily pricing, descriptions and images.
As a data engineer, you’ll be working closely with both Data Scientists and Software Engineers to ensure our data is safely and efficiently processed, structured and stored. You would also be involved in helping us deploy statistical and machine learning models into the production pipeline, and building tooling for model training, anomaly detection, quality assurance, etc.
You should have experience working with large datasets within a production environment and be comfortable with the tooling to manage them. For example, you might have built an entire ETL pipeline or have been responsible for building tools to monitor its resource usage and availability. Ideally you would also have a keen interest in the data science concepts that go hand in hand with big data modelling, in order to understand how they can be efficiently included in the pipeline. This is an actual engineering role, where you would be developing a product alongside other developers and data scientists, not a report-building role.
This will be a pivotal new role for both the Product & Engineering team and the company as we scale.
Required:
- 2+ years working with big data infrastructure
- Experience with Python/Java/C
- Good grasp of basic data science concepts
- Understanding of the internals of numerical computing and data science libraries. For example: numpy, scipy, scikit-learn, TensorFlow, etc.
- Experience with some of the technologies we use would be great, but not essential. For example: Apache Storm, Elasticsearch, Riak, Redis, Galera, S3, streamparse