Apply data mining techniques and conduct statistical analysis to large, structured and unstructured data sets to understand and analyse phenomena. Model complex business problems, discovering insights and opportunities through statistical, algorithmic, machine learning and visualisation techniques, working closely with clients, data and technology teams to turn data into critical information used to make sound business decisions. Execute intelligent automation and predictive modelling.
Essential Function
Acts as a subject matter expert from a data science perspective and provides input into all decisions relating to data science and its use. Educates the organisation on data science perspectives and new approaches, such as hypothesis testing and statistical validation of results. Maintains ongoing knowledge of industry standards and best practice, and identifies gaps between industry definitions/data elements and the organisation's definitions/data elements.
Builds machine learning models and utilises distributed data processing and analysis methodologies. Competent in machine learning programming in R or Python, with supplementary skills in MATLAB, Java, etc. Familiar with the Hadoop distributed computational platform, including the broader ecosystem of tools such as HDFS / Spark / Kafka.
Codes, tests and maintains scientific models and algorithms; identifies trends, patterns, and discrepancies in data; and determines additional data needed to support insight. Processes, cleanses, and verifies the integrity of data used for analysis.
Creates, maintains and optimises modelling solutions that enable the forecast of quality data outcomes. Ensures that volumetric predictions are modelled so that resource requirements are optimally considered. Develops and maintains optimal evaluation techniques to ensure that modelled outcomes are rigorous and creates model performance tracking. Drives sustainable and effective modelling solutions.
Designs and applies various mathematical, statistical, and simulation techniques to typically large and unstructured data sets in order to answer critical business questions and create predictive solutions which drive improvement in business outcomes. Drives analytics and insights across the organisation by developing advanced statistical models and computational algorithms based on business initiatives.
Develops, implements, monitors and maintains a comprehensive operational intelligent automation (IA) plan, rules, methodologies and coding initiatives in order to drive IA for remediation efforts. Develops and co-ordinates a comprehensive strategy for productionising automation software so that it is accurate and well maintained.
Directs the gathering of data for use in data science models, ensuring that chosen datasets best reflect the organisation's goals. Performs data pre-processing including data manipulation, transformation, normalisation, standardisation, visualisation and derivation of new variables/features. Utilises advanced data analytics and mining techniques to analyse data, assessing data validity and usability; reviews data results to ensure accuracy; and communicates results and insights to stakeholders.
Ensures business integration by integrating model outputs into end-point production systems, understanding and adopting the data collection, integration and retention requirements that apply, in line with business requirements and knowledge of best practices.
Liaises and collaborates with the Data Science Guild, providing support to the entire department for its data-centric needs. Collaborates with subject matter experts to select the relevant sources of information and translates business requirements into data mining/science outcomes. Presents findings and observations to the team for development of recommendations.
Mines data using state-of-the-art methods. Enhances data collection procedures to include information that is relevant for building data models.
Provides input into data management and modelling infrastructure requirements and adheres to the organisation's infrastructure development processes, including the management of User Acceptance Testing (UAT). Conducts regression testing across all relevant systems as required.
Uses data profiling and visualisation techniques and tools to understand and explain data characteristics that will inform modelling approaches. Communicates data information to business stakeholders with various skill levels and in various roles, presenting trends, correlations and patterns found in complicated datasets in a manner that clearly and concisely conveys meaningful insights, and defends recommendations.