Technological advancement in the variety, volume, and velocity of data has created a new business challenge: how to create actionable insights from diverse, large, and fluid data? Data science fills that gap, and the financial sector is no stranger to its merits.

Data Science in Finance

Data science, simply put, is a new discipline that harnesses modern computing power and statistical concepts to create new insights from data that is diverse, extensive, and fast-accumulating. Not only is data science helpful in interpreting existing data, but it also helps create predictive models that can take advantage of market opportunities or even model market behavior. Machine learning makes this possible. A subset of computer science closely related to data science, machine learning is simply the ability of computational systems to recognize patterns and form observations without being explicitly programmed to do so.

Data science has a profound impact on all facets of the financial sector. From asset pricing to risk management and compliance, data science allows financial sector professionals to work with large and very fluid data sets. It also allows data to be transformed into more useful formats that can be visualized, making it easier for users to draw insights from the data.

You might already be asking yourself how the discipline emerged. Before data science became mainstream in the financial sector and beyond, a community in the industry known as quants became popular for their rigorous exercises in data extrapolation and interpretation. However, quants were notorious for using static analytical tools and data sets that did not really capture the dynamic nature of human behavior and markets. As computational and analytical tools became better suited to dynamic data sets, quants slowly evolved into data scientists in finance themselves.

The Three Vs of Big Data

Data science revolves around the presence of big data.
Big data is simply the recent phenomenon of data sets so large that they require computational resources to analyze in order to reveal patterns, trends, and even causalities. Big data can be structured (stored in a fixed field within the same record or file) or unstructured (data that has no central location or is not organized in a pre-defined manner). Beyond being structured or unstructured, there are three defining elements of big data, according to Gartner analyst Doug Laney in 2001: volume, variety, and velocity. In his research publication, 3D Data Management: Controlling Data Volume, Velocity, and Variety, Laney referred to volume as the amount of data to be processed. Volume is measured in terms familiar to us, be it in megabytes or in terabytes. Variety, on the other hand, referred to the degree of data variation. Is the data in a single format (.jpg or .xls), or is it multimedia (photo, video, audio, etc.)? Finally, velocity pertains to the speed at which data can be processed. Is data being collected in real time, with a slight lag, or at periodic intervals? IBM even added a fourth V to the three elements of big data, arguing that the veracity, or uncertainty, of data is just as important as the other three. Data veracity refers to the biases, noise, and abnormalities in the data being stored and analyzed.

The Skillset

The requisite data science skillset is, simply put, a combination of a strong understanding of statistics, proficiency in programming (R, Python, Stata, Tableau, etc.), and strong domain knowledge (in finance, if you are pursuing data science for a career in finance). There is no single order in which to tackle these three elements to be successful in data science. Nonetheless, I must warn that a lack of proficiency in any of these elements will be detrimental to your data science endeavors.
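On the programming element of that skillset, one concrete payoff is replicability: an analysis written as code can be rerun end to end and will produce the same numbers every time. The sketch below is a minimal, invented illustration of the idea in Python, simulating a year of daily returns from a fixed random seed and annualizing their volatility; the function name, seed, and return parameters are all hypothetical choices for demonstration, not a real trading model.

```python
import math
import random
import statistics

def simulated_annual_vol(seed, trading_days=252):
    """Generate synthetic daily returns from a fixed seed and
    annualize their volatility. Because the seed is explicit,
    rerunning the script reproduces the exact same figure --
    the replication benefit of scripted analysis over ad-hoc
    spreadsheet edits."""
    rng = random.Random(seed)  # local RNG so the seed is visible and fixed
    daily = [rng.gauss(0.0005, 0.01) for _ in range(trading_days)]
    return statistics.stdev(daily) * math.sqrt(trading_days)

# Two runs with the same seed yield identical results.
first = simulated_annual_vol(42)
second = simulated_annual_vol(42)
```

Changing the seed changes the simulated sample, but any given seed can be replayed later to audit exactly what the analysis did.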
Data science is conducted by collecting data, analyzing it, and communicating the analysis. Statistics plays an important role in collecting data: statistics, or at least a statistical mindset, helps you design experiments, although it falls short of refining your research questions. Moreover, confidence in your coding capabilities allows you to focus on the data analysis and experiment design rather than on wrestling with what you want the programming packages to do. Since programs such as R and Stata are command-line driven, code also allows you to go back to any exercise, see exactly what was done, and replicate it. Additionally, domain knowledge, meaning your stock knowledge of the subject matter of the analytical exercise, allows you to make good sense of your experiment's output. In finance, it comes down to how well you know your concepts and theories. Finally, the most underrated skill in the discipline is also the most important for unlocking its true value: storytelling is the gel that holds statistics, coding, and domain knowledge together. Analysis is meaningless unless it is conveyed effectively to its intended beneficiaries.

Where in Finance?

Data science is already a mainstream discipline in arenas such as risk management and compliance. With the power of machine learning and computing, risk managers are attempting to stay ahead of unwarranted exposures, concentrations, and the like. Furthermore, the increasing predictive power of data science allows asset managers to draw more inferences. Since data science is accustomed to dynamic models, it feels like a strong fit for the financial market's oscillations. You don't even need to be strong in these hard skills to benefit professionally from data science.
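To make the risk-management use case above a little more concrete, here is a toy sketch of the simplest possible automated check: flagging portfolio positions whose weight breaches a concentration limit. The 10% limit, the ticker names, and the weights are all invented for illustration; a real risk system would of course be far richer than this.

```python
# Toy concentration check: flag positions whose portfolio weight
# exceeds a limit. Limit, tickers, and weights are hypothetical.

def concentration_breaches(weights, limit=0.10):
    """Return the positions whose portfolio weight exceeds `limit`."""
    return {name: w for name, w in weights.items() if w > limit}

portfolio = {"AAA": 0.22, "BBB": 0.08, "CCC": 0.05, "DDD": 0.15}
breaches = concentration_breaches(portfolio)
# breaches -> {'AAA': 0.22, 'DDD': 0.15}
```

Even a trivial rule like this, run continuously against live position data, illustrates the shift from periodic manual review toward the automated, data-driven monitoring described above.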
A mere mention of your interest in tools like Python, Matlab, and Tableau can be enough for prospective employers, and even HR management systems, to pick up your resume.