Getting Started in Data Science

There's no denying that 'data scientist' is a hot job title to have right now, and for good reason. It's a tremendously fun and challenging field to be in, and despite all of the often undeserved hoopla that surrounds it, data scientists are doing some pretty amazing things. So it's no surprise that many people are clamoring to find out how to become data scientists. It is tempting to think that being a good data scientist just means knowing some programming languages and some algorithms. To be fair, there are many well-established algorithms with great and increasingly user-friendly implementations in many programming languages, and it's incredibly easy to estimate a linear or logistic regression. But one of the primary things that separates a data scientist from someone just building models is the ability to think carefully about things like endogeneity, causal inference, and experimental and quasi-experimental design. Data scientists must understand data generating processes and reason through how misspecifying them could influence or undermine the inferences they draw from their analyses.

Math. There is no getting around it. You simply must study math and statistics. Even if you have never taken calculus and can only take one math course right now, you should eventually take a basic multivariate calculus course if you want to read research papers that implement new algorithms, and you should pick up some knowledge of linear regression.

Statistics. The vast majority of a data scientist's job revolves around statistical inference. As mentioned above, linear regressions are incredibly simple to estimate, yet there are some core assumptions that, if not met, can render your results sketchy at best and completely invalid at worst. Training in statistics will teach you to know these assumptions, to understand what happens when they're not met, and to know what to do about it. In fact, training in statistics usually starts with "here are some simple linear models" for a course or two. Nearly every course after that tries to figure out how to estimate models when the data violate the assumptions of linear models in different ways: autocorrelation in time series data, non-independent observations due to time or spatial clustering, dependent variables that are counts with lots of zeros, and so on.
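To make the point about assumptions a little more concrete, here is a minimal sketch of one such check. It fits a simple linear regression by ordinary least squares and then computes the Durbin-Watson statistic on the residuals, once with independent errors and once with autocorrelated errors. The data are simulated, and the function names (ols_fit, durbin_watson, simulate, residual_dw) and the parameter values are illustrative assumptions, not anything from a particular library.

```python
import random


def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest little first-order autocorrelation."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


def simulate(rho, n=500, seed=0):
    """Simulate y = 1 + 2x + e, where e is an AR(1) process with coefficient rho.

    rho = 0 gives independent errors; rho near 1 gives strongly
    autocorrelated errors, as in many time series. (Assumed toy setup.)
    """
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    errors = []
    prev = 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0, 1)
        errors.append(prev)
    y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, errors)]
    return x, y


def residual_dw(rho):
    """Fit the regression on simulated data and return the residual DW statistic."""
    x, y = simulate(rho)
    intercept, slope = ols_fit(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return durbin_watson(residuals)


dw_independent = residual_dw(rho=0.0)
dw_autocorrelated = residual_dw(rho=0.9)
print(f"independent errors:    DW = {dw_independent:.2f}")
print(f"autocorrelated errors: DW = {dw_autocorrelated:.2f}")
```

In both runs the regression itself is just as easy to estimate. Only the residual diagnostic reveals that the second data set breaks the independence assumption, which is the kind of thing statistical training teaches you to look for.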

Experiments and causal inference. You should also be well-versed in thinking about research design. If you're going to be in charge of your company's split tests and experiments, you'll want to master this stuff. Judea Pearl's Causality is probably the most well-known and referenced work, but it's not for beginners. You could do really well for yourself by starting with a basic research methods textbook, especially one from the social sciences, as they're often concerned with doing experiments when you're not in a laboratory setting. Designing Social Inquiry, referred to by many as "KKV", is a really good starting point for some of this.

Machine learning. Machine learning and statistics have significant overlap, but while statistics is often concerned with precise and unbiased estimates of parameters, machine learning is usually focused on making accurate predictions on unseen data. The barriers to entry in reading about machine learning are significantly higher than for many other topics, as machine learning is an applied subfield of computer science. Since the math requirements for CS majors are often non-trivial, a working knowledge of multivariate calculus is often assumed. In fact, one of the fundamental estimation techniques used in many machine learning algorithms, stochastic gradient descent, assumes you know what a gradient is.
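As a rough illustration of why the gradient matters, here is a minimal sketch of stochastic gradient descent for a one-variable linear model with squared-error loss. The data are simulated, and the function name sgd_linear_regression, the learning rate, and the number of epochs are illustrative assumptions rather than recommended settings.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.01, epochs=50, seed=0):
    """Estimate w and b in y = w * x + b with stochastic gradient descent.

    For one observation the loss is 0.5 * (w * x + b - y) ** 2.
    Its gradient with respect to w is error * x and with respect to b
    is error, where error = w * x + b - y.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # Visit the observations in a fresh random order each epoch.
        rng.shuffle(indices)
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Step against the gradient computed from this single observation.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Simulated data with an assumed true slope of 3.0 and intercept of 0.5.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + 0.5 + data_rng.gauss(0, 0.1) for x in xs]

w, b = sgd_linear_regression(xs, ys)
print(f"estimated slope = {w:.2f}, estimated intercept = {b:.2f}")
```

Each update uses the gradient from a single observation instead of the whole data set, which is what makes the method "stochastic". The two update lines are exactly the partial derivatives of the loss, so reading them requires knowing what a gradient is.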

Software. R vs. Python. Julia. Scala. Clojure. Java. C! There are so many languages out there. Should you learn to program? Yes. Do you need to be a master of a language out of the gate? No. In fact, you can do a tremendous amount in Excel, though I wouldn't recommend it. It doesn't matter what language you learn first. I'll repeat that for emphasis and dramatic Fight Club effect. It doesn't matter what language you learn first. Pick a language and learn it. Write bad code that breaks. Just learn it. Any language can do all of the things that you'll need as a beginner. By the time you figure out what your language is bad at or can't do, you'll already know enough about programming and about languages to know which one you need to learn next to solve your problem. That being said, do I think it's a GOOD idea to pick JavaScript or C++ as a first language to do interactive data analysis? No. R and Python are popular for a reason. Programmers and data scientists are a fad-driven bunch, and new programming languages come into vogue and disappear quickly.

Finally, once you're well on your way to becoming an expert in all of these different areas, you'll want to get a job. DANGER! You need to be very careful about finding a job as a data scientist. The same buzz and hype that probably got your attention is getting the attention of recruiters and hiring managers everywhere. "WE NEED A DATA SCIENTIST!" is ringing out from human resources departments across the world. But you need to find an organization that can see beyond the hype, understands what is and isn't possible for a data scientist to do, and will value your input as well as your caution. Beware job listings that read like they're copied and pasted from Hacker News. I hope you've found this helpful.
