Data science ideas: initial


1. Data science ideas: initial

2. Class project
A number of assignments will be for an individual or group project in data science.

You should find a problem for which you wish to apply data science techniques.

3. Scope
When I have an idea for an identified project, I start with the following. I usually begin with the minimum needed to get started, then add requirements as needed, following YAGNI (You Ain't Gonna Need It).

4. Software systems development
This concept is related to the Pareto principle (the 80-20 rule: roughly 80% of the results come from 20% of the effort).

5. Phases
This page has ideas for data science projects, investigations, etc.

Any remarks in square brackets are from projects on which I have worked.

6. Geographic information systems

7. Bayesian classification
Any yes-no question where data is available.
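
As a rough illustration of answering a yes-no question from data, here is a minimal naive Bayes sketch using only the Python standard library. The data set (outlook, windy, and whether to play outside) and the function names `train_naive_bayes` and `predict` are hypothetical, chosen only for this example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count how often each feature value occurs with each label."""
    label_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> number of occurrences
    value_counts = {label: defaultdict(Counter) for label in label_counts}
    distinct_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            distinct_values[i].add(value)
    return label_counts, value_counts, distinct_values

def predict(model, row):
    """Return the label with the highest log posterior score."""
    label_counts, value_counts, distinct_values = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(row):
            # Add-one (Laplace) smoothing avoids log(0) for unseen values.
            count = value_counts[label][i][value]
            score += math.log((count + 1) / (n + len(distinct_values[i])))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical data: (outlook, windy) -> "Will we play outside?"
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "no"), ("sunny", "no"), ("rainy", "yes")]
labels = ["yes", "yes", "no", "no", "yes", "no"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "yes")))
```

The "naive" part is the assumption that the features are independent given the answer, which is why the per-feature probabilities are simply multiplied (added in log space).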

8. Topic modeling
Documents (Customers), Vocabulary (Products), Words used in documents (Products bought by a customer).
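
The analogy above can be made concrete by building the count matrix that a topic model would take as input: one row per customer (document), one column per product (vocabulary word). This sketch only builds that matrix and does not fit a topic model; the customers and purchases are hypothetical.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> products bought.
purchases = {
    "alice": ["bread", "milk", "bread"],
    "bob": ["nails", "hammer"],
    "carol": ["milk", "eggs", "bread"],
}

# The "vocabulary" is the sorted set of all products seen.
vocabulary = sorted({p for items in purchases.values() for p in items})

# Each row counts how often a customer bought each product,
# just as a document-term matrix counts words per document.
matrix = {
    customer: [Counter(items)[product] for product in vocabulary]
    for customer, items in purchases.items()
}

print(vocabulary)
for customer, row in matrix.items():
    print(customer, row)
```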

9. Intellectual property forensic analysis

10. Dimensionality reduction

11. Time series data
Time series analysis involves data recorded over time, which often has trends or periodic (seasonal) cycles.
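
One simple way to look for a periodic cycle is the sample autocorrelation: compare the series with a shifted copy of itself and see which shift (lag) matches best. The sketch below assumes a synthetic series with a known period of 12 steps (for example, monthly data with a yearly cycle); the function name `autocorrelation` is introduced here for illustration.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance

# Synthetic series with a cycle that repeats every 12 steps.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]

# The lag with the highest autocorrelation is a candidate period.
best_lag = max(range(1, 25), key=lambda lag: autocorrelation(series, lag))
print(best_lag)
```

Real data would also contain trend and noise, so the peak is usually less clean than in this synthetic case.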

12. Data collection, analysis, and display

13. Decision trees and random forests
Decision trees group data in a tree structure, splitting on the most important/prevalent features first and the least important/prevalent last.
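
One common way to decide which feature is "most important" is information gain: the feature whose split reduces label entropy the most goes at the top of the tree. The sketch below ranks features this way on a small hypothetical data set; it computes only the ranking, not a full tree, and other criteria (such as Gini impurity) are also used in practice.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction from splitting the data on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical data: should we play outside?
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "yes", "no", "no", "yes", "no"]

# Features sorted from most to least informative.
ranked = sorted(
    ["outlook", "windy"],
    key=lambda f: information_gain(rows, labels, f),
    reverse=True,
)
print(ranked)
```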

14. Natural language processing

15. Text processing

16. Regression

17. Clustering
Clustering is used to partition a set of data into groups.
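
As one example of a clustering method, here is a minimal k-means sketch in plain Python. The points and the choice of starting centers are hypothetical; real implementations usually pick starting centers more carefully and stop when the centers no longer move.

```python
def kmeans(points, centers, iterations=10):
    """Partition points into groups around the nearest center."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
print(clusters)
```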

18. Gaussian mixtures
Gaussian mixture models are used to infer multiple (normal) distributions in aggregate data.
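
To illustrate, the sketch below fits two normal distributions to one-dimensional aggregate data with the EM (expectation-maximization) algorithm. The data is synthetic (two groups centered near 0 and 10), the starting values are a simple assumption, and the function names are introduced only for this example.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def fit_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    means = [min(data), max(data)]  # simple starting guess
    stds = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: how responsible is each component for each point?
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p)
            resp.append([value / total for value in p])
        # M-step: re-estimate each component from its weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # guard against collapse
    return weights, means, stds

# Synthetic aggregate data drawn from two normal distributions.
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(10, 1) for _ in range(200)]
weights, means, stds = fit_two_gaussians(data)
print(weights, means, stds)
```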

19. Statistical distributions

20. Kernel density estimation
KDE (Kernel Density Estimation) estimates a smooth probability density from a sample of data points.
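
A minimal sketch of kernel density estimation with a Gaussian kernel: each sample contributes a small bell curve, and the estimate is their average. The samples and the bandwidth of 0.5 are hypothetical; choosing the bandwidth well is the main practical difficulty.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at a point x."""
    n = len(samples)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum one Gaussian bump per sample, centered on that sample.
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        ) / norm

    return density

# Hypothetical samples with two groups, near 1 and near 5.
samples = [1.0, 1.2, 0.8, 5.0, 5.1]
density = gaussian_kde(samples, bandwidth=0.5)

print(density(1.0), density(3.0), density(5.0))
```

The estimate is higher near the samples (around 1 and 5) than in the gap between them (around 3), and it integrates to approximately 1 like any probability density.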

21. Neural networks
Neural networks are intended to recognize patterns in data, for example to answer a yes-no question. [best buy for computer given competing data]
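
The simplest building block of a neural network is a single artificial neuron (a perceptron), which gives a yes-no answer from weighted inputs. The sketch below trains one neuron on the logical AND pattern; this is a toy illustration under that assumption, not a full multi-layer network.

```python
def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Train a single neuron with the perceptron learning rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # 0 when the answer was right
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

def answer(weights, bias, inputs):
    """Yes (1) or no (0) for the given inputs."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# Pattern to learn: answer yes only when both inputs are 1 (logical AND).
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
print([answer(weights, bias, inputs) for inputs, _ in samples])
```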

22. Manifold learning
Manifold learning provides a way to do NLDR (Nonlinear Dimensionality Reduction).

23. Support Vector Machines
An SVM (Support Vector Machine) is a mathematical way to do classification and regression.
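
For the classification case, a linear SVM can be sketched as finding a separating line with a wide margin by minimizing the hinge loss. The example below uses plain sub-gradient descent on hypothetical 2-D points; the parameter names (`regularization`, `learning_rate`) and their values are assumptions for this sketch, and production libraries use more refined solvers and support nonlinear kernels.

```python
def train_linear_svm(points, labels, regularization=0.01,
                     learning_rate=0.1, epochs=200):
    """Linear SVM trained by sub-gradient descent on the hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin: pull the boundary away from it.
                w = [wi - learning_rate * (regularization * wi - y * xi)
                     for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Point is safely classified: only shrink the weights.
                w = [wi - learning_rate * regularization * wi for wi in w]
    return w, b

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, clearly separable points labeled +1 or -1.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([classify(w, b, p) for p in points])
```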

24. Deep learning and TensorFlow
Deep learning attempts to reduce the amount of "feature extraction" needed to analyze data.

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications. https://tensorflow.org

25. Data sources
