GitHub Repository¶
In this part, we will talk about:
1. The types of projects that should be put on GitHub
2. The common structure of a data analysis project
Criteria¶
Any project that demonstrates your passion and your ability as a biostatistician or data scientist.
You can pin your most important projects. When you do this, always keep your target audience in mind.
1. Data analysis project: My final project for the course R Intermediate at NYU
2. Data visualization project: The code to generate the map of currently recruiting cancer trials (The map)
3. Self-learning/class notes: My notes on reading the book An Introduction to Statistical Learning
4. Research project: ENAR DataFest project
5. A website: The code for this website
6. A book/course: The code for Introduction to Data Science at Harvard
7. A Python/R package: The source code for the package dplyr
Minimum Structure of a Data Analysis project on GitHub¶
Example: ENAR DataFest project
README.md¶
This is the viewer's first impression of your project. It should have:
- A brief overview
- The question you are trying to answer
- A summary of what you did
- A summary of your result
- Contact information
Raw Data folder¶
A separate folder for your raw data. This is for anyone trying to understand your research who wants to know what you started with.
Folder for intermediate results of data processing¶
This serves as a reference for people who want to replicate your results.
Folder for your deliverables (tables/images)¶
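Taken together, the README and the three folders above can be set up with a short script. The following is only a minimal sketch: the folder names `raw_data`, `processed_data`, and `output`, the function name `scaffold_project`, and the README section titles are hypothetical choices for illustration, not names prescribed here or by GitHub. Use whatever naming fits your project.

```python
from pathlib import Path

# Hypothetical folder names: this page only describes each folder's role.
FOLDERS = ["raw_data", "processed_data", "output"]

# README skeleton with one section per item in the README checklist.
# The section titles are illustrative assumptions.
README_TEMPLATE = """# {title}

## Overview

## Question

## What I did

## Results

## Contact
"""


def scaffold_project(root, title):
    """Create the minimum folder layout and a README skeleton under `root`."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(exist_ok=True)
        # Git does not track empty directories, so add a placeholder file.
        # ".gitkeep" is a common convention, not a Git feature.
        (folder / ".gitkeep").touch()
    readme = root / "README.md"
    # Do not overwrite a README that already exists.
    if not readme.exists():
        readme.write_text(README_TEMPLATE.format(title=title), encoding="utf-8")
    return root
```

Calling `scaffold_project("my-analysis", "My Analysis")` would create the layout in a new `my-analysis` directory, which you can then fill in and push to GitHub.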
And more¶
Please remember that this is the minimum. A more complex project will need a more complex structure.
The good news is that you do not have to create everything from scratch. That is the power of GitHub: somebody somewhere has already created a good structure that you can simply fork and use. Example