Well, it seems my last blog post about things I'd like to do in NYC has been blown to bits by COVID. While in lockdown, I've been going through my reading list and curating some of the good content I have come across over the years.
The list
I have grouped the links into a few key areas:
- Management and process
- Processes and checklists
- Machine learning
- Metrics
- Text analysis
- SQL
- Web
- Data quality
Each entry starts with a header, then gives the link, some tags, and a brief description of the article's content.
It might be cool to update the site to have a separate section for the reading list. It would also be cool if the tags I currently have set up for blog posts included these tags and were sorted alphabetically.
Management and process
12 manager readmes
https://hackernoon.com/12-manager-readmes-from-silicon-valleys-top-tech-companies-26588a660afe
- management, communication, teaming
- Ideas on how to communicate to others how best to work together
Busy person patterns
https://hillside.net/plop/2006/Papers/Library/PLoP%20Busy%20Person%20Pattern%20v8.pdf
- time management, get stuff done
- Exploration of common strategies to address getting work done. Appendix especially useful.
The Guerrilla guide to interviewing
https://www.joelonsoftware.com/2006/10/25/the-guerrilla-guide-to-interviewing-version-30/
- management
- How to interview and hire talent
Yes, and...
https://tomcritchlow.com/2019/11/18/yes-and/
- consulting, presenting
- Leveraging lessons from improv acting to think on your feet faster in the business realm
- Four detailed sections not read
Improve your social skills
https://www.improveyoursocialskills.com/foundations/where-are-you-going
- social skills, personal
- Writing and lessons to help you reflect on your social situation and goals
- Detailed sections not read
Why I keep a research blog
http://gregorygundersen.com/blog/2020/01/12/why-research-blog/
- writing, learning
- A PhD researcher's reflection on writing as a valuable method of learning
Processes and checklists
The Joel test, 12 steps to better code
https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/
- process, software development
- Twelve key software development practices businesses should implement
Some items from my reliability list
http://rachelbythebay.com/w/2019/07/21/reliability/
- software development, process
- Rachel's site, like Joel's mentioned above, is a goldmine of blog posts. This one covers some more considerations for building reliable software
Do nothing scripting
https://blog.danslimmon.com/2019/07/15/do-nothing-scripting-the-key-to-gradual-automation/
- process, efficiency
- An approach to partially automate repetitive tasks and reduce working memory requirement
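To make the idea concrete, here is a minimal sketch of what a do-nothing script could look like. The step names, the report task, and the `run` helper are all made up for illustration and are not taken from the linked post. Each step only prints an instruction and waits for a human to confirm, so any single step can later be swapped for real automation.

```python
from typing import Callable, Dict, List

# A step takes some shared context and returns the instruction to show.
Step = Callable[[Dict[str, str]], str]


def export_report(ctx: Dict[str, str]) -> str:
    # Hypothetical manual step: nothing is automated yet.
    return f"Export the {ctx['month']} report from the dashboard as CSV."


def email_report(ctx: Dict[str, str]) -> str:
    # Hypothetical manual step: could later be replaced by code that sends the email.
    return f"Email the CSV to {ctx['recipient']}."


STEPS: List[Step] = [export_report, email_report]


def run(steps: List[Step], ctx: Dict[str, str],
        wait: Callable[[str], str] = input) -> int:
    """Print each instruction in order and pause until the operator confirms."""
    completed = 0
    for number, step in enumerate(steps, start=1):
        print(f"Step {number}: {step(ctx)}")
        wait("Press Enter when done... ")
        completed += 1
    return completed
```

Calling `run(STEPS, {"month": "May", "recipient": "team@example.com"})` walks through the steps interactively. The `wait` parameter exists only so the pause can be replaced in tests.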
Machine learning
Rules of machine learning
http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
- software development, process, machine learning, data analysis
- Big guide from Google about best practices for working with production machine learning pipelines
ML vs Econometrics y(x) vs betas
https://scholar.harvard.edu/files/sendhil/files/jep.31.2.87.pdf
- data analysis, statistics, machine learning
- Good overview of differences in approach for machine learning vs econometric analysis. Betas (the coefficient estimates) are more important in econometrics, as is understanding data assumptions
Metrics
Optipedia
https://www.optimizely.com/optimization-glossary/
- metrics
- Great repository of different metric concepts and processes in web A/B testing
Why not accuracy?
https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models
- metrics, data analytics
- Great answer to why we eschew accuracy for other measures
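One common way to see the problem, which is not necessarily the argument the linked answer makes, is a heavily imbalanced dataset. The sketch below uses made-up labels and a "model" that always predicts the majority class. It scores well on accuracy while never finding a single positive case.

```python
# Made-up labels for illustration: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A useless "model" that always predicts the majority class.
y_pred = [0] * 100


def accuracy(actual, predicted):
    """Share of predictions that match the true label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recall(actual, predicted, positive=1):
    """Share of true positive cases that the model found."""
    found = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    total = sum(a == positive for a in actual)
    return found / total if total else 0.0


print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"recall:   {recall(y_true, y_pred):.2f}")
```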
Why you should summarise your data with the geometric mean
https://medium.com/@JLMC/understanding-three-simple-statistics-for-data-visualizations-2619dbb3677a
- data analysis, metrics
- Discussion of alternative ways to summarise your data with a single average. The geometric mean would appear to work well for highly imbalanced or skewed data
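A quick sketch of the difference, using made-up values that span several orders of magnitude. It relies on `statistics.geometric_mean`, which is in the Python standard library from version 3.8.

```python
from statistics import geometric_mean, mean

# Made-up, heavily skewed values for illustration.
values = [1, 10, 100, 1000]

arithmetic = mean(values)          # pulled upwards by the largest value
geometric = geometric_mean(values) # the fourth root of the product of the values

print(f"arithmetic mean: {arithmetic}")
print(f"geometric mean:  {geometric:.2f}")
```

The arithmetic mean lands far above three of the four values, while the geometric mean sits in the middle of the values on a log scale.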
Text analysis
The absolute minimum software developers should know about Unicode
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
- software development, text data
- Working with text data and conversions as a relatively new programmer can be dangerous, similar to working with dates and time zones.
Unicode in Python, demystified
http://farmdev.com/talks/unicode/
- software development, text data
- Similar to the above, good resource to understand Unicode and best practice
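A small sketch of the kind of trap both articles warn about, written for Python 3. The sample string is my own. The same text has a different length in characters and in bytes, and decoding bytes with the wrong encoding either produces garbage or fails outright.

```python
text = "café"

# Encoding turns characters into bytes; the result depends on the encoding.
utf8_bytes = text.encode("utf-8")      # "é" takes two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # "é" takes one byte in Latin-1

print(len(text), len(utf8_bytes), len(latin1_bytes))

# Decoding UTF-8 bytes as Latin-1 does not fail, it silently gives mojibake.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)

# Decoding Latin-1 bytes as UTF-8 fails, because the lone byte is not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)
```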
How to code and analyse verbatim text
https://measuringu.com/code-verbatim/
- data analysis, text data
- Decent beginner's guide to processing basic features from text data to use in other analysis (such as predicting NPS)
Self supervised representation learning in NLP
https://amitness.com/2020/05/self-supervised-learning-nlp/
- text data, data analysis
- Overview of more advanced machine learning concepts for text data and text pre-processing
SQL
PostgreSQL exercises
https://pgexercises.com/
- SQL, learning
- Good introductory exercises for PostgreSQL, can do it all in the browser
SQL murder mystery
https://mystery.knightlab.com/walkthrough.html
- SQL, learning
- Interactive game using SQL, can complete in browser or with downloaded database
Web
Why do we need flask, celery, redis?
https://news.ycombinator.com/item?id=22901856
- software development
- Both the link and the comment section are good. Explains the concepts behind the three technologies and how the process is similar to ordering takeout food
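The takeout analogy can be sketched with the standard library alone. This is only an illustration of the pattern and does not use the real Flask, Celery, or Redis APIs. The function names and the order are made up. A `queue.Queue` stands in for the broker, a thread stands in for the worker, and a plain function stands in for the web endpoint that takes the order and returns straight away.

```python
import queue
import threading

orders = queue.Queue()  # stand-in for the broker (Redis in the real stack)
results = {}            # stand-in for a result store


def take_order(order_id, dish):
    """Front counter (the web app's role): accept the order and return at once."""
    orders.put((order_id, dish))
    return f"Order {order_id} received"


def kitchen():
    """Worker (Celery's role): pull orders off the queue and do the slow work."""
    while True:
        item = orders.get()
        if item is None:  # sentinel value that tells the worker to stop
            orders.task_done()
            break
        order_id, dish = item
        results[order_id] = f"{dish} is ready"
        orders.task_done()


worker = threading.Thread(target=kitchen, daemon=True)
worker.start()

ack = take_order(1, "pad thai")  # the customer gets a receipt immediately
orders.join()                    # wait until the kitchen has finished the order
orders.put(None)                 # close the kitchen
worker.join()

print(ack)
print(results[1])
```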
Data quality
Starting a data quality checklist
https://medium.com/@TWB_BI/starting-a-data-quality-checklist-2d500e97ab5c
- data analysis, data cleaning
- Another good checklist guide of what to look for and request when working with new data sources
Quartz guide to bad data
https://qz.com/572338/the-quartz-guide-to-bad-data/
- data analysis, data cleaning
- Reasonable checklist for typical problems that can arise in data sources and how to proceed