DataSimp

DataSimp community public account. Sharing knowledge and news about data science and related fields.

06/11/2023

The binomial theorem lets you quickly expand expressions of the form (x+y)^n: it describes the algebraic expansion of powers of a binomial.
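As a small illustration, the theorem states that (x+y)^n equals the sum over k = 0..n of C(n, k) · x^(n-k) · y^k. The sketch below computes the coefficients and checks the expansion numerically; the function names `binomial_coefficients` and `expand` are made up for this example.

```python
from math import comb


def binomial_coefficients(n: int) -> list[int]:
    """Coefficients C(n, 0), ..., C(n, n) of the expansion of (x + y)^n."""
    return [comb(n, k) for k in range(n + 1)]


def expand(x: int, y: int, n: int) -> int:
    """Evaluate (x + y)^n term by term using the binomial theorem."""
    return sum(comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))


if __name__ == "__main__":
    print(binomial_coefficients(4))    # coefficients of (x + y)^4
    print(expand(2, 3, 5) == 5 ** 5)   # term-by-term sum matches the direct power
```

For n = 4 the coefficients are 1, 4, 6, 4, 1, so (x+y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4.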

Share & Translate: Chinou Gea (秦陇纪) @2023, SDIS-SM, IFS-AHSC. DSS-SDC, IFS-AHSC. DataScience Data Simplicity Community Facebook Group https://m.facebook.com/groups/290760182638656/

04/09/2023

Foundations and Fundamental Concepts of Mathematics, 3rd Edition, 470 pages, at https://www.dropbox.com/s/5qfmohscryjyvgu/Foundations%20and%20Fundamental%20Concepts%20of%20Mathematics.pdf
This third edition of a popular, well-received text offers undergraduates an opportunity to obtain an overview of the historical roots and the evolution of several areas of mathematics. The selection of topics conveys not only their role in the historical development of mathematics but also their value as bases for understanding the changing nature of mathematics. Among the topics covered in this wide-ranging text are: mathematics before Euclid, Euclid’s Elements, non-Euclidean geometry, algebraic structure, formal axiomatics, the real number system, sets, logic and philosophy, and more. The emphasis on axiomatic procedures provides important background for studying and applying more advanced topics, while the inclusion of the historical roots of both algebra and geometry provides essential information for prospective teachers of school mathematics.
Share & Translate: Chinou Gea (秦陇纪) @2023, IFS-AHSC.

02/03/2023

The seven types of SQL joins reduce, in essence, to five: inner join, side join, empty side join, full outer join, and empty full outer join.
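A minimal sketch of the five kinds, using Python's built-in `sqlite3` module. It assumes a reading the post does not spell out: "side join" means a left or right outer join, and the "empty" variants keep only the rows with no match on the other side. The tables `a` and `b`, their contents, and the `run` helper are invented for the example. The full outer joins are emulated with `LEFT JOIN` plus `UNION ALL` so the code does not depend on a recent SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

LEFT = "SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id"
# Rows of b with no partner in a (the mirror image of the left anti join).
RIGHT_ONLY = "SELECT a.id, b.id FROM b LEFT JOIN a ON a.id = b.id WHERE a.id IS NULL"

QUERIES = {
    # Rows that match on both sides.
    "inner": "SELECT a.id, b.id FROM a JOIN b ON a.id = b.id",
    # All rows of one side, matched where possible (left shown; right is symmetric).
    "side": LEFT,
    # Rows of one side with no match on the other.
    "empty_side": LEFT + " WHERE b.id IS NULL",
    # All rows of both sides.
    "full_outer": LEFT + " UNION ALL " + RIGHT_ONLY,
    # Rows of either side with no match on the other.
    "empty_full_outer": LEFT + " WHERE b.id IS NULL UNION ALL " + RIGHT_ONLY,
}


def run(name: str) -> set:
    """Execute one of the named queries and return its rows as a set."""
    return set(conn.execute(QUERIES[name]).fetchall())


if __name__ == "__main__":
    for name in QUERIES:
        print(name, run(name))
```

Swapping `a` and `b` turns the left variants into the right ones, which is how seven named joins collapse to five under this reading.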

01/30/2023

Give a chance to everyone born into this world, with both human dignity and God-given dignity, so that they are not left with only the pressures of survival.

As millions of people around the world flee their homes due to violence, persecution, war and disaster, we can all stand.
Whoever they are. Wherever they come from. Whenever they are forced to flee.

01/16/2023

A science infographic, "The History of Math", covering 77,000 B.C. to today. Credit to the author.

01/10/2023

If you are really interested in machine learning in industry, 👉 Designing Machine Learning Systems (2022)👈 by Chip Huyen offers a holistic view. Each chapter could be expanded into a book or more 👍
1⃣ 👉 Overview Of Machine Learning Systems
Topping academic leaderboards (state of the art, or SOTA) does not necessarily benefit production.
Latency, throughput, data dynamics, fairness, and interpretability are often afterthoughts in research but critical in production.
2⃣ 👉 Introduction To Machine Learning Systems Design
Mind the gap between model and business metrics.
Developing an ML system is iterative and never-ending, just like traditional software engineering (SWE).
3⃣ 👉 Data Engineering Fundamentals
SQL exploits relational structure and a fixed schema; NoSQL lets applications define the schema.
The OLTP/OLAP distinction is becoming outdated as the boundary between them blurs.
Data can be passed via databases, via microservices, or via streaming services.
4⃣ 👉 Training Data
It covers sampling, labeling, class imbalance, and data augmentation.
5⃣ 👉 Feature Engineering
Handle missing data differently depending on whether it is missing not at random (MNAR) or missing at random (MAR).
Be cautious about data leakage.
Engineer features that are neither too specific nor too generic.
6⃣ 👉 Model Development And Offline Evaluation
When selecting a model, avoid the state-of-the-art trap and human biases, consider performance both now and in the future, use ensembles, and track your experiments.
Monitor trends at ML conferences such as NeurIPS, ICLR, and ICML. Oh, there are also distributed training and AutoML.
Evaluate your model against a baseline and on different populations. Calibrate your model.
7⃣ 👉 Model Deployment And Prediction Service
Batch prediction using a batch pipeline vs. online prediction using a streaming pipeline.
Compress the model for fast inference. Compile and optimize models for edge devices.
8⃣ 👉 Data Distribution Shifts And Monitoring
Google researchers found that 60 out of 96 failures were due to causes not directly related to ML.
Data distribution shifts include covariate, label, and concept shifts.
Monitoring is the act of tracking; observability means setting up the system so that it gives visibility into itself.
9⃣ 👉 Continual Learning And Test In Production
Continual learning means learning in batches or micro-batches.
“Online learning” makes one think of online education.
Continuous learning means your model continuously learns with each incoming sample, and it could also mean continuous delivery of ML, as in CI/CD.
Stateful training vs stateless retraining (no stateful retraining).
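To make the covariate shift item under chapter 8 above a little more concrete, here is a minimal sketch of one common way to monitor a single numeric feature: compare its training and production samples with the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs). This is an illustration, not the book's method; the function name `ks_statistic`, the toy data, and the alert threshold are all assumptions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max absolute gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Toy feature values: production data is the training data shifted by 50.
train = list(range(100))
prod = [v + 50 for v in train]

ALERT_THRESHOLD = 0.2  # hypothetical cutoff; tune per feature in practice

if __name__ == "__main__":
    stat = ks_statistic(train, prod)
    print(f"KS statistic: {stat:.2f}, shift suspected: {stat > ALERT_THRESHOLD}")
```

A statistic near 0 means the two samples look alike; a value near 1 means they barely overlap. Label and concept shifts need labels or model outputs and are not covered by this check.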
✳️Check the book for more. 👇

01/06/2023

Encyclopedia of Distances
Distances, dissimilarities, and divergences, what else! 😀 How it started... How it is going! If you want to study information technologies, especially data science and artificial intelligence, mathematical methods such as information geometry and one of its branches, the measurement of distances, are important and foundational. The book Encyclopedia of Distances and the subject collection will bring you into this area of mathematics. Website: https://franknielsen.github.io/Divergence/index.html
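A short sketch of why the post lists distances and divergences separately: the Euclidean distance is symmetric, while the Kullback-Leibler divergence between two probability distributions generally is not. The two distributions `p` and `q` below are made-up examples.

```python
import math


def euclidean(p, q):
    """Euclidean distance: a true metric, symmetric in its arguments."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions.

    Assumes q is nonzero wherever p is nonzero.
    """
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]

if __name__ == "__main__":
    print(euclidean(p, q), euclidean(q, p))          # same value both ways
    print(kl_divergence(p, q), kl_divergence(q, p))  # two different values
```

Both quantities are zero when the two arguments are identical, but only the first obeys symmetry and the triangle inequality.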

Address

Seattle
Seattle, WA
98125

Website

https://m.facebook.com/groups/290760182638656/
