Testing Machine Learning Systems: Code, Data and Models
## Intuition
In this lesson, we'll learn how to test code, data and models in order to build machine learning systems that we can reliably iterate on. Tests are a way to ensure that something works as intended. We're incentivized to implement tests and discover sources of error as early as possible in the development cycle so that we can reduce [downstream costs](https://assets.deepsource.io/39ed384/images/blog/cost-of-fixing-bugs/chart.jpg) and wasted time. Once we've designed our tests, we can automatically execute them every time we change or add to our codebase.
> tip
>
> We highly recommend exploring this lesson _after_ completing the previous lessons, since the topics (and code) are developed iteratively. However, we did create the [testing-ml](https://github.com/GokuMohandas/testing-ml) repository for a quick overview via an interactive notebook.
### Types of tests
The major types of tests that are utilized at different points in the development cycle are:
1. `Unit tests`: tests on individual components that each have a [single responsibility](https://en.wikipedia.org/wiki/Single-responsibility_principle) (ex. a function that filters a list).
2. `Integration tests`: tests on the combined functionality of individual components (ex. data processing).
3. `System tests`: tests on the design of a system for the expected outputs given inputs (ex. training, inference, etc.).
4. `Acceptance tests`: tests to verify that requirements have been met, usually referred to as User Acceptance Testing (UAT).
5. `Regression tests`: tests based on errors we've seen before to ensure that new changes don't reintroduce them.
While ML systems are probabilistic in nature, they are composed of many deterministic components that can be tested in a similar manner as traditional software systems. The distinction begins when we move from testing code to testing the [data](https://franztao.github.io/2022/10/01/Testing//./#data) and [models](https://franztao.github.io/2022/10/01/Testing//./#models).
![Types of tests](https://upload-images.jianshu.io/upload_images/27840083-744c45174eee9b23.png)
> There are many other types of functional and non-functional tests as well, such as smoke tests (quick health checks), performance tests (load, stress), security tests, etc., but we can generalize all of these under the system tests above.
### How should we test?
The framework to use when composing tests is the [Arrange Act Assert](http://wiki.c2.com/?ArrangeActAssert) methodology.
- `Arrange`: set up the different inputs to test on.
- `Act`: apply the inputs on the component we want to test.
- `Assert`: confirm that we received the expected output.
> `Cleaning` is an unofficial fourth step of this methodology because it's important to not leave behind residue from previous tests that may affect subsequent tests. We can use packages such as [pytest-randomly](https://github.com/pytest-dev/pytest-randomly) to test against state dependency by executing tests in random order.
In Python, there are many tools, such as [unittest](https://docs.python.org/3/library/unittest.html), [pytest](https://docs.pytest.org/en/stable/), etc., that make it easy to implement our tests while adhering to the _Arrange Act Assert_ framework. These tools come with powerful built-in functionality such as parametrization, filters and more, to test many conditions at scale.
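As a minimal illustration (not from our project), here's what a test following _Arrange Act Assert_ could look like for a small, hypothetical `dedupe` helper:
```
def dedupe(items):
    """Return unique items while preserving order (hypothetical helper)."""
    return list(dict.fromkeys(items))


def test_dedupe():
    # Arrange: set up the input to test on
    items = ["nlp", "cv", "nlp"]
    # Act: apply the input to the component we want to test
    result = dedupe(items)
    # Assert: confirm that we received the expected output
    assert result == ["nlp", "cv"]
```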
### What should we test?
When _arranging_ our inputs and _asserting_ our expected outputs, what aspects of our inputs and outputs should we be testing for?
- **inputs**: data types, format, length, edge cases (min/max, small/large, etc.)
- **outputs**: data types, formats, exceptions, intermediary and final outputs
> 👉 We'll cover specific details pertaining to what to test for the [data](https://franztao.github.io/2022/10/01/Testing//./#data) and [models](https://franztao.github.io/2022/10/01/Testing//./#models) below.
## Best practices
Regardless of the framework we use, it's important to strongly tie testing into the development process.
- `atomic`: when creating functions and classes, we need to ensure that they have a [single responsibility](https://en.wikipedia.org/wiki/Single-responsibility_principle) so that we can easily test them. If not, we'll need to split them into more granular components.
- `compose`: when we create new components, we want to compose tests to validate their functionality. It's a great way to ensure reliability and catch errors early on.
- `reuse`: we should maintain central repositories where core functionality is tested at the source and reused across many projects. This significantly reduces testing efforts for each new project's codebase.
- `regression`: we want to account for new errors we come across with a regression test so we can ensure we don't reintroduce the same errors in the future (see the sketch after this list).
- `coverage`: we want to ensure [100% coverage](https://franztao.github.io/2022/10/01/Testing//#coverage) for our codebase. This doesn't mean writing a test for every single line of code, but rather accounting for every single line.
- `automate`: in the event we forget to run our tests before committing to a repository, we want to auto-run tests whenever we make changes to our codebase. We'll learn how to do this locally using [pre-commit hooks](https://franztao.github.io/2022/10/01/Testing//../pre-commit/) and remotely via [GitHub actions](https://franztao.github.io/2022/10/01/Testing//../cicd/#github-actions) in subsequent lessons.
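For instance, a regression test might look like the sketch below (the `strip_html` helper and the bug it pins down are purely illustrative, not from our project):
```
import re


def strip_html(text):
    """Hypothetical helper that previously crashed when text was None."""
    return re.sub(r"<[^>]+>", "", text or "")


def test_strip_html_handles_none():  # regression test pinning the reported bug
    assert strip_html(None) == ""
```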
## Test-driven development
[Test-driven development (TDD)](https://en.wikipedia.org/wiki/Test-driven_development) is the process of writing a test before writing the functionality, to ensure that tests are always written. This is in contrast to writing functionality first and then composing tests afterwards. Here are our thoughts on this:
- it's good to write tests as we progress, but it doesn't by itself guarantee 100% correctness.
- initial time should be spent on design before ever getting into the code or tests.
Perfect coverage doesn't mean that our application is error-free if those tests aren't meaningful and don't encompass the field of possible inputs, intermediates and outputs. Therefore, we should work towards better design and agility when facing errors, quickly resolving them and writing test cases around them to avoid reintroducing them next time.
## Application
In our [application](https://github.com/GokuMohandas/mlops-course), we'll be testing the code, data and models. We'll start by creating a separate `tests` directory with a `code` subdirectory for testing our `tagifai` scripts. We'll create subdirectories for testing the [data](https://franztao.github.io/2022/10/01/Testing//#data) and [models](https://franztao.github.io/2022/10/01/Testing//#models) below.
```
mkdir tests
cd tests
mkdir app config model tagifai
touch
cd ../
```
```
tests/
└── code/
│   ├── test_data.py
│   ├── test_evaluate.py
│   ├── test_main.py
│   ├── test_predict.py
│   └── test_utils.py
```
Feel free to write the tests and organize them in these scripts _after_ learning about all the concepts in this lesson. It's recommended to use the [`tests`](https://github.com/GokuMohandas/mlops-course/tree/main/tests) directory on GitHub as a reference.
> Note that the `tagifai/train.py` script does not have its respective `tests/code/test_train.py`. Some scripts have large functions with dependencies (ex. artifacts), such as `train.train()`, `train.optimize()`, `predict.predict()`, etc., and these are exercised via `tests/code/test_main.py`.
## 🧪 Pytest
We'll be using [pytest](https://docs.pytest.org/en/stable/) as our testing framework for its powerful built-in features such as [parametrization](https://franztao.github.io/2022/10/01/Testing//#parametrize), [fixtures](https://franztao.github.io/2022/10/01/Testing//#fixtures), [markers](https://franztao.github.io/2022/10/01/Testing//#markers) and more.
```
pip install pytest==7.1.2
```
Since this testing package isn't integral to the core machine learning operations, let's create a separate list in our `setup.py` and add it to `extras_require`:
```
# setup.py
test_packages = [
"pytest==7.1.2",
]
# Define our package
setup(
...
extras_require={
"dev": docs_packages + style_packages + test_packages,
"docs": docs_packages,
"test": test_packages,
},
)
```
We created an explicit `test` option because a user may want to download only the testing packages. We'll see this in action when we run tests via GitHub Actions in our [CI/CD workflows](https://franztao.github.io/2022/10/01/Testing//../cicd/).
### Configuration
Pytest expects tests to be organized under a `tests` directory by default. However, we can also add to our existing `pyproject.toml` file to configure any other test directories as well. Once in the directory, pytest looks for python scripts starting with `test_*.py`, but we can configure it to read any other file patterns as well.
```
# Pytest
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
```
### Assertions
Let's see what a sample test and its results look like. Assume we have a simple function that determines whether or not a fruit is crisp:
```
# food/fruits.py
def is_crisp(fruit):
    if fruit:
        fruit = fruit.lower()
    if fruit in ["apple", "watermelon", "cherries"]:
        return True
    elif fruit in ["orange", "mango", "strawberry"]:
        return False
    else:
        raise ValueError(f"{fruit} not in known list of fruits.")
    return False
```
To test this function, we can use [assert statements](https://docs.pytest.org/en/stable/assert.html) to map inputs with expected outputs. The statement following the word `assert` must return True.
```
# tests/food/test_fruits.py
def test_is_crisp():
    assert is_crisp(fruit="apple")
    assert is_crisp(fruit="Apple")
    assert not is_crisp(fruit="orange")
    with pytest.raises(ValueError):
        is_crisp(fruit=None)
        is_crisp(fruit="pear")
```
> We can also have assertions about [exceptions](https://docs.pytest.org/en/stable/assert.html#assertions-about-expected-exceptions), like we do in lines 6-8 where all the operations under the with statement are expected to raise the specified exception.
> Example of `assert` being used in our project:
>
> ```
> # tests/code/test_evaluate.py
> def test_get_metrics():
>     y_true = np.array([0, 0, 1, 1])
>     y_pred = np.array([0, 1, 0, 1])
>     classes = ["a", "b"]
>     performance = evaluate.get_metrics(y_true=y_true, y_pred=y_pred, classes=classes, df=None)
>     assert performance["overall"]["precision"] == 2/4
>     assert performance["overall"]["recall"] == 2/4
>     assert performance["class"]["a"]["precision"] == 1/2
>     assert performance["class"]["a"]["recall"] == 1/2
>     assert performance["class"]["b"]["precision"] == 1/2
>     assert performance["class"]["b"]["recall"] == 1/2
> ```
### Execution
We can execute our tests above using several different levels of granularity:
```
python3 -m pytest # all tests
python3 -m pytest tests/food # tests under a directory
python3 -m pytest tests/food/test_fruits.py # tests for a single file
python3 -m pytest tests/food/test_fruits.py::test_is_crisp # tests for a single function
```
Running our specific test above would produce the following output:
```
python3 -m pytest tests/food/test_fruits.py::test_is_crisp
```
```
tests/food/test_fruits.py::test_is_crisp . [100%]
```
Had any of the assertions in this test failed, we would see the failed assertions, along with the expected and actual output from the function.
```
tests/food/test_fruits.py F [100%]
def test_is_crisp():
> assert is_crisp(fruit="orange")
E AssertionError: assert False
E + where False = is_crisp(fruit='orange')
```
> tip
>
> It's important to test for the variety of inputs and expected outputs that we outlined above, and to **never assume that a test is trivial**. In our example above, it's important that we test for both "apple" and "Apple" in case our function didn't account for casing!
### Classes
We can also test classes and their respective functions by creating test classes. Within our test class, we can optionally define [functions](https://docs.pytest.org/en/stable/xunit_setup.html) which will automatically be executed when we set up or tear down a class instance or use a class method.
- `setup_class`: set up the state for any class instance.
- `teardown_class`: tear down the state created in setup_class.
- `setup_method`: called before every method to set up any state.
- `teardown_method`: called after every method to tear down any state.
```
class Fruit(object):
    def __init__(self, name):
        self.name = name


class TestFruit(object):
    @classmethod
    def setup_class(cls):
        """Set up the state for any class instance."""
        pass

    @classmethod
    def teardown_class(cls):
        """Teardown the state created in setup_class."""
        pass

    def setup_method(self):
        """Called before every method to setup any state."""
        self.fruit = Fruit(name="apple")

    def teardown_method(self):
        """Called after every method to teardown any state."""
        del self.fruit

    def test_init(self):
        assert self.fruit.name == "apple"
```
We can execute all the tests for our class by specifying the class name:
```
python3 -m pytest tests/food/test_fruits.py::TestFruit
```
```
tests/food/test_fruits.py::TestFruit . [100%]
```
> Example of a test `class` in our project:
>
> ```
> # tests/code/test_data.py
> class TestLabelEncoder:
>     @classmethod
>     def setup_class(cls):
>         """Called before every class initialization."""
>         pass
>
>     @classmethod
>     def teardown_class(cls):
>         """Called after every class initialization."""
>         pass
>
>     def setup_method(self):
>         """Called before every method."""
>         self.label_encoder = data.LabelEncoder()
>
>     def teardown_method(self):
>         """Called after every method."""
>         del self.label_encoder
>
>     def test_empty_init(self):
>         label_encoder = data.LabelEncoder()
>         assert label_encoder.index_to_class == {}
>         assert len(label_encoder.classes) == 0
>
>     def test_dict_init(self):
>         class_to_index = {"apple": 0, "banana": 1}
>         label_encoder = data.LabelEncoder(class_to_index=class_to_index)
>         assert label_encoder.index_to_class == {0: "apple", 1: "banana"}
>         assert len(label_encoder.classes) == 2
>
>     def test_len(self):
>         assert len(self.label_encoder) == 0
>
>     def test_save_and_load(self):
>         with tempfile.TemporaryDirectory() as dp:
>             fp = Path(dp, "label_encoder.json")
>             self.label_encoder.save(fp=fp)
>             label_encoder = data.LabelEncoder.load(fp=fp)
>             assert len(label_encoder.classes) == 0
>
>     def test_str(self):
>         assert str(data.LabelEncoder()) == "<LabelEncoder(num_classes=0)>"
>
>     def test_fit(self):
>         label_encoder = data.LabelEncoder()
>         label_encoder.fit(["apple", "apple", "banana"])
>         assert "apple" in label_encoder.class_to_index
>         assert "banana" in label_encoder.class_to_index
>         assert len(label_encoder.classes) == 2
>
>     def test_encode_decode(self):
>         class_to_index = {"apple": 0, "banana": 1}
>         y_encoded = [0, 0, 1]
>         y_decoded = ["apple", "apple", "banana"]
>         label_encoder = data.LabelEncoder(class_to_index=class_to_index)
>         label_encoder.fit(["apple", "apple", "banana"])
>         assert np.array_equal(label_encoder.encode(y_decoded), np.array(y_encoded))
>         assert label_encoder.decode(y_encoded) == y_decoded
> ```
### Parametrize
So far, in our tests, we've had to create individual assert statements to validate different combinations of inputs and expected outputs. However, there's a bit of redundancy here because the inputs always feed into our functions as arguments and the outputs are compared with our expected outputs. To remove this redundancy, pytest has the [`@pytest.mark.parametrize`](https://docs.pytest.org/en/stable/parametrize.html) decorator which allows us to represent our inputs and outputs as parameters.
```
@pytest.mark.parametrize(
    "fruit, crisp",
    [
        ("apple", True),
        ("Apple", True),
        ("orange", False),
    ],
)
def test_is_crisp_parametrize(fruit, crisp):
    assert is_crisp(fruit=fruit) == crisp
```
```
python3 -m pytest tests/food/test_is_crisp_parametrize.py ... [100%]
```
1. `[Line 2]`: define the names of the parameters under the decorator, ex. "fruit, crisp" (note that this is one string).
2. `[Lines 3-7]`: provide a list of combinations of values for the parameters from Step 1.
3. `[Line 9]`: pass in the parameter names to the test function.
4. `[Line 10]`: include the necessary assert statements, which will be executed for each of the combinations in the list from Step 2.
Similarly, we could pass in an exception as the expected result as well:
```
@pytest.mark.parametrize(
    "fruit, exception",
    [
        ("pear", ValueError),
    ],
)
def test_is_crisp_exceptions(fruit, exception):
    with pytest.raises(exception):
        is_crisp(fruit=fruit)
```
> Example of `parametrize` in our project:
>
> ```
> # tests/code/test_data.py
> from tagifai import data
> @pytest.mark.parametrize(
>     "text, lower, stem, stopwords, cleaned_text",
>     [
>         ("Hello worlds", False, False, [], "Hello worlds"),
>         ("Hello worlds", True, False, [], "hello worlds"),
>         ...
>     ],
> )
> def test_preprocess(text, lower, stem, stopwords, cleaned_text):
>     assert (
>         data.clean_text(
>             text=text,
>             lower=lower,
>             stem=stem,
>             stopwords=stopwords,
>         )
>         == cleaned_text
>     )
> ```
### Fixtures
Parametrization allows us to reduce redundancy inside test functions, but how about reducing redundancy across different test functions? For example, suppose that different test functions all have a dataframe as an input. Here, we can use pytest's built-in [fixture](https://docs.pytest.org/en/stable/fixture.html), which is a function that is executed before the test function.
```
@pytest.fixture
def my_fruit():
    fruit = Fruit(name="apple")
    return fruit


def test_fruit(my_fruit):
    assert my_fruit.name == "apple"
```
> Note that the name of the fixture and the input to the test function are identical (`my_fruit`).
We can also apply fixtures to classes, where the fixture function will be invoked when any method in the class is called.
```
@pytest.mark.usefixtures("my_fruit")
class TestFruit:
    ...
```
> Example of `fixtures` in our project:
>
> In our project, we use fixtures to efficiently pass a set of inputs (ex. a Pandas DataFrame) to the different testing functions that need them (cleaning, splitting, etc.).
>
> ```
> # tests/code/test_data.py
> @pytest.fixture(scope="module")
> def df():
>     data = [
>         {"title": "a0", "description": "b0", "tag": "c0"},
>         {"title": "a1", "description": "b1", "tag": "c1"},
>         {"title": "a2", "description": "b2", "tag": "c1"},
>         {"title": "a3", "description": "b3", "tag": "c2"},
>         {"title": "a4", "description": "b4", "tag": "c2"},
>         {"title": "a5", "description": "b5", "tag": "c2"},
>     ]
>     df = pd.DataFrame(data * 10)
>     return df
>
>
> @pytest.mark.parametrize(
>     "labels, unique_labels",
>     [
>         ([], ["other"]),  # no set of approved labels
>         (["c3"], ["other"]),  # no overlap b/w approved/actual labels
>         (["c0"], ["c0", "other"]),  # partial overlap
>         (["c0", "c1", "c2"], ["c0", "c1", "c2"]),  # complete overlap
>     ],
> )
> def test_replace_oos_labels(df, labels, unique_labels):
>     replaced_df = data.replace_oos_labels(
>         df=df.copy(), labels=labels, label_col="tag", oos_label="other"
>     )
>     assert set(replaced_df.tag.unique()) == set(unique_labels)
> ```
> Note that we don't use the `df` fixture directly inside our parametrized test function (we pass in `df.copy()`). If we did, then we'd be changing `df`'s values after each parametrization.
>
> > Tip
> >
> > When creating fixtures around datasets, it's best practice to create a simplified version that still adheres to the same schema. For example, in the dataframe fixture above, we're creating a smaller dataframe that still has the same column names as our actual dataframe. While we could have loaded the actual dataset, it can cause issues as the dataset changes over time (new labels, removed labels, very large datasets, etc.).
Fixtures can have different scopes depending on how we want to use them. For example, our `df` fixture has the module scope because we don't want to recreate it after every test but, instead, create it only once for all the tests in our module (`tests/test_data.py`).
- `function`: fixture is destroyed after every test. `[default]`
- `class`: fixture is destroyed after the last test in the class.
- `module`: fixture is destroyed after the last test in the module (script).
- `package`: fixture is destroyed after the last test in the package.
- `session`: fixture is destroyed after the last test of the session.
Functions are the lowest level of scope while sessions are the highest. The highest level scoped fixtures are executed first.
> Typically, when we have many fixtures in a particular test file, we can organize them all in a `fixtures.py` script and invoke them as needed.
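To make the scopes concrete, here is a minimal sketch (not from the project) of declaring fixtures at different scopes; `Fruit` is the toy class from above, and `tmp_path_factory` is pytest's built-in session-scoped fixture:
```
import pandas as pd
import pytest


@pytest.fixture  # scope="function" is the default: recreated for every test
def fresh_fruit():
    return Fruit(name="apple")


@pytest.fixture(scope="module")  # created once per module (script)
def df():
    return pd.DataFrame([{"title": "a0", "description": "b0", "tag": "c0"}])


@pytest.fixture(scope="session")  # created once for the entire test session
def artifacts_dir(tmp_path_factory):
    return tmp_path_factory.mktemp("artifacts")
```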
### Markers
We've been able to execute our tests at various levels of granularity (all tests, script, function, etc.), but we can create custom granularity by using [markers](https://docs.pytest.org/en/stable/mark.html). We've already used one type of marker (parametrize), but there are several other [builtin markers](https://docs.pytest.org/en/stable/mark.html#mark) as well. For example, the [`skipif`](https://docs.pytest.org/en/stable/skipping.html#id1) marker allows us to skip execution of a test if a condition is met. For example, suppose we only wanted to test training our model if a GPU is available:
```
@pytest.mark.skipif(
    not torch.cuda.is_available(),
    reason="Full training tests require a GPU."
)
def test_training():
    pass
```
We can also create our own custom markers, aside from a few [reserved](https://docs.pytest.org/en/stable/reference.html#marks) marker names.
```
@pytest.mark.fruits
def test_fruit(my_fruit):
    assert my_fruit.name == "apple"
```
We can execute them by using the `-m` flag, which requires a (case-sensitive) marker expression like below:
```
pytest -m "fruits" # runs all tests marked with `fruits`
pytest -m "not fruits" # runs all tests besides those marked with `fruits`
```
> tip
>
> The proper way to use markers is to explicitly list the ones we've created in our [pyproject.toml](https://github.com/GokuMohandas/mlops-course/blob/main/pyproject.toml) file. Here we can specify that all markers must be defined in this file with the `--strict-markers` flag and then declare our markers (with some information about them) in the `markers` list:
>
> ```
> @pytest.mark.training
> def test_train_model():
>     assert ...
> ```
> ```
> # Pytest
> [tool.pytest.ini_options]
> testpaths = ["tests"]
> python_files = "test_*.py"
> addopts = "--strict-markers --disable-pytest-warnings"
> markers = [
> "training: tests that involve training",
> ]
> ```
> Once we do this, we can view all of our existing markers by executing `pytest --markers`, and we'll receive an error when we try to use a new marker that hasn't been defined here.
### Coverage
As we develop tests for our application's components, it's important to know how well we're covering our codebase and to know if we've missed anything. We can use the [Coverage](https://coverage.readthedocs.io/) library to track and visualize how much of our codebase our tests account for. With pytest, it's even easier to use this package thanks to the [pytest-cov](https://pytest-cov.readthedocs.io/) plugin.
```
pip install pytest-cov==2.10.1
```
And we'll add it to our `setup.py` script:
```
# setup.py
test_packages = [
"pytest==7.1.2",
"pytest-cov==2.10.1"
]
```
```
python3 -m pytest --cov tagifai --cov app --cov-report html
```
![pytest](https://upload-images.jianshu.io/upload_images/27840083-28b8d8e511a73d8d.png)
Here we're asking for coverage of all the code in our tagifai and app directories and to generate the report in HTML format. When we run this, we'll see the tests from our tests directory executing while the coverage plugin keeps track of which lines in our application are being executed. Once the tests are complete, we can view the generated report (default is `htmlcov/index.html`) and click on individual files to see which parts were not covered by any tests. This is especially useful when we forget to test for certain conditions, exceptions, etc.
![test coverage](https://upload-images.jianshu.io/upload_images/27840083-88727a764a09e446.png)
> warning
>
> Though we have 100% coverage, this does not mean that our application is perfect. Coverage only indicates that a piece of code was executed in a test, not necessarily that every part of it was tested, let alone tested thoroughly. Therefore, coverage should **never** be used as a representation of correctness. However, it is very useful to keep coverage at 100% so that we know when new functionality has yet to be tested. In our CI/CD lesson, we'll see how to use GitHub Actions to make 100% coverage a requirement when pushing to specific branches.
### Exclusions
Sometimes it doesn't make sense to write tests to cover every single line in our application, yet we still want to account for these lines so we can maintain 100% coverage. We have two levels of purview when applying exclusions:
1. Excusing lines by adding the comment `# pragma: no cover, <MESSAGE>`:
```
if trial: # pragma: no cover, optuna pruning
    trial.report(val_loss, epoch)
    if trial.should_prune():
        raise optuna.TrialPruned()
```
2. Excluding files by specifying them in our `pyproject.toml` configuration:
```
# Pytest coverage
[tool.coverage.run]
omit = ["app/gunicorn.py"]
```
> The key here is to be able to add a justification for these exclusions through comments so our team can follow our reasoning.
Now that we have a foundation for testing traditional software, let's dive into testing our data and models in the context of machine learning systems.
## Data
So far, we've used unit and integration tests to test the functions that interact with our data, but we haven't tested the validity of the data itself. We're going to use the [great expectations](https://github.com/great-expectations/great_expectations) library to test what our data is expected to look like. It's a library that allows us to create expectations as to what our data should look like in a standardized way. It also provides modules to seamlessly connect with backend data sources such as local file systems, S3, databases, etc. Let's explore the library by implementing the expectations we'll need for our application.
> 👉 Follow along with the interactive notebook in the [**testing-ml**](https://github.com/GokuMohandas/testing-ml) repository as we implement the concepts below.
```
pip install great-expectations==0.15.15
```
And we'll add it to our `setup.py` script:
```
# setup.py
test_packages = [
"pytest==7.1.2",
"pytest-cov==2.10.1",
"great-expectations==0.15.15"
]
```
First we'll load the data we'd like to apply our expectations on. We can load our data from a variety of [sources](https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/connect_to_data_overview) (filesystem, database, cloud, etc.), which we can then wrap inside a [Dataset module](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/dataset/index.html) (Pandas / Spark DataFrame, SQLAlchemy).
```
import great_expectations as ge
import json
import pandas as pd
from urllib.request import urlopen
```
```
# Load labeled projects
projects = pd.read_csv("https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/projects.csv")
tags = pd.read_csv("https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/tags.csv")
df = ge.dataset.PandasDataset(pd.merge(projects, tags, on="id"))
print (f"{len(df)} projects")
df.head(5)
```
| | id | created_on | title | description | tag |
| --- | --- | ------------------- | ------------------------------------------------- | --------------------------------------------------- | ---------------------- |
| 0 | 6 | 2020-02-20 06:43:18 | Comparison between YOLO and RCNN on real world... | Bringing theory to experiment is cool. We can ... | computer-vision |
| 1 | 7 | 2020-02-20 06:47:21 | Show, Infer & Tell: Contextual Inference for C... | The beauty of the work lies in the way it arch... | computer-vision |
| 2 | 9 | 2020-02-24 16:24:45 | Awesome Graph Classification | A collection of important graph embedding, cla... | graph-learning |
| 3 | 15 | 2020-02-28 23:55:26 | Awesome Monte Carlo Tree Search | A curated list of Monte Carlo tree search papers... | reinforcement-learning |
| 4 | 19 | 2020-03-03 13:54:31 | Diffusion to Vector | Reference implementation of Diffusion2Vec (Com... | graph-learning |
### Expectations
When it comes to creating expectations as to what our data should look like, we want to think about our entire dataset and all the features (columns) within it.
```
# Presence of specific features
df.expect_table_columns_to_match_ordered_list(
    column_list=["id", "created_on", "title", "description", "tag"]
)

# Unique combinations of features (detect data leaks!)
df.expect_compound_columns_to_be_unique(column_list=["title", "description"])

# Missing values
df.expect_column_values_to_not_be_null(column="tag")

# Unique values
df.expect_column_values_to_be_unique(column="id")

# Type adherence
df.expect_column_values_to_be_of_type(column="title", type_="str")

# List (categorical) / range (continuous) of allowed values
tags = ["computer-vision", "graph-learning", "reinforcement-learning",
        "natural-language-processing", "mlops", "time-series"]
df.expect_column_values_to_be_in_set(column="tag", value_set=tags)
```
Each of these expectations will create an output with details about success or failure, expected and observed values, the expectation raised, etc. For example, the expectation `df.expect_column_values_to_be_of_type(column="title", type_="str")` would produce the following if successful:
```
{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "success": true,
  "meta": {},
  "expectation_config": {
    "kwargs": {
      "column": "title",
      "type_": "str",
      "result_format": "BASIC"
    },
    "meta": {},
    "expectation_type": "_expect_column_values_to_be_of_type__map"
  },
  "result": {
    "element_count": 955,
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_nonmissing": 0.0,
    "partial_unexpected_list": []
  }
}
```
And if we have a failed expectation (ex. `df.expect_column_values_to_be_of_type(column="title", type_="int")`), we'd receive this output (notice the counts and examples for why the expectation failed):
```
{
  "success": false,
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "expectation_config": {
    "meta": {},
    "kwargs": {
      "column": "title",
      "type_": "int",
      "result_format": "BASIC"
    },
    "expectation_type": "_expect_column_values_to_be_of_type__map"
  },
  "result": {
    "element_count": 955,
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_count": 955,
    "unexpected_percent": 100.0,
    "unexpected_percent_nonmissing": 100.0,
    "partial_unexpected_list": [
      "How to Deal with Files in Google Colab: What You Need to Know",
      "Machine Learning Methods Explained (+ Examples)",
      "OpenMMLab Computer Vision",
      "..."
    ]
  },
  "meta": {}
}
```
These are just a few of the different expectations that we can create. Be sure to explore all of the [expectations](https://greatexpectations.io/expectations/), including [custom expectations](https://docs.greatexpectations.io/docs/guides/expectations/creating_custom_expectations/overview/). Here are some other popular expectations that don't pertain to our specific dataset but are widely applicable (a brief sketch follows this list):
- feature value relationships with other feature values → `expect_column_pair_values_a_to_be_greater_than_b`
- the row count (exact or range) of samples → `expect_table_row_count_to_be_between`
- value statistics (mean, std, median, max, min, sum, etc.) → `expect_column_mean_to_be_between`
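As a hedged sketch of what these could look like (the numeric columns and thresholds here are placeholders, since our projects dataset has no such feature pairs):
```
# Row count (exact or range) of samples
df.expect_table_row_count_to_be_between(min_value=500, max_value=2000)

# Value statistics (ex. mean) on a hypothetical numeric column
df.expect_column_mean_to_be_between(column="num_downloads", min_value=0, max_value=1e6)

# Feature value relationships between two hypothetical columns
df.expect_column_pair_values_a_to_be_greater_than_b(column_A="end_date", column_B="start_date")
```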
### Organization
When it comes to organizing expectations, it's recommended to start with table-level expectations and then move on to the individual feature columns.
#### Table expectations
```
# columns
df.expect_table_columns_to_match_ordered_list(
column_list=["id", "created_on", "title", "description", "tag"])
# data leak
df.expect_compound_columns_to_be_unique(column_list=["title", "description"])
```
#### Column expectations
```
# id
df.expect_column_values_to_be_unique(column="id")
# created_on
df.expect_column_values_to_not_be_null(column="created_on")
df.expect_column_values_to_match_strftime_format(
column="created_on", strftime_format="%Y-%m-%d %H:%M:%S")
# title
df.expect_column_values_to_not_be_null(column="title")
df.expect_column_values_to_be_of_type(column="title", type_="str")
# description
df.expect_column_values_to_not_be_null(column="description")
df.expect_column_values_to_be_of_type(column="description", type_="str")
# tag
df.expect_column_values_to_not_be_null(column="tag")
df.expect_column_values_to_be_of_type(column="tag", type_="str")
```
We can group all of the expectations together to create an [Expectation Suite](https://docs.greatexpectations.io/en/latest/reference/core_concepts/expectations/expectations.html#expectation-suites) object, which we can use to validate any Dataset module.
```
# Expectation suite
expectation_suite = df.get_expectation_suite(discard_failed_expectations=False)
print(df.validate(expectation_suite=expectation_suite, only_return_failures=True))
```
```
{
"success": true,
"results": [],
"statistics": {
"evaluated_expectations": 11,
"successful_expectations": 11,
"unsuccessful_expectations": 0,
"success_percent": 100.0
},
"evaluation_parameters": {}
}
```
### Projects
So far we've worked with the Great Expectations library at the ad hoc script / notebook level, but we can further organize our expectations by creating a Project.
```
cd tests
great_expectations init
```
This will set up a `tests/great_expectations` directory with the following structure:
```
tests/great_expectations/
├── checkpoints/
├── expectations/
├── plugins/
├── uncommitted/
├── .gitignore
└── great_expectations.yml
```
#### Data source
The first step is to establish our `datasource`, which tells Great Expectations where our data lives:
```
great_expectations datasource new
```
```
What data would you like Great Expectations to connect to?
1. Files on a filesystem (for processing with Pandas or Spark) 👈
2. Relational database (SQL)
```
```
What are you processing your files with?
1. Pandas 👈
2. PySpark
```
```
Enter the path of the root directory where the data files are stored: ../data
```
#### Suites
Create expectations manually, interactively or automatically and save them as suites (a set of expectations for a particular data asset).
```
great_expectations suite new
```
```
How would you like to create your Expectation Suite?
1. Manually, without interacting with a sample batch of data (default)
2. Interactively, with a sample batch of data 👈
3. Automatically, using a profiler
```
```
Which data asset (accessible by data connector "default_inferred_data_connector_name") would you like to use?
1. labeled_projects.csv
2. projects.csv 👈
3. tags.csv
```
```
Name the new Expectation Suite [projects.csv.warning]: projects
```
This will open up an interactive notebook where we can add expectations. Copy and paste the expectations below and run all of the cells. Repeat this step for `tags.csv` and `labeled_projects.csv`.
![great expectations suite notebook](https://upload-images.jianshu.io/upload_images/27840083-7f66bc4773236bf1.png)
> Expectations for `projects.csv`:
>
> Table expectations
>
> ```
> # Presence of features
> validator.expect_table_columns_to_match_ordered_list(
> column_list=["id", "created_on", "title", "description"])
> validator.expect_compound_columns_to_be_unique(column_list=["title", "description"]) # data leak
>
> ```
> Column expectations:
>
> ```
> # id
> validator.expect_column_values_to_be_unique(column="id")
>
> # create_on
> validator.expect_column_values_to_not_be_null(column="created_on")
> validator.expect_column_values_to_match_strftime_format(
> column="created_on", strftime_format="%Y-%m-%d %H:%M:%S")
>
> # title
> validator.expect_column_values_to_not_be_null(column="title")
> validator.expect_column_values_to_be_of_type(column="title", type_="str")
>
> # description
> validator.expect_column_values_to_not_be_null(column="description")
> validator.expect_column_values_to_be_of_type(column="description", type_="str")
>
> ```
> Expectations for `tags.csv`:
>
> Table expectations
>
> ```
> # Presence of features
> validator.expect_table_columns_to_match_ordered_list(column_list=["id", "tag"])
>
> ```
> Column expectations:
>
> ```
> # id
> validator.expect_column_values_to_be_unique(column="id")
>
> # tag
> validator.expect_column_values_to_not_be_null(column="tag")
> validator.expect_column_values_to_be_of_type(column="tag", type_="str")
>
> ```
> Expectations for `labeled_projects.csv`:
>
> Table expectations
>
> ```
> # Presence of features
> validator.expect_table_columns_to_match_ordered_list(
> column_list=["id", "created_on", "title", "description", "tag"])
> validator.expect_compound_columns_to_be_unique(column_list=["title", "description"]) # data leak
>
> ```
> Column expectations:
>
> ```
> # id
> validator.expect_column_values_to_be_unique(column="id")
>
> # create_on
> validator.expect_column_values_to_not_be_null(column="created_on")
> validator.expect_column_values_to_match_strftime_format(
> column="created_on", strftime_format="%Y-%m-%d %H:%M:%S")
>
> # title
> validator.expect_column_values_to_not_be_null(column="title")
> validator.expect_column_values_to_be_of_type(column="title", type_="str")
>
> # description
> validator.expect_column_values_to_not_be_null(column="description")
> validator.expect_column_values_to_be_of_type(column="description", type_="str")
>
> # tag
> validator.expect_column_values_to_not_be_null(column="tag")
> validator.expect_column_values_to_be_of_type(column="tag", type_="str")
>
> ```
All of these expectations are saved under `great_expectations/expectations`:
```
great_expectations/
├── expectations/
│   ├── labeled_projects.csv
│   ├── projects.csv
│   └── tags.csv
```
We can also list the suites:
`great_expectations suite list`
```
Using v3 (Batch Request) API
3 Expectation Suites found:
- labeled_projects
- projects
- tags
```
To edit a suite, we can execute the following CLI command:
`great_expectations suite edit `
#### Checkpoints
Create checkpoints, where a suite of expectations is applied to a specific data asset. This is a great way of programmatically applying checkpoints to our existing and new data sources.
```
cd tests
great_expectations checkpoint new CHECKPOINT_NAME
```
So for our project, it would be:
```
great_expectations checkpoint new projects
great_expectations checkpoint new tags
great_expectations checkpoint new labeled_projects
```
Each of these checkpoint creation calls will launch a notebook where we can define which suites to apply the checkpoint to. We have to change the lines for `data_asset_name` (which data asset to run the checkpoint suite on) and `expectation_suite_name` (the name of the suite to use). For example, the `projects` checkpoint would use the `projects.csv` data asset and the `projects` suite.
> Checkpoints can share the same suite, as long as the schema and validations are applicable.
```
my_checkpoint_name = "projects" # This was populated from your CLI command.
yaml_config = f"""
name: {my_checkpoint_name}
config_version: 1.0
class_name: SimpleCheckpoint
run_name_template: "%Y%m%d-%H%M%S-my-run-name-template"
validations:
  - batch_request:
      datasource_name: local_data
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: projects.csv
      data_connector_query:
        index: -1
    expectation_suite_name: projects
"""
print(yaml_config)
```
> Verify the auto-populated values
>
> Be sure that `datasource_name`, `data_asset_name` and `expectation_suite_name` are all what we want them to be (Great Expectations autofills them based on assumptions which may not always be accurate).
We can repeat these same steps for the `tags` and `labeled_projects` checkpoints, and then we're ready to execute them:
```
great_expectations checkpoint run projects
great_expectations checkpoint run tags
great_expectations checkpoint run labeled_projects
```
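While we ran the checkpoints via the CLI here, a minimal sketch of invoking the same checkpoints programmatically (for example, from a pipeline task) might look like the following, assuming the v3 (Batch Request) API and the `tests/great_expectations` project directory created above:
```
# Sketch: running the checkpoints programmatically (assumes the project layout above)
from great_expectations.data_context import DataContext

context = DataContext(context_root_dir="tests/great_expectations")
for checkpoint_name in ["projects", "tags", "labeled_projects"]:
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    assert result.success, f"Checkpoint {checkpoint_name} failed!"
```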
![great expectations checkpoints](https://upload-images.jianshu.io/upload_images/27840083-14165996614fed0a.png)
At the end of this lesson, we'll create a target in our `Makefile` that runs all of these tests (code, data and models), and we'll automate their execution in our [pre-commit lesson](https://franztao.github.io/2022/10/26/Pre_commit/).
> note
>
> We've applied expectations to our source dataset, but there are many other key areas to test the data as well, such as the intermediate outputs from processes like cleaning, augmentation, splitting, preprocessing, tokenization, etc.
### Documentation
When we create expectations using the CLI application, Great Expectations automatically generates documentation for our tests. It also stores information about validation runs and their results. We can launch the generated data documentation with the following command: `great_expectations docs build`
![data docs](https://upload-images.jianshu.io/upload_images/27840083-47f1aa6977b84948.png)
> By default, Great Expectations stores our expectations, results and metrics locally, but for production we'd want to set up remote [metadata stores](https://docs.greatexpectations.io/docs/guides/setup/#metadata-stores).
### Production
The advantage of using a library such as great expectations, as opposed to isolated assert statements, is that we can:
- reduce redundant efforts in creating tests across data modalities
- automatically create testing [checkpoints](https://franztao.github.io/2022/10/01/Testing/#checkpoints) to execute as our dataset grows
- automatically generate [documentation and reports](https://franztao.github.io/2022/10/01/Testing/#documentation) on expectations and runs
- easily connect with backend data sources such as local file systems, S3, databases, etc.
Many of these expectations will be executed when the data is extracted, loaded and transformed during our [DataOps workflows](https://franztao.github.io/2022/11/10/Orchestration/#dataops). Typically, the data will be extracted from a source ([database](https://franztao.github.io/2022/11/10/Data_stack/#database), [API](https://franztao.github.io/2022/10/01/RESTful_API/), etc.) and loaded into a data system (ex. a [data warehouse](https://franztao.github.io/2022/11/10/Data_stack/#data-warehouse)) before being transformed there (ex. using [dbt](https://www.getdbt.com/)) for downstream applications. Throughout these tasks, Great Expectations checkpoint validations can be run to ensure the validity of the data and of the changes applied to it. We'll see a simplified version of where data validation should occur in our data workflows in the [orchestration lesson](https://franztao.github.io/2022/11/10/Orchestration/#dataops).
![ELT pipelines in production](https://upload-images.jianshu.io/upload_images/27840083-ab3a8c04aa359258.png)
> If you're not familiar with the different data systems, we cover them in more detail in the [data stack lesson](https://franztao.github.io/2022/11/10/Data_stack/).
## Models
The final aspect of testing ML systems involves testing our models during training, evaluation, inference and deployment.
### Training
We want to write tests iteratively while we're developing our training pipelines so that we can catch errors quickly. This is especially important because, unlike traditional software, ML systems can run to completion without throwing any exceptions / errors but still produce incorrect systems. We also want to catch errors quickly to save on time and compute.
- Check the shapes and values of model output
```
assert model(inputs).shape == torch.Size([len(inputs), num_classes])
```
- Check for decreasing loss after one batch of training
```
assert epoch_loss < prev_epoch_loss
```
- Overfit on a batch
```
accuracy = train(model, inputs=batches[0])
assert accuracy == pytest.approx(0.95, abs=0.05)  # 0.95 ± 0.05
```
- Train to completion (tests early stopping, saving, etc.)
```
train(model)
assert learning_rate >= min_learning_rate
assert artifacts
```
- On different devices
```
assert train(model, device=torch.device("cpu"))
assert train(model, device=torch.device("cuda"))
```
> note
>
> You can mark compute-intensive tests with a pytest marker and only execute them when a change is being made to a part of the system that affects the model.
>
> ```
> @pytest.mark.training
> def test_train_model():
>     ...
>
> ```
### Behavioral testing
Behavioral testing is the process of testing input data and expected outputs while treating the model as a black box (model-agnostic evaluation). These tests don't necessarily have to be adversarial in nature but are more along the lines of the types of perturbations we may expect to see in the real world once our model is deployed. A landmark paper on this topic is [Beyond Accuracy: Behavioral Testing of NLP Models with CheckList](https://arxiv.org/abs/2005.04118), which breaks down behavioral testing into three types of tests:
- `invariance`: Changes should not affect outputs.
```
# INVariance via verb injection (changes should not affect outputs)
tokens = ["revolutionized", "disrupted"]
texts = [f"Transformers applied to NLP have {token} the ML field." for token in tokens]
predict.predict(texts=texts, artifacts=artifacts)
```
```
['natural-language-processing', 'natural-language-processing']
```
- `directional`: Changes should affect outputs.
```
# DIRectional expectations (changes with known outputs)
tokens = ["text classification", "image classification"]
texts = [f"ML applied to {token}." for token in tokens]
predict.predict(texts=texts, artifacts=artifacts)
```
```
['natural-language-processing', 'computer-vision']
```
- `minimum functionality`: Simple combinations of inputs and expected outputs.
```
# Minimum Functionality Tests (simple input/output pairs)
tokens = ["natural language processing", "mlops"]
texts = [f"{token} is the next big wave in machine learning." for token in tokens]
predict.predict(texts=texts, artifacts=artifacts)
```
```
['natural-language-processing', 'mlops']
```
> Adversarial tests
>
> Each of these types of tests can also include adversarial tests, such as testing with common biased tokens or noisy tokens, etc.
>
> ```
> texts = [
> "CNNs for text classification.", # CNNs are typically seen in computer-vision projects
> "This should not produce any relevant topics." # should predict `other` label
> ]
> predict.predict(texts=texts, artifacts=artifacts)
>
> ```
And we can convert these tests into systematic parametrized tests for our system:
```
mkdir tests/model
touch tests/model/test_behavioral.py
```
```
# tests/model/test_behavioral.py
from pathlib import Path

import pytest

from config import config
from tagifai import main, predict


@pytest.fixture(scope="module")
def artifacts():
    run_id = open(Path(config.CONFIG_DIR, "run_id.txt")).read()
    artifacts = main.load_artifacts(run_id=run_id)
    return artifacts


@pytest.mark.parametrize(
    "text_a, text_b, tag",
    [
        (
            "Transformers applied to NLP have revolutionized machine learning.",
            "Transformers applied to NLP have disrupted machine learning.",
            "natural-language-processing",
        ),
    ],
)
def test_inv(text_a, text_b, tag, artifacts):
    """INVariance via verb injection (changes should not affect outputs)."""
    tag_a = predict.predict(texts=[text_a], artifacts=artifacts)[0]["predicted_tag"]
    tag_b = predict.predict(texts=[text_b], artifacts=artifacts)[0]["predicted_tag"]
    assert tag_a == tag_b == tag
```
View `tests/model/test_behavioral.py`
```
from pathlib import Path

import pytest

from config import config
from tagifai import main, predict


@pytest.fixture(scope="module")
def artifacts():
    run_id = open(Path(config.CONFIG_DIR, "run_id.txt")).read()
    artifacts = main.load_artifacts(run_id=run_id)
    return artifacts


@pytest.mark.parametrize(
    "text, tag",
    [
        (
            "Transformers applied to NLP have revolutionized machine learning.",
            "natural-language-processing",
        ),
        (
            "Transformers applied to NLP have disrupted machine learning.",
            "natural-language-processing",
        ),
    ],
)
def test_inv(text, tag, artifacts):
    """INVariance via verb injection (changes should not affect outputs)."""
    predicted_tag = predict.predict(texts=[text], artifacts=artifacts)[0]["predicted_tag"]
    assert tag == predicted_tag


@pytest.mark.parametrize(
    "text, tag",
    [
        (
            "ML applied to text classification.",
            "natural-language-processing",
        ),
        (
            "ML applied to image classification.",
            "computer-vision",
        ),
        (
            "CNNs for text classification.",
            "natural-language-processing",
        )
    ],
)
def test_dir(text, tag, artifacts):
    """DIRectional expectations (changes with known outputs)."""
    predicted_tag = predict.predict(texts=[text], artifacts=artifacts)[0]["predicted_tag"]
    assert tag == predicted_tag


@pytest.mark.parametrize(
    "text, tag",
    [
        (
            "Natural language processing is the next big wave in machine learning.",
            "natural-language-processing",
        ),
        (
            "MLOps is the next big wave in machine learning.",
            "mlops",
        ),
        (
            "This should not produce any relevant topics.",
            "other",
        ),
    ],
)
def test_mft(text, tag, artifacts):
    """Minimum Functionality Tests (simple input/output pairs)."""
    predicted_tag = predict.predict(texts=[text], artifacts=artifacts)[0]["predicted_tag"]
    assert tag == predicted_tag
```
### Inference
When our model is deployed, most users will be using it for inference (directly / indirectly), so it's very important that we test all aspects of it.
#### Loading artifacts
This is the first time we're not loading our components from in-memory, so we want to ensure that the required artifacts (model weights, encoders, config, etc.) can all be loaded.
```
artifacts = main.load_artifacts(run_id=run_id)
assert isinstance(artifacts["label_encoder"], data.LabelEncoder)
...
```
#### Prediction
Once we have our artifacts loaded, we're ready to test our prediction pipelines. We should test samples with just one input as well as a batch of inputs (ex. padding can sometimes have unintended consequences).
```
# test our API call directly
data = {
"texts": [
{"text": "Transfer learning with transformers for text classification."},
{"text": "Generative adversarial networks in both PyTorch and TensorFlow."},
]
}
response = client.post("/predict", json=data)
assert response.status_code == HTTPStatus.OK
assert response.request.method == "POST"
assert len(response.json()["data"]["predictions"]) == len(data["texts"])
...
```
## Makefile
Let's create a target in our `Makefile` that will allow us to execute all of our tests with one call:
```
# Test
.PHONY: test
test:
	pytest -m "not training"
	cd tests && great_expectations checkpoint run projects
	cd tests && great_expectations checkpoint run tags
	cd tests && great_expectations checkpoint run labeled_projects
```
```
make test
```
## Testing vs. monitoring
We'll conclude by talking about the similarities and distinctions between testing and [monitoring](https://franztao.github.io/2022/10/01/Testing//../monitoring/). They're both integral parts of the ML development pipeline and depend on each other for iteration. Testing ensures that our system (code, data and models) meets the expectations that we've established offline. Monitoring involves ensuring that these expectations continue to hold online on live production data, while also ensuring that the data distributions remain [comparable](https://franztao.github.io/2022/10/01/Testing//../monitoring/#measuring-drift) to the reference window (typically a subset of the training data) over time. When these conditions no longer hold true, we need to inspect more closely (retraining may not always fix the root problem).
With [monitoring](https://franztao.github.io/2022/10/01/Testing//../monitoring/), there are quite a few distinct concerns that we didn't have to consider during testing, since it involves (live) data we have yet to see:
- feature and prediction distributions (drift), typing, schema mismatches, etc.
- determining model performance (rolling and window metrics on the overall data and on slices of data) using indirect signals (since labels may not be readily available)
- in situations with large data, knowing which data points to label and upsample for training
- identifying anomalies and outliers
> We'll cover all of these concepts in much more depth (and with code) in the [monitoring lesson](https://franztao.github.io/2022/10/01/Testing//../monitoring/).
## Resources
- [Great Expectations](https://github.com/great-expectations/great_expectations)
- [The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf)
- [Beyond Accuracy: Behavioral Testing of NLP Models with CheckList](https://arxiv.org/abs/2005.04118)
- [Robustness Gym: Unifying the NLP Evaluation Landscape](https://arxiv.org/abs/2101.04840)
Please include a link to this article when reposting it.
For more details on reposting, please refer to [how to repost and cite articles](https://franztao.github.io/2022/12/04/%E6%96%87%E7%AB%A0%E5%A6%82%E4%BD%95%E8%BD%AC%E8%BD%BD%E5%92%8C%E5%BC%95%E7%94%A8/).
The main body of this article is derived from the following source:
```
@article{madewithml,
author = {Goku Mohandas},
title = { Made With ML },
howpublished = {\url{https://madewithml.com/}},
year = {2022}
}
```