Translated from "How does Stable Diffusion work"
1. What can Stable Diffusion (SD) do?
Its most basic function is text-to-image generation.
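2. Diffusion models
SD is a diffusion model. Diffusion models are a class of generative models designed to generate new data similar to the data they were trained on.
Why are they called "diffusion" models? Because the way the model generates data resembles diffusion in physics. The training process is introduced below, using a diffusion model trained to generate 🐱 and 🐕 images as the example.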
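Forward diffusion turns a 🐱/🐕 image into a featureless noise image; from the noise image you can no longer tell whether it started out as a 🐱 or a 🐕.
It is like a drop of ink falling into a glass of water: before long it spreads out and distributes itself randomly through the water, and it becomes hard to tell whether the drop originally landed at the edge, in the center, or somewhere else.
Reverse diffusion, in contrast, recovers a 🐱 or a 🐕 from a meaningless noise image; it is forward diffusion "played backwards".
Technically, every diffusion step has two parts: drift and random motion. The drift in reverse diffusion heads either toward recovering a 🐱 or toward recovering a 🐕, never toward something in between.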
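As a rough illustration of forward diffusion, the sketch below repeatedly mixes a little Gaussian noise into a list of pixel values. The update rule (scale by `sqrt(1 - beta)`, then add `sqrt(beta)` times fresh noise), the function names, and the values of `beta` and `num_steps` are assumptions chosen for illustration; the text above does not specify a noise schedule.

```python
import math
import random


def forward_diffusion(pixels, num_steps=50, beta=0.1, seed=0):
    """Return every intermediate image, from the clean input to near-pure noise."""
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - beta)  # share of the current image kept per step (assumed rule)
    add = math.sqrt(beta)         # share of fresh Gaussian noise mixed in per step
    trajectory = [list(pixels)]
    for _ in range(num_steps):
        current = trajectory[-1]
        noisy = [keep * p + add * rng.gauss(0.0, 1.0) for p in current]
        trajectory.append(noisy)
    return trajectory


def signal_fraction(num_steps, beta=0.1):
    """Fraction of the original pixel value still present after num_steps steps."""
    return math.sqrt(1.0 - beta) ** num_steps


if __name__ == "__main__":
    cat = [0.9, 0.8, 0.1, 0.2]    # toy 4-pixel "image"
    steps = forward_diffusion(cat)
    print(len(steps))              # 51 images: the original plus 50 noisy versions
    print(signal_fraction(50))     # well below 0.1: little of the original remains
```

After enough steps the original pixel values contribute almost nothing to the result, which is why the final noise image no longer reveals whether it started as a 🐱 or a 🐕.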
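Reverse diffusion is undoubtedly a clever and elegant idea. The million-dollar question is: how is it implemented?
To reverse the diffusion process, we need to know how much noise was added to an image. We get that by training a model: after training, we have a noise predictor that can estimate how much noise was added to an image.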
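To make the training idea concrete, here is a minimal sketch of how one training example for a noise predictor might be built: take a clean image, add noise over a random number of steps, and keep the added noise as the target. The function name `make_training_example`, the additive noise rule, and the parameters `max_steps` and `noise_per_step` are all assumptions for illustration, not details given in the text.

```python
import random


def make_training_example(image, max_steps=10, noise_per_step=0.1, seed=None):
    """Build one (noisy image, added noise, step count) training triple."""
    rng = random.Random(seed)
    steps = rng.randint(1, max_steps)  # how many noising steps this example gets
    noise = [0.0] * len(image)
    for _ in range(steps):
        # Accumulate a little Gaussian noise per pixel at every step.
        noise = [n + noise_per_step * rng.gauss(0.0, 1.0) for n in noise]
    noisy = [p + n for p, n in zip(image, noise)]
    return noisy, noise, steps


if __name__ == "__main__":
    dog = [0.2, 0.4, 0.6]  # toy 3-pixel "image"
    noisy, noise, steps = make_training_example(dog, seed=1)
    # A noise predictor would be trained to output `noise` when shown `noisy`.
    print(steps, noisy, noise)
```

Many such triples, drawn from many training images, are what a noise predictor would learn from.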
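With the noise predictor in hand, we can run reverse diffusion. Starting from a pure noise image, (1) the noise predictor estimates the noise in the image, and (2) that estimated noise is subtracted from the image.
Repeating steps 1 and 2 a number of times yields an image of a 🐱 or a 🐕.
For now we cannot control whether the recovered image is a 🐱 or a 🐕; controlling the output is done by adding conditioning.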
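The reverse diffusion loop (estimate the noise, subtract it, repeat) can be sketched as follows. Note the large assumption: `make_toy_predictor` is a hypothetical stand-in that already knows the target image and reports half of the remaining gap as "noise". A real noise predictor is a trained neural network and knows no such target; the stand-in is only there so the loop can be run.

```python
import random


def make_toy_predictor(target):
    """Hypothetical stand-in for a trained noise predictor (it cheats: it knows the target)."""
    def predict(image):
        return [0.5 * (x - t) for x, t in zip(image, target)]
    return predict


def reverse_diffusion(noise_predictor, num_pixels, num_steps=20, seed=0):
    """Start from pure noise, then repeatedly estimate and subtract noise."""
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]  # pure noise image
    for _ in range(num_steps):
        predicted = noise_predictor(image)                     # step 1: estimate noise
        image = [x - n for x, n in zip(image, predicted)]      # step 2: subtract it
    return image


if __name__ == "__main__":
    cat = [0.1, 0.5, 0.9]  # toy 3-pixel target
    print(reverse_diffusion(make_toy_predictor(cat), num_pixels=3))
```

With this toy predictor, each pass halves the distance to the target, so the output ends up very close to it after 20 steps.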
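Now for the bad news: what we just described is not how SD works. That diffusion process runs in image space, which is unacceptably slow computationally and will not run on any single GPU. A 512*512 RGB image lives in a 786,432-dimensional space (512 * 512 pixels * 3 color channels), so far too many values have to be specified for a single image.
Google's Imagen and OpenAI's DALL-E are pixel-level diffusion models; they use some tricks to speed things up, but that is still not enough.
Stable Diffusion was designed precisely to solve this speed problem of image diffusion models.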
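The 786,432 figure is just width times height times the three RGB channels, which a short check confirms. The helper name `image_space_dimensions` is made up for this illustration.

```python
def image_space_dimensions(width, height, channels=3):
    """Number of values needed to describe one image pixel by pixel."""
    return width * height * channels


print(image_space_dimensions(512, 512))  # 786432
```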
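Stable Diffusion is a latent diffusion model. It first compresses the image into a latent space, which avoids operating in the high-dimensional image space. That makes it much faster.
The image-to-latent and latent-to-image conversions are done by a VAE (Variational Autoencoder).
A VAE has two parts: an encoder and a decoder.
The encoder compresses an image into a lower-dimensional latent, and the decoder restores the image from the latent.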
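In Stable Diffusion, the latent of a 512*512 RGB image is 4*64*64.
Image resolution shows up in the shape of the latent, which is also why generating larger images takes more VRAM and time. The latent is 1/48 the size of the image (for example, the latent of a 768*512 RGB image is 4*96*64).
VAE files are used with Stable Diffusion v1 to improve eyes and faces. They are further fine-tuned VAE decoders that let the model paint finer details.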
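(The assumption mentioned earlier is not entirely true. Compressing an image into latent space does lose information, because the original VAE does not recover fine details. Instead, the decoder in the VAE file takes care of painting those details.)
Our understanding of Stable Diffusion so far is still incomplete: how does a text prompt control the output image?
This is where conditioning comes in. Its purpose is to steer the noise predictor so that, once the predicted noise is subtracted from the image, we get what we want.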
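A text prompt is processed and fed to the noise predictor as follows. The tokenizer first splits the prompt into tokens, one per word. Each token is then converted into a 768-dimensional embedding. The embeddings are then fed into …
2.1 Forward diffusion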
2.2 Reverse diffusion
3. How reverse diffusion is implemented
3.1 Training the noise predictor
3.2 Reverse diffusion steps
4. Stable Diffusion
4.1 Latent diffusion model
4.2 Image latents and reconstruction
4.3 Training SD
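Once we have the latent (4*64*64 for a 512*512 RGB image), all of the diffusion described earlier takes place in latent space. So during training, the noise we generate is not a noise image but a random tensor in latent space; this tensor is blended with the image latent at different strengths to train the noise predictor, which is then used for reverse diffusion (for reverse-diffusion sampling and samplers, see link). Reverse diffusion follows the same steps as before, except that it starts from a random latent tensor and the VAE decoder turns the final latent back into an image.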
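4.4 Image resolution
(For reference, the latent of a 768*512 RGB image is 4*96*64.)
Stable Diffusion v1 was fine-tuned on 512*512 images, so generating images larger than 512*512 tends to produce duplicated objects (for example, two heads (with solution)).
If you must generate a large image, keep at least one of width and height at 512, then use an AI upscaler to raise the resolution.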
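The relationship between image resolution and latent shape quoted in this article can be checked with a few lines of arithmetic. The sketch assumes 4 latent channels and a downscale factor of 8 per side, both read off the shapes given here (512 becomes 64, 768 becomes 96). The function names, and which side is called height, are assumptions for illustration.

```python
def latent_shape(height, width, latent_channels=4, downscale=8):
    """Latent shape for an image, assuming an 8x reduction per side and 4 channels."""
    if height % downscale or width % downscale:
        raise ValueError("height and width must be multiples of the downscale factor")
    return (latent_channels, height // downscale, width // downscale)


def size_ratio(height, width, image_channels=3):
    """How many times more values the RGB image holds than its latent."""
    channels, h, w = latent_shape(height, width)
    return (image_channels * height * width) / (channels * h * w)


print(latent_shape(512, 512))  # (4, 64, 64)
print(latent_shape(768, 512))  # (4, 96, 64)
print(size_ratio(512, 512))    # 48.0
```

Because the latent grows with the image, a larger image means proportionally more values to denoise at every step, which is consistent with the higher VRAM and time cost mentioned in this article.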
4.5 VAE file
5. Conditioning
5.1 Text conditioning (text-to-image)