shap.DeepExplainer 사용 시 feature value가 gray 색상인 경우

Data science/각종 에러

shap.DeepExplainer 사용 시 feature value가 gray 색상인 경우

yunnaa 2023. 5. 18. 10:20

shap.DeepExplainer — SHAP latest documentation

A tuple of (row_values, row_expected_values, row_mask_shapes), where row_values is an array of the attribution values for each sample, row_expected_values is an array (or single value) representing the expected value of the model for each sample (which is

shap-lrjball.readthedocs.io

우선 위 documentation을 보면 shap.DeepExplainer의 경우, 다음과 같이 model에 따라 data의 타입을 다르게 줘야 한다.

내가 보고싶은 모델은 pytorch 기반의 MLP 네트워크이며 tensor 데이터인 test_loader를 사용했기 때문에 다음과 같이 shap_values를 구했다.

test_explainer = shap.DeepExplainer(model, data=test_loader.dataset.x_data.cuda())
shap_values_test = test_explainer.shap_values(test_loader.dataset.x_data.cuda())

그리고 beeswarm plot을 그리기 위해 다음과 같이 주었다.

이 때에는 tensor 데이터가 아닌 X_test (=tensor로 변경하기 전 dataframe)를 줘야 한다.

summary plot에 똑같이 tensor data (test_loader.dataset.x_data.cuda())를 주면 회색으로 표현된다.

(회색으로 표현되는 이유로 NaN값인 경우, 데이터가 숫자가 아닌 문자 타입인 경우.. 등이 있음)

for i, cls_name in enumerate(['clas0', 'class1', 'class2']):
    plt.figure()
    shap.summary_plot(shap_values_test[i], X_test, re_features, max_display=10, plot_size=(10,6), show=False)
    plt.tight_layout()
    plt.show()

* 참고로 dataframe 데이터를 다음과 같이 커스텀 클래스를 사용하여 tensor 데이터로 변경하였다.

class TensorData(Dataset):
    def __init__(self, x_data, y_data):
        self.x_data = torch.tensor(x_data, dtype=torch.float32)
        self.y_data = torch.tensor(y_data)
        self.len = self.y_data.shape[0]
        
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]
    
    def __len__(self):
        return self.len

from torch.utils.data import DataLoader, Dataset

trainsets = TensorData(X_train.values, y_train.values)
testsets = TensorData(X_test.values, y_test.values)

train_loader = torch.utils.data.DataLoader(trainsets, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(testsets, batch_size=32, shuffle=False)

test_loader.dataset.x_data.cuda()

>>>tensor([[ 0.0000,  1.0000,  0.0000,  ...,  1.7268, -0.9676, -0.3213],
        [ 1.0000,  1.0000,  1.0000,  ..., -0.2597, -0.7196, -0.3317],
        [ 1.0000,  1.0000,  0.0000,  ..., -0.5435, -0.3167, -0.2691],
        ...,
        [ 1.0000,  0.0000,  0.0000,  ..., -0.2597,  2.0129, -0.3082],
        [ 0.0000,  1.0000,  1.0000,  ...,  0.0241, -1.1824, -0.3108],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0241,  0.2458,  0.0671]],
       device='cuda:0')

* 참고 링크

[Python][Pytorch] SHAP 라이브러리 Error 공유

[Task] 시계열 연속형 데이터를 입력변수로 한 머신러닝/딥러닝 기반 예측 (Regression) [Language] Python 3.6.12 [Framework] Pytorch 1.7.0 [Library] SHAP 0.35.0 이 글에서는, 파이썬의 shap 라이브러리를 사용하던 중

good-learning.tistory.com

저작자표시 비영리 변경금지

'Data science > 각종 에러' 카테고리의 다른 글

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 32]) (0)	2023.05.01
error: (-215:Assertion failed) (mtype == CV_8U \|\| mtype == CV_8S) && _mask.sameSize(*psrc1) in function 'cv::binary_op' (0)	2023.04.08
RuntimeError: stack expects each tensor to be equal size, but got [3, 224, 224] at entry 0 (0)	2023.04.08
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized. (0)	2023.04.07
for epoch in range(args.epochs):에서 TypeError: 'int' object is not callable (0)	2023.04.06

현재글shap.DeepExplainer 사용 시 feature value가 gray 색상인 경우

웹개발환경, Shape biase, imputation, attention resnet, WEB2-python, axes color, 인터넷과웹, dead kernel, weakly supervised, img태그속성, 목차태그, attention module, Texture bias, miss forest, 멀티커서, multiple imputation, GMIC, Python, WAMP단종, shap gray color,

Today :
Yesterday :

금붕어의 정리 노트