Text Representation

The requirements are vectorized for classification through the following steps:

Unnecessary characters removal
Lower-casing
Punctuation signs removal
Lemmatization
Stop word removal
Label encoding
TF-IDF Vectorization
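
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the code in transformation.ipynb: the `STOP_WORDS` set and the `LEMMAS` lookup are tiny hypothetical stand-ins for a full stop-word list and a real lemmatizer, and the function names `preprocess`, `encode_labels` and `tfidf` are assumptions introduced here. The TF-IDF weighting shown is the basic `tf * log(N / df)` form, which may differ from the variant a library would apply.

```python
import math
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny resources; a real pipeline would rely on a
# full stop-word list and a proper lemmatizer.
STOP_WORDS = {"the", "a", "an", "be", "to", "of", "and", "in"}
LEMMAS = {"is": "be", "are": "be", "users": "user"}


def preprocess(sentence: str) -> list[str]:
    """Apply the cleaning steps in the order listed above."""
    text = re.sub(r"[\r\n\t\"]+", " ", sentence)  # unnecessary characters
    text = text.lower()  # lower-casing
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation
    tokens = [LEMMAS.get(tok, tok) for tok in text.split()]  # toy lemmatization
    return [tok for tok in tokens if tok not in STOP_WORDS]  # stop-word removal


def encode_labels(labels: list[str]) -> tuple[list[int], dict[str, int]]:
    """Map each class name to an integer, in alphabetical order."""
    mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by relative frequency times log inverse document frequency."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


sentences = [
    "The system shall be available 24 hours.",
    "Users are authorized by password.",
]
labels = ["AVAILABILITY", "SECURITY"]

docs = [preprocess(s) for s in sentences]
y, label_map = encode_labels(labels)
X = tfidf(docs)
```

Lemmatizing before removing stop words, as the list orders them, means inflected forms such as "are" are first reduced to "be" and then dropped by the stop-word filter.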

Once the sentences were processed, a chi-squared feature-selection analysis identified the following most correlated unigrams and bigrams per Quality Attribute:

Quality Attribute	Unigrams	Bigrams
AVAILABILITY	`failure`, `achieve`, `hours`, `availability`, `available`	`system must`, `shall available`
FAULT TOLERANCE	`eg`, `control`, `result`, `failure`, `operate`	`within system`, `system shall`
MAINTAINABILITY	`maintain`, `design`, `new`, `update`, `maintenance`	`use system`, `user able`
PERFORMANCE	`status`, `result`, `less`, `response`, `fast`	`less fast`, `response time`
SCALABILITY	`manner`, `capable`, `support`, `handle`, `number`	`shall support`, `shall capable`
SECURITY	`authorize`, `password`, `security`, `encrypt`, `access`	`user system`, `authorize user`
USABILITY	`use`, `content`, `navigation`, `easy`, `page`	`shall easy`, `use system`
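
To illustrate how a ranking like the one in the table can be produced, the sketch below scores each unigram and bigram against one class with the textbook chi-squared statistic on a 2x2 presence/absence table. It is an assumption-laden simplification: the helper names (`with_bigrams`, `contingency`, `chi2_score`, `top_terms`) and the toy documents are hypothetical, and the notebook may instead score TF-IDF values with a library routine, which would give different numbers.

```python
def with_bigrams(tokens):
    """Return the unigrams followed by the space-joined adjacent bigrams."""
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def contingency(docs, labels, term, target):
    """Count documents by (term present?, belongs to target class?)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, target class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    return a, b, c, d


def chi2_score(a, b, c, d):
    """Chi-squared statistic for a 2x2 table; 0.0 when a margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def top_terms(docs, labels, target, k=5):
    """Rank terms positively associated with `target` by chi-squared score."""
    vocab = sorted({term for doc in docs for term in doc})
    scored = []
    for term in vocab:
        a, b, c, d = contingency(docs, labels, term, target)
        if a * d > b * c:  # skip terms that are more typical of other classes
            scored.append((chi2_score(a, b, c, d), term))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [term for _, term in scored[:k]]


# Toy, already preprocessed documents (hypothetical data).
token_lists = [
    ["system", "shall", "available"],
    ["system", "available", "hours"],
    ["authorize", "user", "password"],
    ["user", "password", "encrypt"],
]
labels = ["AVAILABILITY", "AVAILABILITY", "SECURITY", "SECURITY"]
features = [with_bigrams(tokens) for tokens in token_lists]

availability_terms = top_terms(features, labels, "AVAILABILITY", k=2)
security_terms = top_terms(features, labels, "SECURITY", k=3)
```

The statistic itself is symmetric, so a term that never occurs in a class scores as highly as one that always does. The `a * d > b * c` filter is one way to keep only the positively associated terms.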

Finally, the features were split into training and test sets with a 17:3 ratio (85% training, 15% test) and exported as pickles. The pickles X_train, X_test, y_train and y_test are intended to be used by the Automated Model Configuration repository. The same split ratio and random state will also be used for the deep learning classifiers.
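
A seeded split and pickle export along these lines might look like the sketch below. It uses only the standard library as a stand-in for whatever split routine the notebook calls. `RANDOM_STATE = 0` is a hypothetical value, since the seed is not stated here, and the `split` and `export` helpers, the toy data and the temporary output directory are likewise assumptions. The split is a plain shuffle and is not stratified by class.

```python
import pickle
import random
import tempfile
from pathlib import Path

TEST_FRACTION = 3 / 20  # 17:3 train/test ratio, i.e. 85% / 15%
RANDOM_STATE = 0  # hypothetical seed; the actual value is not given in this README


def split(features, labels, test_fraction=TEST_FRACTION, seed=RANDOM_STATE):
    """Shuffle indices with a fixed seed and cut off the test portion."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(features, train_idx), pick(features, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


def export(out_dir, **arrays):
    """Write each keyword argument to <out_dir>/<name>.pickle."""
    for name, value in arrays.items():
        with open(Path(out_dir) / f"{name}.pickle", "wb") as fh:
            pickle.dump(value, fh)


# Toy data standing in for the TF-IDF features and encoded labels.
features = [[float(i)] for i in range(20)]
labels = [i % 2 for i in range(20)]

X_train, X_test, y_train, y_test = split(features, labels)

out_dir = tempfile.mkdtemp()
export(out_dir, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test)
```

Because the shuffle is driven by a fixed seed, calling `split` again with the same inputs reproduces the same partition, which is what allows other classifiers to be trained and evaluated on identical sets.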

