I need help with the following homework. I have attached the document for my CS Information Retrieval course. Please let me know if you need references; I will provide what I can. Thanks for your help. I greatly appreciate it.
information_retrieval_hw.docx
Do all of the following problems on this sheet and show all work.
I.
Given the following Term-Document matrix

Term   D1   D2   D3   D4   D5   D6   D7   D8   D9  D10  D11  D12  D13
T1      2    0    0    1    0    0    0    3    0    0    0    0    2
T2      0    3    0    0    0    1    0    0    3    0    0    0    0
T3      0    0    4    0    2    0    0    2    0    3    0    2    0
T4      0    1    0    0    0    4    3    0    0    0    3    0    1
T5      3    0    0    4    0    0    0    1    0    0    0    0    3
T6      0    0    0    1    0    1    0    0    0    2    2    4    0
T7      0    0    3    0    0    3    0    0    1    0    0    0    2
T8      1    0    2    0    1    0    3    0    3    0    2    0    0
T9      0    4    0    1    0    1    0    3    0    2    0    3    1
T10     0    2    0    0    3    0    0    0    0    2    0    0    0
T11     0    0    0    3    0    0    4    0    0    0    0    1    0

a) Compute Cos(D2, D9)
b) Compute Cos(T1, T5)
c) Given Q = <1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0>
Calculate Cos(Q, D4).
d) Calculate the Tf-Idf vector for D13
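The cosine and tf-idf parts of Problem I can be checked with a short script. This is a minimal sketch, not the course's reference solution. The helper names (`doc`, `term`, `cosine`, `tfidf`) are made up for illustration, and the tf-idf weighting `tf * log10(N / df)` is an assumption, since the sheet does not say which variant the course uses.

```python
import math

# Term-document counts from Problem I: rows are T1..T11, columns are D1..D13.
TD = [
    [2, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 2],  # T1
    [0, 3, 0, 0, 0, 1, 0, 0, 3, 0, 0, 0, 0],  # T2
    [0, 0, 4, 0, 2, 0, 0, 2, 0, 3, 0, 2, 0],  # T3
    [0, 1, 0, 0, 0, 4, 3, 0, 0, 0, 3, 0, 1],  # T4
    [3, 0, 0, 4, 0, 0, 0, 1, 0, 0, 0, 0, 3],  # T5
    [0, 0, 0, 1, 0, 1, 0, 0, 0, 2, 2, 4, 0],  # T6
    [0, 0, 3, 0, 0, 3, 0, 0, 1, 0, 0, 0, 2],  # T7
    [1, 0, 2, 0, 1, 0, 3, 0, 3, 0, 2, 0, 0],  # T8
    [0, 4, 0, 1, 0, 1, 0, 3, 0, 2, 0, 3, 1],  # T9
    [0, 2, 0, 0, 3, 0, 0, 0, 0, 2, 0, 0, 0],  # T10
    [0, 0, 0, 3, 0, 0, 4, 0, 0, 0, 0, 1, 0],  # T11
]


def doc(j):
    """Column vector for document Dj (1-based), one entry per term."""
    return [row[j - 1] for row in TD]


def term(i):
    """Row vector for term Ti (1-based), one entry per document."""
    return TD[i - 1]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def tfidf(j):
    """Tf-idf vector for Dj, ASSUMING weight = tf * log10(N / df)."""
    n_docs = len(TD[0])
    weights = []
    for row in TD:
        df = sum(1 for count in row if count > 0)  # documents containing the term
        tf = row[j - 1]
        weights.append(tf * math.log10(n_docs / df) if tf else 0.0)
    return weights


Q = [1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0]  # one entry per term T1..T11

cos_d2_d9 = cosine(doc(2), doc(9))
cos_t1_t5 = cosine(term(1), term(5))
cos_q_d4 = cosine(Q, doc(4))
tfidf_d13 = tfidf(13)

print(f"Cos(D2, D9) = {cos_d2_d9:.4f}")
print(f"Cos(T1, T5) = {cos_t1_t5:.4f}")
print(f"Cos(Q, D4)  = {cos_q_d4:.4f}")
print("Tf-Idf(D13) =", [round(w, 4) for w in tfidf_d13])
```

If the course defines idf with a different log base or adds smoothing, only the `tfidf` function needs to change.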
II.
Given the B-tree of order 3 (nodes of size 3 split):
                            [400]
          [236 301]                      [455 700]
[110 145] [248] [376 388]      [408 433] [511] [766 900]
a) Insert, in order, 122, 395, 800, 405, 711, 812. Redraw the tree.
b) Delete 455 from the original tree.
III.
Draw and fill in the dynamic programming table, as in the slides, for finding the Levenshtein
distance between substitute and superglue.
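The table for Problem III can be generated mechanically. The sketch below assumes the standard Levenshtein costs (1 for each insertion, deletion, and substitution); the slides may use a different cost scheme or table layout. The function names and the `#` marker for the empty prefix are illustrative choices.

```python
def levenshtein_table(source, target):
    """Full DP table; table[i][j] is the distance between source[:i] and target[:j]."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete all i characters of the source prefix
    for j in range(cols):
        table[0][j] = j  # insert all j characters of the target prefix
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # match or substitution
            )
    return table


def format_table(source, target, table):
    """Render the table with '#' standing for the empty prefix."""
    lines = ["  " + "".join(f"{ch:>3}" for ch in "#" + target)]
    for label, row in zip("#" + source, table):
        lines.append(f"{label:>2}" + "".join(f"{value:>3}" for value in row))
    return "\n".join(lines)


table = levenshtein_table("substitute", "superglue")
print(format_table("substitute", "superglue", table))
print("distance:", table[-1][-1])
```

The bottom-right cell of the printed table holds the edit distance.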
IV.
Find the Jaccard coefficient between A and B using 3-grams with
A = tangerine and B = tamberine.
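A short check of the 3-gram overlap, as a sketch. It assumes plain character 3-grams with no boundary markers (such as `$`) added at the start or end of the word; if the course pads words, the sets and the coefficient will differ. The names `kgrams` and `jaccard` are illustrative.

```python
def kgrams(word, k=3):
    """Set of all contiguous character k-grams in word (no boundary padding)."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    return len(a & b) / len(a | b)


grams_a = kgrams("tangerine")
grams_b = kgrams("tamberine")

print("A:", sorted(grams_a))
print("B:", sorted(grams_b))
print("shared:", sorted(grams_a & grams_b))
print("Jaccard:", jaccard(grams_a, grams_b))
```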
V.
Use Zipf’s Law to find the average number of times the 60th ranked term occurs in a
document where
L (the number of tokens per document) = 10000.
M (the number of terms) = 4,000,000.
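One common way to set this up, shown here as a sketch under stated assumptions: Zipf's law gives the i-th ranked term a probability of c / i, where c = 1 / H_M and H_M is the M-th harmonic number, approximated as ln(M) + γ. The course slides may use a different constant (for example c ≈ 1 / ln M), so the result below may differ slightly from the expected answer. The function name is illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def zipf_expected_count(rank, tokens_per_doc, vocab_size):
    """Expected occurrences per document of the term at the given rank.

    ASSUMES P(rank i) = c / i with c = 1 / H_M and H_M ~ ln(M) + gamma.
    """
    harmonic = math.log(vocab_size) + EULER_GAMMA
    c = 1.0 / harmonic
    return tokens_per_doc * c / rank


expected = zipf_expected_count(rank=60, tokens_per_doc=10_000, vocab_size=4_000_000)
print(f"Expected occurrences of the 60th-ranked term: {expected:.2f}")
```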
VI.
Decode the following Gamma code string
1101111110101111100111111101011111101011111101101111010
Write the answer as the sequence of document numbers.
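A decoder for the string above can be sketched as follows. It assumes the usual gamma code layout (a unary length of n ones followed by a zero, then n offset bits, with the value being a leading 1 followed by the offset) and that the decoded values are gaps between successive document numbers, as is typical for postings lists. Both points should be checked against the slides.

```python
from itertools import accumulate


def gamma_decode(bits):
    """Decode a concatenation of gamma codes into a list of integers."""
    numbers = []
    pos = 0
    while pos < len(bits):
        length = 0
        while bits[pos] == "1":  # unary part: count the leading ones
            length += 1
            pos += 1
        pos += 1  # skip the 0 that terminates the unary part
        offset = bits[pos:pos + length]
        pos += length
        numbers.append(int("1" + offset, 2))  # restore the implicit leading 1
    return numbers


CODE = "1101111110101111100111111101011111101011111101101111010"

gaps = gamma_decode(CODE)
doc_ids = list(accumulate(gaps))  # ASSUMES the decoded values are gaps

print("gaps:   ", gaps)
print("doc IDs:", doc_ids)
```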
VII.
Given the query: white sox 2018 schedule
And the training set of documents
D1: white Christmas 2018
D2: White sox 2018 roster
D3: metra 2018 schedule
D4: baseball 2018 schedule
D5: white sox schedule
D6: white sox for sell
D7: sox schedule 2018
D1, D2, D3, and D6 are not relevant, while D4, D5 and D7 are relevant.
Calculate the c_i value for each term in the query and the Retrieval Status Value (RSV) for the document
D: white sox sell stadium
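The term weights can be computed with a small script. This is a sketch of the binary independence model with add-0.5 smoothing, which is an assumption: p_i = (s + 0.5) / (S + 1) and u_i = (df_i - s + 0.5) / (N - S + 1), with c_i = log[p_i (1 - u_i) / (u_i (1 - p_i))]. The slides may use unsmoothed estimates or a different log base (natural log is used here). Terms are matched case-insensitively, and the function names are illustrative.

```python
import math

QUERY = "white sox 2018 schedule".split()
DOCS = {
    "D1": "white Christmas 2018",
    "D2": "White sox 2018 roster",
    "D3": "metra 2018 schedule",
    "D4": "baseball 2018 schedule",
    "D5": "white sox schedule",
    "D6": "white sox for sell",
    "D7": "sox schedule 2018",
}
RELEVANT = {"D4", "D5", "D7"}


def term_weights(query, docs, relevant, k=0.5):
    """c_i per query term, ASSUMING add-k smoothing of p_i and u_i."""
    n_docs, n_rel = len(docs), len(relevant)
    weights = {}
    for t in query:
        containing = {d for d, text in docs.items() if t in text.lower().split()}
        s = len(containing & relevant)  # relevant documents containing the term
        df = len(containing)            # all documents containing the term
        p = (s + k) / (n_rel + 2 * k)
        u = (df - s + k) / (n_docs - n_rel + 2 * k)
        weights[t] = math.log(p * (1 - u) / (u * (1 - p)))
    return weights


def rsv(query, weights, document):
    """Sum of c_i over query terms that occur in the document."""
    present = set(document.lower().split())
    return sum(weights[t] for t in query if t in present)


weights = term_weights(QUERY, DOCS, RELEVANT)
score = rsv(QUERY, weights, "white sox sell stadium")

for t in QUERY:
    print(f"c({t}) = {weights[t]:.4f}")
print(f"RSV(D) = {score:.4f}")
```

Only "white" and "sox" contribute to the RSV, because "sell" and "stadium" are not query terms.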
VIII.
Calculate the expected mutual information, I(U, C), for the term “succeed” to the news class
sports where

               Tsucceed = 1   Tsucceed = 0
Csports = 1    300            175
Csports = 0    4400           65,000
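The calculation for Problem VIII can be sketched as below. It assumes the counts read as: 300 documents with the term in the class, 175 without the term in the class, 4400 with the term outside the class, and 65,000 with neither. It also assumes base-2 logarithms and maximum-likelihood probability estimates. The function and argument names are illustrative.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Expected mutual information I(U; C) in bits from a 2x2 count table.

    n11: term present, in class      n10: term present, not in class
    n01: term absent, in class       n00: term absent, not in class
    """
    n = n11 + n10 + n01 + n00
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    total = 0.0
    for count, term_total, class_total in cells:
        if count:  # an empty cell contributes 0 by convention
            total += (count / n) * math.log2(n * count / (term_total * class_total))
    return total


mi = mutual_information(n11=300, n10=4400, n01=175, n00=65_000)
print(f"I(U; C) = {mi:.5f} bits")
```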
IX.
Draw the precision-Recall graph for the retrieval sequence
r, r, n, r, r, n, n, r, n, n, r, n, n, n, r, n, n, r, n, r, n, n, n, n, r, n, n, n, r, n, n, n, n, r
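The points for the graph can be tabulated with a few lines of code. This sketch assumes that the total number of relevant documents equals the number of r entries in the sequence as listed; if the collection holds more relevant documents than are shown, pass the true total as `total_relevant`. Points are reported at each rank where a relevant document is retrieved.

```python
SEQUENCE = (
    "r r n r r n n r n n r n n n r n n r n r "
    "n n n n r n n n r n n n n r"
).split()


def precision_recall_points(sequence, total_relevant=None):
    """(recall, precision) at each rank where a relevant document appears."""
    if total_relevant is None:
        total_relevant = sequence.count("r")  # ASSUMES every relevant doc is listed
    points = []
    hits = 0
    for rank, label in enumerate(sequence, start=1):
        if label == "r":
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    return points


points = precision_recall_points(SEQUENCE)
for recall, precision in points:
    print(f"recall = {recall:.3f}  precision = {precision:.3f}")
```

Plotting recall on the x-axis against precision on the y-axis gives the graph asked for.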
…