<!DOCTYPE html>
<html lang="en">
<head>
<title>Introduction to Japanese Natural Language Processing</title>
<!-- Meta -->
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="A thorough guide to the engineering problems that come with working with Japanese, from basics like tokenization to cutting-edge techniques.">
<meta name="author" content="Masato Hagiwara and Paul O'Leary McCann">
<link rel="shortcut icon" href="favicon.ico">
<meta property="og:site_name" content="Introduction to Japanese Natural Language Processing"/>
<meta property="og:title" content="Introduction to Japanese Natural Language Processing"/>
<meta property="og:description" content="A thorough guide to the engineering problems that come with working with Japanese, from basics like tokenization to cutting-edge techniques."/>
<meta property="og:image" content="https://www.japanesenlp.com/assets/images/cover-english.png"/>
<meta property="og:type" content="article"/>
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:image" content="https://japanesenlp.com/assets/images/janlp-twitter-card.png">
<!-- Google Font -->
<link href="https://fonts.googleapis.com/css?family=Quicksand:700|Roboto:400,400i,700&display=swap" rel="stylesheet">
<!-- FontAwesome JS-->
<script defer src="assets/fontawesome/js/all.min.js"></script>
<!-- Theme CSS -->
<link id="theme-style" rel="stylesheet" href="assets/css/theme.css">
<style>
.logo-icon {
width: 40px;
}
.content-section .key-points-list li {
margin-top: 0.5em;
margin-bottom: 0.5em;
}
.content-section .subsection-list li {
margin-top: 0;
margin-bottom: 0.2em;
}
</style>
</head>
<body>
<header class="navbar navbar-expand-md py-3">
<a class="navbar-brand" href="index.html"><img class="logo-icon mr-2" src="logo.png" alt="logo" ></a>
<div class="navbar-nav ml-auto">
<div class="nav-item">
English | <a href="index-ja.html">日本語</a>
</div>
</div>
</header>
<section class="hero-section">
<div class="container">
<div class="row">
<div class="col-12 col-md-7 pt-5 mb-5 align-self-center">
<div class="promo pr-md-3 pr-lg-5">
<h1 class="headline mb-3">
Introduction to Japanese Natural Language Processing
</h1><!--//headline-->
<div class="subheadline mb-5">
Masato Hagiwara and Paul O'Leary McCann
</div><!--//subheading-->
<div class="mb-2">
Completion: Winter 2021 (expected).
</div><!--//subheading-->
<div class="mb-2">
Available in both English and Japanese
</div><!--//subheading-->
<div class="cta-holder">
<a class="btn btn-primary mr-lg-2" href="https://leanpub.com/japanesenlp">Buy on Leanpub</a>
<a class="btn btn-secondary scrollto" href="#benefits-section">Learn More</a>
</div><!--//cta-holder-->
</div><!--//promo-->
</div><!--col-->
<div class="col-12 col-md-5 mb-5 align-self-center">
<div class="book-cover-holder">
<img class="img-fluid book-cover" src="assets/images/cover-english.png" alt="book cover" >
</div><!--//book-cover-holder-->
</div><!--col-->
</div><!--//row-->
</div><!--//container-->
</section><!--//hero-section-->
<section id="benefits-section" class="benefits-section theme-bg-light-gradient py-5">
<div class="container py-5">
<h2 class="section-heading text-center mb-3">About This Book</h2>
<div class="section-intro single-col-max mx-auto text-center mb-5">
A thorough guide for programmers working with Japanese text, covering fundamental issues like tokenization and recent research topics like generating natural language texts.
Working examples are accompanied by extensive references, so you can solve problems even without a background in Japanese or machine learning.
</div>
<div class="row text-center">
<div class="item col-12 col-md-6 col-lg-4">
<div class="item-inner p-3 p-lg-4">
<div class="item-header mb-3">
<div class="item-icon"><i class="fas fa-globe-asia"></i></div>
<h3 class="item-heading">Basics of Japanese Linguistics</h3>
</div><!--//item-heading-->
<div class="item-desc">
All the background knowledge required for processing Japanese language texts on computers — characters, words, grammar, as well as encodings and emoji.
</div><!--//item-desc-->
</div><!--//item-inner-->
</div><!--//item-->
<div class="item col-12 col-md-6 col-lg-4">
<div class="item-inner p-3 p-lg-4">
<div class="item-header mb-3">
<div class="item-icon"><i class="fas fa-tools"></i></div>
<h3 class="item-heading">Open-source Tools</h3>
</div><!--//item-heading-->
<div class="item-desc">
Use open-source tools to analyze Japanese texts, including word tokenization with MeCab and PoS tagging and parsing with spaCy.
</div><!--//item-desc-->
</div><!--//item-inner-->
</div><!--//item-->
<div class="item col-12 col-md-6 col-lg-4">
<div class="item-inner p-3 p-lg-4">
<div class="item-header mb-3">
<div class="item-icon"><i class="fas fa-database"></i></div>
<h3 class="item-heading">Dictionaries & Datasets</h3>
</div><!--//item-heading-->
<div class="item-desc">
A thorough overview of dictionaries, corpora, and other datasets commonly used for Japanese language processing.
</div><!--//item-desc-->
</div><!--//item-inner-->
</div><!--//item-->
<div class="item col-12 col-md-6 col-lg-4">
<div class="item-inner p-3 p-lg-4">
<div class="item-header mb-3">
<div class="item-icon"><i class="fas fa-book-reader"></i></div>
<h3 class="item-heading">Word Embeddings</h3>
</div><!--//item-heading-->
<div class="item-desc">
Use word and sentence embeddings to represent, visualize, and retrieve Japanese texts.
</div><!--//item-desc-->
</div><!--//item-inner-->
</div><!--//item-->
<div class="item col-12 col-md-6 col-lg-4">
<div class="item-inner p-3 p-lg-4">
<div class="item-header mb-3">
<div class="item-icon"><i class="fas fa-language"></i></div>
<h3 class="item-heading">Language Generation and Conversion</h3>
</div><!--//item-heading-->
<div class="item-desc">
Use neural networks to generate Japanese texts and convert between Kana and Kanji.
</div><!--//item-desc-->
</div><!--//item-inner-->
</div><!--//item-->
<div class="item col-12 col-md-6 col-lg-4">
<div class="item-inner p-3 p-lg-4">
<div class="item-header mb-3">
<div class="item-icon"><i class="fas fa-robot"></i></div>
<h3 class="item-heading">Natural Language Understanding</h3>
</div><!--//item-heading-->
<div class="item-desc">
Use transfer learning to understand Japanese texts through sentiment analysis and named entity recognition.
</div><!--//item-desc-->
</div><!--//item-inner-->
</div><!--//item-->
</div><!--//row-->
</div><!--//container-->
</section><!--//benefits-section-->
<section id="audience-section" class="audience-section py-5">
<div class="container">
<h2 class="section-heading text-center mb-4">Who This Book Is For</h2>
<div class="section-intro single-col-max mx-auto text-center mb-5">
This book is written for anyone who's interested in dealing with Japanese texts, including software developers, AI researchers and engineers, and language experts.
</div><!--//section-intro-->
<div class="audience mx-auto">
<div class="item media">
<div class="item-icon mr-3"><i class="fas fa-user-check"></i></div>
<div class="media-body">
<h4 class="item-title">No Math Required</h4>
<div class="item-desc">You don't need to know math to understand the book. We focus on how to use tools to get things done, rather than explaining the theory behind their implementation. </div>
</div><!--//media-body-->
</div><!--//item-->
<div class="item media">
<div class="item-icon mr-3"><i class="fas fa-user-check"></i></div>
<div class="media-body">
<h4 class="item-title">No Japanese Required</h4>
<div class="item-desc">Knowing Japanese helps, but you don't need to understand it to read the book: example texts will be thoroughly annotated.</div>
</div><!--//media-body-->
</div><!--//item-->
<div class="item media">
<div class="item-icon mr-3"><i class="fas fa-user-check"></i></div>
<div class="media-body">
<h4 class="item-title">Basic Python</h4>
<div class="item-desc">The only prerequisite for this book is basic Python skills. Extensive code examples show how to approach and solve problems.</div>
</div><!--//media-body-->
</div><!--//item-->
</div><!--//audience-->
</div><!--//container-->
</section><!--//audience-section-->
<section id="content-section" class="content-section">
<div class="container">
<div class="single-col-max mx-auto">
<h2 class="section-heading text-center mb-5">Table of Contents</h2>
<div class="row">
<div class="col-12 col-md-6">
<div class="figure-holder mb-5">
<img class="img-fluid" src="assets/images/cover-back.png" alt="image" >
</div><!--//figure-holder-->
</div><!--//col-->
<div class="col-12 col-md-6 mb-5">
<div class="key-points mb-4 text-center">
<ul class="key-points-list list-unstyled mb-4 mx-auto d-inline-block text-left">
<li><i class="fas fa-globe-asia mr-2"></i>Chapter 1: Basics of Japanese linguistics</li>
<ul class="subsection-list">
<li>1.1 Japanese language overview</li>
<li>1.2 Orthography: What kinds of letters are there?</li>
<li>1.3 Morphology: What kinds of words are there?</li>
<li>1.4 Syntax: How are sentences structured?</li>
<li>1.5 Technical Notes: How are texts represented?</li>
</ul>
<li><i class="fas fa-tools mr-2"></i>Chapter 2: Morphological analysis and open-source tools</li>
<ul class="subsection-list">
<li>2.1 Tokenizers and morphological analyzers: overview and basic use</li>
<li>2.2 Advanced tokenization</li>
<li>2.3 Dependency parsers</li>
</ul>
<li><i class="fas fa-database mr-2"></i>Chapter 3: Datasets</li>
<ul class="subsection-list">
<li>3.1 Overview</li>
<li>3.2 Dictionaries</li>
<li>3.3 General Corpora</li>
<li>3.4 Specialized Corpora</li>
</ul>
<li><i class="fas fa-book-reader mr-2"></i>Chapter 4: Word and sentence embeddings</li>
<ul class="subsection-list">
<li>4.1 Word embeddings</li>
<li>4.2 Sentence embeddings</li>
<li>4.3 Multilingual embeddings</li>
</ul>
<li><i class="fas fa-language mr-2"></i>Chapter 5: Natural language generation and conversion with Transformer</li>
<ul class="subsection-list">
<li>5.1 Introduction to Transformer</li>
<li>5.2 Text generation</li>
<li>5.3 Kana-Kanji conversion / transliteration</li>
</ul>
<li><i class="fas fa-robot mr-2"></i>Chapter 6: Natural language understanding via transfer learning</li>
<ul class="subsection-list">
<li>6.1 Introduction to transfer learning</li>
<li>6.2 Sentiment / document classification</li>
<li>6.3 Named entity recognition</li>
</ul>
</ul>
</div><!--//key-points-->
</div><!--//col-12-->
</div><!--//row-->
</div><!--//single-col-max-->
</div><!--//container-->
</section><!--//content-section-->
<section id="author-section" class="author-section section theme-bg-primary py-5">
<div class="container py-3">
<h2 class="section-heading text-center text-white mb-3">About The Authors</h2>
<div class="row">
<div class="author-bio col-12 col-md-6">
<div class="author-profile text-center mb-5">
<img class="author-pic" src="assets/images/profiles/masato.jpg" alt="image" >
</div><!--//author-profile-->
<p><b>Masato Hagiwara</b> is an independent NLP/ML researcher and engineer at Octanove Labs.
He works on educational and Asian language processing projects with world-class startups and research institutes.
He received his Ph.D. in Information Science from Nagoya University in 2009 and has worked at companies including Google, Microsoft Research, Baidu, and Duolingo.
He is the author of several best-selling NLP books.</p>
<div class="author-links text-center pt-4">
<h5 class="text-white mb-4">Follow Author</h5>
<ul class="social-list list-unstyled">
<li class="list-inline-item"><a href="https://twitter.com/mhagiwara"><i class="fab fa-twitter"></i></a></li>
<li class="list-inline-item"><a href="https://github.com/mhagiwara"><i class="fab fa-github-alt"></i></a></li>
<li class="list-inline-item"><a href="http://masatohagiwara.net/"><i class="fas fa-globe-asia"></i></a></li>
</ul><!--//social-list-->
</div><!--//author-links-->
</div><!--//author-bio-->
<div class="author-bio col-12 col-md-6">
<div class="author-profile text-center mb-5">
<img class="author-pic" src="assets/images/profiles/paul.png" alt="image" >
</div><!--//author-profile-->
<p><b>Paul O'Leary McCann</b> is a consultant and member of the spaCy development team. Based
in Tokyo since 2011, he maintains the most popular Japanese tokenizer in
Python. Outside of his work on NLP, he helps out with Tokyo Indies, a monthly
game developer meetup.</p>
<div class="author-links text-center pt-4">
<h5 class="text-white mb-4">Follow Author</h5>
<ul class="social-list list-unstyled">
<li class="list-inline-item"><a href="https://twitter.com/polm23"><i class="fab fa-twitter"></i></a></li>
<li class="list-inline-item"><a href="https://github.com/polm"><i class="fab fa-github-alt"></i></a></li>
<li class="list-inline-item"><a href="https://dampfkraft.com"><i class="fas fa-globe-asia"></i></a></li>
</ul><!--//social-list-->
</div><!--//author-links-->
</div><!--//author-bio-->
</div><!--//row-->
</div><!--//container-->
</section><!--//author-section-->
<section id="signup-section" class="signup-section py-5">
<div class="container">
<h2 class="section-heading text-center mb-4">Subscribe for updates</h2>
<div class="section-intro single-col-max mx-auto text-center mb-5">
We'll let you know when the book is completed/updated!
</div><!--//section-intro-->
<div class="px-5">
<!-- Begin Mailchimp Signup Form -->
<link href="//cdn-images.mailchimp.com/embedcode/classic-10_7.css" rel="stylesheet" type="text/css">
<style type="text/css">
#mc_embed_signup{background:#fff; clear:left; font:14px Helvetica,Arial,sans-serif; }
/* Add your own Mailchimp form style overrides in your site stylesheet or in this style block.
We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */
</style>
<div id="mc_embed_signup">
<form action="https://octanove.us6.list-manage.com/subscribe/post?u=59035d87bf54ef6c44e19d14f&id=7a6cdda78c" method="post" id="mc-embedded-subscribe-form" name="mc-embedded-subscribe-form" class="validate" target="_blank" novalidate>
<div id="mc_embed_signup_scroll">
<h2>Subscribe</h2>
<div class="indicates-required"><span class="asterisk">*</span> indicates required</div>
<div class="mc-field-group">
<label for="mce-EMAIL">Email Address <span class="asterisk">*</span>
</label>
<input type="email" value="" name="EMAIL" class="required email" id="mce-EMAIL">
</div>
<div id="mce-responses" class="clear">
<div class="response" id="mce-error-response" style="display:none"></div>
<div class="response" id="mce-success-response" style="display:none"></div>
</div> <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups-->
<div style="position: absolute; left: -5000px;" aria-hidden="true"><input type="text" name="b_59035d87bf54ef6c44e19d14f_7a6cdda78c" tabindex="-1" value=""></div>
<div class="clear"><input type="submit" value="Subscribe" name="subscribe" id="mc-embedded-subscribe" class="button"></div>
</div>
</form>
</div>
<script type='text/javascript' src='//s3.amazonaws.com/downloads.mailchimp.com/js/mc-validate.js'></script><script type='text/javascript'>(function($) {window.fnames = new Array(); window.ftypes = new Array();fnames[0]='EMAIL';ftypes[0]='email';fnames[1]='FNAME';ftypes[1]='text';fnames[2]='LNAME';ftypes[2]='text';fnames[3]='ADDRESS';ftypes[3]='address';fnames[4]='PHONE';ftypes[4]='phone';fnames[5]='BIRTHDAY';ftypes[5]='birthday';}(jQuery));var $mcj = jQuery.noConflict(true);</script>
<!--End mc_embed_signup-->
</div>
</div>
</section>
<footer class="footer">
<div class="footer-bottom text-center py-5">
<!--/* This template is free as long as you keep the footer attribution link. If you'd like to use the template without the attribution link, you can buy the commercial license via our website: themes.3rdwavemedia.com Thank you for your support. :) */-->
<small class="copyright">Book cover by <a class="theme-link" href="https://www.thenomi.com/">Nomi</a></small>
<br>
<small class="copyright">Site template designed with <i class="fas fa-heart" style="color: #fb866a;"></i> by <a class="theme-link" href="http://themes.3rdwavemedia.com" target="_blank">Xiaoying Riley</a> for developers</small>
</div>
</footer>
<!-- Javascript -->
<script src="assets/plugins/jquery-3.4.1.min.js"></script>
<script src="assets/plugins/popper.min.js"></script>
<script src="assets/plugins/bootstrap/js/bootstrap.min.js"></script>
<script src="assets/plugins/jquery.scrollTo.min.js"></script>
<script src="assets/plugins/back-to-top.js"></script>
<script src="assets/js/main.js"></script>
</body>
</html>