🌐 Node-RED HTML Node Guide

📚 Table of Contents

1. Overview
2. Configuration details
3. CSS selector basics
4. Practical usage patterns
5. Exercises
6. Summary
7. Real-world use cases

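📖 1. Overview

The HTML node extracts specific elements from an HTML document. You select the elements with a CSS selector, which makes the node the usual starting point for web scraping, that is, pulling the data you need out of a web page.

🔍 An analogy: using the HTML node is like using a book's table of contents to find a particular chapter. You can pull out exactly the part you want, such as "Chapter 3" or "About the author".

HTTP Request → HTML → Debug

💡 Key points:

Specify HTML elements with a CSS selector
When several elements match, the results arrive as an array (or as one message per element, depending on the "as" setting)
A basic tool for web scraping
Can return attribute values as well as text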

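⚙️ 2. Configuration details

Properties

Property	Description	Default
Selector	CSS selector (specifies what to extract)	(empty)
Output	Output format (html / text / attributes)	html
Property	Message property that holds the input HTML	msg.payload
Output to	Message property that receives the result	msg.payload
as	Delivery mode (a single message or multiple messages)	single
Prefix	Attribute prefix character (applies when Output is compl)	_

Output options

Option	Description	Example output
the html content	Inner HTML of the element	<b>bold</b> text
only the text content	Text only	bold text
an object of any attributes	An object holding the element's attributes	{"href": "/home"} for <a href="/home">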
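Text returned by the "only the text content" option often carries the line breaks and indentation of the source markup. The sketch below shows one way a Function node placed after the HTML node might tidy such payloads. `normalizeText` and `normalizePayload` are hypothetical helper names, and the code assumes the payload is either an array of strings (single message) or one string (multiple messages).

```javascript
// Hypothetical helper: collapse runs of whitespace and trim both ends.
function normalizeText(text) {
  return String(text).replace(/\s+/g, " ").trim();
}

// Accepts the array form ("as" = single message) or one string
// ("as" = multiple messages) and returns the same shape, cleaned.
function normalizePayload(payload) {
  return Array.isArray(payload)
    ? payload.map(normalizeText)
    : normalizeText(payload);
}

// Inside a Function node this would be:
//   msg.payload = normalizePayload(msg.payload);
//   return msg;
console.log(normalizePayload(["  Main\n   title ", "Body\tparagraph 1."]));
// → [ 'Main title', 'Body paragraph 1.' ]
```

Handling both shapes in one helper means the Function node keeps working if the "as" setting is changed later.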

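🎯 3. CSS selector basics

Basic selectors

Selector	Description	Example
`element`	Select by tag name	`p`, `div`, `a`
`.class`	Select by class attribute	`.article`, `.title`
`#id`	Select by id attribute	`#header`, `#main`
`[attr]`	Select by presence of an attribute	`[href]`, `[data-id]`
`[attr=value]`	Select by attribute value	`[type="text"]`

Combined selectors

Selector	Description	Example
`A B`	B that is a descendant of A	`div p` (every p inside a div)
`A > B`	B that is a direct child of A	`ul > li` (li directly under a ul)
`A, B`	A or B	`h1, h2` (both h1 and h2)
`A.class`	A that has the class `class`	`p.intro`

Useful selectors

/* nth element */
li:nth-child(1)      /* li that is the first child of its parent */
li:nth-child(odd)    /* li at odd positions */
li:first-child       /* first li */
li:last-child        /* last li */

/* partial attribute matches */
[href^="https"]      /* starts with https */
[href$=".pdf"]       /* ends with .pdf */
[href*="example"]    /* contains example */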
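To make the partial-match operators concrete, the following sketch mirrors them with JavaScript string methods. `attrMatches` is a hypothetical helper written only for illustration; it is not part of Node-RED or of any selector engine, and the sample URLs are placeholders.

```javascript
// Hypothetical helper that mirrors CSS attribute operators.
// value: the attribute value, or undefined when the attribute is absent.
function attrMatches(value, operator, expected) {
  if (typeof value !== "string") return false; // attribute not present
  switch (operator) {
    case "=":  return value === expected;
    // In CSS, an empty string never matches the substring operators.
    case "^=": return expected !== "" && value.startsWith(expected);
    case "$=": return expected !== "" && value.endsWith(expected);
    case "*=": return expected !== "" && value.includes(expected);
    default:   throw new Error(`Unsupported operator: ${operator}`);
  }
}

const sampleHrefs = [
  "https://example.com/report.pdf",
  "http://example.org/index.html",
  "/docs/manual.pdf",
];

// [href^="https"]
console.log(sampleHrefs.filter((h) => attrMatches(h, "^=", "https")));
// [href$=".pdf"]
console.log(sampleHrefs.filter((h) => attrMatches(h, "$=", ".pdf")));
// [href*="example"]
console.log(sampleHrefs.filter((h) => attrMatches(h, "*=", "example")));
```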

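🔧 4. Practical usage patterns

📥 How to import the sample flow:

Copy the sample flow JSON below
In the Node-RED editor, choose Menu → Import
Paste the JSON and click "Import"

The sample flow contains working examples of the patterns described below.

📥 Sample flow JSON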

[
{ "id": "html_sample_tab", "type": "tab", "label": "HTML samples", "disabled": false, "info": "" },
{ "id": "html_comment1", "type": "comment", "z": "html_sample_tab", "name": "━━━ Basic: extracting text from HTML ━━━", "info": "", "x": 190, "y": 40, "wires": [] },
{ "id": "html_template1", "type": "template", "z": "html_sample_tab", "name": "Sample HTML", "field": "payload", "fieldType": "msg", "format": "html", "syntax": "plain", "template": "<html>\n<body>\n<h1>Main title</h1>\n<p>This is the introduction.</p>\n<p>Body paragraph 1.</p>\n<p>Body paragraph 2.</p>\n<a href=\"https://example.com/\">Example link</a>\n</body>\n</html>", "output": "str", "x": 160, "y": 100, "wires": [["html_extract_h1", "html_extract_p", "html_extract_links"]] },
{ "id": "html_inject1", "type": "inject", "z": "html_sample_tab", "name": "Run", "props": [], "repeat": "", "crontab": "", "once": false, "onceDelay": 0.1, "topic": "", "x": 130, "y": 60, "wires": [["html_template1"]] },
{ "id": "html_extract_h1", "type": "html", "z": "html_sample_tab", "name": "Extract h1", "property": "payload", "outproperty": "payload", "tag": "h1", "ret": "text", "as": "single", "chr": "_", "x": 350, "y": 60, "wires": [["html_debug1"]] },
{ "id": "html_debug1", "type": "debug", "z": "html_sample_tab", "name": "h1: Main title", "active": true, "tosidebar": true, "console": false, "tostatus": true, "complete": "payload", "targetType": "msg", "statusVal": "payload", "statusType": "auto", "x": 550, "y": 60, "wires": [] },
{ "id": "html_extract_p", "type": "html", "z": "html_sample_tab", "name": "Extract all p", "property": "payload", "outproperty": "payload", "tag": "p", "ret": "text", "as": "multi", "chr": "_", "x": 350, "y": 100, "wires": [["html_debug2"]] },
{ "id": "html_debug2", "type": "debug", "z": "html_sample_tab", "name": "p: one message per paragraph", "active": true, "tosidebar": true, "console": false, "tostatus": false, "complete": "payload", "targetType": "msg", "x": 540, "y": 100, "wires": [] },
{ "id": "html_extract_links", "type": "html", "z": "html_sample_tab", "name": "Extract link URLs", "property": "payload", "outproperty": "payload", "tag": "a", "ret": "attr", "as": "multi", "chr": "_", "x": 360, "y": 140, "wires": [["html_debug3"]] },
{ "id": "html_debug3", "type": "debug", "z": "html_sample_tab", "name": "href attribute", "active": true, "tosidebar": true, "console": false, "tostatus": false, "complete": "payload", "targetType": "msg", "x": 540, "y": 140, "wires": [] },
{ "id": "html_comment2", "type": "comment", "z": "html_sample_tab", "name": "━━━ Filtering by class or ID ━━━", "info": "", "x": 180, "y": 200, "wires": [] },
{ "id": "html_inject2", "type": "inject", "z": "html_sample_tab", "name": "Run", "props": [], "repeat": "", "crontab": "", "once": false, "onceDelay": 0.1, "topic": "", "x": 130, "y": 260, "wires": [["html_template2"]] },
{ "id": "html_template2", "type": "template", "z": "html_sample_tab", "name": "HTML with classes", "field": "payload", "fieldType": "msg", "format": "html", "syntax": "plain", "template": "<div class=\"news-item\">\n<h2 class=\"title\">Important news</h2>\n<p>Summary of an important event</p>\n</div>\n<div class=\"news-item\">\n<h2 class=\"title\">Regular news 1</h2>\n<p>Summary 1</p>\n</div>\n<div class=\"news-item\">\n<h2 class=\"title\">Regular news 2</h2>\n<p>Summary 2</p>\n</div>", "output": "str", "x": 170, "y": 300, "wires": [["html_extract_class"]] },
{ "id": "html_extract_class", "type": "html", "z": "html_sample_tab", "name": ".news-item .title", "property": "payload", "outproperty": "payload", "tag": ".news-item .title", "ret": "text", "as": "multi", "chr": "_", "x": 370, "y": 300, "wires": [["html_debug4"]] },
{ "id": "html_debug4", "type": "debug", "z": "html_sample_tab", "name": "News titles", "active": true, "tosidebar": true, "console": false, "tostatus": false, "complete": "payload", "targetType": "msg", "x": 570, "y": 300, "wires": [] },
{ "id": "html_comment3", "type": "comment", "z": "html_sample_tab", "name": "━━━ Extracting table data ━━━", "info": "", "x": 180, "y": 360, "wires": [] },
{ "id": "html_inject3", "type": "inject", "z": "html_sample_tab", "name": "Run", "props": [], "repeat": "", "crontab": "", "once": false, "onceDelay": 0.1, "topic": "", "x": 130, "y": 420, "wires": [["html_template3"]] },
{ "id": "html_template3", "type": "template", "z": "html_sample_tab", "name": "Table HTML", "field": "payload", "fieldType": "msg", "format": "html", "syntax": "plain", "template": "<table>\n<tr><th>Name</th><th>Age</th><th>Occupation</th></tr>\n<tr><td>Tanaka</td><td>30</td><td>Engineer</td></tr>\n<tr><td>Suzuki</td><td>25</td><td>Designer</td></tr>\n<tr><td>Sato</td><td>35</td><td>Manager</td></tr>\n</table>", "output": "str", "x": 160, "y": 460, "wires": [["html_extract_table"]] },
{ "id": "html_extract_table", "type": "html", "z": "html_sample_tab", "name": "Extract td", "property": "payload", "outproperty": "payload", "tag": "table tr td", "ret": "text", "as": "multi", "chr": "_", "x": 350, "y": 460, "wires": [["html_debug5"]] },
{ "id": "html_debug5", "type": "debug", "z": "html_sample_tab", "name": "Table data", "active": true, "tosidebar": true, "console": false, "tostatus": false, "complete": "payload", "targetType": "msg", "x": 540, "y": 460, "wires": [] }
]

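Usage patterns

Pattern 1: Basic text extraction (h1, p, links)

See the first group in the sample flow, which covers basic text extraction from HTML.

Sample HTML → Extract h1 → Main title

Key points:

h1 → extracts the heading text
p → extracts the text of each paragraph (the sample sends one message per paragraph; with "as" set to a single message they arrive as one array)
a + Output: attr → extracts each link's attributes, including href (the URL)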
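Links extracted this way are often relative. The sketch below shows how a Function node after the HTML node might turn them into absolute URLs. It assumes the attribute output delivers objects such as `{ href: "/home" }`, either one per message or collected in an array. `resolveLinks` is a hypothetical helper, and `https://example.com/news/` is a placeholder for the page that was fetched.

```javascript
// Hypothetical helper: collect href values and resolve them against the
// URL of the page they came from.
function resolveLinks(payload, baseUrl) {
  const items = Array.isArray(payload) ? payload : [payload];
  return items
    .map((attrs) => (attrs ? attrs.href : undefined))
    .filter((href) => typeof href === "string" && href !== "")
    .map((href) => new URL(href, baseUrl).href);
}

const sampleAttrs = [
  { href: "/home" },
  { href: "about" },
  { class: "no-link" }, // element without an href is skipped
  { href: "https://external.com" },
];

console.log(resolveLinks(sampleAttrs, "https://example.com/news/"));
// → [ 'https://example.com/home',
//     'https://example.com/news/about',
//     'https://external.com/' ]
```

The built-in `URL` constructor applies the same resolution rules a browser uses, so root-relative, path-relative, and absolute links are all handled without string concatenation.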

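Pattern 2: Filtering by class or ID

See the group in the sample flow that covers filtering by class or ID.

HTML with classes → .news-item .title → News titles

Key points:

.classname selects by class
#idname selects by ID
For nested elements, separate the selectors with a space, as in .parent .child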

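Pattern 3: Extracting table data

See the group in the sample flow that covers table data extraction.

Table HTML → table tr td → Table data

Key points:

table tr td extracts the table cells (header cells marked up as th are not matched)
Output: text returns the text only
The cells come back as a flat list, so a following Function node can reshape them into rows; set "as" to a single message to receive them as one array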
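As a sketch of that reshaping step, the code below groups a flat list of cell texts into rows. It assumes the HTML node is set to deliver a single message, so the payload is one array, and that every row has the same number of columns. `chunkRows` is a hypothetical helper name, and the cell values are illustrative.

```javascript
// Hypothetical helper: split a flat array of cells into rows of equal width.
function chunkRows(cells, columns) {
  if (!Number.isInteger(columns) || columns < 1) {
    throw new RangeError("columns must be a positive integer");
  }
  const rows = [];
  for (let i = 0; i < cells.length; i += columns) {
    rows.push(cells.slice(i, i + columns));
  }
  return rows;
}

const flatCells = [
  "Tanaka", "30", "Engineer",
  "Suzuki", "25", "Designer",
  "Sato", "35", "Manager",
];

// Inside a Function node this would be:
//   msg.payload = chunkRows(msg.payload, 3);
//   return msg;
console.log(chunkRows(flatCells, 3));
// → [ [ 'Tanaka', '30', 'Engineer' ],
//     [ 'Suzuki', '25', 'Designer' ],
//     [ 'Sato', '35', 'Manager' ] ]
```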

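📝 5. Exercises

Exercise 1: Extracting headings (Beginner)

📋 Task: Extract the text of every h2 tag in the HTML.

✅ Success criteria:

Several messages appear in the debug panel, and each payload is a string
"Section 1", "Section 2", and "Section 3" arrive as three separate messages
The h1 text ("Title") and the p text ("Body") are not included
The output is plain text with no HTML tags

💡 Hint

Set Selector to h2 and Output to text. To get one message per heading, set "as" to multiple messages.

✅ Example solution flow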

[
{"id": "ex1_tab", "type": "tab", "label": "Exercise 1"},
{"id": "ex1_inject", "type": "inject", "z": "ex1_tab", "name": "Run", "props": [], "repeat": "", "crontab": "", "once": false, "onceDelay": 0.1, "topic": "", "x": 130, "y": 80, "wires": [["ex1_template"]]},
{"id": "ex1_template", "type": "template", "z": "ex1_tab", "name": "HTML", "field": "payload", "fieldType": "msg", "format": "html", "syntax": "plain", "template": "<h1>Title</h1>\n<h2>Section 1</h2>\n<p>Body 1</p>\n<h2>Section 2</h2>\n<p>Body 2</p>\n<h2>Section 3</h2>\n<p>Body 3</p>", "output": "str", "x": 270, "y": 80, "wires": [["ex1_html"]]},
{"id": "ex1_html", "type": "html", "z": "ex1_tab", "name": "Extract h2", "property": "payload", "outproperty": "payload", "tag": "h2", "ret": "text", "as": "multi", "chr": "_", "x": 410, "y": 80, "wires": [["ex1_debug"]]},
{"id": "ex1_debug", "type": "debug", "z": "ex1_tab", "name": "Result", "active": true, "tosidebar": true, "console": false, "tostatus": false, "complete": "payload", "targetType": "msg", "x": 550, "y": 80, "wires": []}
]

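Exercise 2: Extracting link URLs (Beginner)

📋 Task: Extract the href attribute from every a tag in the HTML.

✅ Success criteria:

Several messages appear in the debug panel, and each payload is an attribute object such as {"href": "/home"}
Four URLs are output: "/home", "/about", "/contact", and "https://external.com"
The link text ("Home", "Company profile", and so on) is not output; only the attribute values are
The four messages arrive separately (when "as" is set to multiple messages)

💡 Hint

Set Output to the attributes option (attr) and read the URL from payload.href.

✅ Example solution flow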

[
{"id": "ex2_tab", "type": "tab", "label": "Exercise 2"},
{"id": "ex2_inject", "type": "inject", "z": "ex2_tab", "name": "Run", "props": [], "repeat": "", "crontab": "", "once": false, "onceDelay": 0.1, "topic": "", "x": 130, "y": 80, "wires": [["ex2_template"]]},
{"id": "ex2_template", "type": "template", "z": "ex2_tab", "name": "Link HTML", "field": "payload", "fieldType": "msg", "format": "html", "syntax": "plain", "template": "<a href=\"/home\">Home</a>\n<a href=\"/about\">Company profile</a>\n<a href=\"/contact\">Contact</a>\n<a href=\"https://external.com\">External site</a>", "output": "str", "x": 280, "y": 80, "wires": [["ex2_html"]]},
{"id": "ex2_html", "type": "html", "z": "ex2_tab", "name": "Extract href", "property": "payload", "outproperty": "payload", "tag": "a", "ret": "attr", "as": "multi", "chr": "_", "x": 430, "y": 80, "wires": [["ex2_debug"]]},
{"id": "ex2_debug", "type": "debug", "z": "ex2_tab", "name": "URL list", "active": true, "tosidebar": true, "console": false, "tostatus": false, "complete": "payload", "targetType": "msg", "x": 570, "y": 80, "wires": []}
]

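Exercise 3: Filtering by class (Intermediate)

📋 Task: Extract the text of only those elements that have class="highlight".

✅ Success criteria:

Only two texts appear in the debug panel: "Important item 1" and "Important item 2"
"Normal item 1", "Normal item 2", and "Normal item 3" are not output
Each payload is a string of plain text with no HTML tags
The CSS class selector (.highlight) filters the elements correctly

💡 Hint

Set Selector to .highlight.

✅ Example solution flow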

[
{"id": "ex3_tab", "type": "tab", "label": "Exercise 3"},
{"id": "ex3_inject", "type": "inject", "z": "ex3_tab", "name": "Run", "props": [], "repeat": "", "crontab": "", "once": false, "onceDelay": 0.1, "topic": "", "x": 130, "y": 80, "wires": [["ex3_template"]]},
{"id": "ex3_template", "type": "template", "z": "ex3_tab", "name": "HTML", "field": "payload", "fieldType": "msg", "format": "html", "syntax": "plain", "template": "<ul>\n<li>Normal item 1</li>\n<li class=\"highlight\">Important item 1</li>\n<li>Normal item 2</li>\n<li class=\"highlight\">Important item 2</li>\n<li>Normal item 3</li>\n</ul>", "output": "str", "x": 270, "y": 80, "wires": [["ex3_html"]]},
{"id": "ex3_html", "type": "html", "z": "ex3_tab", "name": ".highlight", "property": "payload", "outproperty": "payload", "tag": ".highlight", "ret": "text", "as": "multi", "chr": "_", "x": 420, "y": 80, "wires": [["ex3_debug"]]},
{"id": "ex3_debug", "type": "debug", "z": "ex3_tab", "name": "Important items only", "active": true, "tosidebar": true, "console": false, "tostatus": false, "complete": "payload", "targetType": "msg", "x": 580, "y": 80, "wires": []}
]

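Exercise 4: Compound selectors (Advanced)

📋 Task: Extract the text of elements with class="item" that sit inside the div with id="main".

✅ Success criteria:

Only two items appear in the debug panel: "Main item 1" and "Main item 2"
"Sidebar item", which sits inside id="sidebar", is not output
The "Other" text, which lacks class="item", is not output either
The compound selector (#main .item) narrows the scope precisely

💡 Hint

Compound selector: #main .item

✅ Example solution flow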

[
{"id": "ex4_tab", "type": "tab", "label": "Exercise 4"},
{"id": "ex4_inject", "type": "inject", "z": "ex4_tab", "name": "Run", "props": [], "repeat": "", "crontab": "", "once": false, "onceDelay": 0.1, "topic": "", "x": 130, "y": 80, "wires": [["ex4_template"]]},
{"id": "ex4_template", "type": "template", "z": "ex4_tab", "name": "HTML", "field": "payload", "fieldType": "msg", "format": "html", "syntax": "plain", "template": "<div id=\"sidebar\">\n  <span class=\"item\">Sidebar item</span>\n</div>\n<div id=\"main\">\n  <span class=\"item\">Main item 1</span>\n  <span class=\"item\">Main item 2</span>\n  <span>Other</span>\n</div>", "output": "str", "x": 270, "y": 80, "wires": [["ex4_html"]]},
{"id": "ex4_html", "type": "html", "z": "ex4_tab", "name": "#main .item", "property": "payload", "outproperty": "payload", "tag": "#main .item", "ret": "text", "as": "multi", "chr": "_", "x": 420, "y": 80, "wires": [["ex4_debug"]]},
{"id": "ex4_debug", "type": "debug", "z": "ex4_tab", "name": "Main section only", "active": true, "tosidebar": true, "console": false, "tostatus": false, "complete": "payload", "targetType": "msg", "x": 580, "y": 80, "wires": []}
]

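✅ 6. Summary

🎯 Key points:

Specify HTML elements with a CSS selector to extract them
Three output formats: text, html, and attributes
When several elements match, the results arrive as an array or as one message per element
A basic tool for web scraping

⚠️ Cautions:

Check the target site's terms of use before scraping
Content generated dynamically by JavaScript may not be retrievable, because the fetched HTML is the page as it was before any scripts ran
If the site structure changes, selectors may stop matching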
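Because a selector can silently stop matching, it may help to check the result before passing it on. The sketch below is one possible guard for a Function node placed after the HTML node. It assumes "as" is set to a single message, where an empty match is expected to arrive as an empty array. `checkExtraction` is a hypothetical helper name.

```javascript
// Hypothetical guard: report when a selector produced no usable results.
function checkExtraction(payload, selector) {
  const items = Array.isArray(payload) ? payload : [payload];
  const usable = items.filter((item) =>
    typeof item === "string" ? item.trim() !== "" : item != null
  );
  if (usable.length === 0) {
    return {
      ok: false,
      reason:
        `Selector "${selector}" matched nothing; the page structure may ` +
        "have changed, or the content may be rendered by JavaScript.",
    };
  }
  return { ok: true, items: usable };
}

console.log(checkExtraction([], ".news-item .title"));
console.log(checkExtraction(["Important news", "  "], ".news-item .title"));
```

In a Function node, the failing case could be reported with `node.warn(result.reason)` and the message dropped by returning `null`, so that downstream nodes never store an empty result.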

🏭 7. Real-world use cases

Case 1: Collecting information from news sites

Periodically fetch article titles and links from a news site and store them in a database.
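When a page is fetched on a schedule, most articles will already have been stored. The sketch below shows one way to pair extracted titles with their URLs and keep only the unseen ones before writing to a database. It assumes titles and URLs were extracted as two arrays in the same document order. `pairArticles`, `selectNew`, and the sample data are hypothetical.

```javascript
// Hypothetical helper: pair titles with URLs extracted in the same order.
function pairArticles(titles, urls) {
  const count = Math.min(titles.length, urls.length);
  const articles = [];
  for (let i = 0; i < count; i += 1) {
    articles.push({ title: titles[i].trim(), url: urls[i] });
  }
  return articles;
}

// Keep only articles whose URL has not been stored yet.
function selectNew(articles, seenUrls) {
  return articles.filter((article) => !seenUrls.has(article.url));
}

const newsTitles = ["Important news", "Regular news 1"];
const newsUrls = ["https://example.com/a", "https://example.com/b"];
const alreadyStored = new Set(["https://example.com/a"]);

console.log(selectNew(pairArticles(newsTitles, newsUrls), alreadyStored));
// → [ { title: 'Regular news 1', url: 'https://example.com/b' } ]
```

Using the URL as the key keeps the check stable even when a site edits a headline after publication.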

Case 2: Price monitoring

Extract product prices from an e-commerce site, watch for price changes, and send notifications.
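Extracted prices arrive as text such as "¥1,280", so they need to be converted to numbers before they can be compared. The following sketch assumes ASCII digits with a comma as the thousands separator and a dot as the decimal mark; other formats would need different parsing. `parsePrice` and `detectChange` are hypothetical helpers, and the previous price would typically come from storage such as a database or flow context.

```javascript
// Hypothetical helper: pull the first number out of a price string.
// Returns null when no number is found (for example "sold out").
function parsePrice(text) {
  const match = String(text).replace(/,/g, "").match(/\d+(\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Compare the current price with the previously stored one.
function detectChange(previous, current) {
  if (previous === null || current === null) {
    return { changed: false, diff: null };
  }
  const diff = current - previous;
  return { changed: diff !== 0, diff };
}

const previousPrice = parsePrice("¥1,480");
const currentPrice = parsePrice("¥1,280");
console.log(detectChange(previousPrice, currentPrice));
// → { changed: true, diff: -200 }
```

A Switch node or a simple `if (result.changed)` check could then decide whether to send the notification.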

Case 3: Data conversion

Extract tabular data from an HTML report and convert it to JSON.
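As a minimal sketch of the conversion step, the code below turns table rows into objects keyed by column name and serializes them. It assumes the header names and the row arrays have already been extracted (for example with separate th and td selectors). `rowsToObjects` and the sample values are hypothetical.

```javascript
// Hypothetical helper: build one object per row, keyed by header name.
// Missing cells become null so every object has the same keys.
function rowsToObjects(headers, rows) {
  return rows.map((row) =>
    Object.fromEntries(headers.map((header, i) => [header, row[i] ?? null]))
  );
}

const reportHeaders = ["name", "age", "occupation"];
const reportRows = [
  ["Tanaka", "30", "Engineer"],
  ["Suzuki", "25"], // short row: occupation becomes null
];

const reportJson = JSON.stringify(rowsToObjects(reportHeaders, reportRows), null, 2);
console.log(reportJson);
```

All values stay strings here, because that is what text extraction yields; numeric columns such as age would need an explicit `Number(...)` conversion if the consumer expects numbers.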

Case 4: Content aggregation

Extract specific content from several sites and show it on a dashboard.

📚 Node-RED official documentation
