Snowplow (software)
Revisions by Karl Jones
Latest revision as of 10:02, 15 September 2016
Snowplow is a marketing and product analytics platform.
Description
According to the official website, Snowplow does three things:
- Identifies website users and tracks the way they engage with a website or web application;
- Stores users' behavioral data in a scalable "event data warehouse" you control: in Amazon S3 and (optionally) Amazon Redshift or Postgres;
- Supports a wide range of tools for analyzing that data, including big data tools (e.g. Hive, Pig, Mahout) via Amazon EMR, and more traditional tools such as Tableau, R, Looker and Chartio.
Core concepts
Snowplow is built around the following core concepts:
- Events
- Dictionaries and schemas
- Contexts
- Iglu
- Stages in the Snowplow data pipeline
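Several of these concepts meet in Snowplow's use of self-describing JSON: a custom event or a context is a `data` payload paired with a `schema` field holding the Iglu URI of the JSON Schema that describes it. The Python sketch below builds such envelopes. The `com.acme` vendor and its `button_click` and `page_section` schemas are hypothetical, and the contexts wrapper URI reflects our understanding of Snowplow's published schemas, so check it against Iglu Central before relying on it.

```python
import json
from typing import Any, Dict

# Assumed Iglu URI of Snowplow's wrapper schema for an array of contexts;
# verify the exact version on Iglu Central before relying on it.
CONTEXTS_SCHEMA = "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0"


def self_describing(schema_uri: str, data: Any) -> Dict[str, Any]:
    """Pair a payload with the Iglu URI of the JSON Schema it should validate against."""
    return {"schema": schema_uri, "data": data}


# Hypothetical custom event and context owned by an example vendor, com.acme.
event = self_describing(
    "iglu:com.acme/button_click/jsonschema/1-0-0",
    {"button_id": "signup", "page": "/pricing"},
)
contexts = self_describing(
    CONTEXTS_SCHEMA,
    [self_describing("iglu:com.acme/page_section/jsonschema/1-0-0", {"section": "header"})],
)

if __name__ == "__main__":
    print(json.dumps(event, indent=2))
    print(json.dumps(contexts, indent=2))
```

Because every payload carries its own schema reference, downstream stages can look up the right schema in Iglu without any out-of-band agreement about the event's shape.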
Setting up Snowplow
The process of setting up Snowplow consists of:
- Set up a collector;
- Set up a tracker or webhook;
- Set up enrich;
- Set up alternative data stores.
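In these steps, the tracker's job is to send events to the collector. The Python sketch below builds the kind of GET request a tracker might send for a custom structured event. It follows our understanding of the Snowplow Tracker Protocol (`e=se` for a structured event, `se_ca`/`se_ac`/`se_la` for category, action and label, `p` for platform, `aid` for application ID, `eid` for event ID, `dtm` for device timestamp) and the collector's `/i` pixel endpoint. The collector host `collector.example.com`, the app ID and the function name are hypothetical; check parameter names against the tracker protocol documentation before use.

```python
import time
import uuid
from typing import Dict, Optional
from urllib.parse import urlencode, urlunsplit


def structured_event_url(collector_host: str, app_id: str, category: str,
                         action: str, label: Optional[str] = None) -> str:
    """Build (but do not send) a GET URL for a collector's /i pixel endpoint."""
    params: Dict[str, str] = {
        "e": "se",                            # event type: structured event
        "p": "web",                           # platform the event came from
        "aid": app_id,                        # application ID
        "eid": str(uuid.uuid4()),             # unique event ID
        "dtm": str(int(time.time() * 1000)),  # device timestamp, milliseconds
        "se_ca": category,                    # structured event category
        "se_ac": action,                      # structured event action
    }
    if label is not None:
        params["se_la"] = label               # optional structured event label
    return urlunsplit(("https", collector_host, "/i", urlencode(params), ""))


# Hypothetical collector host and app ID.
url = structured_event_url("collector.example.com", "my-site", "video", "play", "intro")
```

Sending the event would then be a plain GET, for example `urllib.request.urlopen(url)`. A real tracker also attaches session, user and page fields, which this sketch omits.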
Iglu repository
An Iglu repository acts as a store of data schemas for Snowplow; it currently supports JSON Schemas only.
Hosting JSON Schemas in an Iglu repository allows you to use those schemas in Iglu-capable systems such as Snowplow.
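Schemas in an Iglu repository are addressed by Iglu URIs of the form `iglu:<vendor>/<name>/<format>/<version>`, with the version written as SchemaVer `MODEL-REVISION-ADDITION`, and a static Iglu repository serves each schema under a matching `schemas/...` path. The Python sketch below parses such a URI and derives that path. The `com.acme` vendor is hypothetical, and the character sets accepted by the regular expression are a simplifying assumption rather than Iglu's formal grammar.

```python
import re
from dataclasses import dataclass

# iglu:<vendor>/<name>/<format>/<MODEL-REVISION-ADDITION>
# (character sets simplified for illustration)
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[a-zA-Z0-9_.-]+)/(?P<name>[a-zA-Z0-9_-]+)/"
    r"(?P<format>[a-zA-Z0-9_-]+)/(?P<version>\d+-\d+-\d+)$"
)


@dataclass(frozen=True)
class SchemaKey:
    vendor: str
    name: str
    format: str
    version: str

    def repo_path(self) -> str:
        """Path of this schema inside a static Iglu repository."""
        return f"schemas/{self.vendor}/{self.name}/{self.format}/{self.version}"


def parse_iglu_uri(uri: str) -> SchemaKey:
    """Split an Iglu URI into its parts, rejecting anything that does not match."""
    match = IGLU_URI.match(uri)
    if match is None:
        raise ValueError(f"not an Iglu URI: {uri!r}")
    return SchemaKey(**match.groupdict())
```

For example, `parse_iglu_uri("iglu:com.acme/button_click/jsonschema/1-0-0").repo_path()` gives `schemas/com.acme/button_click/jsonschema/1-0-0`.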
Enrich applications
A Snowplow Enrich application processes data from a Snowplow Collector and stores the enriched data in a persistent data store.
There are currently two enrichment applications available for setup:
- EmrEtlRunner: an application that parses logs from a Collector and stores enriched events in S3;
- Stream Enrich: a Scala application that reads Thrift events from a Kinesis stream and writes the enriched output back to a Kinesis stream.
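Both enrich applications share the same basic shape: read raw collector events, run them through a chain of enrichments, and write out the results. The Python sketch below shows that loop, with in-memory `deque`s standing in for the input and output streams (Kinesis, in Stream Enrich's case). The function names, the representation of events as flat string dictionaries, and the separate output for failed events are illustrative assumptions, not Snowplow's API.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, str]
Enrichment = Callable[[Event], Event]


def run_enrich(source: Deque[Event], enrichments: List[Enrichment],
               good: Deque[Event], bad: Deque[Event]) -> None:
    """Drain `source`, apply each enrichment in order, route results to good/bad."""
    while source:
        raw = source.popleft()
        try:
            event = dict(raw)  # work on a copy so the raw event is never mutated
            for enrich in enrichments:
                event = enrich(event)
            good.append(event)
        except (KeyError, ValueError) as err:
            # Keep failed events, with the reason, instead of silently dropping them.
            bad.append({"raw": str(raw), "error": str(err)})


def add_platform(event: Event) -> Event:
    """Hypothetical enrichment: require a platform field and normalize its case."""
    return {**event, "platform": event["p"].lower()}
```

Routing failures to their own output, rather than discarding them, means a bad batch can be inspected and reprocessed later.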
EmrEtlRunner
Snowplow EmrEtlRunner is an application that parses the log files generated by your Snowplow collector and:
- Cleans up the data into a format that is easier to parse and analyze;
- Enriches the data (e.g. infers the visitor's location from their IP address, and infers search engine keywords from the referring URL's query string);
- Stores the cleaned, enriched data in Amazon S3.
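The search-keyword enrichment can be illustrated with a small Python sketch: look up the referring URL's host in a table of known search engines, then read the search terms from the query parameter that engine uses. The three-entry `SEARCH_ENGINES` table and the function name are hypothetical; to our understanding Snowplow delegates this to its referer-parser library, which ships a far larger database.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical, deliberately tiny table: search-engine host -> (engine name,
# query parameter carrying the search terms).
SEARCH_ENGINES: Dict[str, Tuple[str, str]] = {
    "www.google.com": ("Google", "q"),
    "www.bing.com": ("Bing", "q"),
    "search.yahoo.com": ("Yahoo!", "p"),
}


def search_keywords(referer_url: str) -> Optional[Tuple[str, str]]:
    """Return (engine, search terms) if the referer is a known search engine."""
    parts = urlsplit(referer_url)
    entry = SEARCH_ENGINES.get(parts.hostname or "")
    if entry is None:
        return None  # not a search engine we know about
    engine, param = entry
    terms = parse_qs(parts.query).get(param)  # parse_qs decodes "+" and %XX
    return (engine, terms[0]) if terms else None
```

For example, `search_keywords("https://www.google.com/search?q=snowplow+analytics")` returns `("Google", "snowplow analytics")`.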
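For more on EmrEtlRunner, see:
- emr-etl-runner (source code): https://github.com/snowplow/snowplow/tree/master/3-enrich/emr-etl-runner
- Setting up EmrEtlRunner: https://github.com/snowplow/snowplow/wiki/setting-up-EmrEtlRunner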
Discourse forums
See:
http://discourse.snowplowanalytics.com/users/karl_jones/activity
See also
- Amazon Redshift - a hosted data warehouse product, which is part of the larger cloud computing platform Amazon Web Services.
- Amazon Web Services
- Iglu repository - a store of data schemas for Snowplow (software), currently (August 2016) supporting JSON Schemas only.
- JSON Schema - a JSON-based format to define the structure of JSON data for validation, documentation, and interaction control.
- Web application
External links
- Official website
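- Configure the Scala Stream Collector: https://github.com/snowplow/snowplow/wiki/Configure-the-Scala-Stream-Collector (see also Scala (programming language))
- General parameters for the Javascript tracker: https://github.com/snowplow/snowplow/wiki/1-General-parameters-for-the-Javascript-tracker
- Snowplow: Warning: No tracker configured @ Stack Overflow (code example using a callback): http://stackoverflow.com/questions/37476726/snowplow-warning-no-tracker-configured
- 2 Specific event tracking with the Javascript tracker v2.5: https://github.com/snowplow/snowplow/wiki/2-Specific-event-tracking-with-the-Javascript-tracker-v2.5#custom-structured-events
- 3 Advanced usage of the JavaScript Tracker: https://github.com/snowplow/snowplow/wiki/3-Advanced-usage-of-the-JavaScript-Tracker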