Description
Returns a non-deterministic sample from the results of the previous stage.
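Two modes are supported:
- DOCUMENTS mode samples a set number of documents. This mode is similar to GoogleSQL.RESERVOIR: it outputs a sample of size n, where every possible sample of size n is equally likely.
- PERCENT mode samples a set percentage of documents. This mode is similar to GoogleSQL.BERNOULLI: each document is selected independently with the same percent probability. As a result, #documents * percent documents are returned on average, where percent is the fraction between 0.0 and 1.0.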
Syntax
Node.js
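// Documents mode: sample 50 documents from the database
const sampled = await db.pipeline()
  .database()
  .sample(50)
  .execute();

// Percent mode (separate snippet): each document has a 0.5 chance of being returned
const sampled = await db.pipeline()
  .database()
  .sample({ percent: 0.5 })
  .execute();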
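Behavior
Documents mode
Documents mode retrieves the specified number of documents in random order.
The specified number must be a non-negative INT64 value.
For example, given the following collection: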
Node.js
await db.collection('cities').doc('SF').set({name: 'San Francisco', state: 'California'});
await db.collection('cities').doc('NYC').set({name: 'New York City', state: 'New York'});
await db.collection('cities').doc('CHI').set({name: 'Chicago', state: 'Illinois'});
The sample stage in documents mode can be used to retrieve a non-deterministic subset of results from this collection.
Node.js
const sampled = await db.pipeline()
  .collection("/cities")
  .sample(1)
  .execute();
In this example, only 1 document is returned, chosen at random. One possible output:
{name: 'New York City', state: 'New York'}
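If the number provided is greater than the total number of documents returned by the previous stage, all documents are returned in random order.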
Node.js
const sampled = await db.pipeline()
  .collection("/cities")
  .sample(5)
  .execute();
This produces the following documents, in a random order:
{name: 'New York City', state: 'New York'}
{name: 'Chicago', state: 'Illinois'}
{name: 'San Francisco', state: 'California'}
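The documents-mode behavior described above (a uniform sample of at most n documents, returned in random order) can be modeled outside Firestore with a short sketch. It uses a partial Fisher-Yates shuffle rather than a true reservoir algorithm, and it is only an illustration of the semantics, not how Firestore implements the stage. The names `sampleDocuments` and `rng` are hypothetical.

```javascript
// Illustrative only: a uniform sample of up to n items via a partial Fisher-Yates shuffle.
// This is NOT Firestore's implementation; sampleDocuments and rng are hypothetical names.
function sampleDocuments(docs, n, rng = Math.random) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError("n must be a non-negative integer");
  }
  const pool = docs.slice();          // copy so the caller's array is not mutated
  const k = Math.min(n, pool.length); // asking for more than exists returns everything
  for (let i = 0; i < k; i++) {
    // Pick a random index from the not-yet-chosen tail [i, pool.length).
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, k);
}

const cities = ["SF", "NYC", "CHI"];
console.log(sampleDocuments(cities, 1)); // one city, chosen uniformly at random
console.log(sampleDocuments(cities, 5)); // all three cities, in random order
```

Passing a custom `rng` makes the sketch reproducible in tests, while the default `Math.random` gives a different result on each run.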
Additional examples
Web
let results;

// Get a sample of 100 documents in a database
results = await execute(db.pipeline()
  .database()
  .sample(100)
);

// Randomly shuffle a list of 3 documents
results = await execute(db.pipeline()
  .documents([
    doc(db, "cities", "SF"),
    doc(db, "cities", "NY"),
    doc(db, "cities", "DC"),
  ])
  .sample(3)
);
Swift
var results: Pipeline.Snapshot

// Get a sample of 100 documents in a database
results = try await db.pipeline()
  .database()
  .sample(count: 100)
  .execute()

// Randomly shuffle a list of 3 documents
results = try await db.pipeline()
  .documents([
    db.collection("cities").document("SF"),
    db.collection("cities").document("NY"),
    db.collection("cities").document("DC"),
  ])
  .sample(count: 3)
  .execute()
Kotlin
var results: Task<Pipeline.Snapshot>

// Get a sample of 100 documents in a database
results = db.pipeline()
    .database()
    .sample(100)
    .execute()

// Randomly shuffle a list of 3 documents
results = db.pipeline()
    .documents(
        db.collection("cities").document("SF"),
        db.collection("cities").document("NY"),
        db.collection("cities").document("DC")
    )
    .sample(3)
    .execute()
Java
Task<Pipeline.Snapshot> results;

// Get a sample of 100 documents in a database
results = db.pipeline()
    .database()
    .sample(100)
    .execute();

// Randomly shuffle a list of 3 documents
results = db.pipeline()
    .documents(
        db.collection("cities").document("SF"),
        db.collection("cities").document("NY"),
        db.collection("cities").document("DC")
    )
    .sample(3)
    .execute();
Python
# Get a sample of 100 documents in a database
results = client.pipeline().database().sample(100).execute()

# Randomly shuffle a list of 3 documents
results = (
    client.pipeline()
    .documents(
        client.collection("cities").document("SF"),
        client.collection("cities").document("NY"),
        client.collection("cities").document("DC"),
    )
    .sample(3)
    .execute()
)
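Percent mode
In percent mode, each document has the specified percent chance of being returned. Unlike documents mode, the order here is not random; the pre-existing document order is preserved. The percent input must be a double value between 0.0 and 1.0.
Because each document is selected independently, the output is non-deterministic, and #documents * percent documents are returned on average.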
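Independent per-document selection can be sketched in a few lines. The example below is only a model of the Bernoulli-style behavior described above, not Firestore's implementation, and `bernoulliSample` and `rng` are hypothetical names. It shows why the original order is preserved and why the result size varies from run to run.

```javascript
// Illustrative only: Bernoulli sampling keeps each item independently with probability p.
// This is NOT Firestore's implementation; bernoulliSample and rng are hypothetical names.
function bernoulliSample(docs, p, rng = Math.random) {
  if (typeof p !== "number" || Number.isNaN(p) || p < 0 || p > 1) {
    throw new RangeError("p must be between 0.0 and 1.0");
  }
  // filter() visits items in order, so the kept items retain their original order.
  // rng() returns a value in [0, 1), so p = 0 keeps nothing and p = 1 keeps everything.
  return docs.filter(() => rng() < p);
}

const docIds = ["a", "b", "c", "d"];
console.log(bernoulliSample(docIds, 0.5)); // varies per run; 2 items on average
```

With 4 input items and p = 0.5, the expected result size is 4 * 0.5 = 2, but any size from 0 to 4 can occur on a given run.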
For example, given the following collection:
Node.js
await db.collection('cities').doc('SF').set({name: 'San Francisco', state: 'California'});
await db.collection('cities').doc('NYC').set({name: 'New York City', state: 'New York'});
await db.collection('cities').doc('CHI').set({name: 'Chicago', state: 'Illinois'});
await db.collection('cities').doc('ATL').set({name: 'Atlanta', state: 'Georgia'});
The sample stage in percent mode can be used to retrieve (on average) 50% of the documents from the collection stage.
Node.js
const sampled = await db.pipeline()
  .collection("/cities")
  .sample({ percent: 0.5 })
  .execute();
This produces a non-deterministic sample containing (on average) 50% of the documents in the cities collection. The following is one possible output.
{name: 'New York City', state: 'New York'}
{name: 'Chicago', state: 'Illinois'}
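In percent mode, because each document is selected independently with the same probability, it is possible for zero documents or all documents to be returned.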
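To put numbers on those edge cases: under independent selection with probability p, the chance of an empty result for n documents is (1 - p)^n and the chance of getting every document is p^n. The helper below is a hypothetical illustration of that arithmetic (`extremeOutcomeProbabilities` is not part of any Firestore API).

```javascript
// Probability that independent per-document selection keeps none, or all, of n documents.
// extremeOutcomeProbabilities is a hypothetical helper used only for this illustration.
function extremeOutcomeProbabilities(n, p) {
  return {
    none: Math.pow(1 - p, n), // every document is skipped
    all: Math.pow(p, n),      // every document is kept
    expected: n * p,          // mean number of documents returned
  };
}

console.log(extremeOutcomeProbabilities(4, 0.5));
// { none: 0.0625, all: 0.0625, expected: 2 }
```

For the four-city collection sampled at 0.5, an empty result and a full result are each expected in roughly 1 run out of 16.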
Additional examples
Web
// Get a sample of on average 50% of the documents in the database
const results = await execute(db.pipeline()
  .database()
  .sample({ percentage: 0.5 })
);
Swift
// Get a sample of on average 50% of the documents in the database
let results = try await db.pipeline()
  .database()
  .sample(percentage: 0.5)
  .execute()
Kotlin
// Get a sample of on average 50% of the documents in the database
val results = db.pipeline()
    .database()
    .sample(SampleStage.withPercentage(0.5))
    .execute()
Java
// Get a sample of on average 50% of the documents in the database
Task<Pipeline.Snapshot> results = db.pipeline()
    .database()
    .sample(SampleStage.withPercentage(0.5))
    .execute();
Python
from google.cloud.firestore_v1.pipeline_stages import SampleOptions

# Get a sample of on average 50% of the documents in the database
results = (
    client.pipeline().database().sample(SampleOptions.percentage(0.5)).execute()
)